This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Add adaptive elision to rwlocks
- From: Roland McGrath <roland at hack dot frob dot com>
- To: Andi Kleen <ak at linux dot intel dot com>
- Cc: Andi Kleen <andi at firstfloor dot org>, libc-alpha at sourceware dot org
- Date: Fri, 4 Apr 2014 21:42:01 -0700 (PDT)
- Subject: Re: [PATCH] Add adaptive elision to rwlocks
- Authentication-results: sourceware.org; auth=none
- References: <1396652083-18920-1-git-send-email-andi at firstfloor dot org> <20140404234516 dot 3DFAD74446 at topped-with-meat dot com> <20140405003759 dot GQ32556 at tassilo dot jf dot intel dot com>
> Thanks for the review. I'll do these changes.
I should have said that I'm not doing a complete review. Really only
a style and nit review (but as you know that is often what takes more
effort from you to get the way we want it than is the actual logic).
I don't understand either the TSX semantics or the rwlock requirements
enough off hand to confidently review the core logic of the change. I
hope someone like Torvald can step in for that.
> This is an important optimization to avoid unnecessary cache line
> transitions and aborts. In general unnecessary writes
> can be very expensive in parallel situations, as hardware
> may need to do a lot of work to transition the cache line
> from SHARED to EXCLUSIVE.
>
> I can add a comment.
I think you might need to do more than that if you want to ensure the
compiler never makes a different choice for you. I'm not positive,
but I think the rules of C allow it to transform the one into the
other ("optimizing out" the test and branch in favor of a write that
might be a no-op). In the absence of volatile (or atomic ops or
whatnot) then I don't think there are any actual guarantees about
anything like e.g. preserving semantics that if the value was already
what you tested for then the page will never see a write access. Even
if compilers don't violate our presumptions about this today, they
might in the future unless the C standard constrains them. And even
if the target hardware in question will never make that a wise
optimization decision and so the compiler will never do it, we should
still express to the compiler the constraints we want it to obey as
best we can. So--unless I'm wrong about what the standard specifies,
which I'm not entirely sure about--then I think we should implement
these cases with something explicit. It could start out as just a
macro that serves as documentation of the intent, while we look into
what we can or should do to express that intent to the compiler. e.g.
#define assign_probably_same(var, val) ({ if (var != val) var = val; var; })
(or perhaps get fancier to avoid multiple evaluation or something).
> The macros are generic for any kind of lock type.
>
> It would be a different e.g. for spinlocks. There are also
> some possible future optimizations where it may be different
> even for rwlocks.
At first blush this sounds like premature generalization that could
always be done later if and when you actually have additional uses
that could share some of the code. I certainly won't call that
necessarily an evil (I've clearly been known to do it myself), but its
pre-loaded potential future value always has to be weighed against
other considerations such as the readability/maintainability of the
code that goes in first before any such putative future uses the
generality.
At the very least, if they are intended to be used somewhat
generically, then certainly they should have thorough comments
describing their interface contracts.
> So I would prefer to keep the macro. Do you insist on the inline?
I insist on doing things the cleanest way that we can do them without
sacrificing meaningful performance. We always start from the
presumption that nontrivial macros are not the cleanest choice.
If you start with rwlock-specific functions it is obvious that they
can be clean and straightforward as inlines. If you want to start
with generic functions even before they have multiple users, then it
is more intricate to do something clean. I didn't try your actual
code, but in a simple experiment with GCC 4.8.2 on some mock-up code
that has the same essential structure, I got it to inline everything
when it's structured as:
static inline __attribute__ ((always_inline)) bool
elide_lock (uint8_t *count, bool (*predicate) (void *), void *arg)
{
... if ((*predicate) (arg)) ...
}
static inline __attribute__ ((always_inline)) bool
rdlock_elidable_p (void *arg)
{
pthread_rwlock_t *rwlock = arg;
return (rwlock->__data.__lock == 0
&& rwlock->__data.__writer == 0
&& rwlock->__data.__nr_readers == 0);
}
...
if (elide_lock (&rwlock->__data.__rwelision,
&rdlock_elidable_p, rwlock))
return 0;
...
(It unfortunately didn't inline the predicate function when it was
nested and referring to its parent's arguments, which should have come
out to exactly the same code as the above.)
So unless you have a good argument why having code in multi-line
macros is somehow cleaner than that or something else along those
lines, how about we try that? (Or you can just start simple without
the generality now and revisit the argument about how to implement the
generality later on when you have the second user ready to go in.)
Thanks,
Roland