This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Optimization of conditional stores (was: Re: [PATCH] Add adaptive elision to rwlocks)
- From: Torvald Riegel <triegel at redhat dot com>
- To: Alexander Monakov <amonakov at ispras dot ru>
- Cc: Andi Kleen <andi at firstfloor dot org>, Roland McGrath <roland at hack dot frob dot com>, Andi Kleen <ak at linux dot intel dot com>, libc-alpha at sourceware dot org
- Date: Thu, 10 Apr 2014 22:26:07 +0200
- Subject: Re: Optimization of conditional stores (was: Re: [PATCH] Add adaptive elision to rwlocks)
On Mon, 2014-04-07 at 20:54 +0400, Alexander Monakov wrote:
>
> On Mon, 7 Apr 2014, Andi Kleen wrote:
>
> > > If the compiler can prove that `ptr' must be pointing to a writable location
> > > (for instance if there is a preceding (dominating) unconditional store), it
> > > can, and likely will, perform the optimization.
> >
> > Except it's not an optimization, but a pessimization
>
> I see where you're coming from, but is that really a pessimization for
> non-multithreaded execution? Also, I (of course) agree with Jeff Law that
> such a transformation has a good chance of violating the memory model imposed
> by newer standards.
>
> > Which compiler would do that? It sounds very broken to me.
>
> Example:
>
> void foo(int * __restrict__ ptr, int val, volatile int * __restrict__ cond)
> {
>   *ptr = 0;
>   while (*cond);
>   if (*ptr != val)
>     *ptr = val;
> }
>
> In my tests, GCC versions before 4.8 optimize out the first store and the
> conditional branch. GCC 4.8.0 preserves both the first store and the branch.
> If you omit the busy-wait loop, GCC 4.8 performs the optimization as well.
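At the source level, the pre-4.8 result described above behaves roughly like the sketch below. `foo` is the example from the quoted message (the spin loop is given an explicit empty body); `foo_as_optimized` is a hypothetical, hand-written equivalent for illustration, not actual GCC output. Because `ptr` is restrict-qualified and the volatile loop is not synchronization, no other code may legally modify `*ptr` while the loop runs. So after `*ptr = 0` the test `*ptr != val` can only be false when `val` is 0, in which case storing `val` changes nothing; the first store and the branch can then both be dropped, leaving one unconditional store.

```c
/* Example from the thread: a store, a volatile spin, then a
   conditional store. */
void foo(int *__restrict__ ptr, int val, volatile int *__restrict__ cond)
{
    *ptr = 0;
    while (*cond) {
        /* busy-wait on a volatile flag */
    }
    if (*ptr != val)
        *ptr = val;
}

/* Hypothetical source-level equivalent of the pre-4.8 code: the first
   store and the branch are gone and the final store is unconditional.
   Only the volatile reads of *cond have to be preserved. */
void foo_as_optimized(int *__restrict__ ptr, int val,
                      volatile int *__restrict__ cond)
{
    while (*cond) {
        /* busy-wait on a volatile flag */
    }
    *ptr = val;
}
```

In a single-threaded run both functions leave `*ptr == val`; the difference is that the second one always writes `*ptr`, which is the behavior Andi calls a pessimization for code that relies on the conditional store to avoid writing shared memory.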
If we consider just the standards (which, I believe, don't provide for
anything like read-only memory; also, ptr isn't volatile), then I think
both the pre-4.8 and the 4.8 behavior are correct. I don't know whether
it's actually intentional, but 4.8 might treat the while loop as
synchronization (which it isn't according to C11/C++11) and therefore not
merge the stores.
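In C11, synchronization between threads comes from atomic operations (or locks), not from volatile accesses, so a volatile spin like the one in the example creates no happens-before ordering. For contrast, here is a minimal sketch of a handoff that does synchronize under C11, assuming POSIX threads and `<stdatomic.h>`; `data`, `ready`, `producer`, `consume` and `run_handoff` are hypothetical names. It only illustrates the C11 rule and makes no claim about how GCC 4.8 compiles such code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state: 'data' is a plain int, 'ready' is the
   atomic flag that carries the synchronization. */
static int data;
static atomic_int ready;

static void *producer(void *arg)
{
    (void)arg;
    data = 42;                                              /* plain store */
    atomic_store_explicit(&ready, 1, memory_order_release); /* publish */
    return NULL;
}

/* Spins until the producer publishes.  The acquire load that reads 1
   synchronizes with the release store, so the read of 'data' is
   race-free and must see 42. */
static int consume(void)
{
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin */
    }
    return data;
}

/* Runs one handoff; returns the consumed value, or -1 if the thread
   could not be created. */
static int run_handoff(void)
{
    pthread_t t;

    data = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    int v = consume();
    pthread_join(t, NULL);
    return v;
}
```

If `ready` were a plain `volatile int` instead, the concurrent write and read of `data` would be a data race, and therefore undefined behavior under C11, regardless of the spin loop.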