This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Optimization of conditional stores (was: Re: [PATCH] Add adaptive elision to rwlocks)
- From: Andi Kleen <andi at firstfloor dot org>
- To: Alexander Monakov <amonakov at ispras dot ru>
- Cc: Andi Kleen <andi at firstfloor dot org>, Roland McGrath <roland at hack dot frob dot com>, Andi Kleen <ak at linux dot intel dot com>, libc-alpha at sourceware dot org
- Date: Mon, 7 Apr 2014 19:52:27 +0200
- Subject: Re: Optimization of conditional stores (was: Re: [PATCH] Add adaptive elision to rwlocks)
- Authentication-results: sourceware.org; auth=none
- References: <1396652083-18920-1-git-send-email-andi at firstfloor dot org> <20140404234516 dot 3DFAD74446 at topped-with-meat dot com> <20140405003759 dot GQ32556 at tassilo dot jf dot intel dot com> <20140405044201 dot 9B44D74445 at topped-with-meat dot com> <alpine dot LNX dot 2 dot 00 dot 1404071824530 dot 2531 at monopod dot intra dot ispras dot ru> <20140407161055 dot GV22728 at two dot firstfloor dot org> <alpine dot LNX dot 2 dot 00 dot 1404072027420 dot 2624 at monopod dot intra dot ispras dot ru>
On Mon, Apr 07, 2014 at 08:54:57PM +0400, Alexander Monakov wrote:
>
>
> On Mon, 7 Apr 2014, Andi Kleen wrote:
>
> > > If the compiler can prove that `ptr' must be pointing to writeable location
> > > (for instance if there is a preceding (dominating) unconditional store), it
> > > can, and likely will, perform the optimization.
> >
> > Except it's not an optimization, but a pessimization
>
> I see where you're coming from, but is that really a pessimization for a case
> of non-multithreaded execution?
If it's write back memory and the cache line would otherwise be clean,
the cache line eventually has to be flushed, which causes extra work for
the memory controller. The memory controller has a limited working set
(number of open DRAM pages etc.); if that is thrashing it may make things
significantly slower when it has to close some other actively
used page.
It's also quite bad for power consumption because it's a lot of
extra work.
In fact I would probably argue that this would be a good target
for a profile feedback pass to add such if()s
>
> In my tests, GCC versions before 4.8 optimize out the first store and the
> conditional branch. GCC 4.8.0 preserves both the first store and the branch.
> If you omit the busy-wait loop, GCC 4.8 performs the optimization as well.
Thanks.
>
> > > I would also suggest making the intent (perform the store only when necessary)
> > > explicit, and make sure to disallow the compiler optimization, for example:
> > >
> > > if (*ptr != value)
> > > *(volatile typeof(*ptr)*)ptr = value;
> >
> > That's really ugly.
>
> I simply expanded kernel's ACCESS_ONCE macro by hand for the sake of the
> example.
Ok I guess we can add a ACCESS_ONCE()
The existing elision code also uses this construct.
-Andi