This is the mail archive of the
mailing list for the libc-ports project.
Re: PI mutex support for pthread_cond_* now in nptl
On Wed, 2013-02-20 at 21:25 +0100, Torvald Riegel wrote:
> On Wed, 2013-02-20 at 10:59 -0600, Steven Munroe wrote:
> > On Tue, 2013-02-19 at 21:06 +0100, Torvald Riegel wrote:
> > > On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote:
> > > > On Tue, 19 Feb 2013, Richard Henderson wrote:
> > > >
> > > > > Any chance we can move these macros into a generic linux header?
> > > > > Given that we're using INTERNAL_SYSCALL macros, the definitions ought to be
> > > > > the same for all targets.
> > > >
> > > > Generally most of lowlevellock.h should probably be shared between
> > > > architectures. (If some architectures don't implement a particular
> > > > feature as of a particular kernel version, that's a matter for
> > > > kernel-features.h and __ASSUME_* conditionals.)
> > >
> > > On a related note: What are the reasons to have arch-specific assembler
> > > versions of many of the synchronization operations? I would be
> > > surprised if they'd provide a significant performance advantage; does
> > > anyone have recent measurements for this?
> > >
> > The introduction of GCC compiler builtins like __sync is fairly recent
> > and the new __atomic builtins start with GCC-4.7. So until recently we
> > had no choice.
> Using assembler for the atomic operations is possible (e.g., as in
> Boehm's libatomic-ops, or in ./sysdeps/powerpc/bits/atomic.h and others).
> It doesn't allow for the same level of compiler optimization across
> barriers, but it's unclear whether that has much benefit, and GCC
> doesn't do it yet anyway.
> There are some cases in which compilers that don't support the C11/C++11
> memory model can generate code that wouldn't be correct in such a model,
> and which can theoretically interfere with other concurrent code (e.g.,
> introduce data races due to accesses being too wide). However, because
> we don't have custom assembler for everything, we should be already
> exposed to that.
> > For platforms (like PowerPC) that implement acquire/release the GCC
> > __sync builtins are not sufficient and GCC-4.7 __atomic builtins are not
> > pervasive enough to make that the default.
> I agree regarding the __sync builtins, but using assembler in place of
> the __atomic builtins should work, or not?
> > > It seems to me that it would be useful to consolidate the different
> > > versions that exist for the synchronization operations into shared C
> > > code as long as this doesn't make a significant performance difference.
> > > They are all based on atomic operations and futex operations, both of
> > > which we have in C code (especially if we have compilers that support
> > > the C11 memory model). Or are there other reasons for keeping different
> > > versions that I'm not aware of?
> > >
> > I disagree. The performance of lowlevellocks and associated platform
> > specific optimizations are too important to move forward with the
> > consolidation you suggest.
> Which specific optimizations do you refer to? I didn't see any for
> powerpc, for example (i.e., the lock fast path is C up to the point of
> the atomic operation). The ones that I saw are for x86, and I'm
> wondering whether they provide much benefit. Especially because this
> can mostly just matter for the execution path taken when a free lock is
> acquired; once you get any cache miss, you're to some extent on the slow
> path anyway. Also, for the Linux platforms I looked at, the mutex
> algorithms are the same.
For example, the lwarx MUTEX_HINT (the EH field).
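
To make the two positions concrete, here is a minimal sketch of a lock fast
path written in plain C with the GCC 4.7 __atomic builtins. The names
sketch_lock, sketch_trylock and sketch_unlock are hypothetical and this is
not glibc's lowlevellock code. On PowerPC a compare-and-swap like this is
built from a lwarx/stwcx. pair; the EH hint is an operand of lwarx that
hand-written assembler can set explicitly, and the sketch does not control
whether the compiler emits it.

```c
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held.
   This is an illustration only, not glibc's lll_* layout.  */
typedef struct { int word; } sketch_lock;

/* Fast path: a single compare-and-swap with acquire ordering.
   Returns true if the lock was free and is now held by the caller.
   A real mutex would fall back to a futex wait on failure.  */
static bool sketch_trylock(sketch_lock *l)
{
    int expected = 0;
    return __atomic_compare_exchange_n(&l->word, &expected, 1,
                                       false /* strong CAS */,
                                       __ATOMIC_ACQUIRE,
                                       __ATOMIC_RELAXED);
}

/* Release the lock with release ordering.  No waiter wake-up here;
   a real mutex would issue a futex wake if there were waiters.  */
static void sketch_unlock(sketch_lock *l)
{
    __atomic_store_n(&l->word, 0, __ATOMIC_RELEASE);
}
```

Whether the compiler-generated sequence matches a tuned assembler version
on a given machine is the open question in this thread; the sketch only
shows that the uncontended path can be expressed in shared C code.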
> Do you have any recent measurements (or could point to them) that show
> the benefit of the optimizations you refer to?
No. I don't currently have access to a machine big enough to show this
effect, and I can't tell you about the specific customer. So you will have
to trust me on this.