This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Unify pthread_spin_[try]lock implementations.
- From: Torvald Riegel <triegel at redhat dot com>
- To: Maxim Kuvyrkov <maxim at codesourcery dot com>
- Cc: Roland McGrath <roland at hack dot frob dot com>, Andrew Haley <aph at redhat dot com>, David Miller <davem at davemloft dot net>, "Joseph S. Myers" <joseph at codesourcery dot com>, Richard Sandiford <rdsandiford at googlemail dot com>, libc-ports at sourceware dot org, GLIBC Devel <libc-alpha at sourceware dot org>, Chris Metcalf <cmetcalf at tilera dot com>
- Date: Thu, 16 Aug 2012 12:20:57 +0200
- Subject: Re: [PATCH] Unify pthread_spin_[try]lock implementations.
- References: <Pine.LNX.4.64.1206282306320.20312@digraph.polyomino.org.uk> <65B470D2-4D01-4BA1-AEC5-A72C0006EA22@codesourcery.com> <20120711081441.73BB22C093@topped-with-meat.com> <20120711.012509.1325789838255235021.davem@davemloft.net> <4FFD3CD9.4030206@redhat.com> <84304C03-6A49-4263-9016-05486EDC0E98@codesourcery.com> <4FFD4114.9000806@redhat.com> <E1DB09C1-0E3E-4088-9793-C0CAB80B5084@codesourcery.com> <20120711112235.B28CA2C099@topped-with-meat.com> <7FBB4F87-9FF3-4239-818F-5A38C8094011@codesourcery.com> <20120725181300.DD1812C0B5@topped-with-meat.com> <36A2FFD8-0C98-4AB6-8C64-2EEC5CC67A63@codesourcery.com>
On Wed, 2012-08-15 at 15:16 +1200, Maxim Kuvyrkov wrote:
> On 26/07/2012, at 6:13 AM, Roland McGrath wrote:
> > /* Machine-dependent rationale about the selection of this value. */
> > #define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
This looks like an arbitrary choice. I don't want to complain about
this patch (whose goal is to just unify similar code), but let me use it
as an example.
Elsewhere in the thread, you (IIRC) mentioned that the assumption is
that a CAS is 100x slower than a load. IMO, this is a flawed assumption.
First, this has more dimensions than one instruction being slower than
another one: cache architecture, what other threads are doing and where
in the cache hierarchy/graph they are, the CAS HW implementation, etc.
Second, it's not really about the slow-down for the current thread when
executing a CAS; it's about what the CAS might do in terms of caching
and the latency at which you detect a free lock on ARM, as Andrew Haley
pointed out. Third, not all machines in an architecture are similar; a
P4 cmpxchg performs very differently from a cmpxchg on a recent x86 CPU.
Fourth, there's no test for this assumption.
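
For concreteness, here is a minimal sketch of the kind of loop the quoted
macro tunes: try a CAS, and on failure spin on plain loads for a bounded
number of reads before trying the CAS again. It assumes C11 <stdatomic.h>
rather than glibc's internal atomic macros, and the sketch_* names are
hypothetical; it is not the code from the patch. The value 1000 is copied
from the quoted define and is not a measured optimum.

```c
#include <stdatomic.h>

/* Value taken from the quoted patch; -1 would mean "never retry the CAS
   until a load sees the lock free". Not a tuned number. */
#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
typedef atomic_int sketch_spinlock_t;

/* Returns 0 if the lock was acquired, nonzero if it was already held. */
static int sketch_spin_trylock(sketch_spinlock_t *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
               lock, &expected, 1,
               memory_order_acquire, memory_order_relaxed) ? 0 : 1;
}

static void sketch_spin_lock(sketch_spinlock_t *lock)
{
    for (;;) {
        int expected = 0;
        /* The CAS is the only step that can take the lock. */
        if (atomic_compare_exchange_weak_explicit(
                lock, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;

        /* Spin on relaxed loads while the lock looks held. With a positive
           budget we go back to the CAS after that many reads; with a
           negative budget we only leave once a load sees the lock free. */
        int reads = SPIN_LOCK_READS_BETWEEN_CMPXCHG;
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0) {
            if (reads > 0 && --reads == 0)
                break;
        }
    }
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Whether 1000 reads (or any fixed count) is a good budget depends on exactly
the factors listed above, which is why a single constant per architecture is
hard to justify without a benchmark.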
So, we will have to make such assumptions, but how do we make sure that
they are reasonable, and remain reasonable over time? If we don't,
these will bit-rot, and performance might degrade over time (assuming
that the assumptions were initially correct, which might be hard to
establish in the first place). Is there a plan for this yet, or any
discussion about it?
> > while Tile will use -1.
Is there a plan to include a back-off component in this generic spin
lock? (-1 would spin forever, but not do back-off.)
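
To illustrate what a back-off component could look like, here is a minimal
sketch of capped exponential back-off around a test-and-CAS loop. It assumes
C11 <stdatomic.h>; the sketch_* names and the SKETCH_BACKOFF_* bounds are
hypothetical and untuned, and the delay is a plain busy loop rather than any
architecture-specific pause instruction.

```c
#include <stdatomic.h>

/* Hypothetical bounds on the delay, in busy-loop iterations. Untuned. */
#define SKETCH_BACKOFF_MIN 4u
#define SKETCH_BACKOFF_MAX 1024u

/* Double the delay, saturating at SKETCH_BACKOFF_MAX. */
static unsigned sketch_next_backoff(unsigned cur)
{
    unsigned next = cur * 2u;
    return next > SKETCH_BACKOFF_MAX ? SKETCH_BACKOFF_MAX : next;
}

/* Burn roughly `iterations` loop iterations without touching the lock.
   The volatile counter keeps the compiler from deleting the loop. */
static void sketch_delay(unsigned iterations)
{
    for (volatile unsigned i = 0; i < iterations; ++i)
        ;
}

/* Lock word convention for this sketch: 0 = free, 1 = held. */
static void sketch_backoff_lock(atomic_int *lock)
{
    unsigned delay = SKETCH_BACKOFF_MIN;
    for (;;) {
        int expected = 0;
        /* Only attempt the CAS when a plain load sees the lock free. */
        if (atomic_load_explicit(lock, memory_order_relaxed) == 0 &&
            atomic_compare_exchange_weak_explicit(
                lock, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;

        /* Stay off the cache line for a while, then wait longer next time. */
        sketch_delay(delay);
        delay = sketch_next_backoff(delay);
    }
}
```

The trade-off is the one raised above: a longer delay reduces traffic on the
lock's cache line but increases the latency at which a free lock is noticed.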
Torvald