This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] correction for PPC __compare_and_swap
- To: <brianmc1 at us dot ibm dot com>
- Subject: Re: [PATCH] correction for PPC __compare_and_swap
- From: Kaz Kylheku <kaz at ashi dot footprints dot net>
- Date: Fri, 4 May 2001 09:10:51 -0700 (PDT)
- cc: <libc-alpha at sources dot redhat dot com>, <bmark at us dot ibm dot com>
On Fri, 4 May 2001 email@example.com wrote:
> This patch corrects an error in PPC __compare_and_swap.
> An isync is necessary after acquisition of a lock to discard all prefetched
> instructions. On page 335 of The PowerPC Architecture: A
> Specification For A New Family Of RISC Processors book it states the
> following: The "sync" instruction is execution synchronizing. It is not
> context synchronizing, and therefore need not discard prefetched
> instructions. For context synchronization you can see page 371 where the
> following instructions rfi, sc and isync can be used. End of quote.
> What can happen is the processor could speculatively load values into
> registers as it is acquiring the lock and there is an opportunity to have
> fetched stale data because another processor still owns the lock and is
> modifying data that is protected by the lock. The processor that is trying
> to acquire the lock has speculatively loaded the data the other processor
> is modifying. The processor finally succeeds in acquiring the lock and
> continues on with the data it had already loaded. The sync at the end does
> not cause the prefetched data to be discarded. The isync causes all the
> speculative execution to be thrown away and re-executed.
If you were to implement the macros READ_MEMORY_BARRIER,
WRITE_MEMORY_BARRIER and MEMORY_BARRIER for the PowerPC, how would you
assign the instructions to these? It seems that isync resembles a read
barrier, whereas sync is more like a write barrier that still allows
stale reads. What about a full barrier?
> Therefore if written as separate routines then there would only be one sync
> and one isync per lock/unlock pair which will give better performance and
> thus better scalability.
How about having no barriers at all in __compare_and_swap and let
the caller take care of all the memory synchronization?
It's probably best to assume that compare_and_swap() has no
synchronizing properties, only atomic access to one location that is
not ordered with respect to any other. The user of compare_and_swap()
should always use memory barrier macros as appropriate. Even
compare_and_swap_with_release_semantics can't be counted on to do
anything particular because it's just mapped to compare_and_swap where
The MEMORY_BARRIER() macro should provide a full fence that no memory
accesses (read or write) can cross.
The WRITE_MEMORY_BARRIER() macro should provide, at the very least, a
write-write fence, for use in situations like ensuring that the update
to a list node is flushed before the pointer is linked into a list.
I.e. a situation with no read dependencies.
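
The list-linking case can be sketched as follows. This is only an illustration: the macro here is a stand-in built on a C11 release fence from <stdatomic.h>, not a PowerPC instruction assignment, and the names node, list_head and publish_node are hypothetical. A single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in only (assumption): a C11 release fence keeps the earlier
   writes ordered before the later store that publishes the node. */
#define WRITE_MEMORY_BARRIER() atomic_thread_fence(memory_order_release)

struct node {
    int value;
    struct node *next;
};

/* Head of a singly linked list that other threads may traverse. */
static _Atomic(struct node *) list_head = NULL;

/* Fill in the node completely, then link it in. Single writer assumed. */
static void publish_node(struct node *n, int value)
{
    n->value = value;
    n->next = atomic_load_explicit(&list_head, memory_order_relaxed);

    /* The node's fields must be written out before the pointer to it
       becomes reachable through the list. */
    WRITE_MEMORY_BARRIER();

    atomic_store_explicit(&list_head, n, memory_order_relaxed);
}
```

A reader walking the list needs the matching fence on its side, between loading the pointer and dereferencing it.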
The READ_MEMORY_BARRIER() macro provides, at the very least, a fence
against stale reads, useful in ensuring that dependent reads access
coherent data. For example, loading a pointer from one location and
dereferencing it to access the referenced location are two steps that
should be separated by a READ_MEMORY_BARRIER().
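
The dependent-read case might look like the sketch below. Again the macro is a stand-in using a C11 acquire fence, and config, shared_config and read_limit are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in only (assumption): a C11 acquire fence keeps the later
   read from being satisfied before the pointer load. */
#define READ_MEMORY_BARRIER() atomic_thread_fence(memory_order_acquire)

struct config {
    int limit;
};

/* Pointer published by some other thread after it filled in the object. */
static _Atomic(struct config *) shared_config = NULL;

/* Returns the published limit, or fallback if nothing is published yet. */
static int read_limit(int fallback)
{
    struct config *c =
        atomic_load_explicit(&shared_config, memory_order_relaxed);
    if (c == NULL)
        return fallback;

    /* Fence between loading the pointer and dereferencing it, so the
       dereference does not see stale contents of the object. */
    READ_MEMORY_BARRIER();

    return c->limit;
}
```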
If no specialized read or write barrier is available, the
corresponding macro is just mapped to MEMORY_BARRIER().
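
One way to express that fallback is with preprocessor guards, as in this sketch. The full fence is a C11 stand-in rather than a real port's instruction, and issue_all_barriers is a hypothetical helper that only exercises the macros.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in full fence (assumption): no read or write may cross it. */
#define MEMORY_BARRIER() atomic_thread_fence(memory_order_seq_cst)

/* A port with cheaper specialized barriers would define these macros
   before this point; otherwise they fall back to the full fence. */
#ifndef READ_MEMORY_BARRIER
#define READ_MEMORY_BARRIER() MEMORY_BARRIER()
#endif

#ifndef WRITE_MEMORY_BARRIER
#define WRITE_MEMORY_BARRIER() MEMORY_BARRIER()
#endif

/* Issues each barrier once and returns how many were issued. */
static int issue_all_barriers(void)
{
    int issued = 0;
    READ_MEMORY_BARRIER();
    issued++;
    WRITE_MEMORY_BARRIER();
    issued++;
    MEMORY_BARRIER();
    issued++;
    return issued;
}
```

Falling back to the full fence is always safe, since it is at least as strong as either specialized barrier; it only costs performance.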
With these macros, the synchronization code in functions like
__pthread_lock() can be constructed to do what is needed without
depending on compare_and_swap() to have built-in fences.
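
A minimal sketch of that arrangement follows. It is not the actual __pthread_lock() code: spin_lock, spin_lock_acquire and spin_lock_release are hypothetical names, the compare_and_swap() here is modeled with a relaxed C11 compare-exchange, and the full fence is used on both sides for simplicity where a real port might pick a cheaper barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in full fence (assumption); a port would use its own instruction. */
#define MEMORY_BARRIER() atomic_thread_fence(memory_order_seq_cst)

/* Barrier-free compare-and-swap: atomic on one location, with no ordering
   implied for any other location. Returns nonzero on success. */
static int compare_and_swap(atomic_long *p, long oldval, long newval)
{
    return atomic_compare_exchange_strong_explicit(
        p, &oldval, newval, memory_order_relaxed, memory_order_relaxed);
}

struct spin_lock {
    atomic_long status;   /* 0 = free, 1 = held */
};

static void spin_lock_acquire(struct spin_lock *l)
{
    while (!compare_and_swap(&l->status, 0, 1))
        ;   /* spin until the lock is free */

    /* Nothing in the critical section may be performed, or read
       speculatively, before the lock is actually held. */
    MEMORY_BARRIER();
}

static void spin_lock_release(struct spin_lock *l)
{
    /* All accesses in the critical section must complete before the
       lock is seen as free by other processors. */
    MEMORY_BARRIER();

    atomic_store_explicit(&l->status, 0, memory_order_relaxed);
}
```

With this split, each lock/unlock pair pays for exactly the barriers the caller placed, and compare_and_swap() can be reused where no ordering is needed.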