This is the mail archive of the
mailing list for the glibc project.
Re: Implementing C++1x and C1x atomics
On 8/13/09, Joseph S. Myers <firstname.lastname@example.org> wrote:
> On Thu, 13 Aug 2009, Lawrence Crowl wrote:
> > > In that it defines functions, <stdatomic.h> is unlike all the
> > > headers presently required of freestanding implementations,
> > But <exception>, <new>, and <typeinfo> all define functions.
> I'm not familiar with the C++ requirements for freestanding
> implementations, so am just comparing with the requirements for C.
> > If you mean the OS-supplied platform-dependent library, then I
> > think the answer is yes. The names of the routines that back up
> > the intrinsics should be part of the platform ABI.
> I believe two relevant points are:
> * An implementation with kernel help (such as those for ARM,
> SH and PA GNU/Linux presently in libgcc), that is guaranteed
> to interoperate correctly with atomic instructions added in
> later subarchitectures or present in some subarchitectures,
> can be considered equivalent to a hardware instruction for most
> purposes; in particular, there is no need for programs to use
> only one such implementation and having them in libgcc is fine.
> It's only lock-based implementations that might have interoperation
> problems that need to go in libc.
> * libc only needs to export these functions for types that lack
> the operations in hardware on at least some subarchitectures.
> This will mean that the libc ABI does not generally need to
> contain the 1-byte, 2-byte or 4-byte operations, but on some
> targets it will need to export functions for 8-byte operations.
> These functions will in general have target-specific definitions,
> and certainly would appear in the target-specific Versions files.
Yes. However, we still have supported targets that are pretty weak.
"i386 < i486 < i586". If one can link two object files compiled for
these targets into the same program, you have the synchronization problem.
> > > * The header therefore comes with libc.
> > I don't think we need a header. These calls are directly
> > generated by the compiler, not referenced by the user.
> The header I am referring to is the C header <stdatomic.h> that
> C1x users wanting atomic operations should be using.
Okay, let's call this the language-defined compiler header.
> > > * The header never uses an inline operation when compiling for a
> > > particular subarchitecture unless the corresponding version of
> > > libc, when executing on hardware capable of executing code for
> > > that subarchitecture, will always use an atomic operation that
> > > interoperates correctly with the header. (libc might need in
> > > some cases to determine the hardware in use at runtime.)
> > I'm not quite following that. In any event, since I don't see
> > the need for a header, I think it is moot.
> Suppose you have an architecture X. Processors A, B and C for this
> architecture do not have 8-byte atomic operations, so glibc 2.12
> provides a fallback lock-based implementation in the port to X.
> GCC 4.6, targeting X (processors A, B and C), together with the
> stdatomic.h header, generates code using the fallback functions,
> and everything works OK.
I tend to think of these not as fallback functions, but as
platform functions, which can be optimized through inlining in
> Now a processor D for this architecture comes out. All code for A,
> B and C will work on D, but D also has 8-byte atomic operations.
> GCC 4.7, with -march=D, generates code that uses these operations
> inline. If code built with GCC 4.7 -march=D, and code built
> with GCC 4.6 or without -march=D, are used together with the
> glibc 2.12 shared library, both implementations of the atomic
> operations are now used and things don't work.
Here is where the heavy platform nature of the atomics comes in.
The installed shared glibc 2.12 on the system with a D processor
must have been built with -march=D. If so, then all operations
share the same implementation, and everything works.
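To make the failure mode concrete, here is a minimal sketch in C of the two
implementations that end up coexisting in the scenario above. The function
names are hypothetical, and the GCC __atomic_fetch_add builtin merely stands
in for whatever single instruction processor D provides.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical out-of-line fallback, standing in for what a libc built
   for processors A, B and C might export. The name is illustrative. */
static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

uint64_t fallback_fetch_add_8(uint64_t *p, uint64_t v)
{
    pthread_mutex_lock(&fallback_lock);
    uint64_t old = *p;      /* plain load, protected only by the lock */
    *p = old + v;           /* plain store, protected only by the lock */
    pthread_mutex_unlock(&fallback_lock);
    return old;
}

/* What an object built with -march=D might do instead: a hardware
   atomic operation that never looks at fallback_lock. */
uint64_t inline_fetch_add_8(uint64_t *p, uint64_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
```

Each function is correct on its own. The trouble comes when two threads
update the same object through different functions: the inline version can
modify *p between the locked load and the locked store, and that update is
then lost.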
> glibc 2.13 changes the out-of-line implementation to test at
> runtime whether it is running on D, and use the new instruction
> instead of the lock-based implementation if so (probably using
> STT_GNU_IFUNC so this test is only run the first time the symbol
> is resolved). That new glibc will now work with objects built
> with either 4.6 or 4.7.
Well, okay, but there would be less exposure to problems if -march=D
implied that the library was compiled with -march=D (or better).
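For reference, the run-time selection described in the quoted text could be
sketched roughly as follows. This uses a function pointer resolved on first
call rather than a real STT_GNU_IFUNC symbol, and the processor probe
(simulated_cpu_is_D) and all function names are assumptions made up for the
illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*fetch_add_8_fn)(uint64_t *, uint64_t);

/* Stand-in for the 8-byte instruction available on processor D. */
static uint64_t hw_fetch_add_8(uint64_t *p, uint64_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* Lock-based version for processors A, B and C (simple spinlock). */
static char lock_flag;

static uint64_t locked_fetch_add_8(uint64_t *p, uint64_t v)
{
    while (__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE))
        ;                                   /* spin until the lock is free */
    uint64_t old = *p;
    *p = old + v;
    __atomic_clear(&lock_flag, __ATOMIC_RELEASE);
    return old;
}

/* Hypothetical probe result; a real libc would query the hardware. */
int simulated_cpu_is_D = 1;

static fetch_add_8_fn resolved;             /* NULL until first call */

/* The one exported entry point: picks an implementation once, then
   every caller in the process shares that choice. */
uint64_t out_of_line_fetch_add_8(uint64_t *p, uint64_t v)
{
    fetch_add_8_fn f = __atomic_load_n(&resolved, __ATOMIC_ACQUIRE);
    if (f == NULL) {
        f = simulated_cpu_is_D ? hw_fetch_add_8 : locked_fetch_add_8;
        __atomic_store_n(&resolved, f, __ATOMIC_RELEASE);
    }
    return f(p, v);
}
```

The point of the single entry point is that objects calling out of line and
objects using the instruction inline agree on processor D, because the
library also ends up using the instruction there.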
> But on GNU/Linux - unlike BSDs, say - it is expected that the
> compiler, libc and kernel versions can be updated more or less
> independently, and that it should be possible to use a newer
> compiler to build code that will run with an older C library.
> So the case of GCC 4.7 with glibc 2.12 needs to work. This means
> that code built with GCC 4.7 against the <stdatomic.h> header
> provided with glibc 2.12 must not use the 8-byte atomic instruction
> that GCC 4.7 knows how to use, because glibc 2.12 will not use
> it in the out-of-line implementation at runtime.
I am suggesting that certain highly processor-dependent routines
should be updated with the processor. That is, I don't think the
taxonomy is quite right.
> Are you proposing to avoid this issue by saying that the
> platform ABI for GNU/Linux on an X processor is that the 8-byte
> operations must never be inlined, and so making GCC not use the
> inline operations with -march=D (for GNU/Linux - it might be
> different for another OS)?
No, I am proposing that an object compiled with -march=D should
fail to load on a system that doesn't have both a >=D processor
and a >=D library.
> That would work, but I don't think it's necessary (and these
> are operations you'd really like to use as few instructions as
> possible, so avoiding shared library overhead if a single inline
> instruction will do) if you require programs to go via the standard
> <stdatomic.h> header.
The overhead in cycles of atomic operations is often high. With the
exception of a load, the minimum cycle count of the right instruction
sequence may well be greater than the call wrapping it, so I am
not terribly concerned about atomic operations being implemented
out of line.
> You could have libc provide <stdatomic.h> that does
> #include <bits/stdatomic.h>
> #ifndef __atomic_whatever_8
> #define __atomic_whatever_8 __builtin_atomic_whatever_8
> #endif
> (and then uses __atomic_whatever_8 in the implementation of the
> type-generic macro), and <bits/stdatomic.h> would in glibc 2.12
> for X do
> #define __atomic_whatever_8 __out_of_line_atomic_whatever_8
> (repeated for each type that may lack atomic instructions on X)
> and in 2.13, knowing what implementations are present in libc,
> it could instead do
> #ifndef __arch_D__
> #define __atomic_whatever_8 __out_of_line_atomic_whatever_8
> #endif
> and adjust that condition in future versions if there are other
> future variants, not defining __arch_D__, for which libc uses a
> hardware atomic operation.
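To check my reading of the quoted scheme, here it is collapsed into a single
translation unit. The macro and function names are placeholders (the quoted
text only says "whatever"), fetch-add is picked arbitrarily as the operation,
and the out-of-line body is a stub rather than what libc would contain.

```c
#include <stdint.h>

/* --- stand-in for <bits/stdatomic.h> in the hypothetical 2.13 port to X --- */
/* Exported by libc; placeholder name. */
uint64_t out_of_line_atomic_fetch_add_8(uint64_t *p, uint64_t v);

#ifndef __arch_D__
/* Not compiling for D: route the operation through libc. */
#define atomic_fetch_add_8_impl out_of_line_atomic_fetch_add_8
#endif

/* --- stand-in for the generic part of <stdatomic.h> --- */
#ifndef atomic_fetch_add_8_impl
/* libc did not object, so the compiler builtin may be used inline. */
#define atomic_fetch_add_8_impl(p, v) \
    __atomic_fetch_add((p), (v), __ATOMIC_SEQ_CST)
#endif

/* Stub definition so the sketch links; a real libc would use a lock or
   a run-time selected instruction here. */
uint64_t out_of_line_atomic_fetch_add_8(uint64_t *p, uint64_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* User code sees only the generic name. */
uint64_t bump(uint64_t *counter)
{
    return atomic_fetch_add_8_impl(counter, 1);
}
```

Compiled without __arch_D__ defined, bump() calls the out-of-line function;
with it defined, the same source uses the builtin.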
I am suggesting something different. I am suggesting that the
<stdatomic.h> header always generate __builtin_atomic... and that
the conversion from RTL to assembler generate either the instructions
or the call.
The target of the call should never appear in any header. As a
consequence, it needs to be part of the platform ABI. (I'm hoping
we mean the same by platform, but I'm beginning to doubt that.)
We also need the __builtin_atomic... operations in the intermediate
language so that the compiler can recognize the operations' effect
on the memory model and (sometime later) optimize those operations.
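As a rough illustration of what I mean, the source below names only a
builtin and no library routine. I use the __atomic_* builtins that GCC
provides as a stand-in for the __builtin_atomic... spelling above, and the
function names are made up for the example. Whether the compiler emits an
instruction or a call is decided per target, and on targets where it emits
a call the program may need an extra support library at link time.

```c
#include <stdint.h>

/* The source names only the builtin. The compiler is free to lower it
   to an inline instruction or to a call into a support library. */
uint64_t counter_next(uint64_t *counter)
{
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}

/* Reports whether 8-byte operations are always lock-free (and so can be
   expanded inline) for the target this file is being compiled for. */
int counter_is_inline(void)
{
    return __atomic_always_lock_free(sizeof(uint64_t), 0);
}
```

Nothing in the source changes when the target changes, which is the
property I am after.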