This is the mail archive of the
mailing list for the glibc project.
Re: RFC: Implement __libc_single_threaded support
- From: Rich Felker <dalias at libc dot org>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 4 Feb 2019 13:09:08 -0500
On Mon, Feb 04, 2019 at 12:08:16PM +0100, Florian Weimer wrote:
> I'd like to see some feedback on the patch below. I think this is a
> useful addition; among other things, it's important for a decent
> std::shared_ptr implementation.
> It will allow us to replace __libc_multiple_threads inside glibc.
> The new mechanism is both more accurate (joining the last thread brings
> us back to single-threaded mode) and more reliable (the broadcast works
> across dlmopen). It also works in any shared object.
> We could pick a different name for the header file, such as
> <gnu/single_threaded.h>. Having a dedicated header file is relevant to
> the libstdc++ headers, I think, because they can then use __has_include
> to determine whether the installed glibc version supports this facility.
> If we snuck this into <pthread.h>, that would not be possible.
> The internal namespace (the __libc_ prefix) is a bit awkward, but I
> think it's the only way to keep things namespace-clean for use in C++
> The variable could be of type _Bool instead of int, but I note that Richard
> Henderson's LSE atomics optimization uses _Bool as well.
> Implement __libc_single_threaded support
> The variable is defined in libc_nonshared.a as a hidden symbol, so
> that it can be read efficiently even on architectures which have
> limited support for position-independent code. To update it when the
I think this assessment of where it helps is incorrect. Let's look at
what would happen if it were just a normal external data object from
1. If you're accessing it from a non-PIE main program, there is no
overhead because it's just a copy relocation and gets a fixed
2. If you're accessing it from a PIE main program, it can still be a
copy relocation, but the access is PC-relative. Or it can use:
3. If you're accessing it from a shared library, the address of the
object is loaded via a PC-relative load from the GOT.
In case 1, your optimization makes no difference at all. In case 2, it
also makes no difference; either way there's a PC-relative access.
Only in case 3 is there a difference; with your optimization, a
PC-relative access to the data replaces a PC-relative access to the
GOT slot followed by an indirection.
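To make case 3 concrete, here is a minimal C sketch of the two access patterns as seen from inside a shared library. The names demo_flag_default, demo_flag_hidden, read_default and read_hidden are hypothetical stand-ins, not glibc's actual symbols. The code generation described in the comments is what GCC or Clang typically emit with -fPIC on x86-64; other compilers and targets may differ.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not glibc's real symbols. */

/* Default visibility: in -fPIC code the symbol may be preempted or
   copy-relocated, so a read typically loads the object's address from
   the GOT (itself a PC-relative load) and then dereferences it. */
bool demo_flag_default = true;

/* Hidden visibility: the symbol binds within this object, so a read can
   typically be a single PC-relative load with no GOT indirection. */
__attribute__ ((visibility ("hidden")))
bool demo_flag_hidden = true;

/* Read through the (assumed) GOT-indirected path. */
bool read_default (void) { return demo_flag_default; }

/* Read through the (assumed) direct PC-relative path. */
bool read_hidden (void) { return demo_flag_hidden; }
```

Compiling this with something like `gcc -O2 -fPIC -S` and comparing the two functions should show the one extra load under discussion.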
So all your optimization is doing is saving one level of indirection
for access from shared libraries. It does not save any expensive setup
for PC-relative access on archs where it's expensive.
If you do away with this, there is no complex init-time setup, no O(n)
updating of local copies when threads are created/destroyed, no
dynamic management of a list of copies, no synchronization with
dlopen/dlclose, etc. The standard relocation mechanisms just do their
thing and the only cost is replacing:
and equivalent on all other archs. Saving one indirection does not
seem to be worth all this heavy machinery that imposes constraints on
the implementation and various costs, failure modes, and bug surface.
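For reference, a consumer of such a flag, like the std::shared_ptr reference counting mentioned in the quoted proposal, might look roughly like the sketch below. It assumes the flag is an ordinary external data object with no per-library copies. demo_single_threaded, struct demo_refcount and demo_ref_acquire are hypothetical names, and this is not libstdc++'s or glibc's actual code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proposed flag; in a real library it
   would be defined and maintained by libc, not by the consumer. */
bool demo_single_threaded = true;

/* Hypothetical reference-counted header. */
struct demo_refcount
{
  atomic_long count;
};

/* Take one reference.  While the process is single-threaded, no other
   thread can observe the counter, so a relaxed load followed by a
   relaxed store (plain moves on most targets) is enough.  Otherwise
   fall back to an atomic read-modify-write. */
static inline void
demo_ref_acquire (struct demo_refcount *r)
{
  if (demo_single_threaded)
    {
      long n = atomic_load_explicit (&r->count, memory_order_relaxed);
      atomic_store_explicit (&r->count, n + 1, memory_order_relaxed);
    }
  else
    atomic_fetch_add_explicit (&r->count, 1, memory_order_relaxed);
}
```

In this sketch the only per-call cost attributable to the flag is the single read of demo_single_threaded, which is the access whose addressing mode the three cases above describe.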