This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
CPython vs libstdc++
- From: Zack Weinberg <zackw at panix dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>, libstdc++ at gcc dot gnu dot org
- Cc: Sumana Harihareswara <sh at changeset dot nyc>
- Date: Thu, 11 Jul 2019 12:13:30 -0400
- Subject: CPython vs libstdc++

I have been investigating a mysterious problem with Python extension
modules that use C++ internally. If I'm right about the cause, I
suspect that it can't be fixed without changes to the dynamic linker,
and I think we may need to have a dialogue between Python core
maintainers and GNU toolchain maintainers to figure out what Python
wants to be possible and how much of that is feasible for GCC and
glibc to support.

The surface symptoms of the problem are that, if you load two
unrelated modules, both of which use "enough" C++ features internally,
into the same process, the entire interpreter crashes, with stack
traces pointing at the guts of libstdc++. It is unclear exactly which
C++ features trigger the crash and it is also unclear whether it
matters what version or versions of G++ the modules were compiled by.
I have not had any luck constructing a minimal test case.

People who are deeply familiar with the internals of the Python
interpreter tell me that this "should be impossible" because each
module is loaded into its own ELF namespace. I can't actually verify
that for myself -- I don't see any references to dlmopen() in CPython
3.7's source code, and as far as I know, that's the only way to do
that. But assuming it's true, it immediately raises a red flag for
me, because I do know that both g++-compiled C++ in general, and
critical bits of libstdc++ in particular (e.g. the exception unwinder)
rely on certain data objects being unique within the entire address
space.
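
One partial check on the namespace claim can be made from inside the
interpreter: sys.getdlopenflags() reports the flags CPython passes to
dlopen() when it imports an extension module. The sketch below only
decodes that value. describe_dlopen_flags is a hypothetical helper name,
not part of CPython, and the result says nothing about dlmopen(). If
RTLD_GLOBAL is not set, a module's symbols stay out of the global lookup
scope, which is a weaker property than a separate ELF namespace.

```python
import os
import sys


def describe_dlopen_flags(flags):
    """Decode a dlopen() flag word into the two bits of interest here.

    Hypothetical helper for illustration; only uses constants that the
    os module exposes on Unix platforms.
    """
    return {
        # RTLD_NOW: resolve all symbols at load time instead of lazily.
        "now": bool(flags & os.RTLD_NOW),
        # RTLD_GLOBAL: make the module's symbols visible to objects
        # loaded afterwards.
        "global": bool(flags & os.RTLD_GLOBAL),
    }


if __name__ == "__main__":
    # Flags the running interpreter will use for extension modules.
    print(describe_dlopen_flags(sys.getdlopenflags()))
```

This does not settle whether separate namespaces are in use. It only
shows which symbol-visibility flags the interpreter requests.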
On the hypothesis that the problem is caused by two copies of
libstdc++.so and/or libgcc_s.so being loaded into a single address
space, which cannot reasonably be made to work, even if they're the
exact same version: we need some way of loading a shared object such
that only one copy will be loaded, and reused for each ELF namespace
that needs it. As far as I can tell, this is currently not possible.
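
To test this hypothesis in a crashing process, one possible diagnostic
is to look at /proc/self/maps on Linux and count how many times each
runtime library has been mapped. The sketch below is a heuristic, not a
definitive tool: it assumes that each loaded copy of a shared object
contributes exactly one mapping at file offset 0, and the function names
are hypothetical.

```python
import collections
import os


def count_loaded_copies(maps_text, names=("libstdc++.so", "libgcc_s.so")):
    """Count, per file path, the mappings that start at file offset 0.

    Heuristic (an assumption, not a guarantee): each time the dynamic
    linker loads an object, its first segment is mapped from offset 0,
    so a count above 1 suggests more than one copy in the address space.
    """
    copies = collections.Counter()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        offset, path = fields[2], fields[5]
        if int(offset, 16) != 0:
            continue  # a later segment of an object already counted
        if any(name in os.path.basename(path) for name in names):
            copies[path] += 1
    return dict(copies)


def count_in_current_process():
    """Apply the heuristic to the running process (Linux only)."""
    with open("/proc/self/maps") as maps_file:
        return count_loaded_copies(maps_file.read())
```

Calling count_in_current_process() after importing both extension
modules would show whether either library appears more than once.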
Ideally the trigger for this behavior would be an annotation on each
shared object that needs it, rather than requiring all programs that
use ELF namespaces to be aware of the issue; however, we might _also_
want a way for a program that uses ELF namespaces to request this
behavior, in case it's trying to support old libraries that don't have
the annotation even though they ought to.

I have very limited time to work on this myself and I'm not even fully
confident I understand the problem. I'm writing this message as a
call for volunteers from the toolchain side who have the time and
understanding to tackle the problem; I can put you in touch with the
appropriate people from the Python side.