Sources Bugzilla – Bug 10652
getaddrinfo causes segfault if multithreaded and linked statically
Last modified: 2013-01-03 12:13:25 UTC
The getaddrinfo call causes an internal segmentation fault when it is called
from multiple threads and the binary is linked with "-static". The documentation
says the function is thread safe. This should also be the case when linked with
"-static", since no exception is mentioned.
The crash only occurs if the binary is executed on a multi-core system; on a
single-core system it does not crash. This seems to be a synchronization problem
inside the library, but somehow only in the static version.
To reproduce just use this small test program:
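#include <netdb.h>
#include <pthread.h>
#include <stdio.h>
void *test(void *) {
    struct addrinfo *res = NULL;
    int ret = getaddrinfo("localhost", NULL, NULL, &res);
    fprintf(stderr, "%d ", ret);
    return NULL;
}
int main() {
    pthread_t thr;
    for (int i = 0; i < 512; i++)
        pthread_create(&thr, NULL, test, NULL);
    pthread_exit(NULL);  /* keep the process alive while the threads run */
}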
Compile with "g++ -o dnstest -static dnstest.cpp -lpthread" and then run it.
When linked with "-static" it usually crashes immediately; without it, it works
fine.
This was verified with different glibc versions from Fedora 7, 11, CentOS 5.3,
Ubuntu 8.x and 9.x, SuSE 11.1 32bit and 64bit.
The glibc versions tested are from 2.6 to 2.10.
I see no reason why this only works if dynamically linked. The documentation
also does not mention any restrictions if linked statically.
Here is the output with debug info:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffefdf2910 (LWP 8287)]
*__GI_fgets_unlocked (buf=0x7fffefdf17a0 "", n=992, fp=0x0) at iofgets_u.c:54
54 old_error = fp->_IO_file_flags & _IO_ERR_SEEN;
Current language: auto; currently minimal
#0 *__GI_fgets_unlocked (buf=0x7fffefdf17a0 "", n=992, fp=0x0) at iofgets_u.c:54
#1 0x00007fffefdf68c7 in internal_getent (result=<value optimized out>,
buffer=0x7fffefdf1780 "", buflen=<value optimized out>, errnop=0x7fffefdf1d4c,
flags=<value optimized out>) at nss_files/files-XXX.c:208
#2 0x00007fffefdf6e52 in _nss_files_gethostbyname4_r (name=0x4923b3
"localhost", pat=0x7fffefdf1d38, buffer=0x7fffefdf1780 "", buflen=1024,
errnop=<value optimized out>,
herrnop=<value optimized out>, ttlp=0x0) at nss_files/files-hosts.c:347
#3 0x0000000000435f86 in gaih_inet ()
#4 0x0000000000437c7f in getaddrinfo ()
#5 0x0000000000000000 in ?? ()
(gdb) print fp
$1 = (_IO_FILE *) 0x0
The segmentation fault happens at different addresses below
_nss_files_gethostbyname4_r. The source of this function shows a call to
__libc_lock_lock, but this probably does not work!?
The assembler code shows calls to the pthread lock functions:
Dump of assembler code for function _nss_files_gethostbyname4_r:
0x00007ffff55d8d70 <_nss_files_gethostbyname4_r+0>: push %r15
0x00007ffff55d8d72 <_nss_files_gethostbyname4_r+2>: push %r14
0x00007ffff55d8d74 <_nss_files_gethostbyname4_r+4>: push %r13
0x00007ffff55d8d76 <_nss_files_gethostbyname4_r+6>: mov %rsi,%r13
0x00007ffff55d8d79 <_nss_files_gethostbyname4_r+9>: push %r12
0x00007ffff55d8d7b <_nss_files_gethostbyname4_r+11>: mov %rdi,%r12
0x00007ffff55d8d7e <_nss_files_gethostbyname4_r+14>: push %rbp
0x00007ffff55d8d7f <_nss_files_gethostbyname4_r+15>: mov %rdx,%rbp
0x00007ffff55d8d82 <_nss_files_gethostbyname4_r+18>: push %rbx
0x00007ffff55d8d83 <_nss_files_gethostbyname4_r+19>: mov %rcx,%rbx
0x00007ffff55d8d86 <_nss_files_gethostbyname4_r+22>: sub $0x88,%rsp
0x00007ffff55d8d8d <_nss_files_gethostbyname4_r+29>: cmpq $0x0,0x20823b(%rip) # 0x7ffff57e0fd0 <fgetpos+2137728>
0x00007ffff55d8d95 <_nss_files_gethostbyname4_r+37>: mov %r8,0x30(%rsp)
0x00007ffff55d8d9a <_nss_files_gethostbyname4_r+42>: mov %r9,0x38(%rsp)
0x00007ffff55d8d9f <_nss_files_gethostbyname4_r+47>: je 0x7ffff55d8dad
0x00007ffff55d8da1 <_nss_files_gethostbyname4_r+49>: lea 0x208498(%rip),%rdi # 0x7ffff57e1240 <lock>
0x00007ffff55d8da8 <_nss_files_gethostbyname4_r+56>: callq 0x7ffff55d7020
0x00007ffff55d8dad <_nss_files_gethostbyname4_r+61>: mov 0x2084d1(%rip),%edi # 0x7ffff57e1284 <keep_stream>
0x00007ffff55d8db3 <_nss_files_gethostbyname4_r+67>: callq 0x7ffff55d8700
0x00007ffff55d8db8 <_nss_files_gethostbyname4_r+72>: cmp $0x1,%eax
0x00007ffff55d8dbb <_nss_files_gethostbyname4_r+75>: mov %eax,0x5c(%rsp)
0x00007ffff55d8dbf <_nss_files_gethostbyname4_r+79>: je 0x7ffff55d8ded
0x00007ffff55d8dc1 <_nss_files_gethostbyname4_r+81>: cmpq $0x0,0x20820f(%rip) # 0x7ffff57e0fd8 <fgetpos+2137736>
0x00007ffff55d8dc9 <_nss_files_gethostbyname4_r+89>: je 0x7ffff55d8dd7
0x00007ffff55d8dcb <_nss_files_gethostbyname4_r+91>: lea 0x20846e(%rip),%rdi # 0x7ffff57e1240 <lock>
0x00007ffff55d8dd2 <_nss_files_gethostbyname4_r+98>: callq 0x7ffff55d7040
0x00007ffff55d8dd7 <_nss_files_gethostbyname4_r+103>: mov 0x5c(%rsp),%eax
0x00007ffff55d8ddb <_nss_files_gethostbyname4_r+107>: add $0x88,%rsp
When debugging the _nss_files_gethostbyname4_r function with dynamic linking,
the pthread_mutex_lock function is executed and can be stepped into. But when
statically linked, stepping does not show that the function is called at all,
even though the disassembly looks like it should be!?
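In the dump above, each "callq" that takes the address of <lock> is preceded by a "cmpq $0x0,..." and a "je" that jumps over it. That looks like a "call the lock function only if it is present" guard. The following is a minimal sketch of that pattern using weak references. It is not glibc's source: the names have_pthread_locks, maybe_lock and maybe_unlock are hypothetical, and it assumes GCC or Clang on an ELF target.

```c
#include <pthread.h>
#include <stddef.h>

/* Make the references to these two functions weak: if nothing in the
   link provides them, their addresses are NULL instead of the link
   failing. (GCC/Clang extension on ELF targets.) */
#pragma weak pthread_mutex_lock
#pragma weak pthread_mutex_unlock

/* Returns 1 if both locking functions were resolved, 0 otherwise. */
int have_pthread_locks(void)
{
    return pthread_mutex_lock != NULL && pthread_mutex_unlock != NULL;
}

/* Lock only when the functions are present; otherwise skip silently.
   This is the "compare with 0, jump over the call" shape. */
int maybe_lock(pthread_mutex_t *m)
{
    return have_pthread_locks() ? pthread_mutex_lock(m) : 0;
}

int maybe_unlock(pthread_mutex_t *m)
{
    return have_pthread_locks() ? pthread_mutex_unlock(m) : 0;
}
```

With a guard like this, code that sees a NULL function address runs without any locking and without any error, which would be consistent with the lock call never being stepped into.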
You shouldn't link statically, there are many reasons why it is a bad idea.
If you for whatever strange reason still need it, you need to make sure you link
all of libpthread.a into your application (e.g. using -Wl,--whole-archive around
-lpthread), otherwise many things won't work as expected.
Ok, I will try that. But why is there no warning or information when statically
linking the pthread library? The linker warns that it would need the library for
lookups, but gives no warning at all about the pthread library.
The reason we link statically is that the binary should run on different
Linux versions, including versions which use older libraries.
Is there another way, e.g. to link dynamically with glibc-2.10 and run on
systems with only glibc-2.6?
Please read http://people.redhat.com/drepper/no_static_linking.html. By linking
statically you make the portability far worse. Unless you are creating a system
recovery tool that needs to work when shared libraries are hosed up, you should
link at least the glibc libraries dynamically.
Ok, thank you for that information!
My problem with dynamic linking is that a binary built on a new Linux system,
e.g. one using glibc-2.11, won't start on an older Linux system. It says:
/lib64/libc.so.6: version `GLIBC_2.7' not found. The application does not need
any functions from the newer library; it would work fine with e.g. glibc-2.6.
Is there a way to change the minimum library version the binary depends on?
When I compile on an old Linux system, the binary does run on new systems.
When compiling the application on Windows I can set the minimum required
version in a define, and then I can only use functions available in that version
and not newer ones. Can this be done with glibc, so that the binary still
works with libraries that only define e.g. GLIBC_2.6?
Thank you very much for your help so far!
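For reference, the GNU toolchain has no single "minimum glibc version" define, but an individual symbol reference can be bound to an older symbol version with the assembler's .symver directive. The following is only a sketch of that technique, not something suggested in this bug. It assumes x86-64 Linux with glibc, where the oldest memcpy version is GLIBC_2.2.5; other architectures use other version names. copy_bytes is a hypothetical helper.

```c
#include <string.h>

/* Assumption: x86-64 glibc. Bind calls to memcpy in this file to the
   old memcpy@GLIBC_2.2.5 instead of the newest default version.
   The guard keeps the file compiling on other platforms. */
#if defined(__x86_64__) && defined(__GLIBC__)
__asm__(".symver memcpy, memcpy@GLIBC_2.2.5");
#endif

/* Hypothetical helper that uses the pinned symbol. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Running "objdump -T" on the resulting binary lists the GLIBC_x.y versions it still requires. This only works with dynamic linking, and each symbol has to be pinned individually, so building on the oldest system that must be supported remains the simpler route.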
This is no place to ask questions.
On the other hand you haven't responded to the question whether linking in the
entire libpthread helps. I assume it does.
I included the whole pthread:
Using "-static -lpthread" or "-static /usr/lib64/libpthread.a" creates the same
binary. The lib pthread is also used by our code so should be included in the
But the binary created as above still has the problem!
Our solution is to link libc, libm and libpthread dynamically on an Ubuntu 8.04
LTS system. This binary also works on most other systems (with a reasonably new
glibc).
One strange thing is: if I compile the same thing dynamically on a Fedora 7
system and run it on e.g. SLES 10, the binary already breaks in the loader with
a Floating Exception. The binary compiled on Ubuntu with the same setup works
fine. That is strange (probably SLES has no standard glibc).
I have not bothered to actually trace this but I have a likely suspect.
As I understand it, resolution is handled by libnss_*.so, which are still
dynamically linked even if the executable is statically linked. They
presumably feature weak extern references to various pthread functions.
If pthreads is dynamically linked, these references succeed. If
pthreads is statically linked then the pthread symbols are not reexported
to things loaded with dlopen() like the libnss libraries.
I don't know a good solution but perhaps -rdynamic has some role to play?
Or perhaps a less bloated libc than glibc could be used, one which has a
number of simple resolvers built in? The libnss resolvers on my Linux
system take up 275kB which is enough space for many other unixes to
implement an entire libc....
I have the same problem.
Sometimes a call to getaddrinfo in one of the threads causes a segfault.
The application is linked with the -static flag. It has to be linked statically
because I use it on systems without the pthread libraries installed and don't
have the ability to
I didn't find any helpful suggestions in the thread. So, what should I do to fix
The same crash happens if the host program is not compiled with "-pthread" and
dynamically loads a module which is linked to libpthread.so and calls
getaddrinfo() from multiple threads.
I will attach two example C files that showcase this problem.
Created attachment 5325
Example module which calls getaddrinfo() from many threads.
Compile this example module with:
gcc -o crash_getaddrinfo.so -Wall -fPIC -shared -pthread crash_getaddrinfo.c
Created attachment 5326
Simple host program to dynamically load a module with dlopen().
Compile without -pthread:
gcc -ldl -Wall -o crash_main_no_pthread crash_main.c
Compile with -pthread:
gcc -ldl -Wall -o crash_main_pthread crash_main.c -pthread
By default the program will try to load a module named:
I first ran into this problem when using a Lua C module (ZeroMQ bindings for
Lua) that uses IO threads in the background. The only work-around is to either
compile the Lua VM with -pthread (This shouldn't be required, since not all Lua
scripts need pthread support) or to use "LD_PRELOAD=/lib/libpthread.so
I would prefer an option where the host program (Lua VM) didn't have to either
be compiled with -pthread or wrapped in a script to preload libpthread.so.
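For readers without the attachments, the host side of this scenario can be sketched roughly as follows. This is not the attached crash_main.c: run_module and the idea of a no-argument entry function returning int are assumptions made for illustration. On older glibc versions it has to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load the shared object at `path`, look up a function named `entry`
   (assumed to take no arguments and return int) and call it.
   Returns the function's result, or -1 if loading or lookup fails. */
int run_module(const char *path, const char *entry)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* POSIX-style conversion of dlsym()'s object pointer to a
       function pointer. */
    int (*fn)(void);
    *(void **)(&fn) = dlsym(handle, entry);
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    int rc = fn();
    dlclose(handle);
    return rc;
}
```

Whether such a host is itself built with -pthread is exactly the variable that the two compile commands above change.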
Also, the example program will even crash on a single-CPU (single-core) computer
running Debian 6.0, glibc 2.11.2.
Created attachment 5327
Valgrind output shows some invalid reads into freed memory before the program
crashes on a NULL pointer.
This problem seems to be caused by a race condition between the threads calling
getaddrinfo(). With a small number of threads it doesn't always happen.
At least the backtrace has always been the same.
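If the cause really is a race between threads inside getaddrinfo(), one application-side way to avoid it is to let only one thread at a time into the resolver. The following is a minimal sketch of that idea, not a confirmed fix for this bug. getaddrinfo_serialized and resolver_lock are hypothetical names, and it assumes the application's own calls to pthread_mutex_lock are linked in and working.

```c
#include <netdb.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical application-level lock; not part of glibc. */
static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same arguments and return value as getaddrinfo(), but callers are
   serialized so that only one lookup runs at any time. */
int getaddrinfo_serialized(const char *node, const char *service,
                           const struct addrinfo *hints,
                           struct addrinfo **res)
{
    int rc;

    pthread_mutex_lock(&resolver_lock);
    rc = getaddrinfo(node, service, hints, res);
    pthread_mutex_unlock(&resolver_lock);
    return rc;
}
```

The cost is that slow lookups block every other thread that wants to resolve a name, so this trades throughput for safety.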
(In reply to comment #12)
> The same crash happens if the host program is not compiled with "-pthread" and
> dynamically loads a module which is linked to libpthread.so and calls
> getaddrinfo() from multiple threads.
> I will attach two example C files that showcase this problem.
A comment on this case is at:
My advice for now is to link your application against libpthread until somebody
really digs into this and figures out what is supposed to work and how.