This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug network/10652] getaddrinfo causes segfault if multithreaded and linked statically


https://sourceware.org/bugzilla/show_bug.cgi?id=10652

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW

--- Comment #19 from Carlos O'Donell <carlos at redhat dot com> ---
In a test case where the application doesn't link against libpthread, but a
dlopen'd library does, parallel calls to getaddrinfo cause corruption in the IO
layers and eventually a crash.

Even though libpthread.so.0 has been loaded, the weak-ref-and-check idiom in the
NSS code isn't working. The GOT entry stays zero, so the NSS code skips all
locking and we get serious corruption via get_contents->__GI_fgets_unlocked
(doing unlocked file IO from multiple threads causes data races and
corruption).
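
For readers unfamiliar with the idiom, here is a minimal sketch of
weak-ref-and-check. It is an illustration only, not the glibc NSS source: the
names file_lock, locking_available and read_line_guarded are made up, and it
references the public pthread_mutex_lock rather than the internal symbol the
NSS code uses. Whether the weak references resolve to non-null addresses
depends on the platform and on what is visible when the relocation is
processed.

```c
#define _GNU_SOURCE             /* for fgets_unlocked */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Weak references: if no visible object defines these symbols when the
   relocations are processed, their addresses are NULL instead of the
   link or load failing. */
#pragma weak pthread_mutex_lock
#pragma weak pthread_mutex_unlock

/* Hypothetical lock protecting the shared stream. */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if both weak references resolved, 0 otherwise. */
int locking_available(void)
{
    return pthread_mutex_lock != NULL && pthread_mutex_unlock != NULL;
}

/* Reads one line from fp into buf. Returns buf, or NULL on EOF or error.
   If the weak references did not resolve, the lock is silently skipped
   and the unlocked read is unprotected, which is the failure mode
   described above. */
char *read_line_guarded(char *buf, int size, FILE *fp)
{
    assert(buf != NULL && size > 0 && fp != NULL);

    int locked = locking_available();
    if (locked)
        pthread_mutex_lock(&file_lock);

    char *line = fgets_unlocked(buf, size, fp);

    if (locked)
        pthread_mutex_unlock(&file_lock);
    return line;
}
```

The point of the sketch is that the check is made against an address fixed at
relocation time. If that address is zero, every later call takes the unlocked
path, no matter how many threads exist by then.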

The skipped locks are in _nss_files_gethostbyname4_r (libnss_files.so). When
the application is compiled with -lpthread, the GOT entry holds the non-zero
value 0x00007ffff77bc460, which is correct because that address is:

     0x7ffff77bc460 <__GI___pthread_mutex_lock>:   sub    $0x8,%rsp

That entry is GOT entry #40, with this relocation:

     000000000020bfd8  0000001a00000006 R_X86_64_GLOB_DAT 0000000000000000 __pthread_mutex_lock + 0

If libpthread is loaded *after* libnss_files.so, I don't see anything you can
do to make the NSS code use locks, since the GOT relocation has already been
processed. However, in this case libpthread is loaded *before*
libnss_files.so, yet it appears that the resolution scope prevents the symbols
from libpthread being made available to libnss_files.so?

e.g.
     20987:     object=/home/carlos/build/glibc/nss/libnss_files.so.2 [0]
     20987:      scope 0: ./crash_main_no_pthread /home/carlos/build/glibc/dlfcn/libdl.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so
     20987:      scope 1: /home/carlos/build/glibc/nss/libnss_files.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so

Notice that libnss_files.so.2 is in its own scope without libpthread, as
opposed to crash_getaddrinfo.so's scope, which has libpthread in it:

e.g.
     20987:     object=/home/carlos/support/2013-11-22/crash_getaddrinfo.so [0]
     20987:      scope 0: ./crash_main_no_pthread /home/carlos/build/glibc/dlfcn/libdl.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so
     20987:      scope 1: /home/carlos/support/2013-11-22/crash_getaddrinfo.so /home/carlos/build/glibc/nptl/libpthread.so.0 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so

I don't know what the right answer is here. There are really only two
resolution scopes, global and local; the scopes listed above are internal
details of glibc's dynamic loader. What baffles me is why libpthread's symbols
wouldn't be used for the relocation in libnss_files.so. One would have to track
down the exact relocation and determine why the libpthread symbol isn't used.
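
As a starting point for that kind of investigation, the following sketch
probes whether a symbol name is reachable through the default (global) lookup
or only through a particular dlopen handle. The helper names
visible_in_global_scope and visible_via_handle are hypothetical. It relies
only on the documented dlsym interface and assumes it is called from the main
program. It does not reproduce the loader's internal scope lists shown above.

```c
#define _GNU_SOURCE             /* for RTLD_DEFAULT */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if `name` resolves through the default lookup order, which
   covers the executable, its dependencies, and objects loaded with
   RTLD_GLOBAL. Returns 0 otherwise. */
int visible_in_global_scope(const char *name)
{
    assert(name != NULL);
    return dlsym(RTLD_DEFAULT, name) != NULL;
}

/* Returns 1 if `name` resolves through `handle`, i.e. in the object that
   was dlopen'd and in its dependencies. Returns 0 for a NULL handle or
   an unresolved name. */
int visible_via_handle(void *handle, const char *name)
{
    assert(name != NULL);
    return handle != NULL && dlsym(handle, name) != NULL;
}
```

For the case in this bug, one could pass the handle returned by dlopen for
crash_getaddrinfo.so together with the name "__pthread_mutex_lock" and compare
the two results. A symbol that is visible via the handle but not globally
would be consistent with libpthread sitting only in a local scope. On older
glibc releases this needs to be linked with -ldl.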

I'm not working on this so I'm flipping this to NEW, but I thought I'd post
what I saw during my analysis of a similar internal Red Hat bug.

