Created attachment 5283
proposed patch, incl. testcase

When a library is opened with RTLD_LOCAL, dlclose()ing that library unconditionally removes its local scope from all subsequently loaded libraries, even from a library that is marked as RTLD_NODELETE. Subsequent lookups within such a library then fail if it depends on libraries other than those already loaded in the global scope.

This has been exposed in a real-world case: libproxy opens a KDE4 plugin with RTLD_LOCAL, the plugin depends on libkde4_core, and libkde4_core is marked as NODELETE because it has a STB_GNU_UNIQUE symbol. The plugin is dlclose()d later, but ld.so raises a fatal error when the libkde4_core global destructor is called (it depends on libqt4, but libqt4 was only ever in the plugin's local scope and is gone now).
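For illustration, the load/use/unload pattern described above (open a plugin with RTLD_LOCAL, call into it, dlclose() it) can be sketched roughly as follows. This is a minimal sketch, not libproxy's actual code: the function name run_plugin_once, the plugin_fn type and the entry-point signature are assumptions made up for the example.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry-point type; a real plugin defines its own interface. */
typedef double (*plugin_fn)(double);

/* Load `path` into a private (RTLD_LOCAL) scope, call `symbol` once and
   unload the object again.  Returns 0 on success, -1 on any failure. */
int run_plugin_once(const char *path, const char *symbol,
                    double arg, double *out)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                       /* clear any stale error state */
    plugin_fn fn;
    *(void **)(&fn) = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol resolves to NULL");
        dlclose(handle);
        return -1;
    }

    /* The function pointer is only valid while the handle is open,
       so call it before dlclose(). */
    *out = fn(arg);

    /* Dependencies that were only in this object's local scope may be
       unloaded here; a NODELETE dependency stays mapped regardless. */
    return dlclose(handle) == 0 ? 0 : -1;
}
```

The reported failure happens after the final dlclose(): a NODELETE dependency of the plugin remains in memory, and its destructor later needs symbols from libraries that were reachable only through the plugin's local scope. On older glibc versions a program using this needs to be linked with -ldl.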
The patch takes an approach that is probably too conservative. Some common shared objects such as libpthread are marked NODELETE, and any RTLD_LOCAL-opened object that depends on one of them will be held in memory forever. The real solution should be to rebuild the scope for the NODELETE object.
That test case doesn't show any bug. If you use the handle for a dlopen'ed object to look up a symbol, then close the object, and finally use the returned function, it is bound to fail. Whether the symbol has been found in a different object doesn't matter. You have to provide a valid test case.
Created attachment 5749
better testcase

Indeed, this is a better testcase that really reflects what the proxy library does. The important part is that two independent libraries need to be loaded, that one dependency of one of them is NODELETE, and that this dependency has a finalizer which needs to look up something in its own dependencies that wasn't available before.

Then with unpatched glibc:

  # make
  # /tmp/mm/lib64/ld-linux-x86-64.so.2 --library-path /tmp/mm/lib64/ ./app
  ./app: symbol lookup error: /suse/matz/src/nodeletebug/lib2.so: undefined symbol: in_lib3

with patched glibc:

  # ./app
  #
Thanks for the testcase, changing status now.
Is this bug still reproducible on glibc >= 2.15? I've failed to reproduce it with 2.16+, and I suppose commit http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=glibc-2.14-208-g39dd69d has something to do with it, because reverting it reintroduces the bug.
I really can't make up my mind right now whether Andreas' patch is a fix for this issue or just hides it. The testcase here needed NODELETE libraries to force some deps to stay around. Andreas' patch has this in it:

+       * elf/dl-close.c (_dl_close_worker): Reset private search list if
+       it wasn't used.
...
+             else if (new_list != NULL)
+               {
+                 /* We didn't change the scope array, so reset the search
+                    list.  */
+                 imap->l_searchlist.r_list = NULL;
+                 imap->l_searchlist.r_nlist = 0;

So, what happens if we _have_ changed the scope array, or used the private search list? In other words, could the testcase from this report be extended to make this happen and retrigger the bug, or is it fixed for good?
Both testcases (the attached one and the unload8 test) are equivalent: in the attached testcase lib2 cannot be unloaded due to NODELETE, in the unload8 testcase unload8mod2 cannot be unloaded due to the dlopen dependency from unload8mod3. Thus they trigger the same bug.
Should be fixed in 2.15.