concurrent pthread_create (or tls access) and dlclose are not safe now because pthread_create can dereference link_map pointers that may be freed. tls access has the same problem but there this is only used for an assertion check that is not strictly necessary so easy to fix. pthread_create really needs to look at the link_maps in case they have static tls that needs tls and dtv initialization at thread creation time. neither pthread_create nor tls access hold the dl_load_lock that would prevent this issue.
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1387ad6225c2222f027790e3f460e31aa5dd2c54 commit 1387ad6225c2222f027790e3f460e31aa5dd2c54 Author: Szabolcs Nagy <szabolcs.nagy@arm.com> Date: Wed Dec 30 19:19:37 2020 +0000 elf: Fix data races in pthread_create and TLS access [BZ #19329] DTV setup at thread creation (_dl_allocate_tls_init) is changed to take the dlopen lock, GL(dl_load_lock). Avoiding data races here without locks would require design changes: the map that is accessed for static TLS initialization here may be concurrently freed by dlclose. That use after free may be solved by only locking around static TLS setup or by ensuring dlclose does not free modules with static TLS, however currently every link map with TLS has to be accessed at least to see if it needs static TLS. And even if that's solved, still a lot of atomics would be needed to synchronize DTV related globals without a lock. So fix both bug 19329 and bug 27111 with a lock that prevents DTV setup running concurrently with dlopen or dlclose. _dl_update_slotinfo at TLS access still does not use any locks so CONCURRENCY NOTES are added to explain the synchronization. The early exit from the slotinfo walk when max_modid is reached is not strictly necessary, but does not hurt either. An incorrect acquire load was removed from _dl_resize_dtv: it did not synchronize with any release store or fence and synchronization is now handled separately at thread creation and TLS access time. There are still a number of racy read accesses to globals that will be changed to relaxed MO atomics in a followup patch. This should not introduce regressions compared to existing behaviour and avoid cluttering the main part of the fix. Not all TLS access related data races got fixed here: there are additional races at lazy tlsdesc relocations see bug 27137. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=572bd547d57a39b6cf0ea072545dc4048921f4c3 commit 572bd547d57a39b6cf0ea072545dc4048921f4c3 Author: Szabolcs Nagy <szabolcs.nagy@arm.com> Date: Thu Dec 31 13:59:38 2020 +0000 elf: Fix DTV gap reuse logic [BZ #27135] For some reason only dlopen failure caused dtv gaps to be reused. It is possible that the intent was to never reuse modids for a different module, but after dlopen failure all gaps are reused not just the ones caused by the unfinished dlopened. So the code has to handle reused modids already which seems to work, however the data races at thread creation and tls access (see bug 19329 and bug 27111) may be more severe if slots are reused so this is scheduled after those fixes. I think fixing the races are not simpler if reuse is disallowed and reuse has other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are removed from the middle of the slotinfo list. The value does not have to be correct: incorrect true value causes the next modid query to do a slotinfo walk, incorrect false will leave gaps and new entries are added at the end. Fixes bug 27135. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
fixed for 2.34