This part of elf/dl-find_object.c

  420       if (obj != NULL)
  421         {
  422           /* Found the right mapping.  Copy out the data prior to
  423              checking if the read transaction was successful.  */
  424           struct dl_find_object_internal copy = *obj;
  425           if (_dlfo_read_success (start_version))
  426             {
  427               _dl_find_object_to_external (&copy, result);
  428               return 0;
  429             }
  430           else
  431             /* Read transaction failure.  */
  432             goto retry;
  433         }

gets compiled into:

  /root/git/elf/dl-find_object.c:420
   494:   addic.  r8,r9,-32
   498:   beq     404 <__GI__dl_find_object+0xc4>
   49c:   ld      r7,-32(r9)
  __atomic_wide_counter_load_acquire():
  /root/git/elf/../include/atomic_wide_counter.h:36
   4a0:   ld      r9,16(r12)
  __GI__dl_find_object():
  /root/git/elf/dl-find_object.c:424
   4a4:   ld      r6,16(r8)
   4a8:   ld      r8,24(r8)
  __atomic_wide_counter_load_acquire():
  /root/git/elf/../include/atomic_wide_counter.h:36
   4ac:   lwsync
  __GI__dl_find_object():
  /root/git/elf/dl-find_object.c:425
   4b0:   cmpld   cr7,r5,r9
   4b4:   bne     cr7,3a0 <__GI__dl_find_object+0x60>
  _dl_find_object_to_external():
  /root/git/elf/./dl-find_object.h:51
   4b8:   li      r9,0
  /root/git/elf/./dl-find_object.h:52
   4bc:   std     r7,8(r4)
  /root/git/elf/./dl-find_object.h:53
   4c0:   std     r10,16(r4)
  __GI__dl_find_object():
  /root/git/elf/dl-find_object.c:428
   4c4:   li      r3,0
  _dl_find_object_to_external():
  /root/git/elf/./dl-find_object.h:54
   4c8:   std     r6,24(r4)
  /root/git/elf/./dl-find_object.h:55
   4cc:   std     r8,32(r4)
  /root/git/elf/./dl-find_object.h:51
   4d0:   std     r9,0(r4)
  __GI__dl_find_object():
  /root/git/elf/dl-find_object.c:428
   4d4:   blr

The critical code is at offsets 4a4 and 4a8: this is the defensive copy *within* the software TM region.  The copy happens *after* the reload of the TM version at offset 4a0, even though in the source it comes first.  This means that a concurrent write between the reload and the copy cannot be detected reliably.
Created attachment 13889
Preprocessed source code
The acquire load doesn't have a memory clobber:

static inline uint64_t
__atomic_wide_counter_load_acquire (__atomic_wide_counter *c)
{
  return ({ __typeof (*(&c->__value64)) __atg101_val
              = ({ __typeof ((__typeof (*(&c->__value64))) *(&c->__value64))
                     __atg100_val;
                   __asm ("" : "=r" (__atg100_val) : "0" (*(&c->__value64)));
                   __atg100_val; });
            __asm ("lwsync" ::: "memory");
            __atg101_val; });
}

So it's unordered with regard to the defensive copy.  No GCC bug.
I asked for help:

  Help needed for glibc software transaction memory algorithm
  https://sourceware.org/pipermail/libc-alpha/2022-January/134965.html
Fixed via:

commit acbaad31e8ea10fce8b9c0aef58afb388bf7489d
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Jan 7 13:21:57 2022 +0100

    elf: Fix fences in _dl_find_object_update (bug 28745)

    As explained in Hans Boehm, Can Seqlocks Get Along with Programming
    Language Memory Models?, an acquire fence is needed in
    _dlfo_read_success.  The lack of a fence resulted in an observable
    bug on powerpc64le due to compile-time load reordering.

    The fence in _dlfo_mappings_begin_update has been reordered, turning
    the fence/store sequence into a release MO store equivalent.  Relaxed
    MO loads are used on the reader side, and relaxed MO stores on the
    writer side for the shared data, to avoid formal data races.  This is
    just to be conservative; it should not actually be necessary given
    how the data is used.

    This commit also fixes the test run time.  The intent was to run it
    for 3 seconds, but 0.3 seconds was enough to uncover the bug very
    occasionally (while 3 seconds did not reliably show the bug on every
    test run).

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>