Bug 28745 - _dl_find_object miscompilation on powerpc64le
Summary: _dl_find_object miscompilation on powerpc64le
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: 2.35
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-04 12:06 UTC by Florian Weimer
Modified: 2024-02-14 02:48 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Preprocessed source code (52.21 KB, text/plain)
2022-01-04 12:07 UTC, Florian Weimer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Florian Weimer 2022-01-04 12:06:09 UTC
This part of elf/dl-find_object.c

    420             if (obj != NULL)
    421               {
    422                 /* Found the right mapping.  Copy out the data prior to
    423                    checking if the read transaction was successful.  */
    424                 struct dl_find_object_internal copy = *obj;
    425                 if (_dlfo_read_success (start_version))
    426                   {
    427                     _dl_find_object_to_external (&copy, result);
    428                     return 0;
    429                   }
    430                 else
    431                   /* Read transaction failure.  */
    432                   goto retry;
    433               }

gets compiled into:

/root/git/elf/dl-find_object.c:420
     494:       addic.  r8,r9,-32
     498:       beq     404 <__GI__dl_find_object+0xc4>
     49c:       ld      r7,-32(r9)
__atomic_wide_counter_load_acquire():
/root/git/elf/../include/atomic_wide_counter.h:36
     4a0:       ld      r9,16(r12)
__GI__dl_find_object():
/root/git/elf/dl-find_object.c:424
     4a4:       ld      r6,16(r8)
     4a8:       ld      r8,24(r8)
__atomic_wide_counter_load_acquire():
/root/git/elf/../include/atomic_wide_counter.h:36
     4ac:       lwsync
__GI__dl_find_object():
/root/git/elf/dl-find_object.c:425
     4b0:       cmpld   cr7,r5,r9
     4b4:       bne     cr7,3a0 <__GI__dl_find_object+0x60>
_dl_find_object_to_external():
/root/git/elf/./dl-find_object.h:51
     4b8:       li      r9,0
/root/git/elf/./dl-find_object.h:52
     4bc:       std     r7,8(r4)
/root/git/elf/./dl-find_object.h:53
     4c0:       std     r10,16(r4)
__GI__dl_find_object():
/root/git/elf/dl-find_object.c:428
     4c4:       li      r3,0
_dl_find_object_to_external():
/root/git/elf/./dl-find_object.h:54
     4c8:       std     r6,24(r4)
/root/git/elf/./dl-find_object.h:55
     4cc:       std     r8,32(r4)
/root/git/elf/./dl-find_object.h:51
     4d0:       std     r9,0(r4)
__GI__dl_find_object():
/root/git/elf/dl-find_object.c:428
     4d4:       blr

The critical code is at offsets 4a4 and 4a8: This is the defensive copy *within* the software TM region. The copy happens *after* the load of the TM version at offset 4a0. This means that a concurrent write cannot be detected reliably.
Comment 1 Florian Weimer 2022-01-04 12:07:35 UTC
Created attachment 13889 [details]
Preprocessed source code
Comment 2 Florian Weimer 2022-01-04 12:11:14 UTC
The acquire load doesn't have a memory clobber:

static inline uint64_t
__atomic_wide_counter_load_acquire (__atomic_wide_counter *c)
{
  return ({ __typeof (*(&c->__value64)) __atg101_val = ({ __typeof ((__typeof (*(&c->__value64))) *(&c->__value64)) __atg100_val; __asm ("" : "=r" (__atg100_val) : "0" (*(&c->__value64))); __atg100_val; }); __asm ("lwsync" ::: "memory"); __atg101_val; });
}

So it's unordered with regards to the defensive copy. No GCC bug.
Comment 3 Florian Weimer 2022-01-04 14:49:03 UTC
I asked for help:

Help needed for glibc software transaction memory algorithm
https://sourceware.org/pipermail/libc-alpha/2022-January/134965.html
Comment 4 Florian Weimer 2022-01-07 12:23:03 UTC
Fixed via:

commit acbaad31e8ea10fce8b9c0aef58afb388bf7489d
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Jan 7 13:21:57 2022 +0100

    elf: Fix fences in _dl_find_object_update (bug 28745)
    
    As explained in Hans Boehm, Can Seqlocks Get Along with Programming
    Language Memory Models?, an acquire fence is needed in
    _dlfo_read_success.  The lack of a fence resulted in an observable
    bug on powerpc64le compile-time load reordering.
    
    The fence in _dlfo_mappings_begin_update has been reordered, turning
    the fence/store sequence into a release MO store equivalent.
    
    Relaxed MO loads are used on the reader side, and relaxed MO stores
    on the writer side for the shared data, to avoid formal data races.
    This is just to be conservative; it should not actually be necessary
    given how the data is used.
    
    This commit also fixes the test run time.  The intent was to run it
    for 3 seconds, but 0.3 seconds was enough to uncover the bug very
    occasionally (while 3 seconds did not reliably show the bug on every
    test run).
    
    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>