This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Protect _dl_profile_fixup data-dependency order [BZ #23690]

Carlos O'Donell <> writes:

> On 09/19/2018 05:48 PM, Tulio Magno Quites Machado Filho wrote:
>> The field reloc_result->addr is used to indicate if the rest of the
>> fields of reloc_result have already been written, creating a
>> data-dependency order.
>> Reading reloc_result->addr to the variable value requires to complete
>> before reading the rest of the fields of reloc_result.
>> Likewise, the writes to the other fields of the reloc_result must
>> complete before reloc_result-addr is updated.
> Good catch. This needs just a little more work and it's done.
> When you're ready please post a v2 and TO me for review.


>> diff --git a/elf/dl-runtime.c b/elf/dl-runtime.c
>> index 63bbc89776..6518e66fd6 100644
>> --- a/elf/dl-runtime.c
>> +++ b/elf/dl-runtime.c
>> @@ -183,9 +183,16 @@ _dl_profile_fixup (
>>    /* This is the address in the array where we store the result of previous
>>       relocations.  */
>>    struct reloc_result *reloc_result = &l->l_reloc_result[reloc_index];
>> -  DL_FIXUP_VALUE_TYPE *resultp = &reloc_result->addr;
>> -  DL_FIXUP_VALUE_TYPE value = *resultp;
>> +
>> +     The following code uses reloc_result->addr to indicate if it is the first
>> +     time this object is being relocated.
>> +     Reading/Writing from/to reloc_result->addr must not happen before previous
>> +     writes to reloc_result complete as they could end-up with an incomplete
>> +     struct.  */
> This is not quite accurate. The following code uses DL_FIXUP_VALUE_CODE_ADDR to
> access a potential member of addr to indicate it is the first time the object is
> being relocated. The "guard" variable in this access is either the address itself
> or the ip of a function descriptor (the only two implementations of "code addr").

Ack.  I'll elaborate the comments in the next version of the patch.

> Also we don't explain what other data is being accessed? What non-function-locale
> data is being updated as part of the guard variable check?

I don't follow you.
Do you mean other data being updated but not in reloc_result?
What do you mean by non-function-locale?

> In the case of the function descriptor it's easy to know that the fdesc's gp is 
> going to be written to, and without a acquire we won't see that write, or the new 
> ip value, so we might have two threads update the same thing, in theory a benign 
> data race which we should fix.

Why should we fix it?

> In theory elf_ifunc_invoke() is called and that could write to arbitrary memory
> with the IFUNC resolver, whose values should be seen by any future thread via
> the new load acquire you are putting in place. You might be able to run the IFUNC
> resolver twice which could be bad?

Good point.  I haven't tested IFUNCs yet.

>> +  DL_FIXUP_VALUE_TYPE *resultp = &reloc_result->addr;
>> +  DL_FIXUP_VALUE_TYPE value = atomic_load_acquire(resultp);
> You are potentially requiring an atomic load of a structure whose size can be
> an arbitrary size depending on machine and ABI design (64-bit fore 32-bit hppa,
> and 128-bit for 64-bit ia64). These architectures might not have such wide
> atomic operations, and ...

Would it help hppa and ia64 if I replaced it with just memory fences? e.g.:

  DL_FIXUP_VALUE_TYPE value = *resultp;
  atomic_thread_fence_acquire ();

Likewise for the atomic_store_release () later in the code.

>>    /* By default we do not call the pltexit function.  */
> What I'd like to see:
> - Refactor macros to get access to the "code addr" (ip of the fdesc or regular
>   addr), and do the atomic loads and stores against that. Or feel free to
>   delegate this to more macros at the machine level e.g.
>   but I feel this looses out on the fact that we have generic atomic macros
>   and so could use something like DL_FIXUP_VALUE_ADDR_CODE which returns
>   the address of the code addr to load, and then you operate atomically on
>   that to do the load.
> - Fix ./sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt) to do an atomic
>   store release to update ip. Likewise for ia64. Both should reference the
>   concurrency notes in elf/dl-runtime.c.

If we use memory fences, I believe this won't be necessary.
Do you agree?

> - Can you write a test for this? A test which starts several threads, barriers
>   them all, then starts them, and lets them try to resolve (test is compiled
>   lazy binding forced) the plt entries, then repeat for a while, each iteration
>   with a different function that hasn't been bound yet (maybe synthetically
>   generate a few thousand functions). If we ever see this fail we'll know we
>   did something wrong.

Yes.  There is already something attached to the bug report, but it does
require dozens of threads to reproduce here.
If you're fine with that, I can adapt it.

Tulio Magno

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]