This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Protect _dl_profile_fixup data-dependency order [BZ #23690]
- From: Tulio Magno Quites Machado Filho <tuliom at linux dot ibm dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>, libc-alpha at sourceware dot org, John David Anglin <dave dot anglin at bell dot net>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc:
- Date: Thu, 20 Sep 2018 13:42:31 -0300
- Subject: Re: [PATCH] Protect _dl_profile_fixup data-dependency order [BZ #23690]
- References: <20180919214829.20551-1-tuliom@linux.ibm.com> <2009c13b-169f-e476-6380-ffe4ceede9ac@redhat.com>
Carlos O'Donell <carlos@redhat.com> writes:
> On 09/19/2018 05:48 PM, Tulio Magno Quites Machado Filho wrote:
>> The field reloc_result->addr is used to indicate if the rest of the
>> fields of reloc_result have already been written, creating a
>> data-dependency order.
>> Reading reloc_result->addr to the variable value requires to complete
>> before reading the rest of the fields of reloc_result.
>> Likewise, the writes to the other fields of the reloc_result must
>> complete before reloc_result-addr is updated.
>
> Good catch. This needs just a little more work and it's done.
>
> When you're ready please post a v2 and TO me for review.
Ack.
>> diff --git a/elf/dl-runtime.c b/elf/dl-runtime.c
>> index 63bbc89776..6518e66fd6 100644
>> --- a/elf/dl-runtime.c
>> +++ b/elf/dl-runtime.c
>> @@ -183,9 +183,16 @@ _dl_profile_fixup (
>> /* This is the address in the array where we store the result of previous
>> relocations. */
>> struct reloc_result *reloc_result = &l->l_reloc_result[reloc_index];
>> - DL_FIXUP_VALUE_TYPE *resultp = &reloc_result->addr;
>>
>> - DL_FIXUP_VALUE_TYPE value = *resultp;
>> + /* CONCURRENCY NOTES:
>> +
>> + The following code uses reloc_result->addr to indicate if it is the first
>> + time this object is being relocated.
>> + Reading/Writing from/to reloc_result->addr must not happen before previous
>> + writes to reloc_result complete as they could end-up with an incomplete
>> + struct. */
>
> This is not quite accurate. The following code uses DL_FIXUP_VALUE_CODE_ADDR to
> access a potential member of addr to indicate it is the first time the object is
> being relocated. The "guard" variable in this access is either the address itself
> or the ip of a function descriptor (the only two implementations of "code addr").
Ack. I'll elaborate the comments in the next version of the patch.
> Also we don't explain what other data is being accessed? What non-function-locale
> data is being updated as part of the guard variable check?
I don't follow you.
Do you mean other data being updated but not in reloc_result?
What do you mean by non-function-locale?
> In the case of the function descriptor it's easy to know that the fdesc's gp is
> going to be written to, and without a acquire we won't see that write, or the new
> ip value, so we might have two threads update the same thing, in theory a benign
> data race which we should fix.
Why should we fix it?
> In theory elf_ifunc_invoke() is called and that could write to arbitrary memory
> with the IFUNC resolver, whose values should be seen by any future thread via
> the new load acquire you are putting in place. You might be able to run the IFUNC
> resolver twice which could be bad?
Good point. I haven't tested IFUNCs yet.
>> + DL_FIXUP_VALUE_TYPE *resultp = &reloc_result->addr;
>> + DL_FIXUP_VALUE_TYPE value = atomic_load_acquire(resultp);
>
> You are potentially requiring an atomic load of a structure whose size can be
> an arbitrary size depending on machine and ABI design (64-bit fore 32-bit hppa,
> and 128-bit for 64-bit ia64). These architectures might not have such wide
> atomic operations, and ...
Would it help hppa and ia64 if I replaced it with just memory fences? e.g.:
DL_FIXUP_VALUE_TYPE value = *resultp;
atomic_thread_fence_acquire ();
Likewise for the atomic_store_release () later in the code.
>> /* By default we do not call the pltexit function. */
>>
>
> What I'd like to see:
>
> - Refactor macros to get access to the "code addr" (ip of the fdesc or regular
> addr), and do the atomic loads and stores against that. Or feel free to
> delegate this to more macros at the machine level e.g.
> DL_FIXUP_ATOMIC_LOAD_ACQUIRE_REALLY_LONG_NAME_VALUE_CODE_ADDR (value),
> but I feel this looses out on the fact that we have generic atomic macros
> and so could use something like DL_FIXUP_VALUE_ADDR_CODE which returns
> the address of the code addr to load, and then you operate atomically on
> that to do the load.
>
> - Fix ./sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt) to do an atomic
> store release to update ip. Likewise for ia64. Both should reference the
> concurrency notes in elf/dl-runtime.c.
If we use memory fences, I believe this won't be necessary.
Do you agree?
> - Can you write a test for this? A test which starts several threads, barriers
> them all, then starts them, and lets them try to resolve (test is compiled
> lazy binding forced) the plt entries, then repeat for a while, each iteration
> with a different function that hasn't been bound yet (maybe synthetically
> generate a few thousand functions). If we ever see this fail we'll know we
> did something wrong.
Yes. There is already something attached to the bug report, but it does
require dozens of threads to reproduce here.
If you're fine with that, I can adapt it.
--
Tulio Magno