
Re: [PATCHv3] Protect _dl_profile_fixup data-dependency order [BZ #23690]


* Carlos O'Donell:

> On 10/15/18 8:57 AM, Florian Weimer wrote:
>> * Carlos O'Donell:
>> 
>>> (3) Fence-to-fence sync.
>>>
>>> For fence-to-fence synchronization to work we need an acquire and release
>>> fence, and we have that.
>>>
>>> We are missing the atomic read and write of the guard. Please review below.
>>> Florian mentioned this in his review. He is correct.
>>>
>>> And all the problems are back again, because you can't do atomic loads of
>>> the large guards: they are actually the function descriptor structures.
>>> However, this is just laziness; we used the addr because it was convenient.
>>> It is no longer convenient. Just add an 'init' field to reloc_result and use
>>> that as the guard the threads synchronize against for initialization of
>>> the results. This should solve the reloc_result problem (ignoring the issues
>>> hppa and ia64 have with the fdesc updates across multiple threads in _dl_fixup).
>> 
>> I think due to various external factors, we should go with the
>> fence-based solution for now, and change it later to something which
>> uses an acquire/release on the code address, using proper atomics.
>
> Let me clarify.
>
> The fence fix as proposed in v3 is wrong for all architectures.
>
> We are emulating C/C++11 atomics within glibc, and a fence-to-fence sync
> *requires* an atomic load/store of the guard; you can't use a non-atomic
> access. The point of the atomic load/store is to ensure you don't have a
> data race.

Carlos, I'm sorry, but I think your position is logically inconsistent.

Formally, you cannot follow the memory model here without a substantial
rewrite of the code, breaking up the struct fdesc abstraction.  The
reason is that without blocking synchronization, you still end up with
two non-atomic writes to the same object, which is a data race and
therefore undefined behavior, even if both threads write the same value.
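
To make that concrete, here is a minimal standalone sketch (with
made-up names, not the actual _dl_profile_fixup code) of the pattern:
two threads performing plain stores of identical values to the same
object, which the memory model still classifies as a race:

  #include <pthread.h>

  /* Hypothetical stand-in for the function descriptor / reloc_result
     object that several threads may fill in during lazy resolution.  */
  static struct { unsigned long ip; unsigned long gp; } result;

  static void *
  resolver (void *arg)
  {
    /* Both threads store the same values, but these are plain,
       unsynchronized writes to the same object: formally a data race,
       and therefore undefined behavior.  */
    result.ip = 0x1000;
    result.gp = 0x2000;
    return NULL;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, NULL, resolver, NULL);
    pthread_create (&t2, NULL, resolver, NULL);
    pthread_join (t1, NULL);
    pthread_join (t2, NULL);
    return 0;
  }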

As far as I can see, POWER is !USE_ATOMIC_COMPILER_BUILTINS, so our
relaxed MO store is just a regular store, without a compiler barrier.
That means that after all that rewriting, we would basically end up with
the same code and the same formal data race that we would have if we
just used fences.

This is different for USE_ATOMIC_COMPILER_BUILTINS architectures, where
we do use actual atomic stores.  But for !USE_ATOMIC_COMPILER_BUILTINS,
the fence-based approach is as good as we can get, with or without
breaking the abstractions.
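
Schematically, the difference the compiler sees is something like the
following.  This is only a sketch of the shape, not the exact macro
definitions in include/atomic.h, and 'my_store_relaxed' is a made-up
name:

  #if USE_ATOMIC_COMPILER_BUILTINS
  /* The compiler is told the access is atomic; it participates in the
     C11 memory model.  */
  # define my_store_relaxed(mem, val) \
      __atomic_store_n ((mem), (val), __ATOMIC_RELAXED)
  #else
  /* The compiler sees an ordinary, non-atomic store, so formally it
     can still race with other plain accesses to the same object.  */
  # define my_store_relaxed(mem, val) \
      (*(mem) = (val))
  #endif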

So, as I said, given the constraints we are working under, we should go
with the fence-based solution, and have that tested on AArch64 as
well.
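
For reference, the fence-based approach has the usual C11 fence-to-fence
shape.  Here is a standalone sketch with made-up names (inside glibc the
fences would be the atomic_thread_fence_release and
atomic_thread_fence_acquire macros); note that the guard accesses are
deliberately plain loads and stores, which is exactly the formal gap
discussed above:

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>

  /* Hypothetical stand-ins for reloc_result: 'resolved_addr' doubles as
     the guard (0 means "not resolved yet"), 'bound_info' is payload.  */
  static unsigned long resolved_addr;
  static unsigned long bound_info;

  static void *
  resolver (void *arg)
  {
    bound_info = 42;                             /* plain store of the payload */
    atomic_thread_fence (memory_order_release);  /* order payload before guard */
    resolved_addr = 0x1000;                      /* plain store of the guard */
    return NULL;
  }

  static void *
  user (void *arg)
  {
    if (resolved_addr != 0)                      /* plain load of the guard */
      {
        atomic_thread_fence (memory_order_acquire);  /* order guard before payload */
        printf ("%lu\n", bound_info);
      }
    return NULL;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, NULL, resolver, NULL);
    pthread_create (&t2, NULL, user, NULL);
    pthread_join (t1, NULL);
    pthread_join (t2, NULL);
    return 0;
  }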

>> I don't want to see this bug fix blocked by ia64 and hppa.  The proper
>> fix needs some reshuffling of the macros here, or maybe use an unused
>> bit in the flags field as an indicator for initialization.
>
> The fix for this is straightforward.
>
> Add a new initializer field to reloc_result; it's an internal data
> structure. It can be as big as we want, and we can optimize it later.
>
> You don't need to do any big cleanups, but we *do* have to get the
> synchronization correct.

See above; I don't think we can get the synchronization formally
correct, no matter how much cleanup we do.  In the data race case, we
would have

  atomic acquire MO load of initializer field
  non-atomic writes to various struct fields
  atomic release MO store to initializer field

in each thread.  That's still undefined behavior, due to the non-atomic
stores in the middle.
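
In code, with made-up names, that per-thread sequence would look roughly
as below.  Even with a proper acquire/release guard, two threads can
both observe the guard as unset and then execute the plain stores in the
middle concurrently, and that is the remaining race:

  #include <pthread.h>
  #include <stdatomic.h>

  /* Hypothetical stand-ins: 'init' is the proposed guard field, the
     plain fields below are the payload the fixup code would fill in.  */
  static atomic_int init;
  static unsigned long field_a, field_b;

  static void *
  fixup (void *arg)
  {
    /* Atomic acquire MO load of the initializer field.  */
    if (atomic_load_explicit (&init, memory_order_acquire) == 0)
      {
        /* Both threads can see init == 0 and reach this point, so these
           plain stores still race, even though the values are identical.  */
        field_a = 0x1000;
        field_b = 0x2000;
        /* Atomic release MO store to the initializer field.  */
        atomic_store_explicit (&init, 1, memory_order_release);
      }
    return NULL;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, NULL, fixup, NULL);
    pthread_create (&t2, NULL, fixup, NULL);
    pthread_join (t1, NULL);
    pthread_join (t2, NULL);
    return 0;
  }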

Let me reiterate: just because you say our atomics are C11 doesn't make
them so.  They are syntactically different, and they are not presented
to the compiler as atomics for !USE_ATOMIC_COMPILER_BUILTINS.  I know
that you and Torvald didn't consider this a problem in the past, but
maybe you can reconsider your position?

Thanks,
Florian

