This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: Marcus Shawcroft <marcus dot shawcroft at arm dot com>
- Date: Wed, 08 Jul 2015 12:09:49 -0400
- References: <556B7F10 dot 40209 at redhat dot com> <557741C5 dot 5060203 at redhat dot com> <559A8029 dot 1000705 at arm dot com> <559A8DAE dot 9040604 at gmail dot com> <559A9789 dot 3090805 at linaro dot org> <559AADC8 dot 4030409 at arm dot com> <559AB627 dot 2050006 at arm dot com> <559D02E2 dot 5000303 at arm dot com>
On 07/08/2015 07:00 AM, Szabolcs Nagy wrote:
> On 06/07/15 18:08, Szabolcs Nagy wrote:
>> On 06/07/15 17:33, Szabolcs Nagy wrote:
>>> On 06/07/15 15:58, Adhemerval Zanella wrote:
>>>> On 06-07-2015 11:16, Martin Sebor wrote:
>>>>>> this broke
>>>>>>
>>>>>> nptl/tst-join5
>>>>>> nptl/tst-once3
>>>>>>
>>>>>> tests on aarch64.
>>>>>>
>>>>>> the cleanup handlers of the pthread_once and pthread_join
>>>>>> implementations don't run when they are canceled.
>>>>>
>>>>> I'll look into it as soon as I get access to an aarch64 machine.
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> And I see a regression with
>>>>
>>>> nptl/tst-once3
>>>>
>>>> for armhf.
>>>>
>>>
>>> in case of aarch64 the bug is somewhere in __pthread_unwind
>>> (called from __do_cancel) so probably a libgcc issue.
>>>
>>
>> the problem seems to be that gcc on x86_64 turns on
>> -fasynchronous-unwind-tables by default, but not on
>> aarch64 or arm.
>>
>> now i added -fasynchronous-unwind-tables to the cflags
>> of the relevant tests, will send a patch if they pass.
>>
>
> This uncovered a serious issue that affects other archs too.
Thanks.
> Both test failures are caused by glibc switching the internal
> mechanism of pthread cancellation clean up handling to use
> __attribute__((cleanup(f))) and -fexceptions, but the two test
> failures are independent:
>
> (1) Should -fasynchronous-unwind-tables be on by default in gcc?
>
> nptl/tst-once3 fails because the callback passed to pthread_once
> now has to be compiled with -fasynchronous-unwind-tables which
> is not on by default on arm and aarch64 gcc. So does glibc
> expect the users to use this flag correctly or does glibc
> require the compiler to have it on by default?
This is bad.
> (My understanding: posix conforming c code cannot observe the
> presence of -fasynchronous-unwind-tables without invoking UB, but
> the glibc implementation of cancellation cleanup and backtrace
> from signal handlers makes this detail observable. Any function
> which may be canceled needs this flag to make cleanup work, so
> glibc seems to impose this as a requirement on the compiler: the
> user may not be in control of all the code that may be canceled).
We already impose the requirement that all such called code be
cancel safe anyway, and it might not be unless all called code
uses cancel handlers to clean up during cancellation. This would
be another requirement, one that imposes -fasynchronous-unwind-tables
on cancellation users. However, this is a new requirement and
old code can't be fixed, so we have a problem that requires
versioning and documentation. All for the purpose of implementing
C++ std::call_once via pthread_once, which seems like it is going
to be problematic.
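
For reference, here is a minimal sketch of the kind of user code the
requirement above is about: a thread that registers a cancel handler and
is then canceled while blocked in a cancellation point. The names
on_cancel, worker and run_cancel_demo are made up for illustration and
do not come from the glibc test suite. Whether the handler runs on a
given target may depend on unwind information being available, so it
would be built with something like
"gcc -pthread -fexceptions -fasynchronous-unwind-tables" (the flags
discussed in this thread).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Set to 1 by the cancel handler so the caller can observe that it ran. */
static atomic_int cleanup_ran;

/* Hypothetical cancel handler: records that cleanup happened. */
static void on_cancel(void *arg)
{
    (void)arg;
    atomic_store(&cleanup_ran, 1);
}

/* Hypothetical worker: blocks in pause(), which is a cancellation point. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(on_cancel, NULL);
    for (;;)
        pause();
    /* Not reached, but push and pop must pair up lexically. */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns 1 if the thread was canceled and its handler ran,
   0 if it did not, and -1 if a pthread call failed. */
int run_cancel_demo(void)
{
    pthread_t t;
    void *res;

    atomic_store(&cleanup_ran, 0);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)
        return -1;
    return res == PTHREAD_CANCELED && atomic_load(&cleanup_ran) == 1;
}
```

Calling run_cancel_demo() from main and checking for a return value of 1
is enough to see whether the handler was skipped.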
> (2) Should gcc support exceptions from async signal handlers?
No. I don't think you can support it safely.
> nptl/tst-join5 failure is more problematic: it fails because gcc
> does not seem to implement -fexceptions with the assumption that
> signal handlers can throw, in particular it assumes inline asm
> does not throw exceptions. If the syscall that is a cancellation
> point appears between pthread_cleanup_push and pthread_cleanup_pop
> in glibc internal code, the cleanup handler may not get run on
> cancellation depending on where gcc moved the syscall inline asm.
> (It is free to move it outside the code range that is marked for
> exception handling, this is what happens on aarch64 in pthread_join).
> This affects all archs, but some may get lucky.
Ah! That's truly a terrible scenario.
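
To make the mechanism concrete, here is a small sketch of scope-based
cleanup with __attribute__((cleanup(f))). It is not the glibc code:
release, blocking_step and run_scope_demo are hypothetical names, and
blocking_step is an ordinary function standing in for the syscall.
The cleanup is tied to the scope of the variable, and the exception
handling range the compiler emits is meant to cover that scope. The
failure described above would arise if the real syscall were inline asm,
which gcc assumes cannot throw, and ended up outside that range.

```c
/* Counts how many times the cleanup function has been invoked. */
static int cleanup_calls;

/* Hypothetical cleanup function; gcc passes a pointer to the variable. */
static void release(int *token)
{
    (void)token;
    cleanup_calls++;
}

/* Stand-in for the blocking syscall that acts as a cancellation point. */
static void blocking_step(void)
{
}

/* Returns the number of cleanup calls seen after leaving the scope. */
int run_scope_demo(void)
{
    cleanup_calls = 0;
    {
        /* release(&token) runs when this scope is left normally and,
           when built with -fexceptions, during unwinding as well. */
        int token __attribute__((cleanup(release))) = 0;
        blocking_step();
    }
    return cleanup_calls;
}
```

On the normal path shown here run_scope_demo() returns 1. The sketch
only shows the scoping rule, it does not reproduce the aarch64 failure.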
> (My understanding: gcc must be very strict about how it marks
> the code range for exception handling and assume any instruction
> may throw if it wants -fexceptions -fasynchronous-unwind-tables to
> work from signal handlers. Current compilers do not seem to support
> this so glibc internal code should not rely on it, which means the
> cancellation mechanism should not rely on exception handling at
> least not when the exception is thrown from the cancel signal
> handler. I think the gnu toolchain should not try to make pthread
> cancellation interoperate with C++ exceptions nor to make
> exceptions work from signal handlers: no standard requires this
> behaviour and it seems to cause problems).
No, we just need to revert this patch and have C++ implement
std::call_once by itself.
> Both issues cause silent omission of cleanup handlers running
> on cancellation, leaving libc internal state inconsistent.
>
> The second issue may be worth discussing on the gcc list.
>
Cheers,
Carlos.