Bug 31997

Summary: exit is not thread-safe, even when all atexit handlers are thread-safe
Product: glibc Reporter: Ralf Jung <post+sourceware.org>
Component: libcAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: adhemerval.zanella, bugdal, drepper.fsp, dtolnay, gabravier, nsz
Priority: P2    
Version: unspecified   
Target Milestone: 2.41   
Host: Target:
Build: Last reconfirmed:

Description Ralf Jung 2024-07-19 15:13:42 UTC
I don't have a reproducer in C, but here is one in Rust:

use std::thread;

fn main() {
    for _ in 0..32 {
        unsafe { libc::atexit(exit_handler) };
    }
    for _ in 0..2 {
        thread::spawn(|| std::process::exit(0));
    }
}

extern "C" fn exit_handler() {
    thread::sleep_ms(1000);
}

(Taken from https://github.com/rust-lang/rust/issues/126600)

This first, on the main thread, registers 32 atexit handlers that all call the "exit_handler" function which does nothing but sleep for 1s. Then it spawns 2 threads that both call "exit" concurrently. This program segfaults with glibc 2.37.

"exit" is not on the list of thread-unsafe POSIX functions (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09), so I would expect exit to be thread-safe. It also seems to commonly be thread-safe in other libc implementations (at least, that's what it looks like for musl and the macOS libc).

I have seen that glibc documents "exit" as not thread-safe ("MT-Unsafe race:exit"), but I think that is a violation of POSIX and should be fixed: it should be safe to call "exit" concurrently from many threads, as long as the atexit handlers are themselves all thread-safe.
Comment 1 Rich Felker 2024-07-19 22:05:04 UTC
exit is not formally thread-unsafe; it's just explicitly UB per text in the C standard (and copied in POSIX):

"If exit() is called more than once, the behavior is undefined."

So in terms of the standard's requirements, this is "not a bug".

I believe however that making it UB was an error in the specification, a carryover from before C had threads when the only way exit could get called "more than once" was recursively via atexit handlers (which arguably *should be* undefined). We should fix this, and push for the standards to adopt language fixing it.

FWIW, it is currently NOT safe on musl to call exit more than once, since it was deemed unnecessary based on the standard. There are certain ways, depending on what functionality is linked and is in use, that exit may take locks that synchronize multiple callers against each other, but this is not currently by design, and does not give the safety properties it should have. As long as implementors are in agreement that this should change, I'm happy to "fix" it on the musl side.
Comment 2 Ralf Jung 2024-07-22 10:22:27 UTC
I agree that ideally the C standard would be clarified. But for POSIX specifically, since exit is *not* on the list of thread-unsafe functions, I think the intent of the POSIX standard is that it is okay to call exit concurrently, just not from inside an atexit handler.
Comment 3 Szabolcs Nagy 2024-07-22 15:20:22 UTC
(In reply to Ralf Jung from comment #2)
> I agree that ideally the C standard would be clarified. But for POSIX
> specifically, since exit is *not* on the list of thread-unsafe functions, I
> think the intent of the POSIX standard is that it is okay to call exit
> concurrently, just not from inside an atexit handler.

thread-safety of exit means it can be called concurrently with any *other* thread-safe functions.

the issue you reported is not a thread-safety issue, but a "called more than once" issue.

so as Rich points out this is not a bug, but either a QoI or specification issue if we think allowing multiple exit calls is useful.
Comment 4 Ralf Jung 2024-07-22 16:50:18 UTC
> thread-safety of exit means it can be called concurrently with any *other* thread-safe functions.

That seems like an odd claim. Clearly if ctime() is thread-safe that also means I can call ctime() concurrently from two threads.

To me there seems to be a direct contradiction between "function is thread-safe" and "function can only be called once (ever, in the entire execution of the program)".

Also, isn't the same code as "exit" being run when the main thread terminates? So a program may spawn a thread and call exit there -- calling it only once, ever -- but there could still be a segfault when this races with a return from the main function. Or is that somehow not a problem?

> if we think allowing multiple exit calls is useful.

Without being allowed to do concurrent calls to "exit", it becomes basically impossible to call "exit" in any situation where there may be other threads running and I don't have tight control over what they do. In other words, a library may never call "exit" (or it has to document exactly when which of its functions may exit, so that the rest of the program can take this into account), and a binary can only call "exit" when there is just a single thread running or we carefully coordinate that only one thread may be calling "exit". That all seems completely impractical.

Put differently, when a language like Rust (or Java, or Go, or Swift, or really any language that uses multiple threads) wants to provide their programmers the ability to exit the process without causing Undefined Behavior, there are only two options:
- the "exit" function provided by the underlying system interface is thread-safe, in the sense that it can be called from multiple threads concurrently.
- The "exit" function provided by these languages actually calls _Exit, and many people are very confused because buffers don't get flushed and all sorts of other weirdness happens.
Comment 5 Rich Felker 2024-07-22 18:20:24 UTC
> That seems like an odd claim. Clearly if ctime() is thread-safe that also means I can call ctime() concurrently from two threads.

If a function is thread-safe, you can call it concurrently with other thread-safe functions, including itself, but that doesn't exempt you from following other rules for the function. For example, mbrtowc has a rule that it's only safe if you don't pass a null pst argument. The rule for exit is that you can't call it more than once. It says very explicitly in the specification that this is undefined. I believe that's unfortunate and wrong/unintended carry-over from old versions of the specification where it meant "undefined if you call it recursively". But it's what's there, and *as written*, it means concurrent calls are UB.
Comment 6 Adhemerval Zanella 2024-07-22 18:22:54 UTC
The main issue is that the current implementation aims to support calling atexit/exit from atexit handler, which requires releasing the lock on each atexit callback.  This will exit being called more than once, thus triggering UB.  We do not have a proper way to specify that some functions should not be called from an atexit handler.

There is another minor issue where __libc_start_main adds _dl_fini on atexit, but I think this should be easy to change.

So I think it would be QoI to allow calling exit from multiple threads, but I am fully sure about the implication of current guarantees. Maybe we can use the same strategy we use to fix the pthread_atfork, as done by 52a103e237329b9f88a28513fe7506ffc3bd8ced, where we use a generation counter to keep track of the current handler.  It would probably need to use a DYNARRAY instead of a list, as for atfork handlers.
Comment 7 Rich Felker 2024-07-22 18:25:00 UTC
> Without being allowed to do concurrent calls to "exit", it becomes basically impossible to call "exit" in any situation where there may be other threads running and I don't have tight control over what they do. In other words, a library may never call "exit" (or it has to document exactly when which of its functions may exit, so that the rest of the program can take this into account),

A library which calls exit is *functionally buggy* and library-unsafe (destructive to application state that does not belong to it) regardless of whether the behavior is undefined. So this is a bug which should be fixed in the library.

> and a binary can only call "exit" when there is just a single thread running or we carefully coordinate that only one thread may be calling "exit". That all seems completely impractical.

It's generally not useful for an application to have multiple paths that can lead to exit concurrently (usually means you have sloppy error handling and aren't reporting errors to the part of the program that needs to know about them but just throwing up your hands and bailing out) but if you want to do this, since it's a single application and you can have all the internal contracts you want, you're always free to put your own global lock around the calls to exit. This is the way you write code that is portable to all existing implementations without depending on the standard being fixed.
Comment 8 Rich Felker 2024-07-22 18:28:30 UTC
> Maybe we can use the same strategy we use to fix the pthread_atfork... where we use a generation counter...

Nothing like that is needed. Making exit safe against concurrent calls is as simple as adding at the top of the function:

static pthread_mutex_t exit_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&exit_mutex);

or equivalent with a lighter implementation-internal lock.

Recursive calls will deadlock, which is fine because that was always intended to be unsupported and undefined behavior. Two threads happening to call exit() at the same time will just result in one blocking until the other finishes exiting, which is the only reasonable behavior here.
Comment 9 Ralf Jung 2024-07-22 19:08:11 UTC
> The main issue is that the current implementation aims to support calling
> atexit/exit from atexit handler

Wait, but calling exit from an atexit handler is UB...?

> We do not have a proper way to specify that some functions should
> not be called from an atexit handler.

Why is this any harder than saying "exit may not be called inside an atexit handler"?

> A library which calls exit is *functionally buggy* and library-unsafe
> (destructive to application state that does not belong to it) regardless
> of whether the behavior is undefined. So this is a bug which should be
> fixed in the library.

> It's generally not useful for an application to have multiple paths
> that can lead to exit concurrently

That may be the case, but elevating such a bug to full-blown Undefined Behavior (nasal demons included) is still a pretty severe problem, IMO.

It is generally very useful to exclude UB as a possible source of bugs and focus on more well-behaved ways that programs can go wrong.

> free to put your own global lock around the calls to exit

Yeah, Rust does that now. But in applications that mix languages, this doesn't fully mitigate the UB.

I have filed this as an issue with POSIX: https://austingroupbugs.net/view.php?id=1845. Ultimately this will probably have to go via C itself, but I won't pursue that path.
Comment 10 Adhemerval Zanella 2024-07-22 19:23:13 UTC
(In reply to Rich Felker from comment #8)
> > Maybe we can use the same strategy we use to fix the pthread_atfork... where we use a generation counter...
> 
> Nothing like that is needed. Making exit safe against concurrent calls is as
> simple as adding at the top of the function:
> 
> static pthread_mutex_t exit_mutex = PTHREAD_MUTEX_INITIALIZER;
> pthread_mutex_lock(&exit_mutex);
> 
> or equivalent with a lighter implementation-internal lock.
> 
> Recursive calls will deadlock, which is fine because that was always
> intended to be unsupported and undefined behavior. Two threads happening to
> call exit() at the same time will just result in one blocking until the
> other finishes exiting, which is the only reasonable behavior here.

I tend to agree, I was trying to avoid this big hammer mainly deadlocks is a PITA to debug, but it does seem the only sane solution.
Comment 11 Adhemerval Zanella 2024-07-22 19:29:17 UTC
(In reply to Ralf Jung from comment #9)
> > The main issue is that the current implementation aims to support calling
> > atexit/exit from atexit handler
> 
> Wait, but calling exit from an atexit handler is UB...?

That is my understanding, as Rich has put it as well, that recursive calls is undefined behavior.

> 
> > We do not have a proper way to specify that some functions should
> > not be called from an atexit handler.
> 
> Why is this any harder than saying "exit may not be called inside an atexit
> handler"?

It is not, but this is another 'class' of functions that runtime will have special semantics depending on the context they are called.  I don't have a strong opinion in fact, but I tend to agree with Rich approach.
Comment 12 Ralf Jung 2024-07-22 21:11:30 UTC
> The main issue is that the current implementation aims to support calling
> atexit/exit from atexit handler

So glibc is explicitly trying to support this despite it being UB, but it makes other patterns that are much more likely to happen in practice UB in the process? That seems odd, not sure if I am understanding this correctly.

> It is not, but this is another 'class' of functions that runtime will have special semantics depending on the context they are called.  I don't have a strong opinion in fact, but I tend to agree with Rich approach.

Sure, if we can just say this is defined to deadlock that's even better. I doubt that POSIX/C want to do that though, given that this was never legal. It is much easier to make the argument that the move to concurrency should have come with a change to the "exit" docs permitting concurrent use.
Comment 13 Rich Felker 2024-07-22 21:24:38 UTC
> The main issue is that the current implementation aims to support calling atexit/exit from atexit handler...

If this is actually a goal, you can achieve it with my solution just by making the mutex recursive. Then reentry via atexit handlers in the same thread would succeed in taking the lock but other threads would still block waiting for the first-arrived to finish calling atexit handlers and exiting.
Comment 14 Adhemerval Zanella 2024-07-23 17:52:59 UTC
(In reply to Rich Felker from comment #8)
> > Maybe we can use the same strategy we use to fix the pthread_atfork... where we use a generation counter...
> 
> Nothing like that is needed. Making exit safe against concurrent calls is as
> simple as adding at the top of the function:
> 
> static pthread_mutex_t exit_mutex = PTHREAD_MUTEX_INITIALIZER;
> pthread_mutex_lock(&exit_mutex);
> 
> or equivalent with a lighter implementation-internal lock.
> 
> Recursive calls will deadlock, which is fine because that was always
> intended to be unsupported and undefined behavior. Two threads happening to
> call exit() at the same time will just result in one blocking until the
> other finishes exiting, which is the only reasonable behavior here.

So I experimented with this straightforward solution and the main issue is the current solution to set the ELF destructors (_dl_fini) is by registering it on program startup at:

csu/libc-start.c
310   /* Register the destructor of the dynamic linker if there is any.  */
311   if (__glibc_likely (rtld_fini != NULL))
312     __cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);

It means that exit called in destructors will now deadlock.  I tried to avoid this issue by moving the _dl_fini to outside of atexit loop, so it is called with __exit_funcs_lock unlocked. This triggered another issue, where the test sysdeps/unix/sysv/linux/tst-rseq-nptl.c assumes that atexit handles can be registered on ELF destructors (it is true for glibc, I am not sure about C/POSIX standard).

So I think for glibc, making exit() safe to be called concurrently would mean changing current support; at least by not allowing an atexit on ELF destructors.
Comment 15 Rich Felker 2024-07-23 18:46:38 UTC
A new language-lawyering plot twist:

I'm not sure we can actually treat multiple/recursive calls to exit as UB, because exit is used internally to implement return-from-main, but I can't actually find anywhere the C standard specifies that returning from main behaves "as if you called exit". The behaviors are specified to be the same, but the language about calling exit "more than once" does not seem to apply if you return from main and one of the atexit handlers then calls exit, since, formally, exit was only called once.
Comment 16 Rich Felker 2024-07-23 19:04:45 UTC
OK, it looks like 5.1.2.2.3 Program termination covers that, at least in the case where standard declaration of main is in effect:

"If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument;11) reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified."

Hilariously, it does not even seem to define whether atexit functions get called at all on an implementation that allows "void main();" or similar.

But I think at least this does give us license to treat "call to exit from atexit handler after return from main" as calling exit "more than once", thereby UB.
Comment 17 Adhemerval Zanella 2024-07-23 20:35:53 UTC
For glibc side, it seems that Rich's strategy to use a recursive lock works without any regressions. I am still not fully sure how to handle exit calls from atexit() since the current implementation does allow it (the call will keep execution from the point where exit() was interrupted). 

I am inclined to just keep current behavior instead of trapping or aborting.
Comment 18 Adhemerval Zanella 2024-07-25 19:05:28 UTC
I sent a patch to review [1].  It follows Rich suggestion, but for glibc I used a recursive lock to support exit called by handlers/destructors (since it seems the current semantic, which is exercised by some test cases).

[1] https://sourceware.org/pipermail/libc-alpha/2024-July/158579.html
Comment 19 Ralf Jung 2024-07-26 12:41:09 UTC
That patch looks great (from my very distant point of view), thanks a lot :)
Comment 20 Adhemerval Zanella 2024-07-30 14:00:10 UTC
Fixed on 2.41.