Bug 14697

Summary: Behavior of exit is nonconformant with respect to threads and stdio
Product: glibc
Reporter: Rich Felker <bugdal>
Component: nptl
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: normal
CC: drepper.fsp, fweimer, gabravier
Priority: P2
Flags: fweimer: security-
Version: unspecified
Target Milestone: 2.38
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=15142
Host:
Target:
Build:
Last reconfirmed:

Description Rich Felker 2012-10-10 18:04:29 UTC
Consider the following program:

#include <pthread.h>
#include <stdio.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

/* Lock stdin, signal main, then block forever without unlocking. */
void *f(void *p) { flockfile(stdin); sem_post(p); for (;;) pause(); }

int main()
{
    sem_t sem;
    sem_init(&sem, 0, 0);
    pthread_create(&(pthread_t){0}, 0, f, &sem);
    while (sem_wait(&sem));
    exit(0);
}

Per Austin Group interpretation for issue #611 (http://austingroupbugs.net/view.php?id=611), this program should deadlock in exit waiting for the lock it will never obtain. Under glibc/NPTL, it exits immediately.

If you'd like to make the example more interesting, you could have the thread wake up after 5 seconds and unlock stdin; in that case, the program should run for at least 5 seconds, rather than exiting immediately.

To make it even more interesting, have the thread perform a long-running write operation that's intended to be atomic with respect to other threads and also with respect to program termination (such that on normal program termination, either the entire write happened, or no write happened at all).

This bug is due to intentional hackery in glibc to avoid hanging on exit() due to locks being held by other threads, under the wrong assumption that exit() "should" immediately exit in this case. There is no language in the standards to support what glibc is doing.
Comment 1 Rich Felker 2014-10-06 17:51:29 UTC
It seems that this bug can also result in more serious corruption such as duplicate output, even without any explicit file locking. See the example in this question on Stack Overflow, which is a perfectly valid program producing incorrect output:

http://stackoverflow.com/questions/26211423/unexpected-output-in-a-multithreaded-program

Of course the program has unpredictable output, but there is a finite set of outputs it can produce on a correct implementation: different interleavings of the lines, and different cutoffs for the number of lines produced by sample_thread. A real-world example where this could easily happen is writing a log file using stdio.
Comment 2 Andreas Schwab 2019-11-25 15:27:59 UTC
Note that nptl/tst-stdio1 expects this behaviour.
Comment 3 Rich Felker 2019-11-25 15:55:46 UTC
I'm not sure if this tracker entry is the best place to raise it, but since you mention tests, I think it would be worth considering policy around tests asserting behavior that's already been determined to be buggy/erroneous. Even without a fix for the behavior, the test demanding it could be removed or changed to assert the opposite.
Comment 4 Florian Weimer 2023-06-06 09:43:30 UTC
(In reply to Andreas Schwab from comment #2)
> Note that nptl/tst-stdio1 expects this behaviour.

I posted patches to fix the test:

[PATCH 1/2] support: Add delayed__exit (with two underscores)
<https://sourceware.org/pipermail/libc-alpha/2023-June/148829.html>

[PATCH 2/2] pthreads: Use _exit to terminate the tst-stdio1 test
<https://sourceware.org/pipermail/libc-alpha/2023-June/148830.html>

I wonder if we should add a new symbol version for exit, so that the behavior only changes for new applications. We did that for quick_exit. (Otherwise we may have to add a tunable to bring back the old behavior—or we may have to do that anyway.)
Comment 5 Sourceware Commits 2023-07-03 08:03:52 UTC
The master branch has been updated by Andreas Schwab <schwab@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=af130d27099651e0d27b2cf2cfb44dafd6fe8a26

commit af130d27099651e0d27b2cf2cfb44dafd6fe8a26
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue Jan 30 10:16:00 2018 +0100

    Always do locking when accessing streams (bug 15142, bug 14697)
    
    Now that abort no longer calls fflush there is no reason to avoid locking
    the stdio streams anywhere.  This fixes a conformance issue and potential
    heap corruption during exit.
Comment 6 Florian Weimer 2023-07-03 09:07:52 UTC
Fixed for 2.38.