Bug 15142

Summary: Missing locking in _IO_cleanup
Product: glibc
Reporter: Andreas Schwab <schwab>
Component: stdio
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: normal
CC: bugdal, dvyukov, fweimer, ppluzhnikov
Priority: P2
Flags: fweimer: security-
Version: 2.3.4
Target Milestone: 2.38
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=14697
Attachments: Testcase

Description Andreas Schwab 2013-02-13 13:12:21 UTC
Created attachment 6870
Testcase

When _IO_flush_all_lockp is called from _IO_cleanup, it doesn't do any locking on _IO_list_all, so it races with fopen/fclose in other threads. This can result in heap corruption.
Comment 1 Rich Felker 2013-02-14 20:12:54 UTC
I have two related issues open on the Austin Group bug tracker:

http://austingroupbugs.net/view.php?id=610
http://austingroupbugs.net/view.php?id=611

Unfortunately, I believe the current glibc behavior of not performing appropriate locking is intentional, so that exit works even when locks would/should block exit. This is contrary to the requirements of the standard and harmful to applications that rely on the atomicity and integrity of stdio operations performed under lock.
Comment 2 Andreas Schwab 2014-03-25 09:23:54 UTC
There doesn't seem to be any recent progress on these issues.
Comment 3 Sourceware Commits 2017-10-05 15:43:39 UTC
The branch, master has been updated
       via  19f82f358670f4b80533156b9edbf81223358bf9 (commit)
      from  91e7cf982d0104f0e71770f5ae8e3faf352dea9f (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=19f82f358670f4b80533156b9edbf81223358bf9

commit 19f82f358670f4b80533156b9edbf81223358bf9
Author: Andreas Schwab <schwab@suse.de>
Date:   Mon Aug 21 16:07:29 2017 +0200

    Always do locking when iterating over list of streams (bug 15142)
    
    _IO_list_all should only be traversed while locking list_all_lock.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog      |    8 +++++++
 libio/genops.c |   60 ++++++++++++++++---------------------------------------
 2 files changed, 26 insertions(+), 42 deletions(-)
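The rule in the commit message (traverse _IO_list_all only while holding list_all_lock) can be illustrated with a small model. This is a hypothetical sketch, not glibc code: `struct stream`, `list_all`, `stream_open`, `stream_close`, `flush_all`, and `churn` are simplified stand-ins for libio's list handling, and only the `list_all_lock` name comes from the commit message. The point is that insertion (the fopen analogue), removal (the fclose analogue), and the flush-all walk all take the same mutex.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for libio's stream list and its lock;
   these are not glibc's real data structures. */
struct stream {
    struct stream *next;
    int dirty;                /* nonzero if the buffer holds unwritten data */
};

static struct stream *list_all = NULL;
static pthread_mutex_t list_all_lock = PTHREAD_MUTEX_INITIALIZER;

/* fopen analogue: link a new stream into the global list under the lock. */
struct stream *stream_open(void)
{
    struct stream *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    pthread_mutex_lock(&list_all_lock);
    s->next = list_all;
    list_all = s;
    pthread_mutex_unlock(&list_all_lock);
    return s;
}

/* fclose analogue: unlink under the same lock, then free, so a traversal
   that holds the lock can never reach a freed node. */
void stream_close(struct stream *s)
{
    if (s == NULL)
        return;
    pthread_mutex_lock(&list_all_lock);
    for (struct stream **pp = &list_all; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            break;
        }
    }
    pthread_mutex_unlock(&list_all_lock);
    free(s);
}

/* Exit-flush analogue: walk the list only while holding list_all_lock.
   Returns the number of streams whose pending data was "written". */
int flush_all(void)
{
    int flushed = 0;
    pthread_mutex_lock(&list_all_lock);
    for (struct stream *s = list_all; s != NULL; s = s->next) {
        if (s->dirty) {
            s->dirty = 0;     /* stand-in for writing out the buffer */
            flushed++;
        }
    }
    pthread_mutex_unlock(&list_all_lock);
    return flushed;
}

/* Thread body: churn the list the way other threads calling fopen/fclose
   would while the flush-all walk is running. */
void *churn(void *arg)
{
    int rounds = *(const int *)arg;
    for (int i = 0; i < rounds; i++)
        stream_close(stream_open());
    return NULL;
}
```

Because the fclose analogue unlinks under the lock before freeing, the walk never dereferences freed memory. Drop the locking from `flush_all` and the walk can follow `next` into a node another thread has just freed, which is the kind of heap corruption described in the report.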
Comment 4 Paul Pluzhnikov 2023-06-04 22:16:03 UTC
*** Bug 30510 has been marked as a duplicate of this bug. ***
Comment 5 Florian Weimer 2023-07-03 09:08:46 UTC
Fixed for 2.38 via:

commit af130d27099651e0d27b2cf2cfb44dafd6fe8a26
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue Jan 30 10:16:00 2018 +0100

    Always do locking when accessing streams (bug 15142, bug 14697)
    
    Now that abort no longer calls fflush there is no reason to avoid locking
    the stdio streams anywhere.  This fixes a conformance issue and potential
    heap corruption during exit.
Comment 6 Dmitry Vyukov 2024-03-13 10:13:50 UTC
We started getting hangs on the following program:

https://github.com/llvm/llvm-project/blob/995d1d114e4e4ff708a03cdb0a975209c6197f9f/compiler-rt/test/tsan/getline_nohang.cpp#L28

Basically just calls a blocking getline in one thread and another thread tries to exit.

Does it mean it's illegal to exit while there are any blocking stream calls anywhere in the program?
Comment 7 Florian Weimer 2024-03-13 10:23:23 UTC
(In reply to Dmitry Vyukov from comment #6)
> We started getting hangs on the following program:
> 
> https://github.com/llvm/llvm-project/blob/995d1d114e4e4ff708a03cdb0a975209c6197f9f/compiler-rt/test/tsan/getline_nohang.cpp#L28
> 
> Basically just calls a blocking getline in one thread and another thread
> tries to exit.

It's blocking on this:

  FILE *stream = fdopen(fd[0], "r");
  while (1) {
    volatile int res = getline(&line, &size, stream);
    (void)res;
  }
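Why this particular program hangs can be checked directly. POSIX requires stdio functions to behave as if they acquire the stream lock (flockfile) internally, and in glibc getline keeps that lock while it waits in read(); since the 2.38 change above, the exit-time flush locks every stream as well. The sketch below is not part of the LLVM test: it runs the quoted getline call once on a pipe in a helper thread and polls with ftrylockfile, which should report the lock as held while the reader waits. The names `reader` and `reader_holds_lock` and the roughly 2-second polling bound are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Reader thread: the quoted getline call, run once so the demo can end. */
static void *reader(void *arg)
{
    FILE *stream = arg;
    char *line = NULL;
    size_t size = 0;
    ssize_t res = getline(&line, &size, stream);  /* blocks in read() */
    (void)res;
    free(line);
    return NULL;
}

/* Returns 1 if ftrylockfile found the stream locked while the reader thread
   was inside getline, 0 if that was never observed, -1 on setup failure. */
int reader_holds_lock(void)
{
    int fd[2];
    if (pipe(fd) != 0)
        return -1;
    FILE *stream = fdopen(fd[0], "r");
    if (stream == NULL) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    pthread_t t;
    if (pthread_create(&t, NULL, reader, stream) != 0) {
        fclose(stream);
        close(fd[1]);
        return -1;
    }

    int held = 0;
    for (int i = 0; i < 2000 && !held; i++) {   /* poll for about 2 s */
        if (ftrylockfile(stream) == 0) {
            funlockfile(stream);  /* reader has not taken the lock yet */
            struct timespec ms = { 0, 1000000 };
            nanosleep(&ms, NULL);
        } else {
            held = 1;             /* reader owns the lock inside getline */
        }
    }

    /* Feed one line so getline returns, then clean up. */
    ssize_t w = write(fd[1], "x\n", 2);
    (void)w;
    pthread_join(t, NULL);
    fclose(stream);
    close(fd[1]);
    return held;
}
```

A result of 1 means that anything taking the same lock, such as an exit-time flush that locks every stream, would wait for the reader indefinitely, which matches the hang reported in comment 6.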

It's not a writable stream, so we could avoid the blocking with a more complex handshake between stdio streams and exit. I'm not sure if it's worth doing that.  We could perhaps add another flag to fopen/fdopen that indicates that the stream should not participate in fflush (NULL) or exit flushing.

For streams which are blocked in writing, POSIX does not really give us a way to make forward progress because we have to flush the unwritten data before exiting.
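That exit has to flush unwritten data, and that skipping the flush is visible outside the process, can be checked with a small POSIX sketch. The function `bytes_seen_by_parent` is a hypothetical name: it forks a child that leaves a short message in a fully buffered stream on a pipe and then ends either through `exit`, which flushes stdio buffers, or through `_exit`, which does not flush them on glibc. The parent counts what arrives. It assumes the message is shorter than `BUFSIZ`, so nothing is written before the child terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that leaves msg in the buffer of a fully buffered stream on
   a pipe, then terminates via exit() (use_exit != 0) or _exit() (use_exit
   == 0). Returns how many bytes the parent reads, or -1 on setup failure. */
ssize_t bytes_seen_by_parent(const char *msg, int use_exit)
{
    int fd[2];
    if (pipe(fd) != 0)
        return -1;
    fflush(NULL);                 /* keep the child from re-flushing our data */
    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {               /* child */
        close(fd[0]);
        FILE *out = fdopen(fd[1], "w");
        if (out == NULL)
            _exit(127);
        setvbuf(out, NULL, _IOFBF, BUFSIZ);   /* hold data in the buffer */
        fputs(msg, out);                      /* buffered, not yet written */
        if (use_exit)
            exit(0);              /* exit flushes stdio: the bytes arrive */
        _exit(0);                 /* _exit skips stdio: the bytes are lost */
    }
    close(fd[1]);                 /* parent */
    char buf[256];
    ssize_t total = 0, n;
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        total += n;
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

The parent sees the message in one case and nothing in the other, so another process can tell whether the flush happened. That is why exit cannot simply abandon a stream that still holds unwritten data, even when the thread writing to it is blocked.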
Comment 8 Dmitry Vyukov 2024-03-13 10:28:47 UTC
> For streams which are blocked in writing, POSIX does not really give us a way to make forward progress because we have to flush the unwritten data before exiting.

Is it really the case for this program?

If a write does not happen before exit (which is the case in any such blocking), then the program cannot know that the write has even started before fflush/exit, so it cannot expect the write's side effects to be flushed.

What am I missing?

> We could perhaps add another flag to fopen/fdopen that indicates that the stream should not participate in fflush (NULL) or exit flushing.

Should we worry about all of the existing programs that will start hanging?
Comment 9 Florian Weimer 2024-03-13 10:43:04 UTC
(In reply to Dmitry Vyukov from comment #8)
> > For streams which are blocked in writing, POSIX does not really give us a way to make forward progress because we have to flush the unwritten data before exiting.
> 
> Is it really the case for this program?

No, this program does not have any unflushed data to be written, hence my comment about a more complex locking protocol avoiding the issue.

Exit flushing is special and not specified as equivalent to fflush (NULL), so maybe it's sufficient to put read-only streams on a separate list and flush only writable streams on exit. But it's not clear to me whether it's worth making changes here if they only fix this LLVM test case, while the real-world issues are with applications exiting with pending unwritten data.
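The separate-list idea, combined with the opt-out flag suggested in comment 7, could be modelled roughly as follows. Everything here is hypothetical: glibc has no such flag, and `struct model_stream`, `no_exit_flush`, and `exit_flush` are invented for illustration. Instead of a literal second list, the sketch skips read-only and opted-out streams during the walk, which has the same locking effect: their locks are never taken at exit, so a reader blocked in getline cannot stall termination, while writable streams are still locked and flushed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stdio stream; not glibc's FILE. The writable and
   no_exit_flush fields are assumed to be set once at open and never changed,
   so this model reads them without taking the lock. */
struct model_stream {
    pthread_mutex_t lock;    /* stands in for the per-stream FILE lock */
    bool writable;           /* opened for output ("w", "a", "r+", ...) */
    bool no_exit_flush;      /* hypothetical opt-out flag from fopen/fdopen */
    size_t pending;          /* bytes buffered but not yet written */
};

/* Exit-time flush: writable, participating streams are locked and flushed;
   read-only or opted-out streams are skipped without touching their lock.
   Returns the number of streams that had pending data. */
size_t exit_flush(struct model_stream *streams[], size_t n)
{
    size_t flushed = 0;
    for (size_t i = 0; i < n; i++) {
        struct model_stream *s = streams[i];
        if (!s->writable || s->no_exit_flush)
            continue;                   /* never blocks on this stream */
        pthread_mutex_lock(&s->lock);   /* may still wait for a blocked writer */
        if (s->pending > 0) {
            s->pending = 0;             /* stand-in for writing the buffer */
            flushed++;
        }
        pthread_mutex_unlock(&s->lock);
    }
    return flushed;
}
```

Writable streams still go through the lock, so the POSIX requirement to flush pending output is kept; only streams that cannot hold unwritten data, or that explicitly opted out, are left alone.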

> If a write does not happen before exit (which is the case in any such
> blocking), then the program cannot know that the write has even started
> before fflush/exit, so it cannot expect the write's side effects to be
> flushed.
> 
> What am I missing?

There are cases where POSIX requires us to block, and the lack of blocking is observable by another process.

> > We could perhaps add another flag to fopen/fdopen that indicates that the stream should not participate in fflush (NULL) or exit flushing.
> 
> Should we worry about all of the existing programs that will start hanging?

Andreas Schwab wrote this:

“
This has been part of SUSE/openSUSE for several years, and I have not
seen any complaints so far.  It's more likely that you get a crash
during the unlocked access to the streams.
”

<https://inbox.sourceware.org/libc-alpha/mvmr0pptpmm.fsf@suse.de/>

This reduced my worries considerably.