Bug 2457

Summary: Allow control of automatic stdio buffering using env variables
Product: glibc Reporter: Pádraig Brady <P>
Component: libc Assignee: Ulrich Drepper <drepper.fsp>
Status: REOPENED ---    
Severity: enhancement CC: carlos, glibc-bugs, ppluzhnikov, zack+srcbugz
Priority: P3 Flags: fweimer: security-
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description Pádraig Brady 2006-03-14 13:00:40 UTC
I was trying today to filter my access.log apache log with some coreutils
and was annoyed by the default buffering applied by glibc.
I was trying to do `tail -f ~/access.log | cut ... | uniq` but I was
only getting output when cut had more than 4K written to stdout.

So how can this be controlled? Each app could add an extra config
parameter (see grep --line-buffered for example), but this doesn't
seem general, and requires duplicating both logic and documentation
in each application. What would be ideal IMHO would be to
add the config logic in glibc (which would have to be controlled
with environment variables). There seems to be resistance to that though:
http://sources.redhat.com/ml/bug-glibc/1999-09/msg00041.html

Anyway whether it's implemented in libc or the application (coreutils lib),
I think they should have the same config interface which would
be environment variables with something like the following format:
    BUF_X_=Y
Where X = the fd number
and Y = 0 for unbuffered, 1 for line buffered and >1 for a specific
buffer size.

So for my particular problem I could do:

tail -f ~/access.log | BUF_1_=1 cut ... | uniq
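
To make the proposed value syntax concrete, here is a minimal sketch of how BUF_X_=Y could be interpreted. Nothing like this exists in glibc or coreutils; the function names buf_value_to_mode and lookup_buf_env are hypothetical, and the only real interfaces used are getenv, strtoul, snprintf and the setvbuf mode constants.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: translate the proposed value Y into setvbuf()
   arguments.  0 means unbuffered, 1 means line buffered, anything
   larger is a fully buffered stream of that size.
   Returns 0 on success, -1 if the value is not a plain decimal number. */
int buf_value_to_mode(const char *value, int *mode, size_t *size)
{
    char *end;
    unsigned long n;

    /* Reject empty strings, signs and leading whitespace up front,
       since strtoul() would otherwise accept them. */
    if (value == NULL || value[0] < '0' || value[0] > '9')
        return -1;

    errno = 0;
    n = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;

    if (n == 0) {
        *mode = _IONBF;
        *size = 0;
    } else if (n == 1) {
        *mode = _IOLBF;
        *size = 0;
    } else {
        *mode = _IOFBF;
        *size = (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: look up BUF_<fd>_ in the environment.
   Returns 1 if set and valid, 0 if unset, -1 if set but malformed. */
int lookup_buf_env(int fd, int *mode, size_t *size)
{
    char name[32];
    const char *value;

    snprintf(name, sizeof name, "BUF_%d_", fd);
    value = getenv(name);
    if (value == NULL)
        return 0;
    return buf_value_to_mode(value, mode, size) == 0 ? 1 : -1;
}
```

A caller would pass the resulting mode and size to setvbuf() before the first operation on the stream, and leave the default in place when the lookup returns 0.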
Comment 1 Ulrich Drepper 2006-04-02 17:42:06 UTC
Hell, no.  Programs expect a certain buffer mode and perhaps would work
unexpectedly if this changes.  By setting a mode to unbuffered, for instance,
you can easily DoS a system.  I can think of enough other reasons why this is
a terrible idea.  Programs explicitly must request a buffering scheme so that it
matches the way the program uses the stream.
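
For reference, a program requesting its buffering scheme explicitly looks roughly like the sketch below. It is only an illustration of the standard setvbuf() call in a line-oriented filter; the function name copy_lines is made up for this example and is not taken from cut or any other coreutils program.

```c
#include <stdio.h>

/* Hypothetical filter core: copy `in` to `out`, asking for line
   buffering on `out` so each completed line is written out as soon
   as it is produced, even when `out` is a pipe.
   Returns 0 on success, -1 on error. */
int copy_lines(FILE *in, FILE *out)
{
    char line[4096];

    /* setvbuf() has to be called before any other operation on the
       stream.  With a NULL buffer the library allocates its own. */
    if (setvbuf(out, NULL, _IOLBF, 0) != 0)
        return -1;

    while (fgets(line, sizeof line, in) != NULL) {
        if (fputs(line, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

This is the per-application approach: the program itself decides that its output is line oriented, which is what options such as grep --line-buffered expose to the user.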
Comment 2 Zack Weinberg 2018-05-21 17:51:40 UTC
According to https://blog.plover.com/Unix/stdio-buffering-2.html , this feature was in fact added to NetBSD (see also https://mail-index.netbsd.org/tech-userlevel/2015/07/14/msg009247.html et seq.) and has not caused problems there.  I think we should reconsider.  Yes, a program (or an entire process tree) could be forced to suffer much greater I/O overhead using this feature, but I don't think it rises to the level of "denial of service": there are other ways for a determined Mallory to get the same effect.  There are also obvious positive use cases, e.g. manually overriding the output of a long-running program to be line-buffered even if it's going to a pipe.
Comment 3 Paul Pluzhnikov 2018-05-24 00:09:16 UTC
I agree that this is a debugging facility (not unlike MALLOC_CHECK_), that can be very handy at times.
Comment 4 Carlos O'Donell 2024-08-20 19:57:51 UTC
Came across this today following this social media discussion:
https://fosstodon.org/@b0rk@jvns.ca/112995960127371014

We have glibc tunables today, so we could expose:

GLIBC_TUNABLES=glibc.stdin.buffer.mode=_IONBF:glibc.stdin.buffer.size=1048576

And likewise glibc.stdout.* and glibc.stderr.*

Someone would likely add the support into libio/stdfiles.c (_IO_stdfiles_init), and wire things into libio/genops.c when the buffers are created in _IO_*doallocate replacing the static use of BUFSIZ (8192 bytes) for all the setup.
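
As a rough user-space illustration of what such a setting would have to do, the sketch below maps a mode name and a size onto setvbuf(). The glibc.std*.buffer.* tunables above are a proposal and do not exist; mode_from_name and configure_stream are hypothetical names, and this is not how libio would implement it internally. The buffer is allocated explicitly because an implementation may ignore the size argument when no buffer is supplied.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: map a mode name, written the way the proposed tunable
   value spells it, to a setvbuf() mode.  Returns 0 on success. */
int mode_from_name(const char *name, int *mode)
{
    if (strcmp(name, "_IONBF") == 0) { *mode = _IONBF; return 0; }
    if (strcmp(name, "_IOLBF") == 0) { *mode = _IOLBF; return 0; }
    if (strcmp(name, "_IOFBF") == 0) { *mode = _IOFBF; return 0; }
    return -1;
}

/* Hypothetical: apply a mode and size to a stream that has not been
   used yet.  A size of 0 falls back to BUFSIZ.  On success returns 0
   and stores in *buf_out the buffer the caller must free after
   fclose() (NULL for unbuffered streams). */
int configure_stream(FILE *fp, const char *mode_name, size_t size,
                     char **buf_out)
{
    int mode;
    char *buf = NULL;

    *buf_out = NULL;
    if (mode_from_name(mode_name, &mode) != 0)
        return -1;

    if (mode != _IONBF) {
        if (size == 0)
            size = BUFSIZ;
        buf = malloc(size);
        if (buf == NULL)
            return -1;
    }

    if (setvbuf(fp, buf, mode, buf != NULL ? size : 0) != 0) {
        free(buf);
        return -1;
    }
    *buf_out = buf;
    return 0;
}
```

Inside libc the allocation would instead happen where the stream buffer is first created, so no caller-owned buffer would be involved.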
Comment 5 Zack Weinberg 2024-08-20 20:06:39 UTC
I think we should use the same environment variables and value syntax as NetBSD did for their version of this feature.  See links in comment 2.
Comment 6 Carlos O'Donell 2024-08-21 14:32:51 UTC
(In reply to Zack Weinberg from comment #5)
> I think we should use the same environment variables and value syntax as
> NetBSD did for their version of this feature.  See links in comment 2.

I disagree with this because it creates a difficult-to-clean environment where environment variables have names that change dynamically based on the file descriptor numbers.

I have a hard objection to this kind of design because of the complexity it imposes on downstream applications trying to clear environments of certain kinds of variables.

I can support having STDBUF, STDBUF1, STDBUF2, and STDBUF3 as compatibility names for "all", stdin, stdout, and stderr respectively, but not more than that. Everything else should go into GLIBC_TUNABLES.
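
The difference in cleanup effort can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, and the fixed list is simply the four names given in this comment. With a fixed set, a program sanitizing its environment can call unsetenv() on known names; with names derived from descriptor numbers it has to scan the whole environment for a prefix.

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* The fixed set named in this comment: it can be listed up front. */
static const char *const fixed_names[] = {
    "STDBUF", "STDBUF1", "STDBUF2", "STDBUF3"
};

/* Hypothetical: remove the fixed names.  Returns 0 on success. */
int scrub_fixed_names(void)
{
    size_t i;

    for (i = 0; i < sizeof fixed_names / sizeof fixed_names[0]; i++) {
        if (unsetenv(fixed_names[i]) != 0)
            return -1;
    }
    return 0;
}

/* Hypothetical: remove every variable whose name starts with `prefix`
   (which must not contain '='), as needed when names vary with the
   descriptor number.  Returns the number removed, or -1 on error. */
int scrub_prefix(const char *prefix)
{
    size_t plen = strlen(prefix);
    int removed = 0;
    char **ep = environ;

    while (ep != NULL && *ep != NULL) {
        const char *eq = strchr(*ep, '=');

        if (eq != NULL && strncmp(*ep, prefix, plen) == 0) {
            size_t nlen = (size_t)(eq - *ep);
            char *name = malloc(nlen + 1);

            if (name == NULL)
                return -1;
            memcpy(name, *ep, nlen);
            name[nlen] = '\0';
            unsetenv(name);
            free(name);
            removed++;
            ep = environ;   /* entries may have moved; rescan */
            continue;
        }
        ep++;
    }
    return removed;
}
```

The second function is the kind of code every sanitizing application would need to carry if the variable names were open-ended.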