RFE: enable buffering on null-terminated data

Carl Edquist edquist@cs.wisc.edu
Sun Mar 10 20:36:32 GMT 2024


Hi Zack,

This sounds like a potentially useful feature (it'd probably belong with a 
corresponding new buffer mode in setbuf(3)) ...

> Filenames should be passed between utilities in a null-terminated 
> fashion, because the null byte is the only byte that can't appear within 
> one.

Out of curiosity, do you have an example command line for your use case?

> If I want to buffer output data on null bytes, the closest I can get is 
> 'stdbuf --output=0', which doesn't buffer at all. This is pretty 
> inefficient.

I'm just thinking that find(1), for instance, will end up calling write(2) 
exactly once per filename (-print or -print0) if run unbuffered under 
stdbuf (--output=0), which is the same as you'd get with a corresponding 
stdbuf line-buffered mode (newline- or null-terminated).

Line buffering seems to improve performance over unbuffered output mainly 
when a program makes several calls to (for example) printf(3) to construct 
a single line.  find(1), and some filters like grep(1), write a whole line 
at a time even in unbuffered mode, and thus don't seem to benefit at all 
from line buffering.  On the other hand, cut(1) appears to putchar(3) a 
byte at a time, which in unbuffered mode will (like you say) be pretty 
inefficient.

So, depending on your use case, a new null-terminated line buffered option 
may or may not actually improve efficiency over unbuffered mode.
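To make the write counts above concrete, here is a toy model in C. It is a 
sketch under simplifying assumptions, not glibc's stdio: every name in it 
(model_stream, MODE_LINE_NUL, writes_per_record, etc.) is hypothetical, and 
MODE_LINE_NUL stands in for the proposed null-terminated mode, which no libc 
offers today.  It only counts how many simulated write(2) calls each flush 
rule produces when a program emits a whole record per call (like find 
-print0) versus one byte per call (like cut).

```c
#include <stddef.h>
#include <string.h>

/* Simplified flush rules; a model, NOT glibc's actual implementation. */
enum buf_mode { MODE_UNBUFFERED, MODE_LINE_NL, MODE_LINE_NUL };

struct model_stream {
    enum buf_mode mode;
    size_t pending;   /* bytes buffered but not yet "written" */
    size_t writes;    /* number of simulated write(2) calls */
};

static void model_flush(struct model_stream *s)
{
    if (s->pending > 0) {
        s->writes++;
        s->pending = 0;
    }
}

/* One stdio-level output call (e.g. a printf or a putchar) of len bytes. */
static void model_output(struct model_stream *s, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        s->pending++;
        /* Line modes flush as soon as the terminator byte is buffered. */
        if ((s->mode == MODE_LINE_NL && buf[i] == '\n') ||
            (s->mode == MODE_LINE_NUL && buf[i] == '\0'))
            model_flush(s);
    }
    /* Unbuffered: everything from this call goes out immediately. */
    if (s->mode == MODE_UNBUFFERED)
        model_flush(s);
}

/* find -print0 style: one output call per NUL-terminated record. */
static size_t writes_per_record(enum buf_mode mode,
                                const char *const *names, size_t n)
{
    struct model_stream s = { mode, 0, 0 };
    for (size_t i = 0; i < n; i++)
        model_output(&s, names[i], strlen(names[i]) + 1); /* include '\0' */
    model_flush(&s); /* final flush at exit */
    return s.writes;
}

/* cut style: one output call per byte, including each '\0' terminator. */
static size_t writes_per_byte(enum buf_mode mode,
                              const char *const *names, size_t n)
{
    struct model_stream s = { mode, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(names[i]) + 1;
        for (size_t j = 0; j < len; j++)
            model_output(&s, &names[i][j], 1);
    }
    model_flush(&s); /* final flush at exit */
    return s.writes;
}
```

For the three names "a.txt", "dir/b.c" and "c", writes_per_record() returns 
3 in both MODE_UNBUFFERED and MODE_LINE_NUL, while writes_per_byte() drops 
from 16 (one per byte, terminators included) unbuffered to 3 in 
MODE_LINE_NUL.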


You can run your commands under strace like

     stdbuf --output=X  strace -c -ewrite  command ... | ...

to count the number of actual writes for each buffering mode.


Carl


PS, "find -printf" recognizes a '\c' escape to flush the output, in case 
that helps.  So "find -printf '%p\0\c'" would, for instance, already 
behave the same as "stdbuf --output=N  find -print0" with the new stdbuf 
output mode you're suggesting.

(Though again, this doesn't actually seem to be any more efficient than 
running "stdbuf --output=0  find -print0".)
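In a program you control, the '\c' trick amounts to keeping the stream 
fully buffered and flushing after each terminator.  A minimal C sketch, 
assuming the caller has already made the stream fully buffered (for 
instance with setvbuf(out, NULL, _IOFBF, BUFSIZ) before any output); 
put_record0() and put_records0() are hypothetical helpers, not part of 
find or glibc:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write name plus its '\0' terminator as one record,
 * then flush, so a short record typically reaches the fd in one write(2),
 * similar in spirit to find -printf '%p\0\c'.
 * Returns 0 on success, -1 on error. */
static int put_record0(FILE *out, const char *name)
{
    size_t len = strlen(name) + 1;          /* include the '\0' terminator */

    if (fwrite(name, 1, len, out) != len)
        return -1;
    return fflush(out) == 0 ? 0 : -1;       /* flush at the record boundary */
}

/* Write n records, stopping at the first error. */
static int put_records0(FILE *out, const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (put_record0(out, names[i]) != 0)
            return -1;
    return 0;
}
```

As noted above, if each record already goes out in a single stdio call, 
this buys nothing over running unbuffered; it only helps producers that 
assemble a record from several calls.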

On Sun, 10 Mar 2024, Zachary Santer wrote:

> Was "stdbuf feature request - line buffering but for null-terminated data"
>
> See below.
>
> On Sun, Mar 10, 2024 at 5:38 AM Pádraig Brady <P@draigbrady.com> wrote:
>>
>> On 09/03/2024 16:30, Zachary Santer wrote:
>>> 'stdbuf --output=L' will line-buffer the command's output stream.
>>> Pretty useful, but that's looking for newlines. Filenames should be
>>> passed between utilities in a null-terminated fashion, because the
>>> null byte is the only byte that can't appear within one.
>>>
>>> If I want to buffer output data on null bytes, the closest I can get
>>> is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
>>> inefficient.
>>>
>>> 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
>>> --output=N, then?
>>>
>>> Would this require a change to libc implementations, or is it possible now?
>>
>> This does seem like useful functionality,
>> but it would require support for libc implementations first.
>>
>> cheers,
>> Pádraig
>
>