1.5.24: incorrect default behavior of dd in popen context on text-mounted filesystem

Eric Blake ebb9@byu.net
Wed Jul 25 15:29:00 GMT 2007

 <hsw <at> hodain.net> writes:

> 1) In the Cygwin User's Guide, page 33:
>     c. Pipes and non-file devices are opened in binary mode, except if
>     the CYGWIN environment variable contains nobinmode.
> 	Warning!
> 	In b20.1 of 12/98, a file will be opened in binary mode if any
> 	of the following conditions hold:

This documentation is rather old, so it must be taken with a grain of salt.

> 	1. binary mode is specified in the open call
> 	2. the filename is a MS-DOS filename
> 	3. the file resides on a binary mounted partition
> 	4. CYGWIN contains binmode

In particular, CYGWIN defaults to binmode, but binmode/nobinmode only affects 
non-disk files (i.e., pipes and special devices).  It has no bearing on disk 
files, since that is what mount is for.

> 	5. the file is not a disk file

In other words, 4 and 5 should be merged into a single condition.
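
A minimal sketch of the merged rule, written as a plain C decision function. 
This is only a model of the reading given in this message, not Cygwin source 
code: every name in it (io_mode, open_request, default_mode) is hypothetical, 
and it deliberately leaves out condition 2 (MS-DOS filenames) from the old 
guide.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; none of this is Cygwin API. */
enum io_mode { MODE_TEXT, MODE_BINARY };
enum open_request { REQ_DEFAULT, REQ_TEXT, REQ_BINARY };

/*
 * Model of the default text/binary decision as described in this message:
 *   - an explicit mode in the open call wins;
 *   - disk files follow the mount point;
 *   - everything else (pipes, special devices) follows CYGWIN=[no]binmode.
 */
enum io_mode
default_mode(enum open_request req, bool is_disk_file,
             bool mount_is_binary, bool cygwin_binmode)
{
    if (req == REQ_BINARY)
        return MODE_BINARY;
    if (req == REQ_TEXT)
        return MODE_TEXT;
    if (is_disk_file)
        return mount_is_binary ? MODE_BINARY : MODE_TEXT;
    return cygwin_binmode ? MODE_BINARY : MODE_TEXT;
}
```

Under this model, a disk file on a text mount comes out as text mode even 
with CYGWIN=binmode, which is the case that matters for the dd question.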

>     d. When redirecting, the Cygwin shells uses rules (a-e) [sic]. For
>     these shells the relevant value of CYGWIN is that at the time the
>     shell was launched and not that at the time the program is
>     executed.
> My reading of this says that I should expect dd to use binary mode on
> its input and output files.  And I should expect that stdin and stdout
> from shell-launched programs will be in binmode, so that
> popen("gzip|dd>file", "w") will use binmode.  Please explain if my
> interpretation is incorrect.

popen invokes the shell, with the shell's stdout inherited from your current 
process's stdout, and with the shell's stdin being set to the other end of your 
pipe.  The "w" in the popen implies that CYGWIN=binmode is consulted for how 
the pipe behaves, and unless you changed the default CYGWIN settings (which I 
doubt), that means the shell's stdin is binary (whereas using "wt" forces text 
mode, although that is unusual on pipes, and using "wb" forces binary mode).  
As the shell command will not be writing to stdout, it is irrelevant whether 
the shell's inherited stdout was text or binary.

The shell then spawns two processes.  gzip's process is given a pipe as stdout, 
and based on the CYGWIN=binmode default, it is binary; gzip's stdin is 
inherited from the shell, which means it is still binary.  (The alternative 
command, popen("gzip > file", "w"), was a case where gzip-1.3.12-1 used text 
mode stdout, but gzip-1.3.12-2 correctly uses binary mode; it differs from the 
case in question based on stdout being a file rather than a pipe).

The other process is dd, where stdin is a pipe (again, the CYGWIN=binmode 
default means it is binary).  But dd's process is given a redirection to 'file' 
as its stdout.  And since 'file' is a disk file, mount point rules take 
effect.  Therefore, dd defaults to opening 'file' in text mode, unless dd takes 
extra pains to force binary mode.
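
For reference, the calling side of such a pipeline can be sketched as follows. 
The helper name feed_pipeline is hypothetical, and the command string is 
whatever the caller supplies; the sketch uses the portable "w" mode and relies 
on the pipe defaulting to binary as described above.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() declarations */
#include <stddef.h>
#include <stdio.h>

/*
 * Feed len bytes of buf to a shell pipeline through popen().
 * Returns 0 if all bytes were written and the shell exited with status 0,
 * and -1 otherwise.
 *
 * "w" is the portable spelling; per the message above, Cygwin also
 * accepts "wb" or "wt" to force the mode of the pipe.
 */
int
feed_pipeline(const char *command, const void *buf, size_t len)
{
    FILE *p = popen(command, "w");
    if (p == NULL)
        return -1;

    size_t written = fwrite(buf, 1, len, p);
    int status = pclose(p);   /* flushes, then waits for the shell */

    return (written == len && status == 0) ? 0 : -1;
}
```

A call such as feed_pipeline("gzip | dd > file", data, n) leaves the opening 
of 'file' to the shell, so the mode of dd's stdout is decided by the mount 
point before dd ever runs.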

Presently, dd from both coreutils 6.9-3 and 6.9-4 leaves the mode of stdout 
unchanged.  It only explicitly (re-)sets the mode if you specify if= or of=, 
or if you use iflag= or oflag=.  Since you used none of these, your example 
results in dd doing text-mode output.

On the other hand, popen("gzip|dd of=file", "w") makes dd, not the shell, 
responsible for opening 'file'.  In that case, dd in 6.9-3 uses textmode (due 
to a bug in my code that tried to default to binmode), and in 6.9-4 uses 
binmode (as I had always intended).

> 2) In http://cygwin.com/ml/cygwin/2007-07/msg00610.html Eric wrote:
>     [io]f= unspecified - no change to existing mode of std{in,out}
> My understanding is that the "existing mode" of stdin/stdout will be
> binary (given what the User's Guide says), so it appeared to me that
> dd was actively changing stdout back to text....

No, dd did not actively change stdout to text.  Rather, stdout was already text 
when dd started, and dd did nothing about it because you did not specify oflag=.

As I said before, I am still debating whether, in coreutils 6.9-5 (or 6.10-1, 
if upstream releases soon enough), dd will actively force binary mode on 
stdout even when neither of= nor oflag= is specified.  It is doable (and is a 
one-line patch), but I have not convinced myself that the change is worth it 
yet.  On the other hand, since you seem to be so confused about the current dd 
behavior, that is an argument in favor of making the change.

> 3) In http://cygwin.com/ml/cygwin/2007-07/msg00610.html Eric wrote:
>     ....  Look for coreutils 6.9-4 coming soon to a mirror near you,
>     with dd once again defaulting to full binary operation.
> which I took to be further confirmation of dd using binary mode unless
> otherwise specified.  I understand now that you meant this comment to
> apply only to files that dd opens, but I took this as a more blanket
> statement and further confirmation that dd does whatever in needs to
> do to implement the User's Guide spec.

OK, so I probably could have been more careful in the wording of my release 

> So I'm trying to understand the state of things.  AFAICT, the spec in
> the User's Guide must not be being honored by popen().  Is that the
> case?  Otherwise, why would dd's stdout in popen("gzip|dd>file", "w")
> suffer text-mode modification?

See above.

>  And why could gzip be patched to fix
> things?

Any process can call setmode() to explicitly change the text/binary mode of an 
already open fd.  Most upstream packages don't do this, so the cygwin 
maintainers have to add it in as a cygwin-specific patch.  Basically, that's 
what cgf did in between gzip 1.3.12-1 and 1.3.12-2.  And that's what I do in 
dd, when oflag= but not of= is specified.
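
A minimal sketch of what such a patch boils down to, assuming setmode() is 
declared in <io.h> and O_BINARY in <fcntl.h> on Cygwin. The wrapper name 
force_binary is hypothetical, and the non-Cygwin branch exists only so the 
sketch compiles on systems that have no text/binary distinction.

```c
#include <fcntl.h>
#if defined(__CYGWIN__)
#include <io.h>   /* assumed home of setmode() on Cygwin */
#endif

/*
 * Put an already-open descriptor into binary mode.
 * Returns 0 on success and -1 on failure.  On platforms without a
 * text/binary distinction this is a no-op that returns 0.
 */
int
force_binary(int fd)
{
#if defined(__CYGWIN__)
    return setmode(fd, O_BINARY) == -1 ? -1 : 0;
#else
    (void)fd;   /* nothing to do */
    return 0;
#endif
}
```

A program that wants binary output regardless of how its parent opened stdout 
would call force_binary(1) early, before writing anything.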

>  And why can Eric "be open" to the idea of changing dd's
> behavior in this case?  

Because I maintain the cygwin port of coreutils, because I read this list, and 
because I use cygwin on a daily basis.  In other words, given enough user 
feedback, or a personal usage scenario that I can't solve in any other way, I 
am very prone to patching coreutils to do the right thing, even if it means 
diverging even further from the official upstream package and putting more 
maintenance burden on myself.

> I don't know much about Cygwin internals.  Is there a bug in popen()?

No.  (There used to be, prior to Aug 2006, but I fixed that in newlib).

> Or, is it the case that each executable is responsible for ensuring
> that it honors the shell redirecting specifications in the User's
> Guide?

In the case of fds inherited across exec (such as stdin, stdout, and stderr), 
each process defaults to whatever the parent process gave it, unless it takes 
pains to do things differently (such as calling setmode() explicitly).  If I 
understand it correctly, linking in binmode.o affects all fds that the process 
later opens, but does not automatically re-orient inherited fds.

>  Or is my reading of the User's Guide incorrect?

Entirely plausible, but more likely it is because that section of the User's 
Guide is outdated and needs some TLC to bring it up-to-date.

> Thanks again for your responsiveness.
> -Hugh

Eric Blake
