Why text=binary mounts

Gary R. Van Sickle tiberius@braemarinc.com
Thu Jan 8 17:20:00 GMT 1998


This whole UNIX/DOS/text/binary situation drives me nuts.  Why can't this 
problem be solved once and for all by everybody for all time?  We are 
talking about one '\r', for crissake.  What's wrong with this solution:

1. If your program is opening a file that you want to get lines of text
from (e.g. a compiler opening a source file), give fopen a "t".
2. If your program is opening a file that you want the 'binary image of'
(e.g. TAR opening its input files), give fopen a "b".
3. Any crusty old program that doesn't conform to 1 & 2 gets fixed,
replaced, or canned.
4. fread, fgets, fgetc, etc. get written so that when used on a "t" mode
file, they strip out '\r's before a '\n' and any ctrl-z at the end of the
file.
5. fopen is written so that you *must* give it a "b" or "t" or it abort()s.
This weeds out the crusty old programs mentioned in 3.  (I know it isn't
ANSI.  What have they done for us lately? :) )
6. cat to the screen or a printer is binary.  Someone writes a filter to
convert from text to a format which will look right on the screen or
printer and you have to 'cat stdout << filter << textfile.txt'.  (I'm
obviously not up on my UNIX so please forgive me if this is laughably
wrong.)
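Point 5 could be approximated without touching the C library by wrapping fopen(). This is only an illustrative sketch: strict_fopen() and mode_is_explicit() are hypothetical names, and because ISO C defines "b" but not "t", the wrapper records which one the caller asked for and strips the "t" before calling the real fopen().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 if the mode names exactly one of 'b' or 't'. */
static int mode_is_explicit(const char *mode)
{
    int has_b = strchr(mode, 'b') != NULL;
    int has_t = strchr(mode, 't') != NULL;
    return has_b != has_t;
}

/* Hypothetical wrapper for point 5: abort() unless the caller said "b"
   or "t".  *is_text (if non-NULL) receives 1 for "t", 0 for "b". */
FILE *strict_fopen(const char *path, const char *mode, int *is_text)
{
    char real_mode[8];
    size_t n = 0;
    const char *p;

    assert(path != NULL && mode != NULL);
    if (!mode_is_explicit(mode) || strlen(mode) >= sizeof real_mode) {
        fprintf(stderr, "strict_fopen: mode \"%s\" needs exactly one of 'b' or 't'\n", mode);
        abort();
    }
    if (is_text != NULL)
        *is_text = strchr(mode, 't') != NULL;

    /* ISO C defines "b" but not "t", so "t" is dropped before calling the
       real fopen().  (Microsoft's C library does accept "t"; there the
       mode could be passed through unchanged.) */
    for (p = mode; *p != '\0'; p++)
        if (*p != 't')
            real_mode[n++] = *p;
    real_mode[n] = '\0';
    return fopen(path, real_mode);
}
```

Any call that never states which mode it wants dies at run time, which is the weeding-out described in 5.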

With this solution you have two equally valid text file formats, one with
\n indicating end-of-line, and one with \r\n indicating EOL and ctrl-z
possibly indicating EOF.  To a program reading lines of text, they both look
the same.  Programs that don't read lines of text don't care what the file
looks like, and they get the whole 'binary image'.  No 'mount mode' is
needed.
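The translation in point 4, and the claim that both formats look the same to a line-reading program, can be sketched as a normalizing pass over a buffer. text_normalize() is a hypothetical name; under the proposal, the C library would do this inside fread(), fgets() and friends on "t" streams rather than as a separate call.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z '\x1a'   /* DOS end-of-file marker */

/* Sketch of point 4 on an in-memory buffer: drop every '\r' that
   immediately precedes '\n', then drop trailing Ctrl-Z bytes.
   The buffer is modified in place; the new length is returned. */
size_t text_normalize(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* skip the CR of a CR LF pair */
        buf[out++] = buf[in];
    }
    while (out > 0 && buf[out - 1] == CTRL_Z)
        out--;                      /* ignore DOS EOF marker(s) at the end */
    return out;
}
```

A lone '\r' that is not followed by '\n' is kept, matching the wording of point 4.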

Let me address one sure-to-come-up complaint right now: the notion that it 
would be too much work to 'fix' all the existing code.  How much time and 
effort is wasted on 'working around' the current situation?  Certainly more 
time than it would take to search-and-replace "w" with "wt", etc.

Gary R. Van Sickle (tiberius@braemarinc.com)
Electrical Design Engineer
Braemar Inc.
11481 Rupp Dr.
Burnsville, MN 55337
(612) 890-5135 Ext. 144
Fax: (612) 882-6550


-----Original Message-----
From:	marcus@bighorn.dr.lucent.com [SMTP:marcus@bighorn.dr.lucent.com]
Sent:	Thursday, January 08, 1998 10:29 AM
To:	gnu-win32@cygnus.com
Subject:	Re: Why text=binary mounts

Jeff Fried writes:
> Porting code from Unix to the PC should NOT require the same line
> termination mode since most Unix code which reads text uses fread/getc
> which automatically handle the end-of-line.  And from the replies of most
> people I would argue that most of us would prefer to work in the native
> mode of the operating system in which we are running rather than having to
> constantly convert files between the two models simply because we use tools
> from both operating systems under NT/95.  For examples of this
> compatibility look at many of the GNU tools which handle text, the file
> handling will work under both operating systems without any change because
> they use text mode I/O which is platform independent once all files have
> been converted to the form of the native OS.

This is true as long as you are considering text files only.  The problem
comes in when you also want to deal with binary files.  On Unix systems,
of course, there is no difference in operations on either, so most Unix
programs open all files using the same open() or fopen() calls.  On systems
that differentiate between these files, it is important to add O_BINARY or
O_TEXT to the second argument of open(), and "b" for binary files to the
second argument of fopen().  This tells the underlying routines whether to
apply any translation to the file.  If nothing is specified, the OS must
choose whether or not to make translations, and that is where the text=/!=
binary mounting comes in, as this specifies the default mode.
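The usual way to add O_BINARY without breaking the Unix build is to define the flag as 0 where the headers lack it. A minimal sketch, assuming open() comes from <fcntl.h> as on POSIX systems and Cygwin; open_binary_read() and fopen_binary_read() are hypothetical helper names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/* Unix makes no text/binary distinction, so its <fcntl.h> usually has
   no O_BINARY or O_TEXT.  Defining them as 0 makes them harmless
   no-ops there, so the same open() call compiles everywhere. */
#ifndef O_BINARY
#define O_BINARY 0
#endif
#ifndef O_TEXT
#define O_TEXT 0
#endif

/* Open an existing file for reading as raw bytes; -1 on failure. */
int open_binary_read(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_BINARY);
}

/* "b" is part of ISO C; POSIX says it has no effect. */
FILE *fopen_binary_read(const char *path)
{
    assert(path != NULL);
    return fopen(path, "rb");
}
```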

Now, there are some difficulties in this implementation.  First, since there
is no "t" that can be passed to fopen(), it is impossible to tell if a call
to fopen() wants a text mode open, or the default (blame POSIX/ANSI for that,
I guess).  If we knew that all programs had consciously made a choice about
this, there would be no need for a default, so we could assume that any
fopen() without a "b" wants a text mode open and mount things as
text!=binary.  However, if there exist Unix programs that call fopen()
without the "b" for binary files (since it isn't needed on Unix and was added
to the standard much later than the program may have been written), then
these programs won't run correctly without some additional porting effort.
The same goes for programs that call open() without the O_BINARY bit set in
the second argument when opening binary files.
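For a program that copies binary files, that porting effort can be as small as adding "b" to its fopen() modes. A minimal sketch, with copy_file_binary() as a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Copy src to dst byte for byte.  The "b" in both modes is the whole
   porting fix: in text mode on a text!=binary system, the C library may
   translate "\r\n" and, depending on the library, stop at a Ctrl-Z.
   On Unix the "b" changes nothing.  Returns 0 on success, -1 on error. */
int copy_file_binary(const char *src, const char *dst)
{
    FILE *in, *out;
    char buf[4096];
    size_t n;
    int rc = 0;

    assert(src != NULL && dst != NULL);
    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(dst, "wb")) == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    if (fclose(out) != 0)
        rc = -1;
    fclose(in);
    return rc;
}
```

The check that matters is a file containing \r\n pairs, a Ctrl-Z and a NUL byte surviving the round trip unchanged.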

To compound this, there are times when it is extremely difficult, if not
impossible, to tell if a file should be opened as text or binary.  For
instance, should TAR open the files that it is writing to an archive as
binary or text files?  How can it determine which to use?
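One tempting answer to the TAR question is to guess from the file's contents. The sketch below (looks_like_text() is a hypothetical name, and "no NUL bytes means text" is just one common heuristic) shows why a guess is not an answer: UTF-16 text contains NUL bytes and is reported as binary, while a binary file with no NULs in the sampled bytes is reported as text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: call a sample "text" if it has no NUL byte.
   Returns 1 for text, 0 for binary.  It is only a guess and can be
   wrong in both directions. */
int looks_like_text(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] == 0)
            return 0;
    return 1;
}
```

Opening every file with "rb" at least archives the bytes exactly, but then a text file archived on NT/95 carries its \r\n pairs to Unix, which is the dilemma described above.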

So, to avoid these issues, many people on this list try to avoid using
anything from the Microsoft world (except for NT/95 itself) and use only
cygwin32 programs with text=binary, so that any file is just like any other
file, just like on Unix systems.  As a result, their text files are only
marginally exchangeable with other NT/95 users (or other NT/95
applications).  So, it seems to me that this gives a slow, incomplete, and
buggy (well, it is a Beta release!) emulation of Unix with no advantages
over Linux except that their boss has declared that they must run NT (in
true pointy-haired boss fashion).

Sure, it's fun to play with cygwin32, but to me it doesn't seem reasonable to
try to develop it as a Linux replacement.  I think that if it is to be truly
useful, cygwin32 must encourage interoperating with the native world that it
exists in.  Part of that is running well in a text!=binary mounted world.
Sure, that means that porting programs to Cygwin32 requires instilling an
awareness of binary vs. text files, and that does mean more work to port the
programs, but it also produces more useful programs.

This discussion keeps coming up, which I believe supports my feeling that it
is a major issue with cygwin32.  I know that in the previous iteration I
ended up just agreeing to disagree, and I said that I wouldn't say any more
about it, but I just wanted to give some support to this side in this
iteration and that'll be it (this time around, at least).

marcus hall


  Unfortunately, there is no "t" that
can be supplied to fopen() to fully disambiguate the three cases that may
occur, so we have the following situation:
-
For help on using this list (especially unsubscribing), send a message to
"gnu-win32-request@cygnus.com" with one line of text: "help".
