Why text=binary mounts

Gary R. Van Sickle tiberius@braemarinc.com
Fri Jan 9 18:22:00 GMT 1998


>> I'm not sure how to do it though. One could just change the text mode.
>> That would be o.k. for me but I'm not sure everybody would be happy with
>> that. Another thought would be to invent another mode like "extended
>> text mode" e.g. with an fopen() specifier "T", an open() flag O_ETEXT
>> and an iostream mode ios::etext that could implement this. That way one
>> could port tools to this mode by simply adding the flags just like you
>> port binary tools now by adding O_BINARY, "b" or ios::bin.
>>
>> Does anybody else have an opinion on that problem?

The real problem here is that files as they exist on disk don't have 
'modes', they have formats.  Adding 'modes' to a system that really doesn't 
work already will only make the situation worse.

What I think is really needed is a Text Access Library (TAL) that sits *on 
top* of a *binary* stdio file and reads and writes lines from UNIX, DOS, 
Mac, maybe HTML, etc., etc., text files.  Instead of fopen(???, "rt"), 
you'd use the library and then *not care* what the text file format is, 
only that it contains lines of text.  This TAL would become part of the 
standard C library (or the GNU library at least, which would make it a 
de-facto standard), all the tools that were dealing with text would use it, 
and eventually the "t" functionality of stdio would be deprecated and the 
problem would be solved.

I volunteer to write this library if someone else volunteers to get the GNU 
tools to use it.  I propose the following features:

1. Written in portable ANSI C (no K&R compilers need apply)
2. Provides all the fscanf, fprintf, etc. (i.e. line reading and writing) 
functionality for text-containing files only
3. Provides some extended, cool features TBD
4. Reads and writes at least UNIX, DOS, and Mac, with maybe HTML, etc. 
formats coming later
5. Operates roughly as follows:  Opens for reading any supported format 
and they all behave the same (i.e. 'read line', 'read next char', all retrieve 
the same text regardless of format), and writes in the format selected by the 
programmer (i.e. the fopen equivalent would require a format specifier if a 
file is opened for write)
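
As a rough illustration of feature 5, here is a minimal sketch of what the read and write side of such a library might look like. Every name in it (tal_format, tal_getc, tal_read_line, tal_write_line) is hypothetical and invented for this example; no such library exists. It assumes the FILE was opened in binary mode and covers only the UNIX, DOS, and Mac line endings.

```c
#include <stdio.h>

/* Hypothetical names throughout: this is a sketch, not an existing API. */
typedef enum { TAL_UNIX, TAL_DOS, TAL_MAC } tal_format;

/* Read one character from a stream opened in binary mode.
   LF, CR LF, and a lone CR are all returned as a single '\n'. */
static int tal_getc(FILE *fp)
{
    int c = getc(fp);
    if (c == '\r') {
        int next = getc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp);   /* lone CR: keep the byte that follows it */
        return '\n';
    }
    return c;
}

/* Read one line, without its terminator, into buf (size must be >= 1).
   Over-long lines are silently truncated.
   Returns 1 if a line was read, 0 at end of file. */
static int tal_read_line(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c = tal_getc(fp);
    if (c == EOF)
        return 0;
    while (c != EOF && c != '\n') {
        if (n + 1 < size)
            buf[n++] = (char)c;
        c = tal_getc(fp);
    }
    buf[n] = '\0';
    return 1;
}

/* Write one line followed by the terminator of the format the caller chose.
   Returns 0 on success, -1 on a write error. */
static int tal_write_line(FILE *fp, const char *line, tal_format fmt)
{
    static const char *const eol[] = { "\n", "\r\n", "\r" };
    if (fputs(line, fp) == EOF || fputs(eol[fmt], fp) == EOF)
        return -1;
    return 0;
}
```

A caller would open the file with fopen(name, "rb") or fopen(name, "wb") and then use only these functions on it, so the reading code never needs to know which format the file was written in.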

Gary R. Van Sickle (tiberius@braemarinc.com)
Electrical Design Engineer
Braemar Inc.
11481 Rupp Dr.
Burnsville, MN 55337
(612) 890-5135 Ext. 144
Fax: (612) 882-6550


-----Original Message-----
From:	Benjamin Riefenstahl [SMTP:benny@crocodial.de]
Sent:	Friday, January 09, 1998 6:51 AM
To:	gnu-win32@cygnus.com
Subject:	Re: Why text=binary mounts

Hi All,


I'm new here so please forgive me if I'm missing something. I also do not
yet have a lot of experience with gnu-win32. I do have some experience with
porting C and C++ and with the rules of these languages and how they
affect porting. So this post that I'm replying to got my attention.


marcus@bighorn.dr.lucent.com wrote:
> This is true as long as you are considering text files only.  The problem
> comes in when you also want to deal with binary files.  On Unix systems,
> of course, there is no difference in operations on either, so most Unix
> programs open all files using the same open() or fopen() calls.  On systems
> that differentiate between these files, it is important to add O_BINARY or
> O_TEXT to the second argument of open(), and "b" for binary files to the
> second argument of fopen().  This tells the underlying routines whether to
> apply any translation to the file.

So far I agree.

> If nothing is specified, the OS must
> choose whether or not to make translations, and that is where the text=/!=
> binary mounting comes in, as this specifies the default mode.

No. At least for fopen() there is no choice. If you don't specify "b"
you get text mode and that's that. An application that opens a binary
file without the "b" has a bug. I don't think that fiddling with this
(like "binary" mounts) actually helps. Fix the buggy source code
instead; that seems to me bound to be *much* more efficient in terms
of developer and user time spent on the problem. BTW, on DOS-like systems
(DOS, Windows, OS/2) the RTL does the translation, not the OS. The OS
just sets the guidelines how text should be represented and of course
the OS tools enforce these guidelines.

> Now, there are some difficulties in this implementation.  First, since there
> is no "t" that can be passed to fopen(), it is impossible to tell if a call
> to fopen() wants a text mode open, or the default (blame POSIX/ANSI for that,
> I guess).

See above. The default is unambiguously specified as text mode by the ISO
C language standard.

> ... However, if there exist Unix programs that call fopen() without
> the "b" for binary files (since it isn't needed on Unix and was added to the
> standard much later than the program may have been written), then these
> programs won't run correctly without some additional porting effort.

I'd prefer to invest a little time in porting the code instead of
investing a lot of time in users tweaking their system.

> The
> same goes for programs that call open() without the O_BINARY bit set in the
> second argument when opening binary files.

Being that open() is a Unix call and Unix doesn't have the distinction
between text and binary, it can be argued that the rules for Unix
compatibility libraries can be made whatever one wants. It has been
common practice though - and with good reason - to go by the same rules
as C and C++ go with fopen() and iostreams: The default is text mode and
you need the extra O_BINARY flag to get binary mode. This is done this
way in all compilers that I know.
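
To make the porting step concrete, here is a small sketch of opening a binary file explicitly through both interfaces. The helper names count_bytes_stdio and count_bytes_fd are made up for this example. Defining O_BINARY as 0 where the headers lack it is a common porting idiom, not something any standard requires, and the sketch assumes a POSIX-style <unistd.h> is available.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Pure Unix headers do not define O_BINARY; defining it as 0 there
   makes the flag a harmless no-op. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Count the bytes of a file through stdio. The "b" in the mode string
   asks for binary mode, so no line-end translation is applied.
   Returns -1 if the file cannot be opened. */
static long count_bytes_stdio(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* The same count through open()/read(), with O_BINARY in the flags.
   Returns -1 on an open or read error. */
static long count_bytes_fd(const char *path)
{
    char buf[512];
    long n = 0;
    ssize_t got;
    int fd = open(path, O_RDONLY | O_BINARY);
    if (fd < 0)
        return -1;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        n += (long)got;
    close(fd);
    return got < 0 ? -1 : n;
}
```

With the flags spelled out like this, both functions should report the true on-disk size of a file whatever the default mode is.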

> To compound this, there are times when it is extremely difficult to impossible
> to tell if a file should be opened as text or binary.  For instance, should
> TAR open the files that it is writing to an archive as binary or text files?
> How can it determine which to use?

Some applications have a design problem here. AFAIK most ports that are
designed for this allow the user to specify that all operations are to
be done in binary, which is what I prefer always. I can always convert
DOS text files to Unix text and back again. I cannot convert a garbled
binary file back to its original form.

> Sure, it's fun to play with cygwin32, but to me it doesn't seem reasonable to
> try to develop it as a Linux replacement.  I think that if it is to be truly
> useful, cygwin32 must encourage interoperating with the native world that it
> exists in.  Part of that is running well in a text!=binary mounted world.
> Sure, that means that porting programs to Cygwin32 means that you have to
> instill an awareness of binary vs. text files, and that does mean more work
> to port the programs, but it also produces more useful programs as well.

Here we agree again ;-)


Let me add another nit to the problem. I am actually using not only Unix
and DOS but also Mac files. This means another variation in line ends:
Unix uses <LF>, DOS uses <CR><LF> and Macs use <CR>. In my world these
are the prominent formats and most of my tools (editors, compilers and
other commercial tools) agree with that.

In DOS the translation for text mode works rather simply: On input all
<CR><LF> combinations are replaced by <LF> and on output all <LF> are
replaced with <CR><LF>. This means not only that DOS files are read
correctly but also that Unix files are automatically read correctly. The
coincidence is rather useful, because in most simple tools one rarely
ever needs to translate explicitly from Unix to DOS; most DOS tools get
along with Unix files just fine.
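
The translation just described can be sketched on in-memory buffers as follows. This is a simplified illustration with made-up function names (dos_text_in, dos_text_out), not the code of any actual runtime library, which may do more than this.

```c
#include <stddef.h>

/* Input direction: replace every CR LF pair with LF, in place.
   A lone LF (Unix file) passes through unchanged, and so does a
   lone CR (Mac file). Returns the new length. */
static size_t dos_text_in(char *buf, size_t len)
{
    size_t r, w = 0;
    for (r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;           /* drop the CR of a CR LF pair */
        buf[w++] = buf[r];
    }
    return w;
}

/* Output direction: replace every LF with CR LF.
   dst must have room for 2 * len bytes. Returns the bytes written. */
static size_t dos_text_out(const char *src, size_t len, char *dst)
{
    size_t r, w = 0;
    for (r = 0; r < len; r++) {
        if (src[r] == '\n')
            dst[w++] = '\r';
        dst[w++] = src[r];
    }
    return w;
}
```

Because the input direction only removes a <CR> that is directly followed by <LF>, a Unix file comes through untouched, which is the coincidence noted above. A lone <CR> from a Mac file is left in the data and is not treated as a line end.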

For my own programs I often implement an extension to this behaviour.
Instead of treating only <LF> and <CR><LF> as line ends, I also
treat a single <CR> the same. This means I lose the ability to use single
<CR>s for formatting, but then the only files thus formatted that I have
are those intended directly for a line printer. OTOH, as I said, I often
have Mac files and with this arrangement these are read correctly.

For my own programs this is done easily enough but when porting tools from
Unix it's a lot more difficult. Porting Unix tools to this mode would be
a lot easier if this behaviour could be somehow included in the RTL
itself (like ordinary text mode is now).

I'm not sure how to do it though. One could just change the text mode.
That would be o.k. for me but I'm not sure everybody would be happy with
that. Another thought would be to invent another mode like "extended
text mode" e.g. with an fopen() specifier "T", an open() flag O_ETEXT
and an iostream mode ios::etext that could implement this. That way one
could port tools to this mode by simply adding the flags just like you
port binary tools now by adding O_BINARY, "b" or ios::bin.

Does anybody else have an opinion on that problem?


so long, benny
======================================
Benjamin Riefenstahl (benny@crocodial.de)
Crocodial Communications EntwicklungsGmbH
Ophagen 16a, D-20257 Hamburg, Germany
-
For help on using this list (especially unsubscribing), send a message to
"gnu-win32-request@cygnus.com" with one line of text: "help".
