This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



Re: ASCII and BINARY files. Why?


Jim Balter writes:
> Fran Litterio wrote:
> > 
> > Jim Balter wrote:
> > 
> > >unix deals with byte streams, and there are many tools for
> > >manipulating them, rather than having systems that think
> > >they know what they are doing deleting every byte after a ^Z
> > >and destroying valuable work.
> > 
> > Yes.  I am now completely convinced that gnu-win32 should switch to an
> > all-binary-all-the-time scheme.  read() should not convert CRNL to NL
> > (nor write() do the reverse).  cat should not have implicit knowledge of
> > what a ^Z means (i.e., nothing under UNIX).  The gnu-win32 DLL should
> > probably even be made to recognize a ^D typed on the keyboard (not
> > coming down a pipe) to mean end-of-file.
> 
> It would be nice if it can be done, but since this is only a matter of
> what humans type, it does not break anything (other than possibly some
> existing documentation) to require people to type ^Z instead of ^D.  Of
> course, if it is done, it must be done right; ^D's should *only* be
> looked at when coming from a keyboard, nowhere else, and they cause a
> read() to return exactly as the Enter key does but without returning the
> ^D, so that it is possible to enter terminator-less lines from the
> keyboard (e.g., abc^D reads as abc with no newline at the end).

The unix world seems to do it like this:
.  Input from a non-tty should never treat _any_ character as EOF; only
   the physical end-of-file is noticed in this way.
.  Any character seems to be assignable to EOF using the stty command.  ^D
   is the usual setting, but on Linux even 'q' can be the eof character.

   (Bash on Linux won't pay any attention to 'q' as the eof character on
   the command line, but cat honored the setting.  Setting eof to ^]
   had an effect in both bash and cat.)

Once stty is more capable on gnu-win32, we could allow users to choose
between ^D, ^Z or anything else they wanted.  On input from a non-tty,
there is no EOF character.
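
To make that rule concrete, here is a small Python simulation.  It is
only a sketch, not gnu-win32 code: the function name `reads` and its
parameters are made up for illustration, and it assumes ordinary
line-buffered tty input where Enter ends a read and is delivered, while
the eof character ends a read and is thrown away.

```python
def reads(data, is_tty, eof="\x04"):
    """Return the chunks that successive read() calls would hand back.

    Hypothetical helper for illustration only.  An empty chunk ("")
    stands for the zero-length read that a program sees as end-of-file.
    """
    if not is_tty:
        # Pipe or file: no character is special, only the physical end.
        return [data] if data else []

    chunks, line = [], ""
    for ch in data:
        if ch == eof:
            # The eof character ends the read but is not returned itself.
            chunks.append(line)
            line = ""
        elif ch == "\n":
            # Enter ends the read and the newline is returned.
            chunks.append(line + "\n")
            line = ""
        else:
            line += ch
    # Anything still in `line` has been typed but not yet delivered.
    return chunks
```

With this, reads("abc\x04", is_tty=True) gives ["abc"] (a line with no
newline), reads("abc\n\x04", is_tty=True) gives ["abc\n", ""] (the empty
read is the EOF), and reads("a\x1ab", is_tty=False, eof="\x1a") gives the
data back untouched.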

I'm not ready to tackle the other, larger issues of binary vs. text files,
but I think that ^Z == EOF isn't a biggie, unless someone replies with a
good reason we want to treat ^Z == EOF when taking input from anywhere but
a tty.

Uh-oh, I'm going to state my position on binary vs. text (BvsT) anyway.

Here's what we should do:  Have a list of questions the library asks itself
to determine if the file should have text translation performed on it.
It tries the steps in order until it comes to a conclusion about the
particular file:

1. If the filename begins bin: or ends :bin, no (the : and bin are
   stripped before the file is opened)
2. If the filename begins txt: or ends :txt, yes (the : and txt are
   stripped before the file is opened)
3. If 'b' is specified in the 'fopen' second parameter, or O_BINARY is
   specified to open, no
4. If 't' is specified in the 'fopen' second parameter, or O_TRANS (O_TXT?)
   is specified to open, yes (isn't this currently undefined by relevant
   standards, so we could define it in any way we wished?)
5. If the filesystem is mounted -b, no
6. If the filesystem is mounted -t, yes
7. If the filename is matched by a glob in $BIN, no
	(default = *.com;*.exe;*.sys;*.dll;*.o;*.a;*.bin;*.tar;*.gz;*.tgz)
8. If the filename is matched by a glob in $TXT, yes
	(default = *.txt;*.doc;README;read.me;readme.1st)
9. Otherwise, no
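
The nine steps above can be sketched in a few lines of Python.  This is
an illustration of the proposal, not existing library code: the names
`wants_text`, `strip_override` and `matches`, the `mount` argument
("-b", "-t" or None), case-sensitive glob matching, and matching against
the last path component only are all my assumptions.

```python
import fnmatch
import os

DEFAULT_BIN = "*.com;*.exe;*.sys;*.dll;*.o;*.a;*.bin;*.tar;*.gz;*.tgz"
DEFAULT_TXT = "*.txt;*.doc;README;read.me;readme.1st"


def strip_override(name):
    """Steps 1-2: return (real_name, decision); decision is None if absent."""
    for tag, is_text in (("bin", False), ("txt", True)):
        if name.startswith(tag + ":"):
            return name[len(tag) + 1:], is_text
        if name.endswith(":" + tag):
            return name[:-(len(tag) + 1)], is_text
    return name, None


def matches(name, globs):
    """True if the last path component matches any ';'-separated glob."""
    base = os.path.basename(name)
    return any(fnmatch.fnmatchcase(base, g) for g in globs.split(";") if g)


def wants_text(name, mode="", mount=None, env=None):
    """Return (real_name, translate) by trying the steps in order."""
    env = os.environ if env is None else env
    name, override = strip_override(name)
    if override is not None:              # steps 1 and 2
        return name, override
    if "b" in mode:                       # step 3
        return name, False
    if "t" in mode:                       # step 4
        return name, True
    if mount == "-b":                     # step 5
        return name, False
    if mount == "-t":                     # step 6
        return name, True
    if matches(name, env.get("BIN", DEFAULT_BIN)):   # step 7
        return name, False
    if matches(name, env.get("TXT", DEFAULT_TXT)):   # step 8
        return name, True
    return name, False                    # step 9
```

For example, wants_text("bin:notes.txt") strips the prefix and says no
translation even though the name ends in .txt, while
wants_text("unknown.xyz", env={"TXT": "*"}) says yes because the
catch-all glob turns step 8 into the default.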

One can quibble about the order, and whether #4 should be there at all.
Environment variable names can also be debated.  However, it has the
following features:
	. If the user knows better than the application, the user can
	always have his way, using bin: or txt:
	
	. The user can get an ultimate default of text mode by including *
	as one of the globs in $TXT, so don't bug me that my decision for
	#9 is wrong

	. Extension-based CRLF translation is already available as an
	option in the Linux FAT filesystem; I find it pleasant to have
	most of the time.

Perhaps there should be a heuristic somewhere in here that says 'if
<CR><LF> in first block of file, open in text mode' (step 8.5?) but I don't
want to get bitten if I'm unlucky enough to get output like that from, say,
a nifty new compressor whose extension I didn't add to $BIN yet.
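
Here is a rough Python sketch of that heuristic and of how it can
misfire.  The function name and the 512-byte block size are my
assumptions; the binary sample is the standard 8-byte PNG file
signature, which happens to contain both a CR LF pair and a ^Z.

```python
BLOCK_SIZE = 512  # assumed size of the "first block"


def looks_like_text(first_bytes):
    """Hypothetical step 8.5: guess text mode if CR LF is in the first block."""
    return b"\r\n" in first_bytes[:BLOCK_SIZE]


# A DOS-style text file is detected, a unix-style one is not...
dos_text = b"line one\r\nline two\r\n"
unix_text = b"line one\nline two\n"

# ...but so is a binary file that merely contains the bytes 0D 0A.
png_signature = b"\x89PNG\r\n\x1a\n"
```

looks_like_text(dos_text) and looks_like_text(png_signature) are both
true, so a binary file whose name is not covered by $BIN would be opened
with translation turned on, which is exactly the failure I'm worried
about.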

What currently happens when we fseek/lseek on a file not opened in
binary mode?

Jeff
-- 
\/ jepler@inetnebr.com http://incolor.inetnebr.com/jepler/ (0|1(01*0)*1)+

