Rationale for line-ending recommendation?

Gary R. Van Sickle g.r.vansickle@att.net
Sun Dec 21 01:26:00 GMT 2008

> From: Spiro Trikaliotis
> Hello Gary,

Hey Spiro,
> * On Fri, Dec 19, 2008 at 03:42:34AM -0600 Gary R. Van Sickle wrote:
> > > From: Spiro Trikaliotis
> [...]
> > > Oh, and Subversion is problematic, too. Because the SVN developers
> > > decided to handle line endings on their own using libapr, opening
> > > files in binary mode and reading and writing CR, LF or CR/LF on
> > > their own,
> > 
> > This is the right way to do things...
> I hope you are ironic here, right?

I wish I were.  At the C runtime level, you have two options when writing
code which deals with files, text and/or otherwise:

1.  Ignore the problem and hope that all C runtimes with which your app may
be linked are able to correctly guess the intent of your fopen() and
correctly adjust the behavior of all the other f*() functions accordingly.
2.  Open all files as "binary" at the C runtime level.  If you're reading
"plain text" files, treat '\n', '\r\n', and '\r' explicitly and equally.  If
you're writing "plain text" files, use the line ending convention of the
platform you're running on.  If you're not dealing with "plain text" files,
limit your interactions with the opened FILE to fread(), fwrite(), and the
seeks (fseek()/fgetpos()/fsetpos()/rewind()).
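A minimal sketch of the reading half of option 2, assuming the stream was opened with fopen(path, "rb"). The helper name read_line_any_eol() is hypothetical and not taken from any library; it only illustrates treating '\n', '\r\n', and '\r' as equivalent terminators.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from a stream opened in "rb" mode.
 * Accepts "\n", "\r\n", or a lone "\r" as the terminator and does not
 * store it. Returns the number of bytes stored, or -1 if the stream was
 * already at end of file. Lines longer than cap-1 bytes are truncated. */
long read_line_any_eol(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c = getc(fp);

    if (c == EOF)
        return -1;

    while (c != EOF && c != '\n' && c != '\r') {
        if (n + 1 < cap)
            buf[n++] = (char)c;
        c = getc(fp);
    }

    if (c == '\r') {
        /* Peek one byte: swallow the LF of a CRLF pair, otherwise put
         * the byte back because it starts the next line. */
        int next = getc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp);
    }

    if (cap > 0)
        buf[n] = '\0';
    return (long)n;
}
```

Because the stream is binary, the C runtime does no translation of its own, so the same code should behave identically on a text mount, a binary mount, or a native Windows runtime.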

#1 doesn't work real well, as evidenced by the fact that the calendar says
"21st Century" and we're still expending a considerable amount of effort
debating how many line ending characters can dance on the head of a pin.
That leaves us with #2 as the correct solution.

> > > on Cygwin, SVN is hard-coded to
> > > *nix line endings. This is not nice. Note that this approach will
> > > also fail badly if you mount parts of your system in textmode, and
> > > parts in binmode.
> > >
> > > Because of this, I am using an SVN version which I compiled myself
> > > with a patched libapr.
> > > 
> > 
> > ...I'm not following this.  Are you talking about the SVN repository,
> > or the clients, or...?  What's the issue?
> You are right, I was not specific enough. I never tested an 
> SVN server on Cygwin, thus, I can only talk about the client side.
> If you check out some files which are set to CRLF=native, 
> Cygwin's SVN checks them out with LF line ending (as SVN 
> handles the endings on its own, and libapr tells it that 
> Cygwin == Unixoid == LF).

Hmm, yeah that sounds wrong.  Can you force it to CRLF?

BTW, the CVS issue was way worse, IIRC: you had to keep the repository on a
"binary" mount, or it caught fire.  Yeah, not cool.

> Now, if you edit a file of this using some tool that 1. does 
> not have problems with the LF endings, but 2. generates CR/LF 
> endings on new lines (like, for example, MSVC does, but also 
> some other tools)

Right, many if not most native Windows text editors are going to do this (at
least by default), which is of course the correct thing to do.

> then SVN cannot check in that file. It complains that the 
> file has "mixed line endings". Yes, that's right, but why 
> does SVN care?

I'm... Hmmm, I use svn quite a bit and I've never actually run into that.
That sounds more like a design defect of svn than anything else, because
yeah, it shouldn't care.
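For illustration, here is a rough sketch of how a tool might notice the "mixed line endings" condition described above. This is not Subversion's actual check; the struct and function names are hypothetical, and the buffer is assumed to have been read in binary mode.

```c
#include <stddef.h>

/* Hypothetical tally of the three line-ending styles found in a buffer. */
struct eol_counts {
    size_t lf;    /* lone '\n'  */
    size_t crlf;  /* "\r\n"     */
    size_t cr;    /* lone '\r'  */
};

/* Count each style in data[0..len). A "\r\n" pair counts once, as CRLF. */
struct eol_counts count_eols(const char *data, size_t len)
{
    struct eol_counts k = {0, 0, 0};

    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\r') {
            if (i + 1 < len && data[i + 1] == '\n') {
                k.crlf++;
                i++;            /* skip the LF of the pair */
            } else {
                k.cr++;
            }
        } else if (data[i] == '\n') {
            k.lf++;
        }
    }
    return k;
}

/* Nonzero when more than one style appears in the same buffer. */
int has_mixed_eols(struct eol_counts k)
{
    return (k.lf != 0) + (k.crlf != 0) + (k.cr != 0) > 1;
}
```

An LF file to which an editor appended CRLF lines would come back with both lf and crlf nonzero. A tool that reads all three styles equally could simply normalize at that point instead of refusing the file.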

> Even commands like "svn diff" behave erratically.
> You have to use d2u in order to get this file checked in. 
> This is really annoying.

Welcome to the 21st Century.  If you believe your calendar.  Which of course
you should not. ;-)

> Note: In my opinion, this is clearly an SVN/libapr problem, 
> not a Cygwin problem.

Well, it's a basic field-wide problem that should have been solved long ago.
Cygwin goes to herculean efforts to ameliorate the issue, and is amazingly
successful at it, but there are fundamental limits as to what it can do to
make broken programs not be broken.  In the end, it's a rare problem indeed
which can be fixed by not fixing the problem.

> > I know CVS had some problems with the
> > dreaded \n/\r\n issue back in the day, but I wasn't aware of similar
> > Subversion issues.
> The CVS issue was a mere cosmetical one, while the SVN one is 
> a real show-stopper, IMHO.

No, quite the reverse, see above (though we may be talking about two
different problems resulting from this NP-Hard "line ending" problem).

> > > Other than that, I never had any problems with the CR/LF line endings.
> > 
> > Bash has some known problems in this area, but there's a 
> > Cygwin-specific fix (that unfortunately is off by default) which 
> > hopefully will get accepted upstream sometime in the next century.
> But this fix is in bash, right? So, it does not have a CR/LF 
> issue in Cygwin, as I said. ;)

It does have the problem unless you give it the correct option.  From the
latest Cygwin-specific bash doc:

"1. When using binary mounts, cygwin programs try to emulate Linux.  Bash
on Linux does not understand \r\n line endings, but interprets the \r
literally, which leads to syntax errors or odd variable assignments.
3. Cygwin text mounts automatically work with either line ending style,
because the \r is stripped before bash reads the file.  If you absolutely
must use files with \r\n line endings, consider mounting the directory
where those files live as a text mount.
4. This version of bash has a cygwin-specific shell option, named "igncr"
to force bash to ignore \r, independently of cygwin's mount style.  As of
bash-3.2.3-5, it controls regular scripts, command substitution, and
sourced files.  I hope to convince the upstream bash maintainer to accept
this patch into the future bash 4.0 even on Linux, rather than keeping it
a cygwin-specific patch, but only time will tell.  There are several ways
to activate this option:
4a. For a single affected script, add this line just after the she-bang:
 (set -o igncr) 2>/dev/null && set -o igncr; # comment is needed
4b. For a single script, invoke bash explicitly with the shopt, as in
'bash -o igncr ./myscript' rather than the simpler './myscript'.
4c. To affect all scripts, export the environment variable BASH_ENV,
pointing to a file that sets the shell option as desired.  Bash will
source this file on startup for every script.
4d. Added in the bash-3.2-2 release: export the environment variable
SHELLOPTS with igncr included in it.  It is read-only from within bash,
but you can set it before invoking bash; once in bash, it auto-tracks the
current state of 'set -o igncr' or 'shopt -s igncr'.  If exported, then
all bash child processes inherit the same option settings; with the
exception added in 3.2.9-11 that certain interactive options are not
inherited in non-interactive use."

> > > Thus, many programmers do not care about the "right" mode.
> > 
> > Well, better stated, they assume they can treat all files - text,
> > executables, jpegs, whatever - as if they were Unix-formatted text files.
> > Of course, they're not, hence problems ensue.
> Well, it is hard to test something against problems that are 
> unlikely to occur at all in your environment, right?

No, it isn't.  In fact that's pretty much the definition of testing.  In
"Computer McScience and the Case of The Intractable Line Endings", we're
talking about the tester generating a single '\r\n' text file, which is
trivial, and watching for signs of smoke.
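To show how little work that is, here is a sketch of generating such a fixture. The function name and the "line N" content are made up for illustration; the only real requirement is opening the file with "wb" so the runtime leaves the '\r' bytes alone.

```c
#include <stdio.h>

/* Hypothetical test-fixture generator: writes nlines lines of the form
 * "line N\r\n" to path. Binary mode keeps the C runtime from adding or
 * removing any CR bytes. Returns 0 on success, -1 on failure. */
int write_crlf_fixture(const char *path, int nlines)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    for (int i = 1; i <= nlines; i++) {
        if (fprintf(fp, "line %d\r\n", i) < 0) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Feed the resulting file to the tool under test and compare its behavior against the same content with '\n' endings.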

> Of course, you can generate artificial test cases for this.

Test cases in general are "artificial".

> However, if you think about generating these test cases, 
> chances are high you would not do it wrong in the first place.

Yep.  And yet here we are, in 2008[1], talking about tools which can't
handle plain text files.

> Thus, while I think it is annoying that some tools have CR/LF 
> problems, I can understand WHY this happens.

I can too, and like most such problems, it has frustratingly little to do
with the basic technology under discussion.

> Best regards,
> Spiro.

Gary R. Van Sickle

[1] 2008 if you believe your calendar, and not the lack of a flying car in
your garage nor the lack of programs which can properly handle plain text
files on your computer.
