CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
Brian Inglis
Brian.Inglis@SystematicSw.ab.ca
Wed Jun 14 19:04:00 GMT 2017
On 2017-06-14 10:07, cyg Simple wrote:
> On 6/13/2017 1:34 PM, Brian Inglis wrote:
>> On 2017-06-13 08:11, cyg Simple wrote:
>>> On 6/10/2017 10:30 PM, Eric Blake wrote:
>>>> On 06/10/2017 08:48 AM, cyg Simple wrote:
>>>>> Uhm, 'wt' and 'wb' came from MS itself.
>>>> Not quite. fopen(,"wb") comes from POSIX. "wt" is probably a microsoft
>>>> extension, but it is certainly not in POSIX nor in glibc.
>>> I think it's a C standard so it should be in glibc. It may be mentioned
>>> in the POSIX standard as in support of the C standard.
>>>>> GNU GCC was adapted to allow it
>>>> Huh? It's not whether the compiler allows it, but whether libc allows
>>>> it. ALL libc that are remotely close to POSIX compliant support
>>>> fopen(,"wb"), but only Windows platforms (and NOT glibc) support
>>>> fopen(,"wt").
>>> Looking at http://www.cplusplus.com/reference/cstdio/fopen/ I see:
>>> "If additional characters follow the sequence, the behavior depends on
>>> the library implementation: some implementations may ignore additional
>>> characters so that for example an additional "t" (sometimes used to
>>> explicitly state a text file) is accepted."
>>> There is also a lot of discussion about the topic at:
>>> https://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode
>>> As for glibc, it will just ignore the extra character but it allows the
>>> use of "wt"; it just means nothing to that C runtime library. It does
>>> aid portable code, though.
>>> As for me conflating GCC with a C runtime - please forgive my lapse in
>>> memory.
>>
>> There's no need for open mode "t", as text is the default mode unless
>> "b" is specified, and assuming you use "cooked" line I/O functions like
>> fgets/fputs, not "raw" binary I/O like fread/fwrite; fscanf ignores all
>> line terminators unless you use formats like "%c" which could see them.
>>
>
> That isn't exactly true: according to MSDN[1], "t" also manages the
> CTRL-Z EOF marker. However, I agree that it is worthless. Regardless,
> the C standard states that "t" or any other extra character can be
> added and left to the implementing library to interpret or ignore. If
> the C runtime library neither uses nor ignores it, then it isn't
> complying with the C standard.
The Standard supports only mode strings matching
/^([ra](b|\+|b\+|\+b)?|w(b|\+|b\+|\+b)?x?)$/. Implementations may choose
to ignore some of the trailing characters (presumably "b", "+", or "x";
the footnote is unclear on this), or may use them to select a kind of
file that does not behave as a standard stream. Anything else invokes
undefined behaviour (UB).
"7.21.5.3 The fopen function

Synopsis
1   #include <stdio.h>
    FILE *fopen(const char * restrict filename,
                const char * restrict mode);
Description
...
3 The argument mode points to a string. If the string is one of the
following, the file is open in the indicated mode. Otherwise, the
behavior is undefined.[271]
r               open text file for reading
w               truncate to zero length or create text file for writing
wx              create text file for writing
a               append; open or create text file for writing at
                end-of-file
rb              open binary file for reading
wb              truncate to zero length or create binary file for
                writing
wbx             create binary file for writing
ab              append; open or create binary file for writing at
                end-of-file
r+              open text file for update (reading and writing)
w+              truncate to zero length or create text file for update
w+x             create text file for update
a+              append; open or create text file for update, writing at
                end-of-file
r+b or rb+      open binary file for update (reading and writing)
w+b or wb+      truncate to zero length or create binary file for update
w+bx or wb+x    create binary file for update
a+b or ab+      append; open or create binary file for update, writing
                at end-of-file
...
[271] If the string begins with one of the above sequences, the
implementation might choose to ignore the remaining characters, or it
might use them to select different kinds of a file (some of which might
not conform to the properties in 7.21.2)."
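
As a rough sketch of what "one of the following" means in practice, the
list quoted above can be checked literally. The helper name
is_standard_fopen_mode is hypothetical and not part of any libc; it only
tells you whether a mode string is one the Standard defines, not what a
particular runtime will do with anything else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): returns true only if
   `mode` is exactly one of the strings listed in C11 7.21.5.3 p3. */
bool is_standard_fopen_mode(const char *mode)
{
    static const char *const modes[] = {
        "r",   "w",   "wx",  "a",
        "rb",  "wb",  "wbx", "ab",
        "r+",  "w+",  "w+x", "a+",
        "r+b", "rb+", "w+b", "wb+", "w+bx", "wb+x", "a+b", "ab+"
    };

    assert(mode != NULL);   /* precondition: caller passes a real string */

    for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++) {
        if (strcmp(mode, modes[i]) == 0)
            return true;
    }
    return false;
}
```

With this, "wb" and "wb+x" are accepted, while "wt" and "rx" are
rejected, i.e. they fall into the "otherwise undefined" bucket as far as
the Standard is concerned, whatever a given runtime chooses to do.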
> [1] https://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.140).aspx
>
> "t
> Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
> an EOF character on input. In files that are opened for reading/writing
> by using "a+", fopen checks for a CTRL+Z at the end of the file and
> removes it, if it is possible. This is done because using fseek and
> ftell to move within a file that ends with CTRL+Z may cause fseek to
> behave incorrectly near the end of the file."
I wonder whether "t" is also required for <ctrl-Z> to be recognized as
EOF on console input.
That page also documents a number of other mode characters and encoding
arguments that take that implementation well beyond the Standard.
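
For anyone who wants to see the text/binary difference discussed in this
thread on their own system, here is a minimal sketch. The helper name
and the file name used with it are made up for illustration. It writes a
string using the given mode, then reopens the file with "rb" and counts
the bytes that actually landed on disk.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `text` to `path` using `mode`, then reopen
   the file in binary mode and return the number of bytes on disk,
   or -1 on any I/O error. */
long bytes_written_with_mode(const char *path, const char *mode,
                             const char *text)
{
    assert(path != NULL && mode != NULL && text != NULL);

    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    /* "rb" is in the Standard's list, so no translation happens here. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

Calling it with "wb" and the text "a\nb\n" should give 4 everywhere.
With plain "w", glibc also gives 4, while the Microsoft runtime writes
each '\n' as CR-LF and gives 6. On Cygwin the result may depend on how
the mount is configured.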
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada