This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: problems with gawk 3.1.5-3 hanging -- more info

Corinna Vinschen wrote:

> O_TEXT is correct because gawk is a text tool in the first place and
> should treat input lines identically, regardless of whether they have
> DOS or UNIX line endings.

Hi Corinna, thanks for the prompt reply.

If I understand you correctly, the fix in -3 has to do with converting DOS-style CRLFs to LFs, and that appears to be the issue. The output from rsync (on all platforms: Windows, UNIX, POSIX, whatever) contains bare CR characters (0x0d) that are not followed by an LF. This is what makes rsync's progress output "overwrite" itself when you run it on its own from a bash prompt.
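To make the distinction concrete, here is a rough POSIX sh sketch that approximates text-mode line handling in user space. It assumes, for illustration only, that text mode collapses a CR only when it immediately precedes an LF and passes bare CRs through unchanged; this is not Cygwin's actual implementation, and `text_mode_read` is a hypothetical helper name.

```shell
#!/bin/sh
# text_mode_read (hypothetical helper): a user-space approximation of a
# text-mode read, assuming CRLF -> LF and bare CRs left untouched.
# awk splits records on LF, so a CR is "line-ending" only when it is the
# last character of a record; sub() removes exactly that CR.
text_mode_read() {
  awk '{ sub(/\r$/, ""); print }'
}

# A DOS line, rsync-style progress updates separated by bare CRs, then a
# UNIX line. od -c shows the CRLF collapsed to \n while both bare \r bytes
# survive inside a single record.
printf 'dos\r\nprogress 0%%\rprogress 50%%\rdone\nunix\n' | text_mode_read | od -c
```

Note that in this sketch the progress updates reach awk as one record, and only once the trailing LF arrives.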

Here's a snippet of hexdump output from rsync:

$ rsync -Pv /cygdrive/c/backup2 | xxd
0000000: 6261 636b 7570 320a 2020 2020 2020 2020  backup2.
0000010: 2037 3030 2020 2030 2520 2020 2030 2e30   700   0%    0.0
0000020: 306b 422f 7320 2020 2030 3a30 303a 3030  0kB/s    0:00:00
0000030: 0d20 2020 2020 3133 3736 3137 3620 2020  .     1376176
0000040: 3025 2020 2020 312e 3238 4d42 2f73 2020  0%    1.28MB/s
0000050: 2020 303a 3133 3a33 350d 2020 2020 2032    0:13:35.     2

You can see a bare 0d, not followed by 0a, at offset 0x30 and again at offset 0x59.
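The offsets can be checked mechanically. The POSIX sh sketch below rebuilds the 96 bytes shown in the xxd dump (0x00 to 0x5f) with printf and reports every CR that is not immediately followed by an LF. The helper names `rsync_sample` and `find_lone_crs` are hypothetical, and the sample is transcribed from the dump rather than captured from rsync.

```shell
#!/bin/sh
# rsync_sample (hypothetical): the bytes from the xxd dump above.
# printf turns \n and \r into LF and CR; %% is a literal '%'.
rsync_sample() {
  printf 'backup2\n'
  printf '         700   0%%    0.00kB/s    0:00:00\r'
  printf '     1376176   0%%    1.28MB/s    0:13:35\r'
  printf '     2'
}

# find_lone_crs (hypothetical): read bytes on stdin and print the offset of
# every CR (0d) that is not immediately followed by an LF (0a).
find_lone_crs() {
  od -A n -t x1 -v |
    awk '{
      for (i = 1; i <= NF; i++) {
        # The byte before this one was a CR, and this byte is not an LF.
        if (prev == "0d" && $i != "0a")
          printf "lone CR at offset 0x%02x\n", off - 1
        prev = $i
        off++
      }
    }
    END {
      # A CR as the very last byte has no LF after it either.
      if (prev == "0d")
        printf "lone CR at offset 0x%02x\n", off - 1
    }'
}

rsync_sample | find_lone_crs
```

Run as-is, it prints `lone CR at offset 0x30` and `lone CR at offset 0x59`, matching the dump; a CRLF pair would not be reported.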

It appears to me that, because gawk opens the file as O_TEXT, it hangs waiting for an LF to follow the CR (an LF that never comes). Does this sound likely to you?
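If that hypothesis is right, one possible way to sidestep it (sketched here for illustration only, not proposed in this thread, and untested on Cygwin) is to turn every bare CR into an LF before gawk reads the stream, so each progress update arrives as a complete line. `progress_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# progress_lines (hypothetical helper): map every CR to LF so each rsync
# progress update becomes its own LF-terminated line. A CRLF pair turns
# into two LFs, i.e. an extra empty line.
progress_lines() {
  tr '\r' '\n'
}

# Intended use (untested on Cygwin):
#   rsync -Pv /cygdrive/c/backup2 | progress_lines | gawk '{ print NR ": " $0 }'

# Stand-alone demo with a few fake progress updates; prints four numbered lines.
printf 'backup2\n   0%%\r  50%%\r 100%%\n' | progress_lines | awk '{ print NR ": " $0 }'
```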

> I can't tell why it fails for you, because I can't reproduce this.

I'm working on a short script that reproduces the problem for everyone; I'll post it here when I have it. Or would you rather I send it to you directly?

Also, I looked at the source of some other utilities that work with text input: tail, head, cat, and sed. I don't see any of them opening the input file the way gawk does, and in fact the coreutils ChangeLog suggests that they used setmode at one time and have since removed it (why, I don't know). The ChangeLog contains many comments like this one:

ChangeLog: * src/cat.c (main): Avoid setmode; use POSIX-specified routines instead.

My thinking was, "gawk should probably open files the same way sed does," but perhaps I'm wrong on this point. Your thoughts?

> As for the O_BINARY mode, in theory there's a way to
> accomplish that without rebuilding gawk by setting the BINMODE
> variable:
>
>   gawk -v BINMODE=r [...]

> Unfortunately it turns out that this doesn't work because gawk fails
> to call the setmode function in this case on Cygwin.  I'll upload a
> patched gawk soon.  If you want to apply it by yourself, try this:

This is a suitable workaround for me, but I would humbly submit that gawk shouldn't hang regardless of its input. If the input isn't acceptable, perhaps it should print an error to stderr and exit. Your thoughts?
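Whether BINMODE=r actually takes effect on a given gawk build can be checked by feeding gawk a file with CRLF line endings and printing the length of each record. This sketch assumes text mode strips the CR of each CRLF pair: with the CR removed each record is 2 characters, and with BINMODE=r honored the CR stays in $0 and each record is 3 characters. On POSIX systems without a text/binary distinction, BINMODE has no effect and both commands print 3. `crlf_lengths` is a hypothetical helper name.

```shell
#!/bin/sh
# crlf_lengths (hypothetical helper): write two CRLF-terminated records to a
# temp file and print the length of each record as gawk sees it.
# Any extra arguments (e.g. -v BINMODE=r) are passed through to gawk.
crlf_lengths() {
  tmp=$(mktemp) || return 1
  printf 'ab\r\nab\r\n' > "$tmp"
  gawk "$@" '{ print length($0) }' "$tmp"
  rm -f "$tmp"
}

echo "default mode:"
crlf_lengths                 # 2 per record if the CR is stripped (text mode)
echo "BINMODE=r:"
crlf_lengths -v BINMODE=r    # 3 per record if the CR is kept (binary mode)
```

With the unpatched build described above, the second command would be expected to print 2 as well.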

Again, I'll write a short shell script that reproduces the issue for you, and hopefully together we can come up with an agreeable solution.


David Carter

