Problem with output from gawk software in recent Cygwin installation

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Tue Jul 28 17:08:07 GMT 2020


On 2020-07-28 09:32, Bryan VanSchouwen via Cygwin wrote:
> On Tue, Jul 28, 2020 at 12:06 AM Brian Inglis wrote:
>> On 2020-07-27 15:58, Bryan VanSchouwen wrote:
>>> On Mon, Jul 27, 2020 at 4:20 PM Brian Inglis wrote:
>>>> On 2020-07-27 11:50, Michel LaBarre wrote:
>>>>> On July 27, 2020 12:52 PM, Eliot Moss wrote:
>>>>>> On 7/27/2020 11:47 AM, Bryan VanSchouwen wrote:
>>>>>>> I just tried executing an awk script using the most recent version
>>>>>>> of gawk, but the output did not turn out the way that it was supposed
>>>>>>> to.
>>>>>>> This script uses the following command to print the output data to
>>>>>>> the output file:
>>>>>>> print(cai[i], rpi[i], i) >
>>>>>>> "Fit_Height_correln_plot_-_cPuMP_vs_2NH2-cPuMP.dat"
>>>>>>> and previously, this command always printed the values of the three
>>>>>>> variables on a single line, separated by spaces; however, now the
>>>>>>> gawk software is automatically adding hard-returns between the
>>>>>>> values, resulting in the three values being printed on separate lines
>>>>>>> within the data file.
>>>>>>> What is going on here, and how do I permanently make it stop??
>>
>>>>> Here's a wondering: Could it have to do with line endings?  If Windows
>>>>> CRLF is getting in there, then the variables might get a CR in them,
>>>>> which might do weird things.  This assumes those are string variables,
>>>>> not numeric.
>>
>>>> Better yet, how about an example using manifest constants in a one line
>>>> sample to eliminate impact of arrays or changes in input data as in:
>>>> gawk 'BEGIN {print(1,2,3)}' or gawk 'BEGIN {print(1,2,3) > "xxx.txt"}'
>>> No problem with awk or gawk:
>>> $ for ((i = 0; i < 10; ++i))
>>>   do
>>>     printf "%d %d %d %d\n" $((i+1)) $((i+2)) $((i+3)) $((i+4))
>>>   done > test.txt
>>> $ awk '{print($1, $2, $3)}' test.txt
>>>     1 2 3
>>>     2 3 4
>>>     3 4 5
>>>     4 5 6
>>>     5 6 7
>>>     6 7 8
>>>     7 8 9
>>>     8 9 10
>>>     9 10 11
>>>     10 11 12
>>> So the issue appears to be with your command line, script, or input data
>>> file: please show the command line used to execute the script, attach the
>>> complete awk script, and input data file for diagnosis, or selections of
>>> the latter piped through or output using cat -A to show control characters.
>>> Here they are (attached). The script was executed with the following
>>> command:
>>> gawk -f peak_intensity_correln_plot_compile.awk
>> Input files have <CR><LF> \r\n <ctrl-M><ctrl-J> line terminators and those
>> are carried thru at the ends of the string fields:
>>
>> $ gawk -f peak_intensity_correln_plot_compile.awk
>> $ file *cPuMP*.dat
>> 2NH2-cPuMP_nh_-_pk_Fit_Height_data.dat:            ASCII text, with CRLF line terminators
>> cPuMP_nh_-_pk_Fit_Height_data.dat:                 ASCII text, with CRLF line terminators
>> Fit_Height_correln_plot_-_cPuMP_vs_2NH2-cPuMP.dat: ASCII text, with CR, LF line terminators
>> $ cat -A Fit_Height_correln_plot_-_cPuMP_vs_2NH2-cPuMP.dat | head
>> 1571697^M 1716833^M 224$
>> 2672863^M 2894992^M 225$
>> 2184902^M 9710015^M 226$
>> 4393362^M 4095908^M 227$
>> 3828609^M 4218978^M 229$
>> 6285045^M 4008320^M 233$
>> 3936959^M 4104667^M 234$
>> 1698322^M 1942553^M 237$
>> 4144791^M 4346435^M 238$
>> 2546328^M 2804338^M 239$
>>
>> You could change your input record separator to "\r\n", e.g. option
>> -vRS="\r\n"; insert '{ sub( /\r$/, "");' before each 'split(x, s, " ")';
>> convert your input fields from strings to numbers by adding zero, i.e.
>> cai[i] += 0; rpi[i] += 0; or use belts, braces, and suspenders with all
>> three, e.g.
>>
>> $ gawk -vRS="\r\n" -f peak_intensity_correln_plot_compile.awk
>> $ file *cPuMP*.dat
>> 2NH2-cPuMP_nh_-_pk_Fit_Height_data.dat:            ASCII text, with CRLF line terminators
>> cPuMP_nh_-_pk_Fit_Height_data.dat:                 ASCII text, with CRLF line terminators
>> Fit_Height_correln_plot_-_cPuMP_vs_2NH2-cPuMP.dat: ASCII text
>> $ cat -A Fit_Height_correln_plot_-_cPuMP_vs_2NH2-cPuMP.dat | head
>> 1571697 1716833 224$
>> 2672863 2894992 225$
>> 2184902 9710015 226$
>> 4393362 4095908 227$
>> 3828609 4218978 229$
>> 6285045 4008320 233$
>> 3936959 4104667 234$
>> 1698322 1942553 237$
>> 4144791 4346435 238$
>> 2546328 2804338 239$
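
For anyone who wants to reproduce this without the attached files, here is a
minimal sketch of the three approaches above. The file name crlf.txt and the
two-field sample records are made up for illustration, and the one-line awk
programs stand in for the real script, which reads its data differently.

```shell
# Hypothetical sample data: two fields per record, CRLF line endings.
tmp=$(mktemp -d)
printf '1571697 224\r\n2672863 225\r\n' > "$tmp/crlf.txt"

# Prefer gawk; fall back to another awk if gawk is not installed (assumption).
AWK=$(command -v gawk || command -v awk)

# Problem: with the default RS="\n" the last field keeps its trailing CR,
# so printing it before another field puts a CR in the middle of the line.
"$AWK" '{ print $2, $1 }' "$tmp/crlf.txt" | cat -A

# Approach 1: make CRLF the record separator.
"$AWK" -v RS='\r\n' '{ print $2, $1 }' "$tmp/crlf.txt" | cat -A

# Approach 2: strip a trailing CR from each record before using its fields.
"$AWK" '{ sub(/\r$/, "") } { print $2, $1 }' "$tmp/crlf.txt" | cat -A

# Approach 3: convert the field to a number by adding zero.
"$AWK" '{ n = $2 + 0; print n, $1 }' "$tmp/crlf.txt" | cat -A
```

Approach 3 only helps for numeric fields; string fields still need approach 1
or 2.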

> Just out of curiosity: Could this "<CR><LF>" issue be something new for 
> Windows 10? I ask because I don't recall having this issue with my old 
> Windows 7 computer.
That could be the case if you had Cygwin packages more than 3 years old on
your Windows 7 system, as changes for POSIX compatibility were made in the
builds for test releases of gawk, grep, and sed, coordinated and announced
together in:

	https://cygwin.com/legacy-ml/cygwin/2017-02/threads.html#00152

Perhaps the issue is in whatever generated (or still generates) the files, or
in whatever you had installed and in your path on Windows 7. Msys and Mingw
versions of gawk may ignore extra <CR>s on input, and may also be included
with Git for Windows or other Windows Unix-like packages.
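
If the files may arrive with either kind of line ending, a sketch like the
following can check for stray CRs and read both forms. The file mixed.txt and
its contents are hypothetical, and a regular expression as RS is a gawk
extension, so treat this as an illustration rather than a portable recipe.

```shell
# Hypothetical file with one CRLF-terminated and one LF-terminated record.
tmp=$(mktemp -d)
printf 'a 1\r\nb 2\n' > "$tmp/mixed.txt"

# Prefer gawk; fall back to another awk if gawk is not installed (assumption).
AWK=$(command -v gawk || command -v awk)
echo "using: $AWK"    # shows which binary on the path will actually run

# Count records that still end in CR under the default RS="\n".
"$AWK" '/\r$/ { n++ } END { print n + 0 }' "$tmp/mixed.txt"

# Accept either LF or CRLF as the record terminator.
"$AWK" -v RS='\r?\n' '{ print $2, $1 }' "$tmp/mixed.txt"
```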

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

