Is this correct behaviour for 'rev'?

Brian Inglis Brian.Inglis@SystematicSW.ab.ca
Thu Oct 24 13:56:54 GMT 2024


On 2024-10-24 02:37, Thomas Wolff via Cygwin wrote:
> 
> Am 24.10.2024 um 07:01 schrieb Mark Geisert via Cygwin:
>> Replying to myself, I continue...
>>
>> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>>> it, it just stops.
>>>>
>>>> I don't have access to a Linux box so I can't see if this happens
>>>> there and nothing in the documentation suggests that this is the
>>>> correct functionality.
>>>>
>>>> Test case:
>>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>>
>>>> This is for "rev from util-linux 2.33.1"
>>>>
>>>> I don't have the current version of 'rev' on my system due to not
>>>> having updated in a while. I accidentally screwed up my installation
>>>> and have been reluctant to wipe it and start over.
>>>>
>>>> So, is this the expected behaviour for the current version of 'rev'
>>>> under Cygwin and/or Linux?
>>>
>>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same,
>>> broken way.  It looks like line-ending char(s) are not being handled
>>> correctly.   Don't know yet if it's rev itself or fgetws() being used
>>> by rev that's busted.  I'll investigate further.  Thanks for the report!
>>
>> This is a locale issue.  In the default Cygwin locale, rev mishandles
>> the \x80 byte and instead of stopping with an error message it enters
>> an infinite loop.  I'll probably report this upstream instead of
>> working out a local fix.
>>
>> There is a work-around: change to the "C" locale just to run rev.
>>     LC_ALL=C rev zzz
>> where zzz is a file containing your four lines.  You can also run your
>> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
> Sorry, this is not a good workaround as it corrupts all (proper)
> non-ASCII characters.
> You could do e.g.
> grep . | rev

Not quite, as `grep .` only matches non-empty lines. You would have to do
something more like `grep -o . ...`, but I'm not sure that would do what you
want either.
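
To make the difference concrete, here is a small sketch on plain ASCII input. It only shows what the two grep invocations select, not how rev reacts to invalid bytes; the variable names are made up for illustration, and LC_ALL=C is set only so the result does not depend on the caller's locale.

```shell
# Plain ASCII sample with an empty middle line (hypothetical test data).
sample='ab\n\ncd\n'

# "grep ." keeps every line that contains at least one character, unchanged.
nonempty=$(printf "$sample" | LC_ALL=C grep .)

# "grep -o ." prints each matched character on a line of its own.
each_char=$(printf "$sample" | LC_ALL=C grep -o .)

printf '%s\n' "grep . gives:" "$nonempty" "grep -o . gives:" "$each_char"
```

So `grep .` drops only the empty line, while `grep -o .` splits every line into single characters, which is probably not what you want to feed to rev.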

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

