Is this correct behaviour for 'rev'?

Mark Geisert mark@maxrnd.com
Sun Nov 3 09:48:46 GMT 2024


Continuing my monologue, with due consideration of comments posted, ...

On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
> Replying to myself, I continue...
> 
> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>> it, it just stops.
>>>
>>> I don't have access to a Linux box so I can't see if this happens
>>> there and nothing in the documentation suggests that this is the
>>> correct functionality.
>>>
>>> Test case:
>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>
>>> This is for "rev from util-linux 2.33.1"
>>>
>>> I don't have the current version of 'rev' on my system due to not
>>> having updated in a while. I accidentally screwed up my installation
>>> and have been reluctant to wipe it and start over.
>>>
>>> So, is this the expected behaviour for the current version of 'rev'
>>> under Cygwin and/or Linux?
>>
>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken 
>> way.  It looks like line-ending char(s) are not being handled 
>> correctly.   Don't know yet if it's rev itself or fgetws() being used 
>> by rev that's busted.  I'll investigate further.  Thanks for the report!
> 
> This is a locale issue.  In the default Cygwin locale, rev mishandles 
> the \x80 byte and instead of stopping with an error message it enters an 
> infinite loop.  I'll probably report this upstream instead of working 
> out a local fix.

Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error 
message when given the OP's testcase.  I'm testing the full 2.40.2 package 
and expect to release it for Cygwin before too long.
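
For anyone curious what "stops with an error message" looks like at the C 
level, here is a minimal sketch (not the util-linux source) of a rev-style 
loop built on fgetws().  The key part is the NULL-return check: when the 
input holds a byte sequence that is invalid in the current locale, such as 
a lone \x80 in a UTF-8 locale, fgetws() returns NULL and POSIX says errno 
is set to EILSEQ, so the loop reports the problem and exits instead of 
spinning.  The names reverse_wline(), rev_wide_stream() and LINE_MAX_W are 
hypothetical, and each line is assumed to fit in the fixed buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical buffer size: each line, including its newline, is
   assumed to fit in LINE_MAX_W - 1 wide characters. */
#define LINE_MAX_W 4096

/* Reverse one wide-character line in place, keeping a trailing
   L'\n' (if present) at the end of the line. */
void reverse_wline(wchar_t *s)
{
    assert(s != NULL);
    size_t len = wcslen(s);
    if (len > 0 && s[len - 1] == L'\n')
        len--;
    for (size_t i = 0, j = len; i + 1 < j; i++, j--) {
        wchar_t tmp = s[i];
        s[i] = s[j - 1];
        s[j - 1] = tmp;
    }
}

/* Copy IN to OUT with every line reversed, decoding with the current
   locale.  Returns 0 at clean end of input, 1 on a read, decoding or
   write error. */
int rev_wide_stream(FILE *in, FILE *out)
{
    wchar_t buf[LINE_MAX_W];

    for (;;) {
        errno = 0;
        if (fgetws(buf, LINE_MAX_W, in) == NULL) {
            if (errno == EILSEQ || ferror(in)) {
                /* Undecodable input (e.g. a lone 0x80 byte in a UTF-8
                   locale): report it and stop instead of retrying. */
                fprintf(stderr, "rev-sketch: %s\n",
                        errno == EILSEQ ? "invalid multibyte sequence"
                                        : "read error");
                return 1;
            }
            return 0; /* clean end of input */
        }
        reverse_wline(buf);
        if (fputws(buf, out) == EOF)
            return 1;
    }
}
```

A caller would normally call setlocale(LC_ALL, "") from <locale.h> first, 
then return rev_wide_stream(stdin, stdout) from main().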

> There is a work-around: change to the "C" locale just to run rev.
>      LC_ALL=C rev zzz
> where zzz is a file containing your four lines.  You can also run your 
> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.

Implicit in that suggestion is that the OP seemed uninterested in 
multi-byte characters of any form, wanting just straightforward operation 
on bytes, even those with the high bit set.
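
That byte-level behaviour is easy to sketch.  The code below (again 
hypothetical, not what rev or Cygwin actually ship) reverses each line as 
raw bytes, so bytes 0x80 through 0xFF are ordinary characters and there is 
no decoding step that can fail.  It uses POSIX getline(), so line length is 
not limited.  Reversing twice restores the original line, which is what the 
OP's |rev|rev test expects.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Reverse the first LEN bytes of S in place, leaving a trailing '\n'
   where it is.  Every byte, high bit set or not, counts as one
   character; nothing is decoded. */
void reverse_line_bytes(char *s, size_t len)
{
    assert(s != NULL || len == 0);
    if (len > 0 && s[len - 1] == '\n')
        len--;
    for (size_t i = 0, j = len; i + 1 < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j - 1];
        s[j - 1] = tmp;
    }
}

/* Copy IN to OUT with each line's bytes reversed.  getline() returns
   the byte count, so embedded NUL bytes are handled too.  Returns 0 on
   success, 1 on a read or write error. */
int rev_bytes_stream(FILE *in, FILE *out)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t n;

    while ((n = getline(&line, &cap, in)) != -1) {
        reverse_line_bytes(line, (size_t)n);
        if (fwrite(line, 1, (size_t)n, out) != (size_t)n) {
            free(line);
            return 1;
        }
    }
    free(line);
    return ferror(in) ? 1 : 0;
}
```

Called as rev_bytes_stream(stdin, stdout) from a trivial main(), this 
treats the OP's \x80 line like any other.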

That said, I appreciate the follow-up comments that dealt with the 
general problem.
Thanks all,

..mark

