Is this correct behaviour for 'rev'?
Brian Inglis
Brian.Inglis@SystematicSW.ab.ca
Thu Oct 24 13:54:07 GMT 2024
On 2024-10-23 23:01, Mark Geisert via Cygwin wrote:
> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>> it, it just stops.
>>>
>>> I don't have access to a Linux box so I can't see if this happens
>>> there and nothing in the documentation suggests that this is the
>>> correct functionality.
>>>
>>> Test case:
>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>
>>> This is for "rev from util-linux 2.33.1"
>>>
>>> I don't have the current version of 'rev' on my system due to not
>>> having updated in a while. I accidentally screwed up my installation
>>> and have been reluctant to wipe it and start over.
>>>
>>> So, is this the expected behaviour for the current version of 'rev'
>>> under Cygwin and/or Linux?
>>
>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken way.
>> It looks like line-ending char(s) are not being handled correctly. Don't
>> know yet if it's rev itself or fgetws() being used by rev that's busted. I'll
>> investigate further. Thanks for the report!
>
> This is a locale issue. In the default Cygwin locale, rev mishandles the \x80
> byte and instead of stopping with an error message it enters an infinite loop.
> I'll probably report this upstream instead of working out a local fix.
>
> There is a work-around: change to the "C" locale just to run rev.
> LC_ALL=C rev zzz
> where zzz is a file containing your four lines. You can also run your original
> testcase with "rev" replaced by "LC_ALL=C rev" in both places.
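For anyone wanting to reproduce this, here is a minimal POSIX sh sketch of the test case and Mark's C-locale work-around. The helper names make_testcase and round_trip_c are hypothetical. The sketch uses the octal escapes \001 and \200 instead of \x01 and \x80 because POSIX does not specify \x escapes for printf(1), and not every printf recognizes them.

```sh
#!/bin/sh
# make_testcase FILE: write the four test lines from the report to FILE.
# Octal \001 and \200 are the same bytes as \x01 and \x80.
make_testcase() {
    printf 'no non-ASCII characters\nhex 01 >\001< here\nhex 80 >\200< here\nLine 4\n' > "$1"
}

# round_trip_c FILE: reverse FILE twice, running both rev(1) passes in the
# C locale as in the work-around above; the result should equal FILE.
round_trip_c() {
    LC_ALL=C rev "$1" | LC_ALL=C rev
}
```

For example, `make_testcase zzz && round_trip_c zzz | cmp - zzz` should report no differences if the work-around holds on your system.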
I run with a UTF-8 locale and have not noticed any issues, as I use UTF-8 files.
The man page for rev(1) says it works on wide characters, and `cygcheck rev`
shows it is linked against libintl (from gettext) and libiconv.
I could see an issue if the shell and file locales mismatch, or possibly if the
file contains non-BMP characters (e.g. from the SMP), which Cygwin's 16-bit
wchar_t stores as UTF-16 surrogate pairs.
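To illustrate the surrogate concern: because Cygwin's wchar_t is 16 bits, a character outside the BMP such as U+1F600 (UTF-8 bytes F0 9F 98 80) occupies two wchar_t units. This sketch assumes only iconv(1) and od(1), and utf16le_hex is a hypothetical helper name. It prints those code units. A program that reversed wchar_t units one at a time would emit them as DE00 D83D, a low surrogate before a high one, which is not valid UTF-16.

```sh
#!/bin/sh
# utf16le_hex: print the UTF-16LE encoding of stdin as hex bytes,
# i.e. the code units a 16-bit wchar_t would hold (low byte first).
utf16le_hex() {
    iconv -f UTF-8 -t UTF-16LE | od -An -tx1
}

# U+1F600 in UTF-8 (F0 9F 98 80, written in octal for POSIX printf).
# Expected: 3d d8 00 de, i.e. high surrogate D83D then low surrogate DE00.
printf '\360\237\230\200' | utf16le_hex
```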
The correct approach is to match the execution locale to the file's encoding,
for example `LC_ALL=...UTF-8 rev ...` for a UTF-8 file, which should produce the expected results.
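One way to apply that advice to files of unknown encoding is a small wrapper. It checks whether the input decodes as UTF-8, and if not, it falls back to the byte-oriented C locale used in Mark's work-around. This is a sketch, not a rev feature. The names is_utf8 and rev_matched are hypothetical. The UTF-8 locale defaults to C.UTF-8 as an assumption (check `locale -a`). The UTF-8 check relies on iconv(1) exiting non-zero on an invalid byte sequence.

```sh
#!/bin/sh
# Assumed UTF-8 locale name; override UTF8_LOCALE if C.UTF-8 is unavailable.
: "${UTF8_LOCALE:=C.UTF-8}"

# is_utf8 FILE: succeed only if every byte of FILE decodes as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# rev_matched FILE: run rev(1) under a locale matching FILE's bytes:
# the UTF-8 locale for valid UTF-8, otherwise the C locale.
rev_matched() {
    if is_utf8 "$1"; then
        LC_ALL=$UTF8_LOCALE rev "$1"
    else
        LC_ALL=C rev "$1"
    fi
}
```

Pure ASCII files pass the UTF-8 check, which is harmless because ASCII reads the same in both locales. A file that happens to be valid UTF-8 but was written in some other encoding would still be misread, so this is only a heuristic.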
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

La perfection est atteinte
non pas lorsqu'il n'y a plus rien à ajouter
mais lorsqu'il n'y a plus rien à retirer

Perfection is achieved
not when there is no more to add
but when there is no more to cut
-- Antoine de Saint-Exupéry