[BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
Buchbinder, Barry (NIH/NIAID) [E]
Tue Jun 25 16:09:00 GMT 2013
Lavrentiev, Anton sent the following at Tuesday, June 25, 2013 11:44 AM
>> The character ordering is based on the default Windows ordering for the
>> locale, and that's dictionary ordering, apparently.
>Ah, I see what you meant here. There's an elaborated explanation:
Also, the sed info documentation "Reporting Bugs" explicitly says that
this is not a bug.
`[a-z]' is case insensitive
You are encountering problems with locales. POSIX mandates that
`[a-z]' uses the current locale's collation order - in C parlance,
that means using `strcoll(3)' instead of `strcmp(3)'. Some
locales have a case-insensitive collation order, others don't.
Another problem is that `[a-z]' tries to use collation symbols.
This only happens if you are on the GNU system, using GNU libc's
regular expression matcher instead of compiling the one supplied
with GNU sed. In a Danish locale, for example, the regular
expression `^[a-z]$' matches the string `aa', because this is a
single collating symbol that comes after `a' and before `b'; `ll'
behaves similarly in Spanish locales, or `ij' in Dutch locales.
To work around these problems, which may cause bugs in shell
scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables
to `C'.
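
For illustration, here is a minimal sketch of that workaround applied to the command from the subject line. It assumes a POSIX shell and GNU sed. The function name replace_b_to_d is made up for this example, and LC_ALL=C is used because it overrides both LC_COLLATE and LC_CTYPE for that one command.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): replace the
# characters B, C and D on stdin with underscores.
# LC_ALL=C applies to this sed invocation only and forces the "C"
# locale, so the range [B-D] is evaluated by character code rather
# than by the locale's collation order.
replace_b_to_d() {
    LC_ALL=C sed -e 's/[B-D]/_/g'
}

# Only the uppercase B, C and D should be replaced.
printf '%s\n' 'ABCDE abcde' | replace_b_to_d
# prints: A___E abcde
```

Where changing the locale is not an option, listing the characters explicitly, as in 's/[BCD]/_/g', avoids the range and with it the dependence on collation order.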