This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
iconv + normalization + upper/lower-case
- To: libc-alpha at sources dot redhat dot com
- Subject: iconv + normalization + upper/lower-case
- From: Stefan Hoffmeister <bug dot glibc-gnu dot org at econos dot de>
- Date: Sat, 22 Sep 2001 22:02:34 +0200
- Organization: Econos
Hi,
I have a case where upper/lower-casing and converting a character with
iconv yields "Invalid or incomplete multibyte or wide character" errors.
In locale en_US, the character "µ" (MICRO SIGN, 0xB5) is converted with
iconv to UNICODELITTLE, yielding the Unicode character U+00B5.
Following that, the upper-case and lower-case characters are determined:
cu = towupper(0x00B5); // = 0x039C
cl = towlower(cu); // = 0x03BC
which is all as expected, given
http://www.unicode.org/unicode/reports/tr21/charts/CaseChart4.html
Trying to convert these two characters back into the en_US locale, again
using iconv, only yields the error
"Invalid or incomplete multibyte or wide character"
I find this confusing, as
http://www.unicode.org/unicode/reports/tr15/charts/NormalizationChart17.html
specifies a normalization
0x03BC --> 0x00B5
which would imply that at least for the input of "0x03BC" a conversion to
the en_US locale should be possible (0x03BC -> 0x00B5 -> 0xB5)?
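To make the failing step concrete, here is a minimal sketch of the
conversion back. It assumes that the en_US locale uses ISO-8859-1 and it
names the source encoding "UTF-16LE" rather than UNICODELITTLE; the helper
bmp_to_latin1 is a hypothetical name of mine, not part of glibc.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert one BMP code point to ISO-8859-1 via iconv.
   Returns 0 and stores the byte in *out on success, otherwise an errno
   value (EILSEQ when the target charset has no such character). */
int bmp_to_latin1(unsigned int cp, unsigned char *out)
{
    assert(out != NULL);

    /* Only single UTF-16 code units are handled in this sketch. */
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return EINVAL;

    /* Assumption: ISO-8859-1 stands in for the en_US locale charset. */
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return errno;

    /* Encode the code point as one little-endian UTF-16 code unit. */
    char in[2] = { (char)(cp & 0xFF), (char)(cp >> 8) };
    char *inp = in;
    char *outp = (char *)out;
    size_t inleft = sizeof in;
    size_t outleft = 1;

    errno = 0;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = (rc == (size_t)-1) ? errno : 0;

    iconv_close(cd);
    return err;
}
```

With this, bmp_to_latin1(0x00B5, &b) succeeds and stores 0xB5, while the
calls for 0x03BC and 0x039C fail here with EILSEQ, whose strerror() text in
glibc is the message quoted above.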
I am unable to find any reference to character normalization in the
sources - does glibc implement this somehow?
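In case it helps to show what I have in mind, here is a minimal sketch of
such a fallback done at application level. It assumes the 0x03BC -> 0x00B5
mapping as I read it from the chart above, and it relies on ISO-8859-1
bytes being identical to the code points 0x00 to 0xFF. The function name
latin1_with_fold is hypothetical; nothing like it exists in glibc as far as
I can tell.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback, not glibc behaviour: fold selected code points
   onto ISO-8859-1 equivalents before narrowing to a single byte.
   Returns 0 on success, -1 if the code point stays unrepresentable. */
int latin1_with_fold(unsigned int cp, unsigned char *out)
{
    assert(out != NULL);

    /* Assumed mapping: GREEK SMALL LETTER MU -> MICRO SIGN. */
    if (cp == 0x03BC)
        cp = 0x00B5;

    /* ISO-8859-1 covers exactly the code points 0x00..0xFF. */
    if (cp > 0xFF)
        return -1;

    *out = (unsigned char)cp;
    return 0;
}
```

With this, 0x03BC ends up as the byte 0xB5, while 0x039C (the capital mu)
is still rejected, because the sketch deliberately does no case folding.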
The fact that the capital Greek mu (i.e. towupper(0x00B5)) does not
convert back to a locale micro sign (via 0x039C -> 0x03BC -> 0x00B5 ->
0xB5) is as designed, I guess?
TIA,
Stefan