iconv + normalization + upper/lower-case
Bruno Haible
haible@ilog.fr
Sat Sep 22 14:00:00 GMT 2001
Stefan Hoffmeister wrote:
> http://www.unicode.org/unicode/reports/tr15/charts/NormalizationChart17.html
>
> specifies a normalization
>
> 0x03BC --> 0x00B5
The other way around: it specifies a normalization 0x00B5 -> 0x03BC.
When you are at the "genuine" Greek mu 0x03BC, no kind of Unicode
mapping will ever get you back to the micro sign 0x00B5.
> cu = towupper(0x00B5); // = 0x039C
> cl = towlower(cu); // = 0x03BC
Similarly, German "ß", when uppercased and then lowercased, becomes
"ss". Forget about the assumption that towlower (towupper (x)) == x.
It doesn't hold.
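
A minimal sketch of how one might check this for a given character. The helper name case_roundtrip_is_identity is made up for illustration, and the answer depends on the LC_CTYPE locale in effect, so treat the result as a property of your locale data rather than a fixed fact.

```c
#include <locale.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of glibc): returns true if lowercasing
   the uppercase form of c gives back c itself. The mappings used by
   towupper() and towlower() come from the current LC_CTYPE locale. */
bool case_roundtrip_is_identity(wint_t c)
{
    return towlower(towupper(c)) == c;
}

/* Hypothetical helper: select the LC_CTYPE locale named by the
   environment. Returns false if that locale is not available. */
bool use_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

With a UTF-8 locale selected, the values quoted above (0x00B5 -> 0x039C -> 0x03BC) would make case_roundtrip_is_identity(0x00B5) return false. Note that towupper() and towlower() map one character to one character, so they cannot be used to reproduce the "ß" -> "SS" -> "ss" case.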
> I am unable to find any text reference to character normalization in the
> sources - does glibc implement this somehow?
No, glibc doesn't implement general Unicode normalization.
Your only chance to get back from 0x03BC to 0x00B5 in glibc is by
using an iconv converter to ISO-8859-1//TRANSLIT. Transliteration is
off by default in iconv() and wcstombs().
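
A rough sketch of requesting transliteration explicitly, assuming the input is UTF-8 and the iconv implementation accepts the //TRANSLIT suffix (glibc does; other implementations may reject or ignore it). The function name utf8_to_latin1_translit is hypothetical, and which replacement is chosen for a given character is up to the implementation's transliteration tables.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert inlen bytes of UTF-8 to ISO-8859-1,
   asking iconv to transliterate characters that have no direct
   equivalent. Returns the number of bytes written to out, or -1 if
   the converter cannot be opened or the conversion fails. */
long utf8_to_latin1_translit(const char *in, size_t inlen,
                             char *out, size_t outcap)
{
    iconv_t cd = iconv_open("ISO-8859-1//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;   /* iconv() takes char ** but does not modify the input bytes */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(outcap - outleft);
}
```

Passing the two bytes "\xCE\xBC" (0x03BC in UTF-8) and inspecting the single output byte shows what the installed transliteration tables do with the Greek mu. Without the //TRANSLIT suffix the same call would normally fail with EILSEQ, since 0x03BC has no ISO-8859-1 code point.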
Bruno