This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


iconv + normalization + upper/lower-case

To: libc-alpha at sources dot redhat dot com
Subject: iconv + normalization + upper/lower-case
From: Stefan Hoffmeister <bug dot glibc-gnu dot org at econos dot de>
Date: Sat, 22 Sep 2001 22:02:34 +0200
Organization: Econos

Hi,

I have a case where upper/lower-casing and converting a character with
iconv yields "Invalid or incomplete multibyte or wide character" errors.

In locale en_US, the character "µ" (MICRO SIGN, 0xB5) is converted with
iconv to UNICODELITTLE, yielding the Unicode character U+00B5.

Following that, the upper-case and lower-case characters are determined:

  cu = towupper(0x00B5);  // = 0x039C
  cl = towlower(cu);      // = 0x03BC

which is all as expected, given the case chart at

  http://www.unicode.org/unicode/reports/tr21/charts/CaseChart4.html

Trying to convert these characters back into the en_US locale's charset,
again using iconv, only yields errors:

  "Invalid or incomplete multibyte or wide character"

I find this irritating, as

  http://www.unicode.org/unicode/reports/tr15/charts/NormalizationChart17.html

specifies a normalization

  0x03BC --> 0x00B5

which would imply that at least for the input of "0x03BC" a conversion to
the en_US locale should be possible (0x03BC -> 0x00B5 -> 0xB5)?

I am unable to find any reference to character normalization in the glibc
sources. Does glibc implement this somehow?

The fact that the capital Greek mu (i.e. towupper(0x00B5)) does not
convert back to the locale's micro sign (via 0x039C -> 0x03BC -> 0x00B5 ->
0xB5) is by design, I guess?

TIA,
Stefan

Follow-Ups:
- Re: iconv + normalization + upper/lower-case
  - From: Bruno Haible
