using iconv for conversion from/to Unicode
Bruno Haible
haible@ilog.fr
Wed Mar 15 05:58:00 GMT 2000
Markus Kuhn writes:
> If the implementation can handle surrogates, then better use "UTF-16BE",
> "UTF-16LE", "UTF-16" instead.
This is not useful for the programs I am talking about. The internal
representation I would like to see better supported is the one where each
Unicode character occupies exactly one element of an array: uint16_t[] and
uint32_t[].
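To see why UTF-16 breaks the one-element-per-character property, here is a minimal sketch of how a code point is split into uint16_t elements. The helper name utf16_encode is hypothetical and not part of any library. Characters in the BMP take one element, but anything above U+FFFF needs a surrogate pair, so an index into a UTF-16 uint16_t[] no longer corresponds to a character index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration: encode one Unicode code point
   as UTF-16 into out[] and return how many uint16_t elements it used
   (1 or 2), or 0 if cp cannot be encoded (a surrogate or > 0x10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t) cp;         /* BMP: one element per character */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the reach of surrogates */
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+00E9 encodes as the single element 0x00E9, while U+1D11E (MUSICAL SYMBOL G CLEF) becomes the two elements 0xD834 0xDD1E.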
> "UCS-2-INTERNAL" -> "UCS-2"
"UCS-2" has ambiguous endianness and sometimes also a BOM. Both of these
misfeatures make it unsuitable as a name for uint16_t[].
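Because "UCS-2" leaves the byte order open, a program that wants to fill a uint16_t[] directly has to request the explicit-endianness variant that matches its own memory layout. A minimal sketch of that choice, assuming the iconv implementation accepts the BOM-free names "UCS-2LE" and "UCS-2BE" (native_ucs2_name is a hypothetical helper):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the iconv name whose byte order matches
   how this host stores a uint16_t in memory. Assumes the names
   "UCS-2LE" and "UCS-2BE" are accepted and imply no BOM. */
const char *native_ucs2_name(void)
{
    uint16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);   /* inspect the in-memory layout */
    return bytes[0] == 0x02 ? "UCS-2LE" : "UCS-2BE";
}
```

The returned name can then be passed as the tocode argument of iconv_open, so the converter's output bytes line up exactly with the host's uint16_t elements.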
> "UNICODEBIG" -> "UCS-2BE"
> "UNICODELITTLE" -> "UCS-2LE"
That and the same for UCS-4 would be better than nothing. Ulrich, can you
add aliases "UCS-2BE", "UCS-2LE", "UCS-4BE" (= "UCS-4"), and implement
"UCS-4LE"?
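With names like the ones requested above, decoding UTF-8 into a uint32_t[] with exactly one element per character becomes a single iconv call. The following is a minimal sketch using the POSIX iconv_open/iconv/iconv_close functions; the helper name utf8_to_ucs4 is hypothetical, and it assumes the local iconv accepts "UCS-4LE" and "UCS-4BE" (a conversion library that does not will make iconv_open fail).

```c
#include <errno.h>
#include <iconv.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into out[],
   one uint32_t per character, using the BOM-free UCS-4 name that matches
   this host's byte order. Returns the number of characters written, or
   (size_t) -1 on error (unknown name, invalid input, or out[] too small). */
size_t utf8_to_ucs4(const char *in, uint32_t *out, size_t out_len)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);               /* 1 on little-endian hosts */
    const char *tocode = first == 1 ? "UCS-4LE" : "UCS-4BE";

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t) -1)
        return (size_t) -1;                  /* name not known to this iconv */

    char *inbuf = (char *) in;               /* iconv takes char **, not const */
    size_t inleft = strlen(in);
    char *outbuf = (char *) out;
    size_t outleft = out_len * sizeof *out;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    int saved = errno;                       /* iconv_close may clobber errno */
    iconv_close(cd);
    if (rc == (size_t) -1) {
        errno = saved;                       /* e.g. EILSEQ or E2BIG */
        return (size_t) -1;
    }
    return out_len - outleft / sizeof *out;  /* elements actually filled */
}
```

For example, utf8_to_ucs4("h\xc3\xa9", buf, 8) fills buf with 0x68 and 0xE9 and returns 2. Selecting the name at run time keeps the output in native order without a BOM, which is exactly what the plain "UCS-4" name cannot promise on every host.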
I'm not in favour of "UTF-32BE" and "UTF-32LE", because unicode.org wants
them to reject characters > 0x10FFFF, and some day even 0x110000 characters
may not be enough.
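The difference in value range can be stated as two small predicates. This is a sketch with hypothetical helper names. It assumes UCS-4 as a 31-bit form of ISO 10646 (values up to 0x7FFFFFFF), checking only that range, and UTF-32 as Unicode defines it (values up to 0x10FFFF, surrogates excluded).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: UCS-4 value range, 31 bits (0..0x7FFFFFFF). */
bool valid_ucs4(uint32_t cp)
{
    return cp <= 0x7FFFFFFF;
}

/* Hypothetical helper: UTF-32 value range as defined by Unicode,
   0..0x10FFFF with the surrogate block 0xD800..0xDFFF excluded. */
bool valid_utf32(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

A value such as 0x110000 passes valid_ucs4 but fails valid_utf32, which is why a converter named after UTF-32 would have to reject it.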
Bruno