iconv and combining characters
Chris Heath
chris@heathens.co.nz
Sun Jan 18 18:46:00 GMT 2004
Hi,
I noticed that iconv isn't able to convert UTF-8 containing combining
characters into Latin1. I really think that iconv should be able to do this.
> printf 'A\xCC\x80' | iconv -f UTF-8 -t L1
Aiconv: illegal input sequence at position 1
(\xCC\x80 is UTF-8 for U+0300 COMBINING GRAVE ACCENT.)
The same is true for most other 8-bit encodings. The CP1255 converter, on the
other hand, accepts both the precomposed form and the base-plus-combining form,
and produces the same output for each:
> printf '\xEF\xAC\x9D' | iconv -f UTF-8 -t CP1255 | od -tx1
0000000 e9 c4
0000002
> printf '\xD7\x99\xD6\xB4' | iconv -f UTF-8 -t CP1255 | od -tx1
0000000 e9 c4
0000002
(\xEF\xAC\x9D is UTF-8 for U+FB1D HEBREW LETTER YOD WITH HIRIQ.)
(\xD7\x99 is UTF-8 for U+05D9 HEBREW LETTER YOD.)
(\xD6\xB4 is UTF-8 for U+05B4 HEBREW POINT HIRIQ.)
So, my main question here is: do you agree that we should make the L1 converter
do the same kind of thing?
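To make the proposal concrete, here is a rough sketch of the kind of lookup
the L1 converter would need. It is only an illustration, not glibc code: the
names l1_pair, l1_pairs and l1_compose are made up, and the table lists just a
few pairs with U+0300 rather than everything Latin1 can represent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry, not part of glibc: a base character, a
   following combining mark, and the single Latin1 byte they compose to.  */
struct l1_pair
{
  uint32_t base;
  uint32_t mark;
  unsigned char latin1;
};

/* Only a few U+0300 COMBINING GRAVE ACCENT pairs, for illustration.  */
static const struct l1_pair l1_pairs[] =
{
  { 'A', 0x0300, 0xC0 },  /* U+00C0 LATIN CAPITAL LETTER A WITH GRAVE */
  { 'E', 0x0300, 0xC8 },  /* U+00C8 LATIN CAPITAL LETTER E WITH GRAVE */
  { 'a', 0x0300, 0xE0 },  /* U+00E0 LATIN SMALL LETTER A WITH GRAVE */
  { 'e', 0x0300, 0xE8 },  /* U+00E8 LATIN SMALL LETTER E WITH GRAVE */
};

/* Return 1 and store the composed byte in *OUT if the pair is in the
   table; otherwise return 0 and leave *OUT untouched.  */
static int
l1_compose (uint32_t base, uint32_t mark, unsigned char *out)
{
  for (size_t i = 0; i < sizeof l1_pairs / sizeof l1_pairs[0]; i++)
    if (l1_pairs[i].base == base && l1_pairs[i].mark == mark)
      {
        *out = l1_pairs[i].latin1;
        return 1;
      }
  return 0;
}
```

With something like this, the 'A\xCC\x80' input above would come out as the
single byte 0xC0, and pairs that are not in the table would still be reported
as an illegal input sequence.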
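Now for a side issue. When I did the same conversions from UTF-8 to CP1255
in a C program, I noticed that iconv() returned 0 in both instances. The return
value is supposed to be the number of characters converted in a non-reversible
way, so shouldn't the second call return a non-zero value, since that
conversion is irreversible?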
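For reference, the check can be reproduced with something along these lines.
This is a minimal sketch rather than the exact program I ran: convert_once is
a made-up helper name, and it assumes the whole input fits in a single iconv()
call.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert IN (INLEN bytes) from encoding FROM to
   encoding TO with one iconv() call.  Stores the number of bytes written
   in *OUTLEN and returns what iconv() returned, i.e. the number of
   non-reversible conversions, or (size_t) -1 on error.  (size_t) -1 is
   also returned if iconv_open() fails.  */
static size_t
convert_once (const char *to, const char *from,
              const char *in, size_t inlen,
              char *out, size_t outcap, size_t *outlen)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  /* iconv() takes a char ** for the input, but does not modify the
     input bytes themselves.  */
  char *inp = (char *) in;
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outcap;

  size_t result = iconv (cd, &inp, &inleft, &outp, &outleft);

  /* Flush any pending state; only an error here changes the result.  */
  if (result != (size_t) -1
      && iconv (cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
    result = (size_t) -1;

  *outlen = outcap - outleft;
  iconv_close (cd);
  return result;
}
```

Calling convert_once ("CP1255", "UTF-8", "\xEF\xAC\x9D", 3, ...) and then
convert_once ("CP1255", "UTF-8", "\xD7\x99\xD6\xB4", 4, ...) and printing the
two return values is enough to compare the precomposed and decomposed cases.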
Chris