[Bug localedata/20903] charmaps: glibc's Windows single-byte pages don't map like Windows for previously unmapped points
arthur200126 at gmail dot com
sourceware-bugzilla@sourceware.org
Fri Dec 2 17:53:00 GMT 2016
https://sourceware.org/bugzilla/show_bug.cgi?id=20903
--- Comment #3 from Mingye Wang <arthur200126 at gmail dot com> ---
$ iconv -f utf-8 -t windows-1252 <<< $'\u0081' | hexdump -C
iconv: illegal input sequence at position 0
(Expected output: \x81\n.)
$ iconv -f windows-874 -t utf-32le <<< $'\x9f\x81' | hexdump -C
iconv: illegal input sequence at position 0
(Expected output: \x9f\0\0\0\x81\0\0\0\n\0\0\0.)
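To make the windows-1252 case concrete, here is a minimal POSIX sh sketch of the same-value rule in the encoding direction. With this rule, U+0081 comes out as the single byte 0x81. The function name `cp1252_encode_bestfit` is hypothetical. The sketch handles only BMP code points. It special-cases the five code points whose byte values cp1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D). For every other code point it assumes glibc's `iconv` with the `UTF-32BE` and `WINDOWS-1252` charsets.

```sh
# Hypothetical helper, not part of glibc or iconv: encode one Unicode code
# point (hex, BMP only, e.g. 0081) to windows-1252, applying the Windows
# best-fit same-value rule to the five bytes cp1252 leaves unassigned.
cp1252_encode_bestfit() {
  local cp
  cp=$(printf '%d' "0x$1")
  # Only BMP code points are handled by this sketch.
  [ "$cp" -le 65535 ] || return 1
  case $cp in
    # U+0081, U+008D, U+008F, U+0090, U+009D (decimal 129, 141, 143, 144,
    # 157): emit the single byte with the same value (octal escape).
    129|141|143|144|157)
      printf "\\$(printf '%03o' "$cp")"
      ;;
    *)
      # Everything else: defer to glibc's table, feeding the code point
      # as four big-endian UTF-32 bytes.
      printf "\\000\\000\\$(printf '%03o' $((cp / 256)))\\$(printf '%03o' $((cp % 256)))" |
        iconv -f UTF-32BE -t WINDOWS-1252
      ;;
  esac
}
```

For example, `cp1252_encode_bestfit 0081 | od -An -tx1` shows the single byte `81`. Octal escapes are used instead of `\x` because POSIX `printf` (and dash) only guarantee the octal form.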
Once this is fixed, the following should theoretically work:
$ blob=$' '
$ blob=${blob%' '}$'\u00a0'
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob")
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob") # ERROR
$ blob=$(iconv -t iso-8859-1 -f utf-8 <<< "$blob")
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob")
$ blob=$(iconv -t iso-8859-1 -f utf-8 <<< "$blob")
$ [[ "$blob" == $'\xa0' ]]; echo $? # expected: 0
* * *
Correction: Windows applies the same-value assignment only in the C1 range
(0x80-0x9F). Outside that range, Windows assigns Private Use Area (PUA) code
points as if the bytes were EUDC (end-user-defined) characters.[1] (I should
have read the "best fit" tables more carefully.)
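The corrected rule can be sketched as follows. This is again POSIX sh, and the function name `bestfit_decode_unassigned` is hypothetical. The sketch assumes the caller already knows the byte is unassigned in the code page at hand (for example 0x81 or 0x9F in windows-874). It applies the same-value assignment only within 0x80-0x9F. It deliberately refuses every other byte, because the per-code-page PUA/EUDC assignments Windows uses outside C1 are not reproduced here.

```sh
# Hypothetical sketch of the corrected rule: decode a byte (given as hex)
# that the code page leaves unassigned. No code page table is consulted;
# the caller must already know the byte is unassigned.
bestfit_decode_unassigned() {
  local byte
  byte=$(printf '%d' "0x$1")
  if [ "$byte" -ge 128 ] && [ "$byte" -le 159 ]; then
    # C1 range (0x80-0x9F): same-value assignment, byte 0xNN -> U+00NN,
    # which UTF-8 encodes as 0xC2 (octal 302) followed by the byte itself.
    printf "\\302\\$(printf '%03o' "$byte")"
  else
    # Outside C1, Windows maps to PUA code points as if they were EUDC
    # characters; those assignments are not modeled in this sketch.
    echo "0x$1: outside C1, PUA mapping not modeled" >&2
    return 1
  fi
}
```

For example, `bestfit_decode_unassigned 9f | od -An -tx1` shows the bytes `c2 9f` (U+009F in UTF-8), while `bestfit_decode_unassigned db` fails with status 1.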
[1]: https://bugs.python.org/issue28712#msg281044
More information about the Libc-locales mailing list