This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
- From: "egor at kobylkin dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Sat, 06 Oct 2018 15:24:22 +0000
- Subject: [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
- Auto-submitted: auto-generated
- References: <bug-2872-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=2872
--- Comment #36 from Egor Kobylkin <egor at kobylkin dot com> ---
https://sourceware.org/ml/libc-locales/2018-q4/msg00013.html
After some kind help from Marko in an offline discussion,
I realized the multi/single character approach I originally took went
against the iconv(1) logic anyway. So there is no harm in
dropping it and adopting Marko's suggestion instead. I will do so and
will resubmit the patch with ISO 9:1995/GOST 7.79 System A + fallback to
GOST 7.79 System B (for ASCII).
However, this doesn't resolve the issue of the ASCII part being different
for various locales. Again, I invite the locale maintainers to let
me know whether they want to 1) adopt the one I am supplying, 2) write
their own or 3) ignore the patch altogether. Your feedback is appreciated!
This is the relevant part that helped:
> The first part (ISO-8859-15 or ASCII) defines the target encoding for
> iconv(1). //TRANSLIT is described in the iconv(1) man page as:
>
> If the string //TRANSLIT is appended to to-encoding, characters
> being converted are transliterated when needed and possible. This
> means that when a character cannot be represented in the target
> character set, it can be approximated through one or several
> similar looking characters. Characters that are outside of the
> target character set and cannot be transliterated are replaced
> with a question mark (?) in the output.
>
> So in the above examples, iconv(1) encounters the character U+0428,
> which is not part of either target encoding, and since
> //TRANSLIT is specified, iconv(1) tries transliteration according to
> the rules defined above. In the case of ASCII, U+0160 is not part of
> the target encoding either, so the next alternative is used.
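
To illustrate the fallback order described in the quote, here is a minimal
Python sketch. It is not how glibc implements //TRANSLIT; the rule table
TRANSLIT_RULES and the helpers encodable() and translit() are hypothetical
names made up for this example, and the two rules shown (System A form
first, System B form as the ASCII fallback) are only assumed to mirror the
proposed patch.

```python
# Hypothetical rule table (an assumption for illustration, not the patch):
# each source character maps to an ordered list of candidate replacements.
TRANSLIT_RULES = {
    "\u0428": ["\u0160", "Sh"],  # CYRILLIC CAPITAL SHA: S with caron, else "Sh"
    "\u0448": ["\u0161", "sh"],  # CYRILLIC SMALL SHA: s with caron, else "sh"
}


def encodable(text, encoding):
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, encoding):
    """Encode text, trying each rule candidate in order when a character
    is missing from the target encoding; emit '?' if nothing fits."""
    out = []
    for ch in text:
        if encodable(ch, encoding):
            out.append(ch)
            continue
        for candidate in TRANSLIT_RULES.get(ch, []):
            if encodable(candidate, encoding):
                out.append(candidate)
                break
        else:
            # No rule, or no candidate representable in the target encoding.
            out.append("?")
    return "".join(out).encode(encoding)


print(translit("\u0428", "iso8859_15"))  # first candidate fits (U+0160)
print(translit("\u0428", "ascii"))       # falls through to the second one
```

With ISO-8859-15 as the target the first candidate is representable and is
used; with ASCII it is not, so the second candidate is taken. A character
without any usable rule ends up as a question mark, as the man page
describes.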