This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
On 19.11.18 08:13, Marko Myllynen wrote:
> Hi,
>
> On 17/11/2018 20.34, Egor Kobylkin wrote:
>>
>> Shouldn't we have two explicit rules for transcription and
>> transliteration not dependent on a destination character set?
>>
>> This would contradict ISO 9:1995 (System A).
>> System A was added on Marko's request (so setting him on TO:) I am
>> neutral on keeping it or dropping it, just to be clear.
>>
>> This particular rule with h/x would make sense on its own.
>> But again, it would contradict the standards.
>> On the other hand, for my personal needs I care less about standards
>> than about current functionality and the data loss caused by
>> transcription being missing altogether (BZ #2872).
>
> Given the number of questions above, I think the way forward is to try
> to follow the relevant standards as closely as possible and also check
> what the other implementations (i.e., uconv(1)) do. For example, checking
> the earlier mentioned case may or may not give some hints:
>
> $ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Šema
> $ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Shema
> $ uconv -V
> uconv v2.1 ICU 50.1.2
Marko,
Your example only covers _transliteration_ to Latin with diacritics:
iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
| iconv -f ISO-8859-15 -t UTF-8
while BZ #2872 is about _transcription_ to ASCII:
iconv -f UTF-8 -t ASCII//TRANSLIT
The glibc wiki explicitly lists this use case (ASCII) as the test
example: https://sourceware.org/glibc/wiki/Locales#Testing_Locales
So again, you are asking for ISO 9:1995 System A, but the bug is
about ISO 9:1995 System B (GOST 7.79-2000).
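To make the difference concrete, here is a minimal Python sketch of the two systems applied to the two sample words from the uconv example. The tables are deliberately partial (only the six letters those words need), and the names SYSTEM_A, SYSTEM_B and translit are hypothetical helpers for illustration. This is not the glibc locale data and not the proposed patch.

```python
# Partial, illustrative tables: only the letters used in the sample words.
# System A (ISO 9:1995) uses diacritics; System B (GOST 7.79-2000) is ASCII-only.
SYSTEM_A = {"ш": "š", "с": "s", "х": "h", "е": "e", "м": "m", "а": "a"}
SYSTEM_B = {"ш": "sh", "с": "s", "х": "x", "е": "e", "м": "m", "а": "a"}


def translit(word, table):
    """Map each Cyrillic letter through `table`, preserving initial capitals."""
    out = []
    for ch in word:
        latin = table[ch.lower()]
        if ch.isupper():
            # "sh" -> "Sh", "š" -> "Š"
            latin = latin.capitalize()
        out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    for word in ("Шема", "Схема"):
        print(word, translit(word, SYSTEM_A), translit(word, SYSTEM_B))
```

With these partial tables, System A yields "Šema" and "Shema" (matching the uconv output quoted above), while System B yields "Shema" and "Sxema". The h/x rule is what keeps the two words distinct once the output is restricted to ASCII.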
Best,
Egor