This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PING^6][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]




On 16.04.19 19:58, Carlos O'Donell wrote:
On 4/16/19 1:06 PM, Egor Kobylkin wrote:
Just FYI, this is what I was testing:
./testrun.sh /usr/bin/iconv -f UTF-8 -t ASCII//TRANSLIT <<< "ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍ ҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’"

And this is the expected result (the surrounding quotation marks were added by me):
"YODJG`YEZ`IYIJL`N`TSHK`U`DHABVGDEZHZIJKLMNOPRSTUU?FXCZCHSHSHHA`Y``E`YUYAabvgdezhzijklmnoprstuu?fxczchshshh``y``e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FHfhYHyhE`e` G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`T`t`UuH`h`TCZtczSH`sh`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`Y`y`'"

Thanks.

I was using CyrTranslit (a Python transliterator) to review other work done in this area,
but it wasn't very fruitful.

$ python3
Python 3.7.3 (default, Mar 27 2019, 13:36:35)
[GCC 9.0.1 20190227 (Red Hat 9.0.1-0.8)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cyrtranslit
>>> cyrtranslit.supported()
dict_keys(['sr', 'me', 'mk', 'ru'])
>>> cyrtranslit.to_latin("ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’")
'ЁĐЃЄЅІЇJLjNjĆЌЎDžABVGDEŽZIЙKLMNOPRSTUÚFHCČŠЩЪЫЬЭЮЯabvgdežziйklmnoprstuúfhcčšщъыьэюяёđѓєѕіїjljnjćќўdžѪѫѲѳѴѵҌҍҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’'


This doesn't give a good transliteration: most of the input passes through unchanged.

I guess that is because it uses the first key from your list, 'sr', which stands for Serbian. Serbian doesn't have the characters that were left untransliterated ("Щ", for example).

But the table is better:
https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/cyrtranslit/mapping.py#L138-L155

Ё -> YO.

Which is a good cross-check for me.

Yet the closest one in that codebase should be this one: https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/cyrtranslit/mapping.py#L88

This is exactly why the patch went through 12 iterations: we wanted the table to cover the most complete yet workable standard. What we reference in the bug memo is the actual accepted standard, merged with the extended standard that covers additional obsolete Cyrillic letters.
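
To make the mapping concrete, here is a minimal Python sketch of a table lookup. It is only an illustration, not the glibc locale data: the table is a small subset (lowercase Russian letters) read off the expected result quoted earlier in this thread, and the names TABLE and transliterate are made up for this example. Replacing unmapped characters with "?" is an assumption that mimics the "u?" seen in the expected output for "у" followed by a combining accent.

```python
# Subset of a Cyrillic -> ASCII table, read off the expected iconv output
# quoted in this thread. Lowercase Russian letters only; NOT the full table.
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "x", "ц": "cz", "ч": "ch", "ш": "sh", "щ": "shh",
    "ъ": "``", "ы": "y`", "ь": "`", "э": "e`", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Map each character through TABLE; ASCII passes through unchanged.

    Assumption: anything outside ASCII that is missing from TABLE
    becomes "?", as a stand-in for an untransliterable character.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append(TABLE.get(ch, "?"))
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("щука ёж"))
```

Note that one Cyrillic letter can expand to up to three ASCII characters here ("щ" becomes "shh"), which is why a per-character lookup rather than a one-to-one character translation is used in the sketch.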

Bests,
Egor Kobylkin



