[PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

Tue Oct 9 16:49:00 GMT 2018

Hi,

To clarify, the page has a section explaining the differences between
transliteration and transcription, and notes that the terminology is not
entirely unambiguous. It also explains that the national standard SFS
4900 overrides ISO 9, so ISO 9 can't be used as-is in a Finnish context.

Thanks,

On 2018-10-09 19:22, Egor Kobylkin wrote:
> In the hope of being helpful: what you describe below from
> https://fi.wikipedia.org/wiki/Siirtokirjoitus is called _transcription_,
> not transliteration.
> 
> Transliteration is what we have done with ISO 9 or GOST 7.79 System A
> and it could be the same for all languages indeed.
> 
> The transcription can be phonetic or serve other purposes and depends on
> the target language or use case. We have used the GOST 7.79 System B.
> 
> Egor
> 
> On 09.10.2018 18:10, Marko Myllynen wrote:
>> Hi,
>>
>> On 2018-10-09 01:04, Rafal Luzynski wrote:
>>>
>>> Particularly, I think that those rules will not be helpful at all for
>>> the languages which use neither Latin nor Cyrillic alphabet.
>>
>> This is certainly a very good point.
>>
>>> If you refer to other languages than Russian which also use the Cyrillic
>>> alphabet but need different transliteration rules than Russian for
>>> the same characters then it is OK for me now.  I am afraid that the iconv
>>> algorithm does not handle such a case.  Of course, we should add this missing
>>> feature eventually but I do not volunteer to do it now.
>>
>> Yes, this would be needed for correct transliteration of different
>> languages, and this might be quite a bit of work. There's also the case
>> of transliteration and character sets, consider the transliteration
>> examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:
>>
>> Russian:        Борис Николаевич Ельцин
>> Int'l:          Boris Nikolaevič Elʹcin
>> Finnish:        Boris Nikolajevitš Jeltsin
>> French:         Boris Nikolaïevitch Ieltsine
>> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
>>
>> For French you'll get the correct transliteration with iconv by using -t
>> ISO-8859-1//TRANSLIT, for Finnish with -t ISO-8859-15//TRANSLIT, but it's
>> not so obvious how to get the above kind of transliteration for ISO 9
>> international or especially for the phonetic case.
>>
>> One thing that might be helpful here could be something like:
>>
>> $ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
>> ž
>>
>> That is, force transliteration of each character (if defined) even if
>> it's part of the target character set. AFAICS this is not currently
>> possible.
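
To make the difference between the existing behaviour and the proposed
forced mode concrete, here is a minimal Python sketch. It is not iconv
code: the one-entry RULES table, the "zh" fallback and the function names
are assumptions made up for illustration, and the "?" placeholder only
mimics what a converter might emit when no rule applies.

```python
# Hypothetical one-entry rule table: ordered candidates for U+0436 (ж).
RULES = {"\u0436": ["\u017e", "zh"]}  # "ž" first, ASCII "zh" as fallback


def encodable(s, encoding):
    """Return True if s can be represented in the target character set."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, encoding, force=False):
    out = []
    for ch in text:
        # Unforced mode: a character that already fits the target
        # character set is passed through untouched.
        if not force and encodable(ch, encoding):
            out.append(ch)
            continue
        # Otherwise use the first candidate the target can represent.
        for cand in RULES.get(ch, []):
            if encodable(cand, encoding):
                out.append(cand)
                break
        else:
            # No usable rule: keep the character if possible, else "?".
            out.append(ch if encodable(ch, encoding) else "?")
    return "".join(out)
```

With a UTF-8 target, translit("ж", "utf-8") leaves the letter alone,
while translit("ж", "utf-8", force=True) applies the rule anyway, which
is the effect the TRANSLIT_FORCE idea above is after.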
>>
>>> But, while at this, is there anything that stops us from adding transliteration
>>> rules for additional Cyrillic characters not used in Russian but used in
>>> other languages?
>>
>> This would probably make sense.
>>
>> FWIW, for Finnish the diff for Russian to be applied in the locale on
>> top of translit_cyrillic (ISO 9) rules would be something like the one
>> below. I still need to check whether rules are needed for languages
>> other than Russian that could be added (I hope to submit a proper patch
>> against fi_FI shortly after translit_cyrillic has landed):
>>
>> <U0446> "<U0074><U0073>"
>> <U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
>> <U0448> "<U0161>";"<U0073><U0068>"
>> <U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
>> <U044A> ""
>> <U044C> ""
>> <U044D> "<U0065>"
>> <U044E> "<U006A><U0075>"
>> <U044F> "<U006A><U0061>"
>> <U0451> "<U006A><U006F>"
>>
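
As a rough illustration of how such locale entries could layer on top of
a base table, the Python sketch below merges the Finnish rules quoted
above over a small stand-in base table and picks, per character, the
first alternative the target character set can represent. The BASE
entries are ISO 9 values filled in for illustration only (the real
translit_cyrillic file may differ), and the function names and the "?"
placeholder are assumptions, not glibc behaviour.

```python
# Illustrative stand-in for a few base (ISO 9 style) entries; the real
# translit_cyrillic table may differ.
BASE = {
    "\u0436": ["\u017e"],   # ж -> ž
    "\u0446": ["c"],        # ц -> c
    "\u0448": ["\u0161"],   # ш -> š
    "\u044f": ["\u00e2"],   # я -> â
}

# Finnish overrides, copied from the rules quoted above. Each list holds
# the alternatives in the order they appear in the rule.
FI = {
    "\u0446": ["ts"],                       # ц
    "\u0447": ["t\u0161", "tsh"],           # ч
    "\u0448": ["\u0161", "sh"],             # ш
    "\u0449": ["\u0161t\u0161", "shtsh"],   # щ
    "\u044a": [""],                         # ъ
    "\u044c": [""],                         # ь
    "\u044d": ["e"],                        # э
    "\u044e": ["ju"],                       # ю
    "\u044f": ["ja"],                       # я
    "\u0451": ["jo"],                       # ё
}

RULES = {**BASE, **FI}  # locale entries win over the base table


def fits(s, encoding):
    """Return True if s can be represented in the target character set."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def to_charset(text, encoding):
    out = []
    for ch in text:
        if fits(ch, encoding):
            out.append(ch)  # already representable, keep as-is
            continue
        # First alternative that fits the target, else a "?" placeholder.
        out.append(next((c for c in RULES.get(ch, []) if fits(c, encoding)), "?"))
    return "".join(out)
```

For example, to_charset("щ", "iso-8859-15") selects the first
alternative, since š exists in ISO-8859-15, whereas
to_charset("щ", "ascii") falls through to the all-ASCII second
alternative.
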
>> Thanks,
>>
> 

-- 
Marko Myllynen