This is the mail archive of the mailing list for the glibc project.

Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

In the hope of being helpful: what you describe below is called _transcription_,
not transliteration.

Transliteration is what we have done with ISO 9 or GOST 7.79 System A
and it could be the same for all languages indeed.

The transcription can be phonetic or serve other purposes and depends on
the target language or use case. We have used the GOST 7.79 System B.
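
To make the distinction concrete, here is a minimal Python sketch of
transliteration in the ISO 9 sense, using the name quoted below. The table
covers only the letters in that one example, and the names ISO9,
transliterate and reverse are illustrative, not anything from glibc. Because
each Cyrillic letter maps to exactly one Latin character, the mapping can be
inverted, which a language-specific transcription generally cannot offer.

```python
# Partial ISO 9 table: only the letters needed for the quoted example.
ISO9 = {
    "Б": "B", "Н": "N", "Е": "E",
    "а": "a", "в": "v", "е": "e", "и": "i", "к": "k", "л": "l",
    "н": "n", "о": "o", "р": "r", "с": "s",
    "ц": "c",
    "ч": "\u010d",  # c with caron
    "ь": "\u02b9",  # modifier letter prime
}

# One-to-one mapping, so the inverse table is well defined.
ISO9_INVERSE = {latin: cyr for cyr, latin in ISO9.items()}


def transliterate(text):
    """Map each character through the table; pass anything else through."""
    return "".join(ISO9.get(ch, ch) for ch in text)


def reverse(text):
    """Recover the Cyrillic original from the transliterated form."""
    return "".join(ISO9_INVERSE.get(ch, ch) for ch in text)


name = "Борис Николаевич Ельцин"
latin = transliterate(name)
print(latin)
print(reverse(latin) == name)
```

The Finnish and French forms quoted below cannot be produced by a table of
this shape alone, since they depend on the position of a letter in the word
(for example the initial "Е" becoming "Je" or "Ie").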


On 09.10.2018 18:10, Marko Myllynen wrote:
> Hi,
> On 2018-10-09 01:04, Rafal Luzynski wrote:
>> Particularly, I think that those rules will not be helpful at all for
>> the languages which use neither Latin nor Cyrillic alphabet.
> This is certainly a very good point.
>> If you refer to other languages than Russian which also use the Cyrillic
>> alphabet but need different transliteration rules than Russian for
>> the same characters then it is OK for me now.  I am afraid that the iconv
>> algorithm does not handle such case.  Of course, we should add this missing
>> feature eventually but I do not volunteer to do it now.
> Yes, this would be needed for correct transliteration of different
> languages, and this might be quite a bit of work. There's also the case
> of transliteration and character sets, consider the transliteration
> examples from
> Russian:        Борис Николаевич Ельцин
> Int'l:          Boris Nikolaevič Elʹcin
> Finnish:        Boris Nikolajevitš Jeltsin
> French:         Boris Nikolaïevitch Ieltsine
> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
> For French you'll get the correct transliteration with iconv by using -t
> ISO-8859-1//TRANSLIT, for Finnish with -t ISO-8859-15//TRANSLIT but it's
> not so obvious how to get the above kind of transliteration for ISO 9
> international or especially for the phonetic case.
> One thing that might be helpful here could be something like:
> $ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
> ž
> That is, force transliteration of each character (if defined) even if
> it's part of the target character set. AFAICS this is not currently
> possible.
>> But, while at this, is there anything that stops us from adding transliteration
>> rules for additional Cyrillic characters not used in Russian but used in
>> other languages?
> This would probably make sense.
> FWIW, for Finnish the diff for Russian to be applied in the locale on
> top of translit_cyrillic (ISO 9) rules would be something like below, I
> still need to check whether there are rules needed for other languages
> than Russian that could be added (I hope to submit a proper patch
> against fi_FI shortly after translit_cyrillic has landed):
> <U0446> "<U0074><U0073>"
> <U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
> <U0448> "<U0161>";"<U0073><U0068>"
> <U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
> <U044A> ""
> <U044C> ""
> <U044D> "<U0065>"
> <U044E> "<U006A><U0075>"
> <U044F> "<U006A><U0061>"
> <U0451> "<U006A><U006F>"
> Thanks,
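
For readers of the archive, the semicolon-separated alternatives in the
quoted fi_FI rules can be read as "use the first candidate that fits the
target character set". The following is only a rough Python sketch of that
reading, not glibc's implementation: FI_RULES is copied from the quoted diff,
while encodable, translit and the "?" fallback are assumptions made for
illustration.

```python
# Rules copied from the quoted fi_FI diff; candidates are tried in order.
FI_RULES = {
    "\u0446": ["ts"],
    "\u0447": ["t\u0161", "tsh"],
    "\u0448": ["\u0161", "sh"],
    "\u0449": ["\u0161t\u0161", "shtsh"],
    "\u044a": [""],
    "\u044c": [""],
    "\u044d": ["e"],
    "\u044e": ["ju"],
    "\u044f": ["ja"],
    "\u0451": ["jo"],
}


def encodable(s, charset):
    """Return True if every character of s exists in the target charset."""
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, charset, rules=FI_RULES):
    out = []
    for ch in text:
        if encodable(ch, charset):
            out.append(ch)  # already representable, so left untouched
            continue
        for candidate in rules.get(ch, []):
            if encodable(candidate, charset):
                out.append(candidate)
                break
        else:
            out.append("?")  # assumed fallback when no rule applies
    return "".join(out)


print(translit("\u0447", "iso8859-15"))  # first candidate fits Latin-9
print(translit("\u0447", "ascii"))       # falls back to the ASCII candidate
print(translit("\u0447", "utf-8"))       # unchanged: already representable
```

The last call shows why something like the TRANSLIT_FORCE idea above would be
needed: with UTF-8 as the target every character is already representable, so
no rule is ever consulted.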
