This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH COMMITTED] locale/C-translit.h.in: Cyrillic -> ASCII transliteration [BZ #2872]


Rafal, 


Let's revisit this once we have more input after the release.


The letters that are in GOST 7.79 System B are already transliterated as we agreed. This is the only standard we had considered for Cyrillic-ASCII.

The remaining letters below seem to be rare or outdated, and some are irregular in their respective languages. As you point out, we should of course generally aim for consistency. Once the patch is in, we may hear from the people directly concerned and can integrate their input as well.


Hope this helps,
Egor


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, July 22, 2019 10:53 PM, Rafal Luzynski <digitalfreak@lingonborough.com> wrote:

> Egor,
> 

> Here are my doubts and questions about the patch which I have
> committed. If they are resolved before the final release,
> it will be fine. If not - fine as well.
> 

> Sorry if these were already discussed and answered; I am losing
> track of the details.
> 

> 20.07.2019 22:01 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
> 

> > [...]
> > 

> > - sysdeps/unix/sysv/linux/syscall-names.list: Add system calls
> >
> > diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
> > index d5f00df0f3..758171c394 100644
> > --- a/locale/C-translit.h.in
> > +++ b/locale/C-translit.h.in
> > @@ -56,6 +56,175 @@
> >  "\x02cd" "_" # <U02CD> MODIFIER LETTER LOW MACRON
> >  "\x02d0" ":" # <U02D0> MODIFIER LETTER TRIANGULAR COLON
> >  "\x02dc" "~" # <U02DC> SMALL TILDE

> 

> There are gaps. For example, here
> <U0400> CYRILLIC CAPITAL LETTER IE WITH GRAVE (Ѐ)
> is missing. Should we add it and transliterate as, e.g., "E`"?
> 

> > +"\x0401" "YO" # <U0401> CYRILLIC CAPITAL LETTER IO
> > +"\x0402" "DJ" # <U0402> CYRILLIC CAPITAL LETTER DJE
> > +"\x0403" "G`" # <U0403> CYRILLIC CAPITAL LETTER GJE
> > +"\x0404" "YE" # <U0404> CYRILLIC CAPITAL LETTER UKRAINIAN IE
> > +"\x0405" "Z`" # <U0405> CYRILLIC CAPITAL LETTER DZE
> > +"\x0406" "I" # <U0406> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
> > +"\x0407" "YI" # <U0407> CYRILLIC CAPITAL LETTER YI
> > +"\x0408" "J" # <U0408> CYRILLIC CAPITAL LETTER JE
> > +"\x0409" "L`" # <U0409> CYRILLIC CAPITAL LETTER LJE
> > +"\x040a" "N`" # <U040A> CYRILLIC CAPITAL LETTER NJE
> 

> Isn't this ambiguous if we transliterate:
> 

> "Љ" -> "L`"
> "Њ" -> "N`"
> 

> but also:
> 

> "Ль" -> "L`"
> "Нь" -> "N`"
> 

> ?
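
A minimal sketch of this collision, assuming a purely per-character lookup. The table name `TRANSLIT` and the function names are hypothetical, and the entries for Л and Н are assumed (they fall in an elided part of the quoted hunk); this is not how glibc itself applies the table.

```python
# Hypothetical per-character table. Values for U+0409, U+040A and U+044C are
# copied from the quoted patch; the entries for EL and EN are assumptions.
TRANSLIT = {
    "\u0409": "L`",  # CYRILLIC CAPITAL LETTER LJE
    "\u040a": "N`",  # CYRILLIC CAPITAL LETTER NJE
    "\u041b": "L",   # CYRILLIC CAPITAL LETTER EL (assumed)
    "\u041d": "N",   # CYRILLIC CAPITAL LETTER EN (assumed)
    "\u044c": "`",   # CYRILLIC SMALL LETTER SOFT SIGN
}


def transliterate(text):
    """Replace each character by its table entry; keep unknown characters."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


def collide(a, b):
    """True if two different inputs produce the same ASCII output."""
    return a != b and transliterate(a) == transliterate(b)


# LJE versus EL followed by SOFT SIGN
print(collide("\u0409", "\u041b\u044c"))
# NJE versus EN followed by SOFT SIGN
print(collide("\u040a", "\u041d\u044c"))
```

Under these assumptions both pairs map to the same output, so the original spelling cannot be recovered from the ASCII text alone.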
> 

> > +"\x040b" "TSH" # <U040B> CYRILLIC CAPITAL LETTER TSHE
> > +"\x040c" "K`" # <U040C> CYRILLIC CAPITAL LETTER KJE
> > +"\x040e" "U`" # <U040E> CYRILLIC CAPITAL LETTER SHORT U
> 

> <U040D> CYRILLIC CAPITAL LETTER I WITH GRAVE (Ѝ)
> is missing here. Shouldn't we add it? "I`" maybe?
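
Gaps like this one can be listed mechanically. The following is a sketch only: `COVERED` is a hand-copied set of the code points that appear in the quoted hunk for U+0400..U+040F, and `find_gaps` is a hypothetical helper, not part of glibc.

```python
import unicodedata

# Code points in U+0400..U+040F that have an entry in the quoted hunk.
COVERED = {
    0x0401, 0x0402, 0x0403, 0x0404, 0x0405, 0x0406, 0x0407,
    0x0408, 0x0409, 0x040A, 0x040B, 0x040C, 0x040E, 0x040F,
}


def find_gaps(covered, first, last):
    """Return assigned code points in [first, last] that have no entry."""
    return [
        cp
        for cp in range(first, last + 1)
        if cp not in covered and unicodedata.category(chr(cp)) != "Cn"
    ]


for cp in find_gaps(COVERED, 0x0400, 0x040F):
    print(f"<U{cp:04X}> {unicodedata.name(chr(cp))}")
```

For this range the check reports exactly the two letters with grave mentioned in this message, U+0400 and U+040D.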
> 

> > +"\x040f" "DH" # <U040F> CYRILLIC CAPITAL LETTER DZHE
> > +"\x0410" "A" # <U0410> CYRILLIC CAPITAL LETTER A
> > +"\x0411" "B" # <U0411> CYRILLIC CAPITAL LETTER BE
> > [...]
> 

> > [...]
> > +"\x042a" "A`" # <U042A> CYRILLIC CAPITAL LETTER HARD SIGN
> > [...]
> > +"\x044a" "``" # <U044A> CYRILLIC SMALL LETTER HARD SIGN
> > [...]
> 

> This is slightly reordered to illustrate my question. Isn't it a problem
> that the uppercase hard sign is transliterated to "A`" while the lowercase
> is transliterated to "``"? My doubt is that the transliterated graphemes
> are not each other's upper/lower case variants. If you look at the soft
> sign:
> 

> > [...]
> > +"\x042c" "`" # <U042C> CYRILLIC CAPITAL LETTER SOFT SIGN
> > [...]
> > +"\x044c" "`" # <U044C> CYRILLIC SMALL LETTER SOFT SIGN
> > [...]
> 

> they don't have this problem.
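
The case-pairing concern can be expressed as a simple check. This is a sketch under the assumption that a consistent table should satisfy "lowercase of the uppercase output equals the lowercase output"; `TABLE` holds only entries quoted in this message and `case_mismatches` is a hypothetical helper.

```python
# Entries copied from the quoted patch (hard sign, soft sign, letter E).
TABLE = {
    "\u042a": "A`", "\u044a": "``",  # HARD SIGN, capital and small
    "\u042c": "`",  "\u044c": "`",   # SOFT SIGN, capital and small
    "\u042d": "E`", "\u044d": "e`",  # E, capital and small
}


def case_mismatches(table):
    """List (source, upper output, lower output) where the pair disagrees."""
    mismatches = []
    for src, dst in table.items():
        low = src.lower()
        # Only inspect uppercase sources whose lowercase form has an entry.
        if low != src and low in table and table[low] != dst.lower():
            mismatches.append((src, dst, table[low]))
    return mismatches


for src, upper_out, lower_out in case_mismatches(TABLE):
    print(f"U+{ord(src):04X}: {upper_out!r} vs {lower_out!r}")
```

With these entries only the hard sign is flagged; the soft sign and E pass the check.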
> 

> > [...]
> > +"\x042d" "E`" # <U042D> CYRILLIC CAPITAL LETTER E
> > [...]
> > +"\x044d" "e`" # <U044D> CYRILLIC SMALL LETTER E
> > [...]
> > +"\x048c" "E`" # <U048C> CYRILLIC CAPITAL LETTER SEMISOFT SIGN
> > +"\x048d" "e`" # <U048D> CYRILLIC SMALL LETTER SEMISOFT SIGN
> > [...]
> 

> Isn't this again an ambiguity problem?
> 

> > +"\x045c" "k`" # <U045C> CYRILLIC SMALL LETTER KJE
> > +"\x045e" "u`" # <U045E> CYRILLIC SMALL LETTER SHORT U
> > +"\x045f" "dh" # <U045F> CYRILLIC SMALL LETTER DZHE
> 

> There is a gap here which is not critical, because this range holds some
> archaic letters which are hardly used, and it is probably difficult to find
> correct transliterations for them. But somehow you have managed to
> find a transliteration for this:
> 

> > +"\x046a" "O`" # <U046A> CYRILLIC CAPITAL LETTER BIG YUS
> > +"\x046b" "o`" # <U046B> CYRILLIC SMALL LETTER BIG YUS
> 

> Similarly, is it possible to find and provide transliterations for:
>
> - little yus (Ѧ/ѧ)?
> - iotified big yus (Ѭ/ѭ) and little yus (Ѩ/ѩ)?
>
> While we are at it, the transliteration of big yus ("O`"/"o`")
> is again ambiguous because it is the same as Abkhasian Ha (Ҩ),
> O with diaeresis (Ӧ), and barred O (Ө).

> 

> > [...]
> > +"\x049a" "K`" # <U049A> CYRILLIC CAPITAL LETTER KA WITH DESCENDER
> > +"\x049b" "k`" # <U049B> CYRILLIC SMALL LETTER KA WITH DESCENDER
> > +"\x049e" "K`" # <U049E> CYRILLIC CAPITAL LETTER KA WITH STROKE
> > +"\x049f" "k`" # <U049F> CYRILLIC SMALL LETTER KA WITH STROKE
> > +"\x04a2" "N`" # <U04A2> CYRILLIC CAPITAL LETTER EN WITH DESCENDER
> > +"\x04a3" "n`" # <U04A3> CYRILLIC SMALL LETTER EN WITH DESCENDER
> > [...]
> 

> As you can see, there are many more ambiguities. But while we are here,
> wouldn't "K," be a better transliteration for Ka with descender (Қ), and
> "N," for En with descender (Ң)?
> 

> > [...]
> > +"\x04a8" "O`" # <U04A8> CYRILLIC CAPITAL LETTER ABKHASIAN HA
> > +"\x04a9" "o`" # <U04A9> CYRILLIC SMALL LETTER ABKHASIAN HA
> 

> Is Abkhasian Ha (Ҩ) pronounced like "H"? Then why is it transliterated
> as "O" (with some additional punctuation character) instead of "H"?
> 

> There are more doubts about ambiguous transliterations and gaps which
> I don't list here for the sake of brevity. They can be easily found.
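
One way to find such ambiguities is to group the table by output string. The sketch below uses only the capital-letter entries quoted in this message; `ENTRIES` and `find_collisions` are hypothetical names, and a real check would have to parse the full C-translit.h.in instead.

```python
from collections import defaultdict

# (code point, transliteration) pairs copied from the quoted hunks.
ENTRIES = [
    (0x040A, "N`"),  # NJE
    (0x040C, "K`"),  # KJE
    (0x042D, "E`"),  # E
    (0x046A, "O`"),  # BIG YUS
    (0x048C, "E`"),  # SEMISOFT SIGN
    (0x049A, "K`"),  # KA WITH DESCENDER
    (0x049E, "K`"),  # KA WITH STROKE
    (0x04A2, "N`"),  # EN WITH DESCENDER
    (0x04A8, "O`"),  # ABKHASIAN HA
]


def find_collisions(entries):
    """Map each output shared by more than one code point to those code points."""
    by_output = defaultdict(list)
    for cp, out in entries:
        by_output[out].append(cp)
    return {out: cps for out, cps in by_output.items() if len(cps) > 1}


for out, cps in find_collisions(ENTRIES).items():
    print(out, [f"U+{cp:04X}" for cp in cps])
```

Every output in this small sample is shared by at least two source letters, which matches the ambiguities listed above.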
> 

> Regards,
> 

> Rafal

Attachment: publickey - egor@kobylkin.com - 0x01FEB4E8.asc
Description: application/pgp-keys

Attachment: signature.asc
Description: OpenPGP digital signature

