This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] locale/C-translit.h.in: Greek -> ASCII transliteration table [BZ #12031]


Carlos, Rafal,

here is another patch for an ASCII transliteration bug, [BZ #12031], this time
for Greek.

You were instrumental in getting the other transliteration patch [BZ #2872]
approved, so I want to make you aware of this one.


To be clear, this patch has nothing to do with Cyrillic. It is purely a
Greek -> ASCII transliteration table, but it has exactly the same structure as
the [BZ #2872] patch, so it would make sense for the two of you to re-run the
same tests you did for [BZ #2872].

Since this is Greek, there may of course be other considerations as well. I am
happy to hear from anyone about this at any time.

Best regards,
Egor



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, September 4, 2019 9:31 AM, Diego (Egor) Kobylkin <egor@kobylkin.com> wrote:

> Dear locale maintainers,
> 

> This patch fixes glibc bug 12031, "iconv -t ascii//translit with Greek
> characters" [1], by adding Greek transliteration rows to
> locale/C-translit.h.in.
> 

> This work follows on the heels of the successfully committed patch for
> virtually the same bug, [BZ #2872], which concerned Cyrillic characters. [2]
> 

> AFAIK there are many versions of Greek to ASCII transcription tables.
> Given that the current iconv logic can only transliterate one symbol to many,
> but not many symbols to many, we take the "Standard" part of
> the Romanization_of_Greek#Modern_Greek table [3]
> 

> and only keep the single-letter Greek graphemes. That "standard" seems to be
> close to ELOT 743, but not identical to it.
> 

> So we omit things like M and Μπ being transliterated as M and B respectively.
> Instead, Μπ will be treated as two separate graphemes and transliterated as Mp.
> 

> Here is the list of standards I have collected so far. There doesn't seem
> to be a way to harmonize them all into one, but if anyone wants to propose a
> solution, please do.
> 

> -   ΕΛΟΤ 743 https://www.teicrete.gr/users/kutrulis/Ergalia/ELOT743.htm Passports.
> -   ISO 843 https://en.wikipedia.org/wiki/ISO_843
> -   ALA-LC https://www.loc.gov/catdir/cpso/romanization/greek.pdf Book titles.
> -   BGN/PCGN http://libraries.ucsd.edu/bib/fed/USBGN_romanization.pdf
>     http://geonames.nga.mil/gns/html/Romanization/Romanization_Greek.pdf Geographical names.
>
> Furthermore, to cover the whole U+0370-U+03FF Greek/Coptic Unicode range, I
> have asked around and made a best-effort transliteration for the remaining
> characters not covered by the above standards.
>
> If you have better sources for the actual translit entries, please send
> your feedback!
>
> The patch is attached.
>
> Best regards,
> Egor Kobylkin
>
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=12031
> [2] https://sourceware.org/ml/libc-alpha/2019-07/msg00477.html
> [3] https://en.wikipedia.org/wiki/Romanization_of_Greek#Modern_Greek
>
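
To illustrate the one-to-many constraint described in the quoted message, here
is a minimal Python sketch. It is not the patch and not glibc code: the table
GREEK_TO_ASCII, its handful of rows, and the helpers translit() and uncovered()
are all hypothetical names and values chosen for illustration. It shows why Μπ
comes out as "Mp" when each code point is looked up on its own, and how one
might list assigned code points in U+0370-U+03FF that a table does not yet cover.

```python
import unicodedata

# Hypothetical excerpt of a one-to-many table: one Greek code point maps to
# an ASCII string of any length. These rows are illustrative assumptions,
# not the rows from the attached patch.
GREEK_TO_ASCII = {
    "\u039c": "M",   # GREEK CAPITAL LETTER MU
    "\u03bc": "m",   # GREEK SMALL LETTER MU
    "\u03a0": "P",   # GREEK CAPITAL LETTER PI
    "\u03c0": "p",   # GREEK SMALL LETTER PI
    "\u03a8": "Ps",  # GREEK CAPITAL LETTER PSI (one symbol to two)
    "\u03c8": "ps",  # GREEK SMALL LETTER PSI
    "\u03b1": "a",   # GREEK SMALL LETTER ALPHA
}


def translit(text, table=GREEK_TO_ASCII, fallback="?"):
    """Transliterate code point by code point; no multi-symbol lookahead."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            # Each code point is looked up alone, so a digraph such as
            # MU + PI can never be mapped to "B" as a unit.
            out.append(table.get(ch, fallback))
    return "".join(out)


def uncovered(table, first=0x0370, last=0x03FF):
    """Return assigned code points in [first, last] missing from the table."""
    return [
        cp
        for cp in range(first, last + 1)
        if unicodedata.name(chr(cp), None) is not None and chr(cp) not in table
    ]
```

With this table, translit("\u039c\u03c0") returns "Mp", and uncovered() reports
every assigned Greek/Coptic code point that still lacks a row, skipping
unassigned positions such as U+0378.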

Attachment: publickey - egor@kobylkin.com - 0x01FEB4E8.asc
Description: application/pgp-keys

Attachment: signature.asc
Description: OpenPGP digital signature

