Sourceware Bugzilla – Bug 12031
iconv -t ascii//translit with Greek characters
Last modified: 2012-04-28 19:40:48 UTC
I don't know whether the example below is a bug or a feature request:
TITLE=`printf "„Χρώματα“" | sed -e "s/_/ /g;s/'/ /g;s/\+/ /g;s/.*/\U&/"`
TITLE2=$(printf "$TITLE\n" | iconv -t ascii//translit)
What would the transliteration look like? And is it locale-independent?
In general, Greek transliteration (Greeklish) can be orthographically correct or just phonetically correct.
[ISO 843]: This is orthographically correct and provides a complete transliteration map from Greek characters to Latin ones. It also includes accented letters and digraphs.
Examples using ISO 843:
Ψάρι : Psári (notice 1-letter Ψ -> 2-letters Ps)
Όργανο : Órgano
Φλάουτο : Fláouto (notice exception double vowel ου -> ou)
Φύτρο : Fýtro
Αυτοκίνητο : Autokínīto
Αυγό : Augó
Φεγγάρι : Feggári
[UN ELOT 743]: This is much like ISO 843. The key difference is that it has different rules for consonant and vowel digraphs, which makes them phonetically correct. It also uses more mainstream Latin letters, such as "i" instead of the "ī" used by ISO 843.
Αυτοκίνητο : Aftokínito (notice that αυ becomes af to be phonetically correct)
Αυγό : Avgó (notice that here αυ becomes av to be phonetically correct)
Φεγγάρι : Fengári (notice that γγ becomes ng to be phonetically correct)
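The digraph rules in the examples above could be sketched roughly as follows. This is only a minimal Python illustration, not the ELOT 743 standard: the function name `translit_phonetic`, the single-letter table, and the set of voiceless consonants are assumptions made for this sketch. It handles lowercase output only, and it drops accents instead of carrying them over as the examples do.

```python
import unicodedata

# Hypothetical single-letter table; not a complete or authoritative
# ELOT 743 mapping, just enough to run the examples from this comment.
SINGLE = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "i",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y",
    "φ": "f", "χ": "ch", "ψ": "ps", "ω": "o",
}
# Assumed set of voiceless consonants: αυ/ευ become af/ef before these.
VOICELESS = set("θκξπστφχψ")


def translit_phonetic(word):
    # Lowercase and remove accents so the digraph checks see plain letters.
    # Caveat: this also removes a diaeresis, so αϋ is treated like αυ.
    decomposed = unicodedata.normalize("NFD", word.lower())
    text = "".join(c for c in decomposed if not unicodedata.combining(c))

    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        following = text[i + 2:i + 3]
        if pair in ("αυ", "ευ"):
            # f before a voiceless consonant or at the end of the word, else v
            voiceless = following == "" or following in VOICELESS
            out.append(SINGLE[pair[0]] + ("f" if voiceless else "v"))
            i += 2
        elif pair == "γγ":
            out.append("ng")
            i += 2
        elif pair == "ου":
            out.append("ou")
            i += 2
        else:
            # Characters outside the table are passed through unchanged.
            out.append(SINGLE.get(text[i], text[i]))
            i += 1
    return "".join(out)


for greek in ("Αυτοκίνητο", "Αυγό", "Φεγγάρι"):
    print(greek, ":", translit_phonetic(greek))
```

The point of the sketch is that a phonetic scheme needs one character of lookahead, which a plain character-to-character table cannot express.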
In everyday usage (SMS, IM, forums, etc.) no one uses accented Greeklish. There is no standard conversion table, and most people use only phonetically correct and sometimes visually correct transliteration.
Ψάρι : Psari
Όργανο : Organo
Φλάουτο : Flaouto, Flauto (you can drop the "o" because "ou" sounds like "u")
Φύτρο : Fytro, Fitro
Αυτοκίνητο : Aftokinito, Aytokinhto
Αυγό : Avgo, Aygo
Φεγγάρι : Feggari, Fegari
Θα έρθω : Tha erthw, Tha ertho, 8a er8w, 8a er8o ("8" is often used for "θ" because the two look somewhat alike, i.e. it is visually correct)
Accents are not used because people usually have Greek and US/English keyboard layouts, and the US/English layout has no accents. Idioms like θ -> 8, which try to mimic letter shapes, can be confusing, so they can be skipped. What I would like to see is a transliteration that users can also type themselves. For example, ISO 843 is not a good choice because η => ī, and I have no idea which keyboard layout I would need to type this accented i.
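To illustrate the "typeable" requirement, here is a minimal Python sketch that takes an accented Latin transliteration and removes the marks, leaving only base letters. The helper name `strip_marks` is hypothetical; the sketch relies only on Unicode decomposition (NFD) and is not part of iconv or of any standard.

```python
import unicodedata


def strip_marks(text):
    # Decompose each accented letter into base letter + combining mark,
    # then keep only the characters that are not combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


for word in ("Psári", "Fýtro", "Autokínīto"):
    print(word, "->", strip_marks(word))
```

Applied to the ISO 843 examples, this turns the macron "ī" and the acute-accented vowels into plain letters that exist on a US/English layout.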
Of course, if it is possible to support multiple systems, that would be best for everyone.
In any case, many higher-level frameworks need transliteration, and it is very annoying to special-case Greek if you are based on iconv. Any solution is welcome :)
Please ask if you need more information.
Absolutely any transliteration scheme is good if it gives some ASCII characters instead of the exception this function throws now.
I have a similar problem with later versions of iconv (2.13 in Ubuntu).
iconv -t ascii//TRANSLIT <<< 'æ,ø,å'
gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
Tested this on several machines with the same version (2.13) and on an old SunOS box with 1.9. The latter returned the desired result.
My LC_ALL and LANG variables are all set to nb_NO.UTF-8 and I've tried changing it to other available locales, without getting the wanted result.
Is this a bug?
(In reply to comment #4)
> gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
> Is this a bug?
I believe it is a bug.
The request to change transliteration for æøå is http://sourceware.org/bugzilla/show_bug.cgi?id=89 . Please explain there why you believe it should transliterate to ae,o,a and not ae,oe,aa.
Created attachment 6380
I have created a first version of a file to use for Greeklish (Greek to ASCII) transliteration.
The conversion scheme is:
alpha -> a
beta -> b
gamma -> g
delta -> d
epsilon -> e
zeta -> z
eta -> h
theta -> 8
iota -> i
kappa -> k
lamda -> l
mu -> m
nu -> n
xi -> ks
omikron -> o
pi -> p
ro -> r
sigma -> s
tau -> t
ypsilon -> y
phi -> f
chi -> x
psi -> ps
omega -> w
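For readers who want to try the scheme outside of iconv, the table above can be written down as a small Python sketch. The names `GREEKLISH` and `to_greeklish` are hypothetical, only lowercase letters are covered, and the final sigma entry is an assumption that is not part of the list above.

```python
# One-to-one copy of the conversion scheme listed above (lowercase only).
GREEKLISH = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "h", "θ": "8", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "τ": "t", "υ": "y", "φ": "f", "χ": "x", "ψ": "ps", "ω": "w",
    "ς": "s",  # assumption: final sigma treated like sigma
}

# str.maketrans accepts single-character keys mapped to replacement strings.
TABLE = str.maketrans(GREEKLISH)


def to_greeklish(text):
    # Characters that are not in the table pass through unchanged.
    return text.translate(TABLE)


print(to_greeklish("θα ερθω"))
print(to_greeklish("αυτοκινητο"))
```

Note that accented letters such as ώ are not in this table, so they pass through untouched unless each one gets its own entry.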
From my experiments, I realized that there is no "chained" transliteration.
By this I mean that I had to specify the Greeklish transliteration for every accented version of a letter, even though I had already specified it for the plain one.
ETA with PERISPOMENI -> ETA (this is already in translit_combining)
ETA -> H (this is my addition)
If I try to convert "ETA with PERISPOMENI" to ASCII, I get "?". I had to edit the rule to this:
ETA with PERISPOMENI -> ETA;H
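The behavior I observed can be modeled with a small Python sketch. This is only my reading of the lookup, not glibc's actual implementation: the rule tables and the function names are hypothetical, lowercase letters are used because U+1FC6 (small eta with perispomeni) is the precomposed form, and each rule is an ordered list of candidates in which the first one that fits the target charset wins. The candidates themselves are never looked up again, which is the missing "chaining".

```python
def encodable(s, charset):
    # True if every character of s exists in the target charset.
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def translit_char(ch, rules, charset="ascii"):
    if encodable(ch, charset):
        return ch
    # Try the candidates in order; a candidate is NOT transliterated again.
    for candidate in rules.get(ch, []):
        if encodable(candidate, charset):
            return candidate
    return "?"


def translit(text, rules, charset="ascii"):
    return "".join(translit_char(c, rules, charset) for c in text)


# Hypothetical tables mirroring the rules quoted above.
UNCHAINED = {
    "\u1fc6": ["\u03b7"],        # eta with perispomeni -> eta
    "\u03b7": ["h"],             # eta -> h
}
WITH_FALLBACK = {
    "\u1fc6": ["\u03b7", "h"],   # eta with perispomeni -> eta;h
    "\u03b7": ["h"],
}

print(translit("\u1fc6", UNCHAINED))       # no ASCII candidate available
print(translit("\u1fc6", WITH_FALLBACK))   # falls through to the second candidate
```

Under this model, every accented letter needs its own ASCII fallback appended to its rule, which matches what I had to do in the attached file.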