Bug 12031 - iconv -t ascii//translit with Greek characters
Status: NEW
Product: glibc
Classification: Unclassified
Component: localedata
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assigned To: GNU C Library Locale Maintainers
Reported: 2010-09-17 15:09 UTC by Al Bogner
Modified: 2012-04-28 19:40 UTC


Attachments
Greeklish trasliteration (19.38 KB, application/octet-stream)
2012-04-28 19:40 UTC, Nick Andrik

Description Al Bogner 2010-09-17 15:09:59 UTC
I don't know whether the example below is a bug or a feature request:


TITLE=`printf "„Χρώματα“" | sed -e "s/_/ /g;s/'/ /g;s/\+/ /g;s/.*/\U&/"`

printf "$TITLE\n"
„ΧΡΏΜΑΤΑ“

TITLE2=$(printf "$TITLE\n" | iconv -t ascii//translit)

printf "$TITLE2\n"
,,???????"
Comment 1 Ulrich Drepper 2011-05-15 04:43:52 UTC
What would the transliteration look like?  And is it locale-independent?
Comment 2 squarious 2011-06-26 15:17:00 UTC
In general, Greek transliteration (Greeklish) can be either orthographically
correct or just phonetically correct.

Standards [1][3]
=======
[ISO 843]: orthographically correct. It defines a complete transliteration map
from Greek characters to Latin ones, including accented letters and digraphs.
http://en.wikipedia.org/wiki/ISO_843

Examples using ISO 843:
Ψάρι : Psári   (note: the single letter Ψ becomes the two letters Ps)
Όργανο : Órgano
Φλάουτο : Fláouto (note the exception: the double vowel ου -> ou)
Φύτρο : Fýtro
Αυτοκίνητο : Autokínīto
Αυγό : Augó
Φεγγάρι : Feggári
Μπάμια : Mpámia
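
As a rough illustration of the orthographic approach, here is a minimal Python sketch that reproduces the examples above. It is not an implementation of the standard: the table `ISO843_BASE`, the helper name `iso843_sketch`, and the rule that υ is written "u" after α, ε or ο are assumptions based on the examples and on my reading of ISO 843, so check them against the standard before relying on them. Accents are kept by decomposing each letter (NFD) and recomposing the Latin result (NFC).

```python
import unicodedata

# Assumed lowercase base-letter table; verify against ISO 843 itself.
ISO843_BASE = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "ī", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "ō",
}
# Assumption: in the digraphs αυ, ευ, ου the υ is written "u", not "y".
VOWELS_BEFORE_U = {"α", "ε", "ο"}


def iso843_sketch(text):
    """Letter-by-letter transliteration that keeps the accent marks."""
    out = []
    prev_base = ""  # previous Greek base letter, if directly adjacent
    for ch in unicodedata.normalize("NFD", text):
        low = ch.lower()
        if low not in ISO843_BASE:
            # Combining accents, spaces and punctuation pass through.
            out.append(ch)
            prev_base = ""
            continue
        latin = ISO843_BASE[low]
        if low == "υ" and prev_base in VOWELS_BEFORE_U:
            latin = "u"
        if ch != low:  # the Greek letter was upper case
            latin = latin.capitalize()
        out.append(latin)
        prev_base = low
    return unicodedata.normalize("NFC", "".join(out))


for word in ["Ψάρι", "Όργανο", "Φλάουτο", "Φύτρο", "Αυτοκίνητο"]:
    print(word, ":", iso843_sketch(word))
```

The sketch ignores details such as the dialytika (αϋ) and all-caps words, which a real table would have to cover.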

----
[UN ELOT 743]: It is much like ISO 843. The key difference is that it has
different rules for consonant and vowel digraphs, which makes them
phonetically correct. It also uses more mainstream Latin letters, such as "i"
rather than the "ī" used by ISO 843.

Αυτοκίνητο : Aftokínito (note that αυ becomes af to be phonetically correct)
Αυγό : Avgó (note that here αυ becomes av to be phonetically correct)
Φεγγάρι : Fengári (note that γγ becomes ng to be phonetically correct)
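
The context-dependent digraphs are the part that a plain character table cannot express. A minimal Python sketch of the idea follows. It assumes the usual rule that αυ is read "af" before a voiceless consonant or at the end of a word and "av" otherwise; the names `VOICELESS`, `BASE` and `phonetic_sketch` are made up for illustration, and accents are dropped to keep the example short.

```python
import unicodedata

# Assumption: before these consonants (or at word end) αυ is read "af".
VOICELESS = set("θκξπστφχψ")

# Assumed phonetic base table; not copied from the standard.
BASE = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "i",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y",
    "φ": "f", "χ": "ch", "ψ": "ps", "ω": "o",
}


def strip_accents(text):
    """Decompose and drop combining marks (tonos, dialytika, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def phonetic_sketch(word):
    """Transliterate one word, resolving αυ, γγ and ου by context."""
    letters = strip_accents(word).lower()
    out = []
    i = 0
    while i < len(letters):
        pair = letters[i:i + 2]
        nxt = letters[i + 2:i + 3]  # letter after the digraph, "" at word end
        if pair == "αυ":
            out.append("af" if nxt == "" or nxt in VOICELESS else "av")
            i += 2
        elif pair == "γγ":
            out.append("ng")
            i += 2
        elif pair == "ου":
            out.append("ou")
            i += 2
        else:
            out.append(BASE.get(letters[i], letters[i]))
            i += 1
    result = "".join(out)
    return result.capitalize() if word[:1].isupper() else result


for word in ["Αυτοκίνητο", "Αυγό", "Φεγγάρι"]:
    print(word, ":", phonetic_sketch(word))
```

Because the accents are stripped first, a word written with a dialytika (αϋ) would wrongly be treated as the digraph; a fuller version would need to check for that mark before stripping.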

-----
[ALA-LC]: http://www.loc.gov/catdir/cpso/romanization/greek.pdf

Everyday greeklish
===========
In everyday usage (SMS, IM, forums, etc.) no one uses accented Greeklish.
There is no standard conversion table, and most people use a phonetically
correct and sometimes visually correct transliteration.

Ψάρι : Psari
Όργανο : Organo
Φλάουτο : Flaouto, Flauto (the "o" in "ou" can be dropped because "ou" sounds
like "u")
Φύτρο : Fytro, Fitro
Αυτοκίνητο : Aftokinito, Aytokinhto
Αυγό : Avgo, Aygo
Φεγγάρι : Feggari, Fegari
Θα έρθω: Tha erthw, Tha ertho, 8a er8w, 8a er8o  ("8" is often used for "θ"
because it looks somewhat similar: visually correct)

Personal opinion
==========
Accents are not used because people usually have Greek and US/English keyboard
layouts, and the US/English layout has no accents. Idioms like θ -> 8, which
try to mimic the shape of letters, can be confusing, so they can be skipped.
What I would like to see is a transliteration that users can also type
themselves. E.g. ISO 843 is not a good one because of η => ī: I have no clue
which layout I would need to type this accented i.
Of course, if it is possible to support multiple systems, that would be best
for all.

In any case, there are many higher-level frameworks that need transliteration,
and it is very annoying to special-case Greek if you rely on iconv. Any
solution is welcome :)

Please ask if you need more information.

Info - References
----------------------------
[1] http://transliteration.eki.ee/pdf/Greek.pdf
[2] http://en.wikipedia.org/wiki/Greeklish
[3] http://en.wikipedia.org/wiki/Romanization_of_Greek
Comment 3 -EMail Hidden- 2011-09-24 02:33:54 UTC
Absolutely any transliteration scheme is good if it gives some ASCII characters
instead of the exception this function raises now.
Comment 4 Alexander Karlstad 2012-02-03 19:28:54 UTC
I have a similar problem with later versions of iconv (2.13 in Ubuntu).

iconv -t ascii//TRANSLIT <<< 'æ,ø,å'

gives me "ae,?,a" but in my opinion it should give me "ae,o,a".

Tested this on several machines with the same version (2.13) and on an old
SunOS box with 1.9. The latter returned the desired result.

My LC_ALL and LANG variables are all set to nb_NO.UTF-8 and I've tried changing
it to other available locales, without getting the wanted result.

Is this a bug?
Comment 5 Petter Reinholdtsen 2012-02-04 11:20:39 UTC
(In reply to comment #4)
> gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
[...]
> Is this a bug?

I believe it is a bug.

The request to change transliteration for æøå is
http://sourceware.org/bugzilla/show_bug.cgi?id=89 .  Please explain there why
you believe it should transliterate to ae,o,a and not ae,oe,aa.
Comment 6 Nick Andrik 2012-04-28 19:40:48 UTC
Created attachment 6380 [details]
Greeklish trasliteration

I have created a first version of a file to use for greeklish (greek to ascii)
transliteration.

The conversion scheme is:

alpha -> a
beta -> b
gamma -> g
delta -> d
epsilon -> e
zeta -> z
eta -> h
theta -> 8
iota -> i
kappa -> k
lamda -> l
mu -> m
nu -> n
xi -> ks
omikron -> o
pi -> p
ro -> r
sigma -> s
tau -> t
ypsilon -> y
phi -> f
chi -> x
psi -> ps
omega -> w
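
For readers who want to try the scheme without the attachment, the list above can be written as a plain lookup table. This is only a Python sketch of the mapping, not the attached file: the names `GREEKLISH` and `greeklish` are made up, the final-sigma entry is an assumption (it is not in the list), and accents are simply dropped after NFD decomposition.

```python
import unicodedata

# The scheme from the list above, keyed by lowercase Greek letter.
GREEKLISH = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "h", "θ": "8", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s",  # final sigma: not in the list, assumed to follow sigma
    "τ": "t", "υ": "y", "φ": "f", "χ": "x", "ψ": "ps", "ω": "w",
}


def greeklish(text):
    """Map Greek letters to ASCII; anything unknown is left untouched."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue  # drop tonos, dialytika and other combining marks
        low = ch.lower()
        latin = GREEKLISH.get(low)
        if latin is None:
            out.append(ch)
        elif ch != low:  # upper-case Greek letter
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(greeklish("Θα έρθω"))   # 8a er8w
print(greeklish("Χρώματα"))   # Xrwmata
```

Note that the decomposition step also strips accents from any Latin text passed through the function.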

From my experiments I realized that there is no "chained" transliteration.
By this I mean that I had to specify the Greeklish transliterations for all
accented versions of letters, even though I had already specified one for the
plain letter.

Example:
ETA with PERISPOMENI -> ETA (this is already in translit_combining)
ETA -> H (this is my addition)
If I try to convert "ETA with PERISPOMENI" to ASCII, I get "?", so I had to
change the rule to this:
ETA with PERISPOMENI -> ETA;H
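
The difference between the two behaviours can be modelled with a small Python sketch. This is a toy model of what the comment describes, not glibc's code: `RULES`, `WORKAROUND`, `single_pass` and `chained` are hypothetical names, lowercase letters are used instead of the capitals named above, and it assumes that the alternatives of a rule are tried in order and that the first one representable in the target charset wins.

```python
# Each rule lists its alternatives in order, like "ETA;H" in the comment.
RULES = {
    "\u1fc6": ["\u03b7"],  # eta with perispomeni -> eta
    "\u03b7": ["h"],       # eta -> h
}

# The workaround: repeat the final ASCII target in the accented rule.
WORKAROUND = {
    "\u1fc6": ["\u03b7", "h"],
    "\u03b7": ["h"],
}


def single_pass(ch, rules):
    """Try each alternative once, without looking it up again."""
    for cand in rules.get(ch, []):
        if cand.isascii():
            return cand
    return "?"


def chained(ch, rules, depth=4):
    """Follow rules transitively until everything is ASCII (or give up)."""
    if ch.isascii():
        return ch
    if depth == 0:
        return None
    for cand in rules.get(ch, []):
        parts = [chained(c, rules, depth - 1) for c in cand]
        if all(p is not None for p in parts):
            return "".join(parts)
    return None


print(single_pass("\u1fc6", RULES))       # ?  (the observed behaviour)
print(single_pass("\u1fc6", WORKAROUND))  # h  (rule edited by hand)
print(chained("\u1fc6", RULES))           # h  (what chaining would give)
```

With chaining, only the plain letters would need Greeklish entries; without it, every accented form has to repeat the ASCII target, as in the edited rule above.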