I don't know whether the example below is a bug or a feature request:
TITLE=$(printf '%s' "„Χρώματα“" | sed -e "s/_/ /g;s/'/ /g;s/\+/ /g;s/.*/\U&/")
TITLE2=$(printf '%s\n' "$TITLE" | iconv -t ascii//translit)
What would the transliteration look like? And is it locale-independent?
In general, Greek transliteration (Greeklish) can be orthographically correct or just phonetically correct.
[ISO 843]: This is orthographically correct and provides a complete transliteration map from Greek characters to Latin ones. It also covers accented letters and digraphs.
Examples using ISO 843:
Ψάρι : Psári (notice 1-letter Ψ -> 2-letters Ps)
Όργανο : Órgano
Φλάουτο : Fláouto (notice exception double vowel ου -> ou)
Φύτρο : Fýtro
Αυτοκίνητο : Autokínīto
Αυγό : Augó
Φεγγάρι : Feggári
[UN Elot 743]: It is much like ISO 843. The key difference is that it has different rules for consonant and vowel digraphs, which become phonetically correct. It also uses more mainstream Latin letters, such as "i" instead of the "ī" used by ISO 843.
Αυτοκίνητο : Aftokínito (notice that αυ becomes af to be phonetically correct)
Αυγό : Avgó (notice that here αυ becomes av to be phonetically correct)
Φεγγάρι : Fengári (notice that γγ becomes ng to be phonetically correct)
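The digraph rules described above can be sketched in a few lines of Python. This is only an illustration, not a complete ELOT 743 implementation: the function name phonetic, the BASE table and the VOICELESS set are made up for this example, accents are dropped instead of being carried over, and the af/av choice assumes the usual rule that αυ/ευ become af/ef before a voiceless consonant or at the end of a word and av/ev otherwise.

```python
import unicodedata

# Hypothetical, simplified single-letter table (accent-free output).
BASE = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}
# Assumption: αυ/ευ are devoiced (af/ef) before these letters.
VOICELESS = set("θκξπστφχψ")


def strip_accents(text):
    """Decompose the text and drop all combining marks (tonos etc.)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def phonetic(word):
    """Transliterate one word, handling the αυ/ευ, γγ and ου digraphs."""
    src = strip_accents(word).lower()
    out = []
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        following = src[i + 2:i + 3]  # empty string at the end of the word
        if pair in ("αυ", "ευ"):
            devoiced = following == "" or following in VOICELESS
            out.append(BASE[pair[0]] + ("f" if devoiced else "v"))
            i += 2
        elif pair == "γγ":
            out.append("ng")
            i += 2
        elif pair == "ου":
            out.append("ou")
            i += 2
        else:
            out.append(BASE.get(src[i], src[i]))
            i += 1
    result = "".join(out)
    # Restore an initial capital if the input word had one.
    return result.capitalize() if word[:1].isupper() else result


for w in ("Αυτοκίνητο", "Αυγό", "Φεγγάρι"):
    print(w, ":", phonetic(w))
```

With these assumptions the three example words come out as Aftokinito, Avgo and Fengari, i.e. the forms listed above minus the accents.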
In everyday usage (SMS, IM, forums, etc.) no one uses accented Greeklish. There is no standard conversion table, and most people use only phonetically correct and sometimes visually correct transliteration.
Ψάρι : Psari
Όργανο : Organo
Φλάουτο : Flaouto, Flauto (you can drop the "o" from "ou" because "ou" sounds like "u")
Φύτρο : Fytro, Fitro
Αυτοκίνητο : Aftokinito, Aytokinhto, Aftokinito
Αυγό : Avgo, Aygo
Φεγγάρι : Feggari, Fegari
Θα έρθω: Tha erthw, Tha ertho, 8a er8w, 8a er8o (a lot of times "8" is used for "θ" because it looks a bit the same - visually correct)
Accents are not used because people usually have Greek and US/English keyboard layouts, and the US/English layout has no accents. Idioms like θ -> 8, which try to mimic the shape of letters, can be confusing, so they can be skipped. What I would like to see is a transliteration that users can also type themselves. For example, ISO 843 is not a good choice because of η => ī: I have no clue which layout I would need to type this accented i.
Of course, if it is possible to support multiple systems, that would be best for everyone.
In any case, there are many higher-level frameworks that need transliteration, and it is very annoying to special-case Greek if you are based on iconv. Any solution is welcome :)
Please ask if you need more information.
Info - References
Absolutely any transliteration scheme is good if it produces some ASCII characters instead of the error this function gives now.
I have a similar problem with later versions of iconv (2.13 in Ubuntu).
iconv -t ascii//TRANSLIT <<< 'æ,ø,å'
gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
Tested this on several machines with the same version (2.13) and on an old SunOS box with 1.9. The latter returned the desired result.
My LC_ALL and LANG variables are all set to nb_NO.UTF-8 and I've tried changing it to other available locales, without getting the wanted result.
Is this a bug?
(In reply to comment #4)
> gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
> Is this a bug?
I believe it is a bug.
The request to change transliteration for æøå is http://sourceware.org/bugzilla/show_bug.cgi?id=89 . Please explain there why you believe it should transliterate to ae,o,a and not ae,oe,aa.
Created attachment 6380
I have created a first version of a file to use for greeklish (greek to ascii) transliteration.
The conversion scheme is:
alpha -> a
beta -> b
gamma -> g
delta -> d
epsilon -> e
zeta -> z
eta -> h
theta -> 8
iota -> i
kappa -> k
lamda -> l
mu -> m
nu -> n
xi -> ks
omikron -> o
pi -> p
ro -> r
sigma -> s
tau -> t
ypsilon -> y
phi -> f
chi -> x
psi -> ps
omega -> w
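For illustration, the scheme listed above can be written out as a lookup table in Python. This is only a sketch and not the contents of the attachment: the names SCHEME and greeklish are invented here, and the handling of accents (dropped), final sigma (treated as sigma) and capitals (first letter of the replacement capitalized) are my own assumptions.

```python
import unicodedata

# The letter-to-ASCII scheme from the list above, keyed by lowercase letter.
SCHEME = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "h", "θ": "8", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s",  # assumption: final sigma is treated like sigma
    "τ": "t", "υ": "y", "φ": "f", "χ": "x", "ψ": "ps", "ω": "w",
}


def greeklish(text):
    """Map Greek letters to ASCII; leave every other character untouched."""
    out = []
    # NFD splits accented letters into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue  # assumption: accents are simply dropped
        latin = SCHEME.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())  # assumption: Ψ -> Ps, not PS
        else:
            out.append(latin)
    return "".join(out)


print(greeklish("Χρώματα"))
print(greeklish("Θα έρθω"))
```

Under these assumptions "Χρώματα" becomes "Xrwmata" and "Θα έρθω" becomes "8a er8w".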
From my experiments I realized that there is no "chained" transliteration.
By this I mean that I had to specify the Greeklish transliteration for every accented version of a letter, even though I had already specified it for the plain letter.
ETA with PERISPOMENI -> ETA (this is already in translit_combining)
ETA -> H (this is my addition)
If I try to convert "ETA with PERISPOMENI" to ASCII with only these two rules, I get ?, so I had to edit the first rule to this:
ETA with PERISPOMENI -> ETA;H
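A toy model of the behaviour described above, in Python. It is not glibc's implementation: the function names and the tables are hypothetical, and it assumes that the alternatives of a rule (the parts separated by ";") are tried in order until one fits the target charset, which is what the workaround suggests.

```python
MAX_DEPTH = 8  # guard against rules that refer to each other in a loop


def encodable(text, charset="ascii"):
    """True if the text can be represented in the target charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def single_step(ch, table):
    """Observed behaviour: alternatives are used as-is, never re-expanded."""
    if encodable(ch):
        return ch
    for alternative in table.get(ch, []):
        if encodable(alternative):
            return alternative
    return "?"


def chained(ch, table, depth=0):
    """Hypothetical behaviour: each alternative is transliterated again."""
    if encodable(ch):
        return ch
    if depth >= MAX_DEPTH:
        return None
    for alternative in table.get(ch, []):
        parts = [chained(c, table, depth + 1) for c in alternative]
        if all(p is not None for p in parts):
            return "".join(parts)
    return None


ETA_PERISPOMENI = "\u1fc6"  # GREEK SMALL LETTER ETA WITH PERISPOMENI
ETA = "\u03b7"              # GREEK SMALL LETTER ETA

two_rules = {ETA_PERISPOMENI: [ETA], ETA: ["h"]}
workaround = {ETA_PERISPOMENI: [ETA, "h"], ETA: ["h"]}

print(single_step(ETA_PERISPOMENI, two_rules))   # no usable alternative
print(single_step(ETA_PERISPOMENI, workaround))  # second alternative fits
print(chained(ETA_PERISPOMENI, two_rules))       # would work if chaining existed
```

In this model the two separate rules yield "?" without chaining, while either the workaround table or a chained lookup yields "h".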
*** Bug 260998 has been marked as a duplicate of this bug. ***
(In reply to Petter Reinholdtsen from comment #5)
> (In reply to comment #4)
> > gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
> > Is this a bug?
> I believe it is a bug.
It works in recent glibc (glibc-2.20-8.fc21.x86_64)
in *all* locales except C/POSIX.
$ echo 'Æ,æ,Ø,ø,Å,å' | LANG=nb_NO.UTF-8 iconv -t ascii//TRANSLIT
$ echo 'Æ,æ,Ø,ø,Å,å' | LANG=en_US.UTF-8 iconv -t ascii//TRANSLIT
$ echo 'Æ,æ,Ø,ø,Å,å' | LANG=POSIX iconv -t ascii//TRANSLIT
iconv: illegal input sequence at position 0
It is independent of the locale because all locales (except C/POSIX)
include translit_neutral where this is defined.
> The request to change transliteration for æøå is
> http://sourceware.org/bugzilla/show_bug.cgi?id=89 . Please explain there
> why you believe it should transliterate to ae,o,a and not ae,oe,aa.
For Scandinavian locales, transliterating 'Æ,æ,Ø,ø,Å,å' to 'Ae, ae,
Oe, oe, Aa, aa' is more appropriate. For most other locales,
transliterating å to a is probably OK. I am a bit puzzled about Æ ->
AE, shouldn’t this be transliterated to Ae, even in English locales?
(Same with Ø, transliterating to just O or maybe Oe in
translit_neutral for all locales which do not have special rules
seems better.)
The patch attached to
fixes the transliteration for Norwegian locales (nn_NO and nb_NO).
Probably the same fix should be applied also for Swedish and Finnish
locales (and maybe Icelandic locales as well).
(In reply to Mike FABIAN from comment #8)
> I am a bit puzzled about Æ ->
> AE, shouldn’t this be transliterated to Ae, even in English locales?
> (Same with Ø, transliterating to just O or maybe Oe in
> translit_neutral for all locales which do not have special rules
> seems better.
For me it makes more sense to transliterate a capital letter to all capital
letters, to ensure that words with only capital letters look sane. For example,
SØRING would end up as SOERING, not SOeRING. Sure, if the capital letter is the first one in the sentence, it would make more sense to use Øvelse -> Oevelse,
but I suspect special Norwegian characters at the start of a sentence
are less common than capital special Norwegian letters in an all-capital word. Most Norwegian words do not start with æ, ø or å. :)
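To make the trade-off concrete, here is a small Python sketch of a context-sensitive variant that would give both SOERING and Oevelse. It is purely hypothetical: the names EXPANSIONS and transliterate are invented, and as far as I understand a static locale table maps each character without looking at its neighbours, so it has to pick one of the two casings.

```python
# Scandinavian-style expansions discussed in this thread, keyed by lowercase.
EXPANSIONS = {"æ": "ae", "ø": "oe", "å": "aa"}


def transliterate(word):
    """Expand æ/ø/å; choose OE vs Oe for capitals from adjacent letters."""
    out = []
    for i, ch in enumerate(word):
        expansion = EXPANSIONS.get(ch.lower())
        if expansion is None:
            out.append(ch)
        elif ch.islower():
            out.append(expansion)
        else:
            # Look at the letters directly before and after the capital.
            neighbours = word[max(i - 1, 0):i] + word[i + 1:i + 2]
            in_caps_word = any(c.isupper() for c in neighbours)
            out.append(expansion.upper() if in_caps_word else expansion.capitalize())
    return "".join(out)


print(transliterate("SØRING"))
print(transliterate("Øvelse"))
```

With this heuristic SØRING becomes SOERING and Øvelse becomes Oevelse; a context-free table has to choose one casing for both.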
On Mon, May 04, 2015 at 09:00:36PM +0000, pere at hungry dot com wrote:
> --- Comment #9 from Petter Reinholdtsen <pere at hungry dot com> ---
> (In reply to Mike FABIAN from comment #8)
> > I am a bit puzzled about Æ ->
> > AE, shouldn’t this be transliterated to Ae, even in English locales?
> > (Same with Ø, transliterating to just O or maybe Oe in
> > translit_neutral for all locales which do not have special rules
> > seems better.
> For me it make more sense to transliterate a capital letter to all capital
> letters, to ensure words with only capital letters look sane. For example
> SØRING would end up like SOERING, not SOeRING. Sure, if the capital letter is
> the first one in the sentence, it would make more sense to use Øvelse ->
> Oevelse,
> but I suspect special norwegian characters at the start of the sentence
> is less common than capital special norwegian letters in an all capital word.
> Most Norwegian words do not start with æ, ø or å. :)
The same goes for Danish, which due to some common heritage uses the same letters
and to some extent the same transliteration rules.
I would also recommend transliterating Æ, Ø, Å to AE, OE, AA.
The problem is present for many languages and was reported earlier.
I have created a spreadsheet to generate transliteration tables.
The table should look like this https://sourceware.org/bugzilla/attachment.cgi?id=8591
And the list of unicode characters can be found here http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
Those who are interested in their language being included for transliteration, would you spend some time to generate the needed table/file?
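As a starting point for such a table, the canonical decompositions in UnicodeData.txt can be turned into table entries mechanically. The Python sketch below is only an assumption about what is useful: the function name translit_lines is invented, the output layout (a "%" comment with the character name followed by a "<Uxxxx> <Uxxxx>" pair) is modelled on translit_combining and may differ from the attachment, and only four-digit code points are handled. Language-specific rules such as å -> aa still have to be added by hand.

```python
import unicodedata

# Three records in UnicodeData.txt format, embedded so the sketch is
# self-contained; in practice the whole file would be read instead.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
00E5;LATIN SMALL LETTER A WITH RING ABOVE;Ll;0;L;0061 030A;;;;N;LATIN SMALL LETTER A RING;;00C5;;00C5
"""


def translit_lines(unicode_data):
    """Emit one comment line and one mapping line per decomposable character."""
    lines = []
    for record in unicode_data.splitlines():
        fields = record.split(";")
        # Field 5 holds the decomposition; "<tag> ..." marks compatibility
        # decompositions, which this sketch skips.
        if len(fields) < 6 or not fields[5] or fields[5].startswith("<"):
            continue
        code, name, decomposition = fields[0], fields[1], fields[5]
        # Keep only the base characters, dropping combining marks.
        base = [cp for cp in decomposition.split()
                if not unicodedata.combining(chr(int(cp, 16)))]
        if not base:
            continue
        target = "".join(f"<U{cp}>" for cp in base)
        if len(base) > 1:
            target = f'"{target}"'  # multi-character replacements are quoted
        lines.append(f"% {name}")
        lines.append(f"<U{code}> {target}")
    return lines


print("\n".join(translit_lines(SAMPLE)))
```

For the sample records this emits entries mapping U+00C5 to U+0041 and U+00E5 to U+0061, and skips U+0041 because it has no decomposition.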
*** This bug has been marked as a duplicate of bug 2872 ***
I have tested the translit_greeklish by Nick Andrik and will try to get it included in the fix, along with the translit_cyrilic that I generated myself.