This is the mail archive of the libc-alpha
mailing list for the glibc project.
Re: Updating/adding locale for Ethiopia and Eritrea in GNU libc.
- From: Daniel Yacob <yacob at geez dot org>
- To: pere at hungry dot com, yacob at geez dot org
- Cc: libc-alpha at sources dot redhat dot com
- Date: Mon, 05 May 2003 10:14:13 -0400
- Subject: Re: Updating/adding locale for Ethiopia and Eritrea in GNU libc.
This may or may not reach the libc-alpha list; I'm not subscribed,
so I don't know. You are welcome to forward it if it does not. At the
end of this mail is a link to 19 locales for glibc (14 of them new);
they are ready for the glibc maintainers to evaluate.
>> Traditionally there is no AM or PM. A full-sentence expression is
>> used to specify the time; I've used an abbreviated form. The word
>> used for AM has four spellings; I thought I had the canonical form
>> but did not. Most people confuse the spellings.
> Are you aware that you can leave out the AM/PM value, and that this is
> taken as an indication that the locale should use 24-hour time?
> At least I believe this is how it works.
The 24-hour system is unknown in Ethiopia; using it would cause great
confusion. Traditionally there are seven named divisions of the day,
but using only two divisions in a digital format is understood.
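To make the two-division idea concrete, here is a minimal Python sketch of what a 12-hour display computes, mirroring the %I and %p conversions of strftime(). The function name is made up for illustration, and the PERIODS labels are hypothetical placeholders, not the abbreviated Amharic strings used in the locale.

```python
# Hypothetical stand-ins for the locale's two am_pm strings.
PERIODS = ("AM", "PM")

def twelve_hour(hour24: int, minute: int) -> str:
    """Format a 24-hour time as 'HH:MM PERIOD', splitting the day in two."""
    if not (0 <= hour24 < 24 and 0 <= minute < 60):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # First half of the day gets the first label, second half the other.
    period = PERIODS[0] if hour24 < 12 else PERIODS[1]
    # Like %I: hour 0 becomes 12, hour 13 becomes 1.
    hour12 = hour24 % 12 or 12
    return f"{hour12:02d}:{minute:02d} {period}"
```

For example, twelve_hour(0, 5) gives "12:05 AM" and twelve_hour(13, 30) gives "01:30 PM".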
>> The only oddity is that English sorts with "aAbBcC", but I think that
>> is what is being set in the iso14651_t1 file. I get the same order
>> when I set LC_COLLATE=en_US.
> OK. Are you sure lower case should come before capital letters? In
> Norwegian it is the other way around.
Until now I had thought upper case came before lower case in every
language and region (probably I've used the "C" locale for so long that
I'd forgotten). Looking through en_US, en_GB, it_IT and fr_FR, I
find that they defer the matter to the "iso14651_t1" file. That is
the best thing for the East African locales to do as well, since the
countries behind those locales are the ones that set the basis for
Latin-script use in the region.
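The difference between the two orderings can be seen with a short Python sketch. sort_c_locale uses the real strcoll() under the "C" locale, which is plain code-point order (all capitals first). sort_two_level is only a rough, hypothetical approximation of multi-level collation in the style of iso14651_t1 (letters compared ignoring case, then lower case ahead of upper case); it is not glibc's actual collation algorithm.

```python
import locale
from functools import cmp_to_key

def sort_c_locale(words):
    """Sort with strcoll() under the "C" locale (code-point order)."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, "C")
        return sorted(words, key=cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore global state

def sort_two_level(words):
    """Approximate two-level collation: case-insensitive first pass,
    then ties broken with lower case (False) before upper case (True)."""
    return sorted(words, key=lambda w: (w.casefold(), [c.isupper() for c in w]))
```

With the letters of "aAbBcC", sort_c_locale yields A, B, C, a, b, c, while sort_two_level yields a, A, b, B, c, C, the order observed with en_US.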
> Sounds good. I did not understand why you added whitespace (<U0020>)
> to the entries. Are you sure this is a good idea?
This occurred when month and day names were only two characters long.
The space is added to aid columnar formatting: running "ls -l" and
getting a ragged list drove me nuts, so I added the space.
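The effect of that padding can be sketched in Python: short names are right-padded with U+0020 to a common width so that the following column starts at the same position. The helper name and the sample names are hypothetical. len() counts code points, which matches display width only for single-width characters without combining marks; a more careful version would measure width with something like wcwidth().

```python
def pad_names(names, width=None):
    """Right-pad each name with U+0020 to a common width (default: longest)."""
    if width is None:
        width = max(len(n) for n in names)
    return [n.ljust(width, "\u0020") for n in names]

# Hypothetical month abbreviations of unequal length.
for name in pad_names(["Jan", "Ma"]):
    print(f"{name} 05 10:14")  # the day column now lines up
```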
> Did you check the LC_NUMERIC and LC_MONETARY sections? How are
> numbers and currency values written in Amharic? I do not think there
> should be a space at the end of int_curr_symbol. At the very least, it
> should use NO-BREAK SPACE (<U00A0>).
This trailing <U0020> is found in almost every locale file (even no_NO).
I think that without it the currency name would run into the digits:
one of your examples would become "NOK1 234,56" instead of "NOK 1 234,56".
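POSIX defines int_curr_symbol as four characters: the three-letter ISO 4217 code followed by the character that separates it from the amount, which is why the trailing <U0020> matters. The Python sketch below is a hypothetical helper, assuming the symbol precedes the amount, digits are grouped in threes, and the separators match the no_NO example above. Passing "NOK\u00a0" instead would give the NO-BREAK SPACE variant suggested above.

```python
def format_international(amount, int_curr_symbol, decimal_point=",",
                         thousands_sep=" ", frac_digits=2):
    """Format a non-negative amount with int_curr_symbol placed in front.
    The symbol is concatenated directly, so its fourth character is the
    only thing separating the currency code from the digits."""
    if amount < 0:
        raise ValueError("this sketch only handles non-negative amounts")
    text = f"{amount:.{frac_digits}f}"
    integer, _, fraction = text.partition(".")
    # Split the integer part into groups of three, right to left.
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])
        integer = integer[:-3]
    groups.insert(0, integer)
    digits = thousands_sep.join(groups)
    if frac_digits > 0:
        digits += decimal_point + fraction
    return int_curr_symbol + digits
```

format_international(1234.56, "NOK ") returns "NOK 1 234,56", while dropping the trailing space, format_international(1234.56, "NOK"), returns "NOK1 234,56".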
> How are the following values written as numbers and currency (using
> both $ and ETB) in Amharic?
The ETB (or "Birr" in Amharic) is generally not written unless the
context would be ambiguous, for example in a government or trade
document where two or more monetary systems are used.
Generally, the time formats and collation are all I actually test. I
don't know how to test the currency formats or other fields like
"name_mr"; I'll download the latest glibc and look into its testing
tools. Otherwise I've just checked that the format strings are free
of typos.
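One low-effort way to inspect LC_NUMERIC and LC_MONETARY values is through localeconv(), which Python's locale module exposes. The sketch below uses a hypothetical helper name; it assumes the locale has already been compiled and installed (for example with `localedef -i am_ET -f UTF-8 am_ET.UTF-8`) and raises locale.Error otherwise. localeconv() does not cover LC_NAME fields such as name_mr; glibc's `locale` utility (e.g. `locale -k LC_NAME`) can print those.

```python
import locale

def monetary_fields(locale_name):
    """Return the localeconv() values (LC_NUMERIC and LC_MONETARY) for
    locale_name, restoring the previous locale settings afterwards."""
    saved = locale.setlocale(locale.LC_ALL)  # query, do not change
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
        return dict(locale.localeconv())
    finally:
        locale.setlocale(locale.LC_ALL, saved)
```

For instance, monetary_fields("am_ET.UTF-8")["int_curr_symbol"] would show whether the trailing separator made it into the compiled locale.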
> And so on... Do you have a similar list for the languages you are
> working on? I want to add it to the test scripts in glibc.
I've prepared all of the locales that I am working on here:
aa_ER-Saaho (Saaho Dialect)
gez_ER-Abegede (Alternate Sort Order)
gez_ET-Abegede (Alternate Sort Order)
These are revised and completely new locales; most are ready for
submission to glibc. I'll continue reviewing them and welcome
any feedback. The last trouble spots are:
om_ET contains code to extend Latin collation, but Ethiopic
collation then breaks. I can't see why; I could use help here.
With lang_ab, lang_term and lang_lib commented out in the "gez",
"sid" and "tig" locales, they compile fine under my default RH9
libc-2.3.2 setup. They should be uncommented for testing with the
alpha version, which should recognize these language tags.
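The uncommenting step can be scripted; here is a small Python sketch, assuming the locale sources declare `comment_char %` (as many glibc locale files do) and that each commented line begins with that character, optionally preceded by whitespace. The function name is hypothetical; the lang_ab, lang_term and lang_lib keywords themselves belong to the LC_ADDRESS category.

```python
import re

LANG_KEYS = ("lang_ab", "lang_term", "lang_lib")

def uncomment_lang_keys(source, comment_char="%"):
    """Re-enable commented-out lang_ab, lang_term and lang_lib lines,
    leaving every other comment untouched."""
    pattern = re.compile(
        # leading blanks, the comment character, optional blanks, then a key
        r"^([ \t]*)" + re.escape(comment_char)
        + r"[ \t]*((?:" + "|".join(LANG_KEYS) + r")\b)",
        re.MULTILINE,
    )
    # Keep the indentation and the keyword; drop the comment marker.
    return pattern.sub(r"\1\2", source)
```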