This is the mail archive of the mailing list for the glibc project.


Re: [PATCH 1/3] localedata: use same comment_char/escape_char in these files

On 13 Mar 2016 13:49, Chris Leonard wrote:
> One fundamental way of looking at this issue is to distill it down to
> just unique language codes (ignoring countries, scripts and currency
> variants).

right -- i don't care nearly as much about the locale combos (lang +
territory).  they do provide unambiguous direction at that point, but
with a little effort, we can still get good data w/out them.

some of the data is territory-specific and lang-independent, so as long
as cldr has details about all the territories glibc uses (and it does
today), then that part is fine.  i don't think we should add any
territories that are not listed in the cldr since its list looks pretty
complete (it pulls in a number of other standards).

> Between both projects combined, there are 257 unique language codes*.
> 115 codes are common between both projects.
> 79 languages are represented in CLDR only
> 63 languages are represented in glibc only.

as long as cldr has at least a territory-independent lang entry,
we can extract a good amount of detail out of that.

my concerns start when cldr lacks any lang info at all, or even more
problematic, has marked that lang code as deprecated or uses a different
naming convention.  there appear to be about 65 langs / 71 locales
on the glibc side (ignoring @script variants) that fall into these
buckets.  looks like about the same count as you have.
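
For illustration only, here is a minimal sketch of the comparison discussed above: reduce each locale name to its bare language code, then bucket the codes into common, CLDR-only, and glibc-only sets. The function names (`lang_code`, `compare`) and the sample locale lists are hypothetical and are not taken from either project's tooling; real input would be the locale file names from glibc's localedata and from a CLDR release.

```python
import re

def lang_code(locale_name):
    """Reduce a locale name such as 'sr_RS@latin' or 'de_DE.UTF-8' to 'sr' / 'de'."""
    # Everything after the first '_', '.', '@' or '-' is territory, codeset,
    # modifier, or script information, which this comparison ignores.
    return re.split(r"[_.@-]", locale_name, maxsplit=1)[0].lower()

def compare(glibc_locales, cldr_locales):
    """Bucket the unique language codes of two locale lists."""
    glibc = {lang_code(name) for name in glibc_locales}
    cldr = {lang_code(name) for name in cldr_locales}
    return {
        "common": glibc & cldr,
        "cldr_only": cldr - glibc,
        "glibc_only": glibc - cldr,
        "all": glibc | cldr,
    }

# Made-up sample data, just to exercise the functions.
glibc_sample = ["en_US", "sr_RS@latin", "quz_PE", "ayc_PE", "de_DE.UTF-8"]
cldr_sample = ["en", "en_GB", "sr_Latn_RS", "qu_PE", "de", "ksh"]

result = compare(glibc_sample, cldr_sample)
for bucket in ("common", "cldr_only", "glibc_only"):
    print(bucket, sorted(result[bucket]))

# The three buckets are disjoint, so their sizes add up to the total number
# of unique codes, the same way 115 + 79 + 63 = 257 in the quoted figures.
total = sum(len(result[b]) for b in ("common", "cldr_only", "glibc_only"))
print("unique codes:", total)
```

Note that a plain set comparison like this treats qu/quz and ay/ayc as unrelated codes, so code-selection conflicts of that kind would still need a hand-maintained alias table on top of it.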

> *Note: one exception is the qu/quz conflict between selecting language
> codes to represent the Quechua languages of the Andes.  I counted this
> as in common, although it will require some resolution going forward,
> as will the Aymara ay/ayc choice (there is no existing CLDR locale for
> Aymara at present).
> There are very possibly a few other such code selection issues which I
> will look into further, I have a nagging suspicion that something is
> going on with the Sotho language codes of Africa, but I need to
> confirm that.  In any event, those wouldn't change the overall numbers
> much.
> Overall, I would not declare one project the winner over the other in
> terms of best representing languages, clearly some cross-porting
> should be done where possible for the sake of language communities
> dependent on either locale type.
> I'm here because I work with glibc-dependent language communities, so
> that has been my focus.  I have not tried to work with CLDR on
> locales.  Anyone here have experience with that?  How
> welcoming/responsive are they to people who are trying to act as
> intermediaries for minority language communities?

seeing as you can represent the concerns of these communities better
than probably any of us, it would be great if you could look into the
cldr process.  from my glances around there, it doesn't look *too*
hard to break in and start posting contributions, especially when you
have no one else representing those languages.

