Bug 3326 - New locale request: crh_UA
Summary: New locale request: crh_UA
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.4
Importance: P2 normal
Target Milestone: ---
Assignee: GNU C Library Locale Maintainers
URL:
Keywords:
Depends on:
Blocks: 3363
Reported: 2006-10-09 18:58 UTC by Reshat Sabiq
Modified: 2016-08-22 13:43 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Starter locale file by Yours truly. (1.74 KB, text/plain)
2006-10-09 19:03 UTC, Reshat Sabiq
0.2: Added a missed letter, and LC_NAME declaration. (1.78 KB, text/plain)
2006-10-10 15:43 UTC, Reshat Sabiq
0.3: Using UTF-8 instead of ISO-8859-9 in comments, plus upper-cased some Unicode entities (1.79 KB, text/plain)
2006-10-13 20:43 UTC, Reshat Sabiq
Updated crh_UA locale. (1.76 KB, text/plain)
2009-08-17 13:23 UTC, Reşat SABIQ

Description Reshat Sabiq 2006-10-09 18:58:31 UTC
Please initiate the following new locale for Crimean Tatar: crh_UA.
Comment 1 Reshat Sabiq 2006-10-09 19:03:34 UTC
Created attachment 1363
Starter locale file by Yours truly.
Comment 2 Reshat Sabiq 2006-10-10 15:43:00 UTC
Created attachment 1365
0.2: Added a missed letter, and LC_NAME declaration.

Since this isn't checked in yet, i'm attaching the entire file.
Comment 3 Ulrich Drepper 2006-10-12 21:08:24 UTC
Which character encodings?  ISO-8859-9 is mentioned in the file but is it
necessary?  I.e., is there sufficient existing practice?  The general direction
is to only define a UTF-8 locale and define it as the base (i.e., crh_UA, not
crh_UA.UTF-8).
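
For illustration, the naming distinction drawn above (a bare crh_UA versus crh_UA.UTF-8) follows the usual language[_TERRITORY][.codeset][@modifier] layout of locale names. The following is a minimal sketch of splitting such a name into its parts; the `LocaleName` type and `parse_locale_name` function are hypothetical helpers written for this example, not part of glibc.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    """Parts of a locale name (hypothetical helper type)."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# language[_TERRITORY][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into language, territory, codeset, and modifier."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognizable locale name: {name!r}")
    return LocaleName(
        match["language"],
        match["territory"],
        match["codeset"],
        match["modifier"],
    )


# The base name carries no codeset; the suffixed name does.
print(parse_locale_name("crh_UA"))
print(parse_locale_name("crh_UA.UTF-8"))
```

With a UTF-8-only locale defined as the base, the first form (no codeset part) is the name users would select.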
Comment 4 Reshat Sabiq 2006-10-12 23:26:33 UTC
(In reply to comment #3)
> Which character encodings?  ISO-8859-9 is mentioned in the file but is it
> necessary?  I.e., is there sufficient existing practice?  The general direction
> is to only define a UTF-8 locale and define it as the base (i.e., crh_UA, not
> crh_UA.UTF-8).
I mostly based the encoding on some other locales i've looked at: most of them
specify an ISO encoding. 
As for Crimean Tatar, web sites appear to favor windows-1254 and ISO-8859-9.
However, as far as i know, the desktop's locale doesn't affect browser settings,
so UTF-8 would be as acceptable, or perhaps more so: it would have the advantage
of supporting more characters (which could come in handy for text processing in
some apps), with barely any performance penalty.
I would rely on your judgment on this one, but indeed UTF-8 does appear to be a
better choice, and it appears other locales are UTF-8-based, despite the source
comments. In that case, making UTF-8 the base would also be the right thing to
do, as i don't think there'll be a reason to ever have another base.
Please let me know if you'd like me to submit the locale w/ UTF-8 replacing
ISO-8859-9.
Comment 5 keld@dkuug.dk 2006-10-13 16:27:01 UTC
Subject: Re:  New locale request: crh_UA

On Thu, Oct 12, 2006 at 11:26:33PM -0000, tatar dot iqtelif dot i18n at gmail dot com wrote:
> 
> ------- Additional Comments From tatar dot iqtelif dot i18n at gmail dot com  2006-10-12 23:26 -------
> (In reply to comment #3)
> > Which character encodings?  ISO-8859-9 is mentioned in the file but is it
> > necessary?  I.e., is there sufficient existing practice?  The general direction
> > is to only define a UTF-8 locale and define it as the base (i.e., crh_UA, not
> > crh_UA.UTF-8).
> I mostly based the encoding on some other locales i've looked at: most of them
> specify an ISO encoding. 
> As far as Crimean Tatar, web sites appear to favor windows-1254, and ISO-8859-9.
> However, as far as i know, desktop's locale doesn't affect browser settings, so
> UTF-8 would be as much, or perhaps more acceptable: would have the advantage of
> more characters supported (could probably come in handy in text processing in
> some apps), w/ barely any performance penalty. 
> I would rely on your judgment on this one, but indeed UTF-8 does appear to be a
> better choice, and it appears other locales are UTF-8-based, despite the source
> comments. In that case, making UTF-8 the base would also be the right thing to
> do, as i don't think there'll be a reason to ever have another base.
> Please let me know if you'd like me to submit the locale w/ UTF-8 replacing
> ISO-8859-9.

The recommendation is to write locales in a charset independent way, so
that it can work with a number of charsets. And then the locale in
source form should not have a charset name in it. When the locale is
compiled with a specific charset, it is fine to add the name of that
charset to the binary locale name.

best regards
keld
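
As a rough sketch of the recommendation above: the charset is chosen when the source locale is compiled, and the charset name may then appear in the compiled locale's name. The example below only builds a `localedef` command line and does not run it. The `-i` (source locale) and `-f` (charmap) options are real `localedef` options; the `localedef_command` helper and the choice of output name are assumptions made for this example.

```python
from typing import List


def localedef_command(source: str, charmap: str) -> List[str]:
    """Build (but do not run) a localedef invocation.

    Hypothetical helper: the source locale name stays charset-free,
    and the charmap name is appended only to the compiled locale's name.
    """
    output_name = f"{source}.{charmap}"
    return ["localedef", "-i", source, "-f", charmap, output_name]


# The same charset-independent source could be compiled for any charmap
# that covers its characters.
print(" ".join(localedef_command("crh_UA", "UTF-8")))
```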
Comment 6 Reshat Sabiq 2006-10-13 20:43:53 UTC
Created attachment 1375
0.3: Using UTF-8 instead of ISO-8859-9 in comments, plus upper-cased some Unicode entities

(In reply to comment #5)
> The recommendation is to write locales in a charset independent way, so
> that it can work with a number of charsets. And then the locale in
> source form should not have a charset name in it. When the locale is
> compiled with a specific charset, it is fine to add the name of that
> charset to the binary locale name.
OK, so i conclude that the locale-specific ISO charsets in other locale sources
are there for historical reasons, and UTF-8 should be used in general.
Please find the new entire locale file using UTF-8 instead of ISO-8859-9 in
comments. 

P.S. Based on that assumption, i marked the previous attachment obsolete.

Thanks all,
Reshat.
Comment 7 Ulrich Drepper 2007-02-17 08:04:00 UTC
I added the locale with UTF-8 as the only and default encoding.

I also changed the file a bit.  As much as you might not like it, the territory
is Ukraine and not Crimea.
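
One way to check what "UTF-8 as the only and default encoding" means on a given system is to select the bare locale name and ask which codeset it resolves to. This is a minimal sketch using Python's standard `locale` module on a Unix-like system; `codeset_for` is a hypothetical helper, and the result depends on whether the crh_UA locale is actually installed.

```python
import locale
from typing import Optional


def codeset_for(name: str) -> Optional[str]:
    """Return the codeset a locale name resolves to, or None if it
    cannot be selected on this system (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # locale not available here
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the old setting


# On a system with the locale installed, this is expected to report UTF-8;
# elsewhere it prints None.
print(codeset_for("crh_UA"))
```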
Comment 8 Reşat SABIQ 2009-08-17 13:23:12 UTC
Created attachment 4139
Updated crh_UA locale.

This will be provided as a patch in a new bug shortly (using redhat url), but
just in case someone looks here first, i'm attaching the entire file here as
well.