Bug 26

Summary: [PATCH] Correct ta_IN sorting order, day/month names and lc_name
Product: glibc Reporter: Thuraiappah Vaseeharan <t_vasee>
Component: localedata Assignee: Petter Reinholdtsen <pere>
Status: RESOLVED FIXED    
Severity: normal CC: t_vasee
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: Patch to ta_IN locale file to add LC_COLLATE section
Patch to update ta_IN (sorting order, day/month names, lc_name).
Patch to fix ta_IN locale.

Description Thuraiappah Vaseeharan 2004-02-18 14:31:18 UTC
- Fixed day and month abbr and LC_NAME <sivaraj_d@hotmail.com>
- Added LC_COLLATE section <t_vasee@yahoo.com>

--- ta_IN	2001-02-07 07:33:14.000000000 -0600
+++ /home/vasee/ta_IN	2004-02-18 00:00:36.000000000 -0600
@@ -3,6 +3,8 @@
 % Tamil language locale for India.
 % Contributed by Kentaroh Noji <knoji@jp.ibm.com> and
 % Tetsuji Orita <orita@jp.ibm.com>.
+% Fixed day and month abbr & LC_NAME <sivaraj_d@hotmail.com>
+% Added Madras Tamil Lexicon Collation Order: T. Vaseeharan <t_vasee@yahoo.com>
 
 LC_IDENTIFICATION
 title      "Tamil language locale for India"
@@ -28,6 +30,7 @@
 category  "ta_IN:2000";LC_NAME
 category  "ta_IN:2000";LC_ADDRESS
 category  "ta_IN:2000";LC_TELEPHONE
+category  "ta_IN:2000";LC_MEASUREMENT
 
 END LC_IDENTIFICATION
 
@@ -36,47 +39,103 @@
 END LC_CTYPE
 
 LC_COLLATE
-
-% Copy the template from ISO/IEC 14651
 copy "iso14651_t1"
 
-END LC_COLLATE
-
-
-LC_MONETARY
-% This is the POSIX Locale definition the LC_MONETARY category
-% generated by IBM Basic CountryPack Transformer.
-% These are generated based on XML base Locale defintion file 
-% for IBM Class for Unicode.
-%
-int_curr_symbol       "<U0049><U004E><U0052><U0020>"
-currency_symbol       "<U20A8>"
-mon_decimal_point     "<U002E>"
-mon_thousands_sep     "<U002C>"
-mon_grouping          3;2
-positive_sign         ""
-negative_sign         "<U002D>"
-int_frac_digits       2
-frac_digits           2
-p_cs_precedes         1
-p_sep_by_space        1
-n_cs_precedes         1
-n_sep_by_space        1
-p_sign_posn           1
-n_sign_posn           1
-%
-END LC_MONETARY
-
-
-LC_NUMERIC
-% This is the POSIX Locale definition for the LC_NUMERIC  category.
-%
-decimal_point          "<U002E>"
-thousands_sep          "<U002C>"
-grouping               3;2
-%
-END LC_NUMERIC
-
+% Tamil Collation Order as defined in The Madras Tamil Lexicon
+% Ref: http://www.uni-koeln.de/phil-fak/indologie/tamil/otl.html
+% Contact: T. Vaseehran <t_vasee@yahoo.com>
+% Last Updated:  Feb. 12, 2004
+% ChangeLog:
+%  - Added split forms of o, oo, au
+%  - Moved Tamil Symbols above numbers
+%  - Added TAMIL LETTER SHA (U0BB6)
+%    Ref: http://wwwold.dkuug.dk/JTC1/SC2/WG2/docs/n2617
+%       : http://wwwold.dkuug.dk/JTC1/SC2/WG2/docs/n2618
+% Initial version: Feb. 10, 2004.
+
+collating-element <split_o> from "<U0BC6><U0BBE>"
+collating-element <split_oo> from "<U0BC7><U0BBE>"
+collating-element <split_au> from "<U0BC6><U0BD7>"
+collating-element <tagl_KSHA> from "<U0B95><U0BCD><U0BB7>"
+collating-element <tagl_SHRI> from "<U0BB8><U0BCD><U0BB0><U0BC0>"
+
+reorder-after <U00DE>
+<U0BF3> % TAMIL SIGN DAY
+<U0BF4> % TAMIL SIGN MONTH
+<U0BF5> % TAMIL SIGN YEAR
+<U0BF6> % TAMIL SIGN DEBIT
+<U0BF7> % TAMIL SIGN CREDIT
+<U0BF8> % TAMIL SIGN AS ABOVE
+<U0BF9> % TAMIL SIGN RUPEE
+<U0BE6> % TAMIL DIGIT ZERO
+<U0BE7> % TAMIL DIGIT ONE
+<U0BE8> % TAMIL DIGIT TWO
+<U0BE9> % TAMIL DIGIT THREE
+<U0BEA> % TAMIL DIGIT FOUR
+<U0BEB> % TAMIL DIGIT FIVE
+<U0BEC> % TAMIL DIGIT SIX
+<U0BED> % TAMIL DIGIT SEVEN
+<U0BEE> % TAMIL DIGIT EIGHT
+<U0BEF> % TAMIL DIGIT NINE
+<U0BF0> % TAMIL NUMBER TEN
+<U0BF1> % TAMIL NUMBER ONE HUNDRED
+<U0BF2> % TAMIL NUMBER ONE THOUSAND
+<U0B85> % TAMIL LETTER A
+<U0B86> % TAMIL LETTER AA
+<U0B87> % TAMIL LETTER I
+<U0B88> % TAMIL LETTER II
+<U0B89> % TAMIL LETTER U
+<U0B8A> % TAMIL LETTER UU
+<U0B8E> % TAMIL LETTER E
+<U0B8F> % TAMIL LETTER EE
+<U0B90> % TAMIL LETTER AI
+<U0B92> % TAMIL LETTER O
+<U0B93> % TAMIL LETTER OO
+<U0B94> % TAMIL LETTER AU
+<U0B83> % TAMIL SIGN VISARGA (AYTHAM)
+<U0B95> % TAMIL LETTER K
+<U0B99> % TAMIL LETTER NG
+<U0B9A> % TAMIL LETTER C
+<U0B9E> % TAMIL LETTER NY
+<U0B9F> % TAMIL LETTER TT
+<U0BA3> % TAMIL LETTER NNN
+<U0BA4> % TAMIL LETTER T
+<U0BA8> % TAMIL LETTER N
+<U0BAA> % TAMIL LETTER P
+<U0BAE> % TAMIL LETTER M
+<U0BAF> % TAMIL LETTER Y
+<U0BB0> % TAMIL LETTER R
+<U0BB2> % TAMIL LETTER L
+<U0BB5> % TAMIL LETTER V
+<U0BB4> % TAMIL LETTER LLL
+<U0BB3> % TAMIL LETTER LL
+<U0BB1> % TAMIL LETTER RR
+<U0BA9> % TAMIL LETTER NN
+<U0B9C> % TAMIL LETTER JA
+<U0BB6> % TAMIL LETTER SHA
+<U0BB7> % TAMIL LETTER SSA
+<U0BB8> % TAMIL LETTER SA
+<U0BB9> % TAMIL LETTER HA
+<tagl_KSHA>
+<U0BCD> % TAMIL SIGN VIRAMA (PULLI)
+<U0BBE> % TAMIL VOWEL SIGN AA
+<U0BBF> % TAMIL VOWEL SIGN I
+<U0BC0> % TAMIL VOWEL SIGN II
+<U0BC1> % TAMIL VOWEL SIGN U
+<U0BC2> % TAMIL VOWEL SIGN UU
+<U0BC6> % TAMIL VOWEL SIGN E
+<U0BC7> % TAMIL VOWEL SIGN EE
+<U0BC8> % TAMIL VOWEL SIGN AI
+<U0BCA> % TAMIL VOWEL SIGN O
+<U0BCB> % TAMIL VOWEL SIGN OO
+<U0BCC> % TAMIL VOWEL SIGN AU
+<U0BD7> % TAMIL AU LENGTH MARK
+<tagl_SHRI> "<U0BB6><U0BCD><U0BB0><U0BC0>"
+<split_o>  <U0BCA>
+<split_oo> <U0BCB>
+<split_au> <U0BCC>
+reorder-end
+END LC_COLLATE 
 
 LC_TIME
 % This is the POSIX Locale definition for the LC_TIME category
@@ -85,9 +144,9 @@
 % for IBM Class for Unicode.
 %
 % Abbreviated weekday names (%a)
-abday       "<U0B9E>";"<U0BA4>";/
-            "<U0B9A>";"<U0BAA>";/
-            "<U0BB5>";"<U0BB5>";/
+abday       "<U0B9E><U0BBE>";"<U0BA4><U0BBF>";/
+            "<U0B9A><U0BC6>";"<U0BAA><U0BC1>";/
+            "<U0BB5><U0BBF>";"<U0BB5><U0BC6>";/
             "<U0B9A>"
 %
 % Full weekday names (%A)
@@ -97,20 +156,20 @@
             "<U0B9A><U0BA9><U0BBF>"
 %
 % Abbreviated month names (%b)
-abmon       "<U0B9C><U0BA9><U0BB5><U0BB0><U0BBF>";"<U0BAA><U0BC6><U0BAA><U0BCD><U0BB0><U0BB5><U0BB0><U0BBF>";/
-            "<U0BAE><U0BBE><U0BB0><U0BCD><U0B9A><U0BCD>";"<U0B8F><U0BAA><U0BCD><U0BB0><U0BB2><U0BCD>";/
+abmon       "<U0B9C><U0BA9>";"<U0BAA><U0BBF><U0BAA><U0BCD>";/
+            "<U0BAE><U0BBE><U0BB0><U0BCD>";"<U0B8F><U0BAA><U0BCD>";/
             "<U0BAE><U0BC7>";"<U0B9C><U0BC2><U0BA9><U0BCD>";/
-            "<U0B9C><U0BC2><U0BB2><U0BC8>";"<U0B86><U0B95><U0BB8><U0BCD><U0B9F><U0BCD>";/
-            "<U0B9A><U0BC6><U0BAA><U0BCD><U0B9F><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B85><U0B95><U0BCD><U0B9F><U0BCB><U0BAA><U0BB0><U0BCD>";/
-            "<U0BA8><U0BB5><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B9F><U0BBF><U0B9A><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD><U0072>"
+            "<U0B9C><U0BC2><U0BB2><U0BC8>";"<U0B86><U0B95>";/
+            "<U0B9A><U0BC6><U0BAA><U0BCD>";"<U0B85><U0B95><U0BCD>";/
+            "<U0BA8><U0BB5>";"<U0B9F><U0BBF><U0B9A>"
 %
 % Full month names (%B)
-mon         "<U0B9C><U0BA9><U0BB5><U0BB0><U0BBF>";"<U0BAA><U0BC6><U0BAA><U0BCD><U0BB0><U0BB5><U0BB0><U0BBF>";/
+mon         "<U0B9C><U0BA9><U0BB5><U0BB0><U0BBF>";"<U0BAA><U0BBF><U0BAA><U0BCD><U0BB0><U0BB5><U0BB0><U0BBF>";/
             "<U0BAE><U0BBE><U0BB0><U0BCD><U0B9A><U0BCD>";"<U0B8F><U0BAA><U0BCD><U0BB0><U0BB2><U0BCD>";/
             "<U0BAE><U0BC7>";"<U0B9C><U0BC2><U0BA9><U0BCD>";/
             "<U0B9C><U0BC2><U0BB2><U0BC8>";"<U0B86><U0B95><U0BB8><U0BCD><U0B9F><U0BCD>";/
             "<U0B9A><U0BC6><U0BAA><U0BCD><U0B9F><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B85><U0B95><U0BCD><U0B9F><U0BCB><U0BAA><U0BB0><U0BCD>";/
-            "<U0BA8><U0BB5><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B9F><U0BBF><U0B9A><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD><U0072>"
+            "<U0BA8><U0BB5><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B9F><U0BBF><U0B9A><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>"
 %
 % Equivalent of AM PM 
 am_pm       "<U0B95><U0BBE><U0BB2><U0BC8>";"<U0BAE><U0BBE><U0BB2><U0BC8>"
@@ -132,6 +191,43 @@
 %
 END LC_TIME
 
+LC_NUMERIC
+% This is the POSIX Locale definition for the LC_NUMERIC  category.
+%
+decimal_point          "<U002E>"
+thousands_sep          "<U002C>"
+grouping               3;2
+%
+END LC_NUMERIC
+
+
+
+LC_MONETARY
+% This is the POSIX Locale definition the LC_MONETARY category
+% generated by IBM Basic CountryPack Transformer.
+% These are generated based on XML base Locale defintion file 
+% for IBM Class for Unicode.
+%
+int_curr_symbol       "<U0049><U004E><U0052><U0020>"
+currency_symbol       "<U20A8>"
+mon_decimal_point     "<U002E>"
+mon_thousands_sep     "<U002C>"
+mon_grouping          3;2
+positive_sign         ""
+negative_sign         "<U002D>"
+int_frac_digits       2
+frac_digits           2
+p_cs_precedes         1
+p_sep_by_space        1
+n_cs_precedes         1
+n_sep_by_space        1
+p_sign_posn           1
+n_sign_posn           1
+%
+END LC_MONETARY
+
+
+
 
 LC_MESSAGES
 % This is the POSIX Locale definition for the LC_MESSAGES category
@@ -167,7 +263,6 @@
 % generated by IBM Basic CountryPack Transformer.
 height      297
 width       210
-
 END LC_PAPER
 
 
@@ -178,11 +273,10 @@
 % 
 name_fmt    "<U0025><U0070><U0025><U0074><U0025><U0066><U0025><U0074><U0025><U0067>"
 name_gen    ""
-name_mr     "<U004D><U0072><U002E>"
-name_mrs    "<U004D><U0072><U0073><U002E>"
-name_miss   "<U004D><U0069><U0073><U0073><U002E>"
+name_mr     "<U0BA4><U0BBF><U0BB0><U0BC1><U0020>"
+name_mrs    "<U0BA4><U0BBF><U0BB0><U0BC1><U0BAE><U0BA4><U0BBF><U0020>"
+name_miss   "<U0B9A><U0BC6><U0BB2><U0BCD><U0BB5><U0BBF><U0020>"
 name_ms     "<U004D><U0073><U002E>"
-
 END LC_NAME
Comment 1 Thuraiappah Vaseeharan 2004-02-18 17:02:10 UTC
Motivation:

1. Define a proper LC_COLLATE section for Tamil, so that programs like sort, uniq
etc. will sort in the order expected by native Tamil language users. The default
order in Unicode and ISO14651, which is just the code point order, is *not*
the order expected by Tamil speakers.

References: 

* Issues in Indic Language Collation
http://www.unicode.org/notes/tn1/

* Alphabetic ordering according to Tamil Lexicon, Madras 1924-39:
http://www.uni-koeln.de/phil-fak/indologie/tamil/otl.html

2. Fix typos in day & month abbr, LC_NAME fields.
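
To illustrate point 1, here is a minimal Python sketch (not part of the patch) comparing plain code point order with the order given in the reorder-after list for a handful of consonants. The names PATCH_ORDER, RANK and lexicon_key are made up for this example, and it only ranks single letters; it does not model how glibc actually assigns collation weights.

```python
# Subset of the consonant order from the reorder-after list in the patch.
# Vowels, vowel signs, digits and the collating elements are left out.
PATCH_ORDER = [
    "\u0BB2",  # TAMIL LETTER L
    "\u0BB5",  # TAMIL LETTER V
    "\u0BB4",  # TAMIL LETTER LLL
    "\u0BB3",  # TAMIL LETTER LL
    "\u0BB1",  # TAMIL LETTER RR
    "\u0BA9",  # TAMIL LETTER NN
]

# Hypothetical helper: position of each letter in the patch's list.
RANK = {ch: i for i, ch in enumerate(PATCH_ORDER)}


def lexicon_key(ch):
    """Sort key for a single letter, following the patch's list order."""
    return RANK[ch]


letters = list(PATCH_ORDER)
by_code_point = sorted(letters)                # plain code point order
by_patch = sorted(letters, key=lexicon_key)    # order from the patch

print([f"U+{ord(c):04X}" for c in by_code_point])
print([f"U+{ord(c):04X}" for c in by_patch])
```

In code point order NN (U+0BA9) and RR (U+0BB1) come before L (U+0BB2), while the list in the patch places them after LL (U+0BB3).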
Comment 2 Thuraiappah Vaseeharan 2004-02-18 17:04:28 UTC
Created attachment 7
Patch to ta_IN locale file to add LC_COLLATE section
Comment 3 Petter Reinholdtsen 2004-02-24 15:33:27 UTC
*** Bug 27 has been marked as a duplicate of this bug. ***
Comment 4 Petter Reinholdtsen 2004-03-11 16:16:17 UTC
Can you attach a test for the collation order?  It should be a text file
with the lines in the correct sorting order.  It would be nice if the file
demonstrated (displayed) some of the problematic sorting issues.
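
One possible way to use such a test file, sketched in Python: read the lines and report the first one that sorts before its predecessor under the locale's collation. This is only an assumption about how a check could look, not the glibc test harness; first_misordered and check_file are hypothetical names, and the locale name "ta_IN.UTF-8" assumes the locale has been compiled and installed on the system.

```python
import locale


def first_misordered(lines, key=locale.strxfrm):
    """Return the index of the first line that sorts before the line
    preceding it, or None if the lines are already in order."""
    for i in range(1, len(lines)):
        if key(lines[i - 1]) > key(lines[i]):
            return i
    return None


def check_file(path, locale_name="ta_IN.UTF-8"):
    """Check a test file against the collation order of locale_name.

    locale_name is an assumption; setlocale raises locale.Error if the
    locale is not installed.
    """
    locale.setlocale(locale.LC_COLLATE, locale_name)
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    return first_misordered(lines)
```

A file that is already in the expected order should give None. This is roughly the check that sort -c performs when run under the same locale.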
Comment 5 Petter Reinholdtsen 2004-05-15 08:06:07 UTC
Created attachment 78
Patch to update ta_IN (sorting order, day/month names, lc_name).

The previous patch does not apply cleanly to the current glibc CVS.
Here is an improved patch which applies cleanly and only changes the
relevant parts of the file.

I still would like to hear from the original authors, but am starting
to understand that it might never happen.  It would be nice to have
a test file to use to check the sorting order.
Comment 6 Petter Reinholdtsen 2004-10-02 08:12:20 UTC
Created attachment 215
Patch to fix ta_IN locale.

I've submitted the patch to the libc-hacker mailing list, requesting
the glibc maintainers to commit it to CVS.
Comment 7 Sourceware Commits 2004-12-19 20:48:49 UTC
Subject: Bug 26

CVSROOT:	/cvs/glibc
Module name:	libc
Changes by:	aj@sources.redhat.com	2004-12-19 20:48:43

Modified files:
	localedata/locales: ta_IN 

Log message:
	[BZ #26]
	Correct sorting order.  Corrected day and month
	abbreviations.  Corrected name strings for mr., mrs. and miss.
	Patch from Thuraiappah Vaseeharan.

Patches:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/locales/ta_IN.diff?cvsroot=glibc&r1=1.5&r2=1.6

Comment 8 Andreas Jaeger 2004-12-19 20:50:11 UTC
Patch submitted to CVS.