- Fixed day and month abbr and LC_NAME <sivaraj_d@hotmail.com>
- Added LC_COLLATE section <t_vasee@yahoo.com>

--- ta_IN	2001-02-07 07:33:14.000000000 -0600
+++ /home/vasee/ta_IN	2004-02-18 00:00:36.000000000 -0600
@@ -3,6 +3,8 @@
 % Tamil language locale for India.
 % Contributed by Kentaroh Noji <knoji@jp.ibm.com> and
 % Tetsuji Orita <orita@jp.ibm.com>.
+% Fixed day and month abbr & LC_NAME <sivaraj_d@hotmail.com>
+% Added Madras Tamil Lexicon Collation Order: T. Vaseeharan <t_vasee@yahoo.com>
 
 LC_IDENTIFICATION
 title "Tamil language locale for India"
@@ -28,6 +30,7 @@
 category "ta_IN:2000";LC_NAME
 category "ta_IN:2000";LC_ADDRESS
 category "ta_IN:2000";LC_TELEPHONE
+category "ta_IN:2000";LC_MEASUREMENT
 
 END LC_IDENTIFICATION
 
@@ -36,47 +39,103 @@
 END LC_CTYPE
 
 LC_COLLATE
-
-% Copy the template from ISO/IEC 14651
 copy "iso14651_t1"
-END LC_COLLATE
-
-
-LC_MONETARY
-% This is the POSIX Locale definition the LC_MONETARY category
-% generated by IBM Basic CountryPack Transformer.
-% These are generated based on XML base Locale defintion file
-% for IBM Class for Unicode.
-%
-int_curr_symbol "<U0049><U004E><U0052><U0020>"
-currency_symbol "<U20A8>"
-mon_decimal_point "<U002E>"
-mon_thousands_sep "<U002C>"
-mon_grouping 3;2
-positive_sign ""
-negative_sign "<U002D>"
-int_frac_digits 2
-frac_digits 2
-p_cs_precedes 1
-p_sep_by_space 1
-n_cs_precedes 1
-n_sep_by_space 1
-p_sign_posn 1
-n_sign_posn 1
-%
-END LC_MONETARY
-
-
-LC_NUMERIC
-% This is the POSIX Locale definition for the LC_NUMERIC category.
-%
-decimal_point "<U002E>"
-thousands_sep "<U002C>"
-grouping 3;2
-%
-END LC_NUMERIC
-
 
+% Tamil Collation Order as defined in The Madras Tamil Lexicon
+% Ref: http://www.uni-koeln.de/phil-fak/indologie/tamil/otl.html
+% Contact: T. Vaseehran <t_vasee@yahoo.com>
+% Last Updated: Feb. 12, 2004
+% ChangeLog:
+% - Added split forms of o, oo, au
+% - Moved Tamil Symbols above numbers
+% - Added TAMIL LETTER SHA (U0BB6)
+% Ref: http://wwwold.dkuug.dk/JTC1/SC2/WG2/docs/n2617
+% : http://wwwold.dkuug.dk/JTC1/SC2/WG2/docs/n2618
+% Initial version: Feb. 10, 2004.
+
+collating-element <split_o> from "<U0BC6><U0BBE>"
+collating-element <split_oo> from "<U0BC7><U0BBE>"
+collating-element <split_au> from "<U0BC6><U0BD7>"
+collating-element <tagl_KSHA> from "<U0B95><U0BCD><U0BB7>"
+collating-element <tagl_SHRI> from "<U0BB8><U0BCD><U0BB0><U0BC0>"
+
+reorder-after <U00DE>
+<U0BF3> % TAMIL SIGN DAY
+<U0BF4> % TAMIL SIGN MONTH
+<U0BF5> % TAMIL SIGN YEAR
+<U0BF6> % TAMIL SIGN DEBIT
+<U0BF7> % TAMIL SIGN CREDIT
+<U0BF8> % TAMIL SIGN AS ABOVE
+<U0BF9> % TAMIL SIGN RUPEE
+<U0BE6> % TAMIL DIGIT ZERO
+<U0BE7> % TAMIL DIGIT ONE
+<U0BE8> % TAMIL DIGIT TWO
+<U0BE9> % TAMIL DIGIT THREE
+<U0BEA> % TAMIL DIGIT FOUR
+<U0BEB> % TAMIL DIGIT FIVE
+<U0BEC> % TAMIL DIGIT SIX
+<U0BED> % TAMIL DIGIT SEVEN
+<U0BEE> % TAMIL DIGIT EIGHT
+<U0BEF> % TAMIL DIGIT NINE
+<U0BF0> % TAMIL NUMBER TEN
+<U0BF1> % TAMIL NUMBER ONE HUNDRED
+<U0BF2> % TAMIL NUMBER ONE THOUSAND
+<U0B85> % TAMIL LETTER A
+<U0B86> % TAMIL LETTER AA
+<U0B87> % TAMIL LETTER I
+<U0B88> % TAMIL LETTER II
+<U0B89> % TAMIL LETTER U
+<U0B8A> % TAMIL LETTER UU
+<U0B8E> % TAMIL LETTER E
+<U0B8F> % TAMIL LETTER EE
+<U0B90> % TAMIL LETTER AI
+<U0B92> % TAMIL LETTER O
+<U0B93> % TAMIL LETTER OO
+<U0B94> % TAMIL LETTER AU
+<U0B83> % TAMIL SIGN VISARGA (AYTHAM)
+<U0B95> % TAMIL LETTER K
+<U0B99> % TAMIL LETTER NG
+<U0B9A> % TAMIL LETTER C
+<U0B9E> % TAMIL LETTER NY
+<U0B9F> % TAMIL LETTER TT
+<U0BA3> % TAMIL LETTER NNN
+<U0BA4> % TAMIL LETTER T
+<U0BA8> % TAMIL LETTER N
+<U0BAA> % TAMIL LETTER P
+<U0BAE> % TAMIL LETTER M
+<U0BAF> % TAMIL LETTER Y
+<U0BB0> % TAMIL LETTER R
+<U0BB2> % TAMIL LETTER L
+<U0BB5> % TAMIL LETTER V
+<U0BB4> % TAMIL LETTER LLL
+<U0BB3> % TAMIL LETTER LL
+<U0BB1> % TAMIL LETTER RR
+<U0BA9> % TAMIL LETTER NN
+<U0B9C> % TAMIL LETTER JA
+<U0BB6> % TAMIL LETTER SHA
+<U0BB7> % TAMIL LETTER SSA
+<U0BB8> % TAMIL LETTER SA
+<U0BB9> % TAMIL LETTER HA
+<tagl_KSHA>
+<U0BCD> % TAMIL SIGN VIRAMA (PULLI)
+<U0BBE> % TAMIL VOWEL SIGN AA
+<U0BBF> % TAMIL VOWEL SIGN I
+<U0BC0> % TAMIL VOWEL SIGN II
+<U0BC1> % TAMIL VOWEL SIGN U
+<U0BC2> % TAMIL VOWEL SIGN UU
+<U0BC6> % TAMIL VOWEL SIGN E
+<U0BC7> % TAMIL VOWEL SIGN EE
+<U0BC8> % TAMIL VOWEL SIGN AI
+<U0BCA> % TAMIL VOWEL SIGN O
+<U0BCB> % TAMIL VOWEL SIGN OO
+<U0BCC> % TAMIL VOWEL SIGN AU
+<U0BD7> % TAMIL AU LENGTH MARK
+<tagl_SHRI> "<U0BB6><U0BCD><U0BB0><U0BC0>"
+<split_o> <U0BCA>
+<split_oo> <U0BCB>
+<split_au> <U0BCC>
+reorder-end
+END LC_COLLATE
 
 LC_TIME
 % This is the POSIX Locale definition for the LC_TIME category
@@ -85,9 +144,9 @@
 % for IBM Class for Unicode.
 %
 % Abbreviated weekday names (%a)
-abday "<U0B9E>";"<U0BA4>";/
-      "<U0B9A>";"<U0BAA>";/
-      "<U0BB5>";"<U0BB5>";/
+abday "<U0B9E><U0BBE>";"<U0BA4><U0BBF>";/
+      "<U0B9A><U0BC6>";"<U0BAA><U0BC1>";/
+      "<U0BB5><U0BBF>";"<U0BB5><U0BC6>";/
       "<U0B9A>"
 %
 % Full weekday names (%A)
@@ -97,20 +156,20 @@
       "<U0B9A><U0BA9><U0BBF>"
 %
 % Abbreviated month names (%b)
-abmon "<U0B9C><U0BA9><U0BB5><U0BB0><U0BBF>";"<U0BAA><U0BC6><U0BAA><U0BCD><U0BB0><U0BB5><U0BB0><U0BBF>";/
-      "<U0BAE><U0BBE><U0BB0><U0BCD><U0B9A><U0BCD>";"<U0B8F><U0BAA><U0BCD><U0BB0><U0BB2><U0BCD>";/
+abmon "<U0B9C><U0BA9>";"<U0BAA><U0BBF><U0BAA><U0BCD>";/
+      "<U0BAE><U0BBE><U0BB0><U0BCD>";"<U0B8F><U0BAA><U0BCD>";/
       "<U0BAE><U0BC7>";"<U0B9C><U0BC2><U0BA9><U0BCD>";/
-      "<U0B9C><U0BC2><U0BB2><U0BC8>";"<U0B86><U0B95><U0BB8><U0BCD><U0B9F><U0BCD>";/
-      "<U0B9A><U0BC6><U0BAA><U0BCD><U0B9F><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B85><U0B95><U0BCD><U0B9F><U0BCB><U0BAA><U0BB0><U0BCD>";/
-      "<U0BA8><U0BB5><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B9F><U0BBF><U0B9A><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD><U0072>"
+      "<U0B9C><U0BC2><U0BB2><U0BC8>";"<U0B86><U0B95>";/
+      "<U0B9A><U0BC6><U0BAA><U0BCD>";"<U0B85><U0B95><U0BCD>";/
+      "<U0BA8><U0BB5>";"<U0B9F><U0BBF><U0B9A>"
 %
 % Full month names (%B)
-mon "<U0B9C><U0BA9><U0BB5><U0BB0><U0BBF>";"<U0BAA><U0BC6><U0BAA><U0BCD><U0BB0><U0BB5><U0BB0><U0BBF>";/
+mon "<U0B9C><U0BA9><U0BB5><U0BB0><U0BBF>";"<U0BAA><U0BBF><U0BAA><U0BCD><U0BB0><U0BB5><U0BB0><U0BBF>";/
     "<U0BAE><U0BBE><U0BB0><U0BCD><U0B9A><U0BCD>";"<U0B8F><U0BAA><U0BCD><U0BB0><U0BB2><U0BCD>";/
     "<U0BAE><U0BC7>";"<U0B9C><U0BC2><U0BA9><U0BCD>";/
     "<U0B9C><U0BC2><U0BB2><U0BC8>";"<U0B86><U0B95><U0BB8><U0BCD><U0B9F><U0BCD>";/
     "<U0B9A><U0BC6><U0BAA><U0BCD><U0B9F><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B85><U0B95><U0BCD><U0B9F><U0BCB><U0BAA><U0BB0><U0BCD>";/
-    "<U0BA8><U0BB5><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B9F><U0BBF><U0B9A><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD><U0072>"
+    "<U0BA8><U0BB5><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>";"<U0B9F><U0BBF><U0B9A><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD>"
 %
 % Equivalent of AM PM
 am_pm "<U0B95><U0BBE><U0BB2><U0BC8>";"<U0BAE><U0BBE><U0BB2><U0BC8>"
@@ -132,6 +191,43 @@
 %
 END LC_TIME
 
+LC_NUMERIC
+% This is the POSIX Locale definition for the LC_NUMERIC category.
+%
+decimal_point "<U002E>"
+thousands_sep "<U002C>"
+grouping 3;2
+%
+END LC_NUMERIC
+
+
+
+LC_MONETARY
+% This is the POSIX Locale definition the LC_MONETARY category
+% generated by IBM Basic CountryPack Transformer.
+% These are generated based on XML base Locale defintion file
+% for IBM Class for Unicode.
+%
+int_curr_symbol "<U0049><U004E><U0052><U0020>"
+currency_symbol "<U20A8>"
+mon_decimal_point "<U002E>"
+mon_thousands_sep "<U002C>"
+mon_grouping 3;2
+positive_sign ""
+negative_sign "<U002D>"
+int_frac_digits 2
+frac_digits 2
+p_cs_precedes 1
+p_sep_by_space 1
+n_cs_precedes 1
+n_sep_by_space 1
+p_sign_posn 1
+n_sign_posn 1
+%
+END LC_MONETARY
+
+
+
 
 LC_MESSAGES
 % This is the POSIX Locale definition for the LC_MESSAGES category
@@ -167,7 +263,6 @@
 % generated by IBM Basic CountryPack Transformer.
 height 297
 width 210
-
 END LC_PAPER
 
 
@@ -178,11 +273,10 @@
 %
 name_fmt "<U0025><U0070><U0025><U0074><U0025><U0066><U0025><U0074><U0025><U0067>"
 name_gen ""
-name_mr "<U004D><U0072><U002E>"
-name_mrs "<U004D><U0072><U0073><U002E>"
-name_miss "<U004D><U0069><U0073><U0073><U002E>"
+name_mr "<U0BA4><U0BBF><U0BB0><U0BC1><U0020>"
+name_mrs "<U0BA4><U0BBF><U0BB0><U0BC1><U0BAE><U0BA4><U0BBF><U0020>"
+name_miss "<U0B9A><U0BC6><U0BB2><U0BCD><U0BB5><U0BBF><U0020>"
 name_ms "<U004D><U0073><U002E>"
-
 END LC_NAME
 
 
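The LC_NUMERIC and LC_MONETARY sections that the patch moves below LC_TIME keep `grouping 3;2` and `mon_grouping 3;2`, the Indian lakh/crore digit grouping. Under the POSIX rule the first value is the size of the group nearest the decimal point and the last value is repeated, so 12345678 is written 1,23,45,678. The sketch below models that rule in pure Python; `group_digits` is a hypothetical helper (not glibc code), and its list convention (a trailing 0 means "repeat the previous size", `locale.CHAR_MAX` means "stop grouping") is assumed to mirror what Python's `locale.localeconv()` reports.

```python
import locale


def group_digits(digits: str, grouping: list[int], sep: str = ",") -> str:
    """Insert `sep` into a run of integer digits following POSIX grouping.

    Each entry in `grouping` is a group size, counted from the right.
    A 0 entry keeps repeating the previous size, locale.CHAR_MAX stops
    grouping, and when the list simply ends the last size is repeated.
    """
    groups = []
    size = 0
    idx = 0
    while True:
        if idx < len(grouping):
            g = grouping[idx]
            idx += 1
            if g == locale.CHAR_MAX:
                break  # no further grouping: the remaining digits stay together
            if g != 0:
                size = g  # a 0 entry leaves `size` unchanged, i.e. repeats it
        if size <= 0 or len(digits) <= size:
            break
        groups.append(digits[-size:])
        digits = digits[:-size]
    groups.append(digits)
    return sep.join(reversed(groups))


# grouping 3;2 from the ta_IN LC_NUMERIC section.
print(group_digits("12345678", [3, 2]))   # 1,23,45,678
print(group_digits("100000", [3, 2, 0]))  # 1,00,000
```

If the `ta_IN.UTF-8` locale is generated on the system, `locale.setlocale(locale.LC_NUMERIC, "ta_IN.UTF-8")` followed by `locale.format_string("%d", 12345678, grouping=True)` should show the same grouping; that depends on the installed locale data, so treat it as something to check rather than a guarantee.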
Motivation:

1. Define a proper LC_COLLATE section for Tamil, so that programs like sort, uniq, etc. sort in the order expected by native Tamil speakers. The default order in Unicode and ISO 14651, which is just the code point order, is *not* the order expected by Tamil speakers.

   References:
   * Issues in Indic Language Collation: http://www.unicode.org/notes/tn1/
   * Alphabetic ordering according to Tamil Lexicon, Madras 1924-39: http://www.uni-koeln.de/phil-fak/indologie/tamil/otl.html

2. Fix errors in the day and month abbreviations and in the LC_NAME fields.
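To see how far the lexicon order is from code point order, the reorder list added to LC_COLLATE can be modelled as a single ranking table. This is a simplified, single-level sketch under stated assumptions: it ranks only the Tamil characters listed in the patch, treats the patch's collating elements (`<split_o>`, `<split_oo>`, `<split_au>`, `<tagl_KSHA>`, `<tagl_SHRI>`) as greedy longest-match contractions, and pushes every other character after the table by code point. glibc's real collation uses multi-level weights and places the Tamil block after `<U00DE>`, so `ORDER`, `CONTRACTIONS` and `lexicon_key` are hypothetical names for illustration; real programs should rely on `strcoll`/`strxfrm` with the installed locale.

```python
# Ranking table transcribed from the reorder-after list in the patch.
_BEFORE_KSHA = [
    0x0BF3, 0x0BF4, 0x0BF5, 0x0BF6, 0x0BF7, 0x0BF8, 0x0BF9,  # Tamil signs
    0x0BE6, 0x0BE7, 0x0BE8, 0x0BE9, 0x0BEA,                  # digits 0-4
    0x0BEB, 0x0BEC, 0x0BED, 0x0BEE, 0x0BEF,                  # digits 5-9
    0x0BF0, 0x0BF1, 0x0BF2,                                  # 10, 100, 1000
    0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,          # vowels A..UU
    0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94,          # vowels E..AU
    0x0B83,                                                  # aytham
    0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,          # K NG C NY TT NNN
    0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,          # T N P M Y R
    0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9,          # L V LLL LL RR NN
    0x0B9C, 0x0BB6, 0x0BB7, 0x0BB8, 0x0BB9,                  # JA SHA SSA SA HA
]
_AFTER_KSHA = [
    0x0BCD,                                                  # pulli (virama)
    0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2,                  # signs AA I II U UU
    0x0BC6, 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC,          # signs E EE AI O OO AU
    0x0BD7,                                                  # AU length mark
]
KSHA = "\u0B95\u0BCD\u0BB7"  # <tagl_KSHA>: K + pulli + SSA, ranked after HA
ORDER = [chr(c) for c in _BEFORE_KSHA] + [KSHA] + [chr(c) for c in _AFTER_KSHA]
RANK = {element: i for i, element in enumerate(ORDER)}

# Multi-character sequences and the table entries they are weighted as.
CONTRACTIONS = {
    "\u0BB8\u0BCD\u0BB0\u0BC0": ["\u0BB6", "\u0BCD", "\u0BB0", "\u0BC0"],  # <tagl_SHRI>
    KSHA: [KSHA],                  # <tagl_KSHA>
    "\u0BC6\u0BBE": ["\u0BCA"],    # <split_o>  = E sign + AA sign
    "\u0BC7\u0BBE": ["\u0BCB"],    # <split_oo> = EE sign + AA sign
    "\u0BC6\u0BD7": ["\u0BCC"],    # <split_au> = E sign + AU length mark
}
_MAX_LEN = max(len(k) for k in CONTRACTIONS)


def lexicon_key(word: str) -> list[int]:
    """Sort key for `word` under this simplified single-level model."""
    key = []
    i = 0
    while i < len(word):
        # Try the longest contraction first (greedy longest match).
        for length in range(_MAX_LEN, 1, -1):
            chunk = word[i:i + length]
            if len(chunk) == length and chunk in CONTRACTIONS:
                key.extend(RANK[e] for e in CONTRACTIONS[chunk])
                i += length
                break
        else:
            ch = word[i]
            # Assumption: characters not in the table sort after it, by code point.
            key.append(RANK.get(ch, len(ORDER) + ord(ch)))
            i += 1
    return key


# RRA, RA, LLLA, LA, VA, LLA, NNA: code point order and lexicon order differ.
letters = ["\u0BB1", "\u0BB0", "\u0BB4", "\u0BB2", "\u0BB5", "\u0BB3", "\u0BA9"]
print([f"U+{ord(c):04X}" for c in sorted(letters)])
# ['U+0BA9', 'U+0BB0', 'U+0BB1', 'U+0BB2', 'U+0BB3', 'U+0BB4', 'U+0BB5']
print([f"U+{ord(c):04X}" for c in sorted(letters, key=lexicon_key)])
# ['U+0BB0', 'U+0BB2', 'U+0BB5', 'U+0BB4', 'U+0BB3', 'U+0BB1', 'U+0BA9']
```

The same model also shows two other differences the patch encodes: the aytham (U+0B83) moves from before the vowels to after them, and the pulli (U+0BCD) sorts before the vowel signs even though its code point is higher.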
Created attachment 7
Patch to ta_IN locale file to add LC_COLLATE section
*** Bug 27 has been marked as a duplicate of this bug. ***
Can you attach a test for the collation order? It should be a text file with the lines in the correct sorting order. It would be nice if the file demonstrated some of the problematic sorting issues.
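A checker for such a test file can be very small. The sketch below reads a UTF-8 file whose lines are meant to be in the correct order and reports the first line that sorts before its predecessor. It assumes the `ta_IN.UTF-8` locale has been generated from the patched source (for example with `localedef -i ta_IN -f UTF-8 ta_IN.UTF-8`) and uses Python's `locale.strxfrm`, which delegates to the C library's collation; `first_out_of_order` and `check_file` are hypothetical helper names, not part of glibc's test suite.

```python
import locale
from typing import Callable, Sequence


def first_out_of_order(lines: Sequence[str], key: Callable[[str], str]) -> int:
    """Return the index of the first line that sorts before its predecessor, or -1."""
    for i in range(1, len(lines)):
        if key(lines[i]) < key(lines[i - 1]):
            return i
    return -1


def check_file(path: str, locale_name: str = "ta_IN.UTF-8") -> int:
    """Check that the lines of `path` are already sorted under `locale_name`.

    Returns the 1-based number of the first offending line, or 0 if the
    file is in order. Raises locale.Error if the locale is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        bad = first_out_of_order(lines, locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting
    return bad + 1 if bad != -1 else 0
```

For example, `check_file("ta_IN-sorted.txt")` (a hypothetical file name) returns 0 when every line is in order. Because `first_out_of_order` takes the key as a parameter, passing `str` instead of `locale.strxfrm` checks plain code point order, which makes it easy to show which lines of the test file the two orders disagree on.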
Created attachment 78
Patch to update ta_IN (sorting order, day/month names, LC_NAME).

The previous patch does not apply cleanly to the current glibc CVS. Here is an improved patch which applies cleanly and only changes the relevant parts of the file. I would still like to hear from the original authors, but am starting to understand that it might never happen. It would be nice to have a test file to check the sorting order.
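The day/month and LC_NAME changes are easier to review once the `<Uxxxx>` symbolic names are decoded. The sketch below uses only Python's `re` module; `decode_ucs_symbols` and `decode_string_list` are hypothetical helpers that handle quoted `<Uxxxx>` sequences and nothing else of the localedef syntax (escape characters, comments, and so on are ignored).

```python
import re

# <Uxxxx> (or longer) symbolic character names as used in glibc locale sources.
_UCS_SYMBOL = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs_symbols(value: str) -> str:
    """Replace every <Uxxxx> symbolic name in `value` with the character it names."""
    return _UCS_SYMBOL.sub(lambda m: chr(int(m.group(1), 16)), value)


def decode_string_list(field: str) -> list[str]:
    """Decode a ;-separated list of quoted strings, such as an abday value.

    Separators and "/" line continuations are skipped automatically,
    because only the text between double quotes is extracted.
    """
    return [decode_ucs_symbols(s) for s in re.findall(r'"([^"]*)"', field)]


# The new abday value from the patch: seven abbreviations, starting with Sunday.
new_abday = '''"<U0B9E><U0BBE>";"<U0BA4><U0BBF>";/
      "<U0B9A><U0BC6>";"<U0BAA><U0BC1>";/
      "<U0BB5><U0BBF>";"<U0BB5><U0BC6>";/
      "<U0B9A>"'''
abday_names = decode_string_list(new_abday)
print(len(abday_names))  # 7

# The old December entries (abmon and mon) ended in <U0072>, a stray Latin "r".
OLD_DECEMBER = "<U0B9F><U0BBF><U0B9A><U0BAE><U0BCD><U0BAA><U0BB0><U0BCD><U0072>"
print(repr(decode_ucs_symbols(OLD_DECEMBER)[-1]))  # 'r'
```

Decoding the new `name_mr`, `name_mrs` and `name_miss` values the same way shows that each one is a Tamil title followed by `<U0020>`, a trailing space, whereas the old values were the English "Mr.", "Mrs." and "Miss.".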
Created attachment 215
Patch to fix ta_IN locale.

I've submitted the patch to the libc-hacker mailing list, requesting the glibc maintainers to commit it to CVS.
Subject: Bug 26

CVSROOT:       /cvs/glibc
Module name:   libc
Changes by:    aj@sources.redhat.com  2004-12-19 20:48:43

Modified files:
  localedata/locales: ta_IN

Log message:
  [BZ #26]
  Correct sorting order.
  Corrected day and month abbreviations.
  Corrected name strings for mr., mrs. and miss.
  Patch from Thuraiappah Vaseeharan.

Patches:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/locales/ta_IN.diff?cvsroot=glibc&r1=1.5&r2=1.6
Patch submitted to CVS.