Bug 16905 - hanzi: new collation
Summary: hanzi: new collation
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: unspecified
Importance: P2 enhancement
Target Milestone: 2.27
Assignee: Mike FABIAN
URL:
Keywords:
Depends on:
Blocks: 17563
Reported: 2014-05-05 07:53 UTC by Wei-Lun Chao
Modified: 2017-08-10 13:13 UTC (History)
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
hanzi collation by stroke (738.08 KB, text/plain)
2014-05-05 07:53 UTC, Wei-Lun Chao
hanzi collation by stroke (738.17 KB, text/plain)
2014-10-13 08:40 UTC, Wei-Lun Chao
hanzi collation by stroke (738.17 KB, text/plain)
2017-07-20 02:22 UTC, Wei-Lun Chao

Description Wei-Lun Chao 2014-05-05 07:53:35 UTC
Created attachment 7586
hanzi collation by stroke

Please find attached a collation for hanzi (Chinese characters), to be considered for inclusion in glibc.

The order is based on the stroke data from http://www.cns11643.gov.tw/

glibc already has iso14651_t1_pinyin, so I named this file iso14651_t1_stroke, which may not be a proper name.
Comment 1 Wei-Lun Chao 2014-05-20 04:23:02 UTC
I have realized, according to bug 14095, that this file may have nothing to do with iso14651.

The license of the data from http://www.cns11643.gov.tw/ looks BSD-like, so I have forked a project: https://github.com/bluebat/cfs11643

Should this collation be named cfs11643_stroke?
Comment 2 Wei-Lun Chao 2014-10-13 08:40:38 UTC
Created attachment 7829
hanzi collation by stroke

Collation updated.
Comment 3 Wei-Lun Chao 2014-11-07 03:41:34 UTC
URL for the attachment:
https://github.com/bluebat/cfs11643/releases/download/v0/cfs11643_stroke.gz
Comment 4 Wei-Lun Chao 2015-08-21 09:35:40 UTC
http://data.gov.tw/node/5961
The upstream license has changed to one compatible with "CC Attribution 4.0 International":
http://data.gov.tw/license
Comment 5 Wei-Lun Chao 2016-08-02 17:45:48 UTC
The English version of the license is at:
http://data.gov.tw/license#eng
Comment 6 Wei-Lun Chao 2017-07-20 02:22:39 UTC
Created attachment 10275
hanzi collation by stroke

Collation updated
Comment 7 Wei-Lun Chao 2017-08-10 11:44:16 UTC
Making a new version...
Comment 8 Sourceware Commits 2017-08-10 11:49:56 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  bd80111ed9cb93b2d56720dcd1d1f259616c27ae (commit)
       via  4169825556bcc23ced731e711be91819465d4a83 (commit)
       via  38dbcacb606f70ad0a35fbcacb6f3cbff5f34d94 (commit)
      from  68dc02d1dcbfb37ee22327d6a3c43f528d593035 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bd80111ed9cb93b2d56720dcd1d1f259616c27ae

commit bd80111ed9cb93b2d56720dcd1d1f259616c27ae
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Thu Aug 10 12:16:29 2017 +0200

    Fix stdlib/tst-strfmon_l.c test case to agree with the changes in Indian monetary formatting
    
    The test cases should expose non-standard grouping and the trailing
    space after the currency sign. After the changes to the Indian
    monetary formatting, the Indian formatting still shows the
    non-standard grouping. To test the trailing space after the currency
    sign I chose the hr_HR locale.
    
    See:
    
        commit 82b3124268bec0609b337dd993e771c93e44cbf2
        Author: Akhilesh Kumar <akhilesh.k@samsung.com>
    
            Remove redundant data for LC_MONETARY for Indian locales

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4169825556bcc23ced731e711be91819465d4a83

commit 4169825556bcc23ced731e711be91819465d4a83
Author: Akhilesh Kumar <akhilesh.k@samsung.com>
Date:   Wed Aug 9 18:27:14 2017 +0530

    Remove redundant data for LC_MONETARY for Indian locales
    
    	Reference is taken from
    	https://en.wikipedia.org/wiki/Indian_numbering_system
    	https://en.wikipedia.org/wiki/Indian_rupee
    
    	CLDR has the currency format pattern “¤#,##,##0.00”.
    
    	[BZ #21836]
    	* locales/ar_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/as_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/bhb_IN (LC_MONETARY): copy "hi_IN"
    	* locales/bn_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/en_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/gu_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/hi_IN (LC_MONETARY) : Fix mon_grouping,
    	p_sep_by_space and n_sep_by_space
    	* locales/kn_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/kok_IN(LC_MONETARY) : copy "hi_IN"
    	* locales/ks_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/ml_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/mr_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/or_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/pa_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/sa_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/sd_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/ta_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/tcy_IN(LC_MONETARY) : copy "hi_IN"
    	* locales/te_IN (LC_MONETARY) : copy "hi_IN"
    	* locales/ur_IN (LC_MONETARY) : copy "hi_IN"
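
The CLDR pattern quoted in this commit message, “¤#,##,##0.00”, groups the last three integer digits together and every digit before them in pairs. A minimal Python sketch of that grouping follows. `format_indian` and its `symbol` parameter are hypothetical names used for illustration, and the output follows the CLDR pattern only, which is not necessarily what glibc's strfmon prints for these locales.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_indian(amount, symbol="₹"):
    """Format amount per the CLDR pattern ¤#,##,##0.00 (illustrative helper)."""
    # Round to two decimal places, as the ".00" part of the pattern requires.
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    sign = "-" if value < 0 else ""
    integer, fraction = str(abs(value)).split(".")

    # The rightmost group has three digits; all groups to its left have two.
    head, tail = integer[:-3], integer[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.append(tail)

    return f"{sign}{symbol}{','.join(groups)}.{fraction}"


if __name__ == "__main__":
    print(format_indian(1234567.5))
```

Amounts below one thousand have a single group and therefore contain no separator at all.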

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=38dbcacb606f70ad0a35fbcacb6f3cbff5f34d94

commit 38dbcacb606f70ad0a35fbcacb6f3cbff5f34d94
Author: Wei-Lun Chao <bluebat@member.fsf.org>
Date:   Wed Aug 9 12:19:44 2017 +0200

    cmn_TW: add hanzi collation
    
    	[BZ #17563]
    	[BZ #16905]
    	* locales/cmn_TW (LC_COLLATE): Use cns11643_stroke file for sorting.
    	* locales/cmn_TW (LC_TIME): Improve time and date formats.
    	* locales/cmn_TW (LC_MESSAGES): Add  yesstr and nostr.
    	* locales/cns11643_stroke: New file, stroke count collation for
    	traditional Chinese.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                          |    7 +
 localedata/ChangeLog               |   41 +
 localedata/locales/ar_IN           |   22 +-
 localedata/locales/as_IN           |   22 +-
 localedata/locales/bhb_IN          |    2 +-
 localedata/locales/bn_IN           |   22 +-
 localedata/locales/cmn_TW          |   44 +-
 localedata/locales/cns11643_stroke |70754 ++++++++++++++++++++++++++++++++++++
 localedata/locales/en_IN           |   22 +-
 localedata/locales/gu_IN           |   21 +-
 localedata/locales/hi_IN           |   16 +-
 localedata/locales/kn_IN           |   21 +-
 localedata/locales/kok_IN          |   22 +-
 localedata/locales/ks_IN           |   23 +-
 localedata/locales/ml_IN           |   25 +-
 localedata/locales/mr_IN           |   22 +-
 localedata/locales/or_IN           |   22 +-
 localedata/locales/pa_IN           |   18 +-
 localedata/locales/sa_IN           |   21 +-
 localedata/locales/sd_IN           |   22 +-
 localedata/locales/ta_IN           |   22 +-
 localedata/locales/tcy_IN          |    2 +-
 localedata/locales/te_IN           |   22 +-
 localedata/locales/ur_IN           |    2 +-
 stdlib/Makefile                    |    2 +-
 stdlib/tst-strfmon_l.c             |   20 +-
 26 files changed, 70868 insertions(+), 371 deletions(-)
 create mode 100644 localedata/locales/cns11643_stroke
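
With the cmn_TW commit above, LC_COLLATE for cmn_TW uses the cns11643_stroke file. A rough way to try a locale's collation from Python is sketched below. `sort_by_locale` is a hypothetical helper, and the example assumes a glibc that contains this commit and an installed locale named cmn_TW.UTF-8 (the exact installed name may differ on your system). No expected ordering is shown, because it depends on the installed locale data.

```python
import locale


def sort_by_locale(words, locale_name):
    """Return words sorted by the LC_COLLATE rules of locale_name."""
    # Remember the current collation setting so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if the requested locale is not installed.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares in locale order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    sample = ["三", "一", "大", "二"]
    try:
        print(sort_by_locale(sample, "cmn_TW.UTF-8"))
    except locale.Error:
        print("cmn_TW.UTF-8 is not installed on this system")
```

Running the same call with "C" as the locale name gives plain code point order, which is a convenient baseline to compare against.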
Comment 9 Wei-Lun Chao 2017-08-10 12:07:45 UTC
Oh! Thanks anyway. Now I have more time to update...
Comment 10 Mike FABIAN 2017-08-10 13:13:43 UTC
FIXED.