This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Sat, 27 Jan 2018 10:03:08 +0100
- References: <s9d4ln8q4f0.fsf@taka.site> <942e88d2-6f16-ee2d-2db3-0473e8fd268b@redhat.com>
Carlos O'Donell <carlos@redhat.com> wrote:
> On 01/26/2018 02:51 AM, Mike FABIAN wrote:
>>
>> This set of patches updates our
>> glibc/localedata/locales/iso14651_t1_common file to the latest
>> available version from ISO and adapts the collation rules of all
>> locales using “copy "iso14651_t1"” to the changes in the new file.
>>
>> The ISO standard 14651:2016 is available here:
>
> What about ISO/IEC 14651:2016/Amd.1:2017?
>
> It looks like it updates things to Unicode 9.0?
>
> In particular ISO14651_2017_TABLE1_en.txt matches Amd.1:2017, and
> *not* the 2016 version.
I used ISO14651_2015_TABLE1_en.txt because I did not find
ISO14651_2017_TABLE1_en.txt. I’ll update to ISO14651_2017_TABLE1_en.txt
in the next version of my patch series.
>> ISO/IEC 14651:2016: https://www.iso.org/standard/68309.html
>>
>> And a POSIX style LC_COLLATE file is downloadable from:
>>
>> http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
>> http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016.zip
>>
>> This .zip file contains an ISO14651_2017_TABLE1_en.txt which is in a
>> format similar to our current iso14651_t1_common and can be used as an
>> update.
>>
>
> To be clear, the text file is not in the above zip; it is in the associated
> "Electronic inserts" zip file, which is part of the published standard.
>
> http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016_Electronic_inserts.zip
>
> With this additional zip file you can review the tabular data to make
> comparisons and review the patches.
>
>> That file is unfortunately up-to-date only with Unicode 8.0.0,
>> but that is already a huge improvement over what we have now.
>
> This doesn't seem correct given the data in Amd.1:2017:
> ~~~
> The current Common Template Table reflects the repertoire of
> characters of Unicode 9.0, included in
> ISO/IEC 10646:2014 plus its Amendments 1 and 2, plus 273 new
> characters that will be included in the
> fifth edition of ISO/IEC 10646.
> ~~~
Yes, it was Unicode 8.0.0 because I used the older file
ISO14651_2015_TABLE1_en.txt. I’ll update to the newer
ISO14651_2017_TABLE1_en.txt file.
>> Also, that file contained some errors which needed to be fixed.
>> Seems strange for a file released by ISO, but it really contained
>> some errors.
>>
>> And as the names for most collation symbols have been changed, all the
>> collation rules of locales using “copy "iso14651_t1"” needed to be
>> updated.
>>
>> While doing that, I made the collation rules of all locales I touched
>> agree with the CLDR collation rules. glibc has several locales which are
>> not in CLDR; for these I just adapted the existing rules.
>
> In summary:
>
> * Can we get clarification of exactly which standard we are updating to?
> Is it just ISO/IEC 14651:2016 or ISO/IEC 14651:2016/Amd.1:2017?
--
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.