This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651


Carlos O'Donell <carlos@redhat.com> wrote:

> On 01/26/2018 02:51 AM, Mike FABIAN wrote:
>> 
>> This set of patches updates our
>> glibc/localedata/locales/iso14651_t1_common file to the latest
>> available version from ISO and adapts the collation rules of all
>> locales using “copy "iso14651_t1"” to the changes in the new file.
>> 
>> The ISO standard 14651:2016 is available here:
>
> What about ISO/IEC 14651:2016/Amd.1:2017?
>
> It looks like it updates things to Unicode 9.0?
>
> In particular ISO14651_2017_TABLE1_en.txt matches Amd.1:2017, and
> *not* the 2016 version.

I used ISO14651_2015_TABLE1_en.txt because I did not find
ISO14651_2017_TABLE1_en.txt. I’ll update to ISO14651_2017_TABLE1_en.txt
in the next version of my patch series.

>> ISO/IEC 14651:2016: https://www.iso.org/standard/68309.html
>> 
>> And a POSIX style LC_COLLATE file is downloadable from:
>> 
>> http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
>> http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016.zip
>> 
>> This .zip file contains an ISO14651_2017_TABLE1_en.txt which is in a
>> format similar to our current iso14651_t1_common and can be used as an
>> update.
>>
>
> To be clear, the text file is not in the above zip, it is in the associated
> "Electronic inserts" zip file which is part of the published standard.
>
> http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016_Electronic_inserts.zip
>
> With this additional zip file you can review the tabular data to make
> comparisons and review the patches.
>
>> That file is unfortunately up-to-date only with Unicode 8.0.0,
>> but that is already a huge improvement over what we have now.
>
> This doesn't seem correct given the data in Amd.1:2017:
> ~~~
> The current Common Template Table reflects the repertoire of
> characters of Unicode 9.0, included in
> ISO/IEC 10646:2014 plus its Amendments 1 and 2, plus 273 new
> characters that will be included in the
> fifth edition of ISO/IEC 10646.
> ~~~

Yes, it was Unicode 8.0.0 because I used the older file
ISO14651_2015_TABLE1_en.txt. I’ll update to the newer
ISO14651_2017_TABLE1_en.txt file.

>> Also, that file contained some errors which needed to be fixed.
>> Seems strange for a file released by ISO, but it really contained
>> some errors.
>> 
>> And as the names for most collation symbols have been changed, all the
>> collation rules of locales using “copy "iso14651_t1"” needed to be
>> updated.
>> 
>> While doing that, I made the collation rules of all locales I touched
>> agree with the CLDR collation rules. glibc has several locales which are
>> not in CLDR, for these I just adapted the existing rules.
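
For anyone reviewing the renamed collation symbols, one way to see which
names changed between two versions of the table is to compare their
"collating-symbol <NAME>" declarations. The following is only a minimal
sketch under stated assumptions: the helper names are hypothetical, the
sample strings are made up and are not real table contents, and it looks
only at declaration lines, not at the weights assigned to each character.

```python
import re

# Matches POSIX locale-source declarations such as: collating-symbol <S0061>
# Lines commented out with "%" do not match, since the keyword must come first.
SYMBOL_RE = re.compile(r"^\s*collating-symbol\s+<([^>]+)>")


def collating_symbols(text):
    """Return the set of names declared with `collating-symbol` in text."""
    names = set()
    for line in text.splitlines():
        match = SYMBOL_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def symbol_diff(old_text, new_text):
    """Return (removed, added) symbol names between two table versions."""
    old, new = collating_symbols(old_text), collating_symbols(new_text)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Tiny made-up excerpts (assumption), standing in for the two table files.
    old_sample = "collating-symbol <OLD-A>\ncollating-symbol <SHARED>\n"
    new_sample = "collating-symbol <NEW-A>\ncollating-symbol <SHARED>\n"
    removed, added = symbol_diff(old_sample, new_sample)
    print("removed:", removed)
    print("added:", added)
```

With real files, the two strings would be read from the old
iso14651_t1_common and the new table file; any locale that still refers
to a name in the "removed" list would need its rules adapted.
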
>
> In summary:
>
> * Can we get clarification of exactly which standard we are updating to?
>   Is it just ISO/IEC 14651:2016 or ISO/IEC 14651:2016/Amd.1:2017?

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.

