[idea] Update ISO 14651 file in locales to the latest standard version

Carlos O'Donell carlos@redhat.com
Tue Nov 2 16:52:58 GMT 2021


On 10/10/21 12:07, Florian Weimer wrote:
> * Alexander Bantyev:
> 
>> The file localedef/locales/iso14651_t1_common is, as far as I can tell,
>> supposed to be taken from <https://standards.iso.org/iso-iec/14651>.
>> However, the version in glibc repository is quite old (from 2016, I think)
>> and is missing some new Unicode codepoints. There have been new editions to
>> the standard, the newest being edition 6 from 2020:
>> <https://standards.iso.org/iso-iec/14651/ed-6/en/ISO14651_2020_TABLE1_en.txt>
>>
>> Perhaps the file in the glibc repository can be updated to match the
>> latest standard?
> 
> I think it's scary to update this file because it alters the result of
> bracket patterns in regular expressions.  The file is no longer fully
> automatically generated, I think.  Implementing rational ranges where
> it counts in glibc would be one way forward here.
> 
> Cc:ing Mike and Carlos, who have more details.

(1) Where does glibc's ISO 14651 data come from?

We use ISO 14651 in glibc for collation weights.

We do not use ISO 14651 in glibc for collation element ordering (CEO).
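To make the "collation weights" side concrete, here is a minimal sketch of how
an application observes them. It assumes Python, whose locale.strcoll() calls
into the C library's collation routine; the helper name sort_with_collation is
made up for this example and is not a glibc interface. Only the "C" locale is
exercised, because results for other locales depend on the locale data
installed on the system.

```python
import functools
import locale


def sort_with_collation(words, locale_name="C"):
    """Sort words using the LC_COLLATE rules of locale_name.

    Hypothetical helper for illustration only.
    """
    # Calling setlocale() with no second argument queries the current setting.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strcoll() compares two strings according to the active LC_COLLATE.
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore the previous collation setting.
        locale.setlocale(locale.LC_COLLATE, saved)


# In the "C" locale, comparison follows code point order, so uppercase
# ASCII letters sort before lowercase ones.
print(sort_with_collation(["b", "A", "a", "B"]))
```

Passing another installed locale name instead of "C" would sort by that
locale's collation data.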

(2) Is glibc's ISO 14651 data updated in an automated fashion?

No. Importing new ISO 14651 data is a manual and difficult process: the
collation tailorings of every existing locale must be reviewed and harmonized
with the updates from ISO 14651.

(3) What about regexp ranges?

Regular expression ranges rely on "collation element ordering" (not weights),
so after importing ISO 14651 updates we must also update the element orders to
retain rational ranges, i.e. ranges such as [a-z], [A-Z], and [0-9] that match
what English language speakers expect.
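
A toy model of why the element order matters for ranges, written in Python. It
is not glibc's implementation and does not use real ISO 14651 data: the two
orders and the helper names in_range and range_members are invented for the
example. A range [lo-hi] is treated as "every element positioned between lo and
hi in the order".

```python
import string

# Hypothetical order 1: all uppercase letters, then all lowercase letters.
GROUPED_ORDER = string.ascii_uppercase + string.ascii_lowercase

# Hypothetical order 2: cases interleaved as aAbB...zZ.
INTERLEAVED_ORDER = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range(ch, lo, hi, order):
    """Return True if ch lies between lo and hi (inclusive) in order."""
    position = {element: index for index, element in enumerate(order)}
    return position[lo] <= position[ch] <= position[hi]


def range_members(lo, hi, order):
    """Return every element of order that the range lo-hi would match."""
    return "".join(ch for ch in order if in_range(ch, lo, hi, order))


# With the grouped order, a-z covers only the lowercase letters.
print(range_members("a", "z", GROUPED_ORDER))

# With the interleaved order, a-z also picks up A through Y, but not Z.
print(range_members("a", "z", INTERLEAVED_ORDER))
```

The same pattern text therefore matches different characters depending only on
the order behind it.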

(4) When was the ISO 14651 data last updated for glibc?

In 2018 we updated to ISO 14651 4th Edition, which was harmonized with Unicode 9.0.0.

We have not updated to 5th or 6th Edition yet.

I've filed the following bug to track this:
Bug 28528 - Update to ISO 14651 6th Edition 2020.
https://sourceware.org/bugzilla/show_bug.cgi?id=28528

Hopefully this answers your questions.

-- 
Cheers,
Carlos.


