This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651


On 01/29/2018 09:42 AM, Joseph Myers wrote:
> On Mon, 29 Jan 2018, Carlos O'Donell wrote:
> 
>> * Get automation scripts from ISO 14651 group to process Unicode data 
>>   into ISO 14651 format data.
> 
> - Hopefully under a free software license such as the Unicode, Inc. 
> License Agreement for Data Files and Software.
> 
> Ultimately the point is to have correct Unicode collation - and if the 
> overall effect of the collation definitions in glibc is as intended, those 
> definitions don't need to be textually close to those from ISO 14651, and 
> the generators don't need to be the same, if there are different ways to 
> achieve the same resulting ordering.
 
Ultimately I think the goal of the project should be to harmonize as much
as possible with Unicode, CLDR, ISO 14651, and related standards. This
harmonization includes collation, but only insofar as we *can* harmonize with
Unicode.

Mike and I have talked about this on and off over the years, and we don't know
whether the POSIX collation rules are semantically sufficient to match the Unicode
Collation Algorithm rules, particularly when it comes to complex Asian collations.
We don't know whether glibc can actually sort all Japanese symbols correctly, but
we will endeavour to harmonize collation as far as we can, and to document the
collation failings that remain.

Collation is certainly the most difficult update for glibc. The test cases that
Mike has added with the ISO 14651 update make a huge difference: they provide
stability guarantees and a rationale for verifying that sorting is correct.
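
To illustrate the kind of check such test cases perform, here is a minimal
sketch, assuming a simple "sort a shuffled list and compare against the expected
order" approach. It is not the actual glibc test harness. The helper names
sort_with_locale and check_expected_order are hypothetical; the only real
interfaces used are Python's locale.setlocale and locale.strcoll, which call
into the C library's collation routines.

```python
import functools
import locale


def sort_with_locale(words, locale_name="C"):
    """Sort words using the LC_COLLATE rules of locale_name.

    Hypothetical helper: the comparison is delegated to the C library
    through locale.strcoll. Raises locale.Error if the named locale
    is not installed on the system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore the previous collation locale for the rest of the process.
        locale.setlocale(locale.LC_COLLATE, saved)


def check_expected_order(expected, locale_name="C"):
    """Return True if sorting a reversed copy reproduces the expected order."""
    scrambled = list(reversed(expected))
    return sort_with_locale(scrambled, locale_name) == list(expected)
```

With the default "C" locale the comparison is by code point, so uppercase ASCII
letters sort before lowercase ones. Passing another installed locale name would
exercise that locale's collation definitions instead, and an expected-order list
that stops matching after a data update would flag a change in sorting.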

-- 
Cheers,
Carlos.

