This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651

From: Carlos O'Donell <carlos at redhat dot com>
To: Mike FABIAN <mfabian at redhat dot com>, Joseph Myers <joseph at codesourcery dot com>
Cc: libc-alpha at sourceware dot org
Date: Fri, 26 Jan 2018 09:34:51 -0800
Subject: Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651
Authentication-results: sourceware.org; auth=none
References: <s9dy3kkopq0.fsf@taka.site> <alpine.DEB.2.20.1801261253010.14878@digraph.polyomino.org.uk> <s9d607on0oc.fsf@taka.site>

On 01/26/2018 06:40 AM, Mike FABIAN wrote:
> Joseph Myers <joseph@codesourcery.com> wrote:
> 
>> On Fri, 26 Jan 2018, Mike FABIAN wrote:
>>
>>> [BZ #14095] - Review / update collation data from Unicode / ISO 14651
>>>
>>> Updating this file alone is not enough: there are problems in the new
>>> file that need to be fixed, and the collation rules for many locales
>>> need to be adapted. This is done by the following patches.
>>>
>>> This update also fixes the problem that many characters are treated as
>>> identical when sorting because they were not yet in the old
>>> iso14651_t1_common file, see:
>>
>> To be clear: do you mean it fixes it *for the characters in the Unicode 
>> version supported by these updated collation data*?  Or globally for all 
>> characters including those not yet defined or too new for that collation 
>> data?
> 
> Yes, it fixes it only for the characters which are in this updated
> collation data, i.e. for all characters up to Unicode 8.0.0. All
> characters added after Unicode 8.0.0 or still undefined will still have
> that problem.

This is OK IMO. Once I finish C.UTF-8 for glibc 2.28, we may be able to
fall back to code-point sorting for all such undefined elements when the
locale uses UTF-8 (one of the C.UTF-8 enhancements is to cover every
code point in UTF-8, so that all of them can be sorted in code-point
order).
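
To make the difference concrete, here is a minimal sketch in Python. It is
a toy model, not glibc's implementation: WEIGHTS stands in for a collation
table such as iso14651_t1_common, and the names WEIGHTS, UNDEFINED_WEIGHT,
key_shared_weight and key_codepoint_fallback are all hypothetical. It
assumes that characters missing from the table end up sharing one weight,
which is one way they could compare as identical, and contrasts that with
a fallback that orders the missing characters by code point.

```python
# Toy stand-in for a collation table; real tables are far larger and
# use multi-level weights. All names and values here are hypothetical.
WEIGHTS = {"a": 1, "b": 2, "c": 3}

# Assumption: every character missing from the table gets this one weight.
UNDEFINED_WEIGHT = 0


def key_shared_weight(s):
    """Sort key where all characters missing from WEIGHTS look alike."""
    return tuple(WEIGHTS.get(ch, UNDEFINED_WEIGHT) for ch in s)


def key_codepoint_fallback(s):
    """Sort key where missing characters fall back to code-point order.

    Characters in WEIGHTS sort first, by their weight; characters not in
    WEIGHTS sort after them, ordered by their Unicode code point.
    """
    return tuple(
        (0, WEIGHTS[ch]) if ch in WEIGHTS else (1, ord(ch)) for ch in s
    )


# U+1FA70 and U+1FA71 were added to Unicode after 8.0.0, so a table
# generated from Unicode 8.0.0 data would not contain them.
x = "a\U0001FA70"
y = "a\U0001FA71"

if __name__ == "__main__":
    print(key_shared_weight(x) == key_shared_weight(y))
    print(key_codepoint_fallback(x) < key_codepoint_fallback(y))
    print(sorted([y, "ab", x], key=key_codepoint_fallback) == ["ab", x, y])
```

With the shared weight, x and y produce the same key, so a sort cannot
tell them apart. With the fallback they get distinct keys and a stable
relative order, while characters that are in the table still sort by
their table weight.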

-- 
Cheers,
Carlos.
