This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [Patch v4 6/14] [BZ #14095] update collation data from Unicode / ISO 14651
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Mike FABIAN <mfabian at redhat dot com>, libc-alpha at sourceware dot org
- Cc: "Dmitry V. Levin" <ldv at altlinux dot org>
- Date: Mon, 26 Feb 2018 10:14:15 -0800
- Subject: Re: [Patch v4 6/14] [BZ #14095] update collation data from Unicode / ISO 14651
- References: <s9dzi3vaiw9.fsf@taka.site>
On 02/26/2018 07:08 AM, Mike FABIAN wrote:
> From b517dae2da9fa61acd31053d3bf150141f20611e Mon Sep 17 00:00:00 2001
> From: Mike FABIAN <mfabian@redhat.com>
> Date: Wed, 31 Jan 2018 06:18:47 +0100
> Subject: [PATCH 06/14] iso14651_t1_common: make the fourth level the codepoint
> for characters which are ignorable on all 4 levels
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Entries for characters which have “IGNORE” on all 4 levels like:
>
> <U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
>
> are changed into:
>
> <U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
>
> i.e. putting the code point of the character into the fourth level
> instead of “IGNORE”. Without that change, all such characters
> would compare equal which would make a wcscoll test case fail.
> It is better to have a clearly defined sort order even for characters
> like this so it is good to use the code point as a tie-break.
>
> * localedata/locales/iso14651_t1_common: Use the code point of a character
> in the fourth collation level instead of IGNORE for all entries which
> have IGNORE on all 4 levels.
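
To illustrate the tie-break described in the commit message, here is a minimal
sketch in C. It is a toy model, not glibc's collation code: the `struct entry`
type, the `IGNORE` stand-in value 0, and the helpers `all_ignore`,
`codepoint_tiebreak` and `compare_entries` are all hypothetical names invented
for this example. It only models comparing two single characters level by
level, and it assumes nonzero code points so that a code point is never
mistaken for the `IGNORE` stand-in.

```c
#include <stddef.h>
#include <stdint.h>

#define LEVELS 4
#define IGNORE 0u /* assumption: 0 stands in for IGNORE (no weight) */

/* Hypothetical model of one collation entry: one weight per level. */
struct entry {
    uint32_t weight[LEVELS];
};

/* Old style: <Uxxxx> IGNORE;IGNORE;IGNORE;IGNORE */
static struct entry all_ignore(uint32_t codepoint)
{
    (void)codepoint; /* the code point does not influence any level */
    struct entry e = {{IGNORE, IGNORE, IGNORE, IGNORE}};
    return e;
}

/* New style: <Uxxxx> IGNORE;IGNORE;IGNORE;<Uxxxx> */
static struct entry codepoint_tiebreak(uint32_t codepoint)
{
    struct entry e = {{IGNORE, IGNORE, IGNORE, codepoint}};
    return e;
}

/* Compare level by level; the first level that differs decides.
   Returns <0, 0 or >0 in the style of wcscoll. */
static int compare_entries(const struct entry *a, const struct entry *b)
{
    for (size_t level = 0; level < LEVELS; level++) {
        if (a->weight[level] != b->weight[level])
            return a->weight[level] < b->weight[level] ? -1 : 1;
    }
    return 0; /* equal on all four levels */
}
```

In this model, `all_ignore(0x0001)` and `all_ignore(0x0002)` compare equal,
while `codepoint_tiebreak(0x0001)` sorts before `codepoint_tiebreak(0x0002)`,
which is the clearly defined order the patch is after.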
LGTM.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
--
Cheers,
Carlos.