This is the mail archive of the mailing list for the glibc project.

Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).


Carlos O'Donell wrote:

> In commit 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4 we updated all of
> the collation data to harmonize with the new version of ISO 14651
> which is derived from Unicode 9.0.0.  This collation update brought
> with it some changes to locales which were not desirable by some
> users, in particular it altered the meaning of the
> locale-dependent-range regular expression, namely [a-z] and [A-Z], and
> for en_US it caused uppercase letters to be matched by [a-z] for the
> first time.

The Debian system where it is most convenient for me to test has
Debian's libc6 package, version 2.24-12.  There, [a-z] already matches
uppercase letters.  I've always considered that undesirable, but I'm
confused about the described regression.  Did one of Debian's patches
to localedata cause it to pick up the regression early (by which I
mean, more than 5 years ago)?

> In glibc we implement the requirement of ISO POSIX-2:1993 and use
> collation element order (CEO) to construct the range expression, the
> API internally is __collseq_table_lookup().  The fact that we use CEO
> and also have 4-level weights on each collation rule means that we can
> in practice reorder the collation rules in iso14651_t1_common (the new
> data) to provide consistent range expression resolution *and* the
> weights should maintain the expected total order.
> * Adds new test data for which exercises
>   strcoll* and strxfrm* and ensures the ISO 14651 collation remains.

Cool!  Checking my understanding: does this mean that if I have files

	lll
	MMM
	nnn

then with this patch,

	echo [a-z]*

would no longer match MMM, and

	ls | sort

would continue to sort in the order lll < MMM < nnn?
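
For anyone who wants to try the same check, here is a minimal sketch.
It assumes a POSIX sh with mktemp(1) and xargs(1); the check_locale
helper and the variable names are mine, not part of the patch.  It
creates the three files in a scratch directory and prints the glob
result and the sort order for each locale.  Only the C locale result
is predictable (the glob matches lll and nnn, and sort gives
MMM lll nnn).  What en_US.UTF-8 prints depends on the glibc version,
on whether that locale is installed, and on whether the shell's
globbing follows the locale at all.

```shell
#!/bin/sh
# Sketch: compare [a-z] glob matching and sort order across locales.
# check_locale is a hypothetical helper written for this illustration.

check_locale() {
    # $1: locale name, $2: directory holding the test files
    (
        cd "$2" || exit 1
        printf '%s glob: ' "$1"
        # A fresh shell started under the requested locale expands the glob.
        LC_ALL=$1 sh -c 'echo [a-z]*'
        printf '%s sort: ' "$1"
        # xargs with no command joins the sorted names onto one line.
        ls | LC_ALL=$1 sort | xargs
    )
}

dir=$(mktemp -d) || exit 1
touch "$dir/lll" "$dir/MMM" "$dir/nnn"

check_locale C "$dir"
# Assumption: en_US.UTF-8 is installed; if not, expect a fallback to C.
check_locale en_US.UTF-8 "$dir"

rm -rf "$dir"
```

If the answer to my question is yes, the en_US.UTF-8 lines should read
"lll nnn" for the glob and "lll MMM nnn" for the sort.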

I wish we had done it 10 years ago. ;-)  Thanks for getting it done.