This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[PATCHv3] Expected behaviour for a-z, A-Z, and 0-9 (Bug 23393).


On 07/23/2018 11:10 AM, Florian Weimer wrote:
> On 07/20/2018 11:56 PM, Carlos O'Donell wrote:
>> v2
>> - Fixed tr_TR by duplicating A-Z rational range.
>> - Fixed tst-rxspencer.
>> - Fixed bug-regex17.
>>
>> Tell me how the new version does.
> 
> My tester likes it.  tr_TR.ISO-8859-9 is now fixed.  I added fnmatch
> support, too, and initial results look good as well.

OK, here is v3.

~~~ NEWS ~~~
* The GNU C Library now uses rational ranges for regular expression
  matching of ranges that are within a-z, A-Z, and 0-9 for all
  locales.  This means that the range [a-c] will no longer match
  accented variants of the letter a and will match only a, b, and c.
  Likewise, [0-9] will match only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8,
  and 9, and no other characters.  Rational ranges have been
  implemented by several other GNU projects to provide straightforward
  rules for regular expression ranges and to make them portable across
  locales.  The current rational ranges are implemented using collation
  element ordering, which may yield unexpected results if the range
  includes accented characters, e.g. [a-ñ]: such a range will include
  all of a-z, because ñ comes after the rational range in collation
  element order.  In the future the library may implement full rational
  ranges covering all characters by using Unicode code point ordering,
  which will make such ranges faster to match and more portable.
~~~
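One way to see the range semantics described in the NEWS entry is to drive the POSIX regcomp/regexec interface directly. The sketch below uses a hypothetical helper, ere_matches (not a glibc function), that compiles an extended regular expression and reports whether it matches. As written it runs in the default "C" locale, where [a-c] already means exactly a, b, and c. Observing the locale-dependent behaviour this patch changes would require calling setlocale (LC_ALL, "") under a locale such as tr_TR.ISO-8859-9, and the results then depend on the glibc version, so none are claimed here.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the POSIX extended regular
   expression PATTERN matches somewhere in SUBJECT, and false otherwise
   (including when PATTERN fails to compile).  Bracket ranges such as
   [a-c] are interpreted according to the current locale, which is "C"
   unless the program has called setlocale.  */
bool
ere_matches (const char *pattern, const char *subject)
{
  regex_t re;

  /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  int rc = regexec (&re, subject, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

In the C locale, ere_matches ("^[a-c]$", "b") is true while ere_matches ("^[a-c]$", "B") is false, and ^[0-9]+$ rejects "2O18" (letter O), because only the ten ASCII digits fall inside [0-9].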

We have approval from Mike and Rafal, the two localedata subsystem
maintainers.

This solution matches what you and Rich Felker both think is the
correct solution.

So for 2.28 we would use rational ranges for a-z, A-Z, and 0-9, until
we can implement code point ranges.
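The interim rule for 2.28 (rational ranges for a-z, A-Z, and 0-9, collation order for everything else) can be sketched as a single membership test. range_contains and in_rational_block are hypothetical names used only for illustration. glibc exposes no such helper, and per the NEWS text its rational ranges are built on collation element ordering. Here wcscoll serves only as a rough stand-in for that ordering. The sketch also assumes wchar_t holds Unicode code points, as it does with glibc, so that a-z and A-Z are contiguous.

```c
#include <stdbool.h>
#include <wchar.h>

/* True if LO..HI lies entirely inside one rational block: a-z, A-Z or
   0-9.  Assumes wchar_t values are Unicode code points (true for glibc),
   so each block is a contiguous run of values.  */
static bool
in_rational_block (wchar_t lo, wchar_t hi)
{
  return (L'a' <= lo && hi <= L'z')
         || (L'A' <= lo && hi <= L'Z')
         || (L'0' <= lo && hi <= L'9');
}

/* Hypothetical helper: does the bracket range [LO-HI] contain C?  */
bool
range_contains (wchar_t lo, wchar_t hi, wchar_t c)
{
  if (in_rational_block (lo, hi))
    /* Rational range: compare code points, so [a-c] is exactly a, b
       and c whatever order the locale's collation data uses.  */
    return lo <= c && c <= hi;

  /* Any other range, e.g. [a-~] or [a-ñ]: order by the current
     LC_COLLATE locale.  wcscoll only approximates the collation element
     order glibc's regex code consults.  */
  wchar_t ls[2] = { lo, L'\0' };
  wchar_t hs[2] = { hi, L'\0' };
  wchar_t cs[2] = { c, L'\0' };
  return wcscoll (ls, cs) <= 0 && wcscoll (cs, hs) <= 0;
}
```

Because each rational block is contiguous in code point order, the rational branch reduces to a plain code point comparison. Under the full code point ordering mentioned above, the collation fallback would become that same comparison for every range.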

v3
- Merged lowercase/uppercase deinterlacing.
- Added NEWS entry.

Please run this through your checker and ACK it for 2.28, and I'll
commit.

Attaching it as swbz23393v3.tar.gz to avoid spam rejection.

Cheers,
Carlos.

Attachment: swbz23393v3.tar.gz
Description: application/gzip

