This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Rational Ranges - Rafal and Mike's opinion? (Bug 23393).


Carlos O'Donell <carlos@redhat.com> wrote:

> On 07/23/2018 11:10 AM, Florian Weimer wrote:
>> On 07/20/2018 11:56 PM, Carlos O'Donell wrote:
>>> v2
>>> - Fixed tr_TR by duplicating A-Z rational range.
>>> - Fixed tst-rxspencer.
>>> - Fixed bug-regex17.
>>>
>>> Tell me how the new version does.
>> 
>> My tester likes it.  tr_TR.ISO-8859-9 is now fixed.  I added fnmatch
>> support, too, and initial results look good as well.
>
> OK, so we have the capability to deploy rational ranges.
>
> Florian,
>
> Should we do so in 2.28? Avoiding all possible problems in the future
> and making the ranges portable, rational, and safe from a security
> perspective?
>
> Rafal,
>
> As localedata maintainer what is your opinion of changing the meaning
> of [a-z], [A-Z], and [0-9] to be rational ranges for *all* locales
> which mean exactly the Latin character sequences you would expect
> e.g. {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z} for [a-z],
> [A-Z] likewise, and {0,1,2,3,4,5,6,7,8,9} for [0-9]?
>
> Mike,
>
> Same question to you.

I agree that rational ranges are much more useful.

I cannot imagine any use case for [a-z] matching aAbB...z and not Z.

If [a-z] follows the locale sort order, one never knows what it will
match; that is just too confusing.
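
To make that concrete, here is a minimal sketch (mine, not part of
Carlos's patch) for probing what a bracket range matches under the
current locale.  It uses only the POSIX regcomp/regexec/regfree calls;
the helper name range_matches is made up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if TEXT matches the POSIX extended
   regular expression PATTERN under the current locale, 0 if it does
   not match, and -1 if PATTERN fails to compile.  */
static int
range_matches (const char *pattern, const char *text)
{
  regex_t re;

  assert (pattern != NULL && text != NULL);

  /* REG_NOSUB: we only want a yes/no answer, not match offsets.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0 ? 1 : 0;
}
```

A program starts in the "C" locale, where range_matches ("^[a-z]$", "B")
returns 0.  After setlocale (LC_ALL, "") in a locale whose collation
interleaves upper and lower case, the same call may return 1 on a glibc
without rational ranges; the result depends on the locale and the
library version, so I am not claiming any particular output for that
case.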

In the long run, I think implementing ranges by code points would be
the best solution. It would also make updates of the iso14651_t1_common
file easier, because we would then need to make fewer changes to the
upstream version of that file.
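
What I mean by ranges by code points can be sketched roughly as below.
This is only an illustration of the idea, not how the current patch
works (the patch adjusts the locale data instead).  The helper name
in_codepoint_range is hypothetical, and the sketch assumes that wchar_t
values are code points, as they are in glibc.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: treat the bracket range [LO-HI] as a plain
   code-point interval.  No collation table is consulted, so the
   result is the same in every locale.  Returns 1 if C lies in the
   interval, 0 otherwise.  */
static int
in_codepoint_range (wchar_t lo, wchar_t hi, wchar_t c)
{
  /* A reversed range such as [z-a] is a caller error here.  */
  assert (lo <= hi);
  return lo <= c && c <= hi;
}
```

With this definition in_codepoint_range (L'a', L'z', L'B') is 0 in
every locale, because 'B' (U+0042) lies below 'a' (U+0061).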

But for 2.28 this cannot be done. Therefore, I think the solution
by Carlos is very good.

> For historical context in gawk:
> https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
>
> For context from POSIX:
> http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html
> (see the section on "RE Bracket Expressions").
>
> Support for rational ranges would make [a-z], [A-Z], [0-9] and other subranges
> rational for all locales, and would no longer include mixed case, or accents.
>
> I'd like to hear affirmatives from the localedata maintainers on this issue.
>
> Cheers,
> Carlos.

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.

