This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug regex/23393] [0-9] matches ¼ ١ 2 〣 and others, but not 9 (and other nines)


https://sourceware.org/bugzilla/show_bug.cgi?id=23393

--- Comment #9 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Mike FABIAN from comment #8)
> (In reply to Florian Weimer from comment #4)
> > I still think that it's very hard to make the case that the fact that [0-9]
> > matches 8 but not 9 is the right behavior.
> 
> Doesn’t your first comment show that 9 is included in the match?

Florian is talking about [0-9] matching FULLWIDTH DIGIT EIGHT but *not*
FULLWIDTH DIGIT NINE in en_US.UTF-8.

Why doesn't it include FULLWIDTH DIGIT NINE?

And the answer is that the range stops at "9", and in the locale's collation
order there are many more nines after 9 that don't get included. To capture all
the nines you'd have to stop at 10 (which doesn't exist) or at the last nine in
the collation order (which is locale dependent).

Therefore I think it's still the right behaviour.
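A minimal sketch of the collation-order argument, in C. It assumes that a range expression [lo-hi] matches any character whose collation weight lies between the weights of lo and hi. The weight table and the helpers `weight_of` and `in_collation_range` are hypothetical: the weights are invented so that each fullwidth digit sorts directly after its ASCII counterpart, mirroring the behaviour described in this thread. Real glibc locales take their ordering from LC_COLLATE data, not from a table like this.

```c
#include <string.h>

/* Hypothetical collation table: a character (as a UTF-8 string) and an
   invented primary weight. Only the relative order matters here. */
struct coll_entry {
    const char *ch;
    int weight;
};

static const struct coll_entry table[] = {
    {"0", 100},
    {"8", 180},
    {"\xEF\xBC\x98", 181}, /* U+FF18 FULLWIDTH DIGIT EIGHT: sorts after "8" */
    {"9", 190},
    {"\xEF\xBC\x99", 191}, /* U+FF19 FULLWIDTH DIGIT NINE: sorts after "9" */
};

/* Return the weight of ch, or -1 if ch is not in the table. */
static int weight_of(const char *ch) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].ch, ch) == 0)
            return table[i].weight;
    return -1;
}

/* Collation-order range test: ch is in [lo-hi] iff
   weight(lo) <= weight(ch) <= weight(hi). */
int in_collation_range(const char *ch, const char *lo, const char *hi) {
    int w = weight_of(ch), wl = weight_of(lo), wh = weight_of(hi);
    if (w < 0 || wl < 0 || wh < 0)
        return 0; /* unknown characters never match */
    return wl <= w && w <= wh;
}
```

With these invented weights, FULLWIDTH DIGIT EIGHT falls inside [0-9] because it sorts before ASCII 9, while FULLWIDTH DIGIT NINE falls outside because it sorts after the range's upper endpoint.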

We really should be using [:digit:].
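For matching decimal digits, a character class sidesteps collation order entirely: POSIX requires the digit class of a conforming locale to contain only 0 through 9. A minimal sketch using the POSIX `regcomp`/`regexec` API; the function name `all_digits` is hypothetical, and without a `setlocale` call the program runs in the C locale.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if s is one or more characters from the [:digit:] class.
   Unlike the range [0-9], this does not depend on where other characters
   sort in the locale's collation order. */
bool all_digits(const char *s) {
    regex_t re;
    if (regcomp(&re, "^[[:digit:]]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* treat a compile failure as "no match" */
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Here `all_digits("0123456789")` is true, while a string such as "12a" or the bytes of FULLWIDTH DIGIT NINE do not match.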

>  expected: "0123456789"
>   actual:   "0123456789²³¹¼ ... and lots of other stuff but 9 was there ...

Yes, that's fine: the output includes the 9, so it is OK.

