This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug regex/23393] [0-9] matches ¼ ١ ２ 〣 and others, but not ９ (and other nines)
- From: "carlos at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Wed, 18 Jul 2018 13:31:26 +0000
- Subject: [Bug regex/23393] [0-9] matches ¼ ١ ２ 〣 and others, but not ９ (and other nines)
- Auto-submitted: auto-generated
- References: <bug-23393-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=23393
--- Comment #9 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Mike FABIAN from comment #8)
> (In reply to Florian Weimer from comment #4)
> > I still think that it's very hard to make the case that the fact that [0-9]
> > matches ８ but not ９ is the right behavior.
>
> Doesn’t your first comment show that 9 is included in the match?
Florian is talking about having included FULLWIDTH DIGIT EIGHT but *not*
including FULLWIDTH DIGIT NINE, for en_US.UTF-8.
Why doesn't it include FULLWIDTH DIGIT NINE?
The answer is that the range in the regex stops at "9", and there are lots
more nines after 9 that don't get included. To capture all the nines you'd
have to stop at 10 (which doesn't exist) or at the last known "9" (which is
locale dependent).
Therefore I think it's still the right behaviour.
We really should be using [:digit:].
> expected: "0123456789"
> actual: "0123456789²³¹¼ ... and lots of other stuff but 9 was there ...
Yes, that's fine, and that includes the 9, and so is OK.
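To make the difference between the range and the character class concrete,
here is a minimal sketch using the POSIX regcomp/regexec interface. The
helper name regex_matches is hypothetical and not part of glibc; it only
wraps the standard calls. The range [0-9] is interpreted through the locale's
collation order, while [[:digit:]] is defined by the locale's character
classification, which POSIX restricts to the ten ASCII digits.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of glibc): returns 1 if `text` matches the
   POSIX extended regex `pattern` in the current locale, 0 if it does not,
   and -1 if the pattern fails to compile. */
int regex_matches(const char *pattern, const char *text)
{
    regex_t re;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

After calling setlocale(LC_ALL, "") under en_US.UTF-8, comparing
regex_matches("^[0-9]$", "８") with regex_matches("^[[:digit:]]$", "８")
should show the behaviour discussed in this bug: the result for the range
depends on the glibc version and the locale's collation data, whereas the
character class is expected to reject the fullwidth digit.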