This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
Re: Questions on fnmatch() and case folding
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Nick Stoughton <nstoughton at logitech dot com>, libc-help at sourceware dot org
- Date: Thu, 26 Jan 2017 20:59:15 -0500
- Subject: Re: Questions on fnmatch() and case folding
- References: <CACpbN909nircU4Qx6-fRauYSjbjMbAP5t3XN5TO5QVCo_G5Phg@mail.gmail.com>
On 01/26/2017 02:27 PM, Nick Stoughton wrote:
> Some questions have arisen during the Austin Group (the POSIX
> maintainers) meetings around adding support in POSIX for
> case-insensitive file name matching (see
> http://austingroupbugs.net/view.php?id=1031)
>
> It was observed that the glibc implementation of fnmatch() with the
> FNM_CASEFOLD flag does NOT do case folding when given an explicit
> character class. That is to say, the string "A" does not match the
> pattern "[[:lower:]]" even with FNM_CASEFOLD.
I have confirmed that it doesn't match when a character class is
used, and I would expect it to.
> I've checked the current master branch on git, and the issue (if
> indeed it is an issue) is still present there.
>
> There's also a question with range expressions such as "[Z-a]"
> (assuming a POSIX locale): should this match characters such as '_'
> (which, in ASCII at least, lies between uppercase Z and lowercase a),
> and should case insensitivity affect the answer?
No opinion.
> My personal expectation is that "[[:lower:]]" should match an
> uppercase character if case folding is occurring (which it does not in
> glibc). Is this a bug?
Yes, it looks like a bug to me.
The manual says:
"Ignore case in comparing @var{string} to @var{pattern}."
So "[[:lower:]]" should match any uppercase or lowercase letter.
> In the POSIX locale, [:lower:] is the character set
> abcdefghijklmnopqrstuvwxyz, and [:upper:] is a similar (upper case)
> set. Thus we might expect
> [[:upper:]-[:lower:]] to be the same as
> [ABCDEFGHIJKLMNOPQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz]
> ... but it isn't!
>
> The program below demonstrates...
I expect they are all bugs.
To quote from our test input data:
~~~
# Derived from the IEEE 2003.2 text. The standard only contains some
# wording describing the situations to be tested. It does not specify
# any specific tests. I.e., the tests below are in no case sufficient.
# They are hopefully necessary, though.
~~~
Hope is not a plan.
Of the ~500 tests, we have only one FNM_CASEFOLD test that I'm aware
of, so we absolutely don't cover any of the more complex corner cases.
FNM_CASEFOLD is implemented by applying a tolower-like fold to the
inputs, but the comparison loop is complicated enough that there may
be cases where FOLD() is required but missing.
Your source is already a better test case for FNM_CASEFOLD than what
we have in glibc :-)
--
Cheers,
Carlos.