This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.


Re: Questions on fnmatch() and case folding


On 01/26/2017 02:27 PM, Nick Stoughton wrote:
> Some questions have arisen during the Austin Group (the POSIX
> maintainers) meetings around adding support in POSIX for case
> insensitive file name matching (see
> http://austingroupbugs.net/view.php?id=1031)
> 
> It was observed that the glibc implementation of fnmatch() with the
> FNM_CASEFOLD flag does NOT do case folding when given an explicit
> character class. That is to say, the string "A" does not match the
> pattern "[[:lower:]]" even with FNM_CASEFOLD.

I have confirmed that the match fails when the pattern uses a character
class with FNM_CASEFOLD, and I would expect it to succeed.

> I've checked the current master branch on git, and the issue (if
> indeed it is an issue) is still present there.
> 
> There's also a question with range expressions such as "[Z-a]"
> (assuming a POSIX locale): should this match characters such as '_'
> (which in ASCII at least lies between upper case Z and lower case a),
> and whether or not case insensitivity should or should not affect
> this.

No opinion.

> My personal expectation is that "[[:lower:]]" should match an
> uppercase character if case folding is occurring (which it does not in
> glibc). Is this a bug?

Yes, it looks like a bug to me.

The manual says:

"Ignore case in comparing @var{string} to @var{pattern}."

So with FNM_CASEFOLD, "[[:lower:]]" should match any uppercase or
lowercase letter.
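
A minimal sketch of how one might probe this, assuming a glibc-style
<fnmatch.h> where FNM_CASEFOLD is a GNU extension behind _GNU_SOURCE.
The helper names matches(), probe() and run_probes() are hypothetical
and are not part of glibc. No expected output is given because the
results for the contested cases depend on the libc version:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* FNM_CASEFOLD is a GNU extension in glibc's <fnmatch.h> */
#endif
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = fnmatch() error. */
int matches(const char *pattern, const char *string, int flags)
{
    int rc = fnmatch(pattern, string, flags);
    if (rc == 0)
        return 1;
    return rc == FNM_NOMATCH ? 0 : -1;
}

/* Print the result with and without FNM_CASEFOLD for one pattern/string pair. */
void probe(const char *pattern, const char *string)
{
    printf("%-14s vs %-3s plain=%d casefold=%d\n", pattern, string,
           matches(pattern, string, 0),
           matches(pattern, string, FNM_CASEFOLD));
}

/* Hypothetical driver: call this from main(). Runs in the default "C" locale
   unless the caller has invoked setlocale(). */
void run_probes(void)
{
    probe("a", "A");            /* plain literal: folding is expected to apply */
    probe("[[:lower:]]", "a");
    probe("[[:lower:]]", "A");  /* the case under discussion */
    probe("[[:upper:]]", "a");
    probe("[Z-a]", "_");        /* the range question raised above */
}
```

If the "casefold" column for "[[:lower:]]" vs "A" prints 0, the libc
under test shows the behaviour described in this thread.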

> In the POSIX locale, [:lower:] is the character set
> abcdefghijklmnopqrstuvwxyz, and [:upper:] is a similar (upper case)
> set. Thus we might expect
> [[:upper:]-[:lower:]] to be the same as
> [ABCDEFGHIJKLMNOPQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz]
> ... but it isn't!
> 
> The program below demonstrates...
 
I expect they are all bugs.

To quote from our test input data:
~~~
# Derived from the IEEE 2003.2 text.  The standard only contains some
# wording describing the situations to be tested.  It does not specify
# any specific tests.  I.e., the tests below are in no case sufficient.
# They are hopefully necessary, though.
~~~

Hope is not a plan.

Of the ~500 tests, only one that I'm aware of exercises FNM_CASEFOLD,
so we certainly don't cover any of the more complex corner cases.

FNM_CASEFOLD is implemented by applying a tolower-like FOLD() to the
inputs, but the comparison loop is complicated enough that there may
be places where FOLD() is required but missing.
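
To illustrate what a fold-aware character class test could look like,
here is a rough sketch. It is not glibc's code: in_class_folded() is a
hypothetical helper, and treating the upper and lower variants of a
character as equivalent is only one possible reading of the semantics,
which POSIX has not settled. It uses only the standard <wctype.h>
functions:

```c
#include <wctype.h>

/* Hypothetical helper, not taken from glibc.
   Returns 1 if c is in the named class (e.g. "lower"), or, when fold is
   nonzero, if the lowercase or uppercase variant of c is in that class.
   Returns 0 otherwise, including for an unknown class name. */
int in_class_folded(const char *class_name, wint_t c, int fold)
{
    wctype_t desc = wctype(class_name);

    if (desc == (wctype_t) 0)
        return 0;                   /* class name not valid in this locale */
    if (iswctype(c, desc))
        return 1;                   /* direct membership */
    if (!fold)
        return 0;
    /* Assumed folding rule: accept c if either case variant is a member. */
    return iswctype(towlower(c), desc) || iswctype(towupper(c), desc);
}
```

Under this assumed rule, "[[:lower:]]" would accept 'A' when folding is
requested, which matches the expectation stated above.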

Your source is already a better test case for FNM_CASEFOLD than what
we have in glibc :-)

-- 
Cheers,
Carlos.

