This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.


Re: Questions on fnmatch() and case folding

From: Carlos O'Donell <carlos at redhat dot com>
To: Nick Stoughton <nstoughton at logitech dot com>, libc-help at sourceware dot org
Date: Thu, 26 Jan 2017 20:59:15 -0500
Subject: Re: Questions on fnmatch() and case folding
Authentication-results: sourceware.org; auth=none
References: <CACpbN909nircU4Qx6-fRauYSjbjMbAP5t3XN5TO5QVCo_G5Phg@mail.gmail.com>

On 01/26/2017 02:27 PM, Nick Stoughton wrote:
> Some questions have arisen during the Austin Group (the POSIX
> maintainers) meetings around adding support in POSIX for case
> insensitive file name matching (see
> http://austingroupbugs.net/view.php?id=1031)
> 
> It was observed that the glibc implementation of fnmatch() with the
> FNM_CASEFOLD flag does NOT do case folding when given an explicit
> character class. That is to say, the string "A" does not match the
> pattern "[[:lower:]]" even with FNM_CASEFOLD.

I have confirmed that FNM_CASEFOLD does not fold case when matching
against a character class, and I would expect it to.
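For anyone who wants to reproduce this, here is a minimal sketch. The
helper names pattern_matches() and show_casefold_cases() are made up for
illustration; only fnmatch(), FNM_CASEFOLD and FNM_NOMATCH come from
<fnmatch.h>. FNM_CASEFOLD is a GNU extension, so _GNU_SOURCE is defined
first. The last printed line is the case under discussion, and its value
may differ between libc versions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* FNM_CASEFOLD is a GNU extension */
#endif
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of glibc.
   Returns 1 on match, 0 on no match, -1 on any other fnmatch() result. */
int pattern_matches(const char *pattern, const char *string, int casefold)
{
    assert(pattern != NULL && string != NULL);
    int rc = fnmatch(pattern, string, casefold ? FNM_CASEFOLD : 0);
    if (rc == 0)
        return 1;
    return rc == FNM_NOMATCH ? 0 : -1;
}

/* Prints one line per case. No expected values are hard-coded for the
   character-class case, because that is the behavior in question. */
void show_casefold_cases(void)
{
    printf("\"a\" vs \"A\", no flags:               %d\n",
           pattern_matches("a", "A", 0));
    printf("\"a\" vs \"A\", FNM_CASEFOLD:           %d\n",
           pattern_matches("a", "A", 1));
    printf("\"[[:lower:]]\" vs \"A\", no flags:     %d\n",
           pattern_matches("[[:lower:]]", "A", 0));
    printf("\"[[:lower:]]\" vs \"A\", FNM_CASEFOLD: %d\n",
           pattern_matches("[[:lower:]]", "A", 1));
}
```

Calling show_casefold_cases() from main() prints the four results. If
folding applied to classes, the last line would read 1, like the second.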

> I've checked the current master branch on git, and the issue (if
> indeed it is an issue) is still present there.
> 
> There's also a question with range expressions such as "[Z-a]"
> (assuming a POSIX locale): should this match characters such as '_'
> (which in ASCII at least lies between upper case Z and lower case a),
> and whether or not case insensitivity should or should not affect
> this.

No opinion.
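For reference, the range question without case folding can be probed
with a sketch like the following. It assumes the default "C" locale (no
setlocale() call) and an ASCII execution character set; the helper names
char_in_bracket() and show_range() are hypothetical. What should happen
to the same range under FNM_CASEFOLD is the open question, so the sketch
does not test it.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the single character c matches the
   bracket expression given in `bracket`, 0 otherwise. No flags are used. */
int char_in_bracket(const char *bracket, char c)
{
    assert(bracket != NULL);
    const char s[2] = { c, '\0' };
    return fnmatch(bracket, s, 0) == 0;
}

/* Prints, for a few sample characters, whether "[Z-a]" matches them. */
void show_range(void)
{
    const char samples[] = "AZ_ab";
    for (size_t i = 0; samples[i] != '\0'; i++)
        printf("'%c' (0x%02x): %d\n", samples[i],
               (unsigned)(unsigned char)samples[i],
               char_in_bracket("[Z-a]", samples[i]));
}
```

In ASCII, 'Z' is 0x5a, '_' is 0x5f and 'a' is 0x61, so in the "C" locale
the underscore falls inside the range.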

> My personal expectation is that "[[:lower:]]" should match an
> uppercase character if case folding is occurring (which it does not in
> glibc). Is this a bug?

Yes it looks like a bug to me.

The manual says:

"Ignore case in comparing @var{string} to @var{pattern}."

So with FNM_CASEFOLD, "[[:lower:]]" should match any uppercase or
lowercase character.

> In the POSIX locale, [:lower:] is the character set
> abcdefghijklmnopqrstuvwxyz, and [:upper:] is a similar (upper case)
> set. Thus we might expect
> [[:upper:]-[:lower:]] to be the same as
> [ABCDEFGHIJKLMNOPQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz]
> ... but it isn't!
> 
> The program below demonstrates...

I expect they are all bugs.

To quote from our test input data:
~~~
# Derived from the IEEE 2003.2 text.  The standard only contains some
# wording describing the situations to be tested.  It does not specify
# any specific tests.  I.e., the tests below are in no case sufficient.
# They are hopefully necessary, though.
~~~

Hope is not a plan.

Of the ~500 tests we have, only one that I'm aware of uses FNM_CASEFOLD,
so we don't cover any of the more complex corner cases.

FNM_CASEFOLD is implemented as a tolower-like equivalent across the
inputs, but the comparison loop is complicated enough that there might
be cases where FOLD() is required but missing.
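To illustrate the kind of gap I mean, here is a simplified sketch. It is
not glibc's code: fold(), literal_eq(), class_naive() and class_folded()
are hypothetical names, and where exactly a fold is missing in the real
loop is an assumption until someone audits it. The idea is that folding
both sides works for literal comparisons, but a class test applied to
the unfolded input character never sees the other case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Illustrative stand-in for a FOLD()-style macro: lowercases c only when
   case folding was requested. */
static unsigned char fold(unsigned char c, int casefold)
{
    return casefold ? (unsigned char)tolower(c) : c;
}

/* Literal comparison: both sides are folded, so 'a' equals 'A'. */
int literal_eq(char p, char s, int casefold)
{
    return fold((unsigned char)p, casefold) == fold((unsigned char)s, casefold);
}

/* Class test that ignores folding: "A" never satisfies islower(). */
int class_naive(int (*is_class)(int), char s)
{
    assert(is_class != NULL);
    return is_class((unsigned char)s) != 0;
}

/* Class test that also tries both case variants when folding is on. */
int class_folded(int (*is_class)(int), char s, int casefold)
{
    assert(is_class != NULL);
    unsigned char c = (unsigned char)s;
    if (is_class(c))
        return 1;
    return casefold && (is_class(tolower(c)) || is_class(toupper(c)));
}
```

With this sketch, class_naive(islower, 'A') is 0 while
class_folded(islower, 'A', 1) is 1, which is the difference between the
behavior reported and the behavior expected.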

Your source is already a better test case for FNM_CASEFOLD than what
we have in glibc :-)

-- 
Cheers,
Carlos.

References:
- Questions on fnmatch() and case folding
  - From: Nick Stoughton
