This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: [PATCH] Fix fnmatch escape handling in brackets (BZ #361)
- From: "Markus F.X.J. Oberhumer" <markus at oberhumer dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Ulrich Drepper <drepper at redhat dot com>,Glibc <libc-alpha at sources dot redhat dot com>
- Date: Thu, 2 Sep 2004 16:48:30 +0200
- Subject: Re: [PATCH] Fix fnmatch escape handling in brackets (BZ #361)
- Organization: oberhumer.com
- References: <20040901190912.GM30497@sunsite.ms.mff.cuni.cz> <200409020551.18238.markus@oberhumer.com> <20040902082243.GB30573@devserv.devel.redhat.com>
I happily accept that FNM_CASEFOLD ranges are somewhat limited, but then the
docs need some clarification, especially since GNU sed/grep/egrep and Perl
_do_ get it right.
Please see the somewhat polished example program attached below.
Unrelated, I may have hit a bug in GNU sed 4.1.1 (works with 3.02 and 4.0.5).
Please have a look as well.
Finally one could also argue that the following pseudocode would match the
semantics better:
    if (ch >= range_start && ch <= range_end) goto matched_range;
    if (flag_FNM_CASEFOLD)
        if (FOLD(ch) >= FOLD(range_start) &&
            FOLD(ch) <= FOLD(range_end)) goto matched_range;
    /* range not matched */
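
The pseudocode above can be sketched as plain C for experimenting outside of
glibc. This is only an illustration, not the fnmatch implementation: the
helper names in_range_fold_only and in_range_proposed are hypothetical,
tolower() stands in for FOLD, and only single-byte characters in the C locale
are assumed.

```c
#include <ctype.h>

/* Hypothetical helper: fold-only check, i.e. the character and both range
   ends are lowercased before comparing (the behaviour discussed in this
   thread, reduced to a single range). */
static int in_range_fold_only(unsigned char ch,
                              unsigned char range_start,
                              unsigned char range_end)
{
    return tolower(ch) >= tolower(range_start)
        && tolower(ch) <= tolower(range_end);
}

/* Hypothetical helper: the proposed two-step check. The unfolded comparison
   is tried first; the folded comparison is tried only when case folding was
   requested. Returns 1 if ch is in the range, 0 otherwise. */
static int in_range_proposed(unsigned char ch,
                             unsigned char range_start,
                             unsigned char range_end,
                             int casefold)
{
    if (ch >= range_start && ch <= range_end)
        return 1;
    if (casefold
        && tolower(ch) >= tolower(range_start)
        && tolower(ch) <= tolower(range_end))
        return 1;
    return 0; /* range not matched */
}
```

Under these assumptions, in_range_proposed('A', 'A', '[', 1) returns 1, whereas
in_range_fold_only('A', 'A', '[') returns 0 because the folded range a-[ is
empty. Note that 'a' is still not matched by A-[ in either variant.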
Markus
On Thursday 02 September 2004 10:22, Jakub Jelinek wrote:
> On Thu, Sep 02, 2004 at 05:51:18AM +0200, Markus F.X.J. Oberhumer wrote:
> > Jakub,
> >
> > I have not been able to test your patch yet, but the program attached
> > below prints some errors for me (I originally thought this was caused by
> > the backslash at the end but I now see it is completely unrelated).
> >
> > Not sure if this is actually a bug - at least it is confusing that when
> > enlarging the pattern-range or adding FNM_CASEFOLD an "A" does
> > not match a range starting with "A" anymore.
>
> I don't think it is a bug.
> Unlike RE_ICASE regcomp which converts to uppercase, FNM_CASEFOLD
> converts to lowercase.
> And a range where start character is > end character is invalid.
> You can play with regex RE_ICASE too:
>
> $ echo | LC_ALL=C sed -n '/[a-[]/Ip'
> $ echo | LC_ALL=C sed -n '/[A-[]/Ip'
> $ echo | LC_ALL=C sed -n '/[[-a]/Ip'
> sed: -e expression #1, char 8: Invalid range end
> $ echo | LC_ALL=C sed -n '/[[-A]/Ip'
> sed: -e expression #1, char 8: Invalid range end
> $ echo | LC_ALL=C sed -n '/[[-A]/p'
> sed: -e expression #1, char 7: Invalid range end
> $ echo | LC_ALL=C sed -n '/[[-a]/p'
> $ echo | LC_ALL=C sed -n '/[a-[]/p'
> sed: -e expression #1, char 7: Invalid range end
> $ echo | LC_ALL=C sed -n '/[A-[]/p'
>
> and fnmatch behaves very similarly to this (just with the difference
> that it uses lowercase instead of uppercase conversion).
> See that while [[-a] is valid range without RE_ICASE, it is not
> with RE_ICASE (but [a-[] is, which without RE_ICASE is not valid).
>
> So, when you are using FNM_CASEFOLD, it is always better to write the
> ranges where both range ends aren't a-z or A-Z in lowercase.
>
> I have checked Solaris and there your testcase with #define FNM_CASEFOLD
> FNM_IGNORECASE behaves the same as on Linux.
>
> Jakub
--
Markus F.X.J. Oberhumer
oberhumer.com, http://www.oberhumer.com/
#define _GNU_SOURCE 1 /* for FNM_CASEFOLD */
#include <stdio.h>
#include <fnmatch.h>
#include <assert.h>
int main(void)
{
    /* character class from 'A' to '[' (ASCII 65-91) */
    const char pattern_A[] = "[A-[]";
    /* character class from 'a' to '{' (ASCII 97-123) */
    const char pattern_a[] = "[a-{]";

    /* fnmatch() returns 0 on a match and FNM_NOMATCH otherwise */
    printf("%d", fnmatch(pattern_A, "A", 0));
    printf("%d", fnmatch(pattern_A, "a", 0));
    printf("%d", fnmatch(pattern_a, "A", 0));
    printf("%d", fnmatch(pattern_a, "a", 0));
    printf("%d", fnmatch(pattern_A, "A", FNM_CASEFOLD));
    printf("%d", fnmatch(pattern_A, "a", FNM_CASEFOLD));
    printf("%d", fnmatch(pattern_a, "A", FNM_CASEFOLD));
    printf("%d", fnmatch(pattern_a, "a", FNM_CASEFOLD));
    printf("\n");
/* using regular expressions: the same program with GNU sed 3.02 / 4.0.5:
[UNRELATED PROBLEM: GNU sed 4.1.1 cannot parse this -- sed bug ???]
sed=sed
echo -n A | $sed -e 's,[A-[],0,' -e 's,[Aa],1,'
echo -n a | $sed -e 's,[A-[],0,' -e 's,[Aa],1,'
echo -n A | $sed -e 's,[a-{],0,' -e 's,[Aa],1,'
echo -n a | $sed -e 's,[a-{],0,' -e 's,[Aa],1,'
echo -n A | $sed -e 's,[A-[],0,i' -e 's,[Aa],1,'
echo -n a | $sed -e 's,[A-[],0,i' -e 's,[Aa],1,'
echo -n A | $sed -e 's,[a-{],0,i' -e 's,[Aa],1,'
echo -n a | $sed -e 's,[a-{],0,i' -e 's,[Aa],1,'
echo
*/
/* using regular expressions: the same program with GNU grep or egrep 2.5.1:
grep=egrep
grep=grep
echo A | $grep -q '[A-[]'; echo -n $?
echo a | $grep -q '[A-[]'; echo -n $?
echo A | $grep -q '[a-{]'; echo -n $?
echo a | $grep -q '[a-{]'; echo -n $?
echo A | $grep -q -i '[A-[]'; echo -n $?
echo a | $grep -q -i '[A-[]'; echo -n $?
echo A | $grep -q -i '[a-{]'; echo -n $?
echo a | $grep -q -i '[a-{]'; echo -n $?
echo
*/
/* using regular expressions: the same program in Perl5 (5.8.5):
printf "%d", "A" !~ /[A-[]/;
printf "%d", "a" !~ /[A-[]/;
printf "%d", "A" !~ /[a-{]/;
printf "%d", "a" !~ /[a-{]/;
printf "%d", "A" !~ /[A-[]/i;
printf "%d", "a" !~ /[A-[]/i;
printf "%d", "A" !~ /[a-{]/i;
printf "%d", "a" !~ /[a-{]/i;
print "\n";
*/
/*
export LC_ALL=C
fnmatch prints: 01101100
sed prints: 01100000
grep prints: 01100000
egrep prints: 01100000
Perl prints: 01100000
*/
    return 0;
}
/*
vi:ts=4:et
*/