This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [PATCH] Fix fnmatch escape handling in brackets (BZ #361)


I happily accept that FNM_CASEFOLD ranges are somewhat limited, but then the 
docs need some clarification, especially since GNU sed/grep/egrep and Perl 
_do_ get it right.

Please see the somewhat polished example program attached below.

Unrelated: I may have hit a bug in GNU sed 4.1.1 (it works with 3.02 and 4.0.5). 
Please have a look at that as well.

Finally, one could also argue that the following pseudocode would match the 
semantics better:

   if (ch >= range_start && ch <= range_end) goto matched_range;
   if (flag_FNM_CASEFOLD)
      if (FOLD(ch) >= FOLD(range_start) && 
         FOLD(ch) <= FOLD(range_end)) goto matched_range;
   /* range not matched */
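
To make the pseudocode above concrete, here is a minimal C sketch. The
function name range_matches is hypothetical and this is not glibc code; FOLD
is assumed to be tolower(), in line with the statement that FNM_CASEFOLD
converts to lowercase, and the default "C" locale is assumed.

```c
#include <ctype.h>

/* Hypothetical helper, not part of glibc: returns 1 if ch falls into
   the bracket range [start-end], 0 otherwise. */
int range_matches(unsigned char ch, unsigned char start,
                  unsigned char end, int casefold)
{
    /* First try the range exactly as written in the pattern. */
    if (ch >= start && ch <= end)
        return 1;

    /* Only if that fails and folding was requested, compare the
       folded values (FOLD assumed to be tolower here). */
    if (casefold) {
        int c = tolower(ch);
        int s = tolower(start);
        int e = tolower(end);
        if (c >= s && c <= e)
            return 1;
    }

    /* range not matched */
    return 0;
}
```

With this sketch, "A" matches [A-[] even when folding is requested, because
the unfolded comparison runs first. "a" still does not match [A-[] with
folding, since the folded range 'a'..'[' is empty, so this alone does not
reproduce the sed/grep/Perl results.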

Markus

On Thursday 02 September 2004 10:22, Jakub Jelinek wrote:
> On Thu, Sep 02, 2004 at 05:51:18AM +0200, Markus F.X.J. Oberhumer wrote:
> > Jakub,
> >
> > I have not been able to test your patch yet, but the program attached
> > below prints some errors for me (I originally thought this was caused by
> > the backslash at the end but I now see it is completely unrelated).
> >
> > Not sure if this is actually a bug - at least it is confusing that when
> > enlarging the pattern-range or adding FNM_CASEFOLD an "A" does
> > not match a range starting with "A" anymore.
>
> I don't think it is a bug.
> Unlike RE_ICASE regcomp which converts to uppercase, FNM_CASEFOLD
> converts to lowercase.
> And a range where start character is > end character is invalid.
> You can play with regex RE_ICASE too:
>
> $ echo | LC_ALL=C sed -n '/[a-[]/Ip'
> $ echo | LC_ALL=C sed -n '/[A-[]/Ip'
> $ echo | LC_ALL=C sed -n '/[[-a]/Ip'
> sed: -e expression #1, char 8: Invalid range end
> $ echo | LC_ALL=C sed -n '/[[-A]/Ip'
> sed: -e expression #1, char 8: Invalid range end
> $ echo | LC_ALL=C sed -n '/[[-A]/p'
> sed: -e expression #1, char 7: Invalid range end
> $ echo | LC_ALL=C sed -n '/[[-a]/p'
> $ echo | LC_ALL=C sed -n '/[a-[]/p'
> sed: -e expression #1, char 7: Invalid range end
> $ echo | LC_ALL=C sed -n '/[A-[]/p'
>
> and fnmatch behaves very similarly to this (just with the difference
> that it uses lowercase instead of uppercase conversion).
> See that while [[-a] is valid range without RE_ICASE, it is not
> with RE_ICASE (but [a-[] is, which without RE_ICASE is not valid).
>
> So, when you are using FNM_CASEFOLD, it is always better to write the
> ranges where both range ends aren't a-z or A-Z in lowercase.
>
> I have checked Solaris and there your testcase with #define FNM_CASEFOLD
> FNM_IGNORECASE behaves the same as on Linux.
>
>  Jakub
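
The validity rule described in the quoted reply can be modelled in a few
lines of C. This is a simplified sketch under stated assumptions, not the
actual sed or glibc code: the names fold_mode and range_is_valid are
hypothetical, only the "C" locale is considered, and collation is ignored.

```c
#include <ctype.h>

/* Hypothetical folding modes: none, lowercase (as described for
   FNM_CASEFOLD), uppercase (as described for RE_ICASE). */
enum fold_mode { FOLD_NONE, FOLD_LOWER, FOLD_UPPER };

static int fold_char(unsigned char c, enum fold_mode mode)
{
    if (mode == FOLD_LOWER)
        return tolower(c);
    if (mode == FOLD_UPPER)
        return toupper(c);
    return c;
}

/* A range is valid only if its (folded) start is not greater than
   its (folded) end. Returns 1 if valid, 0 otherwise. */
int range_is_valid(unsigned char start, unsigned char end,
                   enum fold_mode mode)
{
    return fold_char(start, mode) <= fold_char(end, mode);
}
```

Under this model [[-a] is valid without folding but invalid with uppercase
folding, [a-[] is the other way round, and [A-[] becomes invalid with
lowercase folding because 'a' (97) sorts after '[' (91).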

-- 
Markus F.X.J. Oberhumer
oberhumer.com, http://www.oberhumer.com/
#define _GNU_SOURCE 1   /* for FNM_CASEFOLD */
#include <stdio.h>
#include <fnmatch.h>
#include <assert.h>

int main(void)
{
    /* character class from 'A' to '[' (ASCII 65-91) */
    const char pattern_A[] = "[A-[]";

    /* character class from 'a' to '{' (ASCII 97-123) */
    const char pattern_a[] = "[a-{]";

    /* fnmatch() returns 0 on a match and nonzero (FNM_NOMATCH) otherwise,
       so each printed digit is 0 for "matched". */

    /* without flags */
    printf("%d", fnmatch(pattern_A, "A", 0));
    printf("%d", fnmatch(pattern_A, "a", 0));
    printf("%d", fnmatch(pattern_a, "A", 0));
    printf("%d", fnmatch(pattern_a, "a", 0));

    /* same checks with case folding */
    printf("%d", fnmatch(pattern_A, "A", FNM_CASEFOLD));
    printf("%d", fnmatch(pattern_A, "a", FNM_CASEFOLD));
    printf("%d", fnmatch(pattern_a, "A", FNM_CASEFOLD));
    printf("%d", fnmatch(pattern_a, "a", FNM_CASEFOLD));
    printf("\n");


/* using regular expressions: the same program with GNU sed 3.02 / 4.0.5:

   [UNRELATED PROBLEM: GNU sed 4.1.1 cannot parse this -- sed bug ???]

sed=sed

echo -n A | $sed -e 's,[A-[],0,'  -e 's,[Aa],1,'
echo -n a | $sed -e 's,[A-[],0,'  -e 's,[Aa],1,'
echo -n A | $sed -e 's,[a-{],0,'  -e 's,[Aa],1,'
echo -n a | $sed -e 's,[a-{],0,'  -e 's,[Aa],1,'

echo -n A | $sed -e 's,[A-[],0,i' -e 's,[Aa],1,'
echo -n a | $sed -e 's,[A-[],0,i' -e 's,[Aa],1,'
echo -n A | $sed -e 's,[a-{],0,i' -e 's,[Aa],1,'
echo -n a | $sed -e 's,[a-{],0,i' -e 's,[Aa],1,'
echo

*/



/* using regular expressions: the same program with GNU grep or egrep 2.5.1:
grep=egrep
grep=grep

echo A | $grep -q    '[A-[]'; echo -n $?
echo a | $grep -q    '[A-[]'; echo -n $?
echo A | $grep -q    '[a-{]'; echo -n $?
echo a | $grep -q    '[a-{]'; echo -n $?

echo A | $grep -q -i '[A-[]'; echo -n $?
echo a | $grep -q -i '[A-[]'; echo -n $?
echo A | $grep -q -i '[a-{]'; echo -n $?
echo a | $grep -q -i '[a-{]'; echo -n $?
echo

*/


/* using regular expressions: the same program in Perl5 (5.8.5):

printf "%d", "A" !~ /[A-[]/;
printf "%d", "a" !~ /[A-[]/;
printf "%d", "A" !~ /[a-{]/;
printf "%d", "a" !~ /[a-{]/;

printf "%d", "A" !~ /[A-[]/i;
printf "%d", "a" !~ /[A-[]/i;
printf "%d", "A" !~ /[a-{]/i;
printf "%d", "a" !~ /[a-{]/i;
print "\n";

*/



/*

export LC_ALL=C

fnmatch prints: 01101100

sed     prints: 01100000
grep    prints: 01100000
egrep   prints: 01100000
Perl    prints: 01100000

*/

    return 0;
}

/*
vi:ts=4:et
*/
