This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 1/3] sed: Fix infinite loop on some false multi-byte matches
- From: Stanislav Brabec <sbrabec at suse dot cz>
- To: Aharon Robbins <arnold at skeeve dot com>,Roland McGrath <roland at hack dot frob dot com>
- Cc: bug-gnu-utils at gnu dot org, libc-alpha at sourceware dot org
- Date: Wed, 15 Feb 2012 17:29:28 +0100
- Subject: Re: [PATCH 1/3] sed: Fix infinite loop on some false multi-byte matches
- References: <201202121930.q1CJUKUi003938@skeeve.com>
Roland McGrath wrote:
>A subtle issue such as this warrants an addition to the test
>suite.
Aharon Robbins wrote:
> I have been looking at this and trying to see if I can reproduce
> it in gawk. I can't seem to. Would someone who understands the
> issue supply me with a test awk program that either shows that
> gawk has this bug, or doesn't?
PATCH 2/3 contains a sed testcase that easily reproduces the bug in
sed. (The last line contains a testcase for another bug that appeared
in older versions of glibc.)
Although I tried hard to minimize the testcase, I failed to reproduce
the bug outside sed. Here is my best attempt at a C testcase, but it
_does_not_ reproduce the problem. Probably sed fulfills some additional
conditions that are not met here:
/* Test re_search with multi-byte characters in EUC-JP.
Copyright (C) 2006 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Stanislav Brabec <sbrabec@suse.cz>, 2012.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA. */
#define _GNU_SOURCE 1
#include <locale.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  struct re_pattern_buffer r;
  struct re_registers s;
  int e, rc = 0;

  if (setlocale (LC_CTYPE, "ja_JP.EUC-JP") == NULL)
    {
      puts ("setlocale failed");
      return 1;
    }
  memset (&r, 0, sizeof (r));
  memset (&s, 0, sizeof (s));
  re_set_syntax (RE_SYNTAX_POSIX_BASIC | RE_NO_POSIX_BACKTRACKING);

  /* Pattern: the single two-byte character <b7 bd>.  */
  re_compile_pattern ("\xb7\xbd", 2, &r);
  r.regs_allocated = REGS_REALLOCATE;

  /* Text: aaaaa <b7 ef> a <bf b7> <bd e8>;
     \xb7\xbd constitutes a false match.  */
  e = re_search (&r, "\x61\x61\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 12, 0, 12, &s);
  if (e != -1)
    {
      printf ("bug-regex33.1: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

  /* Text: aaaa <b7 ef> a <bf b7> <bd e8>;
     \xb7\xbd constitutes a false match.  */
  e = re_search (&r, "\x61\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 11, 0, 11, &s);
  if (e != -1)
    {
      printf ("bug-regex33.2: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

  /* Text: aaa <b7 ef> a <bf b7> <bd e8>;
     \xb7\xbd constitutes a false match.  */
  e = re_search (&r, "\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 10, 0, 10, &s);
  if (e != -1)
    {
      printf ("bug-regex33.3: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

  /* Text: aa <b7 ef> a <bf b7> <bd e8>;
     \xb7\xbd constitutes a false match.  */
  e = re_search (&r, "\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 9, 0, 9, &s);
  if (e != -1)
    {
      printf ("bug-regex33.4: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

  /* Text: a <b7 ef> a <bf b7> <bd e8>;
     \xb7\xbd constitutes a false match.  */
  e = re_search (&r, "\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 8, 0, 8, &s);
  if (e != -1)
    {
      printf ("bug-regex33.5: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

  /* Text: <bf b7> <bd e8> <b7 bd> <bf b7> <bd e8>;
     here \xb7\xbd really matches the third character, at byte offset 4.  */
  e = re_search (&r, "\xbf\xb7\xbd\xe8\xb7\xbd\xbf\xb7\xbd\xe8",
                 10, 0, 10, &s);
  if (e != 4)
    {
      printf ("bug-regex33.6: match not found: re_search() returned %d\n", e);
      rc = 1;
    }

  return rc;
}
--
Best Regards / S pozdravem,
Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o. e-mail: sbrabec@suse.cz
Lihovarská 1060/12 tel: +49 911 7405384547
190 00 Praha 9 fax: +420 284 028 951
Czech Republic http://www.suse.cz/