This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 1/3] sed: Fix infinite loop on some false multi-byte matches


Roland McGrath wrote:
>A subtle issue such as this warrants an addition to the test
>suite.

Aharon Robbins wrote:
> I have been looking at this and trying to see if I can reproduce
> it in gawk. I can't seem to. Would someone who understands the
> issue supply me with a test awk program that either shows that
> gawk has this bug, or doesn't?

PATCH 2/3 contains a sed test case that easily reproduces the bug in
sed. (Its last line is a test case for another bug that appeared in
older versions of glibc.)

Although I tried hard to minimize the test case, I failed to reproduce
the bug outside sed. Here is my best attempt at a C test case, but it
_does_not_ reproduce the problem. Probably some additional conditions
are fulfilled in sed but not here:


/* Test re_search with multi-byte characters in EUC-JP.
   Copyright (C) 2006 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Stanislav Brabec <sbrabec@suse.cz>, 2012.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, write to the Free
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307 USA.  */

#define _GNU_SOURCE 1
#include <locale.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  struct re_pattern_buffer r;
  struct re_registers s;
  int e, rc = 0;
  if (setlocale (LC_CTYPE, "ja_JP.EUC-JP") == NULL)
    {
      puts ("setlocale failed");
      return 1;
    }
  memset (&r, 0, sizeof (r));
  memset (&s, 0, sizeof (s));
  re_set_syntax (RE_SYNTAX_POSIX_BASIC | RE_NO_POSIX_BACKTRACKING);
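                    /* One two-byte EUC-JP character, bytes b7 bd.  */
  if (re_compile_pattern ("\xb7\xbd", 2, &r) != NULL)
    {
      puts ("re_compile_pattern failed");
      return 1;
    }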

  r.regs_allocated = REGS_REALLOCATE;

                /* "aaaaa" <b7 ef> "a" <bf b7> <bd e8>: the bytes \xb7\xbd
                   straddle two characters, so this is a false match */
  e = re_search (&r, "\x61\x61\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 12, 0, 12, &s);
  if (e != -1)
    {
      printf ("bug-regex33.1: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

                /* "aaaa" <b7 ef> "a" <bf b7> <bd e8>: the bytes \xb7\xbd
                   straddle two characters, so this is a false match */
  e = re_search (&r, "\x61\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 11, 0, 11, &s);
  if (e != -1)
    {
      printf ("bug-regex33.2: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

                /* "aaa" <b7 ef> "a" <bf b7> <bd e8>: the bytes \xb7\xbd
                   straddle two characters, so this is a false match */
  e = re_search (&r, "\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 10, 0, 10, &s);
  if (e != -1)
    {
      printf ("bug-regex33.3: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

                /* "aa" <b7 ef> "a" <bf b7> <bd e8>: the bytes \xb7\xbd
                   straddle two characters, so this is a false match */
  e = re_search (&r, "\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 9, 0, 9, &s);
  if (e != -1)
    {
      printf ("bug-regex33.4: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

                /* "a" <b7 ef> "a" <bf b7> <bd e8>: the bytes \xb7\xbd
                   straddle two characters, so this is a false match */
  e = re_search (&r, "\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 8, 0, 8, &s);
  if (e != -1)
    {
      printf ("bug-regex33.5: false match or error: re_search() returned %d\n", e);
      rc = 1;
    }

                /* <bf b7> <bd e8> <b7 bd> <bf b7> <bd e8>: the third
                   character really is \xb7\xbd, at byte offset 4 */
  e = re_search (&r, "\xbf\xb7\xbd\xe8\xb7\xbd\xbf\xb7\xbd\xe8",
                 10, 0, 10, &s);
  if (e != 4)
    {
      printf ("bug-regex33.6: match not found: re_search() returned %d\n", e);
      rc = 1;
    }

  return rc;
}
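
For anyone who wants to see what "false match" means here without
going through regex at all, below is a minimal sketch that walks an
EUC-JP byte string character by character and accepts a match only when
it starts on a character boundary. It is not taken from glibc or sed.
The function names (eucjp_char_len, eucjp_is_char_boundary, eucjp_find)
are hypothetical, and the decoder is an assumption-laden simplification:
it looks only at lead bytes (0x8e, 0x8f, 0xa1..0xfe) and does not
validate trail bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the EUC-JP character starting at s[0], with `avail'
   (>= 1) bytes remaining.  Simplified: only the lead byte is inspected,
   and a truncated sequence is treated as a single byte.  */
size_t
eucjp_char_len (const unsigned char *s, size_t avail)
{
  size_t want = 1;                      /* ASCII or unknown byte */
  if (s[0] == 0x8f)
    want = 3;                           /* SS3 plus two bytes */
  else if (s[0] == 0x8e || (s[0] >= 0xa1 && s[0] <= 0xfe))
    want = 2;                           /* SS2 plus one byte, or a
                                           two-byte character */
  return want <= avail ? want : 1;
}

/* Return 1 if byte offset `off' is the start of a character in
   s[0..len), or equals len; return 0 if it falls inside a character.  */
int
eucjp_is_char_boundary (const unsigned char *s, size_t len, size_t off)
{
  size_t i = 0;
  assert (off <= len);
  while (i < off)
    i += eucjp_char_len (s + i, len - i);
  return i == off;
}

/* Return the byte offset of the first occurrence of pat[0..plen) that
   starts on a character boundary, or -1 if there is none.  Stepping by
   whole characters means byte-level matches that straddle two
   characters are never considered.  */
long
eucjp_find (const unsigned char *s, size_t len,
            const unsigned char *pat, size_t plen)
{
  size_t i = 0;
  while (i < len && plen <= len - i)
    {
      if (memcmp (s + i, pat, plen) == 0)
        return (long) i;
      i += eucjp_char_len (s + i, len - i);
    }
  return -1;
}
```

With the first test string above, the bytes \xb7\xbd occur at byte
offset 9, but eucjp_is_char_boundary reports that offset as lying inside
a character, so eucjp_find returns -1. With the last test string it
returns 4, the same value the test expects from re_search.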


-- 
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                          e-mail: sbrabec@suse.cz
Lihovarská 1060/12                            tel: +49 911 7405384547
190 00 Praha 9                                  fax: +420 284 028 951
Czech Republic                                    http://www.suse.cz/

