bug-gnu-utils

Re: [PATCH 1/3] sed: Fix infinite loop on some false multi-byte matches


From: Stanislav Brabec
Subject: Re: [PATCH 1/3] sed: Fix infinite loop on some false multi-byte matches
Date: Wed, 15 Feb 2012 17:29:28 +0100

Roland McGrath wrote:
>A subtle issue such as this warrants an addition to the test
>suite.

Aharon Robbins wrote:
> I have been looking at this and trying to see if I can reproduce
> it in gawk. I can't seem to. Would someone who understands the
> issue supply me with a test awk program that either shows that
> gawk has this bug, or doesn't?

PATCH 2/3 contains a sed test case that easily reproduces the bug in
sed. (Its last line is a test case for a different bug that appeared in
older versions of glibc.)

Although I tried hard to minimize the test case, I failed to reproduce
the bug outside sed. Here is my best attempt at a C test case, but it
does _not_ reproduce the problem. There are probably additional
conditions that are fulfilled in sed but not here:


/* Test re_search with multi-byte characters in EUC-JP.
   Copyright (C) 2006 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Stanislav Brabec <address@hidden>, 2012.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, write to the Free
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307 USA.  */

#define _GNU_SOURCE 1
#include <locale.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  struct re_pattern_buffer r;
  struct re_registers s;
  int e, rc = 0;
  if (setlocale (LC_CTYPE, "ja_JP.EUC-JP") == NULL)
    {
      puts ("setlocale failed");
      return 1;
    }
  memset (&r, 0, sizeof (r));
  memset (&s, 0, sizeof (s));
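  re_set_syntax (RE_SYNTAX_POSIX_BASIC | RE_NO_POSIX_BACKTRACKING);
  /* Pattern: 圭 (EUC-JP bytes 0xB7 0xBD).  re_compile_pattern returns
     NULL on success and an error message on failure.  */
  const char *err = re_compile_pattern ("\xb7\xbd", 2, &r);
  if (err != NULL)
    {
      printf ("re_compile_pattern failed: %s\n", err);
      return 1;
    }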

  r.regs_allocated = REGS_REALLOCATE;

                /* aaaaa件a新処, \xb7\xbd constitutes a false match */
  e = re_search (&r, "\x61\x61\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 12, 0, 12, &s);
  if (e != -1)
    {
      printf ("bug-regex33.1: false match or error: re_search() returned %d\n",
              e);
      rc = 1;
    }

                /* aaaa件a新処, \xb7\xbd constitutes a false match */
  e = re_search (&r, "\x61\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 11, 0, 11, &s);
  if (e != -1)
    {
      printf ("bug-regex33.2: false match or error: re_search() returned %d\n",
              e);
      rc = 1;
    }

                /* aaa件a新処, \xb7\xbd constitutes a false match */
  e = re_search (&r, "\x61\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 10, 0, 10, &s);
  if (e != -1)
    {
      printf ("bug-regex33.3: false match or error: re_search() returned %d\n",
              e);
      rc = 1;
    }

                /* aa件a新処, \xb7\xbd constitutes a false match */
  e = re_search (&r, "\x61\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 9, 0, 9, &s);
  if (e != -1)
    {
      printf ("bug-regex33.4: false match or error: re_search() returned %d\n",
              e);
      rc = 1;
    }

                /* a件a新処, \xb7\xbd constitutes a false match */
  e = re_search (&r, "\x61\xb7\xef\x61\xbf\xb7\xbd\xe8",
                 8, 0, 8, &s);
  if (e != -1)
    {
      printf ("bug-regex33.5: false match or error: re_search() returned %d\n",
              e);
      rc = 1;
    }

                /* 新処圭新処, \xb7\xbd here really matches 圭 */
  e = re_search (&r, "\xbf\xb7\xbd\xe8\xb7\xbd\xbf\xb7\xbd\xe8",
                 10, 0, 10, &s);
  if (e != 4)
    {
      printf ("bug-regex33.6: match not found: re_search() returned %d\n", e);
      rc = 1;
    }

  return rc;
}


-- 
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                          e-mail: address@hidden
Lihovarská 1060/12                            tel: +49 911 7405384547
190 00 Praha 9                                  fax: +420 284 028 951
Czech Republic                                    http://www.suse.cz/
