guile-devel

regexp-quote bugs


From: Kevin Ryde
Subject: regexp-quote bugs
Date: Sun, 22 Aug 2004 11:43:23 +1000
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3 (gnu/linux)

        * regex-posix.c (scm_regexp_quote): Rewrite of ice-9 regex
        regexp-quote in C.  Fix [ and |, they must be quoted.  Fix quoting of
        ( ) { + ? for regexp/basic, must use char class [(] etc since the
        backslashed forms \( etc are in fact the special ones there.

This is for 1.6 too.

I'm assuming regexp-quote is meant to quote for both regexp/basic and
regexp/extended usages.  At the moment it's got problems in both.
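
To illustrate why a backslash can't serve both styles, here is a minimal
sketch using plain POSIX regcomp/regexec rather than Guile.  The helper
names matches_whole and check_paren_quoting are hypothetical, made up for
this example.  It checks whether a pattern matches the whole of "(a)": the
backslashed form is only literal under REG_EXTENDED, while the character
class form is literal under both.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Guile: return 1 if PATTERN, compiled
   with CFLAGS (0 for basic, REG_EXTENDED for extended), matches the
   whole of TEXT, else 0.  */
static int
matches_whole (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  regmatch_t m;
  int ok;

  if (regcomp (&re, pattern, cflags) != 0)
    return 0;
  ok = regexec (&re, text, 1, &m, 0) == 0
       && m.rm_so == 0
       && (size_t) m.rm_eo == strlen (text);
  regfree (&re);
  return ok;
}

/* Hypothetical usage: compare the two quoting forms on the text "(a)".  */
void
check_paren_quoting (void)
{
  /* \( \) are literal parens in extended, but a group in basic, so in
     basic the match covers only the "a" and not the whole text.  */
  assert (matches_whole ("\\(a\\)", REG_EXTENDED, "(a)"));
  assert (! matches_whole ("\\(a\\)", 0, "(a)"));

  /* [(] and [)] are literal parens in both styles.  */
  assert (matches_whole ("[(]a[)]", 0, "(a)"));
  assert (matches_whole ("[(]a[)]", REG_EXTENDED, "(a)"));
}
```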

New code below.  I still have to find the POSIX spec to double check
exactly what needs quoting, but this works with glibc for a start.



SCM_DEFINE (scm_regexp_quote, "regexp-quote", 1, 0, 0,
            (SCM str),
            "Return a regexp string which matches @var{str} literally, i.e.@:\n"
            "any characters like @samp{*} in @var{str} which are special in\n"
            "a regexp are quoted.  If there are no special characters then\n"
            "@var{str} itself is returned.\n"
            "\n"
            "The regexp returned can be used with both @code{regexp/basic}\n"
            "and @code{regexp/extended}, the quoting applied is safe for\n"
            "both styles.")
#define FUNC_NAME s_scm_regexp_quote
{
  size_t     i, j, len, newlen;
  const char *ptr;
  char       *newptr;
  SCM        newstr;

  SCM_VALIDATE_STRING (SCM_ARG1, str);
  ptr = scm_i_string_chars (str);
  len = scm_i_string_length (str);

  /* [ * . \ ^ and $ are special in both regexp/basic and regexp/extended
     and can be backslash escaped.

     ( ) { } + ? and | are special in regexp/extended so must be escaped.
     But that can't be done with a backslash since in regexp/basic sequences
     \( \) \{ \} \+ \? and \| are special.  Character class forms [(] etc
     are used instead.

     ] is not special outside a [ ] character class, so doesn't need to be
     escaped.  */

#define REGEXP_QUOTE_BACKSLASH                  \
  case '[':                                     \
  case '*':                                     \
  case '.':                                     \
  case '\\':                                    \
  case '^':                                     \
  case '$'

#define REGEXP_QUOTE_CHARCLASS                  \
  case '(':                                     \
  case ')':                                     \
  case '{':                                     \
  case '}':                                     \
  case '+':                                     \
  case '?':                                     \
  case '|'

  for (i = 0, newlen = 0; i < len; i++)
    {
      switch (ptr[i]) {
      REGEXP_QUOTE_BACKSLASH:
        newlen += 2;
        break;
      REGEXP_QUOTE_CHARCLASS:
        newlen += 3;
        break;
      default:
        newlen += 1;
        break;
      }
    }

  if (newlen == len)
    return str;

  newstr = scm_i_make_string (newlen, &newptr);
  for (i = 0, j = 0; i < len; i++, j++)
    {
      char c = ptr[i];
      switch (c) {
      REGEXP_QUOTE_BACKSLASH:
        newptr[j++] = '\\';
        goto store_c;
      REGEXP_QUOTE_CHARCLASS:
        newptr[j++] = '[';
        newptr[j++] = c;
        newptr[j] = ']';
        break;
      default:
      store_c:
        newptr[j] = c;
        break;
      }
    }
  scm_remember_upto_here_1 (str);
  return newstr;
}
#undef REGEXP_QUOTE_BACKSLASH
#undef REGEXP_QUOTE_CHARCLASS
#undef FUNC_NAME
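
The quoting rules used above can be tried outside Guile with a small
stand-alone sketch.  The names quote_for_regexp, quoted_matches_self and
check_quoting are hypothetical, and the quoting is re-implemented on plain
C strings with malloc rather than through the Guile string API, so treat
it as an approximation of the patch and not the patch itself.  It quotes a
string, compiles the result with regcomp, and checks the regexp matches
exactly the original string in both styles.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-alone helper, not the Guile function: quote S with
   the same rules as the patch.  The caller frees the result.  */
static char *
quote_for_regexp (const char *s)
{
  size_t len = strlen (s);
  /* Worst case is 3 output chars per input char, plus the terminator.  */
  char *out = malloc (3 * len + 1);
  size_t i, j = 0;

  if (out == NULL)
    return NULL;

  for (i = 0; i < len; i++)
    {
      char c = s[i];
      if (strchr ("[*.\\^$", c) != NULL)
        {
          /* Special in both styles: backslash escape.  */
          out[j++] = '\\';
          out[j++] = c;
        }
      else if (strchr ("(){}+?|", c) != NULL)
        {
          /* Special in extended only: single-char class.  */
          out[j++] = '[';
          out[j++] = c;
          out[j++] = ']';
        }
      else
        out[j++] = c;
    }
  out[j] = '\0';
  return out;
}

/* Return 1 if the quoted form of S, compiled with CFLAGS (0 for basic,
   REG_EXTENDED for extended), matches the whole of S, else 0.  */
static int
quoted_matches_self (const char *s, int cflags)
{
  regex_t re;
  regmatch_t m;
  char *quoted = quote_for_regexp (s);
  int ok = 0;

  if (quoted == NULL)
    return 0;
  if (regcomp (&re, quoted, cflags) == 0)
    {
      ok = regexec (&re, s, 1, &m, 0) == 0
           && m.rm_so == 0
           && (size_t) m.rm_eo == strlen (s);
      regfree (&re);
    }
  free (quoted);
  return ok;
}

/* Hypothetical usage: a few strings full of special characters.  */
void
check_quoting (void)
{
  assert (quoted_matches_self ("a+b", 0));
  assert (quoted_matches_self ("a+b", REG_EXTENDED));
  assert (quoted_matches_self ("f(x)|g{2}?", 0));
  assert (quoted_matches_self ("f(x)|g{2}?", REG_EXTENDED));
  assert (quoted_matches_self ("[a].*\\^$", 0));
  assert (quoted_matches_self ("[a].*\\^$", REG_EXTENDED));
}
```

With these rules "a+b" is quoted as "a[+]b" and "1.5" as "1\.5".  Like the
original, this has only been reasoned through against glibc behaviour, not
checked against every POSIX regex implementation.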



