Re: [Patch] SRFI-13 string-tokenize is wrong

guile-devel

From:	Matthias Koeppe
Subject:	Re: [Patch] SRFI-13 string-tokenize is wrong
Date:	Fri, 26 Apr 2002 10:27:50 +0200
User-agent:	Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1.80 (sparc-sun-solaris2.7)

Marius Vollmer <address@hidden> writes:

> Matthias Koeppe <address@hidden> writes:
>> the Guile implementation of SRFI-13 `string-tokenize' gets the meaning
>> of the `token-set' argument wrong.
>
> Yep, good catch.  However, your patch did not make an unspecified
> token-set equivalent to char-set:graphic, but to (char-set-complement
> char-set:whitespace).

Thanks for pointing this out.  I've updated the patch; see below.

Now this works correctly; note that \a, which is neither whitespace
nor graphic, separates tokens:

 (string-tokenize "Help make programs\arun, run,\nRUN!")
 ==> ("Help" "make" "programs" "run," "run," "RUN!")
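
To make the difference between the two defaults concrete, here is a
minimal standalone C sketch. It does not use Guile; it assumes the C
locale, and the helper names (in_graphic, in_non_whitespace,
defaults_disagree, check_bell) are made up for illustration. It only
shows that the bell character is excluded by a graphic-only set but
included by the complement of whitespace.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helpers, not part of Guile: membership tests for the two
   candidate defaults.  The unsigned char cast avoids undefined behavior
   when a plain char is negative. */
static int
in_graphic (char c)
{
  return isgraph ((unsigned char) c) != 0;
}

static int
in_non_whitespace (char c)
{
  return !isspace ((unsigned char) c);
}

/* In the C locale the two sets disagree only on characters that are
   neither whitespace nor graphic, such as the bell character. */
static int
defaults_disagree (char c)
{
  return in_graphic (c) != in_non_whitespace (c);
}

/* Self-check of the claim above; aborts if it does not hold. */
static void
check_bell (void)
{
  assert (defaults_disagree ('\a'));
  assert (!defaults_disagree ('r') && !defaults_disagree (' '));
}
```

Calling check_bell () from a test program should return without
aborting.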

-- Matthias

Index: srfi-13.c
===================================================================
RCS file: /cvs/guile/guile-core/srfi/srfi-13.c,v
retrieving revision 1.11.2.5
diff -u -u -r1.11.2.5 srfi-13.c
--- srfi-13.c   14 Mar 2002 05:32:48 -0000      1.11.2.5
+++ srfi-13.c   26 Apr 2002 08:20:04 -0000
@@ -2798,13 +2798,14 @@
 
 
 SCM_DEFINE (scm_string_tokenize, "string-tokenize", 1, 3, 0,
-           (SCM s, SCM token_char, SCM start, SCM end),
+           (SCM s, SCM token_set, SCM start, SCM end),
            "Split the string @var{s} into a list of substrings, where each\n"
            "substring is a maximal non-empty contiguous sequence of\n"
-           "characters equal to the character @var{token_char}, or\n"
-           "whitespace, if @var{token_char} is not given.  If\n"
-           "@var{token_char} is a character set, it is used for finding the\n"
-           "token borders.")
+           "characters from the character set @var{token_set}, which\n"
+           "defaults to an equivalent of @code{char-set:graphic}.\n"
+           "If @var{start} or @var{end} indices are provided, they restrict\n"
+           "@code{string-tokenize} to operating on the indicated substring\n"
+           "of @var{s}.")
 #define FUNC_NAME s_scm_string_tokenize
 {
   char * cstr;
@@ -2814,7 +2815,7 @@
   SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, s, cstr,
                                    3, start, cstart,
                                    4, end, cend);
-  if (SCM_UNBNDP (token_char))
+  if (SCM_UNBNDP (token_set))
     {
       int idx;
 
@@ -2822,7 +2823,7 @@
        {
          while (cstart < cend)
            {
-             if (!isspace (cstr[cend - 1]))
+             if (isgraph (cstr[cend - 1]))
                break;
              cend--;
            }
@@ -2831,14 +2832,14 @@
          idx = cend;
          while (cstart < cend)
            {
-             if (isspace (cstr[cend - 1]))
+             if (!isgraph (cstr[cend - 1]))
                break;
              cend--;
            }
          result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
        }
     }
-  else if (SCM_CHARSETP (token_char))
+  else if (SCM_CHARSETP (token_set))
     {
       int idx;
 
@@ -2846,7 +2847,7 @@
        {
          while (cstart < cend)
            {
-             if (!SCM_CHARSET_GET (token_char, cstr[cend - 1]))
+             if (SCM_CHARSET_GET (token_set, cstr[cend - 1]))
                break;
              cend--;
            }
@@ -2855,41 +2856,14 @@
          idx = cend;
          while (cstart < cend)
            {
-             if (SCM_CHARSET_GET (token_char, cstr[cend - 1]))
-               break;
-             cend--;
-           }
-         result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
-       }
-    }
-  else
-    {
-      int idx;
-      char chr;
-
-      SCM_VALIDATE_CHAR (2, token_char);
-      chr = SCM_CHAR (token_char);
-
-      while (cstart < cend)
-       {
-         while (cstart < cend)
-           {
-             if (cstr[cend - 1] != chr)
-               break;
-             cend--;
-           }
-         if (cstart >= cend)
-           break;
-         idx = cend;
-         while (cstart < cend)
-           {
-             if (cstr[cend - 1] == chr)
+             if (!SCM_CHARSET_GET (token_set, cstr[cend - 1]))
                break;
              cend--;
            }
          result = scm_cons (scm_mem2string (cstr + cend, idx - cend), result);
        }
     }
+  else SCM_WRONG_TYPE_ARG (2, token_set);
   return result;
 }
 #undef FUNC_NAME
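
For readers without a Guile tree at hand, the right-to-left scan in the
patched loops can be sketched in plain C roughly as follows. This is an
illustration under assumptions, not the Guile code: struct span and
tokenize_spans are hypothetical names, the token set is passed as a
<ctype.h>-style predicate instead of a char-set object, tokens are
returned as offsets rather than consed onto a list, and the unsigned
char cast is an addition that the patch itself does not make.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: the half-open range [start, start + len). */
struct span { size_t start; size_t len; };

/* Scan s[cstart, cend) from the right, collecting maximal runs of
   characters for which in_set returns nonzero.  At most max spans are
   stored (the rightmost ones if there are more); the count is returned. */
static size_t
tokenize_spans (const char *s, size_t cstart, size_t cend,
                int (*in_set) (int), struct span *out, size_t max)
{
  size_t n = 0, i, idx;

  assert (s != NULL && cstart <= cend);
  while (cstart < cend && n < max)
    {
      /* Skip trailing characters that are not in the token set. */
      while (cstart < cend && !in_set ((unsigned char) s[cend - 1]))
        cend--;
      if (cstart >= cend)
        break;
      /* Extend leftwards over a maximal run of token characters. */
      idx = cend;
      while (cstart < cend && in_set ((unsigned char) s[cend - 1]))
        cend--;
      out[n].start = cend;
      out[n].len = idx - cend;
      n++;
    }
  /* Tokens were found right to left; reverse to restore string order.
     (The Guile code gets the same order by consing onto the front.) */
  for (i = 0; i < n / 2; i++)
    {
      struct span tmp = out[i];
      out[i] = out[n - 1 - i];
      out[n - 1 - i] = tmp;
    }
  return n;
}
```

Passing isgraph as the predicate corresponds to the new default; a
predicate that wraps !isspace would correspond to the earlier version
of the patch.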

-- 
Matthias Koeppe -- http://www.math.uni-magdeburg.de/~mkoeppe
