Re: [PATCH] In string-split, add support for character sets and predicat

guile-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] In string-split, add support for character sets and predicat

From:	Mark H Weaver
Subject:	Re: [PATCH] In string-split, add support for character sets and predicates.
Date:	Mon, 08 Oct 2012 11:40:44 -0400
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux)

Hi Daniel,

Thanks for the patch!  I have a few comments.

Daniel Hartwig <address@hidden> writes:
> From 0aeed16baa70eca143fec05e864f98d95d7267e8 Mon Sep 17 00:00:00 2001
> From: Daniel Hartwig <address@hidden>
> Date: Mon, 8 Oct 2012 18:35:00 +0800
> Subject: [PATCH] In string-split, add support for character sets and
>  predicates.
>
> * libguile/srfi-13.c (string-split): Add support for splitting on
>   character sets and predicates, like string-index and others.  Keep the
>   original (fast) path when splitting by character and refactor using
>   string-index-right for other types; the later involves handling SCM
>   values so there is less chance to optimize anyway.

As Ludovic frequently reminds us (and I agree), rationales should be in
the source code comments, not in the commit message.

> * test-suite/tests/strings.test (string-split): Add tests covering
>   the new argument types.
> ---
>  libguile/srfi-13.c            |   53 ++++++++++++++++++++++++++++++----
>  libguile/srfi-13.h            |    2 +-
>  test-suite/tests/strings.test |   62 
> ++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 108 insertions(+), 9 deletions(-)
>
> diff --git a/libguile/srfi-13.c b/libguile/srfi-13.c
> index 2834553..1874754 100644
> --- a/libguile/srfi-13.c
> +++ b/libguile/srfi-13.c
> @@ -2993,11 +2993,22 @@ SCM_DEFINE (scm_string_tokenize, "string-tokenize", 
> 1, 3, 0,
>  #undef FUNC_NAME
>  
>  SCM_DEFINE (scm_string_split, "string-split", 2, 0, 0,
> -         (SCM str, SCM chr),
> +         (SCM str, SCM char_pred),
>           "Split the string @var{str} into a list of the substrings 
> delimited\n"
> -         "by appearances of the character @var{chr}.  Note that an empty 
> substring\n"
> -         "between separator characters will result in an empty string in 
> the\n"
> -         "result list.\n"
> +         "by appearances characters which\n"

"by appearances of characters that"
                ^^            ^^^^

(the difference between 'which' and 'that' is described in
<http://www.kentlaw.edu/academics/lrw/grinker/LwtaThat_Versus_Which.htm>)

> +            "\n"
> +            "@itemize @bullet\n"
> +            "@item\n"
> +            "equals @var{char_pred}, if it is a character,\n"

Should be "equal", not "equals", because the subject "characters" is
plural.

> +            "\n"
> +            "@item\n"
> +            "satisfies the predicate @var{char_pred}, if it is a 
> procedure,\n"

Should be "satisfy" for the same reason.

> +            "\n"
> +            "@item\n"
> +            "is in the set @var{char_pred}, if it is a character set.\n"

"are in the set".

> +            "@end itemize\n\n"
> +            "Note that an empty substring between separator characters\n"
> +            "will result in an empty string in the result list.\n"
>           "\n"
>           "@lisp\n"
>           "(string-split \"root:x:0:0:root:/root:/bin/bash\" #\\:)\n"
> @@ -3014,13 +3025,39 @@ SCM_DEFINE (scm_string_split, "string-split", 2, 0, 0,
>           "@end lisp")
>  #define FUNC_NAME s_scm_string_split
>  {
> +  SCM sidx, slast_idx;
>    long idx, last_idx;
>    int narrow;
>    SCM res = SCM_EOL;
>  
>    SCM_VALIDATE_STRING (1, str);
> -  SCM_VALIDATE_CHAR (2, chr);
>    
> +  if (SCM_CHARP (char_pred))
> +    {
> +      goto split_char;

I'd prefer to avoid the use of 'goto' here, and instead use nested 'if's
here.  (I admit that 'goto's occasionally make code simpler and more
readable, but not in this case IMO).

Can you please put the code between 'split_char' and 'done' within this
'if', and all the code from here to 'split_char' within the 'else'?

> +    }
> +  else if (!SCM_CHARSETP (char_pred))
> +    {
> +      SCM_ASSERT (scm_is_true (scm_procedure_p (char_pred)),
> +                  char_pred, SCM_ARG2, FUNC_NAME);
> +    }
> +
> +  sidx = scm_string_length (str);
> +  slast_idx = SCM_BOOL_F;
> +  while (scm_is_true (sidx))
> +    {
> +      slast_idx = sidx;
> +      sidx = scm_string_index_right (str, char_pred, SCM_INUM0, slast_idx);
> +      if (scm_is_true (sidx))
> +        {
> +          SCM substr = scm_substring (str, scm_oneplus (sidx), slast_idx);
> +          res = scm_cons (substr, res);
> +        }
> +    }

It is needlessly inefficient to test 'scm_is_true (sidx)' twice per
iteration of this loop.  The first test is also a waste.  Here's one way
to avoid the redundant tests:

  slast_idx = scm_string_length (str);
  for (;;)
    {
      sidx = scm_string_index_right (str, char_pred, SCM_INUM0, slast_idx);
      if (scm_is_false (sidx))
        break;
      res = scm_cons (scm_substring (str, scm_oneplus (sidx), slast_idx), res);
      slast_idx = sidx;
    }

> +  res = scm_cons (scm_substring (str, SCM_INUM0, slast_idx), res);
> +  goto done;
> +
> + split_char:
>    /* This is explicit wide/narrow logic (instead of using
>       scm_i_string_ref) is a speed optimization.  */
>    idx = scm_i_string_length (str);
> @@ -3031,7 +3068,7 @@ SCM_DEFINE (scm_string_split, "string-split", 2, 0, 0,
>        while (idx >= 0)
>          {
>            last_idx = idx;
> -          while (idx > 0 && buf[idx-1] != (char) SCM_CHAR(chr))
> +          while (idx > 0 && buf[idx-1] != (char) SCM_CHAR(char_pred))
>              idx--;
>            if (idx >= 0)
>              {
> @@ -3046,7 +3083,7 @@ SCM_DEFINE (scm_string_split, "string-split", 2, 0, 0,
>        while (idx >= 0)
>          {
>            last_idx = idx;
> -          while (idx > 0 && buf[idx-1] != SCM_CHAR(chr))
> +          while (idx > 0 && buf[idx-1] != SCM_CHAR(char_pred))
>              idx--;
>            if (idx >= 0)
>              {
> @@ -3055,6 +3092,8 @@ SCM_DEFINE (scm_string_split, "string-split", 2, 0, 0,
>              }
>          }
>      }
> +
> + done:
>    scm_remember_upto_here_1 (str);
>    return res;
>  }

[...]

Everything else looks good.  Thanks for including a full set of tests!

     Mark

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH] In string-split, add support for character sets and predicates., Daniel Hartwig, 2012/10/08
- Re: [PATCH] In string-split, add support for character sets and predicates., Mark H Weaver <=
  - Re: [PATCH] In string-split, add support for character sets and predicates., Daniel Hartwig, 2012/10/08
    - Re: [PATCH] In string-split, add support for character sets and predicates., Mark H Weaver, 2012/10/09
    - Re: [PATCH] In string-split, add support for character sets and predicates., Daniel Hartwig, 2012/10/09
    - Re: [PATCH] In string-split, add support for character sets and predicates., Mark H Weaver, 2012/10/09
    - Re: [PATCH] In string-split, add support for character sets and predicates., Daniel Hartwig, 2012/10/09
    - Re: [PATCH] In string-split, add support for character sets and predicates., Mark H Weaver, 2012/10/09
    - Re: [PATCH] In string-split, add support for character sets and predicates., Daniel Hartwig, 2012/10/09
    - Re: [PATCH] In string-split, add support for character sets and predicates., Mark H Weaver, 2012/10/10
    - Re: [PATCH] In string-split, add support for character sets and predicates., Ludovic Courtès, 2012/10/10
    - Re: [PATCH] In string-split, add support for character sets and predicates., Daniel Hartwig, 2012/10/12

Prev by Date: Re: [Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.6-31-g2446f8e
Next by Date: [PATCH] Preserve keyword in 'syntax-rules' and 'define-syntax-rule'
Previous by thread: [PATCH] In string-split, add support for character sets and predicates.
Next by thread: Re: [PATCH] In string-split, add support for character sets and predicates.
Index(es):
- Date
- Thread