guile-devel

Re: regexp-split for Guile


From: nalaginrut
Subject: Re: regexp-split for Guile
Date: Tue, 18 Sep 2012 20:59:33 +0800

I raised the same topic before:
http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00247.html
There is also an older thread that predates mine:
http://old.nabble.com/regex-split-for-Guile-td31093245.html

Anyway, since so many people want this feature, why don't we add it to
ice-9, using any one of these three implementations?


On Mon, 2012-09-17 at 10:01 -0400, Chris K. Jester-Young wrote: 
> Hi there,
> 
> I'm currently implementing regexp-split for Guile, which provides a
> Perl-style split function (including correctly implementing the "limit"
> parameter), minus the special awk-style whitespace handling (that is
> used with a pattern of " ", as opposed to / /, with Perl's split).
> 
> Attached are a couple of patches to support the regexp-split function,
> which I'm proposing at the bottom of this message:
> 
> 1. The first fixes the behaviour of fold-matches and list-matches when
>    the pattern contains a ^ (identical to the patch in my last email).
> 2. The second adds the ability to limit the number of matches done.
>    This applies on top of the first patch.
> 
> Some comments about the regexp-split implementation: the value that's
> being passed to regexp-split-fold is a cons, where the car is the last
> match's end position, and the cdr is the substrings so far collected.
> 
> The special check in regexp-split-fold for match-end being zero is to
> emulate a specific behaviour as documented for Perl's split: "Empty
> leading fields are produced when there are positive-width matches at
> the beginning of the string; a zero-width match at the beginning of the
> string does not produce an empty field."
> 
> Below is the implementation; comments are welcome! If it all looks good,
> I'll write tests and documentation, with a view to eventually putting it
> into (ice-9 regex).
> 
> Thanks,
> Chris.
> 
>                       *       *       *
> 
> (define (regexp-split-fold match prev)
>   (if (zero? (match:end match)) prev
>       (cons* (match:end match)
>              (substring (match:string match) (car prev) (match:start match))
>              (cdr prev))))
> 
> (define (string-empty? str)
>   (zero? (string-length str)))
> 
> (define* (regexp-split pat str #:optional (limit 0))
>   (let* ((result (fold-matches pat str '(0) regexp-split-fold 0
>                                (if (positive? limit) (1- limit) #f)))
>          (final (cons (substring str (car result)) (cdr result))))
>     (reverse! (if (zero? limit) (drop-while string-empty? final) final))))
> 
> Attachment: 0001-In-fold-matches-set-regexp-notbol-unless-matching-st.patch
> 
> From da8b0cd523f6e9bf9e1d46829cccf01e3115c614 Mon Sep 17 00:00:00 2001
> From: "Chris K. Jester-Young" <address@hidden>
> Date: Sun, 16 Sep 2012 02:20:56 -0400
> Subject: [PATCH 1/2] In fold-matches, set regexp/notbol unless matching
>  string start.
> 
> * module/ice-9/regex.scm (fold-matches): Set regexp/notbol if the
>   starting position is nonzero.
> * test-suite/tests/regexp.test (fold-matches): Check that when
>   matching /^foo/ against "foofoofoofoo", only one match results.
> ---
>  module/ice-9/regex.scm       |    3 ++-
>  test-suite/tests/regexp.test |    9 ++++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
> index f7b94b7..08ae2c2 100644
> --- a/module/ice-9/regex.scm
> +++ b/module/ice-9/regex.scm
> @@ -172,8 +172,9 @@
>      (let loop ((start 0)
>                 (value init)
>                 (abuts #f))              ; True if start abuts a previous match.
> +      (define bol (if (zero? start) 0 regexp/notbol))
>        (let ((m (if (> start (string-length string)) #f
> -                   (regexp-exec regexp string start flags))))
> +                   (regexp-exec regexp string start (logior flags bol)))))
>          (cond
>           ((not m) value)
>           ((and (= (match:start m) (match:end m)) abuts)
> diff --git a/test-suite/tests/regexp.test b/test-suite/tests/regexp.test
> index ef59465..d549df2 100644
> --- a/test-suite/tests/regexp.test
> +++ b/test-suite/tests/regexp.test
> @@ -132,7 +132,14 @@
>                     (lambda (match result)
>                       (cons (match:substring match)
>                             result))
> -                   (logior regexp/notbol regexp/noteol)))))
> +                   (logior regexp/notbol regexp/noteol))))
> +
> +  (pass-if "regexp/notbol is set correctly"
> +    (equal? '("foo")
> +            (fold-matches "^foo" "foofoofoofoo" '()
> +                          (lambda (match result)
> +                            (cons (match:substring match)
> +                                  result))))))
>  
> 
>  ;;;
> 
> Attachment: 0002-Add-limit-parameter-to-fold-matches-and-list-matches.patch
> 
> From 147dc0d7fd9ab04d10b4f13cecf47a32c5b6c4b6 Mon Sep 17 00:00:00 2001
> From: "Chris K. Jester-Young" <address@hidden>
> Date: Mon, 17 Sep 2012 01:06:07 -0400
> Subject: [PATCH 2/2] Add "limit" parameter to fold-matches and list-matches.
> 
> * doc/ref/api-regex.texi: Document new "limit" parameter.
> 
> * module/ice-9/regex.scm (fold-matches, list-matches): Optionally take
>   a "limit" argument that, if specified, limits how many times the
>   pattern is matched.
> 
> * test-suite/tests/regexp.test (fold-matches): Add tests for the correct
>   functioning of the limit parameter.
> ---
>  doc/ref/api-regex.texi       |   10 ++++++----
>  module/ice-9/regex.scm       |   18 ++++++++++--------
>  test-suite/tests/regexp.test |   16 +++++++++++++++-
>  3 files changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
> index 082fb87..2d2243f 100644
> --- a/doc/ref/api-regex.texi
> +++ b/doc/ref/api-regex.texi
> @@ -189,11 +189,12 @@ or @code{#f} otherwise.
>  @end deffn
>  
>  @sp 1
> -@deffn {Scheme Procedure} list-matches regexp str [flags]
> +@deffn {Scheme Procedure} list-matches regexp str [flags [limit]]
>  Return a list of match structures which are the non-overlapping
>  matches of @var{regexp} in @var{str}.  @var{regexp} can be either a
>  pattern string or a compiled regexp.  The @var{flags} argument is as
> -per @code{regexp-exec} above.
> +per @code{regexp-exec} above.  The @var{limit} argument, if specified,
> +limits how many times @var{regexp} is matched.
>  
>  @example
>  (map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
> @@ -201,11 +202,12 @@ per @code{regexp-exec} above.
>  @end  example
>  @end deffn
>  
> -@deffn {Scheme Procedure} fold-matches regexp str init proc [flags]
> +@deffn {Scheme Procedure} fold-matches regexp str init proc [flags [limit]]
>  Apply @var{proc} to the non-overlapping matches of @var{regexp} in
>  @var{str}, to build a result.  @var{regexp} can be either a pattern
>  string or a compiled regexp.  The @var{flags} argument is as per
> -@code{regexp-exec} above.
> +@code{regexp-exec} above.  The @var{limit} argument, if specified,
> +limits how many times @var{regexp} is matched.
>  
>  @var{proc} is called as @code{(@var{proc} match prev)} where
>  @var{match} is a match structure and @var{prev} is the previous return
> diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
> index 08ae2c2..0ffe74c 100644
> --- a/module/ice-9/regex.scm
> +++ b/module/ice-9/regex.scm
> @@ -167,26 +167,28 @@
>  ;;; `b'.  Around or within `xxx', only the match covering all three
>  ;;; x's counts, because the rest are not maximal.
>  
> -(define* (fold-matches regexp string init proc #:optional (flags 0))
> +(define* (fold-matches regexp string init proc #:optional (flags 0) limit)
>    (let ((regexp (if (regexp? regexp) regexp (make-regexp regexp))))
>      (let loop ((start 0)
> +               (count 0)
>                 (value init)
>                 (abuts #f))              ; True if start abuts a previous match.
> -      (define bol (if (zero? start) 0 regexp/notbol))
> -      (let ((m (if (> start (string-length string)) #f
> -                   (regexp-exec regexp string start (logior flags bol)))))
> +      (let* ((bol (if (zero? start) 0 regexp/notbol))
> +             (m (and (or (not limit) (< count limit))
> +                     (<= start (string-length string))
> +                     (regexp-exec regexp string start (logior flags bol)))))
>          (cond
>           ((not m) value)
>           ((and (= (match:start m) (match:end m)) abuts)
>            ;; We matched an empty string, but that would overlap the
>            ;; match immediately before.  Try again at a position
>            ;; further to the right.
> -          (loop (+ start 1) value #f))
> +          (loop (1+ start) count value #f))
>           (else
> -          (loop (match:end m) (proc m value) #t)))))))
> +          (loop (match:end m) (1+ count) (proc m value) #t)))))))
>  
> -(define* (list-matches regexp string #:optional (flags 0))
> -  (reverse! (fold-matches regexp string '() cons flags)))
> +(define* (list-matches regexp string #:optional (flags 0) limit)
> +  (reverse! (fold-matches regexp string '() cons flags limit)))
>  
>  (define (regexp-substitute/global port regexp string . items)
>  
> diff --git a/test-suite/tests/regexp.test b/test-suite/tests/regexp.test
> index d549df2..c3ba698 100644
> --- a/test-suite/tests/regexp.test
> +++ b/test-suite/tests/regexp.test
> @@ -139,7 +139,21 @@
>              (fold-matches "^foo" "foofoofoofoo" '()
>                            (lambda (match result)
>                              (cons (match:substring match)
> -                                  result))))))
> +                                  result)))))
> +
> +  (pass-if "without limit"
> +    (equal? '("foo" "foo" "foo" "foo")
> +            (fold-matches "foo" "foofoofoofoo" '()
> +                          (lambda (match result)
> +                            (cons (match:substring match)
> +                                  result)))))
> +
> +  (pass-if "with limit"
> +    (equal? '("foo" "foo")
> +            (fold-matches "foo" "foofoofoofoo" '()
> +                          (lambda (match result)
> +                            (cons (match:substring match)
> +                                  result)) 0 2))))
>  
> 
>  ;;;
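
For anyone who wants to try the quoted regexp-split without patching Guile
first, here is a minimal sketch. It assumes a local copy of the patched
fold-matches from patch 2/2; the name fold-matches/limit is made up for this
example so that it does not clash with the stock (ice-9 regex) binding. The
regexp-split definitions are taken from the quoted message, with only the
fold procedure's name changed. Treat it as an illustration, not as the
version that would be committed.

```scheme
(use-modules (ice-9 regex)      ; regexp-exec, match:start, match:end, ...
             (srfi srfi-1))     ; drop-while

;; Local copy of fold-matches as it reads after both quoted patches.
;; The name `fold-matches/limit' is hypothetical, chosen for this sketch.
(define* (fold-matches/limit regexp string init proc
                             #:optional (flags 0) limit)
  (let ((regexp (if (regexp? regexp) regexp (make-regexp regexp))))
    (let loop ((start 0) (count 0) (value init) (abuts #f))
      (let* ((bol (if (zero? start) 0 regexp/notbol))
             (m (and (or (not limit) (< count limit))
                     (<= start (string-length string))
                     (regexp-exec regexp string start (logior flags bol)))))
        (cond
         ((not m) value)
         ((and (= (match:start m) (match:end m)) abuts)
          ;; Empty match that abuts the previous match: retry one
          ;; position further to the right.
          (loop (1+ start) count value #f))
         (else
          (loop (match:end m) (1+ count) (proc m value) #t)))))))

;; The accumulator is a pair: car = end of the last match,
;; cdr = substrings collected so far, in reverse order.
(define (regexp-split-fold match prev)
  (if (zero? (match:end match)) prev
      (cons* (match:end match)
             (substring (match:string match) (car prev) (match:start match))
             (cdr prev))))

(define (string-empty? str)
  (zero? (string-length str)))

(define* (regexp-split pat str #:optional (limit 0))
  (let* ((result (fold-matches/limit pat str '(0) regexp-split-fold 0
                                     (if (positive? limit) (1- limit) #f)))
         (final (cons (substring str (car result)) (cdr result))))
    ;; With limit 0, trailing empty fields are dropped, as in Perl.
    (reverse! (if (zero? limit) (drop-while string-empty? final) final))))

(regexp-split "," "a,b,c")      ; => ("a" "b" "c")
(regexp-split "," "a,b,c" 2)    ; => ("a" "b,c")
(regexp-split "," "a,b,,")      ; => ("a" "b")
(regexp-split "," "a,b,," -1)   ; => ("a" "b" "" "")
```

A positive limit of N allows at most N - 1 matches, so the result has at most
N fields. A negative limit keeps the trailing empty fields that limit 0 drops.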




