m4-patches

Re: improve substr


From: Eric Blake
Subject: Re: improve substr
Date: Fri, 26 Dec 2008 00:52:04 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.18) Gecko/20081105 Thunderbird/2.0.0.18 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 12/24/2008 4:23 PM:
> Again, implementing this natively will be more efficient.  What do you
> think of adding these two enhancements to substr?
> 

Here's an implementation of the two patches; I'm now in the process of
regression testing autoconf and bison to ensure they don't trip up on the
new semantics (I'm also thinking of copying support for negative arguments
into m4sugar for the benefit of people still using m4 1.4.x, after getting
the replacement polished up for the manual).

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklUjSQACgkQ84KuGfSFAYAIRQCgx+OdXryH7UB7VcNdCPFCna21
XksAoI7FQcqqGZs16vn8WcTUdtqI5+iQ
=x62F
-----END PGP SIGNATURE-----
From 58675b0c10d79b45027187ba817ed1d05c7673f1 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 26 Dec 2008 00:33:18 -0700
Subject: [PATCH] Enhance substr to support negative values.

* doc/m4.texinfo (Substr): Document new semantics, and how to
simulate old.
* src/builtin.c (m4_substr): Support negative values.
* NEWS: Document this.
---
 ChangeLog      |    8 +++
 NEWS           |    5 ++
 doc/m4.texinfo |  149 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/builtin.c  |   49 ++++++++++++------
 4 files changed, 186 insertions(+), 25 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d15e52a..da63237 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2008-12-26  Eric Blake  <address@hidden>
+
+       Enhance substr to support negative values.
+       * doc/m4.texinfo (Substr): Document new semantics, and how to
+       simulate old.
+       * src/builtin.c (m4_substr): Support negative values.
+       * NEWS: Document this.
+
 2008-12-24  Eric Blake  <address@hidden>
 
        Enhance eval, as allowed by POSIX 2008.
diff --git a/NEWS b/NEWS
index 2e1a286..b5a2ebd 100644
--- a/NEWS
+++ b/NEWS
@@ -53,6 +53,11 @@ Foundation, Inc.
    the current expansion is nested within argument collection of another
    macro.  It has also been optimized for faster performance.
 
+** The `substr' builtin now treats negative arguments as indices relative
+   to the end of the string.  The manual gives an
+   example of how to recover M4 1.4.x behavior, as well as an example of
+   simulating the new negative argument semantics with older M4.
+
 ** The `-d'/`--debug' command-line option now understands `-' and `+'
    modifiers, the way the builtin `debugmode' has always done; this allows
    `-d-V' to disable prior debug settings from the command line, similar to
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 93adb64..b26b006 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -6233,12 +6233,27 @@ Substr
 Substrings are extracted with @code{substr}:
 
 @deffn Builtin substr (@var{string}, @var{from}, @ovar{length})
-Expands to the substring of @var{string}, which starts at index
-@var{from}, and extends for @var{length} characters, or to the end of
-@var{string}, if @var{length} is omitted.  The starting index of a string
-is always 0.  The expansion is empty if there is an error parsing
-@var{from} or @var{length}, if @var{from} is beyond the end of
-@var{string}, or if @var{length} is negative.
+Performs a substring operation on @var{string}.  If @var{from} is
+positive, it represents the 0-based index where the substring begins.
+If @var{length} is omitted, the substring ends at the end of
+@var{string}; if it is positive, @var{length} is added to the starting
+index to determine the ending index.
+
+@cindex @acronym{GNU} extensions
+As a @acronym{GNU} extension, if @var{from} is negative, it is added to
+the length of @var{string} to determine the starting index; if it is
+empty, the start of the string is used.  Likewise, if @var{length} is
+negative, it is added to the length of @var{string} to determine the
+ending index, and an empty @var{length} behaves like an omitted
+@var{length}.  It is not an error if either of the resulting indices lie
+outside the string, but the selected substring only contains the bytes
+of @var{string} that overlap the selected indices.  If the end point
+lies before the beginning point, the substring chosen is the empty
+string located at the starting index.
+
+The expansion is the selected substring, which may be empty.  The
+expansion is empty and a warning issued if @var{from} or @var{length}
+cannot be parsed.
 
 The macro @code{substr} is recognized only with parameters.
 @end deffn
@@ -6250,15 +6265,131 @@ Substr
 @result{}gnats
 @end example
 
-Omitting @var{from} evokes a warning, but still produces output.
+Omitting @var{from} evokes a warning, but still produces output.  On the
+other hand, selecting a @var{from} or @var{length} that lies beyond
+@var{string} is not a problem.
 
 @example
 substr(`abc')
 @error{}m4:stdin:1: Warning: substr: too few arguments: 1 < 2
 @result{}abc
-substr(`abc',)
-@error{}m4:stdin:2: Warning: substr: empty string treated as 0
+substr(`abc', `')
 @result{}abc
+substr(`abc', `4')
+@result{}
+substr(`abc', `1', `4')
+@result{}bc
+@end example
+
+Negative values for @var{from} or @var{length} are @acronym{GNU}
+extensions, useful for accessing a fixed size tail of an
+arbitrary-length string.  Prior to M4 1.6, using these values would
+silently result in the empty string.  Some other implementations crash
+on negative values, and many treat an explicitly empty @var{length} as
+0, which is different from the omitted @var{length} implying the rest of
+the original @var{string}.
+
+@example
+substr(`abcde', `2', `')
+@result{}cde
+substr(`abcde', `-3')
+@result{}cde
+substr(`abcde', `', `-3')
+@result{}ab
+substr(`abcde', `-6')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@result{}
+substr(`abcde', `1', `-2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@result{}bcdefghi
+@end example
+
+If backwards compatibility to M4 1.4.x behavior is necessary, the
+following macro is sufficient to do the job (mimicking warnings about
+empty @var{from} or @var{length} or the presence of an ignored fourth
+argument is left as an exercise to the reader).
+
+@example
+define(`substr', `ifelse(`$#', `0', ``$0'',
+  eval(`$2 - 0 < 0 || $3 - 0 < 0'), `1', `',
+  `builtin(`$0', `$1', `$2', `$3')')')
+@result{}
+substr(`abcde', `1', `-1')
+@result{}
+substr(`abcde', `2', `1', `C')
+@result{}c
+@end example
+
+On the other hand, it is possible to portably emulate the @acronym{GNU}
+extension of negative @var{from} and @var{length} arguments across all
+@code{m4} implementations, albeit with a lot more overhead.  This
+example uses @code{incr} and @code{decr} to normalize @samp{-08} to
+something that a later @code{eval} will treat as a decimal value, rather
+than looking like an invalid octal number, while avoiding using these
+macros on an empty string.  The helper macro @code{_substr_normalize} is
+recursive, since it is easier to fix @var{length} after @var{from} has
+been normalized, with the final iteration supplying two non-negative
+arguments to the original builtin, now named @code{_substr}.
+
+@comment options: -daq -t_substr
+@example
+$ @kbd{m4 -daq -t _substr}
+define(`_substr', defn(`substr'))dnl
+define(`substr', `ifelse(`$#', `0', ``$0'',
+  `_$0(`$1', _$0_normalize(len(`$1'),
+    ifelse(`$2', `', `0', `incr(decr(`$2'))'),
+    ifelse(`$3', `', `', `incr(decr(`$3'))')))')')dnl
+define(`_substr_normalize', `ifelse(
+  eval(`$2 < 0 && $1 + $2 >= 0'), `1',
+    `$0(`$1', eval(`$1 + $2'), `$3')',
+  eval(`$2 < 0')`$3', `1', ``0', `$1'',
+  eval(`$2 < 0 && $3 - 0 >= 0 && $1 + $2 + $3 - 0 >= 0'), `1',
+    `$0(`$1', `0', eval(`$1 + $2 + $3 - 0'))',
+  eval(`$2 < 0 && $3 - 0 >= 0'), `1', ``0', `0'',
+  eval(`$2 < 0'), `1', `$0(`$1', `0', `$3')',
+  `$3', `', ``$2', `$1'',
+  eval(`$3 - 0 < 0 && $1 - $2 + $3 - 0 >= 0'), `1',
+    ``$2', eval(`$1 - $2 + $3')',
+  eval(`$3 - 0 < 0'), `1', ``$2', `0'',
+  ``$2', `$3'')')dnl
+substr(`abcde', `2', `')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `-3')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `', `-3')
+@error{}m4trace: -1- _substr(`abcde', `0', `2')
+@result{}ab
+substr(`abcde', `-6')
+@error{}m4trace: -1- _substr(`abcde', `0', `5')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@error{}m4trace: -1- _substr(`abcde', `0', `4')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@error{}m4trace: -1- _substr(`abcde', `0', `0')
+@result{}
+substr(`abcde', `1', `-2')
+@error{}m4trace: -1- _substr(`abcde', `1', `2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@error{}m4trace: -1- _substr(`abcde', `1', `3')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@error{}m4trace: -1- _substr(`abcde', `4', `0')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@error{}m4trace: -1- _substr(`abcdefghij', `1', `8')
+@result{}bcdefghi
 @end example
 
 @node Translit
diff --git a/src/builtin.c b/src/builtin.c
index 33ef9e5..d3825ea 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -1861,22 +1861,26 @@ m4_index (struct obstack *obs, int argc, macro_arguments *argv)
   shipout_int (obs, retval);
 }
 
-/*-------------------------------------------------------------------------.
-| The macro "substr" extracts substrings from the first argument, starting |
-| from the index given by the second argument, extending for a length     |
-| given by the third argument.  If the third argument is missing, the     |
-| substring extends to the end of the first argument.                     |
-`-------------------------------------------------------------------------*/
+/*-------------------------------------------------------------------.
+| The macro "substr" extracts substrings from the first argument,    |
+| starting from the index given by the second argument, extending    |
+| for a length given by the third argument.  If the third argument   |
+| is missing or empty, the substring extends to the end of the first |
+| argument.  As an extension, negative arguments are treated as             |
+| indices relative to the string length.  Also, if a fourth argument |
+| is supplied, the original string is output with the selected      |
+| substring replaced by the argument.                               |
+`-------------------------------------------------------------------*/
 
 static void
 m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
 {
   const call_info *me = arg_info (argv);
   int start = 0;
+  int end;
   int length;
-  int avail;
 
-  if (bad_argc (me, argc, 2, 3))
+  if (bad_argc (me, argc, 2, 4))
     {
       /* builtin(`substr') is blank, but substr(`abc') is abc.  */
       if (argc == 2)
@@ -1884,19 +1888,32 @@ m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
       return;
     }
 
-  length = avail = ARG_LEN (1);
-  if (!numeric_arg (me, ARG (2), &start))
+  length = ARG_LEN (1);
+  if (!arg_empty (argv, 2) && !numeric_arg (me, ARG (2), &start))
     return;
+  if (start < 0)
+    start += length;
 
-  if (argc >= 4 && !numeric_arg (me, ARG (3), &length))
-    return;
+  if (arg_empty (argv, 3))
+    end = length;
+  else
+    {
+      if (!numeric_arg (me, ARG (3), &end))
+       return;
+      if (end < 0)
+       end += length;
+      else
+       end += start;
+    }
 
-  if (start < 0 || length <= 0 || start >= avail)
+  if (start < 0)
+    start = 0;
+  if (length < end)
+    end = length;
+  if (end <= start)
     return;
 
-  if (start + length > avail)
-    length = avail - start;
-  obstack_grow (obs, ARG (1) + start, length);
+  obstack_grow (obs, ARG (1) + start, end - start);
 }
 
 /*------------------------------------------------------------------.
-- 
1.6.0.4
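
For readers who want to check the index arithmetic without building m4, the selection logic from the src/builtin.c hunk above can be modeled in standalone C.  This is only a sketch: substr_model and its has_len flag are hypothetical names invented here (has_len == 0 stands in for an omitted or empty length), and it works on plain NUL-terminated strings instead of m4's obstack and macro_arguments machinery.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patched m4_substr; not part of m4.
   Copies the selected bytes of STR into OUT (at most OUTSIZE - 1).
   HAS_LEN == 0 models an omitted or empty length argument.  */
static void
substr_model (const char *str, int from, int has_len, int len,
              char *out, size_t outsize)
{
  int length = (int) strlen (str);
  int start = from;
  int end;

  assert (out != NULL && outsize > 0);
  out[0] = '\0';

  /* A negative start counts back from the end of the string.  */
  if (start < 0)
    start += length;

  /* A negative length is an end index relative to the string end;
     otherwise it is added to the (not yet clamped) start.  */
  if (!has_len)
    end = length;
  else if (len < 0)
    end = len + length;
  else
    end = len + start;

  /* Clamp to the string; an empty or inverted range selects nothing.  */
  if (start < 0)
    start = 0;
  if (length < end)
    end = length;
  if (end <= start)
    return;

  snprintf (out, outsize, "%.*s", end - start, str + start);
}
```

Because the length is added to the start before the start is clamped to 0, a call modeled on substr(`abcde', `-6', `5') selects only four bytes, which agrees with the `0', `4' arguments in the m4trace output of the portable emulation.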


From 28abf48ba5b417658c2401f85592cf1e54991965 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 26 Dec 2008 00:45:24 -0700
Subject: [PATCH] Enhance substr to support replacement text.

* doc/m4.texinfo (Substr): Document new semantics.
* src/builtin.c (m4_substr): Support optional fourth argument.
* NEWS: Document this.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |    5 +++++
 NEWS           |    3 ++-
 doc/m4.texinfo |   34 +++++++++++++++++++++++++++++++---
 src/builtin.c  |   20 ++++++++++++++++++++
 4 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index da63237..8e3a915 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2008-12-26  Eric Blake  <address@hidden>
 
+       Enhance substr to support replacement text.
+       * doc/m4.texinfo (Substr): Document new semantics.
+       * src/builtin.c (m4_substr): Support optional fourth argument.
+       * NEWS: Document this.
+
        Enhance substr to support negative values.
        * doc/m4.texinfo (Substr): Document new semantics, and how to
        simulate old.
diff --git a/NEWS b/NEWS
index b5a2ebd..641d8dd 100644
--- a/NEWS
+++ b/NEWS
@@ -54,7 +54,8 @@ Foundation, Inc.
    macro.  It has also been optimized for faster performance.
 
 ** The `substr' builtin now treats negative arguments as indices relative
-   to the end of the string.  The manual gives an
+   to the end of the string, and accepts an optional fourth argument of
+   text to supply in place of the selected substring.  The manual gives an
    example of how to recover M4 1.4.x behavior, as well as an example of
    simulating the new negative argument semantics with older M4.
 
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index b26b006..e54abb2 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -6232,7 +6232,8 @@ Substr
 @cindex substrings, extracting
 Substrings are extracted with @code{substr}:
 
-@deffn Builtin substr (@var{string}, @var{from}, @ovar{length})
+@deffn Builtin substr (@var{string}, @var{from}, @ovar{length}, @
+  @ovar{replace})
 Performs a substring operation on @var{string}.  If @var{from} is
 positive, it represents the 0-based index where the substring begins.
 If @var{length} is omitted, the substring ends at the end of
@@ -6251,9 +6252,13 @@ Substr
 lies before the beginning point, the substring chosen is the empty
 string located at the starting index.
 
-The expansion is the selected substring, which may be empty.  The
+If @var{replace} is omitted, then the expansion is only the selected
+substring, which may be empty.  As a @acronym{GNU} extension, if
+@var{replace} is provided, then the expansion is the original
+@var{string} with the selected substring replaced by @var{replace}.  The
 expansion is empty and a warning issued if @var{from} or @var{length}
-cannot be parsed.
+cannot be parsed, or if @var{replace} is provided but the selected
+indices do not overlap with @var{string}.
 
 The macro @code{substr} is recognized only with parameters.
 @end deffn
@@ -6312,6 +6317,29 @@ Substr
 @result{}bcdefghi
 @end example
 
+Another useful @acronym{GNU} extension, also added in M4 1.6, is the
+ability to replace a substring within the original @var{string}.  An
+empty length substring at the beginning or end of @var{string} is valid,
+but selecting a substring that does not overlap @var{string} causes a
+warning.
+
+@example
+substr(`abcde', `1', `3', `t')
+@result{}ate
+substr(`abcde', `5', `', `f')
+@result{}abcdef
+substr(`abcde', `-3', `-4', `f')
+@result{}abfcde
+substr(`abcde', `-6', `1', `f')
+@result{}fabcde
+substr(`abcde', `-7', `1', `f')
+@error{}m4:stdin:5: Warning: substr: substring out of range
+@result{}
+substr(`abcde', `6', `', `f')
+@error{}m4:stdin:6: Warning: substr: substring out of range
+@result{}
+@end example
+
 If backwards compatibility to M4 1.4.x behavior is necessary, the
 following macro is sufficient to do the job (mimicking warnings about
 empty @var{from} or @var{length} or the presence of an ignored fourth
diff --git a/src/builtin.c b/src/builtin.c
index d3825ea..6f04072 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -1906,6 +1906,26 @@ m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
        end += start;
     }
 
+  if (argc >= 5)
+    {
+      /* Replacement text provided.  */
+      if (end < start)
+       end = start;
+      if (end < 0 || length < start)
+       {
+         m4_warn (0, me, _("substring out of range"));
+         return;
+       }
+      if (start < 0)
+       start = 0;
+      if (length < end)
+       end = length;
+      obstack_grow (obs, ARG (1), start);
+      push_arg (obs, argv, 4);
+      obstack_grow (obs, ARG (1) + end, length - end);
+      return;
+    }
+
   if (start < 0)
     start = 0;
   if (length < end)
-- 
1.6.0.4
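
The replacement branch added by this second patch can be modeled the same way.  Again this is a sketch under assumptions, not the m4 implementation: substr_replace_model, its has_len flag, and its return convention (0 on success, -1 where m4 would warn "substring out of range" and expand to nothing) are hypothetical, and snprintf takes the place of obstack_grow and push_arg.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the replacement branch of m4_substr; not
   part of m4.  Writes STR with the selected range replaced by REPL
   into OUT.  Returns 0 on success, or -1 when the selected range does
   not touch STR at all (where m4 would issue its warning).  */
static int
substr_replace_model (const char *str, int from, int has_len, int len,
                      const char *repl, char *out, size_t outsize)
{
  int length = (int) strlen (str);
  int start = from;
  int end;

  assert (out != NULL && outsize > 0);
  out[0] = '\0';

  /* Same index computation as the extraction case.  */
  if (start < 0)
    start += length;
  if (!has_len)
    end = length;
  else if (len < 0)
    end = len + length;
  else
    end = len + start;

  /* An inverted range collapses to an empty one at the start index.  */
  if (end < start)
    end = start;
  /* Entirely before or entirely after the string: out of range.  */
  if (end < 0 || length < start)
    return -1;
  if (start < 0)
    start = 0;
  if (length < end)
    end = length;

  /* Prefix, replacement text, then the remaining suffix.  */
  snprintf (out, outsize, "%.*s%s%s", start, str, repl, str + end);
  return 0;
}
```

Note the ordering: the out-of-range check runs before the indices are clamped, so an empty range at index 0 or at the end of the string is accepted as an insertion point, while a range that lies wholly outside the string is rejected.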

