m4-patches

argv_ref patch 26: allow NUL in macro definitions


From: Eric Blake
Subject: argv_ref patch 26: allow NUL in macro definitions
Date: Sun, 03 Aug 2008 22:41:32 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080708 Thunderbird/2.0.0.16 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Next in the series.  The master branch already had most of the work done,
thanks to hiding symbol definitions behind accessor methods, but
branch-1.6 was using quite a bit of strlen.  Macro definitions are now
tracked by length rather than by NUL termination, so all 256 byte values
are transparently supported in macro definitions.  Branch-1.6 also gets
slightly faster, because expand_user_macro now uses memchr rather than a
bytewise scan to find the next $ byte.  I didn't see an easy way to do the
same on the master branch without first improving the syntax table to
track whether the DOLLAR syntax category contains only a single
character.  No real change in memory usage.

2008-08-03  Eric Blake  <address@hidden>

        Stage 26: Allow embedded NUL in macro definitions.
        Track macro definitions by length, to allow embedded NUL.  Make
        arg_len callers aware of the issue of flattening builtins when
        determining length.  Optimize loops that scan a definition.
        Memory impact: none.
        Speed impact: slight improvement, due to faster scans.
        * src/m4.h (set_word_regexp, arg_len, define_user_macro): Add
        parameters.
        (SYMBOL_TEXT_LEN): New macro.
        (ARG_LEN): Adjust callers.
        * src/builtin.c (define_user_macro): Add a parameter.
        (builtin_init, define_macro): Adjust callers.
        (m4_dumpdef, m4_defn, m4_changeword): Handle embedded NULs.
        (expand_user_macro): Handle embedded NUL, and speed up search for
        embedded $.
        * src/macro.c (arg_len): Add parameter.
        * src/input.c (set_word_regexp): Add parameter.
        (input_init): Adjust caller.
        * src/m4.c (main): Likewise.
        * src/freeze.c (dump_symbol_CB): Preserve NUL on freeze.
        (reload_frozen_state): Retrieve NUL on load.
        * doc/m4.texinfo (Builtin, Using frozen files): Enhance tests.
        * examples/null.m4: Likewise.
        * examples/null.out: Update expected output.
        * examples/null.err: Likewise.
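
The freeze.c entries above work because the frozen file's "T" directive
already records both the name length and the text length.  A sketch of
writing and rereading one such record, assuming hypothetical helpers
dump_text_defn and read_text_defn (m4's real reader is structured
differently): the point is that fwrite and fread take explicit lengths,
whereas fputs stops at the first NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write one text definition in the shape used by dump_symbol_CB:
   "T<name_len>,<text_len>\n", the raw name bytes, the raw text bytes,
   then a newline.  Returns 0 on success, -1 on a write error.  */
static int
dump_text_defn (FILE *f, const char *name, size_t name_len,
                const char *text, size_t len)
{
  assert (name_len <= INT_MAX && len <= INT_MAX);  /* lengths printed as int */
  if (fprintf (f, "T%d,%d\n", (int) name_len, (int) len) < 0)
    return -1;
  if (fwrite (name, 1, name_len, f) != name_len
      || fwrite (text, 1, len, f) != len
      || fputc ('\n', f) == EOF)
    return -1;
  return 0;
}

/* Hypothetical reader for the same record.  The returned buffers are
   malloc'd and NUL-terminated for convenience, but callers must rely on
   *NAME_LEN and *LEN, since the contents may contain NUL bytes.  */
static int
read_text_defn (FILE *f, char **name, size_t *name_len,
                char **text, size_t *len)
{
  int n1, n2;

  if (fscanf (f, "T%d,%d", &n1, &n2) != 2 || n1 < 0 || n2 < 0
      || fgetc (f) != '\n')
    return -1;
  *name = malloc ((size_t) n1 + 1);
  *text = malloc ((size_t) n2 + 1);
  if (!*name || !*text
      || fread (*name, 1, n1, f) != (size_t) n1
      || fread (*text, 1, n2, f) != (size_t) n2
      || fgetc (f) != '\n')
    {
      free (*name);
      free (*text);
      return -1;
    }
  (*name)[n1] = '\0';
  (*text)[n2] = '\0';
  *name_len = n1;
  *len = n2;
  return 0;
}
```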

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkiWiHsACgkQ84KuGfSFAYCeUgCgmC8Ng06k+60OWpHvXEpbkw7T
sVYAoLNU3LNSxdbjlR0Rp49IyyhL/7op
=Fwv9
-----END PGP SIGNATURE-----
From acb5619a320c331d846e1a4ad51cf9b72829e5c9 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sun, 3 Aug 2008 22:23:11 -0600
Subject: [PATCH] Stage 26: Allow embedded NUL in macro definitions.

* m4/m4module.h (m4_arg_len): Add parameter.
(M4ARGLEN): Provide default for the parameter.
* m4/m4private.h (includes): Share xmemdup0.h among all libm4
files.
* m4/macro.c (m4_arg_len): Fail if builtins are not flattened.
* m4/syntax.c (includes): Rely on m4private.h for xmemdup0.
* m4/symtab.c (includes): Likewise.
(m4_symbol_value_copy): Use xmemdup0.
* m4/module.c (install_macro_table): Likewise.
* src/freeze.c (reload_frozen_state): Likewise.
* tests/freeze.at (reloading nul): Enhance test.
* tests/null.m4: Likewise.
* tests/null.err: Update expected output.
* tests/null.out: Likewise.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog       |   21 +++++++++++++++++++++
 m4/m4module.h   |    4 ++--
 m4/m4private.h  |    1 +
 m4/macro.c      |   23 ++++++++++-------------
 m4/module.c     |    3 +--
 m4/symtab.c     |    5 ++---
 m4/syntax.c     |    1 -
 src/freeze.c    |    4 ++--
 tests/freeze.at |    4 ++--
 tests/null.err  |   10 ++++++----
 tests/null.m4   |   23 ++++++++++++++---------
 tests/null.out  |    7 ++++---
 12 files changed, 65 insertions(+), 41 deletions(-)
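
Several hunks in this patch replace xmemdup (p, len + 1) with
xmemdup0 (p, len).  The old call copied one byte past the counted text and
so relied on the source having a terminating NUL at p[len]; xmemdup0
copies exactly len bytes and writes the terminator itself.  A sketch of
those semantics, using a hypothetical memdup0 built on malloc rather than
gnulib's xmemdup0:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's xmemdup0: copy exactly S bytes from
   P into a new buffer of S + 1 bytes and append a NUL.  P[S] is never
   read, so P need not be NUL-terminated.  The trailing NUL is only a
   convenience for C-string consumers and is not part of the length.  */
static char *
memdup0 (const void *p, size_t s)
{
  char *result = malloc (s + 1);

  if (!result)
    return NULL;                /* gnulib's version calls xalloc_die instead */
  if (s)
    memcpy (result, p, s);
  result[s] = '\0';
  return result;
}
```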

diff --git a/ChangeLog b/ChangeLog
index c08f6f1..b94dbe0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,26 @@
 2008-08-03  Eric Blake  <address@hidden>
 
+       Stage 26: Allow embedded NUL in macro definitions.
+       Clean up final few locations that did not track macro definitions
+       by length, to allow embedded NUL.  Make m4_arg_len callers aware
+       of issue of flattening builtins when determining length.
+       Memory impact: none.
+       Speed impact: none noticed.
+       * m4/m4module.h (m4_arg_len): Add parameter.
+       (M4ARGLEN): Provide default for the parameter.
+       * m4/m4private.h (includes): Share xmemdup0.h among all libm4
+       files.
+       * m4/macro.c (m4_arg_len): Fail if builtins are not flattened.
+       * m4/syntax.c (includes): Rely on m4private.h for xmemdup0.
+       * m4/symtab.c (includes): Likewise.
+       (m4_symbol_value_copy): Use xmemdup0.
+       * m4/module.c (install_macro_table): Likewise.
+       * src/freeze.c (reload_frozen_state): Likewise.
+       * tests/freeze.at (reloading nul): Enhance test.
+       * tests/null.m4: Likewise.
+       * tests/null.err: Update expected output.
+       * tests/null.out: Likewise.
+
        Fix regression in commenting unbalanced quotes, from 2008-02-16.
        * m4/m4private.h (m4__token_type): Add M4_TOKEN_COMMENT.
        * m4/input.c (m4__next_token, m4_print_token): Supply new token
diff --git a/m4/m4module.h b/m4/m4module.h
index 70a062e..c17c98a 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -165,7 +165,7 @@ struct m4_string_pair
 /* Grab the length of the text contents of argument I, or abort if the
    argument is not text.  Assumes that `m4 *context' and
    `m4_macro_args *argv' are in scope.  */
-#define M4ARGLEN(i) m4_arg_len (context, argv, i)
+#define M4ARGLEN(i) m4_arg_len (context, argv, i, false)
 
 extern bool    m4_bad_argc        (m4 *, size_t, const m4_call_info *, size_t,
                                    size_t, bool);
@@ -362,7 +362,7 @@ extern const char *m4_arg_text              (m4 *, m4_macro_args *, size_t, bool);
 extern bool    m4_arg_equal            (m4 *, m4_macro_args *, size_t,
                                         size_t);
 extern bool    m4_arg_empty            (m4_macro_args *, size_t);
-extern size_t  m4_arg_len              (m4 *, m4_macro_args *, size_t);
+extern size_t  m4_arg_len              (m4 *, m4_macro_args *, size_t, bool);
 extern m4_builtin_func *m4_arg_func    (m4_macro_args *, size_t);
 extern m4_obstack *m4_arg_scratch      (m4 *);
 extern m4_macro_args *m4_make_argv_ref (m4 *, m4_macro_args *, const char *,
diff --git a/m4/m4private.h b/m4/m4private.h
index 603af64..71249af 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -25,6 +25,7 @@
 #include <ltdl.h>
 
 #include "cloexec.h"
+#include "xmemdup0.h"
 
 typedef struct m4__search_path_info m4__search_path_info;
 typedef struct m4__macro_arg_stacks m4__macro_arg_stacks;
diff --git a/m4/macro.c b/m4/macro.c
index 5653576..7e6ef47 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -1413,9 +1413,10 @@ m4_arg_empty (m4_macro_args *argv, size_t arg)
 }
 
 /* Given ARGV, return the length of argument ARG.  Abort if the
-   argument is not text.  Indices beyond argc return 0.  */
+   argument is not text and FLATTEN is not true.  Indices beyond argc
+   return 0.  */
 size_t
-m4_arg_len (m4 *context, m4_macro_args *argv, size_t arg)
+m4_arg_len (m4 *context, m4_macro_args *argv, size_t arg, bool flatten)
 {
   m4_symbol_value *value;
   m4__symbol_chain *chain;
@@ -1428,7 +1429,7 @@ m4_arg_len (m4 *context, m4_macro_args *argv, size_t arg)
     }
   if (argv->argc <= arg)
     return 0;
-  value = m4_arg_symbol (argv, arg);
+  value = arg_symbol (argv, arg, NULL, flatten);
   if (m4_is_symbol_value_text (value))
     return m4_get_symbol_value_len (value);
   assert (value->type == M4_SYMBOL_COMP);
@@ -1444,6 +1445,9 @@ m4_arg_len (m4 *context, m4_macro_args *argv, size_t arg)
        case M4__CHAIN_STR:
          len += chain->u.u_s.len;
          break;
+       case M4__CHAIN_FUNC:
+         assert (flatten);
+         break;
        case M4__CHAIN_ARGV:
          i = chain->u.u_a.index;
          limit = chain->u.u_a.argv->argc - i - chain->u.u_a.skip_last;
@@ -1454,15 +1458,8 @@ m4_arg_len (m4 *context, m4_macro_args *argv, size_t arg)
            len += (quotes->len1 + quotes->len2) * limit;
          len += limit - 1;
          while (limit--)
-           {
-             /* TODO handle concatenation of builtins.  */
-             if (m4_is_symbol_value_func (m4_arg_symbol (chain->u.u_a.argv,
-                                                         i)))
-               assert (argv->flatten);
-             else
-               len += m4_arg_len (context, chain->u.u_a.argv, i);
-             i++;
-           }
+           len += m4_arg_len (context, chain->u.u_a.argv, i++,
+                              flatten || chain->u.u_a.flatten);
          break;
        default:
          assert (!"m4_arg_len");
@@ -1470,7 +1467,7 @@ m4_arg_len (m4 *context, m4_macro_args *argv, size_t arg)
        }
       chain = chain->next;
     }
-  assert (len);
+  assert (len || flatten);
   return len;
 }
 
diff --git a/m4/module.c b/m4/module.c
index 58dad59..bb24573 100644
--- a/m4/module.c
+++ b/m4/module.c
@@ -179,8 +179,7 @@ install_macro_table (m4 *context, m4_module *module)
          /* Sanity check that builtins meet the required interface.  */
          assert (mp->min_args <= mp->max_args);
 
-         m4_set_symbol_value_text (value, xmemdup (mp->value, len + 1),
-                                   len, 0);
+         m4_set_symbol_value_text (value, xmemdup0 (mp->value, len), len, 0);
          VALUE_MODULE (value) = module;
          VALUE_MIN_ARGS (value) = mp->min_args;
          VALUE_MAX_ARGS (value) = mp->max_args;
diff --git a/m4/symtab.c b/m4/symtab.c
index 025fbcd..61c4a64 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -21,7 +21,6 @@
 #include <config.h>
 
 #include "m4private.h"
-#include "xmemdup0.h"
 
 /* Define this to see runtime debug info.  Implied by DEBUG.  */
 /*#define DEBUG_SYM */
@@ -504,8 +503,8 @@ m4_symbol_value_copy (m4 *context, m4_symbol_value *dest, m4_symbol_value *src)
        size_t len = m4_get_symbol_value_len (src);
        unsigned int age = m4_get_symbol_value_quote_age (src);
        m4_set_symbol_value_text (dest,
-                                 xmemdup (m4_get_symbol_value_text (src),
-                                          len + 1), len, age);
+                                 xmemdup0 (m4_get_symbol_value_text (src),
+                                           len), len, age);
       }
       break;
     case M4_SYMBOL_FUNC:
diff --git a/m4/syntax.c b/m4/syntax.c
index 8dda94e..1fb4815 100644
--- a/m4/syntax.c
+++ b/m4/syntax.c
@@ -21,7 +21,6 @@
 #include <config.h>
 
 #include "m4private.h"
-#include "xmemdup0.h"
 
 /* Define this to see runtime debug info.  Implied by DEBUG.  */
 /*#define DEBUG_SYNTAX */
diff --git a/src/freeze.c b/src/freeze.c
index 7261b09..d61a8df 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -27,6 +27,7 @@
 #include "binary-io.h"
 #include "close-stream.h"
 #include "quotearg.h"
+#include "xmemdup0.h"
 
 static void  produce_mem_dump          (FILE *, const char *, size_t);
 static void  produce_resyntax_dump     (m4 *, FILE *);
@@ -929,8 +930,7 @@ ill-formed frozen file, version 2 directive `%c' encountered"), 'T');
            if (number[2] > 0)
              module = m4__module_find (string[2]);
 
-           m4_set_symbol_value_text (token, xmemdup (string[1],
-                                                     number[1] + 1),
+           m4_set_symbol_value_text (token, xmemdup0 (string[1], number[1]),
                                      number[1], 0);
            VALUE_MODULE (token) = module;
            VALUE_MAX_ARGS (token) = -1;
diff --git a/tests/freeze.at b/tests/freeze.at
index a3b4b35..995c4a6 100644
--- a/tests/freeze.at
+++ b/tests/freeze.at
@@ -386,11 +386,11 @@ AT_KEYWORDS([frozen])
 
 dnl AT_DATA can't generate NUL bytes (at least, not in all shells).
 # Skip the test if printf(1) is insufficient.
-AT_CHECK([printf 'define(-\0-,hi)changequote([,\0])changecom(--\0)dnl
+AT_CHECK([printf 'define(-\0-,\0-\0)changequote([,\0])changecom(--\0)dnl
 divert(1)undivert(null.out)' || exit 77],
  [0], [stdout], [ignore])
 mv stdout frozen.m4
-printf 'divert(0)[divnum\0] @%:@-- indir(-\0-)\n' > unfrozen.m4
+printf 'divert(0)[divnum\0] @%:@-- len(indir(-\0-))\n' > unfrozen.m4
 
 # First generate the `expout' output by running over the sources before
 # freezing.
diff --git a/tests/null.err b/tests/null.err
index 9a3f322..74ec09d 100644
--- a/tests/null.err
+++ b/tests/null.err
@@ -6,17 +6,19 @@ m4trace: -1- dumpdef(echo/) -> /
 changesyntax:
 m4:null.m4:46: Warning: changesyntax: undefined syntax code: `\0'
 defn:
-m4:null.m4:54: Warning: defn: undefined macro `\0-\0'
+m4:null.m4:55: Warning: defn: undefined macro `\0-\0'
 dumpdef:
-m4:null.m4:66: Warning: dumpdef: undefined macro `\0-\0'
+m4:null.m4:68: Warning: dumpdef: undefined macro `\0-\0'
 :      `empty'
 -:     `dash'
 --:   ``$0': $1'
 --:   ``$0': $1'
 --:    `dashes'
+body:  `--'
 errprint: -- --
 indir:
-m4:null.m4:96: Warning: indir: undefined macro `\0-\0'
-m4:null.m4:98: Warning: \0\0%%: extra arguments ignored: 1 > 0
+m4:null.m4:99: Warning: indir: undefined macro `\0-\0'
+m4:null.m4:101: Warning: \0\0%%: extra arguments ignored: 1 > 0
 traceon:
 m4trace: -1- --(`--') -> `strange: --'
+m4trace: -1- body -> `-'
diff --git a/tests/null.m4 b/tests/null.m4
index 18a5e1d..77b6e67 100644
--- a/tests/null.m4
+++ b/tests/null.m4
@@ -48,13 +48,15 @@ dnl Ignored by changesyntax: TODO - support ignored category?
 dnl Warning from debugfile: not tested yet. No file name includes NUL, needs to warn
 dnl Warning from debugmode: not tested yet. NUL not a valid mode, needs to warn
 dnl Warning from decr: not tested yet. NUL not a number, needs to warn
-dnl Definition of define: not tested yet
+dnl Definition of define:
+`define:' define(`body', `--')body
 dnl Undefined argument of defn:
 errprint(`defn:
 ')defn(`-')dnl
 dnl Defined macro name in defn:
-`defn:' defn(`--')
-dnl Macro contents in defn: not tested yet
+`defn:' defn(`--')dnl
+dnl Macro contents in defn:
+ defn(`body')
 dnl Argument to divert: not tested yet. NUL not a number, needs to warn
 dnl Passed through diversion by divert:
 divert(`1')`divert:' --
@@ -66,7 +68,8 @@ errprint(`dumpdef:
 ')dumpdef(`-')dnl
 dnl Defined macro names in dumpdef:
 dumpdef(`--', `-', `', `--', `--')dnl
-dnl Macro contents in dumpdef: not tested yet
+dnl Macro contents in dumpdef:
+dumpdef(`body')dnl
 dnl Passed through errprint:
 errprint(`errprint:' --, `--
 ')dnl
@@ -126,8 +129,9 @@ dnl Defined argument of popdef:
 `popdef:' popdef(`--')ifdef(`--', `oops', `ok')
 dnl Undefined argument of popdef: not tested yet. Should it warn?
 dnl Macro name of pushdef:
-`pushdef:' pushdef(`--', `strange: $1')ifdef(`--', `ok', `oops')
-dnl Definition of pushdef: not tested yet
+`pushdef:' pushdef(`--', `strange: $1')ifdef(`--', `ok', `oops')`'dnl
+dnl Definition of pushdef:
+ pushdef(`body', `-')body
 dnl Bad regex in regexp: not tested yet
 dnl First argument of regexp:
 `regexp:' regexp(`ab', `b')dnl
@@ -153,10 +157,11 @@ dnl Passed to syscmd: not tested yet. NUL truncates string, needs to warn
 dnl Sysval takes no arguments, and never produces NUL.
 dnl Passed to traceoff:
 traceoff(`--', `')dnl
-dnl Macro name and arguments of traceon: not perfect yet
+dnl Macro name and arguments of traceon:
 `traceon:' errprint(`traceon:
-')traceon(`--')indir(`--', `--')
-dnl Defined text of traceon: not tested yet, needs quoting
+')traceon(`--')indir(`--', `--')dnl
+dnl Defined text of traceon:
+ traceon(`body')body
 dnl First argument of translit: not tested yet
 dnl Single character in other arguments of translit: not tested yet
 dnl Character ranges of translit: not tested yet
diff --git a/tests/null.out b/tests/null.out
index 9e48a6a..5f6df39 100644
--- a/tests/null.out
+++ b/tests/null.out
@@ -7,7 +7,8 @@ builtin: 3
 changecom: echo//echo --echo-
 changequote: echoecho echo
 changesyntax: -- --: dash echo .... dash- nul
-defn: `$0': $1
+define: --
+defn: `$0': $1 --
 divert: --
 esyscmd: [] 0
 ifdef: yes: -- no: --
@@ -18,10 +19,10 @@ len: 1 3
 m4symbols: --
 patsubst: .. -- abc -!- ---
 popdef: ok
-pushdef: ok
+pushdef: ok -
 regexp: 2 ! 0 -
 shift: --,--
 substr: --
-traceon: strange: --
+traceon: strange: -- -
 undefine: ok
 m4wrap: --
-- 
1.5.6.4
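
The new flatten parameter of m4_arg_len in the patch above can be
illustrated with a much-simplified, hypothetical chain type (struct piece
and chain_len are not m4 names).  A builtin in the chain contributes no
text when the caller is flattening builtins, as define_macro does on
branch-1.6 in the next patch after warning that it cannot concatenate
builtins; otherwise reaching one trips an assertion, matching the new
M4__CHAIN_FUNC case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument chain: each link is a run of text or a
   reference to a builtin function.  */
enum piece_kind { PIECE_STR, PIECE_FUNC };

struct piece
{
  enum piece_kind kind;
  size_t len;                   /* meaningful only for PIECE_STR */
  const struct piece *next;
};

/* Sum the text length of CHAIN.  A builtin is only legal when FLATTEN
   is true, in which case it contributes nothing to the length.  */
static size_t
chain_len (const struct piece *chain, bool flatten)
{
  size_t len = 0;

  for (; chain; chain = chain->next)
    switch (chain->kind)
      {
      case PIECE_STR:
        len += chain->len;
        break;
      case PIECE_FUNC:
        assert (flatten);       /* caller forgot to request flattening */
        break;
      }
  return len;
}
```

Because a chain made only of flattened builtins has length 0, the patch
also relaxes the final check in m4_arg_len to assert (len || flatten).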

From a3a7734d1beabbb438656461076258f5ff32c08b Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Mon, 14 Jan 2008 17:25:13 -0700
Subject: [PATCH] Stage 26: Allow embedded NUL in macro definitions.

* src/m4.h (set_word_regexp, arg_len, define_user_macro): Add
parameters.
(SYMBOL_TEXT_LEN): New macro.
(ARG_LEN): Adjust callers.
* src/builtin.c (define_user_macro): Add a parameter.
(builtin_init, define_macro): Adjust callers.
(m4_dumpdef, m4_defn, m4_changeword): Handle embedded NULs.
(expand_user_macro): Handle embedded NUL, and speed up search for
embedded $.
* src/macro.c (arg_len): Add parameter.
* src/input.c (set_word_regexp): Add parameter.
(input_init): Adjust caller.
* src/m4.c (main): Likewise.
* src/freeze.c (dump_symbol_CB): Preserve NUL on freeze.
(reload_frozen_state): Retrieve NUL on load.
* doc/m4.texinfo (Builtin, Using frozen files): Enhance tests.
* examples/null.m4: Likewise.
* examples/null.out: Update expected output.
* examples/null.err: Likewise.

(cherry picked from commit cb26d7cb8b438224908d53df59b1d394ba1928f8)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog         |   26 ++++++++++++++++++++
 doc/m4.texinfo    |   13 ++++++++--
 examples/null.err |   18 ++++++++-----
 examples/null.m4  |   39 ++++++++++++++++++++----------
 examples/null.out |   10 ++++---
 src/builtin.c     |   67 +++++++++++++++++++++++++++++++---------------------
 src/freeze.c      |    6 ++--
 src/input.c       |   28 +++++++++++++--------
 src/m4.c          |    2 +-
 src/m4.h          |   10 ++++---
 src/macro.c       |   25 +++++++------------
 11 files changed, 155 insertions(+), 89 deletions(-)
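
In this patch define_user_macro gains a length parameter, with SIZE_MAX
meaning "TEXT is an ordinary C string; measure it with strlen".  That lets
builtin_init keep passing C strings while define_macro and
reload_frozen_state pass explicit lengths that may cover NUL bytes.  A
sketch of the convention, assuming a hypothetical struct defn and set_defn
in place of m4's symbol and define_user_macro:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, stripped-down symbol: a byte buffer plus an explicit
   length, in the spirit of SYMBOL_TEXT and SYMBOL_TEXT_LEN.  */
struct defn
{
  char *text;
  size_t len;
};

/* Replace SYM's definition with LEN bytes of TEXT, or with strlen (TEXT)
   bytes when LEN is SIZE_MAX.  As with the patch's xmemdup (text, len),
   the copy carries no terminating NUL, so readers must use SYM->len.
   Returns 0 on success, -1 if memory is exhausted.  */
static int
set_defn (struct defn *sym, const char *text, size_t len)
{
  char *copy;

  assert (text);
  if (len == SIZE_MAX)
    len = strlen (text);
  copy = malloc (len ? len : 1);  /* avoid a zero-byte request */
  if (!copy)
    return -1;
  memcpy (copy, text, len);
  free (sym->text);
  sym->text = copy;
  sym->len = len;
  return 0;
}
```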

diff --git a/ChangeLog b/ChangeLog
index 325bf7a..7a50b85 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,31 @@
 2008-08-03  Eric Blake  <address@hidden>
 
+       Stage 26: Allow embedded NUL in macro definitions.
+       Track macro definitions by length, to allow embedded NUL.  Make
+       arg_len callers aware of the issue of flattening builtins when
+       determining length.  Optimize loops that scan a definition.
+       Memory impact: none.
+       Speed impact: slight improvement, due to faster scans.
+       * src/m4.h (set_word_regexp, arg_len, define_user_macro): Add
+       parameters.
+       (SYMBOL_TEXT_LEN): New macro.
+       (ARG_LEN): Adjust callers.
+       * src/builtin.c (define_user_macro): Add a parameter.
+       (builtin_init, define_macro): Adjust callers.
+       (m4_dumpdef, m4_defn, m4_changeword): Handle embedded NULs.
+       (expand_user_macro): Handle embedded NUL, and speed up search for
+       embedded $.
+       * src/macro.c (arg_len): Add parameter.
+       * src/input.c (set_word_regexp): Add parameter.
+       (input_init): Adjust caller.
+       * src/m4.c (main): Likewise.
+       * src/freeze.c (dump_symbol_CB): Preserve NUL on freeze.
+       (reload_frozen_state): Retrieve NUL on load.
+       * doc/m4.texinfo (Builtin, Using frozen files): Enhance tests.
+       * examples/null.m4: Likewise.
+       * examples/null.out: Update expected output.
+       * examples/null.err: Likewise.
+
        Fix regression in commenting unbalanced quotes, from 2008-02-16.
        * src/m4.h (enum token_type): Add TOKEN_COMMENT.
        * src/input.c (next_token, peek_token, token_type_string)
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index d8e2625..7f3cb49 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -2684,6 +2684,13 @@ builtin(`builtin')
 builtin(`builtin',)
 @error{}m4:stdin:4: Warning: builtin: undefined builtin `'
 @result{}
+builtin(`builtin', ``'
+')
+@error{}m4:stdin:5: Warning: builtin: undefined builtin ``\'\n'
+@result{}
+indir(`index')
+@error{}m4:stdin:7: Warning: index: too few arguments: 0 < 2
+@result{}
 @end example
 
 @ignore
@@ -7153,13 +7160,13 @@ ifdef(`__unix__', ,
       `errprint(` skipping: syscmd does not have unix semantics
 ')m4exit(`77')')dnl
 changequote(`[', `]')dnl
-syscmd([printf 'define(-\0-,hi)changequote([,\0])changecom(--\0)dnl
+syscmd([printf 'define(-\0-,\0-\0)changequote([,\0])changecom(--\0)dnl
 divert(1)undivert(null.out)' | ]__program__[ -F in.m4f \
-     && printf 'errprint([divnum\0] #-- indir(-\0-))' \
+     && printf 'errprint([divnum\0] #-- len(indir(-\0-)))' \
        | ]__program__[ -R in.m4f \
      && rm in.m4f])errprint([ ]sysval[
 ])dnl
address@hidden #-- hi 0
address@hidden #-- 3 0
 @end example
 @end ignore
 
diff --git a/examples/null.err b/examples/null.err
index 5f989ee..897ce34 100644
--- a/examples/null.err
+++ b/examples/null.err
@@ -1,20 +1,24 @@
 builtin:
-m4:examples/null.m4:19: Warning: builtin: undefined builtin `-\0-'
+m4:examples/null.m4:21: Warning: builtin: undefined builtin `-\0-'
 changequote:
 echo:  address@hidden/
 m4trace: -1- dumpdef(echo/) -> /
+changeword:
+m4:examples/null.m4:43: Warning: changeword: bad regular expression `\\\0\\': Trailing backslash
 defn:
-m4:examples/null.m4:45: Warning: defn: undefined macro `\0-\0'
+m4:examples/null.m4:54: Warning: defn: undefined macro `\0-\0'
 dumpdef:
-m4:examples/null.m4:57: Warning: dumpdef: undefined macro `\0-\0'
+m4:examples/null.m4:67: Warning: dumpdef: undefined macro `\0-\0'
 :      `empty'
 -:     `dash'
---:   `odd name: $1'
---:   `odd name: $1'
+--:   ``$0': $1'
+--:   ``$0': $1'
 --:    `dashes'
+body:  `--'
 errprint: -- --
 indir:
-m4:examples/null.m4:87: Warning: indir: undefined macro `\0-\0'
-m4:examples/null.m4:89: Warning: \0\0%%: extra arguments ignored: 1 > 0
+m4:examples/null.m4:98: Warning: indir: undefined macro `\0-\0'
+m4:examples/null.m4:100: Warning: \0\0%%: extra arguments ignored: 1 > 0
 traceon:
 m4trace: -1- --(`--') -> `strange: --'
+m4trace: -1- body -> `-'
diff --git a/examples/null.m4 b/examples/null.m4
index de76742..1823073 100644
--- a/examples/null.m4
+++ b/examples/null.m4
@@ -13,6 +13,8 @@ dnl Passed through $1, $*, $@:
 define(`echo', address@hidden')define(`', `empty')dnl
 define(`-', `dash')define(`--', `dashes')dnl
 user: echo(--,`11')
+dnl Macro name of define:
+define(`--', ``$0': $1')dnl
 dnl All macros matching __*__ take no arguments, and never produce NUL.
 dnl First argument of builtin:
 errprint(`builtin:
@@ -32,20 +34,28 @@ dnl Quotes in trace and dump output:
 errprint(`changequote:
 ')traceon(`dumpdef')dumpdef(`echo'changequote(,/))changequote`'dnl
 traceoff(`dumpdef')dnl
-dnl Used in changeword (if changeword available): not tested yet
-dnl Bad regex in changeword: not tested yet
+dnl Used in changeword (if changeword available):
+`changeword:' ifdef(`changeword', `', `define(`changeword')define(`c')')dnl
+changeword(`[-_a-zA-Z0-9]+')-- dnl
+ifdef(`c', `--: dash', `--(-)')`'changeword()dnl
+dnl Bad regex in changeword:
+errprint(`changeword:
+')changeword(`\\')
+ifdef(`c', `errprint(__program__:__file__:decr(__line__): Warning: dnl
+`changeword: bad regular expression `\\\0\\': Trailing backslash
+')')dnl
 dnl Warning from debugfile: not tested yet. No file name includes NUL, needs to warn
 dnl Warning from debugmode: not tested yet. NUL not a valid mode, needs to warn
 dnl Warning from decr: not tested yet. NUL not a number, needs to warn
-dnl Macro name of define:
-define(`--', `odd name: $1')dnl
-dnl Definition of define: not tested yet
+dnl Definition of define:
+`define:' define(`body', `--')body
 dnl Undefined argument of defn:
 errprint(`defn:
 ')defn(`-')dnl
 dnl Defined macro name in defn:
-`defn:' defn(`--')
-dnl Macro contents in defn: not tested yet
+`defn:' defn(`--')dnl
+dnl Macro contents in defn:
+ defn(`body')
 dnl Argument to divert: not tested yet. NUL not a number, needs to warn
 dnl Passed through diversion by divert:
 divert(`1')`divert:' --
@@ -57,7 +67,8 @@ errprint(`dumpdef:
 ')dumpdef(`-')dnl
 dnl Defined macro names in dumpdef:
 dumpdef(`--', `-', `', `--', `--')dnl
-dnl Macro contents in dumpdef: not tested yet, needs quoting
+dnl Macro contents in dumpdef:
+dumpdef(`body')dnl
 dnl Passed through errprint:
 errprint(`errprint:' --, `--
 ')dnl
@@ -111,8 +122,9 @@ dnl Defined argument of popdef:
 `popdef:' popdef(`--')ifdef(`--', `oops', `ok')
 dnl Undefined argument of popdef: not tested yet. Should it warn?
 dnl Macro name of pushdef:
-`pushdef:' pushdef(`--', `strange: $1')ifdef(`--', `ok', `oops')
-dnl Definition of pushdef: not tested yet
+`pushdef:' pushdef(`--', `strange: $1')ifdef(`--', `ok', `oops')`'dnl
+dnl Definition of pushdef:
+ pushdef(`body', `-')body
 dnl Bad regex in regexp: not tested yet
 dnl First argument of regexp:
 `regexp:' regexp(`ab', `b')dnl
@@ -133,10 +145,11 @@ dnl Passed to syscmd: not tested yet. NUL truncates string, needs to warn
 dnl Sysval takes no arguments, and never produces NUL.
 dnl Passed to traceoff:
 traceoff(`--', `')dnl
-dnl Macro name and arguments of traceon: not perfect yet, needs quoting
+dnl Macro name and arguments of traceon:
 `traceon:' errprint(`traceon:
-')traceon(`--')indir(`--', `--')
-dnl Defined text of traceon: not tested yet, needs quoting
+')traceon(`--')indir(`--', `--')dnl
+dnl Defined text of traceon:
+ traceon(`body')body
 dnl First argument of translit: not tested yet
 dnl Single character in other arguments of translit: not tested yet
 dnl Character ranges of translit: not tested yet
diff --git a/examples/null.out b/examples/null.out
index 5e90221..dd83416 100644
--- a/examples/null.out
+++ b/examples/null.out
@@ -6,20 +6,22 @@ user: .--.--,11.--,11.
 builtin: 3
 changecom: echo//echo --echo-
 changequote: echoecho echo
-defn: odd name: $1
+changeword: -- --: dash
+define: --
+defn: `$0': $1 --
 divert: --
 esyscmd: [] 0
 ifdef: yes: -- no: --
 ifelse: yes: --
 index: 2 -1 -1 8
-indir: odd name: 11 0 3
+indir: --: 11 0 3
 len: 1 3
 patsubst: .. -- abc -!- ---
 popdef: ok
-pushdef: ok
+pushdef: ok -
 regexp: 2 ! 0 -
 shift: --,--
 substr: --
-traceon: strange: --
+traceon: strange: -- -
 undefine: ok
 m4wrap: --
diff --git a/src/builtin.c b/src/builtin.c
index f8a3f3c..cc21ea2 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -429,26 +429,32 @@ free_regex (void)
       }
 }
 
-/*-----------------------------------------------------------------.
-| Define a predefined or user-defined macro, with name NAME of     |
-| length NAME_LEN, and expansion TEXT.  MODE is SYMBOL_INSERT for  |
-| "define" or SYMBOL_PUSHDEF for "pushdef".  This function is also |
-| used from main ().                                               |
-`-----------------------------------------------------------------*/
+/*------------------------------------------------------------------.
+| Define a predefined or user-defined macro, with name NAME of      |
+| length NAME_LEN, and expansion TEXT of length LEN.  LEN may be    |
+| SIZE_MAX, to use the string length of TEXT instead.  MODE is      |
+| SYMBOL_INSERT for "define" or SYMBOL_PUSHDEF for "pushdef".  This |
+| function is also used from main ().                               |
+`------------------------------------------------------------------*/
 
 void
 define_user_macro (const char *name, size_t name_len, const char *text,
-                  symbol_lookup mode)
+                  size_t len, symbol_lookup mode)
 {
   symbol *s;
-  char *defn = xstrdup (text ? text : "");
+  char *defn;
 
+  assert (text);
+  if (len == SIZE_MAX)
+    len = strlen (text);
+  defn = xmemdup (text, len);
   s = lookup_symbol (name, name_len, mode);
   if (SYMBOL_TYPE (s) == TOKEN_TEXT)
     free (SYMBOL_TEXT (s));
 
   SYMBOL_TYPE (s) = TOKEN_TEXT;
   SYMBOL_TEXT (s) = defn;
+  SYMBOL_TEXT_LEN (s) = len;
   SYMBOL_MACRO_ARGS (s) = true;
 
   /* Implement --warn-macro-sequence.  */
@@ -456,7 +462,6 @@ define_user_macro (const char *name, size_t name_len, const char *text,
     {
       regoff_t offset = 0;
       struct re_registers *regs = &macro_sequence_regs;
-      size_t len = strlen (defn);
 
       while (offset < len
             && (offset = re_search (&macro_sequence_buf, defn, len, offset,
@@ -515,13 +520,13 @@ builtin_init (void)
       {
        if (pp->unix_name != NULL)
          define_user_macro (pp->unix_name, strlen (pp->unix_name),
-                            pp->func, SYMBOL_INSERT);
+                            pp->func, SIZE_MAX, SYMBOL_INSERT);
       }
     else
       {
        if (pp->gnu_name != NULL)
          define_user_macro (pp->gnu_name, strlen (pp->gnu_name),
-                            pp->func, SYMBOL_INSERT);
+                            pp->func, SIZE_MAX, SYMBOL_INSERT);
       }
 }
 
@@ -675,7 +680,7 @@ define_macro (int argc, macro_arguments *argv, symbol_lookup mode)
 
   if (argc == 2)
     {
-      define_user_macro (ARG (1), ARG_LEN (1), "", mode);
+      define_user_macro (ARG (1), ARG_LEN (1), "", 0, mode);
       return;
     }
 
@@ -685,7 +690,8 @@ define_macro (int argc, macro_arguments *argv, symbol_lookup mode)
       m4_warn (0, me, _("cannot concatenate builtins"));
       /* fallthru */
     case TOKEN_TEXT:
-      define_user_macro (ARG (1), ARG_LEN (1), arg_text (argv, 2, true), mode);
+      define_user_macro (ARG (1), ARG_LEN (1), arg_text (argv, 2, true),
+                        arg_len (argv, 2, true), mode);
       break;
 
     case TOKEN_FUNC:
@@ -914,7 +920,8 @@ m4_dumpdef (struct obstack *obs, int argc, macro_arguments *argv)
        case TOKEN_TEXT:
          if (debug_level & DEBUG_TRACE_QUOTE)
            fwrite (curr_quote.str1, 1, curr_quote.len1, debug);
-         fputs (SYMBOL_TEXT (data.base[0]), debug);
+         fwrite (SYMBOL_TEXT (data.base[0]), 1,
+                 SYMBOL_TEXT_LEN (data.base[0]), debug);
          if (debug_level & DEBUG_TRACE_QUOTE)
            fwrite (curr_quote.str2, 1, curr_quote.len2, debug);
          break;
@@ -1049,7 +1056,7 @@ m4_defn (struct obstack *obs, int argc, macro_arguments *argv)
        {
        case TOKEN_TEXT:
          obstack_grow (obs, curr_quote.str1, curr_quote.len1);
-         obstack_grow (obs, SYMBOL_TEXT (s), strlen (SYMBOL_TEXT (s)));
+         obstack_grow (obs, SYMBOL_TEXT (s), SYMBOL_TEXT_LEN (s));
          obstack_grow (obs, curr_quote.str2, curr_quote.len2);
          break;
 
@@ -1422,7 +1429,7 @@ m4_changeword (struct obstack *obs, int argc, macro_arguments *argv)
 
   if (bad_argc (me, argc, 1, 1))
     return;
-  set_word_regexp (me, ARG (1));
+  set_word_regexp (me, ARG (1), ARG_LEN (1));
 }
 
 #endif /* ENABLE_CHANGEWORD */
@@ -2305,29 +2312,31 @@ void
 expand_user_macro (struct obstack *obs, symbol *sym,
                   int argc, macro_arguments *argv)
 {
-  const char *text;
+  const char *text = SYMBOL_TEXT (sym);
+  size_t len = SYMBOL_TEXT_LEN (sym);
   int i;
+  const char *dollar = memchr (text, '$', len);
 
-  for (text = SYMBOL_TEXT (sym); *text != '\0';)
+  while (dollar)
     {
-      if (*text != '$')
-       {
-         obstack_1grow (obs, *text);
-         text++;
-         continue;
-       }
-      text++;
-      switch (*text)
+      obstack_grow (obs, text, dollar - text);
+      len -= dollar - text;
+      text = dollar;
+      if (len == 1)
+       break;
+      len--;
+      switch (*++text)
        {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
          if (no_gnu_extensions)
            {
              i = *text++ - '0';
+             len--;
            }
          else
            {
-             for (i = 0; isdigit (to_uchar (*text)); text++)
+             for (i = 0; len && isdigit (to_uchar (*text)); text++, len--)
                i = i * 10 + (*text - '0');
            }
          push_arg (obs, argv, i);
@@ -2336,17 +2345,21 @@ expand_user_macro (struct obstack *obs, symbol *sym,
        case '#':               /* number of arguments */
          shipout_int (obs, argc - 1);
          text++;
+         len--;
          break;
 
        case '*':               /* all arguments */
        case '@':               /* ... same, but quoted */
          push_args (obs, argv, false, *text == '@');
          text++;
+         len--;
          break;
 
        default:
          obstack_1grow (obs, '$');
          break;
        }
+      dollar = memchr (text, '$', len);
     }
+  obstack_grow (obs, text, len);
 }
diff --git a/src/freeze.c b/src/freeze.c
index 2a7d9dc..c45722f 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -75,9 +75,9 @@ dump_symbol_CB (symbol *sym, void *f)
        case TOKEN_TEXT:
          xfprintf (file, "T%d,%d\n",
                    (int) SYMBOL_NAME_LEN (sym),
-                   (int) strlen (SYMBOL_TEXT (sym)));
+                   (int) SYMBOL_TEXT_LEN (sym));
          fwrite (SYMBOL_NAME (sym), 1, SYMBOL_NAME_LEN (sym), file);
-         fputs (SYMBOL_TEXT (sym), file);
+         fwrite (SYMBOL_TEXT (sym), 1, SYMBOL_TEXT_LEN (sym), file);
          fputc ('\n', file);
          break;
 
@@ -379,7 +379,7 @@ reload_frozen_state (const char *name)
 
              /* Enter a macro having an expansion text as a definition.  */
 
-             define_user_macro (string[0], number[0], string[1],
+             define_user_macro (string[0], number[0], string[1], number[1],
                                 SYMBOL_PUSHDEF);
              break;
 
diff --git a/src/input.c b/src/input.c
index 4f969b7..b967087 100644
--- a/src/input.c
+++ b/src/input.c
@@ -1309,7 +1309,7 @@ input_init (void)
   curr_comm.len2 = 1;
 
 #ifdef ENABLE_CHANGEWORD
-  set_word_regexp (NULL, user_word_regexp);
+  set_word_regexp (NULL, user_word_regexp, SIZE_MAX);
 #endif /* ENABLE_CHANGEWORD */
 
   set_quote_age ();
@@ -1406,19 +1406,24 @@ set_comment (const char *bc, size_t bc_len, const char *ec, size_t ec_len)
 
 #ifdef ENABLE_CHANGEWORD
 
-/*-------------------------------------------------------------------.
-| Set the regular expression for recognizing words to REGEXP, and    |
-| report errors on behalf of CALLER.  If REGEXP is NULL, revert back |
-| to the default parsing rules.                                      |
-`-------------------------------------------------------------------*/
+/*-----------------------------------------------------------------.
+| Set the regular expression for recognizing words to REGEXP of    |
+| length LEN, and report errors on behalf of CALLER.  If REGEXP is |
+| NULL, revert back to the default parsing rules.  If LEN is       |
+| SIZE_MAX, use strlen(REGEXP) instead.                            |
+`-----------------------------------------------------------------*/
 
 void
-set_word_regexp (const call_info *caller, const char *regexp)
+set_word_regexp (const call_info *caller, const char *regexp, size_t len)
 {
   const char *msg;
   struct re_pattern_buffer new_word_regexp;
 
-  if (!*regexp || !strcmp (regexp, DEFAULT_WORD_REGEXP))
+  if (len == SIZE_MAX)
+    len = strlen (regexp);
+  if (len == 0
+      || (len == strlen (DEFAULT_WORD_REGEXP)
+         && !memcmp (regexp, DEFAULT_WORD_REGEXP, len)))
     {
       default_word_regexp = true;
       set_quote_age ();
@@ -1427,12 +1432,13 @@ set_word_regexp (const call_info *caller, const char *regexp)
 
   /* Dry run to see whether the new expression is compilable.  */
   init_pattern_buffer (&new_word_regexp, NULL);
-  msg = re_compile_pattern (regexp, strlen (regexp), &new_word_regexp);
+  msg = re_compile_pattern (regexp, len, &new_word_regexp);
   regfree (&new_word_regexp);
 
   if (msg != NULL)
     {
-      m4_warn (0, caller, _("bad regular expression `%s': %s"), regexp, msg);
+      m4_warn (0, caller, _("bad regular expression %s: %s"),
+              quotearg_style_mem (locale_quoting_style, regexp, len), msg);
       return;
     }
 
@@ -1442,7 +1448,7 @@ set_word_regexp (const call_info *caller, const char *regexp)
      by the final regfree.  */
   if (!word_regexp.fastmap)
     word_regexp.fastmap = xcharalloc (UCHAR_MAX + 1);
-  msg = re_compile_pattern (regexp, strlen (regexp), &word_regexp);
+  msg = re_compile_pattern (regexp, len, &word_regexp);
   assert (!msg);
   re_set_registers (&word_regexp, &regs, regs.num_regs, regs.start, regs.end);
   if (re_compile_fastmap (&word_regexp))
diff --git a/src/m4.c b/src/m4.c
index 551d80c..1bb1ec7 100644
--- a/src/m4.c
+++ b/src/m4.c
@@ -623,7 +623,7 @@ main (int argc, char *const *argv, char *const *envp)
            const char *value = strchr (defines->arg, '=');
            size_t len = value ? value - defines->arg : strlen (defines->arg);
            define_user_macro (defines->arg, len, value ? value + 1 : "",
-                              SYMBOL_INSERT);
+                              value ? SIZE_MAX : 0, SYMBOL_INSERT);
          }
          break;
 
diff --git a/src/m4.h b/src/m4.h
index 40aa5ec..8da7d3c 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -381,7 +381,7 @@ extern string_pair curr_quote;
 void set_quotes (const char *, size_t, const char *, size_t);
 void set_comment (const char *, size_t, const char *, size_t);
 #ifdef ENABLE_CHANGEWORD
-void set_word_regexp (const call_info *, const char *);
+void set_word_regexp (const call_info *, const char *, size_t);
 #endif
 unsigned int quote_age (void);
 bool safe_quotes (void);
@@ -438,6 +438,7 @@ struct symbol
 #define SYMBOL_NAME_LEN(S)     ((S)->len)
 #define SYMBOL_TYPE(S)         (TOKEN_DATA_TYPE (&(S)->data))
 #define SYMBOL_TEXT(S)         (TOKEN_DATA_TEXT (&(S)->data))
+#define SYMBOL_TEXT_LEN(S)     (TOKEN_DATA_LEN (&(S)->data))
 #define SYMBOL_FUNC(S)         (TOKEN_DATA_FUNC (&(S)->data))
 
 typedef enum symbol_lookup symbol_lookup;
@@ -467,7 +468,7 @@ token_data_type arg_type (macro_arguments *, unsigned int);
 const char *arg_text (macro_arguments *, unsigned int, bool);
 bool arg_equal (macro_arguments *, unsigned int, unsigned int);
 bool arg_empty (macro_arguments *, unsigned int);
-size_t arg_len (macro_arguments *, unsigned int);
+size_t arg_len (macro_arguments *, unsigned int, bool);
 builtin_func *arg_func (macro_arguments *, unsigned int);
 struct obstack *arg_scratch (void);
 bool arg_print (struct obstack *, macro_arguments *, unsigned int,
@@ -487,7 +488,7 @@ void wrap_args (macro_arguments *);
 
 /* Grab the text length at argv index I.  Assumes macro_argument *argv
    is in scope, and aborts if the argument is not text.  */
-#define ARG_LEN(i) arg_len (argv, i)
+#define ARG_LEN(i) arg_len (argv, i, false)
 
 
 /* File: builtin.c  --- builtins.  */
@@ -523,7 +524,8 @@ bool bad_argc (const call_info *, int, unsigned int, unsigned int);
 void define_builtin (const char *, size_t, const builtin *, symbol_lookup);
 void set_macro_sequence (const char *);
 void free_regex (void);
-void define_user_macro (const char *, size_t, const char *, symbol_lookup);
+void define_user_macro (const char *, size_t, const char *, size_t,
+                       symbol_lookup);
 void undivert_all (void);
 void expand_user_macro (struct obstack *, symbol *, int, macro_arguments *);
 void m4_placeholder (struct obstack *, int, macro_arguments *);
diff --git a/src/macro.c b/src/macro.c
index 9d8ffbb..d1f70e9 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -1128,9 +1128,10 @@ arg_empty (macro_arguments *argv, unsigned int arg)
 }
 
 /* Given ARGV, return the length of argument ARG.  Abort if the
-   argument is not text.  Indices beyond argc return 0.  */
+   argument is not text.  Indices beyond argc return 0.  If FLATTEN,
+   builtins are ignored.  */
 size_t
-arg_len (macro_arguments *argv, unsigned int arg)
+arg_len (macro_arguments *argv, unsigned int arg, bool flatten)
 {
   token_data *token;
   token_chain *chain;
@@ -1143,7 +1144,7 @@ arg_len (macro_arguments *argv, unsigned int arg)
     }
   if (arg >= argv->argc)
     return 0;
-  token = arg_token (argv, arg, NULL, false);
+  token = arg_token (argv, arg, NULL, flatten);
   switch (TOKEN_DATA_TYPE (token))
     {
     case TOKEN_TEXT:
@@ -1163,9 +1164,8 @@ arg_len (macro_arguments *argv, unsigned int arg)
              len += chain->u.u_s.len;
              break;
            case CHAIN_FUNC:
-             /* TODO concatenate builtins.  */
-             assert (!"implemented");
-             abort ();
+             assert (flatten);
+             break;
            case CHAIN_ARGV:
              i = chain->u.u_a.index;
              limit = chain->u.u_a.argv->argc - i - chain->u.u_a.skip_last;
@@ -1176,15 +1176,8 @@ arg_len (macro_arguments *argv, unsigned int arg)
                len += (quotes->len1 + quotes->len2) * limit;
              len += limit - 1;
              while (limit--)
-               {
-                 /* TODO handle builtin concatenation.  */
-                 if (TOKEN_DATA_TYPE (arg_token (chain->u.u_a.argv, i, NULL,
-                                                 false)) == TOKEN_FUNC)
-                   assert (argv->flatten);
-                 else
-                   len += arg_len (chain->u.u_a.argv, i);
-                 i++;
-               }
+               len += arg_len (chain->u.u_a.argv, i++,
+                               flatten || chain->u.u_a.flatten);
              break;
            default:
              assert (!"arg_len");
@@ -1192,7 +1185,7 @@ arg_len (macro_arguments *argv, unsigned int arg)
            }
          chain = chain->next;
        }
-      assert (len);
+      assert (len || flatten);
       return len;
     case TOKEN_FUNC:
     default:
-- 
1.5.6.4
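For readers who want to see the new expand_user_macro scan in isolation, here is a standalone sketch of the same idea. It is not code from the patch. The definition is treated as a counted byte string. memchr jumps straight to the next '$', and everything before that byte is copied in one call, so embedded NULs pass through unchanged. The names out_buf, out_grow and expand_def are hypothetical stand-ins for the obstack and push_arg machinery. Only single-digit $N (as in the no_gnu_extensions path) and $# are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size output buffer standing in for m4's obstack.  */
struct out_buf
{
  char data[256];
  size_t len;
};

/* Append N bytes at P to OUT; the bytes may include NUL.  */
static void
out_grow (struct out_buf *out, const char *p, size_t n)
{
  assert (out->len + n <= sizeof out->data);
  memcpy (out->data + out->len, p, n);
  out->len += n;
}

/* Expand definition TEXT of length LEN (NULs allowed).  ARGS[i] has
   length ARG_LENS[i]; ARGS[0] is the macro name.  Only $0..$9 (one
   digit) and $# are recognized in this sketch.  */
static void
expand_def (struct out_buf *out, const char *text, size_t len,
            int argc, const char *const *args, const size_t *arg_lens)
{
  /* Guard against calling memchr with a zero count.  */
  const char *dollar = len ? memchr (text, '$', len) : NULL;

  while (dollar)
    {
      /* Copy the whole run before the '$' at once.  */
      out_grow (out, text, dollar - text);
      len -= dollar - text;
      text = dollar;
      if (len == 1)             /* lone trailing '$': copied below */
        break;
      text++, len--;            /* step past the '$' */
      if (isdigit ((unsigned char) *text))
        {
          int i = *text++ - '0';
          len--;
          if (i < argc)         /* missing arguments expand to nothing */
            out_grow (out, args[i], arg_lens[i]);
        }
      else if (*text == '#')
        {
          char num[16];
          int n = snprintf (num, sizeof num, "%d", argc - 1);
          out_grow (out, num, (size_t) n);
          text++, len--;
        }
      else
        out_grow (out, "$", 1); /* not a reference: keep the '$' */
      dollar = len ? memchr (text, '$', len) : NULL;
    }
  out_grow (out, text, len);    /* tail after the last reference */
}
```

As an example, take the 12-byte definition "a\0b$1-$2;$#$" with the arguments foo, x and the 3-byte y\0z. The sketch produces the 11 bytes "a\0bx-y\0z;2$". Both NULs survive, and the trailing lone '$' is copied literally, which is the case the patch handles with its `len == 1` check.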

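The patch also adopts a SIZE_MAX length sentinel. Callers that only have a NUL-terminated C string pass SIZE_MAX, and the callee falls back to strlen. Examples are input_init's call to set_word_regexp and the `value ? SIZE_MAX : 0` argument for -D. Callers that hold counted bytes pass the real length. The sketch below pairs that convention with the length-aware "revert to default" test from set_word_regexp. The helper names are hypothetical, and DEMO_DEFAULT_REGEXP is an illustrative pattern, not necessarily m4's DEFAULT_WORD_REGEXP. The old `!*regexp || !strcmp (...)` check stopped at the first NUL. With it, a pattern such as "\0x" counted as empty, and a default pattern followed by "\0junk" counted as the default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for m4's DEFAULT_WORD_REGEXP.  */
#define DEMO_DEFAULT_REGEXP "[_a-zA-Z][_a-zA-Z0-9]*"

/* Resolve the sentinel: SIZE_MAX means "S is NUL-terminated, measure
   it"; any other value is the exact byte count.  */
static size_t
resolve_len (const char *s, size_t len)
{
  assert (s != NULL);
  return len == SIZE_MAX ? strlen (s) : len;
}

/* True if REGEXP (of length LEN, or NUL-terminated if LEN is SIZE_MAX)
   is empty or byte-for-byte equal to the default pattern.  */
static bool
is_default_regexp (const char *regexp, size_t len)
{
  len = resolve_len (regexp, len);
  return (len == 0
          || (len == strlen (DEMO_DEFAULT_REGEXP)
              && memcmp (regexp, DEMO_DEFAULT_REGEXP, len) == 0));
}
```

The sentinel keeps string-literal call sites short. The tradeoff is that a caller must never pass a genuine length of SIZE_MAX, which no in-memory string can reach in practice.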
