autoconf-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: improve AS_ESCAPE


From: Eric Blake
Subject: Re: improve AS_ESCAPE
Date: Mon, 2 Mar 2009 17:30:59 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Ralf Wildenhues <Ralf.Wildenhues <at> gmx.de> writes:

Hi Ralf,

> a couple of comments on this patch (thanks as always for all your work
> on this!)

Thanks for your reviews, even if they do come in later than you would like.

> > address@hidden AS_ESCAPE (@var{string}, @dvar{chars, `\"$})
> > address@hidden
> > +Expands to @var{string}, with any characters in @var{chars} escaped with
> > +a backslash (@samp{\}).  @var{chars} should be at most four bytes long,
> 
> I'm not too happy with this seemingly arbitrary limitation, and just for
> the sake of optimization, but I guess in practice it will be one of the
> less problematic ones.  Oh well.
> 
> > +and only contain characters from the set @samp{`\"$};
> 
> (and this limitation, of course; with it, the former isn't a problem)

We never supported -, [, or ], which meant you have never been able to use this 
macro to safely escape all shell meta-characters for use in an unquoted shell 
context.  Additionally, I could not find any uses in the wild trying to quote 
any more than just the four special characters within "" context.

A more general macro can be written based on m4_bpatsubst that supports more 
than four characters, but it is also a slower macro (the point of these 
optimizations is that it is faster to use m4_translit and m4_index to check if 
anything needs escaping, since most user strings need no escaping, than to pay 
the price of compiling a regex for every string on the off-chance that it might 
need escaping).

> 
> > however,
> > +characters may be safely listed more than once in @var{chars} for the
> > +sake of syntax highlighting editors.  The current implementation expands
> > address@hidden after adding escapes; if @var{string} contains macro calls
> > +that have text that needs quoting, you can use
> 
> Does "quoting" refer to M4 or to shell quoting here?  This text is not
> easy to read.

shell quoting.  I'll add that.

> 
> > +This macro is often used with @code{AS_ECHO}.  For example, this snippet
> > +will produce shell code that outputs the four lines @samp{"$foo" =
> > +"bar"}, @samp{macro}, @samp{a, b}, and @samp{a, \b}:
> 
> The first @samp will be line-wrapped in the info page here, which is
> unfortunate, given that it's valuable to see the verbatim results.  How
> about making them a four-line result @example?

Done.

> 
> > address@hidden
> > +foo=bar
> > +AS_ECHO(["AS_ESCAPE(["$foo" = ])AS_ESCAPE(["$foo"], [""])"])
> > +m4_define([macro], [a, [\b]])
> > +AS_ECHO(["AS_ESCAPE([[macro]])"])
> > +AS_ECHO(["AS_ESCAPE([macro])"])
> > +AS_ECHO(["AS_ESCAPE(m4_dquote(m4_expand([macro])))"])
> > address@hidden example
> 
> > address@hidden Should we add AS_ESCAPE_SINGLE? If we do, we can optimize in
> > address@hidden the case of @var{string} that does not contain '.
> 
> What would it do?

AS_ECHO(['AS_ESCAPE_SINGLE([a'"b])'])
=> $as_echo 'a'\''"b'

that is, AS_ESCAPE_SINGLE provides the shell-escaping necessary in a '' shell 
context, comparable to AS_ESCAPE in the "" context.  I'm not convinced it is 
necessary, though, hence the @comment.  What do you think?  Nuke the @comment, 
or implement the macro?

> > @@ -68,7 +68,7 @@ m4_define([AH_OUTPUT], [])
> >  # Quote for Perl '' strings, which are those used by Autoheader.
> >  m4_define([AH_VERBATIM],
> >  [AS_LITERAL_IF([$1],
> > -          [AH_OUTPUT([$1], AS_ESCAPE([[$2]], [\\'']))])])
> > +          [AH_OUTPUT([$1], AS_ESCAPE([[$2]], [\']))])])
> 
> Why is this change needed?  Or is it a cosmetic fix only?

Speed.  AS_ESCAPE([[$2]], [\']) is faster than AS_ESCAPE([[$2]], [[\\'']]), 
because the former calls m4_translit with a two-byte change, where m4 can use 
memchr2 to rapidly scan a word at a time for instances of one of those two 
bytes, while the latter calls m4_translit with a four-byte change, where m4 
must create a 256-byte mapping table and process the string a byte at a time.  
It doesn't take much input length for vector operations to outperform byte-wise.

> 
> > --- a/lib/m4sugar/m4sh.m4
> > +++ b/lib/m4sugar/m4sh.m4
> > @@ -680,18 +680,33 @@ m4_define([AS_MESSAGE_LOG_FD])
> 
> > +# _AS_ESCAPE(STRING, KEY, SET)
> > +# ----------------------------
> > +# Backslash-escape all instances of the singly byte KEY or up to four
> 
> s/singly/single/

Thanks.

> 
> > +# bytes in SET occurring in STRING.  Although a character can occur
> > +# multiple times, optimum efficiency occurs when KEY and SET are
> > +# distinct, and when SET does not exceed two bytes.  These particular
> > +# semantics allow for the fewest number of parses of STRING, as well
> > +# as taking advantage of newer m4's optimizations when m4_translit is
> 
> s/newer m4's optimizations/optimizations in M4 versions XX and newer/
> ?

m4 1.4.13.  Changed.


From: Eric Blake <address@hidden>
Date: Mon, 2 Mar 2009 10:29:35 -0700
Subject: [PATCH] Improve wording for AS_ESCAPE.

* doc/autoconf.texi (Common Shell Constructs) <AS_ESCAPE>: Touch
up documentation.
* lib/m4sugar/m4sh.m4 (_AS_ESCAPE): Fix comment typos.
Reported by Ralf Wildenhues.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog           |    6 ++++++
 doc/autoconf.texi   |   11 +++++++----
 lib/m4sugar/m4sh.m4 |    6 +++---
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 7546c75..789474c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2009-03-02  Eric Blake  <address@hidden>

+       Improve wording for AS_ESCAPE.
+       * doc/autoconf.texi (Common Shell Constructs) <AS_ESCAPE>: Touch
+       up documentation.
+       * lib/m4sugar/m4sh.m4 (_AS_ESCAPE): Fix comment typos.
+       Reported by Ralf Wildenhues.
+
        Silence AC_CACHE_VAL if not part of AC_MSG sequence.
        * lib/autoconf/general.m4 (AC_MSG_CHECKING, AC_MSG_RESULT): Track
        pairs of calls.
diff --git a/doc/autoconf.texi b/doc/autoconf.texi
index 9e3fc46..b511bee 100644
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -12356,7 +12356,7 @@ Common Shell Constructs
 characters may be safely listed more than once in @var{chars} for the
 sake of syntax highlighting editors.  The current implementation expands
 @var{string} after adding escapes; if @var{string} contains macro calls
-that have text that needs quoting, you can use
+that in turn expand to text needing shell quoting, you can use
 @code{AS_ESCAPE(m4_dquote(m4_expand([string])))}.

 The default for @var{chars} (@samp{\"$`}) is the set of characters
@@ -12371,17 +12371,20 @@ Common Shell Constructs
 expansions (such as @address@hidden "hi"address@hidden) that would be broken
 with improper escapes.

-This macro is often used with @code{AS_ECHO}.  For example, this snippet
-will produce shell code that outputs the four lines @samp{"$foo" =
-"bar"}, @samp{macro}, @samp{a, b}, and @samp{a, \b}:
+This macro is often used with @code{AS_ECHO}.  For an example, observe
+the output generated by the shell code generated from this snippet:

 @example
 foo=bar
 AS_ECHO(["AS_ESCAPE(["$foo" = ])AS_ESCAPE(["$foo"], [""])"])
address@hidden"$foo" = "bar"
 m4_define([macro], [a, [\b]])
 AS_ECHO(["AS_ESCAPE([[macro]])"])
address@hidden
 AS_ECHO(["AS_ESCAPE([macro])"])
address@hidden, b
 AS_ECHO(["AS_ESCAPE(m4_dquote(m4_expand([macro])))"])
address@hidden, \b
 @end example

 @comment Should we add AS_ESCAPE_SINGLE? If we do, we can optimize in
diff --git a/lib/m4sugar/m4sh.m4 b/lib/m4sugar/m4sh.m4
index e033063..d884ead 100644
--- a/lib/m4sugar/m4sh.m4
+++ b/lib/m4sugar/m4sh.m4
@@ -696,13 +696,13 @@ m4_define([AS_ESCAPE],

 # _AS_ESCAPE(STRING, KEY, SET)
 # ----------------------------
-# Backslash-escape all instances of the singly byte KEY or up to four
+# Backslash-escape all instances of the single byte KEY or up to four
 # bytes in SET occurring in STRING.  Although a character can occur
 # multiple times, optimum efficiency occurs when KEY and SET are
 # distinct, and when SET does not exceed two bytes.  These particular
 # semantics allow for the fewest number of parses of STRING, as well
-# as taking advantage of newer m4's optimizations when m4_translit is
-# passed SET of size 2 or smaller.
+# as taking advantage of the optimizations in m4 1.4.13+ when
+# m4_translit is passed SET of size 2 or smaller.
 m4_define([_AS_ESCAPE],
 [m4_if(m4_index(m4_translit([[$1]], [$3], [$2$2$2$2]), [$2]), [-1],
        [$0_], [m4_bpatsubst])([$1], [[$2$3]], [\\\&])])
-- 
1.6.1.2








reply via email to

[Prev in Thread] Current Thread [Next in Thread]