m4-patches

document patsubst shortfall


From: Eric Blake
Subject: document patsubst shortfall
Date: Tue, 2 Oct 2007 22:02:30 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

After I caused a regression in autoconf last weekend by careless quoting when 
passing text through some manipulation macros, I thought it would be worth 
adding more to the manual on what to look out for.  This only documents a 
problem with patsubst (regexp, substr, and to a lesser extent, translit and 
format, are also builtins that can generate awkwardly quoted substrings), but I 
like how the exposition turned out.  Applied to both head and branch.
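
To make the pitfall concrete, here is a minimal sketch that feeds the simple
macro definitions from examples/capitalize.m4 (as updated by the patch below)
to m4 through a shell here-document.  It assumes GNU m4 is installed as `m4`;
the function name capitalize_pitfall_demo is only illustrative and is not part
of the patch.

```shell
# Hypothetical helper, not part of the patch: runs the simple capitalize
# macros through m4.  Assumes GNU m4 is available on PATH as "m4".
capitalize_pitfall_demo() {
  m4 <<'EOF'
dnl Definitions copied from examples/capitalize.m4 in this patch.
define(`upcase', `translit(`$*', `a-z', `A-Z')')dnl
define(`downcase', `translit(`$*', `A-Z', `a-z')')dnl
define(`_capitalize',
       `regexp(`$1', `^\(\w\)\(\w*\)',
               `upcase(`\1')`'downcase(`\2')')')dnl
define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')dnl
dnl An unrelated macro that happens to be named A.
define(`A', `OOPS')dnl
capitalize(`active')
capitalize(``active'')
EOF
}
```

Calling capitalize_pitfall_demo should print OOPSctive for the single-quoted
call, because the upcased first letter is rescanned on its own and picks up
the macro A.  For the double-quoted call it should print
_capitalize(`active'), because the helper ends up inside quotes and is never
invoked.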

Writing a robust capitalize macro is a lot harder than it looks :)
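
For comparison, a similar sketch using the definitions from
examples/capitalize2.m4 in the patch below, under the same assumption that GNU
m4 is installed as `m4`.  The name capitalize_fixed_demo is illustrative, not
something the patch provides.

```shell
# Hypothetical helper, not part of the patch: runs the improved capitalize
# macros through m4.  Assumes GNU m4 is available on PATH as "m4".
capitalize_fixed_demo() {
  m4 <<'EOF'
dnl Definitions copied from examples/capitalize2.m4 in this patch.
define(`upcase', `translit(`$*', `a-z', `A-Z')')dnl
define(`downcase', `translit(`$*', `A-Z', `a-z')')dnl
define(`_arg1', `$1')dnl
define(`_to_alt', `changequote(`<<[', `]>>')')dnl
define(`_from_alt', `changequote(<<[`]>>, <<[']>>)')dnl
define(`_upcase_alt', `translit(<<[$*]>>, <<[a-z]>>, <<[A-Z]>>)')dnl
define(`_downcase_alt', `translit(<<[$*]>>, <<[A-Z]>>, <<[a-z]>>)')dnl
define(`_capitalize_alt',
  `regexp(<<[$1]>>, <<[^\(\w\)\(\w*\)]>>,
    <<[_upcase_alt(<<[<<[\1]>>]>>)_downcase_alt(<<[<<[\2]>>]>>)]>>)')dnl
define(`capitalize',
  `_arg1(_to_alt()patsubst(<<[<<[$*]>>]>>, <<[\w+]>>,
    _from_alt()`]>>_$0_alt(<<[\&]>>)<<['_to_alt())_from_alt())')dnl
dnl Active should be reachable after capitalization; A should not interfere.
define(`Active', `Act2, Ive')dnl
define(`A', `OOPS')dnl
capitalize(`active')
capitalize(``active'')
EOF
}
```

Here capitalize_fixed_demo should print "Act2, Ive" for the single-quoted call,
since the result is rescanned as the single word Active, and "Active" for the
double-quoted call, since one level of quoting survives.  In neither case does
the macro A get a chance to expand.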

From: Eric Blake <address@hidden>
Date: Tue, 2 Oct 2007 14:01:51 -0600
Subject: [PATCH] Document quoting pitfalls in capitalize.

* doc/m4.texinfo (Patsubst): Use the examples directory.  Also
document shortfall.
(Improved capitalize): New node.
* examples/capitalize.m4: Update to match manual.
* examples/capitalize2.m4: New file.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog               |    9 +++
 doc/m4.texinfo          |  166 +++++++++++++++++++++++++++++++++++++++++++++--
 examples/capitalize.m4  |   16 +++--
 examples/capitalize2.m4 |   19 ++++++
 4 files changed, 197 insertions(+), 13 deletions(-)
 create mode 100644 examples/capitalize2.m4

diff --git a/ChangeLog b/ChangeLog
index 396a64f..7c9755a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2007-10-02  Eric Blake  <address@hidden>
+
+       Document quoting pitfalls in capitalize.
+       * doc/m4.texinfo (Patsubst): Use the examples directory.  Also
+       document shortfall.
+       (Improved capitalize): New node.
+       * examples/capitalize.m4: Update to match manual.
+       * examples/capitalize2.m4: New file.
+
 2007-10-01  Eric Blake  <address@hidden>
 
        Another Autoconf usage pattern optimization.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 6c76b7b..fff48a3 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -270,6 +270,7 @@ Correct version of some examples
 * Improved forloop::            Solution for @code{forloop}
 * Improved foreach::            Solution for @code{foreach}
 * Improved cleardivert::        Solution for @code{cleardivert}
+* Improved capitalize::         Solution for @code{capitalize}
 * Improved fatal_error::        Solution for @code{fatal_error}
 
 How to make copies of the overall M4 package
@@ -4886,18 +4887,45 @@ to lower case, and @code{capitalize} changes the first character of each
 word to upper case and the remaining characters to lower case.
 @end deffn
 
+First, an example of their usage, using implementations distributed in
+@file{m4-@value{VERSION}/@/examples/@/capitalize.m4}.
+
 @example
-define(`upcase', `translit(`$*', `a-z', `A-Z')')dnl
-define(`downcase', `translit(`$*', `A-Z', `a-z')')dnl
-define(`capitalize1',
-       `regexp(`$1', `^\(\w\)\(\w*\)',
-               `upcase(`\1')`'downcase(`\2')')')dnl
-define(`capitalize',
-       `patsubst(`$1', `\w+', `capitalize1(`\&')')')dnl
+$ @kbd{m4 -I examples}
+include(`capitalize.m4')
+@result{}
+upcase(`GNUs not Unix')
+@result{}GNUS NOT UNIX
+downcase(`GNUs not Unix')
+@result{}gnus not unix
 capitalize(`GNUs not Unix')
 @result{}Gnus Not Unix
 @end example
 
+Now for the implementation.  There is a helper macro @code{_capitalize}
+which puts only its first word in mixed case.  Then @code{capitalize}
+merely parses out the words, and replaces them with an invocation of
+@code{_capitalize}.  (As presented here, the @code{capitalize} macro has
+some subtle flaws.  You should try to see if you can find and correct
+them; or @pxref{Improved capitalize, , Answers}).
+
+@example
+$ @kbd{m4 -I examples}
+undivert(`capitalize.m4')dnl
+@result{}divert(`-1')
+@result{}# upcase(text)
+@result{}# downcase(text)
+@result{}# capitalize(text)
+@result{}#   change case of text, simple version
+@result{}define(`upcase', `translit(`$*', `a-z', `A-Z')')
+@result{}define(`downcase', `translit(`$*', `A-Z', `a-z')')
+@result{}define(`_capitalize',
+@result{}       `regexp(`$1', `^\(\w\)\(\w*\)',
+@result{}               `upcase(`\1')`'downcase(`\2')')')
+@result{}define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')
+@result{}divert`'dnl
+@end example
+
 While @code{regexp} replaces the whole input with the replacement as
 soon as there is a match, @code{patsubst} replaces each
 @emph{occurrence} of a match and preserves non-matching pieces:
@@ -6490,6 +6518,7 @@ presented here.
 * Improved forloop::            Solution for @code{forloop}
 * Improved foreach::            Solution for @code{foreach}
 * Improved cleardivert::        Solution for @code{cleardivert}
+* Improved capitalize::         Solution for @code{capitalize}
 * Improved fatal_error::        Solution for @code{fatal_error}
 @end menu
 
@@ -6792,6 +6821,129 @@ undivert
 @result{}
 @end example
 
+@node Improved capitalize
+@section Solution for @code{capitalize}
+
+The @code{capitalize} macro (@pxref{Patsubst}) as presented earlier does
+not allow clients to follow the quoting rule of thumb.  Consider the
+three macros @code{active}, @code{Active}, and @code{ACTIVE}, and the
+difference between calling @code{capitalize} with the expansion of a
+macro, expanding the result of a case change, and changing the case of a
+double-quoted string:
+
+@example
+$ @kbd{m4 -I examples}
+include(`capitalize.m4')dnl
+define(`active', `act1, ive')dnl
+define(`Active', `Act2, Ive')dnl
+define(`ACTIVE', `ACT3, IVE')dnl
+upcase(active)
+@result{}ACT1,IVE
+upcase(`active')
+@result{}ACT3, IVE
+upcase(``active'')
+@result{}ACTIVE
+downcase(ACTIVE)
+@result{}act3,ive
+downcase(`ACTIVE')
+@result{}act1, ive
+downcase(``ACTIVE'')
+@result{}active
+capitalize(active)
+@result{}Act1
+capitalize(`active')
+@result{}Active
+capitalize(``active'')
+@result{}_capitalize(`active')
+define(`A', `OOPS')
+@result{}
+capitalize(active)
+@result{}OOPSct1
+capitalize(`active')
+@result{}OOPSctive
+@end example
+
+First, when @code{capitalize} is called with more than one argument, it
+was throwing away later arguments, whereas @code{upcase} and
+@code{downcase} used @samp{$*} to collect them all.  The fix is simple:
+use @samp{$*} consistently.
+
+Next, with single-quoting, @code{capitalize} outputs a single character,
+a set of quotes, then the rest of the characters, making it impossible
+to invoke @code{Active} after the fact, and allowing the alternate macro
+@code{A} to interfere.  Here, the solution is to use additional quoting
+in the helper macros, then pass the final over-quoted output string
+through @code{_arg1} to remove the extra quoting and finally invoke the
+concatenated portions as a single string.
+
+Finally, when passed a double-quoted string, the nested macro
+@code{_capitalize} is never invoked because it ended up nested inside
+quotes.  This one is the toughest to fix.  In short, we have no idea how
+many levels of quotes are in effect on the substring being altered by
+@code{patsubst}.  If the replacement string cannot be expressed entirely
+in terms of literal text and backslash substitutions, then we need a
+mechanism to guarantee that the helper macros are invoked outside of
+quotes.  In other words, this sounds like a job for @code{changequote}
+(@pxref{Changequote}).  By changing the active quoting characters, we
+can guarantee that replacement text injected by @code{patsubst} always
+occurs in the middle of a string that has exactly one level of
+over-quoting using alternate quotes; so the replacement text closes the
+quoted string, invokes the helper macros, then reopens the quoted
+string.  In turn, that means the replacement text has unbalanced quotes,
+necessitating another round of @code{changequote}.
+
+In the fixed version below, (also shipped as
+@file{m4-@value{VERSION}/@/examples/@/capitalize.m4}), @code{capitalize}
+uses the alternate quotes of @samp{<<[} and @samp{]>>} (the longer
+strings are chosen so as to be less likely to appear in the text being
+converted).  The helpers @code{_to_alt} and @code{_from_alt} merely
+reduce the number of characters required to perform a
+@code{changequote}, since the definition changes twice.  The outermost
+pair means that @code{patsubst} and @code{_capitalize_alt} are invoked
+with alternate quoting; the innermost pair is used so that the third
+argument to @code{patsubst} can contain an unbalanced
+@samp{]>>}/@samp{<<[} pair.  Note that @code{upcase} and @code{downcase}
+must be redefined as @code{_upcase_alt} and @code{_downcase_alt}, since
+they contain nested quotes but are invoked with the alternate quoting
+scheme in effect.
+
+@example
+$ @kbd{m4 -I examples}
+include(`capitalize2.m4')dnl
+define(`active', `act1, ive')dnl
+define(`Active', `Act2, Ive')dnl
+define(`ACTIVE', `ACT3, IVE')dnl
+define(`A', `OOPS')dnl
+capitalize(active)
+@result{}Act1,Ive
+capitalize(`active')
+@result{}Act2, Ive
+capitalize(``active'')
+@result{}Active
+capitalize(```actIVE''')
+@result{}`Active'
+undivert(`capitalize2.m4')dnl
+@result{}divert(`-1')
+@result{}# upcase(text)
+@result{}# downcase(text)
+@result{}# capitalize(text)
+@result{}#   change case of text, improved version
+@result{}define(`upcase', `translit(`$*', `a-z', `A-Z')')
+@result{}define(`downcase', `translit(`$*', `A-Z', `a-z')')
+@result{}define(`_arg1', `$1')
+@result{}define(`_to_alt', `changequote(`<<[', `]>>')')
+@result{}define(`_from_alt', `changequote(<<[`]>>, <<[']>>)')
+@result{}define(`_upcase_alt', `translit(<<[$*]>>, <<[a-z]>>, <<[A-Z]>>)')
+@result{}define(`_downcase_alt', `translit(<<[$*]>>, <<[A-Z]>>, <<[a-z]>>)')
+@result{}define(`_capitalize_alt',
+@result{}  `regexp(<<[$1]>>, <<[^\(\w\)\(\w*\)]>>,
+@result{}    <<[_upcase_alt(<<[<<[\1]>>]>>)_downcase_alt(<<[<<[\2]>>]>>)]>>)')
+@result{}define(`capitalize',
+@result{}  `_arg1(_to_alt()patsubst(<<[<<[$*]>>]>>, <<[\w+]>>,
+@result{}    _from_alt()`]>>_$0_alt(<<[\&]>>)<<['_to_alt())_from_alt())')
+@result{}divert`'dnl
+@end example
+
 @node Improved fatal_error
 @section Solution for @code{fatal_error}
 
diff --git a/examples/capitalize.m4 b/examples/capitalize.m4
index 5c28de2..d4e4a50 100644
--- a/examples/capitalize.m4
+++ b/examples/capitalize.m4
@@ -1,8 +1,12 @@
-dnl
-dnl convert to upper- resp. lowercase
+divert(`-1')
+# upcase(text)
+# downcase(text)
+# capitalize(text)
+#   change case of text, simple version
 define(`upcase', `translit(`$*', `a-z', `A-Z')')
 define(`downcase', `translit(`$*', `A-Z', `a-z')')
-dnl
-dnl capitalize a single word
-define(`capitalize1', `regexp(`$1', `^\(\w\)\(\w*\)', `upcase(`\1')`'downcase(`\2')')')
-define(`capitalize', `patsubst(`$1', `\w+', ``'capitalize1(`\0')')')
+define(`_capitalize',
+       `regexp(`$1', `^\(\w\)\(\w*\)',
+               `upcase(`\1')`'downcase(`\2')')')
+define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')
+divert`'dnl
diff --git a/examples/capitalize2.m4 b/examples/capitalize2.m4
new file mode 100644
index 0000000..154dc50
--- /dev/null
+++ b/examples/capitalize2.m4
@@ -0,0 +1,19 @@
+divert(`-1')
+# upcase(text)
+# downcase(text)
+# capitalize(text)
+#   change case of text, improved version
+define(`upcase', `translit(`$*', `a-z', `A-Z')')
+define(`downcase', `translit(`$*', `A-Z', `a-z')')
+define(`_arg1', `$1')
+define(`_to_alt', `changequote(`<<[', `]>>')')
+define(`_from_alt', `changequote(<<[`]>>, <<[']>>)')
+define(`_upcase_alt', `translit(<<[$*]>>, <<[a-z]>>, <<[A-Z]>>)')
+define(`_downcase_alt', `translit(<<[$*]>>, <<[A-Z]>>, <<[a-z]>>)')
+define(`_capitalize_alt',
+  `regexp(<<[$1]>>, <<[^\(\w\)\(\w*\)]>>,
+    <<[_upcase_alt(<<[<<[\1]>>]>>)_downcase_alt(<<[<<[\2]>>]>>)]>>)')
+define(`capitalize',
+  `_arg1(_to_alt()patsubst(<<[<<[$*]>>]>>, <<[\w+]>>,
+    _from_alt()`]>>_$0_alt(<<[\&]>>)<<['_to_alt())_from_alt())')
+divert`'dnl
-- 
1.5.3.2






