autoconf-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

O(n) set manipulation


From: Eric Blake
Subject: O(n) set manipulation
Date: Mon, 28 Jul 2008 07:10:45 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080708 Thunderbird/2.0.0.16 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

m4_append_uniq is lousy at set manipulation: O(n^2).  Yet we use it for
_AC_SUBST_VARS, which can be a rather large set (more than 450 variables
for coreutils' configure.ac).  This patch introduces a new API for O(n)
set manipulation, then converts only the largest expected set to use it.

Here's some timing numbers.  First, demonstrate the scaling factor:
limit=2000 or 4000
add=m4_append_uniq([a],i,[ ]) or m4_set_add([a],i)
print=m4_defn([a]) or m4_set_contents([a],[ ])

echo 'm4_divert[]m4_for(i,1,'$limit',,['$add'])m4_len('$print')' \
~  | m4 -Ilib m4sugar/m4sugar.m4 -

           2000,append  4000,append  2000,set   4000,set
m4-1.4.11       1.108   4.030           0.405   0.717
branch-1.6      0.905   3.389           0.483   0.811
argv_ref        0.483   1.092           0.514   0.889
master          2.058   7.903           0.778   1.277

Right now, argv_ref is the only branch where m4_append is not quadratic;
but as m4_append_uniq is quadratic in spite of m4_append, this
demonstrates a huge improvement regardless of m4 implementation.


Next, demonstrate the constant factor associated with handling small sets
(2 elements, and 1 ignored duplicate):

append:
~  add='m4_append_uniq([a],1,[ ])m4_append_uniq([a],2,
~    [ ])m4_append_uniq([a],1,[ ])'
~  show='m4_defn([a])m4_popdef([a])'
contents:
~  add='m4_set_add([a],1)m4_set_add([a],2)m4_set_add([a],1)'
~  show='m4_set_contents([a],[ ])m4_set_delete([a])'
dump:
~  add='m4_set_add([a],1)m4_set_add([a],2)m4_set_add([a],1)'
~  show='m4_set_dump([a],[ ])'

echo 'm4_divert[]m4_len(m4_for(i,1,2000,,['"$add$show"']))' \
~  | m4 -Ilib m4sugar/m4sugar.m4 -

                append contents dump
m4-1.4.11       0.686   1.014   0.670
branch-1.6      0.592   1.014   0.655
argv_ref        0.717   0.983   0.655
master          0.965   1.762   1.122

m4_set_contents/m4_set_delete is more expensive than m4_append_uniq for
small sets; it takes a sizable set before the algorithmic improvements
result in a speedup.  m4_set_dump has mixed results, depending on the
quality of the underlying m4.  So, I only changed _AC_SUBST_VARS, rather
than all of autoconf's ac_append_uniq clients, to use the new API.
Meanwhile, these numbers should explain why I added m4_set_dump as a
faster way to do a one-shot dump of a set, even though it results in the
reverse direction in output.


Finally, see if this change is noticeable in real life, by running this on
coreutils:
time M4=... autoconf -f

                pre     m4_set_contents m4_set_dump
m4-1.4.11       20.298  20.906          20.436
branch-1.6      21.688  21.560          21.046
argv_ref        18.297  18.233          17.703
master          29.371  29.967          28.199

The resulting configure is identical for m4_set_contents, and trivially
different for m4_set_dump: in order to reach optimum speed, m4_set_dump
outputs elements in reverse order from m4_append_uniq.  I intentionally
documented m4_set output as unspecified order, to allow a future change
where the elements are maintained in sorted order rather than entry order,
by making set manipulation an m4 2.0 module.  m4_append_uniq and
m4_set_contents were comparable in speed, even for the 450+ elements of
coreutils' list, so the algorithmic benefits don't show up until larger
lists; but m4_set_dump is a hands down winner.  Reverse order does not
hurt configure, so the patch below uses the faster m4_set_dump.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkiNxVUACgkQ84KuGfSFAYA8/ACbBkDghU2pempw1tAf9tcWMsQu
n5oAnjQXQojrHItLcizkA/00oQBU6YE1
=LR9W
-----END PGP SIGNATURE-----
>From 3f1a601013fb7cd0bf5a44fd68d9362578311b0e Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Mon, 28 Jul 2008 06:35:34 -0600
Subject: [PATCH] Implement O(n) unique element set creation.

* lib/m4sugar/m4sugar.m4 (m4_set_add, m4_set_add_all)
(m4_set_contains, m4_set_contents, m4_set_delete)
(m4_set_difference, m4_set_dump, m4_set_empty, m4_set_foreach)
(m4_set_intersection, m4_set_list, m4_set_listc, m4_set_remove)
(m4_set_size, m4_set_union): New macros.
* lib/m4sugar/foreach.m4 (m4_set_add_all): Add O(n) fallback for
m4 1.4.x.
* lib/autoconf/general.m4 (_AC_INIT_DEFAULTS, AC_SUBST): Use
new m4_set API for the set most likely to be large.
* doc/autoconf.texi (Set manipulation Macros): New node.
* NEWS: Mention new macros.
* tests/m4sugar.at (m4@&address@hidden): New test.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog               |   16 +++
 NEWS                    |    5 +-
 doc/autoconf.texi       |  225 ++++++++++++++++++++++++++++++++
 lib/autoconf/general.m4 |    5 +-
 lib/m4sugar/foreach.m4  |   18 +++
 lib/m4sugar/m4sugar.m4  |  325 ++++++++++++++++++++++++++++++++++++++++++++++-
 tests/m4sugar.at        |  142 +++++++++++++++++++++
 7 files changed, 730 insertions(+), 6 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index c2147d9..945891b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2008-07-28  Eric Blake  <address@hidden>
+
+       Implement O(n) unique element set creation.
+       * lib/m4sugar/m4sugar.m4 (m4_set_add, m4_set_add_all)
+       (m4_set_contains, m4_set_contents, m4_set_delete)
+       (m4_set_difference, m4_set_dump, m4_set_empty, m4_set_foreach)
+       (m4_set_intersection, m4_set_list, m4_set_listc, m4_set_remove)
+       (m4_set_size, m4_set_union): New macros.
+       * lib/m4sugar/foreach.m4 (m4_set_add_all): Add O(n) fallback for
+       m4 1.4.x.
+       * lib/autoconf/general.m4 (_AC_INIT_DEFAULTS, AC_SUBST): Use
+       new m4_set API for the set most likely to be large.
+       * doc/autoconf.texi (Set manipulation Macros): New node.
+       * NEWS: Mention new macros.
+       * tests/m4sugar.at (m4@&address@hidden): New test.
+
 2008-07-25  Eric Blake  <address@hidden>
 
        Avoid infinite aclocal loop.
diff --git a/NEWS b/NEWS
index 83f43a4..e7d1d1c 100644
--- a/NEWS
+++ b/NEWS
@@ -20,7 +20,10 @@ GNU Autoconf NEWS - User visible changes.
    allowing the output of unbalanced parantheses in more contexts.
 
 ** The following m4sugar macros are new:
-   m4_joinall
+   m4_joinall  m4_set_add  m4_set_add_all  m4_set_contains
+   m4_set_contents  m4_set_delete  m4_set_difference  m4_set_dump
+   m4_set_empty  m4_set_foreach  m4_set_intersection  m4_set_list
+   m4_set_listc  m4_set_remove  m4_set_size  m4_set_union
 
 ** The following m4sugar macros now accept multiple arguments, as is the
    case with underlying m4:
diff --git a/doc/autoconf.texi b/doc/autoconf.texi
index efb6540..6ac8449 100644
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -456,6 +456,7 @@ Programming in M4sugar
 * Evaluation Macros::           More quotation and evaluation control
 * Text processing Macros::      String manipulation in M4
 * Number processing Macros::    Arithmetic computation in M4
+* Set manipulation Macros::     Set manipulation in M4
 * Forbidden Patterns::          Catching unexpanded macros
 
 Writing Autoconf Macros
@@ -10218,6 +10219,7 @@ define your own macros into these namespaces.
 * Evaluation Macros::           More quotation and evaluation control
 * Text processing Macros::      String manipulation in M4
 * Number processing Macros::    Arithmetic computation in M4
+* Set manipulation Macros::     Set manipulation in M4
 * Forbidden Patterns::          Catching unexpanded macros
 @end menu
 
@@ -11383,6 +11385,229 @@ m4_version_compare([2.61b], [2.61a-248-dc51])
 @end defmac
 
 
address@hidden Set manipulation Macros
address@hidden Set manipulation in M4
address@hidden Set manipulation
address@hidden Data structure, set
address@hidden Unordered set manipulation
+
+Sometimes, it is necessary to track a set of data, where the order does
+not matter and where there are no duplicates in the set.  The following
+macros facilitate set manipulations.  Each set is an opaque object,
+which can only be accessed via these basic operations.  The underlying
+implementation guarantees linear scaling for set creation, which is more
+efficient than using the quadratic @code{m4_append_uniq}.  Both set
+names and values can be arbitrary strings, except for unbalanced quotes.
+This implementation ties up memory for removed elements until the next
+operation that must traverse all the elements of a set; and although
+that may slow down some operations until the memory for removed elements
+is pruned, it still guarantees linear performance.
+
address@hidden m4_set_add (@var{set}, @var{value}, @ovar{if-uniq}, 
@ovar{if-dup})
address@hidden
+Adds the string @var{value} as a member of set @var{set}.  Expand
address@hidden if the element was added, or @var{if-dup} if it was
+previously in the set.  Operates in amortized constant time, so that set
+creation scales linearly.
address@hidden defmac
+
address@hidden m4_set_add_all (@var{set}, @address@hidden)
address@hidden
+Adds each @var{value} to the set @var{set}.  This is slightly more
+efficient than repeatedly invoking @code{m4_set_add}.
address@hidden defmac
+
address@hidden m4_set_contains (@var{set}, @var{value}, @ovar{if-present}, @
+ @ovar{if-absent})
address@hidden
+Expands @var{if-present} if the string @var{value} is a member of
address@hidden, otherwise @var{if-absent}.
+
address@hidden
+m4_set_contains([a], [1], [yes], [no])
address@hidden
+m4_set_add([a], [1], [added], [dup])
address@hidden
+m4_set_add([a], [1], [added], [dup])
address@hidden
+m4_set_contains([a], [1], [yes], [no])
address@hidden
+m4_set_remove([a], [1], [removed], [missing])
address@hidden
+m4_set_contains([a], [1], [yes], [no])
address@hidden
+m4_set_remove([a], [1], [removed], [missing])
address@hidden
address@hidden example
address@hidden defmac
+
address@hidden m4_set_contents (@var{set}, @ovar{sep})
address@hidden m4_set_dump (@var{set}, @ovar{sep})
address@hidden
address@hidden
+Expands to a single string consisting of all the members of the set
address@hidden, each separated by @var{sep}, which is not expanded.
address@hidden leaves the elements in @var{set} but reclaims any
+memory occupied by removed elements, while @code{m4_set_dump} is a
+faster one-shot action that also deletes the set.  No provision is made
+for disambiguating members that contain a non-empty @var{sep} as a
+substring; use @code{m4_set_empty} to distinguish between an empty set
+and the set containing only the empty string.  The order of the output
+is unspecified; in the current implementation, part of the speed of
address@hidden results from using a different output order than
address@hidden  These macros scale linearly in the size of the
+set before memory pruning, and @code{m4_set_contents(address@hidden,
address@hidden)} is faster than
address@hidden(address@hidden(address@hidden))}.
+
address@hidden
+m4_set_add_all([a], [1], [2], [3])
address@hidden
+m4_set_contents([a], [-])
address@hidden
+m4_joinall([-]m4_set_listc([a]))
address@hidden
+m4_set_dump([a], [-])
address@hidden
+m4_set_contents([a])
address@hidden
+m4_set_add([a], [])
address@hidden
+m4_set_contents([a], [-])
address@hidden
address@hidden example
address@hidden defmac
+
address@hidden m4_set_delete (@var{set})
address@hidden
+Delete all elements and memory associated with @var{set}.  This is
+linear in the set size, and faster than removing one element at a time.
address@hidden defmac
+
address@hidden m4_set_difference (@var{seta}, @var{setb})
address@hidden m4_set_intersection (@var{seta}, @var{setb})
address@hidden m4_set_union (@var{seta}, @var{setb})
address@hidden
address@hidden
address@hidden
+Compute the relation between @var{seta} and @var{setb}, and output the
+result as a list of quoted arguments without duplicates and with a
+leading comma.  Set difference selects the elements in @var{seta} but
+not @var{setb}, intersection selects only elements in both sets, and
+union selects elements in either set.  These actions are linear in the
+sum of the set sizes.  The leading comma is necessary to distinguish
+between no elements and the empty string as the only element.
+
address@hidden
+m4_set_add_all([a], [1], [2], [3])
address@hidden
+m4_set_add_all([b], [3], [], [4])
address@hidden
+m4_set_difference([a], [b])
address@hidden,1,2
+m4_set_difference([b], [a])
address@hidden,,4
+m4_set_intersection([a], [b])
address@hidden,3
+m4_set_union([a], [b])
address@hidden,1,2,3,,4
address@hidden example
address@hidden defmac
+
address@hidden m4_set_empty (@var{set}, @ovar{if-empty}, @ovar{if-elements})
address@hidden
+Expand @var{if-empty} if the set @var{set} has no elements, otherwise
+expand @var{if-elements}.  This macro operates in constant time.  Using
+this macro can help disambiguate output from @code{m4_set_contents} or
address@hidden
address@hidden defmac
+
address@hidden m4_set_foreach (@var{set}, @var{variable}, @var{action})
address@hidden
+For each element in the set @var{set}, expand @var{action} with the
+macro @var{variable} defined as the set element.  Behavior is
+unspecified if @var{action} recursively lists the contents of @var{set}
+(although listing other sets is acceptable), or if it modifies the set
+in any way other than removing the element currently contained in
address@hidden  This macro is faster than the corresponding
address@hidden(address@hidden,
+m4_indir([m4_dquote]m4_set_listc(address@hidden)), address@hidden)}.
+
address@hidden
+m4_set_add_all([a]m4_for([i], [1], [5], [], [,i]))
address@hidden
+m4_set_contents([a])
address@hidden
+m4_set_foreach([a], [i],
+  [m4_if(m4_eval(i&1), [0], [m4_set_remove([a], i, [i])])])
address@hidden
+m4_set_contents([a])
address@hidden
address@hidden example
address@hidden defmac
+
address@hidden m4_set_list (@var{set})
address@hidden m4_set_listc (@var{set})
address@hidden
address@hidden
+Produce a list of arguments, where each argument is a quoted element
+from the set @var{set}.  The variant @code{m4_set_listc} is unambiguous,
+by adding a leading comma if there are any set elements, whereas the
+variant @code{m4_set_list} cannot distinguish between an empty set and a
+set containing only the empty string.  These can be directly used in
+macros that take multiple arguments, such as @code{m4_join} or
address@hidden, or wrapped by @code{m4_dquote} for macros that
+take a quoted list, such as @code{m4_map} or @code{m4_foreach}.  Any
+memory occupied by removed elements is reclaimed during these macros.
+
address@hidden
+m4_set_add_all([a], [1], [2], [3])
address@hidden
+m4_set_list([a])
address@hidden,2,3
+m4_set_list([b])
address@hidden
+m4_set_listc([b])
address@hidden
+m4_count(m4_set_list([b]))
address@hidden
+m4_set_empty([b], [0], [m4_count(m4_set_list([b]))])
address@hidden
+m4_set_add([b], [])
address@hidden
+m4_set_list([b])
address@hidden
+m4_set_listc([b])
address@hidden,
+m4_count(m4_set_list([b]))
address@hidden
+m4_set_empty([b], [0], [m4_count(m4_set_list([b]))])
address@hidden
address@hidden example
address@hidden defmac
+
address@hidden m4_set_remove (@var{set}, @var{value}, @ovar{if-present}, @
+ @ovar{if-absent})
address@hidden
+If @var{value} is an element in the set @var{set}, then remove it and
+expand @var{if-present}.  Otherwise expand @var{if-absent}.  This macro
+operates in constant time so that multiple removals will scale linearly
+rather than quadratically; but when used outside of
address@hidden, it leaves memory occupied until the set is later
+compacted by @code{m4_set_contents} or @code{m4_set_list}.  Several
+other set operations are then less efficient between the time of element
+removal and subsequent memory compaction, but still maintain their
+guaranteed scaling performance.
address@hidden defmac
+
address@hidden m4_set_size (@var{set})
address@hidden
+Expand to the size of the set @var{set}.  This implementation operates
+in constant time, and is thus more efficient than
address@hidden(m4_count(m4_set_listc([set])) - 1)}.
address@hidden defmac
+
+
 @node Forbidden Patterns
 @subsection Forbidden Patterns
 @cindex Forbidden patterns
diff --git a/lib/autoconf/general.m4 b/lib/autoconf/general.m4
index 0f8e32d..8af0dc4 100644
--- a/lib/autoconf/general.m4
+++ b/lib/autoconf/general.m4
@@ -414,7 +414,7 @@ AC_SUBST([PACKAGE_BUGREPORT],
 
 m4_divert_pop([DEFAULTS])dnl
 m4_wrap([m4_divert_text([DEFAULTS],
-[ac_subst_vars='m4_ifdef([_AC_SUBST_VARS],  [m4_defn([_AC_SUBST_VARS])])'
+[ac_subst_vars='m4_set_dump([_AC_SUBST_VARS], m4_newline)'
 ac_subst_files='m4_ifdef([_AC_SUBST_FILES], [m4_defn([_AC_SUBST_FILES])])'
 ac_user_opts='
 enable_option_checking
@@ -2096,8 +2096,7 @@ m4_define([AC_SUBST],
 AC_SUBST_TRACE([$1])dnl
 m4_pattern_allow([^$1$])dnl
 m4_ifvaln([$2], [$1=$2])[]dnl
-m4_append_uniq([_AC_SUBST_VARS], [$1], [
-])dnl
+m4_set_add([_AC_SUBST_VARS], [$1])dnl
 ])# AC_SUBST
 
 
diff --git a/lib/m4sugar/foreach.m4 b/lib/m4sugar/foreach.m4
index 935dbff..d6a9f72 100644
--- a/lib/m4sugar/foreach.m4
+++ b/lib/m4sugar/foreach.m4
@@ -234,3 +234,21 @@ m4_define([_m4_minmax],
 [m4_pushdef([_m4_best], m4_eval([$2]))m4_foreach([_m4_arg], [m4_shift2($@)],
     [m4_define([_m4_best], $1(_m4_best, _m4_defn([_m4_arg])))])]dnl
 [_m4_best[]_m4_popdef([_m4_best])])
+
+# m4_set_add_all(SET, VALUE...)
+# -----------------------------
+# Add each VALUE into SET.  This is O(n) in the number of VALUEs, and
+# can be faster than calling m4_set_add for each VALUE.
+#
+# m4_foreach to the rescue.  If no deletions have occurred, then avoid
+# the speed penalty of m4_set_add.
+m4_define([m4_set_add_all],
+[m4_if([$#], [0], [], [$#], [1], [],
+       [m4_define([_m4_set_size($1)], m4_eval(m4_set_size([$1])
+         + m4_len(m4_foreach([_m4_arg], [m4_shift($@)],
+    m4_ifdef([_m4_set_cleanup($1)],
+            [[m4_set_add([$1], _m4_defn([_m4_arg]))]],
+            [[m4_ifdef([_m4_set([$1],]_m4_defn([_m4_arg])[)], [],
+                       [m4_define([_m4_set([$1],]_m4_defn([_m4_arg])[)],
+                                  [1])m4_pushdef([_m4_set([$1])],
+       _m4_defn([_m4_arg]))-])]])))))])])
diff --git a/lib/m4sugar/m4sugar.m4 b/lib/m4sugar/m4sugar.m4
index 3e201b9..9da7d57 100644
--- a/lib/m4sugar/m4sugar.m4
+++ b/lib/m4sugar/m4sugar.m4
@@ -2274,9 +2274,330 @@ m4_ifdef([m4_PACKAGE_VERSION],
 [[m4_fatal([m4sugar/version.m4 not found])]]))
 
 
+## ------------------ ##
+## 15. Set handling.  ##
+## ------------------ ##
+
+# Autoconf likes to create arbitrarily large sets; for example, as of
+# this writing, the configure.ac for coreutils tracks a set of more
+# than 400 AC_SUBST.  How do we track all of these set members,
+# without introducing duplicates?  We could use m4_append_uniq, with
+# the set NAME residing in the contents of the macro NAME.
+# Unfortunately, m4_append_uniq is quadratic for set creation, because
+# it costs O(n) to search the string for each of O(n) insertions; not
+# to mention that with m4 1.4.x, even using m4_append is slow, costing
+# O(n) rather than O(1) per insertion.  Other set operations, not used
+# by Autoconf but still possible by manipulation of the definition
+# tracked in macro NAME, include O(n) deletion of one element and O(n)
+# computation of set size.  Because the set is exposed to the user via
+# the definition of a single macro, we cannot cache any data about the
+# set without risking the cache being invalidated by the user
+# redefining NAME.
+#
+# Can we do better?  Yes, because m4 gives us an O(1) search function
+# for free: ifdef.  Additionally, even m4 1.4.x gives us an O(1)
+# insert operation for free: pushdef.  But to use these, we must
+# represent the set via a group of macros; to keep the set consistent,
+# we must hide the set so that the user can only manipulate it through
+# accessor macros.  The contents of the set are maintained through two
+# access points; _m4_set([name]) is a pushdef stack of values in the
+# set, useful for O(n) traversal of the set contents; while the
+# existence of _m4_set([name],value) with no particular value is
+# useful for O(1) querying of set membership.  And since the user
+# cannot externally manipulate the set, we are free to add additional
+# caching macros for other performance improvements.  Deletion can be
+# O(1) per element rather than O(n), by reworking the definition of
+# _m4_set([name],value) to be 0 or 1 based on current membership, and
+# adding _m4_set_cleanup(name) to defer the O(n) cleanup of
+# _m4_set([name]) until we have another reason to do an O(n)
+# traversal.  The existence of _m4_set_cleanup(name) can then be used
+# elsewhere to determine if we must dereference _m4_set([name],value),
+# or assume that definition implies set membership.  Finally, size can
+# be tracked in an O(1) fashion with _m4_set_size(name).
+#
+# The quoting in _m4_set([name],value) is chosen so that there is no
+# ambiguity with a set whose name contains a comma, and so that we can
+# supply the value via _m4_defn([_m4_set([name])]) without needing any
+# quote manipulation.
+
+# m4_set_add(SET, VALUE, [IF-UNIQ], [IF-DUP])
+# -------------------------------------------
+# Add VALUE as an element of SET.  Expand IF-UNIQ on the first
+# addition, and IF-DUP if it is already in the set.  Addition of one
+# element is O(1), such that overall set creation is O(n).
+#
+# We do not want to add a duplicate for a previously deleted but
+# unpruned element, but it is just as easy to check existence directly
+# as it is to query _m4_set_cleanup($1).
+m4_define([m4_set_add],
+[m4_ifdef([_m4_set([$1],$2)],
+         [m4_if(m4_indir([_m4_set([$1],$2)]), [0],
+                [m4_define([_m4_set([$1],$2)],
+                           [1])_m4_set_size([$1], [m4_incr])$3], [$4])],
+         [m4_define([_m4_set([$1],$2)],
+                    [1])m4_pushdef([_m4_set([$1])],
+                                   [$2])_m4_set_size([$1], [m4_incr])$3])])
+
+# m4_set_add_all(SET, VALUE...)
+# -----------------------------
+# Add each VALUE into SET.  This is O(n) in the number of VALUEs, and
+# can be faster than calling m4_set_add for each VALUE.
+#
+# Implement two recursion helpers; the check variant is slower but
+# handles the case where an element has previously been removed but
+# not pruned.  The recursion helpers ignore their second argument, so
+# that we can use the faster m4_shift2 and 2 arguments, rather than
+# _m4_shift2 and one argument, as the signal to end recursion.
+m4_define([m4_set_add_all],
+[m4_define([_m4_set_size($1)], m4_eval(m4_set_size([$1])
+  + m4_len(m4_ifdef([_m4_set_cleanup($1)], [_$0_check], [_$0])([$1], $@))))])
+
+m4_define([_m4_set_add_all],
+[m4_if([$#], [2], [],
+       [m4_ifdef([_m4_set([$1],$3)], [],
+                [m4_define([_m4_set([$1],$3)], [1])m4_pushdef([_m4_set([$1])],
+          [$3])-])$0([$1], m4_shift2($@))])])
+
+m4_define([_m4_set_add_all_check],
+[m4_if([$#], [2], [],
+       [m4_set_add([$1], [$3])$0([$1], m4_shift2($@))])])
+
+# m4_set_contains(SET, VALUE, [IF-PRESENT], [IF-ABSENT])
+# ------------------------------------------------------
+# Expand IF-PRESENT if SET contains VALUE, otherwise expand IF-ABSENT.
+# This is always O(1).
+m4_define([m4_set_contains],
+[m4_ifdef([_m4_set_cleanup($1)],
+         [m4_if(m4_ifdef([_m4_set([$1],$2)],
+                   [m4_indir([_m4_set([$1],$2)])], [0]), [1], [$3], [$4])],
+         [m4_ifdef([_m4_set([$1],$2)], [$3], [$4])])])
+
+# m4_set_contents(SET, [SEP])
+# ---------------------------
+# Expand to a single string containing all the elements in SET,
+# separated by SEP, without modifying SET.  No provision is made for
+# disambiguating set elements that contain non-empty SEP as a
+# sub-string, or for recognizing a set that contains only the empty
+# string.  Order of the output is not guaranteed.  If any elements
+# have been previously removed from the set, this action will prune
+# the unused memory.  This is O(n) in the size of the set before
+# pruning.
+#
+# Use _m4_popdef for speed.  The existence of _m4_set_cleanup($1)
+# determines which version of _1 helper we use.
+m4_define([m4_set_contents],
+[m4_ifdef([_m4_set_cleanup($1)], [_$0_1c], [_$0_1])([$1])_$0_2([$1],
+    [_m4_defn([_m4_set_($1)])], [[$2]])])
+
+# _m4_set_contents_1(SET)
+# _m4_set_contents_1c(SET)
+# _m4_set_contents_2(SET, SEP, PREP)
+# ----------------------------------
+# Expand to a list of quoted elements currently in the set, separated
+# by SEP, and moving PREP in front of SEP on recursion.  To avoid
+# nesting limit restrictions, the algorithm must be broken into two
+# parts; _1 destructively copies the stack in reverse into
+# _m4_set_($1), producing no output; then _2 destructively copies
+# _m4_set_($1) back into the stack in reverse.  SEP is expanded while
+# _m4_set_($1) contains the current element, so a SEP containing
+# _m4_defn([_m4_set_($1)]) can produce output in the order the set was
+# created.  Behavior is undefined if SEP tries to recursively list or
+# modify SET in any way other than calling m4_set_remove on the
+# current element.  Use _1 if all entries in the stack are guaranteed
+# to be in the set, and _1c to prune removed entries.  Uses _m4_defn
+# and _m4_popdef for speed.
+m4_define([_m4_set_contents_1],
+[m4_ifdef([_m4_set([$1])], [m4_pushdef([_m4_set_($1)],
+    _m4_defn([_m4_set([$1])]))_m4_popdef([_m4_set([$1])])$0([$1])])])
+
+m4_define([_m4_set_contents_1c],
+[m4_ifdef([_m4_set([$1])],
+         [m4_set_contains([$1], _m4_defn([_m4_set([$1])]),
+                  [m4_pushdef([_m4_set_($1)], _m4_defn([_m4_set([$1])]))],
+                  [_m4_popdef([_m4_set([$1],]_m4_defn(
+      [_m4_set([$1])])[)])])_m4_popdef([_m4_set([$1])])$0([$1])],
+         [_m4_popdef([_m4_set_cleanup($1)])])])
+
+m4_define([_m4_set_contents_2],
+[m4_ifdef([_m4_set_($1)], [m4_pushdef([_m4_set([$1])],
+    _m4_defn([_m4_set_($1)]))$2[]_m4_popdef([_m4_set_($1)])$0([$1], [$3$2])])])
+
+# m4_set_delete(SET)
+# ------------------
+# Delete all elements in SET, and reclaim any memory occupied by the
+# set.  This is O(n) in the set size.
+#
+# Use _m4_defn and _m4_popdef for speed.
+m4_define([m4_set_delete],
+[m4_ifdef([_m4_set([$1])],
+         [_m4_popdef([_m4_set([$1],]_m4_defn([_m4_set([$1])])[)],
+                     [_m4_set([$1])])$0([$1])],
+         [m4_ifdef([_m4_set_cleanup($1)],
+                   [_m4_popdef([_m4_set_cleanup($1)])])m4_ifdef(
+                   [_m4_set_size($1)],
+                   [_m4_popdef([_m4_set_size($1)])])])])
+
+# m4_set_difference(SET1, SET2)
+# -----------------------------
+# Produce a LIST of quoted elements that occur in SET1 but not SET2.
+# Output a comma prior to any elements, to distinguish the empty
+# string from no elements.  This can be directly used as a series of
+# arguments, such as for m4_join, or wrapped inside quotes for use in
+# m4_foreach.  Order of the output is not guaranteed.
+#
+# Short-circuit the idempotence relation.  Use _m4_defn for speed.
+m4_define([m4_set_difference],
+[m4_if([$1], [$2], [],
+       [m4_set_foreach([$1], [_m4_element],
+                      [m4_set_contains([$2], _m4_defn([_m4_element]), [],
+                                       [,_m4_defn([_m4_element])])])])])
+
+# m4_set_dump(SET, [SEP])
+# -----------------------
+# Expand to a single string containing all the elements in SET,
+# separated by SEP, then delete SET.  In general, if you only need to
+# list the contents once, this is faster than m4_set_contents.  No
+# provision is made for disambiguating set elements that contain
+# non-empty SEP as a sub-string.  Order of the output is not
+# guaranteed.  This is O(n) in the size of the set before pruning.
+#
+# Use _m4_popdef for speed.  Use existence of _m4_set_cleanup($1) to
+# decide if more expensive recursion is needed.
+m4_define([m4_set_dump],
+[m4_ifdef([_m4_set_size($1)],
+         [_m4_popdef([_m4_set_size($1)])])m4_ifdef([_m4_set_cleanup($1)],
+    [_$0_check], [_$0])([$1], [], [$2])])
+
+# _m4_set_dump(SET, SEP, PREP)
+# _m4_set_dump_check(SET, SEP, PREP)
+# ----------------------------------
+# Print SEP and the current element, then delete the element and
+# recurse with empty SEP changed to PREP.  The check variant checks
+# whether the element has been previously removed.  Use _m4_defn and
+# _m4_popdef for speed.
+m4_define([_m4_set_dump],
+[m4_ifdef([_m4_set([$1])],
+         [[$2]_m4_defn([_m4_set([$1])])_m4_popdef([_m4_set([$1],]_m4_defn(
+               [_m4_set([$1])])[)], [_m4_set([$1])])$0([$1], [$2$3])])])
+
+m4_define([_m4_set_dump_check],
+[m4_ifdef([_m4_set([$1])],
+         [m4_set_contains([$1], _m4_defn([_m4_set([$1])]),
+                          [[$2]_m4_defn([_m4_set([$1])])])_m4_popdef(
+    [_m4_set([$1],]_m4_defn([_m4_set([$1])])[)],
+    [_m4_set([$1])])$0([$1], [$2$3])],
+         [_m4_popdef([_m4_set_cleanup($1)])])])
+
+# m4_set_empty(SET, [IF-EMPTY], [IF-ELEMENTS])
+# --------------------------------------------
+# Expand IF-EMPTY if SET has no elements, otherwise IF-ELEMENTS.
+m4_define([m4_set_empty],
+[m4_ifdef([_m4_set_size($1)],
+         [m4_if(m4_indir([_m4_set_size($1)]), [0], [$2], [$3])], [$2])])
+
+# m4_set_foreach(SET, VAR, ACTION)
+# --------------------------------
+# For each element of SET, define VAR to the element and expand
+# ACTION.  ACTION should not recursively list SET's contents, add
+# elements to SET, nor delete any element from SET except the one
+# currently in VAR.  The order that the elements are visited in is not
+# guaranteed.  This is faster than the corresponding m4_foreach([VAR],
+#   m4_indir([m4_dquote]m4_set_listc([SET])), [ACTION])
+m4_define([m4_set_foreach],
+[m4_pushdef([$2])m4_ifdef([_m4_set_cleanup($1)],
+    [_m4_set_contents_1c], [_m4_set_contents_1])([$1])_m4_set_contents_2([$1],
+       [m4_define([$2], _m4_defn([_m4_set_($1)]))$3[]])m4_popdef([$2])])
+
+# m4_set_intersection(SET1, SET2)
+# -------------------------------
+# Produce a LIST of quoted elements that occur in both SET1 or SET2.
+# Output a comma prior to any elements, to distinguish the empty
+# string from no elements.  This can be directly used as a series of
+# arguments, such as for m4_join, or wrapped inside quotes for use in
+# m4_foreach.  Order of the output is not guaranteed.
+#
+# Iterate over the smaller set, and short-circuit the idempotence
+# relation.  Use _m4_defn for speed.
+m4_define([m4_set_intersection],
+[m4_if([$1], [$2], [m4_set_listc([$1])],
+       m4_eval(m4_set_size([$2]) < m4_set_size([$1])), [1], [$0([$2], [$1])],
+       [m4_set_foreach([$1], [_m4_element],
+                      [m4_set_contains([$2], _m4_defn([_m4_element]),
+                                       [,_m4_defn([_m4_element])])])])])
+
+# m4_set_list(SET)
+# m4_set_listc(SET)
+# -----------------
+# Produce a LIST of quoted elements of SET.  This can be directly used
+# as a series of arguments, such as for m4_join or m4_set_add_all, or
+# wrapped inside quotes for use in m4_foreach or m4_map.  With
+# m4_set_list, there is no way to distinguish an empty set from a set
+# containing only the empty string; with m4_set_listc, a leading comma
+# is output if there are any elements.
+m4_define([m4_set_list],
+[m4_ifdef([_m4_set_cleanup($1)], [_m4_set_contents_1c],
+         [_m4_set_contents_1])([$1])_m4_set_contents_2([$1],
+              [_m4_defn([_m4_set_($1)])], [,])])
+
+m4_define([m4_set_listc],
+[m4_ifdef([_m4_set_cleanup($1)], [_m4_set_contents_1c],
+         [_m4_set_contents_1])([$1])_m4_set_contents_2([$1],
+              [,_m4_defn([_m4_set_($1)])])])
+
+# m4_set_remove(SET, VALUE, [IF-PRESENT], [IF-ABSENT])
+# ----------------------------------------------------
+# If VALUE is an element of SET, delete it and expand IF-PRESENT.
+# Otherwise expand IF-ABSENT.  Deleting a single value is O(1),
+# although it leaves memory occupied until the next O(n) traversal of
+# the set which will compact the set.
+#
+# Optimize if the element being removed is the most recently added,
+# since defining _m4_set_cleanup($1) slows down so many other macros.
+# In particular, this plays well with m4_set_foreach.
+m4_define([m4_set_remove],
+[m4_set_contains([$1], [$2], [_m4_set_size([$1],
+    [m4_decr])m4_if(_m4_defn([_m4_set([$1])]), [$2],
+                   [_m4_popdef([_m4_set([$1],$2)], [_m4_set([$1])])],
+                   [m4_define([_m4_set_cleanup($1)])m4_define(
+                     [_m4_set([$1],$2)], [0])])$3], [$4])])
+
+# m4_set_size(SET)
+# ----------------
+# Expand to the number of elements currently in SET.  This operation
+# is O(1), and thus more efficient than m4_count(m4_set_list([SET])).
+m4_define([m4_set_size],
+[m4_ifdef([_m4_set_size($1)], [m4_indir([_m4_set_size($1)])], [0])])
+
+# _m4_set_size(SET, ACTION)
+# -------------------------
+# ACTION must be either m4_incr or m4_decr, and the size of SET is
+# changed accordingly.  If the set is empty, ACTION must not be
+# m4_decr.
+m4_define([_m4_set_size],
+[m4_define([_m4_set_size($1)],
+          m4_ifdef([_m4_set_size($1)], [$2(m4_indir([_m4_set_size($1)]))],
+                   [1]))])
+
+# m4_set_union(SET1, SET2)
+# ------------------------
+# Produce a LIST of double quoted elements that occur in either SET1
+# or SET2, without duplicates.  Output a comma prior to any elements,
+# to distinguish the empty string from no elements.  This can be
+# directly used as a series of arguments, such as for m4_join, or
+# wrapped inside quotes for use in m4_foreach.  Order of the output is
+# not guaranteed.
+#
+# We can rely on the fact that m4_set_listc prunes SET1, so we don't
+# need to check _m4_set([$1],element) for 0.  Use _m4_defn for speed.
+# Short-circuit the idempotence relation.
+m4_define([m4_set_union],
+[m4_set_listc([$1])m4_if([$1], [$2], [], [m4_set_foreach([$2], [_m4_element],
+  [m4_ifdef([_m4_set([$1],]_m4_defn([_m4_element])[)], [],
+           [,_m4_defn([_m4_element])])])])])
+
 
 ## ------------------- ##
-## 15. File handling.  ##
+## 16. File handling.  ##
 ## ------------------- ##
 
 
@@ -2298,7 +2619,7 @@ m4_if(m4_sysval, [0], [],
 
 
 ## ------------------------ ##
-## 16. Setting M4sugar up.  ##
+## 17. Setting M4sugar up.  ##
 ## ------------------------ ##
 
 
diff --git a/tests/m4sugar.at b/tests/m4sugar.at
index f34f50c..d82675e 100644
--- a/tests/m4sugar.at
+++ b/tests/m4sugar.at
@@ -829,3 +829,145 @@ end
 ]])
 
 AT_CLEANUP
+
+
+## ---------- ##
+## m4_set_*.  ##
+## ---------- ##
+
+AT_SETUP([m4@&address@hidden)
+
+AT_KEYWORDS([m4@&address@hidden m4@&address@hidden m4@&address@hidden
+m4@&address@hidden m4@&address@hidden m4@&address@hidden m4@&address@hidden
+m4@&address@hidden m4@&address@hidden m4@&address@hidden m4@&address@hidden
+m4@&address@hidden m4@&address@hidden m4@&address@hidden m4@&address@hidden)
+
+# Simple tests
+AT_CHECK_M4SUGAR_TEXT([[m4_set_contains([a], [1], [yes], [no])
+m4_set_add([a], [1], [added], [dup])
+m4_set_contains([a], [1], [yes], [no])
+m4_set_add([a], [1], [added], [dup])
+m4_set_contents([a])
+m4_set_remove([a], [1], [removed], [missing])
+m4_set_contains([a], [1], [yes], [no])
+m4_set_remove([a], [1], [removed], [missing])
+m4_set_add([a], [2], [added], [dup])
+m4_set_empty([a], [yes], [no])
+m4_set_delete([a])
+m4_set_empty([a], [yes], [no])
+m4_set_add_all([c], [1], [2], [3])
+m4_set_add_all([a]m4_set_listc([c]))
+m4_set_contents([c], [-])
+m4_set_dump([a], [-])
+m4_set_contents([a])
+m4_set_add_all([a], [1], [2], [3])m4_set_add_all([b], [3], [], [4])
+m4_set_difference([a], [b])
+m4_set_difference([b], [a])
+m4_set_intersection([a], [b])
+m4_set_union([a], [b])
+m4_set_foreach([a], [i], [m4_if(m4_eval(i & 1), [1], [m4_set_remove([a], i)])])
+m4_set_list([a])
+m4_set_add([a], [])
+m4_set_list([a])
+m4_set_remove([a], [2])
+m4_dquote(m4_set_list([a]))
+m4_set_listc([a])
+m4_set_size([a])
+m4_set_delete([a])
+m4_dquote(m4_set_list([a]))
+m4_indir([m4_dquote]m4_set_listc([a]))
+m4_set_listc([a])
+m4_set_size([a])
+]], [[no
+added
+yes
+dup
+1
+removed
+no
+missing
+added
+no
+
+yes
+
+
+1-2-3
+3-2-1
+
+
+,1,2
+,,4
+,3
+,1,2,3,,4
+
+2
+
+2,
+
+[]
+,
+1
+
+[]
+
+
+0
+]])
+
+# Stress tests - check for unusual names/values
+AT_CHECK_M4SUGAR_TEXT([[m4_define([a], [oops])dnl
+m4_set_add([a], [a])dnl
+m4_set_remove([a], [oops], [yes], [no])
+m4_set_add([a,b], [c])dnl
+m4_set_add([a,b], [$*[]])dnl
+m4_set_add_all([a], [b,c])dnl
+m4_set_size([a])
+m4_count(m4_set_contents([a], [,]))
+m4_count(m4_set_list([a], [,]))
+m4_set_dump([a], [,])
+m4_set_contents([a,b], [,])
+m4_set_list([a,b])
+m4_set_foreach([$*[]], [$*[]], [oops])
+m4_set_add([$*[]], [])dnl
+m4_set_remove([$*[]], [a], [yes], [no])
+m4_set_add([$*[]], [a])dnl
+m4_set_foreach([$*[]], [$*[]], [-m4_defn([$*[]])m4_indir([$*[]])-])
+m4_set_remove([$*[]], [], [yes], [no])
+m4_set_add([c], [,])dnl
+m4_set_foreach([a,b], [set], [:m4_set_listc(_m4_defn([set])):])
+]],[[no
+2
+1
+2
+b,c,a
+c,$*[]
+c,$*[]
+
+no
+---aoops-
+yes
+:,,::,a:
+]])
+
+# Stress tests - check for linear scaling (won't necessarily fail if
+# quadratic, but hopefully users will complain if it appears to hang)
+AT_CHECK_M4SUGAR_TEXT([[dnl
+m4_for([i], [1], [10000], [], [m4_set_add([a], i)])dnl
+m4_set_add_all([b]m4_for([i], [1], [10000], [], [,i]))dnl
+m4_set_remove([a], [1])dnl
+m4_set_remove([b], [10000])dnl
+m4_set_add_all([a]m4_for([i], [1], [10000], [], [,i]))dnl
+m4_for([i], [1], [10000], [], [m4_set_add([b], i)])dnl
+m4_len(m4_set_contents([a]))
+m4_len(m4_set_foreach([b], [b], [m4_if(m4_eval(b & 1), [1],
+  [m4_set_remove([b], b, [-])])]))
+m4_set_size([b])
+m4_count(m4_shift(m4_set_intersection([a], [b])))
+]], [[38894
+5000
+5000
+5000
+]])
+
+AT_CLEANUP
-- 
1.5.6.4


reply via email to

[Prev in Thread] Current Thread [Next in Thread]