m4-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: needless fopens


From: Eric Blake
Subject: Re: needless fopens
Date: Wed, 10 Dec 2008 07:18:15 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.18) Gecko/20081105 Thunderbird/2.0.0.18 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 12/6/2008 5:17 PM:
> But before I apply this patch, I'm working on one other.
> So it looks like adding a cache of 1 or 2 hot diversion FILE*
> should cut the number of fopen from 301 down to 2, without sacrificing the
> 1.4.8 fix to avoid the arbitrary EMFILE limitation, for even more
> potential speedup.

Here's the state of the series against branch-1.6 that I'm probably
applying soon after one more rebase (first I have to port to master; and
probably to branch-1.4).  The first patch improves a corner case from
quadratic to linear - I added a new test that goes from minutes down to
seconds, by no longer copying 1 million bytes across 10000 files, but
rather renaming the file as it progresses through different diversion
numbers.  But autoconf seldom undiverts into an empty diversion, so not
much improvement in real-life cases.

The second patch does a better job at tracking in-memory diversion usage,
so that less I/O is performed for small diversions after a large diversion
has been encountered (my original patch on this series).  This gives some
benefit to any autoconf project where configure exceeds 256k.

Next, I introduced a 1-deep cache of spilled files.  The mere fact that a
diversion spilled means that it is likely to get more data, so it is worth
trying to keep it open as long as possible.  This is particularly true for
autoconf's usage pattern, where m4_require repeatedly collects small
expansions in GROW, then undiverts them onto large BODY.  On platforms
like Linux and cygwin, where you can successfully rename an open file,
this also speeds up the test in the first patch due to much less I/O
overhead from repeated fclose/rename/fopen sequences.  I still need to add
a configure.ac check for mingw (and probably choose the pessimistic result
for cross-compiling) that defines RENAME_OPEN_FILE_WORKS appropriately.

Then I noticed that a lot of time was spent seeking to position 0 on
reopens, only to seek back to the end for append.  Non-append streams are
inherently faster, because there is less seeking, so the fourth patch
changes things to use write mode and fewer seeks.

Finally, coreutils has such a big ac_require from gnulib that it toggled
between two hot files, so I enlarged the cache to two files instead of
one.  On coreutils, it cut things from 149 fopen to 2.

Here are some best-of-three runs on running autoconf on coreutils'
configure.ac (the resulting configure file was identical in all cases),
although I'm  not sure how much system load affected this.

pre-patch
real    0m18.019s
user    0m14.330s

patch1
real    0m18.038s
user    0m14.176s

patch2
real    0m17.528s
user    0m13.925s

patch3
real    0m17.302s
user    0m14.301s

patch4
real    0m16.914s
user    0m13.878s

patch5
real    0m16.648s
user    0m14.128s

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkk/z6cACgkQ84KuGfSFAYBbpwCgxUzr6RMxD2rsz5wOzKEocTdo
TKcAoIBtk+ezf1DcbRiF21I8DosMdGWW
=xczK
-----END PGP SIGNATURE-----
>From cf05567d8d47a22c6f882def1b14017c586752dc Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 9 Dec 2008 20:56:37 -0700
Subject: [PATCH] Avoid quadratic behavior for some cases of divert/undivert.

* src/output.c (struct m4_diversion): Improve comments.
(m4_tmpname, make_diversion): Strengthen preconditions.
(m4_tmprename): New function.
(output_init, output_exit): Move after internal functions.
(make_room_for): Don't bother copying uninitialized bytes.
(insert_diversion_helper): Transfer metadata, rather than copying
contents, when undiverting into a previously unused diversion.
* doc/m4.texinfo (Diversions): Add test.
(Undivert): Enhance test.
* NEWS: Document the speedup.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   14 +++++
 NEWS           |    2 +-
 doc/m4.texinfo |   29 +++++++++++
 src/output.c   |  155 ++++++++++++++++++++++++++++++++++++++++----------------
 4 files changed, 156 insertions(+), 44 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index e991a8c..668fa7d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2008-12-09  Eric Blake  <address@hidden>
+
+       Avoid quadratic behavior for some cases of divert/undivert.
+       * src/output.c (struct m4_diversion): Improve comments.
+       (m4_tmpname, make_diversion): Strengthen preconditions.
+       (m4_tmprename): New function.
+       (output_init, output_exit): Move after internal functions.
+       (make_room_for): Don't bother copying uninitialized bytes.
+       (insert_diversion_helper): Transfer metadata, rather than copying
+       contents, when undiverting into a previously unused diversion.
+       * doc/m4.texinfo (Diversions): Add test.
+       (Undivert): Enhance test.
+       * NEWS: Document the speedup.
+
 2008-12-03  Eric Blake  <address@hidden>
 
        Stage 27: Allow embedded NUL in text processing macros.
diff --git a/NEWS b/NEWS
index 3fb4d90..df960e7 100644
--- a/NEWS
+++ b/NEWS
@@ -45,7 +45,7 @@ Foundation, Inc.
 ** The `divert' builtin now accepts an optional second argument of text
    that is immediately placed in the new diversion, regardless of whether
    the current expansion is nested within argument collection of another
-   macro.
+   macro.  It has also been optimized for faster performance.
 
 ** The `-d'/`--debug' command-line option now understands `-' and `+'
    modifiers, the way the builtin `debugmode' has always done; this allows
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 2fb676d..36d0829 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -5530,7 +5530,12 @@ Diversions
 @result{}1048576
 divert(`1')
 f
+divert(`2')
+f
 divert(`-1')undivert
+divert(`1')bye
+^D
address@hidden
 @end example
 
 @comment Another test of spilled diversions.
@@ -5578,6 +5583,22 @@ Diversions
 sysval
 @result{}0
 @end example
+
address@hidden Avoid quadratic copying time when transferring diversions;
address@hidden test both in-memory and spilled to file.
+
address@hidden examples
address@hidden
+$ @kbd{m4 -I examples}
+include(`forloop2.m4')dnl
+divert(`1')format(`%10000s', `')dnl
+forloop(`i', `1', `10000',
+  `divert(incr(i))undivert(i)')dnl
+divert(`1')format(`%1000000s', `')dnl
+forloop(`i', `1', `10000',
+  `divert(incr(i))undivert(i)')dnl
+divert(`-1')undivert
address@hidden example
 @end ignore
 
 Diversions make it possible to generate output in a different order than
@@ -5805,6 +5826,14 @@ Undivert
 undivert
 @result{}diverted text
 @result{}
+divert(`1')more
+divert(`2')undivert(`1')diverted text`'divert
address@hidden
+undivert(`1')
address@hidden
+undivert(`2')
address@hidden
address@hidden text
 @end example
 
 When a diversion has been undiverted, the diverted text is discarded,
diff --git a/src/output.c b/src/output.c
index 6a02b80..0d18832 100644
--- a/src/output.c
+++ b/src/output.c
@@ -57,13 +57,13 @@ struct m4_diversion
   {
     union
       {
-       FILE *file;             /* diversion file on disk */
-       char *buffer;           /* in-memory diversion buffer */
-       m4_diversion *next;     /* free-list pointer */
+       FILE *file;             /* Diversion file on disk.  */
+       char *buffer;           /* Malloc'd diversion buffer.  */
+       m4_diversion *next;     /* Free-list pointer */
       } u;
-    int divnum;                        /* which diversion this represents */
-    int size;                  /* usable size before reallocation */
-    int used;                  /* used length in characters */
+    int divnum;                        /* Which diversion this represents.  */
+    int size;                  /* Usable size before reallocation.  */
+    int used;                  /* Used buffer length, or tmp file exists.  */
   };
 
 /* Table of diversions 1 through INT_MAX.  */
@@ -85,13 +85,25 @@ static int total_buffer_size;
    maintained for the `divnum' builtin function.  */
 int current_diversion;
 
-/* Current output diversion, NULL if output is being currently discarded.  */
+/* Current output diversion, NULL if output is being currently
+   discarded.  output_diversion->u is guaranteed non-NULL except when
+   the diversion has never been used; use size to determine if it is a
+   malloc'd buffer or a FILE.  output_diversion->used is 0 if u.file
+   is stdout, and non-zero if this is a malloc'd buffer or a temporary
+   diversion file.  */
 static m4_diversion *output_diversion;
 
-/* Values of some output_diversion fields, cached out for speed.  */
-static FILE *output_file;      /* current value of (file) */
-static char *output_cursor;    /* current value of (buffer + used) */
-static int output_unused;      /* current value of (size - used) */
+/* Cache of output_diversion->u.file, only valid when
+   output_diversion->size is 0.  */
+static FILE *output_file;
+
+/* Cache of output_diversion->u.buffer + output_diversion->used, only
+   valid when output_diversion->size is non-zero.  */
+static char *output_cursor;
+
+/* Cache of output_diversion->size - output_diversion->used, only
+   valid when output_diversion->size is non-zero.  */
+static int output_unused;
 
 /* Number of input line we are generating output for.  */
 int output_current_line;
@@ -101,9 +113,7 @@ static m4_temp_dir *output_temp_dir;
 
 
 
-/*------------------------.
-| Output initialization.  |
-`------------------------*/
+/* Internal routines.  */
 
 /* Callback for comparing list elements ELT1 and ELT2 for order in
    diversion_table.  */
@@ -127,28 +137,6 @@ threshold_diversion_CB (const void *elt, const void 
*threshold)
   return diversion->divnum >= *(const int *) threshold;
 }
 
-void
-output_init (void)
-{
-  diversion_table = gl_oset_create_empty (GL_AVLTREE_OSET, cmp_diversion_CB,
-                                         NULL);
-  div0.u.file = stdout;
-  output_diversion = &div0;
-  output_file = stdout;
-  obstack_init (&diversion_storage);
-}
-
-void
-output_exit (void)
-{
-  /* Order is important, since we may have registered cleanup_tmpfile
-     as an atexit handler, and it must not traverse stale memory.  */
-  gl_oset_t table = diversion_table;
-  diversion_table = NULL;
-  gl_oset_free (table);
-  obstack_free (&diversion_storage, NULL);
-}
-
 /* Clean up any temporary directory.  Designed for use as an atexit
    handler, where it is not safe to call exit() recursively; so this
    calls _exit if a problem is encountered.  */
@@ -196,6 +184,7 @@ m4_tmpname (int divnum)
       buffer = (char *) obstack_alloc (&diversion_storage,
                                       INT_BUFSIZE_BOUND (divnum));
     }
+  assert (0 < divnum);
   if (snprintf (&buffer[offset], INT_BUFSIZE_BOUND (divnum), "%d", divnum) < 0)
     m4_error (EXIT_FAILURE, errno, NULL,
              _("cannot create temporary file for diversion"));
@@ -205,9 +194,10 @@ m4_tmpname (int divnum)
 /* Create a temporary file for diversion DIVNUM open for reading and
    writing in a secure temp directory.  The file will be automatically
    closed and deleted on a fatal signal.  The file can be closed and
-   reopened with m4_tmpclose and m4_tmpopen; when finally done with
-   the file, close it with m4_tmpremove.  Exits on failure, so the
-   return value is always an open file.  */
+   reopened with m4_tmpclose and m4_tmpopen, or moved with
+   m4_tmprename; when finally done with the file, close it with
+   m4_tmpremove.  Exits on failure, so the return value is always an
+   open file.  */
 static FILE *
 m4_tmpfile (int divnum)
 {
@@ -273,6 +263,51 @@ m4_tmpremove (int divnum)
   return cleanup_temp_file (output_temp_dir, m4_tmpname (divnum));
 }
 
+/* Transfer the temporary file for diversion OLDNUM to the previously
+   unused diversion NEWNUM.  Return an open stream visiting the new
+   temporary file, exiting on failure.  */
+static FILE*
+m4_tmprename (int oldnum, int newnum)
+{
+  /* m4_tmpname reuses its return buffer.  */
+  char *oldname = xstrdup (m4_tmpname (oldnum));
+  const char *newname = m4_tmpname (newnum);
+  register_temp_file (output_temp_dir, newname);
+  if (rename (oldname, newname))
+    m4_error (EXIT_FAILURE, errno, NULL,
+             _("cannot create temporary file for diversion"));
+  unregister_temp_file (output_temp_dir, oldname);
+  free (oldname);
+  return m4_tmpopen (newnum);
+}
+
+
+/*------------------------.
+| Output initialization.  |
+`------------------------*/
+
+void
+output_init (void)
+{
+  diversion_table = gl_oset_create_empty (GL_AVLTREE_OSET, cmp_diversion_CB,
+                                         NULL);
+  div0.u.file = stdout;
+  output_diversion = &div0;
+  output_file = stdout;
+  obstack_init (&diversion_storage);
+}
+
+void
+output_exit (void)
+{
+  /* Order is important, since we may have registered cleanup_tmpfile
+     as an atexit handler, and it must not traverse stale memory.  */
+  gl_oset_t table = diversion_table;
+  diversion_table = NULL;
+  gl_oset_free (table);
+  obstack_free (&diversion_storage, NULL);
+}
+
 /*-----------------------------------------------------------------------.
 | Reorganize in-memory diversion buffers so the current diversion can   |
 | accomodate LENGTH more characters without further reorganization.  The |
@@ -381,8 +416,12 @@ make_room_for (int length)
        }
 
       /* The current buffer may be safely reallocated.  */
-      output_diversion->u.buffer = xrealloc (output_diversion->u.buffer,
-                                            (size_t) wanted_size);
+      {
+       char *buffer = output_diversion->u.buffer;
+       output_diversion->u.buffer = xcharalloc ((size_t) wanted_size);
+       memcpy (output_diversion->u.buffer, buffer, output_diversion->used);
+       free (buffer);
+      }
 
       total_buffer_size += wanted_size - output_diversion->size;
       output_diversion->size = wanted_size;
@@ -613,10 +652,10 @@ make_diversion (int divnum)
     {
       if (!output_diversion->size && !output_diversion->u.file)
        {
+         assert (!output_diversion->used);
          if (!gl_oset_remove (diversion_table, output_diversion))
            assert (false);
          output_diversion->u.next = free_list;
-         output_diversion->used = 0;
          free_list = output_diversion;
        }
       else if (output_diversion->size)
@@ -732,7 +771,37 @@ insert_diversion_helper (m4_diversion *diversion)
   if (output_diversion)
     {
       if (diversion->size)
-       output_text (diversion->u.buffer, diversion->used);
+       {
+         if (!output_diversion->u.file)
+           {
+             /* Transferring diversion metadata is faster than
+                copying contents.  */
+             assert (!output_diversion->used && output_diversion != &div0
+                     && !output_file);
+             output_diversion->u.buffer = diversion->u.buffer;
+             output_diversion->size = diversion->size;
+             output_cursor = diversion->u.buffer + diversion->used;
+             output_unused = diversion->size - diversion->used;
+             diversion->u.buffer = NULL;
+           }
+         else
+           {
+             output_text (diversion->u.buffer, diversion->used);
+           }
+       }
+      else if (!output_diversion->u.file)
+       {
+         /* Transferring diversion metadata is faster than copying
+            contents.  */
+         assert (!output_diversion->used && output_diversion != &div0
+                 && !output_file);
+         output_diversion->u.file = m4_tmprename (diversion->divnum,
+                                                  output_diversion->divnum);
+         output_diversion->used = 1;
+         output_file = output_diversion->u.file;
+         diversion->u.file = NULL;
+         diversion->size = 1;
+       }
       else
        {
          if (!diversion->u.file)
-- 
1.6.0.4


>From 17609d1929f7c5677f4d856b0d0c4fbd389eed83 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 9 Dec 2008 21:23:44 -0700
Subject: [PATCH] Correctly track size of in-memory diversions.

* src/output.c (insert_diversion_helper): Correctly track total
in-memory diversion size after undivert.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog    |    4 ++++
 src/output.c |    6 ++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 668fa7d..4df7b6e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
 2008-12-09  Eric Blake  <address@hidden>
 
+       Correctly track size of in-memory diversions.
+       * src/output.c (insert_diversion_helper): Correctly track total
+       in-memory diversion size after undivert.
+
        Avoid quadratic behavior for some cases of divert/undivert.
        * src/output.c (struct m4_diversion): Improve comments.
        (m4_tmpname, make_diversion): Strengthen preconditions.
diff --git a/src/output.c b/src/output.c
index 0d18832..32cc258 100644
--- a/src/output.c
+++ b/src/output.c
@@ -786,6 +786,10 @@ insert_diversion_helper (m4_diversion *diversion)
            }
          else
            {
+             /* Avoid double-charging the total in-memory size when
+                transferring from one in-memory diversion to
+                another.  */
+             total_buffer_size -= diversion->size;
              output_text (diversion->u.buffer, diversion->used);
            }
        }
@@ -815,6 +819,8 @@ insert_diversion_helper (m4_diversion *diversion)
   /* Return all space used by the diversion.  */
   if (diversion->size)
     {
+      if (!output_diversion)
+       total_buffer_size -= diversion->size;
       free (diversion->u.buffer);
       diversion->size = 0;
       diversion->used = 0;
-- 
1.6.0.4


>From 2e23797f3dd0ed378480b823831cc27f227c06a7 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 9 Dec 2008 21:36:32 -0700
Subject: [PATCH] Cache most recently spilled diversion.

* src/output.c (tmp_file, tmp_file_owner): New variables, for
1-deep cache of spilled diversions.
(m4_tmpfile): Open in append mode, since we might revisit
diversion without closing it now.
(m4_tmpopen): Check cache first.
(m4_tmpclose): Update cache, rather than closing.  Add parameter.
(m4_tmpremove): Close cache before removing.
(m4_tmprename): Deal with open files when renaming.
(output_exit): Close cache before exiting.
(make_room_for, make_diversion, insert_diversion_helper): Adjust
callers.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog    |   13 +++++++++++
 src/output.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4df7b6e..f8f66e3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,18 @@
 2008-12-09  Eric Blake  <address@hidden>
 
+       Cache most recently spilled diversion.
+       * src/output.c (tmp_file, tmp_file_owner): New variables, for
+       1-deep cache of spilled diversions.
+       (m4_tmpfile): Open in append mode, since we might revisit
+       diversion without closing it now.
+       (m4_tmpopen): Check cache first.
+       (m4_tmpclose): Update cache, rather than closing.  Add parameter.
+       (m4_tmpremove): Close cache before removing.
+       (m4_tmprename): Deal with open files when renaming.
+       (output_exit): Close cache before exiting.
+       (make_room_for, make_diversion, insert_diversion_helper): Adjust
+       callers.
+
        Correctly track size of in-memory diversions.
        * src/output.c (insert_diversion_helper): Correctly track total
        in-memory diversion size after undivert.
diff --git a/src/output.c b/src/output.c
index 32cc258..26818db 100644
--- a/src/output.c
+++ b/src/output.c
@@ -111,6 +111,11 @@ int output_current_line;
 /* Temporary directory holding all spilled diversion files.  */
 static m4_temp_dir *output_temp_dir;
 
+/* Cache of most recently used spilled diversion file.  */
+static FILE *tmp_file;
+
+/* Diversion that owns tmp_file, or 0.  */
+static int tmp_file_owner;
 
 
 /* Internal routines.  */
@@ -214,7 +219,7 @@ m4_tmpfile (int divnum)
     }
   name = m4_tmpname (divnum);
   register_temp_file (output_temp_dir, name);
-  file = fopen_temp (name, O_BINARY ? "wb+" : "w+");
+  file = fopen_temp (name, O_BINARY ? "ab+" : "a+");
   if (file == NULL)
     {
       unregister_temp_file (output_temp_dir, name);
@@ -232,9 +237,17 @@ m4_tmpfile (int divnum)
 static FILE *
 m4_tmpopen (int divnum)
 {
-  const char *name = m4_tmpname (divnum);
+  const char *name;
   FILE *file;
 
+  if (tmp_file_owner == divnum)
+    {
+      if (fseeko (tmp_file, 0, SEEK_SET) != 0)
+       m4_error (EXIT_FAILURE, errno, NULL,
+                 _("cannot seek to beginning of diversion"));
+      return tmp_file;
+    }
+  name = m4_tmpname (divnum);
   file = fopen_temp (name, O_BINARY ? "ab+" : "a+");
   if (file == NULL)
     m4_error (EXIT_FAILURE, errno, NULL,
@@ -249,17 +262,36 @@ m4_tmpopen (int divnum)
   return file;
 }
 
-/* Close, but don't delete, a temporary FILE.  */
+/* Close, but don't delete, a temporary FILE for diversion DIVNUM.  To
+   reduce the I/O overhead of repeatedly opening and closing the same
+   file, this implementation caches the most recent spilled diversion.
+   On the other hand, keeping every spilled diversion open would run
+   into EMFILE limits.  */
 static int
-m4_tmpclose (FILE *file)
+m4_tmpclose (FILE *file, int divnum)
 {
-  return close_stream_temp (file);
+  int result = 0;
+  if (divnum != tmp_file_owner)
+    {
+      if (tmp_file_owner)
+       result = close_stream_temp (tmp_file);
+      tmp_file = file;
+      tmp_file_owner = divnum;
+    }
+  return result;
 }
 
 /* Delete a closed temporary FILE for diversion DIVNUM.  */
 static int
 m4_tmpremove (int divnum)
 {
+  if (divnum == tmp_file_owner)
+    {
+      int result = close_stream_temp (tmp_file);
+      if (result)
+       return result;
+      tmp_file_owner = 0;
+    }
   return cleanup_temp_file (output_temp_dir, m4_tmpname (divnum));
 }
 
@@ -273,6 +305,21 @@ m4_tmprename (int oldnum, int newnum)
   char *oldname = xstrdup (m4_tmpname (oldnum));
   const char *newname = m4_tmpname (newnum);
   register_temp_file (output_temp_dir, newname);
+  if (oldnum == tmp_file_owner)
+    {
+      /* Be careful of mingw, which can't rename an open file.  */
+      if (1/*RENAME_OPEN_FILES_WORKS*/)
+       tmp_file_owner = newnum;
+      else
+       {
+         if (close_stream_temp (tmp_file))
+           m4_error (EXIT_FAILURE, errno, NULL,
+                     _("cannot close temporary file for diversion"));
+         tmp_file_owner = 0;
+       }
+    }
+  /* Either it is safe to rename an open file, or no one should have
+     oldname open at this point.  */
   if (rename (oldname, newname))
     m4_error (EXIT_FAILURE, errno, NULL,
              _("cannot create temporary file for diversion"));
@@ -303,6 +350,8 @@ output_exit (void)
   /* Order is important, since we may have registered cleanup_tmpfile
      as an atexit handler, and it must not traverse stale memory.  */
   gl_oset_t table = diversion_table;
+  if (tmp_file_owner)
+    m4_tmpremove (tmp_file_owner);
   diversion_table = NULL;
   gl_oset_free (table);
   obstack_free (&diversion_storage, NULL);
@@ -410,7 +459,7 @@ make_room_for (int length)
        {
          FILE *file = selected_diversion->u.file;
          selected_diversion->u.file = NULL;
-         if (m4_tmpclose (file) != 0)
+         if (m4_tmpclose (file, selected_diversion->divnum) != 0)
            m4_warn (errno, NULL,
                     _("cannot close temporary file for diversion"));
        }
@@ -664,7 +713,7 @@ make_diversion (int divnum)
        {
          FILE *file = output_diversion->u.file;
          output_diversion->u.file = NULL;
-         if (m4_tmpclose (file) != 0)
+         if (m4_tmpclose (file, output_diversion->divnum) != 0)
            m4_warn (errno, NULL,
                     _("cannot close temporary file for diversion"));
        }
@@ -832,7 +881,7 @@ insert_diversion_helper (m4_diversion *diversion)
          FILE *file = diversion->u.file;
          diversion->u.file = NULL;
          diversion->used = 0;
-         if (m4_tmpclose (file) != 0)
+         if (m4_tmpclose (file, diversion->divnum) != 0)
            m4_warn (errno, NULL,
                     _("cannot clean temporary file for diversion"));
        }
-- 
1.6.0.4


>From a33293ae4fc5315b97026a4b536d61c7c2837cc2 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Wed, 10 Dec 2008 06:15:53 -0700
Subject: [PATCH] Use fewer seeks on cached files.

* src/output.c (m4_tmpfile): Use write, not append mode.
(m4_tmpopen): Add parameter to decide when to skip seeks.
(m4_tmprename, make_diversion, insert_diversion_helper)
(freeze_diversions): Adjust callers.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog    |    6 ++++++
 src/output.c |   39 +++++++++++++++++++++------------------
 2 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f8f66e3..e45a394 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2008-12-09  Eric Blake  <address@hidden>
 
+       Use fewer seeks on cached files.
+       * src/output.c (m4_tmpfile): Use write, not append mode.
+       (m4_tmpopen): Add parameter to decide when to skip seeks.
+       (m4_tmprename, make_diversion, insert_diversion_helper)
+       (freeze_diversions): Adjust callers.
+
        Cache most recently spilled diversion.
        * src/output.c (tmp_file, tmp_file_owner): New variables, for
        1-deep cache of spilled diversions.
diff --git a/src/output.c b/src/output.c
index 26818db..f3ee06b 100644
--- a/src/output.c
+++ b/src/output.c
@@ -219,7 +219,7 @@ m4_tmpfile (int divnum)
     }
   name = m4_tmpname (divnum);
   register_temp_file (output_temp_dir, name);
-  file = fopen_temp (name, O_BINARY ? "ab+" : "a+");
+  file = fopen_temp (name, O_BINARY ? "wb+" : "w+");
   if (file == NULL)
     {
       unregister_temp_file (output_temp_dir, name);
@@ -232,33 +232,36 @@ m4_tmpfile (int divnum)
 }
 
 /* Reopen a temporary file for diversion DIVNUM for reading and
-   writing in a secure temp directory.  Exits on failure, so the
-   return value is always an open file.  */
+   writing in a secure temp directory.  If REREAD, the file is
+   positioned at offset 0, otherwise the file is posistioned at the
+   end.  Exits on failure, so the return value is always an open
+   file.  */
 static FILE *
-m4_tmpopen (int divnum)
+m4_tmpopen (int divnum, bool reread)
 {
   const char *name;
   FILE *file;
 
   if (tmp_file_owner == divnum)
     {
-      if (fseeko (tmp_file, 0, SEEK_SET) != 0)
+      if (reread && fseeko (tmp_file, 0, SEEK_SET) != 0)
        m4_error (EXIT_FAILURE, errno, NULL,
-                 _("cannot seek to beginning of diversion"));
+                 _("cannot seek within diversion"));
       return tmp_file;
     }
   name = m4_tmpname (divnum);
-  file = fopen_temp (name, O_BINARY ? "ab+" : "a+");
+  /* We need update mode, to avoid truncation.  */
+  file = fopen_temp (name, O_BINARY ? "rb+" : "r+");
   if (file == NULL)
     m4_error (EXIT_FAILURE, errno, NULL,
              _("cannot create temporary file for diversion"));
   else if (set_cloexec_flag (fileno (file), true) != 0)
     m4_warn (errno, NULL, _("cannot protect diversion across forks"));
-  /* POSIX states that it is undefined whether an append stream starts
-     at offset 0 or at the end.  We want the beginning.  */
-  else if (fseeko (file, 0, SEEK_SET) != 0)
+  /* Update mode starts at the beginning of the stream, but sometimes
+     we want the end.  */
+  else if (!reread && fseeko (file, 0, SEEK_END) != 0)
     m4_error (EXIT_FAILURE, errno, NULL,
-             _("cannot seek to beginning of diversion"));
+             _("cannot seek within diversion"));
   return file;
 }
 
@@ -297,7 +300,7 @@ m4_tmpremove (int divnum)
 
 /* Transfer the temporary file for diversion OLDNUM to the previously
    unused diversion NEWNUM.  Return an open stream visiting the new
-   temporary file, exiting on failure.  */
+   temporary file for further writes, exiting on failure.  */
 static FILE*
 m4_tmprename (int oldnum, int newnum)
 {
@@ -325,7 +328,7 @@ m4_tmprename (int oldnum, int newnum)
              _("cannot create temporary file for diversion"));
   unregister_temp_file (output_temp_dir, oldname);
   free (oldname);
-  return m4_tmpopen (newnum);
+  return m4_tmpopen (newnum, false);
 }
 
 
@@ -770,7 +773,8 @@ make_diversion (int divnum)
   else
     {
       if (!output_diversion->u.file && output_diversion->used)
-       output_diversion->u.file = m4_tmpopen (output_diversion->divnum);
+       output_diversion->u.file = m4_tmpopen (output_diversion->divnum,
+                                              false);
       output_file = output_diversion->u.file;
     }
   output_current_line = -1;
@@ -858,7 +862,7 @@ insert_diversion_helper (m4_diversion *diversion)
       else
        {
          if (!diversion->u.file)
-           diversion->u.file = m4_tmpopen (diversion->divnum);
+           diversion->u.file = m4_tmpopen (diversion->divnum, true);
          insert_file (diversion->u.file);
        }
 
@@ -872,7 +876,6 @@ insert_diversion_helper (m4_diversion *diversion)
        total_buffer_size -= diversion->size;
       free (diversion->u.buffer);
       diversion->size = 0;
-      diversion->used = 0;
     }
   else
     {
@@ -880,7 +883,6 @@ insert_diversion_helper (m4_diversion *diversion)
        {
          FILE *file = diversion->u.file;
          diversion->u.file = NULL;
-         diversion->used = 0;
          if (m4_tmpclose (file, diversion->divnum) != 0)
            m4_warn (errno, NULL,
                     _("cannot clean temporary file for diversion"));
@@ -888,6 +890,7 @@ insert_diversion_helper (m4_diversion *diversion)
       if (m4_tmpremove (diversion->divnum) != 0)
        m4_warn (errno, NULL, _("cannot clean temporary file for diversion"));
     }
+  diversion->used = 0;
   gl_oset_remove (diversion_table, diversion);
   diversion->u.next = free_list;
   free_list = diversion;
@@ -964,7 +967,7 @@ freeze_diversions (FILE *file)
          else
            {
              struct stat file_stat;
-             diversion->u.file = m4_tmpopen (diversion->divnum);
+             diversion->u.file = m4_tmpopen (diversion->divnum, true);
              if (fstat (fileno (diversion->u.file), &file_stat) < 0)
                m4_error (EXIT_FAILURE, errno, NULL,
                          _("cannot stat diversion"));
-- 
1.6.0.4


>From 5f33d666c5f545f451ced5ddc21f01cba2b05cfe Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Wed, 10 Dec 2008 07:16:51 -0700
Subject: [PATCH] Double size of temp file cache.

* src/output.c (tmp_file, tmp_file_owner): Split...
(tmp_file1, tmp_file2, tmp_file1_owner, tmp_file2_owner): ...into
two variables.
(tmp_file2_recent): New variable.
(m4_tmpopen, m4_tmpclose, m4_tmpremove, m4_tmprename)
(output_exit): Adjust callers.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog    |   10 ++++++
 src/output.c |   88 ++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 77 insertions(+), 21 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index e45a394..aa7e232 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2008-12-10  Eric Blake  <address@hidden>
+
+       Double size of temp file cache.
+       * src/output.c (tmp_file, tmp_file_owner): Split...
+       (tmp_file1, tmp_file2, tmp_file1_owner, tmp_file2_owner): ...into
+       two variables.
+       (tmp_file2_recent): New variable.
+       (m4_tmpopen, m4_tmpclose, m4_tmpremove, m4_tmprename)
+       (output_exit): Adjust callers.
+
 2008-12-09  Eric Blake  <address@hidden>
 
        Use fewer seeks on cached files.
diff --git a/src/output.c b/src/output.c
index f3ee06b..f7301df 100644
--- a/src/output.c
+++ b/src/output.c
@@ -111,11 +111,16 @@ int output_current_line;
 /* Temporary directory holding all spilled diversion files.  */
 static m4_temp_dir *output_temp_dir;
 
-/* Cache of most recently used spilled diversion file.  */
-static FILE *tmp_file;
+/* Cache of most recently used spilled diversion files.  */
+static FILE *tmp_file1;
+static FILE *tmp_file2;
 
-/* Diversion that owns tmp_file, or 0.  */
-static int tmp_file_owner;
+/* Diversions that owns tmp_file, or 0.  */
+static int tmp_file1_owner;
+static int tmp_file2_owner;
+
+/* True if tmp_file2 is more recently used.  */
+static bool tmp_file2_recent;
 
 
 /* Internal routines.  */
@@ -242,12 +247,21 @@ m4_tmpopen (int divnum, bool reread)
   const char *name;
   FILE *file;
 
-  if (tmp_file_owner == divnum)
+  if (tmp_file1_owner == divnum)
+    {
+      if (reread && fseeko (tmp_file1, 0, SEEK_SET) != 0)
+       m4_error (EXIT_FAILURE, errno, NULL,
+                 _("cannot seek within diversion"));
+      tmp_file2_recent = false;
+      return tmp_file1;
+    }
+  else if (tmp_file2_owner == divnum)
     {
-      if (reread && fseeko (tmp_file, 0, SEEK_SET) != 0)
+      if (reread && fseeko (tmp_file2, 0, SEEK_SET) != 0)
        m4_error (EXIT_FAILURE, errno, NULL,
                  _("cannot seek within diversion"));
-      return tmp_file;
+      tmp_file2_recent = true;
+      return tmp_file2;
     }
   name = m4_tmpname (divnum);
   /* We need update mode, to avoid truncation.  */
@@ -274,12 +288,22 @@ static int
 m4_tmpclose (FILE *file, int divnum)
 {
   int result = 0;
-  if (divnum != tmp_file_owner)
+  if (divnum != tmp_file1_owner && divnum != tmp_file2_owner)
     {
-      if (tmp_file_owner)
-       result = close_stream_temp (tmp_file);
-      tmp_file = file;
-      tmp_file_owner = divnum;
+      if (tmp_file2_recent)
+       {
+         if (tmp_file1_owner)
+           result = close_stream_temp (tmp_file1);
+         tmp_file1 = file;
+         tmp_file1_owner = divnum;
+       }
+      else
+       {
+         if (tmp_file2_owner)
+           result = close_stream_temp (tmp_file2);
+         tmp_file2 = file;
+         tmp_file2_owner = divnum;
+       }
     }
   return result;
 }
@@ -288,12 +312,19 @@ m4_tmpclose (FILE *file, int divnum)
 static int
 m4_tmpremove (int divnum)
 {
-  if (divnum == tmp_file_owner)
+  if (divnum == tmp_file1_owner)
     {
-      int result = close_stream_temp (tmp_file);
+      int result = close_stream_temp (tmp_file1);
       if (result)
        return result;
-      tmp_file_owner = 0;
+      tmp_file1_owner = 0;
+    }
+  else if (divnum == tmp_file2_owner)
+    {
+      int result = close_stream_temp (tmp_file2);
+      if (result)
+       return result;
+      tmp_file2_owner = 0;
     }
   return cleanup_temp_file (output_temp_dir, m4_tmpname (divnum));
 }
@@ -308,17 +339,30 @@ m4_tmprename (int oldnum, int newnum)
   char *oldname = xstrdup (m4_tmpname (oldnum));
   const char *newname = m4_tmpname (newnum);
   register_temp_file (output_temp_dir, newname);
-  if (oldnum == tmp_file_owner)
+  if (oldnum == tmp_file1_owner)
+    {
+      /* Be careful of mingw, which can't rename an open file.  */
+      if (1/*RENAME_OPEN_FILES_WORKS*/)
+       tmp_file1_owner = newnum;
+      else
+       {
+         if (close_stream_temp (tmp_file1))
+           m4_error (EXIT_FAILURE, errno, NULL,
+                     _("cannot close temporary file for diversion"));
+         tmp_file1_owner = 0;
+       }
+    }
+  else if (oldnum == tmp_file2_owner)
     {
       /* Be careful of mingw, which can't rename an open file.  */
       if (1/*RENAME_OPEN_FILES_WORKS*/)
-       tmp_file_owner = newnum;
+       tmp_file2_owner = newnum;
       else
        {
-         if (close_stream_temp (tmp_file))
+         if (close_stream_temp (tmp_file2))
            m4_error (EXIT_FAILURE, errno, NULL,
                      _("cannot close temporary file for diversion"));
-         tmp_file_owner = 0;
+         tmp_file2_owner = 0;
        }
     }
   /* Either it is safe to rename an open file, or no one should have
@@ -353,8 +397,10 @@ output_exit (void)
   /* Order is important, since we may have registered cleanup_tmpfile
      as an atexit handler, and it must not traverse stale memory.  */
   gl_oset_t table = diversion_table;
-  if (tmp_file_owner)
-    m4_tmpremove (tmp_file_owner);
+  if (tmp_file1_owner)
+    m4_tmpremove (tmp_file1_owner);
+  if (tmp_file2_owner)
+    m4_tmpremove (tmp_file2_owner);
   diversion_table = NULL;
   gl_oset_free (table);
   obstack_free (&diversion_storage, NULL);
-- 
1.6.0.4


reply via email to

[Prev in Thread] Current Thread [Next in Thread]