m4-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: branch-1_4 read stdin twice


From: Eric Blake
Subject: Re: branch-1_4 read stdin twice
Date: Mon, 11 Sep 2006 07:15:22 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Thunderbird/1.5.0.5 Mnenhy/0.7.4.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 9/7/2006 4:46 PM:
> Oops - we correctly passed our own stdin to cat (as documented in our 
> manual), 
> but did not first set our file position, even though file is seekable.  So 
> cat 
> was starting from where our buffer ended, as opposed to the character we left 
> off processing at, and the buffer size determined whether m4 or cat processed 
> the final foo in the file.  While POSIX requires that processes that 
> terminate 
> without consuming all of stdin restore the file position, it is silent on 
> what 
> the file position is when invoking a child process; but for consistency, I 
> would rather have [e]syscmd start with stdin in the same place as we would 
> next 
> process.  I may find a good patch for that later.
> 

And here it is.  This patch even has the side-effect of working around
cygwin's bug in fclose(stdin), in that cygwin does not (yet) reset the
position of the underlying fd to the next character that would be read
(running the POSIX-specified example of "(sed -n 1q; cat) < file", using
the same GNU version of sed and cat, on both Solaris and cygwin, shows the
cygwin bug).  However, the patch is not in effect for mingw - there, fseek
is documented to not work on text files (which I validated in tests), so
the only safe way to use fseek is to do all I/O in binary mode, and I'm
not sure what the impact would be in switching the 1.4.x branch to all
binary mode.  However, for CVS head, I am comfortable making mingw switch
to all binary, along with the caveat that so far I've never tried to
target mingw with CVS head yet.

2006-09-11  Eric Blake  <address@hidden>

        * src/Makefile.am (m4_LDADD): Add any gnulib dependent libraries.
        * src/debug.c (debug_flush_files) [UNIX]: Flush stdin if it is
        seekable.
        (debug_set_file): Use STDOUT_FILENO.
        * src/builtin.c (m4_m4exit): Flush stdin before exiting, to comply
        with POSIX in regards to unread input.
        * NEWS: Document this fix.
        * doc/m4.texinfo (Syscmd, Esyscmd, M4exit): Likewise.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFBWFp84KuGfSFAYARAmUaAKCfUbdIHcLqLJOVNteevfCBsENMqwCgqTU0
qN89n1wAA+qouDIpVmVjv5E=
=5wQO
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.60
diff -u -p -r1.1.1.1.2.60 NEWS
--- NEWS        7 Sep 2006 22:48:15 -0000       1.1.1.1.2.60
+++ NEWS        11 Sep 2006 13:04:14 -0000
@@ -12,6 +12,9 @@ Version 1.4.7 - ?? ??? 2006, by ??  (CVS
   is not closed until all wrapped text is handled.  This makes a
   difference when stdin is not a regular file, and also fixes bugs when
   using the syscmd or esyscmd macros from wrapped text.
+* When standard input is a seekable file, the m4exit, syscmd, and esyscmd
+  macros now restore the current position to the next unread byte rather
+  than discarding an arbitrary amount of buffered data.
 
 Version 1.4.6 - 25 August 2006, by Eric Blake  (CVS version 1.4.5a)
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.74
diff -u -p -r1.1.1.1.2.74 m4.texinfo
--- doc/m4.texinfo      7 Sep 2006 22:48:15 -0000       1.1.1.1.2.74
+++ doc/m4.texinfo      11 Sep 2006 13:04:15 -0000
@@ -4093,7 +4093,7 @@ The expansion of @code{syscmd} is void, 
 are not read by @code{m4}.  @xref{Esyscmd}, if you need to process the
 command output.
 
-Prior to executing the command, @code{m4} flushes its output buffers.
+Prior to executing the command, @code{m4} flushes its buffers.
 The default standard input, output and error of @var{shell-command} are
 the same as those of @code{m4}.
 
@@ -4119,7 +4119,11 @@ for the @code{cat} subcommand.  Therefor
 using standard input (either by specifying no files, or by passing
 @samp{-} as a file name on the command line, @pxref{Invoking m4}), and
 also invoking subcommands via @code{syscmd} or @code{esyscmd} that
-consume data from standard input.
+consume data from standard input.  When standard input is a seekable
+file, the subprocess will pick up with the next character not yet
+processed by @code{m4}; when it is a pipe or other non-seekable file,
+there is no guarantee how much data will already be buffered by
address@hidden and thus unavailable to the child.
 
 @ignore
 @comment If the user types the example below with stdin being an
@@ -4148,7 +4152,7 @@ If you want @code{m4} to read the output
 Expands to the standard output of the shell command
 @var{shell-command}.
 
-Prior to executing the command, @code{m4} flushes its output buffers.
+Prior to executing the command, @code{m4} flushes its buffers.
 The default standard input and error output of @var{shell-command} are
 the same as those of @code{m4}.  The error output of @var{shell-command}
 is not a part of the expansion: it will appear along with the error
@@ -4168,6 +4172,9 @@ esyscmd(`echo foo')
 Note how the expansion of @code{esyscmd} keeps the trailing newline of
 the command, as well as using the newline that appeared after the macro.
 
+Just as with @code{syscmd}, care must be exercised when sharing standard
+input between @code{m4} and the child process of @code{esyscmd}.
+
 @node Sysval
 @section Exit status
 
@@ -4452,6 +4459,11 @@ what was requested by @code{m4exit}.  If
 error, such as a write error on standard out, the exit status will be
 non-zero even if @code{m4exit} requested zero.
 
+If standard input is seekable, then the file will be positioned at the
+next unread character.  If it is a pipe or other non-seekable file,
+then there are no guarantees how much data @code{m4} might have read
+into buffers, and thus discarded.
+
 @node Frozen files
 @chapter Fast loading of frozen state
 
Index: src/Makefile.am
===================================================================
RCS file: /sources/m4/m4/src/Attic/Makefile.am,v
retrieving revision 1.18.2.2
diff -u -p -r1.18.2.2 Makefile.am
--- src/Makefile.am     3 Jul 2006 17:01:07 -0000       1.18.2.2
+++ src/Makefile.am     11 Sep 2006 13:04:15 -0000
@@ -26,4 +26,4 @@ macro.c output.c path.c symtab.c
 if STACKOVF
 m4_SOURCES += stackovf.c
 endif
-m4_LDADD = ../lib/libm4.a
+m4_LDADD = ../lib/libm4.a $(LIBM4_LIBDEPS)
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.39
diff -u -p -r1.1.1.1.2.39 builtin.c
--- src/builtin.c       7 Sep 2006 22:48:15 -0000       1.1.1.1.2.39
+++ src/builtin.c       11 Sep 2006 13:04:15 -0000
@@ -1291,15 +1291,16 @@ m4_m4exit (struct obstack *obs, int argc
                "exit status out of range: `%d'", exit_code));
       exit_code = EXIT_FAILURE;
     }
+  /* Change debug stream back to stderr, to force flushing debug stream and
+     detect any errors it might have encountered.  */
+  debug_set_output (NULL);
+  debug_flush_files ();
   if (close_stream (stdout) != 0)
     {
       M4ERROR ((warning_status, errno, "write error"));
       if (exit_code == 0)
        exit_code = EXIT_FAILURE;
     }
-  /* Change debug stream back to stderr, to force flushing debug stream and
-     detect any errors it might have encountered.  */
-  debug_set_output (NULL);
   if (exit_code == 0 && retcode != 0)
     exit_code = retcode;
   exit (exit_code);
Index: src/debug.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/debug.c,v
retrieving revision 1.1.1.1.2.9
diff -u -p -r1.1.1.1.2.9 debug.c
--- src/debug.c 23 Aug 2006 11:30:11 -0000      1.1.1.1.2.9
+++ src/debug.c 11 Sep 2006 13:04:15 -0000
@@ -142,7 +142,7 @@ debug_set_file (FILE *fp)
 
   if (debug != NULL && debug != stdout)
     {
-      if (fstat (fileno (stdout), &stdout_stat) < 0)
+      if (fstat (STDOUT_FILENO, &stdout_stat) < 0)
        return;
       if (fstat (fileno (debug), &debug_stat) < 0)
        return;
@@ -175,6 +175,27 @@ debug_flush_files (void)
   fflush (stderr);
   if (debug != NULL && debug != stdout && debug != stderr)
     fflush (debug);
+  /* POSIX requires that if m4 doesn't consume all input, but stdin is
+     opened on a seekable file, that the file pointer be left at the
+     next character on exit (but places no restrictions on the file
+     pointer location on a non-seekable file).  It also requires that
+     fflush() followed by fseek() on an input file set the underlying
+     file pointer.  However, fflush() on a non-seekable file can lose
+     buffered data, which we might otherwise want to process after
+     syscmd.  Hence, we must check whether stdin is seekable.  We must
+     also be tolerant of operating with stdin closed, so we don't
+     report any failures in this attempt.  The stdio-safer module and
+     friends are essential, so that if stdin was closed, this lseek is
+     not on some other file that we have since opened.  Mingw has bugs
+     when using fseek on text files, so we only strive for POSIX
+     behavior when we detect a UNIX environment.  */
+#if UNIX
+  if (lseek (STDIN_FILENO, 0, SEEK_CUR) >= 0
+      && fflush (stdin) == 0)
+    {
+      fseek (stdin, 0, SEEK_CUR);
+    }
+#endif /* UNIX */
 }
 
 /*-------------------------------------------------------------------------.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]