m4-patches

branch-1_4 read stdin twice


From: Eric Blake
Subject: branch-1_4 read stdin twice
Date: Thu, 7 Sep 2006 22:46:51 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Until this patch, we could not do "m4 - file -" and get similar behavior to 
coreutils' "cat - file -".  This also has ramifications when doing:
$ echo 'changequote([,])m4wrap([syscmd(cat)])'|m4
since cat's behavior differs depending on whether stdin is open or closed.  I'm 
adding a test accordingly.
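
To illustrate why it matters whether stdin is open or closed at the time wrapped text runs: a child started via system() inherits the parent's descriptor table, so if descriptor 0 was already closed, the child's reads fail instead of seeing a valid (possibly empty) stream.  The following is only a sketch; fd_is_open is a hypothetical helper for illustration and is not part of m4.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper, not part of m4: return 1 if FD refers to an
   open descriptor in this process, 0 if it is closed.  fcntl with
   F_GETFD fails with EBADF on a closed descriptor.  */
int
fd_is_open (int fd)
{
  return fcntl (fd, F_GETFD) != -1 || errno != EBADF;
}
```

A child such as cat would find descriptor 0 in the same state the parent left it, which is why the patch defers closing stdin until after wrapped text is processed.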

I think we still have a bug that this patch does not address.  For example, on 
Solaris, where fread() buffers 8k of data:
$ cat > file
define(foo,bar)
syscmd(`cat')
divert(-1)
... <4500 dots> ...
divert
foo
$ m4 < file



bar
$ sed -e '/^\./s/./../g' < file > file1 # grow to 9000 dots
$ m4 < file1

.....
divert
foo
$

Oops - we correctly passed our own stdin to cat (as documented in our manual), 
but did not first set our file position, even though the file is seekable.  So 
cat started reading where our buffer ended, not at the character where we left 
off processing, and the buffer size determined whether m4 or cat processed the 
final foo in the file.  POSIX requires that a process that terminates without 
consuming all of stdin restore the file position, but it is silent on what the 
file position must be when invoking a child process.  For consistency, I would 
rather have [e]syscmd start with stdin at the point where we would next 
process.  I may find a good patch for that later.
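
One possible shape for such a fix, as a sketch only and not the patch I have in mind: before spawning the child, make the descriptor's offset agree with the position stdio has logically consumed.  This assumes a platform where fflush() on a seekable input stream follows POSIX.1-2008 (it discards the read-ahead and repositions the descriptor); the names sync_input_offset and run_child_at_current_position are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of m4: make the offset of FP's
   descriptor match the logical stdio position, so a child that
   inherits the descriptor starts where the parent would read next.
   Returns 0 on success or when FP is not seekable, -1 on error.  */
int
sync_input_offset (FILE *fp)
{
  off_t pos = ftello (fp);
  if (pos == (off_t) -1)
    return 0;			/* pipe or terminal: nothing to sync */
  /* Assumption: POSIX.1-2008 fflush semantics on an input stream.  */
  if (fflush (fp) != 0)
    return -1;
  /* Set the offset explicitly as well; this is a no-op when fflush
     already repositioned the descriptor.  */
  if (lseek (fileno (fp), pos, SEEK_SET) == (off_t) -1)
    return -1;
  return 0;
}

/* Hypothetical wrapper: run CMD with stdin positioned at the next
   unprocessed character.  */
int
run_child_at_current_position (const char *cmd)
{
  if (sync_input_offset (stdin) != 0)
    return -1;
  fflush (stdout);
  return system (cmd);
}
```

On platforms where fflush() on an input stream is undefined, the read-ahead would still be stale after the child returns, so a real fix needs more care than this.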

I also noticed that my patch for Debian bug 385720 introduced a regression: 
when stdin was a terminal, m4 called getc() a second time after detecting EOF, 
forcing the user to type two ^D sequences instead of one.  I don't know how to 
add a test for this in the 1.4.x testsuite; maybe I'll come up with something 
when porting this patch to head.
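
The idea behind the fix, reduced to a standalone sketch: remember that a peek already saw EOF, and answer later reads from that flag rather than calling getc() again.  The struct and function names here are simplified stand-ins for illustration, not the actual code in src/input.c.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for a file input block.  */
struct input_file
{
  FILE *file;			/* input file handle */
  bool end;			/* true once EOF has been seen */
};

/* Look at the next character without consuming it.  */
int
peek_char (struct input_file *in)
{
  int ch;

  if (in->end)
    return EOF;
  ch = getc (in->file);
  if (ch != EOF)
    {
      ungetc (ch, in->file);
      return ch;
    }
  in->end = true;		/* latch EOF for later reads */
  return EOF;
}

/* Consume the next character.  Once EOF is latched, do not call
   getc again; on a terminal that would wait for a second ^D.  */
int
next_char (struct input_file *in)
{
  int ch;

  if (in->end)
    return EOF;
  ch = getc (in->file);
  if (ch == EOF)
    in->end = true;
  return ch;
}
```

With the flag latched, a terminal user types ^D once: the peek consumes it, and the following read reports EOF without touching the stream.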

2006-09-07  Eric Blake  <address@hidden>

        * m4/gnulib-cache.m4: Update to newer gnulib-tool.
        * src/m4.h (push_file): Change prototype.
        * src/input.c (push_file, peek_input, next_char_1): Only call getc
        once at EOF, to avoid double ^D on terminal stdin; regression from
        2006-09-04.
        (push_file, pop_file): Allow reading stdin twice.
        * src/m4.c (main): Likewise.
        * src/builtin.c (include): Update caller.
        * NEWS: Document this change.
        * doc/m4.texinfo (Invoking m4, Incompatibilities): Likewise.
        (Syscmd): Add a test that failed before this patch.

Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.59
diff -u -r1.1.1.1.2.59 NEWS
--- NEWS        6 Sep 2006 03:58:05 -0000       1.1.1.1.2.59
+++ NEWS        7 Sep 2006 22:29:15 -0000
@@ -7,7 +7,11 @@
 * Fix regression from 1.4.5 in handling a file that ends in a macro
   expansion without arguments instead of a newline.
 * The define and pushdef macros now warn when the first argument is not
-  a string.
+  a string, rather than silently doing nothing.
+* Standard input can now be read more than once, as in 'm4 - file -', and
+  is not closed until all wrapped text is handled.  This makes a
+  difference when stdin is not a regular file, and also fixes bugs when
+  using the syscmd or esyscmd macros from wrapped text.
 
 Version 1.4.6 - 25 August 2006, by Eric Blake  (CVS version 1.4.5a)
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.73
diff -u -r1.1.1.1.2.73 m4.texinfo
--- doc/m4.texinfo      6 Sep 2006 03:58:05 -0000       1.1.1.1.2.73
+++ doc/m4.texinfo      7 Sep 2006 22:29:15 -0000
@@ -588,10 +588,12 @@
 name of @file{-} is taken to mean the standard input.  It is
 conventional, but not required, for input files to end in @samp{.m4}.
 
-The input files are read in the sequence given.  The standard input can
-only be read once, so the file name @file{-} should only appear once on
-the command line.  It is an error if an input file ends in the middle of
-argument collection, a comment, or a quoted string.
+The input files are read in the sequence given.  Standard input can be
+read more than once, so the file name @file{-} may appear multiple times
+on the command line; this makes a difference when input is from a
+terminal or other special file type.  It is an error if an input file
+ends in the middle of argument collection, a comment, or a quoted
+string.
 
 If none of the input files invoked @code{m4exit} (@pxref{M4exit}), the
 exit status of @code{m4} will be 0 for success, 1 for general failure
@@ -4109,6 +4111,32 @@
 Note how the expansion of @code{syscmd} keeps the trailing newline of
 the command, as well as using the newline that appeared after the macro.
 
+As an example of @var{shell-command} using the same standard input as
+@code{m4}, the command line @kbd{echo "m4wrap(\`syscmd(\`cat')')" | m4}
+will tell @code{m4} to read all of its input before executing the
+wrapped text, then hand a valid (albeit emptied) pipe as standard input
+for the @code{cat} subcommand.  Therefore, you should be careful when
+using standard input (either by specifying no files, or by passing
+@samp{-} as a file name on the command line, @pxref{Invoking m4}), and
+also invoking subcommands via @code{syscmd} or @code{esyscmd} that
+consume data from standard input.
+
+@ignore
+@comment If the user types the example below with stdin being an
+@comment interactive terminal, then cat will hang waiting for additional
+@comment input after m4 has exited.  But the testsuite is using a pipe
+@comment for stdin.  Hence, we have two versions - the one we feed the
+@comment testsuite below, and the one we display to the user above that
+@comment more accurately shows what the testsuite is really doing but
+@comment which the testsuite cannot parse.
+
+@example
+m4wrap(`syscmd(`cat')')
+@result{}
+^D
+@end example
+@end ignore
+
 @node Esyscmd
 @section Reading the output of commands
 
@@ -4751,6 +4779,11 @@
 @code{m4exit} (@pxref{M4exit}) with a non-numeric argument).
 
 @item
+Some traditional implementations only allow reading standard input
+once, but @acronym{GNU} @code{m4} correctly handles multiple instances
+of @samp{-} on the command line.
+
+@item
 @acronym{POSIX} requires @code{m4wrap} (@pxref{M4wrap}) to act in FIFO
 (first-in, first-out) order, but @acronym{GNU} @code{m4} currently uses
 LIFO order.  Furthermore, @acronym{POSIX} states that only the first
Index: m4/gnulib-cache.m4
===================================================================
RCS file: /sources/m4/m4/m4/Attic/gnulib-cache.m4,v
retrieving revision 1.1.2.14
diff -u -r1.1.2.14 gnulib-cache.m4
--- m4/gnulib-cache.m4  22 Aug 2006 21:36:34 -0000      1.1.2.14
+++ m4/gnulib-cache.m4  7 Sep 2006 22:29:15 -0000
@@ -18,6 +18,7 @@
 #   gnulib-tool --import --dir=. --lib=libm4 --source-base=lib --m4-base=m4 --doc-base=doc --aux-dir=. --no-libtool --macro-prefix=M4 binary-io cloexec close-stream error fdl fopen-safer free gendocs getopt gnupload mkstemp obstack regex stdlib-safer strtol tmpfile-safer unlocked-io verror xalloc xvasprintf
 
 # Specification in the form of a few gnulib-tool.m4 macro invocations:
+gl_LOCAL_DIR([])
 gl_MODULES([binary-io cloexec close-stream error fdl fopen-safer free gendocs getopt gnupload mkstemp obstack regex stdlib-safer strtol tmpfile-safer unlocked-io verror xalloc xvasprintf])
 gl_AVOID([])
 gl_SOURCE_BASE([lib])
@@ -25,4 +26,5 @@
 gl_DOC_BASE([doc])
 gl_TESTS_BASE([tests])
 gl_LIB([libm4])
+gl_MAKEFILE_NAME([])
 gl_MACRO_PREFIX([M4])
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.38
diff -u -r1.1.1.1.2.38 builtin.c
--- src/builtin.c       6 Sep 2006 03:58:05 -0000       1.1.1.1.2.38
+++ src/builtin.c       7 Sep 2006 22:29:15 -0000
@@ -1175,7 +1175,7 @@
       return;
     }
 
-  push_file (fp, name);
+  push_file (fp, name, TRUE);
   free ((char *) name);
 }
 
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.22
diff -u -r1.1.1.1.2.22 input.c
--- src/input.c 4 Sep 2006 13:35:10 -0000       1.1.1.1.2.22
+++ src/input.c 7 Sep 2006 22:29:15 -0000
@@ -82,10 +82,11 @@
       struct
        {
          FILE *file;           /* input file handle */
+         boolean end;          /* true if peek has seen EOF */
+         boolean close;        /* true if we should close file on pop */
          const char *name;     /* name of PREVIOUS input file */
-         int lineno;           /* current line number for do */
-         /* Yet another attack of "The curse of global variables" (sic) */
-         int out_lineno;       /* current output line number do */
+         int lineno;           /* current line of previous file */
+         int out_lineno;       /* current output line of previous file */
          boolean advance_line; /* start_of_input_line from next_char () */
        }
       u_f;
@@ -156,14 +157,16 @@
 #endif
 
 
-/*-------------------------------------------------------------------------.
-| push_file () pushes an input file on the input stack, saving the current |
-| file name and line number.  If next is non-NULL, this push invalidates a |
-| call to push_string_init (), whose storage are consequentely released.   |
-`-------------------------------------------------------------------------*/
+/*-------------------------------------------------------------------.
+| push_file () pushes an input file on the input stack, saving the   |
+| current file name and line number.  If next is non-NULL, this push |
+| invalidates a call to push_string_init (), whose storage is        |
+| consequently released.  If CLOSE, then close FP after EOF is       |
+| detected.                                                          |
+`-------------------------------------------------------------------*/
 
 void
-push_file (FILE *fp, const char *title)
+push_file (FILE *fp, const char *title, boolean close)
 {
   input_block *i;
 
@@ -180,6 +183,8 @@
                                     sizeof (struct input_block));
   i->type = INPUT_FILE;
 
+  i->u.u_f.end = FALSE;
+  i->u.u_f.close = close;
   i->u.u_f.name = current_file;
   i->u.u_f.lineno = current_line;
   i->u.u_f.out_lineno = output_current_line;
@@ -325,7 +330,7 @@
          fclose (isp->u.u_f.file);
          retcode = EXIT_FAILURE;
        }
-      else if (fclose (isp->u.u_f.file) == EOF)
+      else if (isp->u.u_f.close && fclose (isp->u.u_f.file) == EOF)
        {
          M4ERROR ((warning_status, errno, "error reading file"));
          retcode = EXIT_FAILURE;
@@ -445,6 +450,7 @@
              ungetc (ch, isp->u.u_f.file);
              return ch;
            }
+         isp->u.u_f.end = TRUE;
          break;
 
        case INPUT_MACRO:
@@ -504,7 +510,10 @@
          break;
 
        case INPUT_FILE:
-         ch = getc (isp->u.u_f.file);
+         /* If stdin is a terminal, calling getc after peek_input
+            already called it would make the user have to hit ^D
+            twice to quit.  */
+         ch = isp->u.u_f.end ? EOF : getc (isp->u.u_f.file);
          if (ch != EOF)
            {
              if (ch == '\n')
Index: src/m4.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/m4.c,v
retrieving revision 1.1.1.1.2.27
diff -u -r1.1.1.1.2.27 m4.c
--- src/m4.c    8 Aug 2006 23:17:44 -0000       1.1.1.1.2.27
+++ src/m4.c    7 Sep 2006 22:29:15 -0000
@@ -96,7 +96,7 @@
   va_list args;
   va_start (args, format);
   verror_at_line (status, errnum, current_line ? current_file : NULL,
-                  current_line, format, args);
+                 current_line, format, args);
 }
 
 /*-------------------------------.
@@ -105,7 +105,7 @@
 
 void
 m4_error_at_line (int status, int errnum, const char *file, int line,
-                  const char *format, ...)
+                 const char *format, ...)
 {
   va_list args;
   va_start (args, format);
@@ -279,6 +279,7 @@
 
   macro_definition *defines;
   FILE *fp;
+  boolean read_stdin = FALSE;
 
   program_name = argv[0];
   retcode = EXIT_SUCCESS;
@@ -488,14 +489,24 @@
 
   if (optind == argc)
     {
-      push_file (stdin, "stdin");
+      /* No point closing stdin until after wrapped text is
+        processed.  */
+      push_file (stdin, "stdin", FALSE);
+      read_stdin = TRUE;
       expand_input ();
     }
   else
     for (; optind < argc; optind++)
       {
        if (strcmp (argv[optind], "-") == 0)
-         push_file (stdin, "stdin");
+         {
+           /* If stdin is a terminal, we want to allow 'm4 - file -'
+              to read input from stdin twice, like GNU cat.  Besides,
+              there is no point closing stdin before wrapped text, to
+              minimize bugs in syscmd called from wrapped text.  */
+           push_file (stdin, "stdin", FALSE);
+           read_stdin = TRUE;
+         }
        else
          {
            const char *name;
@@ -508,7 +519,7 @@
                retcode = EXIT_FAILURE;
                continue;
              }
-           push_file (fp, name);
+           push_file (fp, name, TRUE);
            free ((char *) name);
          }
        expand_input ();
@@ -520,9 +531,15 @@
   while (pop_wrapup ())
     expand_input ();
 
-  /* Change debug stream back to stderr, to force flushing debug stream and
-     detect any errors it might have encountered.  */
+  /* Change debug stream back to stderr, to force flushing the debug
+     stream and detect any errors it might have encountered.  Close
+     stdin if we read from it, to detect any errors.  */
   debug_set_output (NULL);
+  if (read_stdin && fclose (stdin) == EOF)
+    {
+      M4ERROR ((warning_status, errno, "error reading file"));
+      retcode = EXIT_FAILURE;
+    }
 
   if (frozen_file_to_write)
     produce_frozen_state (frozen_file_to_write);
Index: src/m4.h
===================================================================
RCS file: /sources/m4/m4/src/m4.h,v
retrieving revision 1.1.1.1.2.28
diff -u -r1.1.1.1.2.28 m4.h
--- src/m4.h    18 Aug 2006 23:11:36 -0000      1.1.1.1.2.28
+++ src/m4.h    7 Sep 2006 22:29:15 -0000
@@ -288,7 +288,7 @@
 void skip_line (void);
 
 /* push back input */
-void push_file (FILE *, const char *);
+void push_file (FILE *, const char *, boolean);
 void push_macro (builtin_func *);
 struct obstack *push_string_init (void);
 const char *push_string_finish (void);






