guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: open-process and related functions for MinGW Guile


From: Eli Zaretskii
Subject: Re: open-process and related functions for MinGW Guile
Date: Tue, 12 Aug 2014 22:44:03 +0300

> From: Mark H Weaver <address@hidden>
> Cc: address@hidden,  address@hidden
> Date: Tue, 12 Aug 2014 14:08:48 -0400
> 
> > +/* The funny conditional avoids a compiler warning in status:stop_sig.  */
> > +# define WIFSTOPPED(stat_val)  ((stat_val) == (stat_val) ? 0 : 0)
> 
> What is the warning?

That stat_val is unused (because the value is always zero).

> Would ((stat_val), 0) also fix it?

I'm not sure, but I will use it if it will.

> > +# define getuid()              (500) /* Local Administrator */
> > +# define getgid()              (513) /* None */
> > +# define setuid(u)             (0)
> > +# define setgid(g)             (0)
> 
> As I've said before, I'm not okay with making 'setuid', 'seteuid',
> 'setgid', or 'setegid' into no-ops.  They should at least raise an error
> when called on Windows, and maybe they should be undefined.

I'll respond to this in a separate thread.

> > +  /* If they ask for the Unix system shell, try to find it on PATH.  */
> > +  if (c_strcasecmp (program, "/bin/sh") == 0)
> > +    {
> > +      bin_sh_requested = 1;
> > +      program = "sh.exe";
> > +    }
> 
> Hmm.  I'm not so comfortable with such cleverness at this low level.

You mean, that we look for 'sh' not in '/bin', but along PATH?  What
else is reasonable? there's no standard '/bin' directory on Windows
systems, certainly not on every drive.  Without something like that,
we will never be able to support portably shell commands that require
a Unixy shell, because on Unix you _must_ use "/bin/sh" verbatim.

This is very important for running the test suite, since there are
quite a few commands there that explicitly invoke /bin/sh and require
Bourne shell functionality.

> I'd prefer to have it higher in the stack, perhaps in the Scheme code in
> (ice-9 popen) or maybe even in the application code.

I don't think this is practical: too many places to change.  And you
will have to keep an eye on it from now to eternity.  It's a losing
battle.

> > +  for (i = 0; extensions[i]; i++)
> > +    {
> > +      abs_namelen = SearchPath (path, program, extensions[i],
> > +                           MAX_PATH, abs_name, NULL);
> > +      if (0 < abs_namelen && abs_namelen <= MAX_PATH)      /* found! */
> > +   break;
> > +    }
> 
> This search is not done correctly.  If I ask to run program "foo", and
> "foo.cmd" comes early in the path and "foo.exe" comes late in the path,
> it should be "foo.cmd" that gets run, not "foo.exe".

No, the search order is exactly the one used by the Windows shell.  A
.exe program is invoked in preference to a .cmd batch file.  The only
deviation is that I've put .com last, whereas it is actually the
first.  But I believe a .com file nowadays is much more likely to be a
VMS command file than a Windows executable.

> > +  /* If they asked for /bin/sh and we didn't find it, fall back on the
> > +     default Windows shell.  */
> > +  if (abs_namelen <= 0 && bin_sh_requested)
> > +    {
> > +      const char *shell = getenv ("ComSpec");
> > +
> > +      if (!shell)
> > +   shell = "C:\\Windows\\system32\\cmd.exe";
> > +
> > +      *bin_sh_replaced = 1;
> > +      strcpy (abs_name, shell);
> > +      abs_namelen = strlen (abs_name);
> > +    }
> 
> Again, I'd prefer to put this at a higher level in the stack.

And I again think that this is impractical to place that burden on
higher levels.

> If the user specifically asks to run "/bin/sh", we should not
> silently substitute a different incompatible shell.

If the command is incompatible with cmd, it will fail anyway.  But if
it is a simple command, it is likely to work with cmd as well, so this
is better than always failing.  People who work on Unix will use
/bin/sh without much thought (what else can they do when they need a
shell?), so if we see it, it doesn't necessarily mean they must have a
Bourne shell.

> > +/* Concatenate command-line arguments in argv[] into a single
> > +   command-line string, while quoting arguments as needed.  The result
> > +   is malloc'ed.  */
> > +static char *
> > +prepare_cmdline (const char *cmd, const char * const *argv, int 
> > bin_sh_replaced)
> > +{
> > +  /* These characters should include anything that is special to _any_
> > +     program, including both Windows and Unixy shells, and the
> > +     widlcard expansion in startup code of a typical Windows app.  */
> > +  const char need_quotes[] = " \t#;\"\'*?[]&|<>(){}$`^";
> 
> Hmm.  This seems unlikely to be correct in the general case.  If the
> program being run is not a command shell, then I wouldn't expect it to
> process backslash escapes at all.

That's what happens on Unix.  But not on Windows, where the quotes are
handled by the startup code that processes the command-line string
into the argv[] array.  With a few exceptions (cmd.exe being one of
them), every Windows program processes quotes and wildcards the same.

> Is there any way to launch a standard C program on Windows where the
> individual 'argv' arguments are passed as separate strings, without
> having to combine them together with quoting and escapes?

No.  The low-level API to launch a program accepts a single string as
the command line.  It is processed into argv[] by the C library
startup code.

> If a standard C program is compiled using Mingw, precisely how is this
> combined string converted into that program's 'argv' argument?

By the startup code that is part of the C runtime.

> How can we ensure that those strings are passed reliably and
> losslessly, without relying on heuristics that only work some of the
> time?

This is not heuristics, this is the official documented way to quote
and escape quotes on the command line.  Working by those rules will
not cause any losses.

> > +  /* Are we constructing a command line for cmd.exe?  */
> > +  if (bin_sh_replaced)
> > +    cmd_exe_quoting = 1;
> > +  else
> > +    {
> > +      for (p = cmd + strlen (cmd);
> > +      p > cmd && p[-1] != '/' && p[-1] != '\\' && p[-1] != ':';
> > +      p--)
> > +   ;
> > +      if (c_strcasecmp (p, "cmd.exe") == 0
> > +     || c_strcasecmp (p, "cmd") == 0)
> > +   cmd_exe_quoting = 1;
> > +    }
> 
> If there's really no way to avoid quoting, then probably these
> heuristics should be moved into 'lookup_cmd', and 'lookup_cmd' should
> return the 'cmd_exe_quoting' flag instead of the 'bin_sh_replaced' flag,
> so that it will do the right thing if the higher levels ask for
> 'cmd.exe' directly.

The design is that lookup_cmd searches for the command, and the
quoting is done elsewhere.  I don't understand why it matters how
these tasks are subdivided between functions, but if it does, I submit
that the way I subdivided them is more modular, as each function does
a single well-defined job.

> > +      /* cmd.exe needs a different style of quoting: all the arguments
> > +    beyond the /c switch are enclosed in an extra pair of quotes,
> > +    and not otherwise quoted/escaped. */
> 
> Again, I think this should be handled at a higher level in the stack.
> It would be good to have primitives at the C level that don't include
> such unreliable heuristics.

Sorry, I don't understand: what heuristics?  The way cmd.exe handles
quotes is well documented, and the code implements those rules.
There's no heuristics here, anywhere.

> > +      /* Gnulib's 'pipe' opens the pipe in binary mode, but we don't
> > +    want to read text-mode input of subprocesses in binary more,
> > +    because then we will get the ^M (a.k.a. "CR") characters we
> > +    don't expect.  */
> > +      _setmode (c2p[0], _O_TEXT);
> > +    }
> 
> We should do the newline conversion elsewhere in Guile's I/O stack, so
> that it is possible to do binary I/O with subprocesses.

That's fine by me, but the default should still be text I/O.  Once
there is a way to propagate the text/binary mode to this level, a
condition can be added not to do the above.  IOW, this is something
for future changes; for now, the vast majority of pipes need text-mode
I/O, so leaving this at binary will break much more code than using
text I/O.

> > +  /* Posix requires to call the shell if execvp fails to invoke EXEC_FILE. 
> >  */
> > +  if (errno_save == ENOEXEC || errno_save == ENOENT)
> > +    {
> > +      const char *shell = getenv ("ComSpec");
> > +
> > +      if (!shell)
> > +   shell = "cmd.exe";
> > +
> > +      if (c_strcasecmp (exec_file, shell) != 0)
> > +   {
> > +     argv[0] = (char *)exec_file;
> > +     return start_child (shell, argv, reading, c2p, writing, p2c, errfd);
> > +   }
> > +    }
> 
> I think this fallback shouldn't be done.  We don't do it in the POSIX
> version of scm_open_process, and we shouldn't do it here either.

AFAIK, on Posix platforms this is done by the library or the kernel
implementation of execvp.

> What exactly does POSIX require?  Can you provide a reference?

See http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html.

Quote:

 In the cases where the other members of the exec family of functions
 would fail and set errno to [ENOEXEC], the execlp() and execvp()
 functions shall execute a command interpreter and the environment of
 the executed command shall be as if the process invoked the sh
 utility using execl() as follows:

 execl(<shell path>, arg0, file, arg1, ..., (char *)0);

 where <shell path> is an unspecified pathname for the sh utility,
 file is the process image file, and for execvp(), where arg0, arg1,
 and so on correspond to the values passed to execvp() in argv[0],
 argv[1], and so on.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]