[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: open-process and related functions for MinGW Guile
From: |
Eli Zaretskii |
Subject: |
Re: open-process and related functions for MinGW Guile |
Date: |
Tue, 12 Aug 2014 22:44:03 +0300 |
> From: Mark H Weaver <address@hidden>
> Cc: address@hidden, address@hidden
> Date: Tue, 12 Aug 2014 14:08:48 -0400
>
> > +/* The funny conditional avoids a compiler warning in status:stop_sig. */
> > +# define WIFSTOPPED(stat_val) ((stat_val) == (stat_val) ? 0 : 0)
>
> What is the warning?
That stat_val is unused (because the value is always zero).
> Would ((stat_val), 0) also fix it?
I'm not sure, but I will use it if it will.
> > +# define getuid() (500) /* Local Administrator */
> > +# define getgid() (513) /* None */
> > +# define setuid(u) (0)
> > +# define setgid(g) (0)
>
> As I've said before, I'm not okay with making 'setuid', 'seteuid',
> 'setgid', or 'setegid' into no-ops. They should at least raise an error
> when called on Windows, and maybe they should be undefined.
I'll respond to this in a separate thread.
> > + /* If they ask for the Unix system shell, try to find it on PATH. */
> > + if (c_strcasecmp (program, "/bin/sh") == 0)
> > + {
> > + bin_sh_requested = 1;
> > + program = "sh.exe";
> > + }
>
> Hmm. I'm not so comfortable with such cleverness at this low level.
You mean, that we look for 'sh' not in '/bin', but along PATH? What
else is reasonable? there's no standard '/bin' directory on Windows
systems, certainly not on every drive. Without something like that,
we will never be able to support portably shell commands that require
a Unixy shell, because on Unix you _must_ use "/bin/sh" verbatim.
This is very important for running the test suite, since there are
quite a few commands there that explicitly invoke /bin/sh and require
Bourne shell functionality.
> I'd prefer to have it higher in the stack, perhaps in the Scheme code in
> (ice-9 popen) or maybe even in the application code.
I don't think this is practical: too many places to change. And you
will have to keep an eye on it from now to eternity. It's a losing
battle.
> > + for (i = 0; extensions[i]; i++)
> > + {
> > + abs_namelen = SearchPath (path, program, extensions[i],
> > + MAX_PATH, abs_name, NULL);
> > + if (0 < abs_namelen && abs_namelen <= MAX_PATH) /* found! */
> > + break;
> > + }
>
> This search is not done correctly. If I ask to run program "foo", and
> "foo.cmd" comes early in the path and "foo.exe" comes late in the path,
> it should be "foo.cmd" that gets run, not "foo.exe".
No, the search order is exactly the one used by the Windows shell. A
.exe program is invoked in preference to a .cmd batch file. The only
deviation is that I've put .com last, whereas it is actually the
first. But I believe a .com file nowadays is much more likely to be a
VMS command file than a Windows executable.
> > + /* If they asked for /bin/sh and we didn't find it, fall back on the
> > + default Windows shell. */
> > + if (abs_namelen <= 0 && bin_sh_requested)
> > + {
> > + const char *shell = getenv ("ComSpec");
> > +
> > + if (!shell)
> > + shell = "C:\\Windows\\system32\\cmd.exe";
> > +
> > + *bin_sh_replaced = 1;
> > + strcpy (abs_name, shell);
> > + abs_namelen = strlen (abs_name);
> > + }
>
> Again, I'd prefer to put this at a higher level in the stack.
And I again think that this is impractical to place that burden on
higher levels.
> If the user specifically asks to run "/bin/sh", we should not
> silently substitute a different incompatible shell.
If the command is incompatible with cmd, it will fail anyway. But if
it is a simple command, it is likely to work with cmd as well, so this
is better than always failing. People who work on Unix will use
/bin/sh without much thought (what else can they do when they need a
shell?), so if we see it, it doesn't necessarily mean they must have a
Bourne shell.
> > +/* Concatenate command-line arguments in argv[] into a single
> > + command-line string, while quoting arguments as needed. The result
> > + is malloc'ed. */
> > +static char *
> > +prepare_cmdline (const char *cmd, const char * const *argv, int
> > bin_sh_replaced)
> > +{
> > + /* These characters should include anything that is special to _any_
> > + program, including both Windows and Unixy shells, and the
> > + widlcard expansion in startup code of a typical Windows app. */
> > + const char need_quotes[] = " \t#;\"\'*?[]&|<>(){}$`^";
>
> Hmm. This seems unlikely to be correct in the general case. If the
> program being run is not a command shell, then I wouldn't expect it to
> process backslash escapes at all.
That's what happens on Unix. But not on Windows, where the quotes are
handled by the startup code that processes the command-line string
into the argv[] array. With a few exceptions (cmd.exe being one of
them), every Windows program processes quotes and wildcards the same.
> Is there any way to launch a standard C program on Windows where the
> individual 'argv' arguments are passed as separate strings, without
> having to combine them together with quoting and escapes?
No. The low-level API to launch a program accepts a single string as
the command line. It is processed into argv[] by the C library
startup code.
> If a standard C program is compiled using Mingw, precisely how is this
> combined string converted into that program's 'argv' argument?
By the startup code that is part of the C runtime.
> How can we ensure that those strings are passed reliably and
> losslessly, without relying on heuristics that only work some of the
> time?
This is not heuristics, this is the official documented way to quote
and escape quotes on the command line. Working by those rules will
not cause any losses.
> > + /* Are we constructing a command line for cmd.exe? */
> > + if (bin_sh_replaced)
> > + cmd_exe_quoting = 1;
> > + else
> > + {
> > + for (p = cmd + strlen (cmd);
> > + p > cmd && p[-1] != '/' && p[-1] != '\\' && p[-1] != ':';
> > + p--)
> > + ;
> > + if (c_strcasecmp (p, "cmd.exe") == 0
> > + || c_strcasecmp (p, "cmd") == 0)
> > + cmd_exe_quoting = 1;
> > + }
>
> If there's really no way to avoid quoting, then probably these
> heuristics should be moved into 'lookup_cmd', and 'lookup_cmd' should
> return the 'cmd_exe_quoting' flag instead of the 'bin_sh_replaced' flag,
> so that it will do the right thing if the higher levels ask for
> 'cmd.exe' directly.
The design is that lookup_cmd searches for the command, and the
quoting is done elsewhere. I don't understand why it matters how
these tasks are subdivided between functions, but if it does, I submit
that the way I subdivided them is more modular, as each function does
a single well-defined job.
> > + /* cmd.exe needs a different style of quoting: all the arguments
> > + beyond the /c switch are enclosed in an extra pair of quotes,
> > + and not otherwise quoted/escaped. */
>
> Again, I think this should be handled at a higher level in the stack.
> It would be good to have primitives at the C level that don't include
> such unreliable heuristics.
Sorry, I don't understand: what heuristics? The way cmd.exe handles
quotes is well documented, and the code implements those rules.
There's no heuristics here, anywhere.
> > + /* Gnulib's 'pipe' opens the pipe in binary mode, but we don't
> > + want to read text-mode input of subprocesses in binary more,
> > + because then we will get the ^M (a.k.a. "CR") characters we
> > + don't expect. */
> > + _setmode (c2p[0], _O_TEXT);
> > + }
>
> We should do the newline conversion elsewhere in Guile's I/O stack, so
> that it is possible to do binary I/O with subprocesses.
That's fine by me, but the default should still be text I/O. Once
there is a way to propagate the text/binary mode to this level, a
condition can be added not to do the above. IOW, this is something
for future changes; for now, the vast majority of pipes need text-mode
I/O, so leaving this at binary will break much more code than using
text I/O.
> > + /* Posix requires to call the shell if execvp fails to invoke EXEC_FILE.
> > */
> > + if (errno_save == ENOEXEC || errno_save == ENOENT)
> > + {
> > + const char *shell = getenv ("ComSpec");
> > +
> > + if (!shell)
> > + shell = "cmd.exe";
> > +
> > + if (c_strcasecmp (exec_file, shell) != 0)
> > + {
> > + argv[0] = (char *)exec_file;
> > + return start_child (shell, argv, reading, c2p, writing, p2c, errfd);
> > + }
> > + }
>
> I think this fallback shouldn't be done. We don't do it in the POSIX
> version of scm_open_process, and we shouldn't do it here either.
AFAIK, on Posix platforms this is done by the library or the kernel
implementation of execvp.
> What exactly does POSIX require? Can you provide a reference?
See http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html.
Quote:
In the cases where the other members of the exec family of functions
would fail and set errno to [ENOEXEC], the execlp() and execvp()
functions shall execute a command interpreter and the environment of
the executed command shall be as if the process invoked the sh
utility using execl() as follows:
execl(<shell path>, arg0, file, arg1, ..., (char *)0);
where <shell path> is an unspecified pathname for the sh utility,
file is the process image file, and for execvp(), where arg0, arg1,
and so on correspond to the values passed to execvp() in argv[0],
argv[1], and so on.