Re: [Groff] Re: MSVC Port--Quoting Arguments for spawn and exec Functions


From: Keith Marshall
Subject: Re: [Groff] Re: MSVC Port--Quoting Arguments for spawn and exec Functions
Date: Sat, 17 Jan 2004 22:50:24 +0000

On Saturday 17 January 2004 8:42 am, Jeff Conrad wrote:
> Keith,
>
> > Is 'lp' implemented as a shell script in MKS?
>
> Actually, it's not implemented at all :-( ... my 'lp' script is a wrapper
> for Peter Lerup's PrintFile.

Ah.  I did something similar for MS-DOS, some years ago, but I implemented 
'lp' as a short C program, which then spawned a .bat script to do the actual 
printing;  that way I was better able to control the environment space, and 
so avoid 'Out of environment space' errors.

> > I fully agree that it is not a good idea to rely on undocumented M$
> > behaviour; who knows when they may decide that it is a bug, and fix it,
> > (without telling anyone, of course)!
>
> I'm shocked that you even could entertain such a thought ...

Well ... it was written rather with tongue in cheek, but I have seen 
instances in the past where M$ have dropped a partially documented feature 
without notice, (e.g. the get/set switchar service in MS-DOS v2.xx and 3.xx, 
which disappeared in v4.00; this was documented to some degree in some OEM 
versions of MS-DOS, but not in M$'s own documentation, or in IBM's PC-DOS).

> > I found a (rather long winded) description of how MSVC applications
> > parse their command lines at
>
> Thanks! This link includes a link to MSDN
> (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/_pluslang_Parsing_C.2b2b_.Command.2d.Line_Arguments.asp)
> that DOES document the behavior, so I think we're fairly safe.  The
> explanation even is in my MSVC documentation ...  RTFM ...  I had assumed
> that the parsing was done by CreateProcess(), and searched accordingly.
> Like troff, I forgot the proper hyphenation of assume ...
>
> > Using 'append_arg_quoted' looks like a reasonable solution, for the
> > "native" Windows builds, where we need to group arguments in the 'argv[]'
> > which will eventually get rebuilt by the MSVC startup code,
>
> The 'argv[]' is a bit of an illusion--it gets rebuilt into a single string
> even before it's passed to CreateProcess() (which doesn't accept an argv
> array).  At startup, the command string gets parsed again, and depending on
> embedded quotes, rebuilt(?) into something resembling the original 'argv[]'
> before it's passed to the child program ...

The argv[] I was referring to here was that rebuilt by the startup code in 
the MSVC runtime, which then gets passed to the 'main' function in the child 
process. As you correctly state, the OS just passes in a single string, but 
the runtime does parse this and build an argv[], before 'main' is called -- I 
*know* this to be the case, because I studied the gory details with CodeView 
many years ago, when I was trying to understand why single quotes were not 
handled properly.  (This was in the days before Windows came on the scene, 
but I don't think it has changed much to this day).
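
For illustration, the documented parsing rules can be sketched roughly as 
below.  This is only a sketch of the rules as described on the MSDN page, not 
the runtime's actual code; the name 'parse_cmdline_sketch' is hypothetical, 
and it deliberately ignores the special handling of the program name and of 
doubled quotes inside a quoted string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a command line into arguments, following the documented rules:
//   - unquoted spaces and tabs separate arguments
//   - a double quote toggles "quoted" mode and is not copied
//   - 2n backslashes + quote  -> n backslashes, quote toggles quoted mode
//   - 2n+1 backslashes + quote -> n backslashes and a literal quote
//   - backslashes not followed by a quote are copied literally
std::vector<std::string> parse_cmdline_sketch(const std::string &cmd)
{
  std::vector<std::string> argv;
  std::string cur;
  bool in_arg = false;     // true once the current argument has started
  bool in_quotes = false;  // true while inside a double-quoted region
  std::size_t i = 0;
  while (i < cmd.size()) {
    char c = cmd[i];
    if (c == '\\') {
      // count the whole run of backslashes before deciding what they mean
      std::size_t n = 0;
      while (i < cmd.size() && cmd[i] == '\\') {
        ++n;
        ++i;
      }
      if (i < cmd.size() && cmd[i] == '"') {
        cur.append(n / 2, '\\');
        if (n % 2)
          cur += '"';              // odd count: the quote is literal
        else
          in_quotes = !in_quotes;  // even count: the quote is a delimiter
        ++i;
      } else {
        cur.append(n, '\\');       // no quote follows: keep them all
      }
      in_arg = true;
    } else if (c == '"') {
      in_quotes = !in_quotes;
      in_arg = true;
      ++i;
    } else if ((c == ' ' || c == '\t') && !in_quotes) {
      if (in_arg) {
        argv.push_back(cur);
        cur.clear();
        in_arg = false;
      }
      ++i;
    } else {
      cur += c;
      in_arg = true;
      ++i;
    }
  }
  if (in_arg)
    argv.push_back(cur);
  return argv;
}
```

Under these rules, a command tail such as  "a b" c\d "e\"f" "g\\"  comes out 
as the four arguments  a b,  c\d,  e"f  and  g\  respectively.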

> > ... however, it needs careful consideration as to *when* to use it --
>
> I agree completely--I'd use it only within #ifdefs (perhaps you can help
> ensure that we have the right conditions), or, as Werner seems to prefer,
> implement APPEND_ARG as a macro with the appropriate definition for a given
> platform--the code usually looks cleaner that way.
>
> > We would also need to pay particular attention to *how* we use
> > 'append_arg_quoted' to group args -- basically *all* args which need to
> > be grouped need to be passed at once, in the same call
>
> Even within #ifdefs, I'd use the quoting very sparingly--for arguments that
> are likely to contain spaces AND for which those spaces are likely to cause
> problems.  As nearly as I can tell, the only situations that apply are
> filenames with spaces and spooler commands that may contain arguments.
>
> > ... -- perhaps we should consider using *two* functions, e.g.
> >
> >     void possible_command::append_arg_quoted_begin(const char *s,
> >                                                    const char *t)
> >     {
> >       args += '"';
> >       args += s;
> >       if (t)
> >         args += t;
> >       args += '\0';
> >     }
>
> The trailing NUL appended by append_arg_quoted actually serves as the
> argument separator, so we'd need to avoid it within an argument.  We could
> append a space, or perhaps better, append nothing and make the caller
> responsible for including the necessary space.
>
> Actually, I think we could handle this by building a string and then
> handing it to append_arg_quoted, as in
>
>     commands[SPOOL_INDEX].set_name(BSHELL);
>     commands[SPOOL_INDEX].append_arg(BSHELL_DASH_C);
>     Largs += '\0';
>     Largs = spooler + Largs;
>     commands[SPOOL_INDEX].APPEND_ARG(Largs.contents());
>
> It looks a bit weird to have both the actual function call and the macro,
> but I have no better ideas, though I suppose a QUOTED argument
> would be an alternative.
>
> I tend to prefer a single function, if possible, because doing so should
> require the least change to the program, making us less vulnerable to the
> law of unintended consequences.  If problems do arise because of the
> change, at least they should be easy to isolate.  Moreover, a single
> function would make it much easier to implement as a macro.  What do you
> think?
>
> I think we're close to a solution.  The trickiest quoting involves the 'X'
> option, because some arguments are handled several times.  I think I have
> it right, but unfortunately, I'm not running X, so I have no way of testing
> it.

Actually, having given this some more thought, perhaps the cleanest and 
simplest solution is to wrap *every* argument in double quotes.  Whatever is 
currently passed to 'append_arg' *must* already be correctly grouped for a 
UNIX style 'exec'.  Windows 'exec' or 'spawn' will simply concatenate all the 
args into a single string, with just a space between the individual 
arguments, but if we add double quotes in the right places, we can coerce the 
startup code in the child to rebuild argv[] just as we passed it to 'spawn'.  
If we do it properly, we shouldn't have to worry about embedded *spaces*; 
what we *do* need to pay attention to are embedded double quotes and 
backslashes!  Something like the following should work (I think) ...

#if defined(_MSC_VER) || defined(__MINGW32__)
void possible_command::append_arg_quoted(const char *s, const char *t)
{
  // escape embedded double quote characters in appended argument
  // in a fashion which is compatible with the parsing algorithm
  // employed by the MSVC runtime startup code
  //
  int backslashes = 0;
  while (*s)
  { if (*s == '\\')
      // just count backslashes when we find them
      // (they will be appended on a subsequent cycle)
      ++backslashes;
    else if (*s == '"')
    { // an embedded quote needs at least one backslash prepended
      // plus twice the number of any which immediately precede it
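      backslashes += backslashes + 1;
      // count down with an explicit test, so that the counter is left
      // at zero (not -1), ready for the next run of backslashes
      while (backslashes > 0)
      { args += '\\';
        --backslashes;
      }
      args += '"';
    }
    else
    { // any other character is simply appended
      // AFTER any immediately preceding backslashes
      while (backslashes > 0)
      { args += '\\';
        --backslashes;
      }
      args += *s;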
    }
    ++s;
  }
  // any backslashes counted at the end of the string
  // must now be appended, with the count doubled if this is
  // the end of the argument, or trailer begins with a double quote
  if ( ! (t && *t) || (*t == '"'))
    backslashes += backslashes;
  while (backslashes--)
    args += '\\';
}
#endif

void possible_command::append_arg(const char *s, const char *t)
{
#if defined(_MSC_VER) || defined(__MINGW32__)
  //
  // for native Windows builds
  // wrap the entire argument in double quotes
  // (escaping any embedded double quote characters)
  //
  args += '"';
  append_arg_quoted(s, t);
  if (t)
    append_arg_quoted(t, (const char *) 0);
  args += '"';
#else
  //
  // for POSIX compliant builds
  // simply append the specified argument "as is"
  //
  args += s;
  if (t)
    args += t;
#endif
  args += '\0';
}

(There may be other #if conditions to be considered, for other compilers, but 
the above should be ok for MSVC, MinGW and Cygwin.  Also, we may need to 
consider doing something similar in 'insert_arg').
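
To check the escaping rules in isolation, here is a minimal standalone sketch 
of the same idea, using std::string rather than groff's own string class.  The 
name 'quote_arg_sketch' is hypothetical, C++11 is assumed, and it handles one 
complete argument with no separate trailer, so it is not a drop-in replacement 
for the code above.

```cpp
#include <cstddef>
#include <string>

// Wrap one argument in double quotes, escaping it so that the MSVC
// startup code should rebuild exactly the original text:
//   - backslashes before an embedded quote are doubled, plus one more
//     backslash to escape the quote itself
//   - backslashes at the very end are doubled, because they end up
//     immediately before the closing quote
//   - all other backslashes are copied unchanged
std::string quote_arg_sketch(const std::string &arg)
{
  std::string out = "\"";
  std::size_t backslashes = 0;  // length of the pending backslash run
  for (char c : arg) {
    if (c == '\\') {
      ++backslashes;            // defer until we see what follows
      continue;
    }
    if (c == '"')
      out.append(2 * backslashes + 1, '\\');
    else
      out.append(backslashes, '\\');
    out += c;
    backslashes = 0;
  }
  out.append(2 * backslashes, '\\');  // run precedes the closing quote
  out += '"';
  return out;
}
```

With this sketch, the argument  C:\dir\  becomes  "C:\dir\\"  and the argument 
say "hi"  becomes  "say \"hi\"".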

Best regards,
Keith.
