coreutils

Re: env: add -S option (split string for shebang lines in scripts)


From: Assaf Gordon
Subject: Re: env: add -S option (split string for shebang lines in scripts)
Date: Fri, 27 Apr 2018 16:58:36 -0600
User-agent: NeoMutt/20170113 (1.7.2)

Hello Eric, Bernhard,

Thank you for commenting, you raise many good points.
Below are some ideas regarding them (combining replies to the last 4 emails).

On Fri, Apr 27, 2018 at 12:31 AM, Bernhard Voelker <address@hidden> wrote:
> One nit: env -v shows a confusing error diagnostic when it is
> separated from the -S option on the shebang line:
>
>   $ cat xxx
>   #!src/env -v -S cat -n
>   hello
>
>   $ ./xxx
>   src/env: invalid option -- ' '
>   Try 'src/env --help' for more information.

Agree, very confusing and unhelpful message.

For comparison, FreeBSD behaves the same:

  $ cat xxx
  #!/usr/bin/env -v -S cat -n
  hello
  $ ./xxx
  env: illegal option --
  usage: env [-iv] [-P utilpath] [-S string] [-u name]
             [name=value ...] [utility [argument ...]]


But of course we can and should do better.


On Fri, Apr 27, 2018 at 7:13 AM, Eric Blake <address@hidden> wrote:
> We could include ' ' (and maybe '\t') as part of the short-option
> optstring accepted in getopt_long(), as an undocumented silent no-op.

On Fri, Apr 27, 2018 at 8:22 AM, Eric Blake <address@hidden> wrote:
> I tested it, and it DOES seem to work:
[...]
>        switch (optc)
>          {
> +        case ' ':
> +        case '\t':
> +        case '-':
> +          /* Undocumented no-ops, to allow '-v -S' to behave like '-vS' */
> +          break;

Good solution, thanks for testing it.


I wonder - would it be better to detect this issue
and report an informative error message instead of silently accepting it?

If GNU env accepts it, we create yet another (very subtle) difference
between FreeBSD and GNU.
If we reject it and explain why, we create a better user experience,
but also promote portable scripting...
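
A minimal sketch of the "reject and explain" alternative, for comparison
with the silent no-op above. The helper name shebang_option_hint and the
wording of the message are assumptions made for illustration; neither
comes from the patch. The idea is that the getopt loop would call it for
an unrecognized option character before printing the generic diagnostic.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not code from the patch):
   return an explanatory hint if OPTC looks like whitespace that leaked
   into the option string from a shebang line such as
   "#!/usr/bin/env -v -S cat -n", or NULL for any other character.  */
static const char *
shebang_option_hint (int optc)
{
  switch (optc)
    {
    case ' ':
    case '\t':
      return "whitespace found in options; in a shebang line, "
             "combine short options (e.g. '-vS') so that -S comes last";
    default:
      return NULL;
    }
}
```

When the hint is non-NULL, env could print it instead of
"invalid option -- ' '" and then exit with the usual usage error.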



On Fri, Apr 27, 2018 at 12:57 PM, Eric Blake <address@hidden> wrote:
> This is missing support for -P, which is one of the essential features
> of FreeBSD env, per their man page:

I can certainly add support for "-P" (I'll do it in a separate patch though).

Is "-P" (alternate path) something that is often requested?
I do see a lot of questions about passing multiple arguments with
"#!/usr/bin/env", but I haven't noticed people asking about setting a
per-script non-standard $PATH (without changing the actual $PATH).


On Fri, Apr 27, 2018 at 02:42:13PM -0500, Eric Blake wrote:
> Question - do we want to use 'S::' instead of 'S:' in the optstring?
> Right now, your patches made -S take a mandatory argument, making:
> 
> #!/usr/bin/env -S
> 
> try to parse the script name as a string to be split (and unless the
> script name has unusual characters, this results in an infloop of
> treating the script name as the interpreter, for another round of trying
> to exec the same command line).

For reference, the same behaviour (infloop) happens on FreeBSD:

  $ cat yyy
  #!/usr/bin/env -vS

  $ ./yyy 2>&1 | head
  #env executing: ./yyy
  #env    arg[0]= './yyy'
  #env executing: ./yyy
  #env    arg[0]= './yyy'
  #env executing: ./yyy
  #env    arg[0]= './yyy'
  #env executing: ./yyy
  #env    arg[0]= './yyy'
  #env executing: ./yyy
  #env    arg[0]= './yyy'

And indeed, if the file name contains recognizable escape sequences,
FreeBSD's env also processes them ('\_' is the escape sequence for a space):

  $ cat uname\\_-l
  #!/usr/bin/env -vS

  $ ./uname\\_-l 
  #env executing: ./uname
  #env    arg[0]= './uname'
  #env    arg[1]= '-l'
  env: ./uname: No such file or directory


Several more edge cases of this kind can probably be found.
Since FreeBSD has been living with them since 2005,
perhaps there's no need to over-engineer protection against them?

Otherwise, perhaps we can detect it and provide an informative error message?


> I'm also trying to think what happens if we want to support platforms
> where the OS splits strings passed to shebang.  (The BSD implementation
> didn't have to worry quite as much about their code being run on a
> different OS, like we do).  Consider:
> 
> #!/usr/bin/env -S interpreter 'arg with space'
> 
> where it already sees "-S", "interpreter", "'arg", "with", "space'",
> "script", "args..." as separate arguments.  If we use 'S:' to
> getopt_long, then "interpreter" will be subject to -S handling but
> nothing else will; if we use 'S::', then none of the subsequent
> arguments will be subject to -S handling (but then we have to revisit
> whether a NULL optarg would be treated as an error on a shebang line
> that ends in -S).  But either way, it would be nice if we could
> reconstruct "arg with space" as a single argument to hand to
> "interpreter", rather than three separate arguments where two of them
> include a lone "'".

So far, I have found only one operating system (MINIX3) that splits
shebang lines on spaces, and it ignores quotes (so quoting a space does
nothing).
Linux, BSDs, HURD, Cygwin - pass everything after the interpreter as one
parameter, including spaces.
AIX, Solaris - pass only the first word after the interpreter and ignore
the rest.
I haven't tested HP-UX (if someone with HP-UX access can test, it will be
much appreciated).

Technically:

  $ cat showargs.c
  #include <stdio.h>
  #include <stdlib.h>
  int main(int argc, char*argv[])
  {
    for (int i=0;i<argc;++i)
      printf("argv[%d] = '%s'\n",i,argv[i]);
    return 0;
  }

  $ cat 1
  #!showargs printf xx%sxx\n a b c

On Linux, *BSDs, HURD:

  $ ./1
  argv[0] = '/home/miles/showargs'
  argv[1] = 'printf xx%sxx\n a b c'
  argv[2] = './1'

On AIX, Solaris:

  $ ./1
  argv[0] = '/home/agn/showargs'
  argv[1] = 'printf'
  argv[2] = './1'

On MINIX3:

  $ ./1
  argv[0] = '/home/miles/showargs'
  argv[1] = 'printf'
  argv[2] = 'xx%sxx\n'
  argv[3] = 'a'
  argv[4] = 'b'
  argv[5] = 'c'
  argv[6] = './1'

And then on MINIX3 with quotes:

  $ cat 2
  #!/home/miles/showargs printf xx%sxx\n 'a b c'
  $ ./2
  argv[0] = '/home/miles/showargs'
  argv[1] = 'printf'
  argv[2] = 'xx%sxx\n'
  argv[3] = ''a'
  argv[4] = 'b'
  argv[5] = 'c''
  argv[6] = './2'


So for Linux, BSDs, HURD - no problem with splitting spaces.

For AIX, Solaris - anything after space is ignored, which is
why '\_' is an escape sequence for space.
I'll add an explicit note about this in the documentation.
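
To illustrate the '\_' escape mentioned above, here is a minimal sketch.
The helper name expand_space_escape is hypothetical and is not taken from
the patch or from FreeBSD's env; it only shows the substitution itself.
A real implementation would handle this during splitting (so that the
resulting space does not separate arguments) and would also process the
other escape sequences, which this sketch copies through unchanged.

```c
/* Hypothetical helper (an assumption, not code from the patch):
   rewrite every "\_" in S to a single space, in place.
   No other backslash sequence is interpreted here.  */
static void
expand_space_escape (char *s)
{
  char *out = s;

  while (*s)
    {
      if (s[0] == '\\' && s[1] == '_')
        {
          /* Two input characters collapse into one space.  */
          *out++ = ' ';
          s += 2;
        }
      else
        *out++ = *s++;
    }
  *out = '\0';
}
```

With this, a shebang argument written as a single word, such as
"arg\_with\_space", would reach the interpreter as "arg with space" even
on systems that pass only the first word.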

For MINIX3 - that's indeed an issue - but is it worth adding much complexity
to address it? I'm not sure (no offence to MINIX3 fans). Since MINIX3 natively
splits arguments, they have never needed the "env -S" hack anyhow.
Opinions welcome.


> I'm wondering if we need yet another magic environment variable for
> portably marking the demarcation between the arguments to -S and the
> script name, whether the script is run on a platform that hands -S a
> single string, or run on a platform that splits arguments, as in:
> 
> #!/usr/bin/env -S interpreter 'arg with  spaces' ${_ENV_END}

Given the above, is this marker still important in practice?
(if so, I recommend using a new escape sequence, e.g. "\q" instead
of an environment variable).




> Another question: Does the BSD implementation have any way to pass empty
> strings as explicit arguments?  The code you posted turns:
> 
> #!/usr/bin/env -S sh -c '' echo
> 
> into "sh" "-c" "echo" "script", which did NOT preserve the empty string.

Good catch!

Yes, FreeBSD does preserve empty strings:

  $ cat www
  #!/usr/bin/env -vS sh -c '' echo

  $ ./www
  #env executing: sh
  #env    arg[0]= 'sh'
  #env    arg[1]= '-c'
  #env    arg[2]= ''
  #env    arg[3]= 'echo'
  #env    arg[4]= './www'


Not preserving them is a bug in my patch and I'll fix it.
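
To make the empty-string case concrete, here is a minimal sketch of a
splitter that keeps '' as an argument. The function name
split_preserving_empty is hypothetical and this is not the code from the
patch; it handles only single quotes and ignores escapes and variable
expansion. The point it illustrates is that an argument is emitted
because a token was started, not because its length is non-zero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch (not the patch's code): split STR on unquoted
   spaces and tabs.  Single quotes group characters, possibly none,
   into one argument.  Stores malloc'd strings in ARGS and returns the
   count, or -1 on an unterminated quote, allocation failure, or more
   than MAX_ARGS arguments (earlier results are freed in that case).  */
static int
split_preserving_empty (const char *str, char **args, int max_args)
{
  int n = 0;
  const char *p = str;

  for (;;)
    {
      /* Skip separators between arguments.  */
      while (*p == ' ' || *p == '\t')
        p++;
      if (!*p)
        return n;

      /* A token starts here, so an argument will be emitted even if
         it turns out to be empty (as with '').  */
      char *buf = n < max_args ? malloc (strlen (p) + 1) : NULL;
      size_t len = 0;
      int ok = buf != NULL;

      while (ok && *p && *p != ' ' && *p != '\t')
        {
          if (*p == '\'')
            {
              p++;                      /* opening quote */
              while (*p && *p != '\'')
                buf[len++] = *p++;
              if (*p)
                p++;                    /* closing quote */
              else
                ok = 0;                 /* unterminated quote */
            }
          else
            buf[len++] = *p++;
        }

      if (!ok)
        {
          free (buf);
          while (n > 0)
            free (args[--n]);
          return -1;
        }
      buf[len] = '\0';
      args[n++] = buf;
    }
}
```

Given the string "sh -c '' echo", this yields four arguments, "sh", "-c",
"" and "echo", which matches the FreeBSD output shown above once the
script name is appended.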



regards,
 - assaf




