From: Ole Tange
Subject: Re: GNU Parallel Bug Reports why is parallel invoking a shell **by default** and associated bugs
Date: Sun, 13 Nov 2016 19:15:38 +0100

On Wed, Nov 9, 2016 at 11:32 PM, Stephane Chazelas
<address@hidden> wrote:
> 2015-05-23 21:50:51 +0100, Stephane Chazelas:
> [...]
>> You intend it to run ["cmd", "1"], ["cmd", "2"]... in parallel,
:
> [...]
>> At the moment, depending on the shell (and it's not always clear
>> which one you'll get)
It should be much clearer now:
https://www.gnu.org/software/parallel/parallel_design.html#Which-shell-to-use
>> there are a few bugs.
>>
>> For instance with zsh:
>>
>> $ printf '=z\n' | PARALLEL_SHELL=zsh parallel 'printf "<%s>\n"'
>> zsh:1: z not found
>>
>> In zsh, a leading = is a globbing operator that is not currently
>> escaped by parallel.
>>
>> With csh/tcsh:
>>
>> $ printf 'a\nb\0' | PARALLEL_SHELL=tcsh parallel -0 'printf "<%s>\n"'
>> Unmatched '.
>> Unmatched '.
> [...]
These were fixed in 20150622.
> To add to the list:
>
> $ printf '\243`/tmp/xxx\243`\n' |
> LC_ALL=zh_HK.big5hkscs PARALLEL_SHELL=bash parallel echo
> α
> bash: /tmp/xxxα: No such file or directory
> $ parallel --version
> GNU parallel 20161022
> [...]
> $ LC_ALL=zh_HK.big5hkscs locale charmap
> BIG5-HKSCS
>
> (that α is only rendered as α if your terminal's charset is
> BIG5-HKSCS. In a UTF-8 terminal, you'd probably see something
> like �\ instead)
>
> In that Hong Kong character set, ε (U+03B5) is encoded as 0xa3
> 0x60. 0x60 also happens to be ` (backtick U+0060) in ASCII (and
> in BIG5-HKSCS when it appears by itself, for that matter).
>
> parallel thinks that 0x60 is a backtick that it needs to escape
> instead of being the second byte of that ε character. It also
> escapes the \243 byte, both with backslash.
>
> so it actually runs
>
> exec("bash", "-c", "echo
> <0x5c><0xa3><0x5c><0x60>/tmp/xxx<0x5c><0xa3><0x5c><0x60>")
>
> 0x5c is backslash in ASCII and BIG5-HKSCS, but 0xa3 0x5c is α
> (U+03B1) in BIG5-HKSCS, so bash, which is multi-byte aware, sees:
>
> echo \α`/tmp/xxx\α`
>
> instead of the intended
>
> echo ε/tmp/xxxε
>
> And tries to run the xxxα command.
>
> Note that it's not the only charset with this kind of problem:
> BIG5, GB18030 and GBK have it as well (at least). Nor is it only
> those characters. It affects any charset that has multi-byte
> characters where some of the component bytes also happen to be
> ASCII characters special to the shell.
I can confirm that.
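To make the byte-level effect concrete, here is a minimal Python sketch.
backslash_escape() and SAFE are hypothetical stand-ins for byte-wise
backslash quoting, not the actual shell_quote_scalar code, and the
sketch assumes that Python's big5hkscs codec agrees with the
BIG5-HKSCS locale for the two characters involved.

```python
# Hypothetical byte-wise quoting, loosely modelled on the behaviour
# described above. This is NOT GNU parallel's actual quoting code.
SAFE = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_/.-"


def backslash_escape(raw: bytes) -> bytes:
    """Prefix every byte outside SAFE with a backslash (0x5c)."""
    out = bytearray()
    for byte in raw:
        if byte not in SAFE:
            out.append(0x5C)
        out.append(byte)
    return bytes(out)


arg = b"\xa3`/tmp/xxx\xa3`"         # the input from the report above
escaped = backslash_escape(arg)      # 5c a3 5c 60 /tmp/xxx 5c a3 5c 60

intended = arg.decode("big5hkscs")   # what the user meant
seen = escaped.decode("big5hkscs")   # what a multi-byte-aware shell parses

# ascii() keeps the output printable whatever the terminal charset is.
print(ascii(intended))  # '\u03b5/tmp/xxx\u03b5'
print(ascii(seen))      # '\\\u03b1`/tmp/xxx\\\u03b1`'
```

The second line of output shows the two backticks left unescaped,
because each inserted backslash was absorbed as the trail byte of α.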
> For those charsets above, using single quotes instead of
> backslash for quoting helps as 0x27 is not part of any
> multi-byte character in those charsets.
>
> echo 'ε/tmp/xxxε'
>
> would not be a problem.
>
> echo '<0xa3>''<0x60>'/tmp/xxx'<0xa3>''<0x60>'
>
> would not be a problem with bash but would be with yash that
> chokes on byte sequences that don't form valid characters.
>
> echo \ε/tmp/xxx\ε
>
> would be OK in bash, but not in shells that are not multi-byte
> aware like dash.
I tried your examples in dash and zsh, and they seem to work without
problems, maybe because those shells do not understand multi-byte
chars.
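The single-quote suggestion rests on 0x27 never being the second byte
of a multi-byte character. A rough way to check that for a given
charset, sketched in Python: special_trail_bytes(), single_quote() and
the SHELL_SPECIAL list are hypothetical names chosen for illustration,
and only two-byte sequences are scanned, so the longer GB18030
sequences are not covered.

```python
# Bytes a shell may treat specially (an illustrative, not exhaustive, list).
SHELL_SPECIAL = b"'\"\\`$&|;<>()*?[]{}~#!= \t\n"


def special_trail_bytes(charset: str) -> set:
    """Shell-special ASCII bytes that occur as the 2nd byte of a
    valid two-byte character in the given charset."""
    found = set()
    for lead in range(0x80, 0x100):
        for trail in SHELL_SPECIAL:
            try:
                char = bytes([lead, trail]).decode(charset)
            except UnicodeDecodeError:
                continue
            if len(char) == 1:  # both bytes were consumed as one character
                found.add(trail)
    return found


def single_quote(raw: bytes) -> bytes:
    """Wrap raw in single quotes; an embedded ' becomes '\\'' ."""
    return b"'" + raw.replace(b"'", b"'\\''") + b"'"


risky = special_trail_bytes("big5hkscs")
print(sorted(chr(b) for b in risky))   # includes '\\' and '`'
print(0x27 in risky)                   # False

quoted = single_quote(b"\xa3`/tmp/xxx\xa3`")
print(ascii(quoted.decode("big5hkscs")))  # "'\u03b5/tmp/xxx\u03b5'"
```

The scan says nothing about shells that reject invalid byte sequences,
which is the yash case mentioned above.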
> Quoting in shell is a tricky business. It's best not to
> invoke a shell in the first place if it can be at all avoided.
You cannot use composed commands, functions, and redirection without
wrapping in a shell. A lot of the helper functions also use a shell:
https://www.gnu.org/software/parallel/parallel_design.html#Convenience-options---nice---basefile---transfer---return---cleanup---tmux---group---compress---cat---fifo---workdir
To me, this by far outweighs the inconvenience of having to set
LC_ALL=C.
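As an illustration of that trade-off, a small Python sketch (the child
program and the variable names are made up for the example): passing
an argument vector needs no quoting at all, but a pipeline has to go
through a shell.

```python
import subprocess
import sys

# Tiny child program (hypothetical): write argv[1] back out byte for byte.
ECHO_ARG = "import os, sys; os.write(1, os.fsencode(sys.argv[1]))"

arg = b"\xa3`/tmp/xxx\xa3`"

# No shell: the bytes go straight into the child's argv, nothing to quote.
direct = subprocess.run(
    [sys.executable, "-c", ECHO_ARG, arg],
    stdout=subprocess.PIPE, check=True,
).stdout

# A composed command (here a pipeline) only exists once a shell parses it.
composed = subprocess.run(
    "printf 'a\\nb\\n' | tr a-z A-Z",
    shell=True, stdout=subprocess.PIPE, check=True,
).stdout

print(direct == arg)   # True
print(composed)        # b'A\nB\n'
```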
It would be even better, though, if we could find a way to adapt the
quoting so that it works correctly in dash, zsh and bash with both
LC_ALL=C and LC_ALL=zh_HK.big5hkscs.
It will probably involve changing shell_quote_scalar_*() to take this
into account.
Solutions welcome.
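As a starting point for evaluating candidates, here is a sketch of a
round-trip check in Python. round_trips() and survey() are
hypothetical helpers and the shell list is an assumption. If the
zh_HK.big5hkscs locale is not installed, the shell falls back to C, so
a pass in that case says nothing about multi-byte handling.

```python
import os
import shutil
import subprocess


def round_trips(shell: str, quoted: bytes, original: bytes,
                locale: str = "C") -> bool:
    """Run `printf %s <quoted>` in shell and compare the output bytes
    with the original argument."""
    env = dict(os.environ, LC_ALL=locale)
    result = subprocess.run(
        [shell, "-c", b"printf %s " + quoted],
        env=env, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0 and result.stdout == original


def survey(original: bytes, quoted: bytes) -> dict:
    """Try one quoted form in every installed shell and both locales."""
    results = {}
    for shell in ("sh", "dash", "bash", "zsh"):
        if shutil.which(shell) is None:
            continue  # skip shells that are not installed
        for locale in ("C", "zh_HK.big5hkscs"):
            results[(shell, locale)] = round_trips(
                shell, quoted, original, locale)
    return results


original = b"\xa3`/tmp/xxx\xa3`"
quoted = b"'" + original + b"'"   # candidate: plain single quotes
print(survey(original, quoted))
```

Only feed it candidates you are willing to have a shell execute: a
broken quoting scheme turns part of the argument into a command.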
/Ole