bug-parallel

From: Ole Tange
Subject: Re: GNU Parallel Bug Reports why is parallel invoking a shell **by default** and associated bugs
Date: Sun, 13 Nov 2016 19:15:38 +0100

On Wed, Nov 9, 2016 at 11:32 PM, Stephane Chazelas
<address@hidden> wrote:
> 2015-05-23 21:50:51 +0100, Stephane Chazelas:
> [...]
>> You intend it to run ["cmd", "1"], ["cmd", "2"]... in parallel,
:
> [...]
>> At the moment, depending on the shell (and it's not always clear
>> which one you'll get)

It should be much clearer now:
https://www.gnu.org/software/parallel/parallel_design.html#Which-shell-to-use

>> there are a few bugs.
>>
>> For instance with zsh:
>>
>> $ printf '=z\n'  | PARALLEL_SHELL=zsh parallel  'printf "<%s>\n"'
>> zsh:1: z not found
>>
>> In zsh, a leading = is a globbing operator that is not currently
>> escaped by parallel.
>>
>> With csh/tcsh:
>>
>> $ printf 'a\nb\0'  | PARALLEL_SHELL=tcsh parallel -0 'printf "<%s>\n"'
>> Unmatched '.
>> Unmatched '.
> [...]

These were fixed in 20150622.
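Python's `shlex.quote()` shows how easy it is to miss the zsh case. It leaves a string unquoted when every character is in its POSIX-safe set, and `=` is in that set, so `=z` comes back bare and zsh would then apply `=` expansion to it. For contrast, here is a hypothetical `always_quote()` helper (not part of GNU Parallel) that single-quotes unconditionally:

```python
import shlex

def always_quote(s: str) -> str:
    """Hypothetical helper: wrap s in single quotes unconditionally,
    turning each embedded ' into '\\'' so the result stays one shell word."""
    return "'" + s.replace("'", "'\\''") + "'"

# shlex.quote() treats '=' as safe, so a leading '=' survives unquoted:
# harmless in sh/bash, but zsh performs '=' expansion on such a word.
print(shlex.quote("=z"))     # =z
print(always_quote("=z"))    # '=z'
print(always_quote("it's"))  # 'it'\''s'
```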

> To add to the list:
>
> $ printf '\243`/tmp/xxx\243`\n' |
>   LC_ALL=zh_HK.big5hkscs PARALLEL_SHELL=bash parallel echo
> α
> bash: /tmp/xxxα: No such file or directory
> $ parallel --version
> GNU parallel 20161022
> [...]
> $ LC_ALL=zh_HK.big5hkscs locale charmap
> BIG5-HKSCS
>
> (that α is only rendered as α if your terminal's charset is
> BIG5-HKSCS. In a UTF-8 terminal, you'd probably see something
> like �\ instead)
>
> In that Hong Kong character set, ε (U+03B5) is encoded as 0xa3
> 0x60. 0x60 also happens to be ` (backtick, U+0060) in ASCII (and in
> BIG5-HKSCS when it stands alone, for that matter).
>
> parallel treats 0x60 as a backtick that it needs to escape,
> rather than as the second byte of that ε character. It also
> escapes the \243 byte, both with a backslash.
>
> So it actually runs
>
> exec("bash", "-c", "echo <0x5c><0xa3><0x5c><0x60>/tmp/xxx<0x5c><0xa3><0x5c><0x60>")
>
> 0x5c is backslash in ASCII and BIG5-HKSCS, but 0xa3 0x5c is α
> (U+03B1) in BIG5-HKSCS, so bash, which is multi-byte aware, sees:
>
> echo \α`/tmp/xxxα`
>
> instead of the intended
>
> echo ε/tmp/xxxε
>
> and tries to run the /tmp/xxxα command.
>
> Note that it's not the only charset with this kind of problem:
> BIG5, GB18030 and GBK (at least) are affected as well. Nor is it
> only those characters; it's any charset that has multi-byte
> characters in which some of the component bytes also happen to
> be ASCII characters that are special to the shell.

I can confirm that.
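The mechanism can be reproduced without running a shell. The sketch below uses a hypothetical `backslash_quote()` that, like the behaviour described above, puts a backslash before shell-special ASCII bytes and before every byte >= 0x80 (the special-byte set is an assumption, not GNU Parallel's actual list), then decodes the result the way a BIG5-HKSCS-aware shell would, using Python's `big5hkscs` codec:

```python
# Assumed set of shell-special ASCII bytes (illustrative, not GNU Parallel's list).
SPECIAL = set(b"`$\\\"'!&|;<>()[]*?~#{} \t\n")

def backslash_quote(data: bytes) -> bytes:
    """Hypothetical byte-by-byte quoting: prefix every special byte and
    every byte >= 0x80 with a backslash (0x5c)."""
    out = bytearray()
    for b in data:
        if b in SPECIAL or b >= 0x80:
            out.append(0x5C)
        out.append(b)
    return bytes(out)

arg = b"\xa3`/tmp/xxx\xa3`"      # 0xa3 0x60 is U+03B5 (epsilon) in BIG5-HKSCS
quoted = backslash_quote(arg)
print(quoted)
# A multi-byte-aware shell pairs 0xa3 with the following 0x5c, forming U+03B1.
print(quoted.decode("big5hkscs"))
```

The decoded string starts with a backslash, then α, then a bare backtick: the second escaping backslash has been absorbed into α, as in the report.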

> For those charsets above, using single quotes instead of
> backslash for quoting helps as 0x27 is not part of any
> multi-byte character in those charsets.
>
> echo 'ε/tmp/xxxε'
>
> would not be a problem.
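Single quoting can be applied to raw bytes without knowing the charset, provided 0x27 never appears inside a multi-byte character, which is the property stated above for these charsets. A minimal sketch with a hypothetical `single_quote_bytes()` helper:

```python
def single_quote_bytes(data: bytes) -> bytes:
    """Hypothetical helper: wrap data in single quotes, replacing each 0x27
    with '\\''. Assumes 0x27 never occurs inside a multi-byte character
    (true for the charsets listed above), so no character is split."""
    return b"'" + data.replace(b"'", b"'\\''") + b"'"

quoted = single_quote_bytes(b"\xa3`/tmp/xxx\xa3`")
print(quoted.decode("big5hkscs"))  # 'ε/tmp/xxxε'
```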
>
> echo '<0xa3>''<0x60>'/tmp/xxx'<0xa3>''<0x60>'
>
> would not be a problem with bash, but would be with yash, which
> chokes on byte sequences that don't form valid characters.
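To see why a strict shell objects, one can ask Python's `big5hkscs` codec (a stand-in for any strict decoder, not yash itself) whether the split form is a valid character string. Quoting each byte of ε separately places a 0x27 right after the lead byte 0xa3, which is not a valid BIG5-HKSCS sequence:

```python
# The split form from the example above: the two bytes of U+03B5 (0xa3 0x60)
# are quoted separately, so a single quote follows the lead byte 0xa3.
split = b"echo '\xa3''`'/tmp/xxx'\xa3''`'"

# The unsplit pair on its own is a valid character.
print(b"\xa3`".decode("big5hkscs"))

# A strict decoder rejects the split form.
try:
    split.decode("big5hkscs")
    valid = True
except UnicodeDecodeError as err:
    valid = False
    print("invalid BIG5-HKSCS:", err.reason)
```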
>
> echo \ε/tmp/xxx\ε
>
> would be OK in bash, but not in shells that are not multi-byte
> aware, like dash.

Having tried your examples in dash and zsh, it seems they work without
problems. Maybe that is because those shells do not understand multi-byte chars.
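The two backslash styles fail in mirror-image ways, which fits both the dash result reported here and the warning about `\ε` for shells that are not multi-byte aware. The toy scanner below is a deliberate simplification (it only tracks backslashes and backticks, not real shell grammar); it lists unescaped backticks in the same bytes, viewed either byte by byte or as decoded BIG5-HKSCS characters:

```python
def unescaped_backticks(units):
    """Toy model, not a real shell parser: return the indices of backticks
    not escaped by a preceding backslash. Pass bytes to scan byte by byte
    (like a shell that is not multi-byte aware) or str to scan character
    by character (like a multi-byte-aware shell)."""
    hits, i = [], 0
    while i < len(units):
        u = units[i]
        if u in (0x5C, "\\"):   # backslash: skip it and the unit it escapes
            i += 2
            continue
        if u in (0x60, "`"):    # unescaped backtick
            hits.append(i)
        i += 1
    return hits

# Backslash before each byte (\<0xa3>\<0x60>), as in the exec() call above.
per_byte = b"\\\xa3\\`/tmp/xxx\\\xa3\\`"
# One backslash before the whole character (\ε).
per_char = b"\\\xa3`/tmp/xxx\\\xa3`"

for name, data in (("per-byte", per_byte), ("per-char", per_char)):
    print(name,
          "byte scan:", unescaped_backticks(data),
          "char scan:", unescaped_backticks(data.decode("big5hkscs")))
```

Per-byte escaping is safe for a byte-oriented scan and unsafe for a character-oriented one; escaping the whole character is the reverse, so neither backslash style alone covers every shell.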

> Quoting in shell is a tricky business. It's best not to
> invoke a shell in the first place if it can be at all avoided.

You cannot use composed commands, functions, and redirection without
wrapping them in a shell. A lot of the helper functions also use a shell:
https://www.gnu.org/software/parallel/parallel_design.html#Convenience-options---nice---basefile---transfer---return---cleanup---tmux---group---compress---cat---fifo---workdir
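The trade-off can be seen with Python's `subprocess` module (used only as an illustration, not GNU Parallel's own code): an argument vector passes arguments through untouched, while anything involving a pipe needs a shell command line, and therefore quoting. This assumes `printf` and `tr` are installed, as on typical Unix systems:

```python
import subprocess

# Without a shell: each list element reaches printf verbatim, so '=z' and
# 'a b' need no quoting, but pipes, redirections and functions are unavailable.
direct = subprocess.run(["printf", "<%s>\\n", "=z", "a b"],
                        capture_output=True, check=True).stdout
print(direct)

# With a shell (/bin/sh here): the command line can use a pipe, but every
# argument spliced into it must now be quoted correctly for that shell.
piped = subprocess.run("printf '<%s>\\n' a b | tr ab xy", shell=True,
                       capture_output=True, check=True).stdout
print(piped)
```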

To me this by far outweighs the problem of possibly having to set
LC_ALL=C.
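The `LC_ALL=C` workaround can be checked by handing the byte string from the quoted `exec()` call to bash in the C locale, where every byte is a character. This sketch assumes `bash` is on `PATH`; the subprocess call only illustrates the idea:

```python
import os
import subprocess

# The command line from the report: a backslash before 0xa3 and before 0x60.
cmd = b"echo \\\xa3\\`/tmp/xxx\\\xa3\\`"

# In the C locale bash treats each byte as one character, so each backslash
# escapes exactly one byte and no command substitution is attempted.
env = dict(os.environ, LC_ALL="C")
out = subprocess.run([b"bash", b"-c", cmd], env=env,
                     capture_output=True, check=True).stdout
print(out)
```

Here echo receives the original bytes back, which is why forcing the C locale sidesteps the misparse at the cost of treating BIG5-HKSCS text as opaque bytes.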

But it would be even better if we could find a way to adapt the quoting
so that it works correctly in both dash/zsh and bash, with both LC_ALL=C
and LC_ALL=zh_HK.big5hkscs.

It will probably involve changing shell_quote_scalar_*() to take this
into account.
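One possible ingredient for a charset-aware quoting routine would be to detect whether the input contains multi-byte characters with bytes in the ASCII range and, if so, avoid per-byte backslashes. The helper below is hypothetical (not GNU Parallel code) and uses Python's incremental decoders to find character boundaries. UTF-8 never triggers it, since every byte of a UTF-8 multi-byte character is >= 0x80:

```python
import codecs

def multibyte_chars_with_ascii(data: bytes, encoding: str) -> list[bytes]:
    """Hypothetical check: return the byte sequences of multi-byte characters
    in data (decoded with encoding) that contain a byte below 0x80.
    Raises UnicodeDecodeError on invalid input."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = bytearray()
    found = []
    for b in data:
        pending.append(b)
        if decoder.decode(bytes([b])):  # a complete character was produced
            if len(pending) > 1 and any(x < 0x80 for x in pending):
                found.append(bytes(pending))
            pending.clear()
    decoder.decode(b"", final=True)     # reject a truncated trailing character
    return found

print(multibyte_chars_with_ascii(b"\xa3`/tmp/xxx\xa3`", "big5hkscs"))
print(multibyte_chars_with_ascii("é/tmp/xxx".encode("utf-8"), "utf-8"))
```

A non-empty result could, for example, switch the quoting to whole-string single quotes as suggested earlier in the thread.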

Solutions welcome.


/Ole


