bug-parallel


From: Stephane Chazelas
Subject: Re: GNU Parallel Bug Reports why is parallel invoking a shell **by default** and associated bugs
Date: Wed, 9 Nov 2016 22:32:40 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

2015-05-23 21:50:51 +0100, Stephane Chazelas:
[...]
> You intend it to run ["cmd", "1"], ["cmd", "2"]... in parallel,
> not ["myshell", "-c", "cmd 1"], ["myshell", "-c", "cmd 2"],
> hoping that the shell will eventually (after initialisation,
> loading libraries, startup files...) run ["cmd", "1"]...
[...]
> At the moment, depending on the shell (and it's not always clear
> which one you'll get) there are a few bugs.
> 
> For instance with zsh:
> 
> $ printf '=z\n'  | PARALLEL_SHELL=zsh parallel  'printf "<%s>\n"'
> zsh:1: z not found
> 
> In zsh, a leading = is a globbing operator that is not currently
> escaped by parallel.
> 
> With csh/tcsh:
> 
> $ printf 'a\nb\0'  | PARALLEL_SHELL=tcsh parallel -0 'printf "<%s>\n"'
> Unmatched '.
> Unmatched '.
[...]
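For comparison (this is not from the thread): Python's shlex.quote() also counts = as a safe character and leaves =z unquoted, so a command line built with it would hit the same zsh expansion. Below is a minimal sketch of quoting every argument unconditionally instead. always_quote is a hypothetical helper. It still would not help with csh/tcsh, which rejects a newline inside single quotes.

```python
import shlex

def always_quote(arg: str) -> str:
    """Hypothetical helper: wrap arg in single quotes unconditionally.

    Embedded single quotes become '\'' (close quote, escaped quote, reopen).
    """
    return "'" + arg.replace("'", "'\\''") + "'"

# shlex.quote() treats '=' as safe, so a leading '=' is left bare,
# which zsh then expands as "path of command z".
print(shlex.quote("=z"))    # =z
print(always_quote("=z"))   # '=z'
```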

Hiya,

To add to the list:

$ printf '\243`/tmp/xxx\243`\n' |
  LC_ALL=zh_HK.big5hkscs PARALLEL_SHELL=bash parallel echo
α
bash: /tmp/xxxα: No such file or directory
$ parallel --version
GNU parallel 20161022
[...]
$ LC_ALL=zh_HK.big5hkscs locale charmap
BIG5-HKSCS

(that α is only rendered as α if your terminal's charset is
BIG5-HKSCS. In a UTF-8 terminal, you'd probably see something
like �\ instead)

In that Hong Kong character set, ε (U+03B5) is encoded as 0xa3
0x60. 0x60 also happens to be ` (backtick, U+0060) in ASCII (and in
BIG5-HKSCS when it appears by itself, for that matter).
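The byte values are easy to check. Here is a minimal sketch using Python's built-in big5hkscs codec. Python is used only as a convenient charset table; the shell and the C library consult their own locale data.

```python
# Encode ε and α in BIG5-HKSCS and show that each one's second byte,
# read on its own, is an ASCII character special to the shell.
for ch in ("ε", "α"):
    raw = ch.encode("big5hkscs")
    print(f"U+{ord(ch):04X} {ch} -> {raw.hex(' ')}, "
          f"second byte alone is ASCII {chr(raw[1])!r}")
```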

parallel thinks that 0x60 is a backtick that it needs to escape,
instead of the second byte of that ε character. It also escapes
the \243 byte, and it escapes both with a backslash.

So it actually runs:

exec("bash", "-c", "echo <0x5c><0xa3><0x5c><0x60>/tmp/xxx<0x5c><0xa3><0x5c><0x60>")
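The byte-at-a-time escaping described above can be reproduced in a few lines. The escaping rule here (a backslash before any byte outside a small ASCII safe set) is an assumption chosen to match the exec() call above, not GNU parallel's actual code, and naive_escape is a hypothetical name.

```python
import string

# Bytes this hypothetical escaper leaves alone; every other byte gets a
# backslash in front. An illustrative assumption, not parallel's rule.
SAFE = set((string.ascii_letters + string.digits + "/._-").encode("ascii"))

def naive_escape(arg: bytes) -> bytes:
    """Backslash-escape byte by byte, ignoring the locale's charset."""
    out = bytearray()
    for b in arg:              # iterating bytes yields ints
        if b not in SAFE:
            out += b"\\"
        out.append(b)
    return bytes(out)

arg = "ε/tmp/xxxε".encode("big5hkscs")   # the bytes \243`/tmp/xxx\243`
print(b"echo " + naive_escape(arg))
```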

0x5c is backslash in ASCII and BIG5-HKSCS, but 0xa3 0x5c is α
(U+03B1) in BIG5-HKSCS, so bash, which is multi-byte aware, sees:

echo \α`/tmp/xxx\α`
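Decoding those same bytes as BIG5-HKSCS, which is roughly what a multi-byte-aware shell does before tokenizing, shows where the backticks come from. A minimal sketch, again using Python's codec as a stand-in for the shell's locale handling:

```python
# The bytes parallel hands to bash -c (from the exec() call above).
cmd = b"echo \\\xa3\\`/tmp/xxx\\\xa3\\`"

# Read as characters, each 0xa3 pairs with the following 0x5c to form α,
# so the backslashes meant for the 0x60 bytes are swallowed and the
# 0x60 bytes become real, unescaped backticks.
text = cmd.decode("big5hkscs")
print(text)              # echo \α`/tmp/xxx\α`
print(text.count("`"))   # 2
```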

instead of the intended

echo ε/tmp/xxxε

So bash tries to run /tmp/xxxα as a command.

Note that it's not the only charset with this kind of problem:
BIG5, GB18030 and GBK have it as well (at least). Nor is it only
those characters. It's any charset that has multi-byte characters
where some of the component bytes also happen to be ASCII
characters special to the shell.

For the charsets above, using single quotes instead of
backslashes for quoting helps, as 0x27 (the single quote) is not
part of any multi-byte character in those charsets.
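Which ASCII bytes can hide inside a multi-byte character is a property of the charset, so it can be checked mechanically. The sketch below scans the Basic Multilingual Plane through Python's codecs for the charsets named above (plus UTF-8 for contrast) and reports whether backslash, backtick and single quote ever occur as component bytes. ascii_bytes_in_multibyte_chars is a hypothetical helper, and Python's codec tables are assumed to agree with the system's locale definitions.

```python
def ascii_bytes_in_multibyte_chars(encoding):
    """Collect the ASCII-range bytes (< 0x80) that occur inside the
    multi-byte encodings of BMP characters in `encoding`."""
    seen = set()
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:        # lone surrogates can't be encoded
            continue
        try:
            raw = chr(cp).encode(encoding)
        except UnicodeEncodeError:        # character not in this charset
            continue
        if len(raw) > 1:
            seen.update(b for b in raw if b < 0x80)
    return seen

for enc in ("big5", "big5hkscs", "gbk", "gb18030", "utf-8"):
    seen = ascii_bytes_in_multibyte_chars(enc)
    print(f"{enc:10} backslash={0x5c in seen} backtick={0x60 in seen} "
          f"single-quote={0x27 in seen}")
```

In UTF-8, every byte of a multi-byte character is 0x80 or above, which is why nothing is reported for it.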

echo 'ε/tmp/xxxε'

would not be a problem.

echo '<0xa3>''<0x60>'/tmp/xxx'<0xa3>''<0x60>'

would not be a problem with bash, but would be with yash, which
chokes on byte sequences that don't form valid characters.
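The yash problem comes from splitting ε's two bytes across separately quoted words, which puts 0xa3 directly before a 0x27 quote. A strict decoder rejects that sequence, just as a shell that insists on well-formed characters would. A minimal sketch:

```python
# ε quoted one byte at a time: '<0xa3>''<0x60>'
cmd = b"echo '\xa3''`'/tmp/xxx'\xa3''`'"

# 0x27 is not a valid second byte after the lead byte 0xa3, so the
# sequence is not well-formed BIG5-HKSCS text.
try:
    cmd.decode("big5hkscs")
    print("valid")
except UnicodeDecodeError as e:
    print("invalid sequence starting at byte offset", e.start)
```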

echo \ε/tmp/xxx\ε

would be OK in bash, but not in shells that are not multi-byte
aware, such as dash.
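The difference between bash and dash here is whether the shell scans characters or bytes. The toy scanner below is hypothetical and handles nothing but backslash escapes and backticks; it counts the unescaped backticks in echo \ε/tmp/xxx\ε both ways.

```python
def unescaped_backticks(units):
    """Count backticks not preceded by a backslash escape.

    `units` is bytes (modelling a byte-oriented shell such as dash) or
    str (modelling a multi-byte-aware shell such as bash). A deliberately
    tiny model of backslash handling, not a real shell parser.
    """
    backslash = 0x5C if isinstance(units, bytes) else "\\"
    backtick = 0x60 if isinstance(units, bytes) else "`"
    count, escaped = 0, False
    for u in units:
        if escaped:
            escaped = False           # this unit is taken literally
        elif u == backslash:
            escaped = True
        elif u == backtick:
            count += 1
    return count

cmd = "echo \\ε/tmp/xxx\\ε".encode("big5hkscs")
print("byte-oriented:", unescaped_backticks(cmd))                        # 2
print("multi-byte aware:", unescaped_backticks(cmd.decode("big5hkscs")))  # 0
```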

Quoting in shell is a tricky business. It's best not to
invoke a shell in the first place if it can be at all avoided.
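Here is a minimal sketch of that alternative, assuming Python's subprocess module and a printf utility on PATH. When the command is given as an argument list with no shell, the bytes reach printf unchanged and there is nothing to quote.

```python
import subprocess

# The raw bytes from the report: ε/tmp/xxxε in BIG5-HKSCS.
arg = b"\xa3`/tmp/xxx\xa3`"

# An argument list with shell=False (the default) is passed straight to
# execve(), so no shell ever parses the bytes.
result = subprocess.run([b"printf", b"<%s>\\n", arg],
                        capture_output=True, check=True)
print(result.stdout)   # b'<\xa3`/tmp/xxx\xa3`>\n'
```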

-- 
Stephane


