emacs-orgmode
[PATCH] Re: [BUG] ob-shell: :shebang changes interpretation of :cmdline


From: Matt
Subject: [PATCH] Re: [BUG] ob-shell: :shebang changes interpretation of :cmdline
Date: Sun, 21 Apr 2024 17:09:06 +0200
User-agent: Zoho Mail

 ---- On Sat, 18 Nov 2023 16:54:39 +0100  Max Nikulin  wrote --- 
 > Hi,
 > 
 > Trying to figure out the origin of the confusion with
 > "bash -c bash /path/to/file-containing-the-source-code.sh"
 > I have faced an inconsistency with :cmdline treatment in ob-shell.el. I 
 > expect same results in the following cases:
 > 
 > #+begin_src bash :cmdline 1 2 3
 >    printf "%s\n" "$1"
 > #+end_src
 > 
 > #+RESULTS:
 > : 1
 > 
 > #+begin_src bash :cmdline 1 2 3 :shebang #!/bin/bash
 >    printf "%s\n" "$1"
 > #+end_src
 > 
 > #+RESULTS:
 > : 1 2 3
 > 
 > Emacs-28, Org is the current git HEAD.

AFAIU, the inconsistency is due to how the characters following :cmdline are 
interpreted when the subprocess call is made.

Consider the following, when only :cmdline is used:

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG 1 2 3"
#
#+begin_src bash :cmdline 1 2 3
echo \"$1\"
#+end_src

#+RESULTS:
: 1

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG \"1 2\" 3"
#
#+begin_src bash :cmdline "1 2" 3
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2

For :cmdline alone, the characters following :cmdline are passed as though each 
space-delimited word were quoted individually.  That is, separate arguments are 
delimited by one or more spaces.  The first example is equivalent to the 
following:

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG \"1\" \"2\" \"3\""
#
#+begin_src bash :cmdline 1 2 3
echo \"$1\"
#+end_src

#+RESULTS:
: 1
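
The :cmdline-only behavior can be reproduced outside of Org.  The following is 
only a sketch: the temporary file is a hypothetical stand-in for the script 
ob-shell writes, and 'run_via_bash_c' is a name made up for illustration.  It 
joins the script path and the cmdline text into one string and lets "bash -c" 
parse that string, which is why spaces separate arguments and quotes group them.

```shell
# Hypothetical stand-in for the temporary script that ob-shell writes.
script=$(mktemp)
trap 'rm -f "$script"' EXIT
printf '%s\n' 'printf "%s\n" "$1"' > "$script"

# Mimic the :cmdline-only form: the script path and the cmdline text are
# joined into one string, and bash parses that string as a command line.
run_via_bash_c() {
    bash -c "bash '$script' $1"
}

run_via_bash_c '1 2 3'        # prints: 1
run_via_bash_c '"1 2" 3'      # prints: 1 2
run_via_bash_c '"1" "2" "3"'  # prints: 1
```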

How would you expect :cmdline "1 2 3" to be evaluated?

#+begin_src bash :cmdline "1 2 3"
echo \"$1\"
#+end_src

My expectation would be that it evaluates like:

  bash -c "./sh-script-8GJzdG \"1 2 3\""

It turns out, however, that it's evaluated exactly like :cmdline 1 2 3, or 
:cmdline "1" "2" "3".  The result is "1".

To make the block evaluate as expected requires an extra set of escaped quotes:

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG \"1 2 3\""
#
#+begin_src bash :cmdline "\"1 2 3\""
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3

This, however, appears to be separate from the reported issue[fn:1].

Now, consider :cmdline paired with :shebang, called with the same values as 
above.

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "1 2 3"
#
#+begin_src bash :cmdline 1 2 3 :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1 2\" 3"
#
#+begin_src bash :cmdline "1 2" 3 :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2" 3"

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "1 2 3"
#
#+begin_src bash :cmdline "1 2 3" :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1 2 3\""
#
#+begin_src bash :cmdline "\"1 2 3\"" :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3""

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1\" \"2\" \"3\""
#
#+begin_src bash :cmdline "1" "2" "3" :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1" "2" "3""

An immediate observation is that the output results don't format correctly.  If 
you change the results type to "raw", however, you'll see that the Org results 
match those from a terminal, like xfce4-terminal.  The fact that raw output 
matches output from the terminal means that the formatting issue is (also) 
separate from the bug we're trying to fix.  That is, the bug we're trying to 
fix occurs in how the subprocess call is made, not in how the result is 
formatted.
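
For reference, the raw terminal output can be reproduced with a sketch like the 
one below.  The temporary file and the name 'run_with_single_arg' are 
assumptions made for illustration, and the script is run through "bash" rather 
than executed directly so that the sketch also works where /tmp is mounted 
noexec.  The argument list seen by the script is the same either way.

```shell
# Hypothetical stand-in for the temporary script that ob-shell writes.
script=$(mktemp)
trap 'rm -f "$script"' EXIT
printf '%s\n' 'echo \"$1\"' > "$script"

# Mimic the :shebang form: the whole cmdline text arrives as ONE argument,
# so any quote characters in it are literal data, not shell syntax.
run_with_single_arg() {
    bash "$script" "$1"
}

run_with_single_arg '1 2 3'    # prints: "1 2 3"
run_with_single_arg '"1 2" 3'  # prints: ""1 2" 3"
```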

In ob-shell, the subprocess call is made with 'process-file'.  Arguments are 
determined casewise:

1. shebang+cmdline
2. cmdline

The characters following :cmdline are received by the 'cmdline' argument to 
'org-babel-sh-evaluate' as a string.  Both cases put this string into a list 
for the ARGS of 'process-file':

| header               | 'org-babel-sh-evaluate' | process-file ARGS     |
|                      | cmdline variable value  | shebang+cmdline       |
|----------------------+-------------------------+-----------------------|
| :cmdline 1 2 3       | "1 2 3"                 | ("1 2 3")             |
| :cmdline "1 2" 3     | "\"1 2\" 3"             | ("\"1 2\" 3")         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""     | ("\"1\" \"2\" \"3\"") |

| header               | 'org-babel-sh-evaluate' | process-file ARGS     |
|                      | cmdline variable value  | cmdline               |
|----------------------+-------------------------+-----------------------|
| :cmdline 1 2 3       | "1 2 3"                 | ("1 2 3")             |
| :cmdline "1 2" 3     | "\"1 2\" 3"             | ("\"1 2\" 3")         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""     | ("\"1\" \"2\" \"3\"") |

Notice that the ARGS passed to 'process-file' are the same for both cases.  The 
problem is that the "block equivalent shell calls" are *not* the same.  If we 
arrange the equivalent shell calls from the blocks given above into a table, we 
see that the forms are different:

| header               | cmdline variable value | shebang+cmdline call                                   |
|----------------------+------------------------+--------------------------------------------------------|
| :cmdline 1 2 3       | "1 2 3"                | /tmp/babel-Xd6rGS/sh-script-61jvMa "1 2 3"             |
| :cmdline "1 2" 3     | "\"1 2\" 3"            | /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1 2\" 3"         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""    | /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1\" \"2\" \"3\"" |

| header               | cmdline variable value | cmdline call                                   |
|----------------------+------------------------+------------------------------------------------|
| :cmdline 1 2 3       | "1 2 3"                | bash -c "./sh-script-8GJzdG 1 2 3"             |
| :cmdline "1 2" 3     | "\"1 2\" 3"            | bash -c "./sh-script-8GJzdG \"1 2\" 3"         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""    | bash -c "./sh-script-8GJzdG \"1\" \"2\" \"3\"" |

The reported bug exists because shebang+cmdline interprets the characters 
following :cmdline as a *single* string.  Without :shebang, a lone :cmdline 
interprets them as space delimited.
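
The difference is easy to see by counting arguments.  In this sketch (the 
temporary script and both function names are hypothetical, and "bash" runs the 
script in place of direct execution), the same cmdline text yields one argument 
in the shebang-style call and two in the "bash -c" style call:

```shell
# Hypothetical stand-in script that reports how many arguments it received.
script=$(mktemp)
trap 'rm -f "$script"' EXIT
printf '%s\n' 'echo "$#"' > "$script"

cmdline='"1 2" 3'

# shebang+cmdline style: the cmdline text is passed as a single argument.
count_shebang_form() { bash "$script" "$1"; }

# cmdline-only style: the cmdline text is re-parsed by `bash -c`.
count_cmdline_form() { bash -c "bash '$script' $1"; }

count_shebang_form "$cmdline"  # prints: 1
count_cmdline_form "$cmdline"  # prints: 2
```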

One possible solution is to reformat the 'process-file' ARGS for the 
shebang+cmdline case so that characters following :cmdline are interpreted as 
space delimited.  This is possible using 'split-string-and-unquote':

(split-string-and-unquote "1 2 3")             -> ("1" "2" "3")
(split-string-and-unquote "\"1 2\" 3")         -> ("1 2" "3")
(split-string-and-unquote "\"1\" \"2\" \"3\"") -> ("1" "2" "3")
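
As a rough shell-side analogue of that splitting (not the Emacs Lisp function 
itself), 'xargs' can break a string into arguments while honoring double 
quotes.  Its quoting rules differ in details from 'split-string-and-unquote', 
so take this only as a sketch of the intended effect.  The temporary script and 
'run_split' are hypothetical names.

```shell
# Hypothetical stand-in script that prints each argument on its own line.
script=$(mktemp)
trap 'rm -f "$script"' EXIT
printf '%s\n' 'printf "[%s]\n" "$@"' > "$script"

# Split the cmdline text into separate arguments (quotes group words),
# then pass them to the script as individual arguments.
run_split() {
    printf '%s\n' "$1" | xargs bash "$script"
}

run_split '1 2 3'        # prints [1], [2], [3] on separate lines
run_split '"1 2" 3'      # prints [1 2], [3] on separate lines
run_split '"1" "2" "3"'  # prints [1], [2], [3] on separate lines
```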

Whether this is a solution, in part, depends on the perennial problem of shell 
blocks: knowing what's wrong means knowing what's right.

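The proposed solution assumes we intend to parse the characters following 
:cmdline as space delimited and grouped by quotes.  However, AFAICT, the 
header parsing issue (see [fn:1]) makes this solution ambiguous, since 
:cmdline "1 2 3" and :cmdline 1 2 3 are already indistinguishable by the time 
they reach 'org-babel-sh-evaluate'.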

Thoughts?

--
Matt Trzcinski
Emacs Org contributor (ob-shell)
Learn more about Org mode at https://orgmode.org
Support Org development at https://liberapay.com/org-mode

[fn:1] AFAICT, it's due to how headers are parsed by 
'org-babel-parse-header-arguments' using 'org-babel-read'.  The cell "\"1 2 
3\"" (corresponding to :cmdline "1 2 3") is reduced through 'string-match' to 
"1 2 3".  The cell "1 2 3" (corresponding to :cmdline 1 2 3), on the other 
hand, passes through.  The result is that :cmdline "1 2 3" and :cmdline 1 2 3 
become indistinguishable.  I mention this because it's easy to get confused by 
this issue which, AFAICT, is independent of the one we're trying to fix.  The 
reported issue appears only to be related to how the result of :cmdline header 
parsing is passed to the subprocess.

Attachment: 0001-testing-lisp-test-ob-shell.el-Test-shebang-cmdline-r.patch
Description: Binary data

Attachment: 0002-lisp-ob-shell.el-Add-comments-to-apply-call.patch
Description: Binary data

Attachment: 0003-lisp-ob-shell.el-Fix-cmdline-shebang-inconsistencies.patch
Description: Binary data

