bug-bash


Re: Unclosed quotes on heredoc mode


From: Chet Ramey
Subject: Re: Unclosed quotes on heredoc mode
Date: Mon, 20 Dec 2021 11:02:23 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.4.0

On 12/9/21 5:30 AM, Robert Elz wrote:
     Date:        Wed, 8 Dec 2021 09:56:50 -0500
     From:        Chet Ramey <chet.ramey@case.edu>
     Message-ID:  <e5a57513-5a50-dde6-fe37-3d4f488ce0a9@case.edu>

Let's take this in smaller steps, and try and sort out one issue
at a time.

Rack 'em.

First, I think you're under a mistaken impression, which is
revealed in the following paragraph.

   | The real question is whether you read a command substitution as a single
   | WORD, so that the lexer cannot return "the next newline token" until the
   | command substitution has been completed.

There is absolutely nothing, anywhere, about "returning" the newline
(token) ("token" in parens because, while we agree that's what it means,
the standard doesn't currently say that either).

I agree that this is where we disagree. The command substitution is a
single WORD (token), and the parsing performed to find the closing `)'
doesn't have any effect on the parsing state outside it. If you like,
you can change "return" to "encounter."

All that is required is that the lexer encounter a newline (token).
As soon as one is seen, here doc reading commences - which is all a
lexical task.

Yes, this is where the standard needs clarification. I believe that it
implies the command substitution cannot contain text that is interpreted
as a here-document started outside it (the "all characters" text I
referred to previously).
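A minimal POSIX sh sketch of the form neither reading disputes: the here-document operator, its body, and its delimiter all lie between `$(` and `)`, so the substitution is complete on its own. The variable name `body` is illustrative only.

```sh
#!/bin/sh
# The <<'EOF' operator, the body lines, and the EOF delimiter are all
# inside the $( ), so nothing here depends on text outside the substitution.
body=$(cat <<'EOF'
text
here
EOF
)
# Command substitution strips the trailing newline; body holds two lines.
printf '%s\n' "$body"
```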


Further, I know bash (and any other shell that works correctly, ignoring
how here docs are processed for this) must encounter the newline token
in its lexer while initially scanning the command substitution (to include
it in whatever word it forms part of).

The same point, restated.

Consider the two following (leading sequences of) command substitutions:

        $( echo I need to see the contents of the case $book in order )
and
        $( echo I need to see the contents of the
                case $book in order )

Aside from formatting for this e-mail (added white space, which eventually
becomes irrelevant anyway), there is just a one-character change between
the first and the second: a single space char was changed to a newline.

In the first of those, the final ')' shown terminates the command
substitution.   In the second, the ')' doesn't, the command substitution
continues with more not shown here (because it is irrelevant to the
point).
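The two forms can be run side by side. This is a sketch assuming a shell that parses command substitutions with its full grammar (dash does; bash 5.2 does as well). The literal word `book` stands in for `$book`, and the `;; book) ... esac )` tail is a hypothetical completion of the second substitution, added only so it terminates.

```sh
#!/bin/sh
# One line: "case" is just an argument to echo, so the first ")" ends
# the command substitution.
one=$( echo contents of the case book in order )
printf '%s\n' "$one"

# After the newline, "case" is in command position and becomes a reserved
# word, so "order )" is a case pattern and the substitution keeps going
# until the ")" that follows "esac".
two=$( echo contents of the
case book in order ) echo no match ;; book) echo matched ;; esac )
printf '%s\n' "$two"
```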



In order properly to collect that command substitution, the lexer that is
collecting it **MUST** see, recognise, and process the newline token.

The same point, restated.


Then assuming that immediately before that command substitution
(in each case) we had something like

        cat <<'EOF' $( one of the above...

then in the second case, that newline token is the first one seen by the
lexer after the here doc redirection, is it not?

Again, the question boils down to whether the contents of a command
substitution affect what's outside it. And again, this is where we disagree.
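For contrast, a sketch of the arrangement both readings agree on: the substitution closes before the end of the line, so the here-document body begins on the next line. `$(echo -)` is a hypothetical stand-in that supplies `-` as cat's operand (read standard input); the outer `text=$( )` only captures the output.

```sh
#!/bin/sh
# The inner $(echo -) is complete before the newline, so that newline
# ends the command line and the here-document body starts right after it.
text=$(cat <<'EOF' $(echo -)
hello
EOF
)
printf '%s\n' "$text"
```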


But the next misconception, or faulty assumption, is revealed there.
You're assuming that, because it looks that way on the page, the here doc
in a case like

        cat <<EOF $(
text
here
EOF
                command sub commands here )

is part of "the characters they contain". It isn't: here docs are
eliminated by the lexer, just like \<newline> is eliminated. For the
purpose of whatever construct was being built when they are encountered,
they simply do not exist at all.

There's no support in the standard for interpreting "all characters
following the open parenthesis to the matching closing parenthesis" as
including "except for processing any unclosed here documents." At least
backslash-newline processing is mentioned.
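The backslash-newline case, at least, behaves the same in every shell: the pair is removed before the word is formed, even inside a command substitution. A minimal sketch; the variable name `joined` is illustrative.

```sh
#!/bin/sh
# The backslash-newline pair is removed by the lexer, so "ab\<newline>cd"
# is read as the single word "abcd".
joined=$( echo ab\
cd )
printf '%s\n' "$joined"
```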


Here docs have different rules, but the same effect.

These rules are not stated as such in the standard. (Except for the
"newline token" part we agree needs revising.)


   | I suppose it's precedence parsing: the command substitution has higher
   | precedence than here-documents.

It isn't, because parsing, even pseudo-parsing,

Concentrate on the `precedence' part rather than the `parsing' part. The
lexer has to read the entire command substitution before considering the
here-document.



   | > So, if one does
   | >
   | >       $( cmd <<END )

[...]

   | Now put the text between $( and ) into a file and run it as a shell script.
   | Is it valid?

Syntactically valid, certainly.

Oh, stop it. Even the NetBSD shell throws a syntax error here. You've said
that bash allowing end-of-file to terminate a here-document (with a
warning) is a bug.


That it would not execute as it is is not
material.

That is no different than

        $( cmd <&5 )

Put the text of that (the part between the $( and the closing )) in a
file and run it as a shell script, and that won't work either, since in
that script nothing has opened fd 5.

The parser accepts it. That's the difference.
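The parse-versus-execute distinction for the fd 5 case can be checked with a small sketch. Wrapping the command in a function that is never called (`f` is a hypothetical name) makes the shell parse it without running it; the second run closes fd 5 first (closing an fd that is not open is not an error in POSIX sh), so the redirection is sure to fail at execution time.

```sh
#!/bin/sh
# Parse only: the function body is parsed but f is never called.
if sh -c 'f() { x=$( cat <&5 ); }'; then
    parse=ok
else
    parse=error
fi

# Execute: with fd 5 closed, the redirection fails, and the assignment
# takes the non-zero status of the command substitution.
if sh -c 'exec 5<&-; x=$( cat <&5 )' 2>/dev/null; then
    run=ok
else
    run=error
fi

printf 'parse: %s, run: %s\n' "$parse" "$run"
```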


Both are syntactically correct, which is determined by applying the
rules of the grammar in production mode, and seeing if it is possible
for the grammar to produce the text in question.  In both cases, it
is, clearly.

So you're saying the netbsd sh has a bug here? That it requires the
closing `EOF' in error? Come on, that's obviously not true, or you don't
believe it to be true, and we've been over this before.

Again, we disagree. The question is whose interpretation of the standard
is correct, and how to fix the language so that the favored interpretation
is clear.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/


