Re: ending ;
From: Paul Eggert
Subject: Re: ending ;
Date: Mon, 3 Feb 2003 15:59:31 -0800 (PST)
> From: Akim Demaille <address@hidden>
> Date: Sun, 02 Feb 2003 10:54:18 +0100
>
> Paul> We could use "%rule a b: c d;", say.
>
> Let's rename %rule as ;, and we agree.
:-) :-)
But so far I don't even see the need for "%rule". As far as I can
see, we can get by with ";" alone. That is, we can continue to
support the ";"-less rules that POSIX requires, and we can support new
rules like "a b: c d;". (I'll call these new rules "2LHS rules" as I
don't exactly recall what they're for.)
It is true that we can't support a ";"-less rule that is immediately
followed by a 2LHS rule. But it seems to me that this compromise
solution should satisfy both POSIX and you. It clearly satisfies
POSIX, since POSIX doesn't require 2LHS rules so it doesn't care that
they're not supported after ";"-less rules. And it should satisfy
you, since you dislike ";"-less rules and will never use them, so you
don't care whether they're supported before a 2LHS rule.
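
To make the conflict concrete, here is a rough C sketch. It is not Bison
code: the token array and the name first_rule_rhs_len are invented for
illustration. It mimics how a Yacc-style reader finds the end of a rule
that has no ";", namely by stopping just before an identifier that is
followed by ":".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of Bison.  TOKS holds grammar tokens and
   the first rule starts at toks[0] with toks[1] == ":".  Return how many
   right-hand-side symbols a Yacc-style reader gives that first rule.
   The rule ends at an explicit ";" or just before an identifier that is
   itself followed by ":".  */
int
first_rule_rhs_len (const char *const *toks, int ntoks)
{
  assert (ntoks >= 2 && strcmp (toks[1], ":") == 0);
  int n = 0;
  for (int i = 2; i < ntoks; i++)
    {
      if (strcmp (toks[i], ";") == 0)
        break;                  /* explicit terminator */
      if (i + 1 < ntoks && strcmp (toks[i + 1], ":") == 0)
        break;                  /* toks[i] is the next rule's left side */
      n++;
    }
  return n;
}
```

Given the tokens of "x: y" followed by "a b: c d;", this returns 2: the
"a" is absorbed into the right-hand side of "x", and only "b" is seen as
a left-hand side. With "x: y;" first, it returns 1 and "a b" stays
together.
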
> I am not concerned by its size, but more by its goto-like logic, which
> is harder to track than the previous code while we
> already have quite a complex scanner, and how anti-natural it is.
The previous scanner was simpler, but the previous parser was more
complicated and had some undocumented dependencies on the LALR(1)
reduction order. So it's not obvious that the older Bison was simpler
and clearer overall.
Some complexity is necessary here, because the valid contents of {...}
depend on the symbol that precedes the {...}. For example, %union
{...$$...} is not allowed, but %printer {...$$...} is allowed. The
old Bison code did not always check this properly, and sometimes
dumped core. The new code checks this properly, and is a bit more
complicated because of this extra checking.
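
The kind of context-dependent check described above might be sketched as
follows. This is only an illustration, not the code in scan-gram.l: the
enum, its members, and braced_code_ok are hypothetical names, only the
"$$" case from the two examples is modeled, and the substring test is
deliberately naive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical contexts: which directive preceded the {...}.  */
enum braced_context { CTX_UNION, CTX_PRINTER };

/* Return true if CODE is acceptable after the given directive.
   Only the "$$" rule is modeled.  The strstr test is naive: it would
   also match "$$" inside a string literal or a comment.  */
bool
braced_code_ok (enum braced_context ctx, const char *code)
{
  assert (code != NULL);
  bool uses_dollar_dollar = strstr (code, "$$") != NULL;
  switch (ctx)
    {
    case CTX_UNION:
      return !uses_dollar_dollar;   /* %union {...$$...} is rejected */
    case CTX_PRINTER:
      return true;                  /* %printer {...$$...} is accepted */
    }
  return false;
}
```

The point is only that the answer depends on ctx, so whoever scans the
{...} has to know what came before it.
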
Now, we could break up Bison a bit, so that scan-gram.l does not care
about $ or @, and so that we have another scanner, invoked by the
parser, that rescans the contents of {...} depending on context
supplied by the parser. This sort of thing will be necessary if we
modify the Bison grammar so that one cannot predict what is
syntactically allowed within {...} by looking at the context that
immediately preceded {...}.
However, it seems to me that there will be some unavoidable complexity
if we make such a change. We'll have to scan the {...} twice, and
we'll have to duplicate the code that scans {...}, which will be a bit
of a pain since we'll have to duplicate scanning through comments,
strings, C99 multibyte constructs, etc. Another possibility is for
scan-gram.l to return something more complicated than a string for the
result of the {...} scan, but this also adds complexity. Also, we'll
have to explain the extra complexity to the user.
> What I dislike in this code is that it binds us to having the %keyword
> *right* before the {}. Had the syntax for printer been different (the
> ids and then the code), we would have had to change it!
Something that simple could still be done easily within the current
scanner. However, I'll grant you that one could come up with a
more complicated syntax that can't easily be done within the scanner.
I suspect that such a syntax will bring up the rescanning problems
mentioned above. We can cross that bridge if we have to. I hope that
we can keep the syntax simple enough, though, that we don't have to
cross the bridge.
> Anyway, I'm still actively lobbying for restoring the previous
> situation, where ; is mandatory.
It's OK to make ";" mandatory in grammars that use future Bison
extensions like "a b: c d ;", since that won't break existing code.
However, I don't see the technical need for breaking existing code by
making ";" mandatory in Yacc grammars and in old Bison grammars.
> Paul> The support for those comments between "foo" and ":" is exactly the
> Paul> same Lex code as the support for those comments between any other two
> Paul> grammar symbols. There is no code duplication, so I don't see why
> Paul> this would be considered an extra complexity.
>
> Sorry, I don't understand what you mean here. I don't recall having
> the scanner play gotos between two tokens.
I meant that there's just one copy of the C code that supports
comments between "foo" and ":", and this code didn't change much.
(It did change a little, to fix a bug where locations were being
mishandled in missing-"*/" diagnostics, but that's not relevant here.)
By "gotos" do you mean the "BEGIN SC_AFTER_IDENTIFIER;" in the
<INITIAL>{id} action, and the "BEGIN INITIAL;"s in each
<SC_AFTER_IDENTIFIER> action?
That could be done differently. For example, <INITIAL>{id} could read
ahead with input() and then unput() the last char read. But I don't
see how this would simplify things. It reminds me of what Bison 1.35
did, and though it's superficially appealing to revert to 1.35, it
will complicate the current code, as it will require that we have two
bits of code that know how to scan Yacc /* */ comments, one written in
Lex and the other in C.
In Bison 1.35, if memory serves, there was just one such bit of code,
written in C, that was used every time a Yacc /* */ comment was found.
We could resurrect that code, but we'd have to integrate it with the
flex-generated code, and I don't see how this will reduce complexity
overall.