Re: terminal @number vs. @user-number
From: Joel E. Denny
Subject: Re: terminal @number vs. @user-number
Date: Sun, 21 Oct 2007 20:18:55 -0400 (EDT)

On Mon, 22 Oct 2007, Wojciech Polak wrote:
> On 2007-10-21 at 17:46 -0400, Joel E. Denny wrote:
>
> > Currently, Bison puts a terminal's user number (the one returned by yylex)
> > in its XML "number" attribute. I think we should rename that to
> > "user-number" and add a "number" attribute for Bison's internal symbol
> > number. This would be more consistent with nonterminals.
> > I'd be happy to write the patch. Is all this agreeable to you, Wojciech?
>
> Can you write more about the practical goal (and its further usage)
> of having two numbers, especially Bison's internal symbol number?
> Maybe it's okay to switch, but to have only one kind of number,
> thus changing nonterminal, and not terminal?

While terminals have both user numbers and internal numbers, nonterminals
only have internal numbers. Thus, the only way to change the nonterminal
element that I know of is to eliminate its @number altogether. Is that
what you mean?

On the one hand, I suppose we could argue that the user never really needs
to know any of the symbol numbers for normal and clean usage of the
generated parser. On the other hand, when developing and debugging
Bison's front end, I know I've found all the numbers useful at different
times. The user might find them helpful during low-level debugging of the
generated parser code as well.

At the moment, I'm mainly bothered that @number isn't guaranteed to have a
unique value for each symbol since it seems like it should. If we make
the change I'm suggesting, it will.
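
To make the collision concrete, here is a minimal Python sketch. The symbol
table is made up for illustration (it is not taken from a real grammar or from
Bison's actual XML output), and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical symbol table: (kind, name, user_number, internal_number).
# Nonterminals have no user number, so that field is None.
SYMBOLS = [
    ("terminal", "$end", 0, 0),
    ("terminal", "error", 256, 1),
    ("terminal", "'\\n'", 10, 2),
    ("terminal", "'+'", 43, 3),
    ("terminal", "NUM", 258, 4),
    ("nonterminal", "$accept", None, 5),
    ("nonterminal", "input", None, 6),
    ("nonterminal", "line", None, 7),
    ("nonterminal", "exp", None, 8),
    ("nonterminal", "term", None, 9),
    ("nonterminal", "factor", None, 10),
]


def current_number(symbol):
    """@number as described above: user number for terminals,
    internal number for nonterminals."""
    kind, _name, user, internal = symbol
    return user if kind == "terminal" else internal


def proposed_number(symbol):
    """@number under the proposal: always the internal symbol number."""
    return symbol[3]


def duplicates(numbers):
    """Return the sorted values that occur more than once."""
    return sorted(n for n, count in Counter(numbers).items() if count > 1)


current_dups = duplicates(current_number(s) for s in SYMBOLS)
proposed_dups = duplicates(proposed_number(s) for s in SYMBOLS)
print(current_dups)   # [10]: '\n' (user number) clashes with factor (internal)
print(proposed_dups)  # []
```

In this made-up table the terminal '\n' and the nonterminal factor both end up
with @number 10 under the current scheme, while the internal numbers alone are
unique.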
Unique @number values are important if someone wants to use @number rather
than @name for symbol references. For example, consider a URI fragment
identifier (like s103 in http://www.example.com/index.html#s103). @name
might be long and it might contain special characters that would have to
be escaped in order to be placed there. @number usually requires less
space and could be placed there with no extra processing.
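
As a rough illustration of the escaping issue, the following Python sketch
builds a fragment identifier both ways. The function names are hypothetical,
the "s" prefix simply mirrors the s103 example above, and the symbol name '+'
is an assumed example of a name containing special characters.

```python
from urllib.parse import quote


def fragment_from_name(name: str) -> str:
    """Build a fragment from a symbol name; every character outside the
    unreserved set has to be percent-encoded first."""
    return "#" + quote(name, safe="")


def fragment_from_number(number: int) -> str:
    """Build a fragment from a symbol number; no escaping is needed."""
    return f"#s{number}"


print(fragment_from_name("'+'"))    # #%27%2B%27
print(fragment_from_number(103))    # #s103
```

The number-based fragment is shorter and needs no processing, but it only
identifies a symbol reliably if the number is unique.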
Of course, the user could use "n" and "t" prefixes to make @number based
fragment identifiers unique, but my point is that it seems unintuitive
that @number isn't already unique.
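
The prefix workaround could look roughly like this in Python. The function name
and the colliding number 10 are assumptions chosen for illustration.

```python
def prefixed_id(kind: str, number: int) -> str:
    """Disambiguate a possibly non-unique @number with a "t" or "n" prefix."""
    prefix = "t" if kind == "terminal" else "n"
    return f"{prefix}{number}"


# A terminal and a nonterminal that happen to share the same @number.
ids = [prefixed_id("terminal", 10), prefixed_id("nonterminal", 10)]
print(ids)  # ['t10', 'n10']
```

This works, but it pushes onto every consumer of the XML a step that a unique
@number would make unnecessary.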
I suppose the user could use generate-id() or position() instead of
@number in that scenario. However, I'm guessing there might be situations
when the user is debugging with the aid of some custom report he generated
from Bison's XML. It might be less confusing if the number representing a
symbol is guaranteed to be consistent between his customized report and
the C parser tables he's examining. Maybe.

Researchers have been known to instrument Bison and its generated parsers
for various purposes. They might find the numbers in the XML output
useful for generating code that depends on the C parser tables.

Well, I'm brainstorming, so some of my arguments may be flimsy. In
general, it seems like there are scenarios when it would be more
convenient, more consistent, and cleaner for the user to be able to access
all the symbol numbers than to have to resort to other techniques. What
do you think?