[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/3] yacc: compute the best type for the state number
From: |
Kaz Kylheku |
Subject: |
Re: [PATCH 0/3] yacc: compute the best type for the state number |
Date: |
Tue, 01 Oct 2019 11:40:24 -0700 |
User-agent: |
Roundcube Webmail/0.9.2 |
On 2019-10-01 01:35, Paul Eggert wrote:
On 9/29/19 11:34 AM, Akim Demaille wrote:
As a matter of fact, we used two types:
- most arrays (such as state stack, and its correspondance in the LAC
infrastructure) are using int16_t. A few "temporary" variables also
have this type.
- yystate, which is an intermediate variable, was an int.
Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18
does not require support for int16_t. It could well be more efficient
for them to use int_least16_t instead, for better caching; see below.
I would make two typedefs: one for the storage type of Yacc
state values, and one for the "register" type for manipulating
them in local variables.
The latter of course, being capable of representing all the values of
the former. E.g. example definitions:
typedef short yy_small_state_t;
typedef int yy_state_t;
The special value -1 should be given a manifest constant, like
YY_BAD_STATE or whatever. This constant should have a value
which is representable in the small type yy_small_state_t,
not a value which changes when converted to yy_small_state_t.
In other words:
YY_BAD_STATE == (yy_state_t) YY_BAD_STATE
This is currently not true of the value -1 for various combinations
of the two types. Obviously if both are signed, there isn't
any issue.
(Needless to say, states should be compared to this value exactly,
never using "state < 0"; the constant is not even required to
to be negative.)
In my proposal below, I fuse both cases to use a single type,
yy_state_num, which is the smallest type that contains the set of
states: an unsigned type.
In other GNU applications, we've been moving away from using unsigned
types due to their confusing behavior (particularly when comparing
signed vs unsigned), and because modern tools such as 'gcc
The unsigned type are indeed silly when arithmetic is involved
because they break basic arithmetic identities like. For instance,
given:
a > b - c
we'd like to be free to move the c to the other side:
a + c > b
if all three are signed types, with small-ish values clustered
around zero, then there is no overflow and the derivation holds.
If they are unsigned types, even if the values are small, the
derivation is not justified, because b - c may produce a huge value.
That's the crux; unsigned has an overflow cliff right at zero,
creating a pitfall for calculations involving trivial numbers.
That said, parser states are more of an enumeration. They are
identifiers.
We shouldn't be doing any math on them of this sort. Given a parser
state
S, there is nothing interesting about S+1; they are not related. When
would
we ever multiply two yacc states, or find their difference and such, or
care whether one is greater than the other?
The unsigned types provide a greater range for the possible states,
without having to use negative values, so they are understandably an
attractive tool for combating the problem of running out of states.
I don't see a big problem with using them. This is all internal to
a skeleton; it's not an API being perpetrated on the user. That's why
there is so much latitude to fiddle with these types at parser
generation
time in the first place. Just the skeleton code and any generated
bits have to get it right; the parser generator user isn't burdened
with anything.
Rearding indexing, it's difficult to keep unsigned types entirely
out of indexing calculations because an unsigned type is injected
into the mix whenever we invoke the sizeof operator.