bug-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 0/3] yacc: compute the best type for the state number


From: Paul Eggert
Subject: [PATCH 0/3] yacc: compute the best type for the state number
Date: Tue, 1 Oct 2019 01:35:32 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 9/29/19 11:34 AM, Akim Demaille wrote:

As a matter of fact, we used two types:

- most arrays (such as state stack, and its correspondance in the LAC
  infrastructure) are using int16_t.  A few "temporary" variables also
  have this type.

- yystate, which is an intermediate variable, was an int.

Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18 does not require support for int16_t. It could well be more efficient for them to use int_least16_t instead, for better caching; see below.

In my proposal below, I fuse both cases to use a single type,
yy_state_num, which is the smallest type that contains the set of
states: an unsigned type.

In other GNU applications, we've been moving away from using unsigned types due to their confusing behavior (particularly when comparing signed vs unsigned), and because modern tools such as 'gcc -fsanitize=undefined' can check for signed integer overflow but (obviously) not for unsigned integer overflow. In this newer style, it's OK to use unsigned types for bit vectors and hash values since these typically don't involve integer comparisons and integer overflow is typically harmless; for indexes, though, unsigned types are so error-prone that they should be avoided.

In the past, Bison has gotten away with using uint8_fast_t and uint16_fast_t for storing indexes, because on machines with 32-bit int (which is the minimum supported by GNU) these types typically promote to 'int' and so avoid most of the signed-vs-unsigned confusion. But if Bison starts using 32-bit unsigned types it won't have this luxury since these types won't promote to 'int'. And anyway, Bison shouldn't assume that uint8_fast_t etc. promote to 'int', because the C standard doesn't guarantee that.

So, I suggest using signed types for indexes in Bison templates; please see the attached patch. I haven't checked for compatibility between this and your proposed patches, but I assume that's straightforward.

This patch does not address the int_fast16_t versus int_least16_t issue, but that should be easy enough to fix in a followup patch.

That's where I need you Paul: do you think that, for sake of
performances, I should keep the scalar variable as an 'int', and move
to the unsigned type only for arrays?  Is it ok to move to stacks of
uint8_t instead of uint16_t, or should we stick to larger types?  I
can see how using small types is cache friendly, but I can also
imagine that cpus don't like small types.

For local scalar variables it doesn't matter much these days; 'int' is fine and is well-understood, but if the index might not fit into 'int' then 'ptrdiff_t' is needed instead.

Typically, if you have a large number of integers that all fit into 8 bits, you're better off using int_least8_t than wider types since that decreases cache pressure. Cache lines are typically wider than 64 bits anyway, so you're not saving CPU cycles by using 64-bit rather than 8-bit integers.

Attachment: 0001-Prefer-signed-types-for-indexes-in-skeletons.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]