[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Help with benchmarks on termination
From: |
Akim Demaille |
Subject: |
Help with benchmarks on termination |
Date: |
Sun, 23 Feb 2020 19:10:56 +0100 |
In a branch I'm working on, the automaton may have several final states. This
is a problem for the parsing termination (YYACCEPT) which currently reads in
yacc.c:
> /*--------------------------------------------------------------------.
> | yysetstate -- set current state (the top of the stack) to yystate. |
> `--------------------------------------------------------------------*/
> yysetstate:
> YYDPRINTF ((stderr, "Entering state %d\n", yystate));
> [...]
>
> if (yystate == YYFINAL)
> YYACCEPT;
>
> goto yybackup;
I would need to define several YYFINAL and check them all. It would be
inelegant, and require many changes.
So I tried another approach: instead of specifying the list of accepting
states, just recognize that "accepting" means "reducing to $accept" (which is
the real technical start symbol of the grammar recognized by the automaton).
There are several places to do that, but in order to avoid establishing a LAC
context uselessly right before accepting, a nice place seems to be right before
going to yyreduce:
> @@ -1732,9 +1735,11 @@ yyread_pushed_token:]])[
> if (yyn <= 0)
> {
> if (yytable_value_is_error (yyn))
> - goto yyerrlab;]b4_lac_if([[
> + goto yyerrlab;
> + yyn = -yyn;
> + if (yyr1[yyn] == YYNTOKENS)
> + YYACCEPT;]b4_lac_if([[
> YY_LAC_ESTABLISH;]])[
> - yyn = -yyn;
> goto yyreduce;
> }
>
> @@ -1763,7 +1768,9 @@ yyread_pushed_token:]])[
> yydefault:
> yyn = yydefact[yystate];
> if (yyn == 0)
> - goto yyerrlab;
> + goto yyerrlab;
> + else if (yyr1[yyn] == YYNTOKENS)
> + YYACCEPT;
> goto yyreduce;
Technically, $accept is the first nonterminal, so its symbol number is
YYNTOKENS. Thus "yyr1[yyn] == YYNTOKENS" really means "the LHS of rule yyn is
$accept".
I was surprised by the benches: it's faster than checking if yystate is YYFINAL.
================= final-state
BM_parse 10655 ns 10648 ns 262867
================= final-state-nofinal
BM_parse 9949 ns 9945 ns 286108
(This is Google benchmark). It may surprise that the termination criterion has
such an impact, but since we check it each time we enter a state, it's quite
frequent. And checking right before reducing is less frequent, but requires
looking up in a table (yyr1) as opposed to just checking against a constant.
But then, for some reason I just don't understand, when using LAC, this
approach slows down the parser:
================= final-state-lac
BM_parse 13951 ns 13939 ns 203483
================= final-state-nofinal-lac
BM_parse 15114 ns 15107 ns 185246
and when I repeat this, I get consistently the same results. So I'm puzzled,
and I don't know how to interpret this. The benches were run with the
traditional calculator, maybe this is not relevant enough?
Please, if you have some good benchmark somewhere, try the changes I suggested
above, and report your mileage. It would be great if we could speed up the
parsers by improving this.
In any case, I am very interested by bench results. To reproduce my
experiments, please find attached:
- the patch that installs the alternative ending controlled by -Dnofinal
(currently available as V1 here: https://github.com/akimd/bison/tree/multifinal)
- a benchmark case (a *.y file), and
- a Makefile to run the benches.
You'll have to edit the Makefile to fix the paths to Bison, CXX, etc. Then
"make clean bench".
If you have better ideas on how to deal with termination, I'm all ears!
Thanks in advance.
nofinal.patch
Description: Binary data
final-state.y
Description: Binary data
Makefile
Description: Binary data
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Help with benchmarks on termination,
Akim Demaille <=