[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: multistart: use YYSTYPE
From: |
Akim Demaille |
Subject: |
Re: multistart: use YYSTYPE |
Date: |
Sun, 4 Oct 2020 16:47:48 +0200 |
Hi Rici,
> Le 27 sept. 2020 à 20:46, Rici Lake <ricilake@gmail.com> a écrit :
>
> I want to focus on the
> advantages of using YYSTYPE to represent the returned value.
>
> In particular, it's highly flexible. You don't need to restrict YYSTYPE in
> any way. I say this partly because I still prefer tagged types to `%define
> api.value.type union`. Using actual C types has a certain appeal but in
> practice when I try this style, I inevitably end up using typedefs to
> create type aliases, even for primitive types (so that I don't need to edit
> every %type declaration when I decide that long should be uint64_t.). Maybe
> that's just old-fogeyism :-)
No, I understand this, and do the same. But I do prefer typedef + %define
api.value.type union, which is consistent throughout the application, to using
%union, where parser uses a different vocabulary than the rest of the program.
> But there are a number of parsing applications
> in which YYSTYPE is not a C union at all (for example, when it's some kind
> of discriminated union), and these should also be usable with the multiple
> start symbol interface.
Yes, that's definitely something on which my model chokes.
> Also, once you decide to use a compound return type (whether it's a direct
> return or indirect via an out parameter) which includes a specific value
> type alternative, you're committed to create several different structure
> types, each used only for a particular call. While there are not likely to
> be many such structures, it feels a bit ugly, at least in C.
I personally don't feel this is ugly at all.
> Using YYSTYPE
> would require only one compound type (and only one type name), and the
> client is going to have to extract the value from the compound anyway. (I
> understand that if you `#define api.value.type union` then you don't have a
> convenient tagname to extract the value, which means resorting to an ugly
> cast. Since your proposal only applies for this particular case, it makes
> some sense to do the cast automatically. But as I said above, I think it
> would be nicer if multiple start symbols were more general.)
I have the feeling that your concerns would be addressed if I made
the following function public:
> typedef struct
> {
> YYSTYPE yyvalue;
> int yynerrs;
> } yy_parse_impl_t;
>
> // Run a full parse, using YYCHAR as switching token.
> static int
> yy_parse_impl (int yychar, yy_parse_impl_t *yyimpl);
All the other functions are implemented on top of it. For instance
> yyparse_input_t
> yyparse_input (void)
> {
> yyparse_input_t yyres;
> yy_parse_impl_t yyimpl;
> yyres.yystatus = yy_parse_impl (TOK_YY_PARSE_input, &yyimpl);
> yyres.yynerrs = yyimpl.yynerrs;
> return yyres;
> }
I still much prefer to have a structure rather than just passing
more arguments for yynerrs and possibly others, so that we can
have more members in the future if we feel the need for it.
> Finally, let me note that (aside from didactic issues, like "how easy is
> this to explain to SO questioners?"), this change isn't going to affect me
> personally because I almost always use the push interface. With the push
> interface, implementing start-symbol sentinels is extremely easy (which is
> not to say that a bit of assistance wouldn't be appreciated).
I agree. Push parsers are great, and I've often been willing to get
rid of the pull parser implementation, and to offer a pull-on-top-of-push
instance (what you get with %define api.push-pull both). Unfortunately
benchmarks show that pull is faster than push, and some people might be
angry if replaced the native pull by pull-on-top-of-push. Using Bison
benchmarking tool:
> $ ./_build/g9d/etc/bench.pl --gbench --bench push
> Entering directory `benches/241'
> Using bison=/Users/akim/src/gnu/bison/_build/g9d/tests/bison.
> 0. %define api.pure %define api.push-pull both
> 1. %define api.push-pull both
> 2. %define api.pure
> 3.
> Run on (8 X 2900 MHz CPU s)
> CPU Caches:
> L1 Data 32K (x4)
> L1 Instruction 32K (x4)
> L2 Unified 262K (x4)
> L3 Unified 8388K (x1)
> --------------------------------------------------
> Benchmark Time CPU Iterations
> --------------------------------------------------
> BM_y0 10283 ns 10274 ns 65229
> BM_y1 10168 ns 10156 ns 70180
> BM_y2 9279 ns 9270 ns 74935
> BM_y3 9047 ns 9038 ns 76379
This figures are quite stable.
$ cd ./benches/latest
$ make rand
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y3 9266 ns 9258 ns 71538
BM_y1 10377 ns 10368 ns 67121
BM_y0 10169 ns 10158 ns 67373
BM_y2 9289 ns 9279 ns 75201
$ make rand
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y1 10262 ns 10252 ns 65069
BM_y3 9204 ns 9195 ns 74677
BM_y0 10191 ns 10182 ns 67133
BM_y2 8759 ns 8750 ns 77941
So that's more than 10% of performance penalty to use a pure pull parser.
(This parser spend little time in the actions: it's the usual calculator).
> Most of the time, what I'd really like is a way of extending the push-parse
> context object to hold some extra members, one of which would be the return
> value from the final parse call.
Agreed. I feel sorry that %parse-param was not used to extend the
context object in the case of push parsers, instead of passing %parse-params
around all the time. In the C++, that's the way %parse-param is implemented:
more members in the parser object.
> Now, let me just plant that issue there for push parsers, and go back to
> reentrant pull parsers. (Reentrant, because one would really want to start
> promoting reentrant parsers a bit more at this point, since reliance on
> globals is a technique from a different era.) Bison's reentrant parsers
> don't require a context object, which is cool in its own way but also a
> limitation.
Yes, it is.
> (For one thing, it makes aspects of the parser state
> inaccessible outside of action code.) Suppose, instead, that the multiple
> start symbol feature required a reentrant parser with a context object.
> With that change, yynerrs and the symbol's return value (i.e. the top of
> the stack when the root symbol is reduced) could just be kept in the
> context object, in a way which would be consistent with push parsers and
> reasonably easy for the client code. I'm not making a formal proposal,
> here: it's just an alternative to consider.
Something I have been considering many times, indeed. But I'm afraid
of the performance penalty.
I'll see if I can explore that track. (But before, I have to really
be sure about the needed changes in IELR).
> Again, sorry for the delay in responding. I'll try to be attentive to my
> email, but as a result of the move my internet access will be very
> intermittent until Thursday or so.
I hope things are better now on your side :)
Cheers!
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: multistart: use YYSTYPE,
Akim Demaille <=