bug-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Please, support easy AST generation


From: Yijun . Yu
Subject: Re: Please, support easy AST generation
Date: Thu, 20 Dec 2018 16:43:27 +0000

Maybe my comment is rather late. Forgive me if it is unrelated to the current 
discussions.

I have been doing the AST generation from Bison for some time since the YAXX 
subproject.
There the initial aim was to output the parsing trees as XML and some people 
liked that.
But occasionally I received questions on how to do this for ASTs. At that time 
I was not
Bison maintainer so the approach I took was to do this as a change to the 
“yacc.c” template
and keep everything else the same.

The actions to decide whether a parsing tree node (or I call it element in the 
XML counterpart)
is worthy to keep or not. If they are not, I simply do no generate the tags, 
similar to what you
propose for conditionally prints.

The nice thing of this approach is it is lightweight in terms of maintenance 
effort, if all you
want to print the AST. What’s needed for the YAXX project was to create an XSLT 
spreadsheet
that reads in a “dictionary” of action names, then use the condition to filter 
out those tags, in
less than 20 lines of code.

But maybe supporting it natively on Bison generated template is better. Before 
deciding on this
effort, do you want to take a look at the yaxx branch below as a starting 
point. 

http://git.savannah.gnu.org/cgit/bison.git/log/?h=yaxx

Cheers — Yijun

> On 9 Dec 2018, at 06:47, Frank Heckenbach <address@hidden> wrote:
> 
> Askar Safin wrote:
> 
>> Hi. Often the only thing I want to do in Bison is just generate
>> AST and nothing else. Unfortunately, in this case code becomes
>> very repetitive. For example, this is Bison input I used to
>> generate AST for small JavaScript subset:
>> https://zerobin.net/?b9af68c9aa7a31a9#JB4wZseYTq9aKfZOwwZrDqBNCqlEZRj/+DM9bKdgtKU=
>> . Please, support some feature, which eliminates the need to write
>> such repetitive code. Such code can be easily generated from
>> grammar alone
> 
> Indeed, I had discussed something similar with Akim Demaille
> privately some time ago; he also mentioned it on this list a few
> weeks ago (I guess I'm one of the "some people" he refers to, but
> unfortunately I was too busy then to get involved):
> 
> : Having a means to ask Bison to generate actions given some form of
> : a generic pattern is a different topic.  It makes a lot of sense,
> : and some people have discussed about this on various occasions
> : here.  That would help to bind to some abstract factory for ASTs
> : for instance, without forcing a specific AST format.
> 
> So let me restate and expand upon my suggestion here, which is,
> of course, just a rough sketch. This refers mainly to the C++ output
> as it makes heavy use of templates and overloading. Maybe something
> similar can be done in Java etc., but I'll leave that to the
> respective experts. In C, I think this will only work in a less
> general way which requires more effort by the end-user.
> 
> Actually, this would not only mostly automatically generate actions
> to generate ASTs and the like, but also some normal actions in my
> current grammars that don't primarily build ASTs, but contain
> actions that build "mini sub-ASTs", i.e. structures that are later
> acted upon in other actions, e.g. parameter lists while parsing a
> function declaration or call. I guess such actions might be common
> in many grammars.
> 
> AFAIK, Bison's current behaviour applied to any rule can be summed
> up like this (I hope I'm not missing important details):
> 
> 1. If a user-specified action is given, output this action and
>   nothing else; otherwise:
> 
> 2. if $$ has no declared type, do nothing; otherwise:
> 
> 3. if there are no RHS symbols, default-initialize $$ (when Bison
>   knows how to, esp. when using variants); otherwise:
> 
> 4. output the default action "$$ = $1;" (possibly with auto-move),
>   warning if the declared types of $$ and types are different
>   (which may still result in a successful compilation if the types
>   are compatible, or a compiler-error down the line if not).
> 
>   As a side note, though I suggest to introduce this default action
>   into C++, I'm not sure if it's actually useful to apply it (in
>   any language) when there's more than one RHS argument (or at
>   least, more than ne typed one, more about this below). But if
>   that's required by POSIX, we'd have no choice, at least in C.
> 
> My suggestion is basically to insert another step (optional, of
> course) to auto-generate actions just based on the types involved
> before 3., or even before 2. -- the latter could be useful to
> generate actions that consume parsed data (e.g. to compile a
> function or to store a declaration in an interpreter) rather than
> build structures such as ASTs, e.g.:
> 
> 1.5: if an auto-action for the types of $$ and $n can be generated,
>     do so; otherwise: ...
> 
> So how can such an auto-action be specified?
> A rather general way may use a form similar to %printer, e.g.:
> 
>  %auto-action { build_foo ($$, $^); } <foo>;
> 
> Then any rule where $$ is of type foo and which has no explicit
> user-action would use build_foo automatically.
> 
>  %auto-action { build_default ($$, $^); } <*>;
> 
> This would apply to any type of $$ not mentioned explicitly.
> 
> I'm using a new special token "$^" here (the name is only exemplary,
> in analogy with make) to insert the RHS arguments as a
> comma-separated list (again with auto-move if enabled). More
> precisely, only those RHS arguments with a declared type, since
> (a) Bison wouldn't know how to insert arguments without declared
> types anyway (esp. with variants), and (b) automatically omitting
> them allows further simplification in a lot of common (IMHO) cases,
> e.g.:
> 
>  expr: '(' expr ')';
> 
> would then auto-generate:
> 
>  build_auto ($$, $2);
> 
> without having to mention $2 explicitly (assuming the '(' and ')'
> tokens have no type), or in a hypothetical language:
> 
>  decl: "function" id '(' parlist ')' attrs ';';
> 
> would auto-generate:
> 
>  build_auto ($$, $2, $4, $6);
> 
> discarding all the syntactic fluff automatically.
> 
> An even more general way would be to discriminate on the types of $$
> and $n, like this:
> 
>  %auto-action { build_foo ($$, $1, $2); } <foo, bar, baz>;
> 
> This would avoid "$^" because the number of RHS arguments is
> specified explicitly. However, (a) I fear this might be a lot of
> effort to implement (whereas the discrimination on $$ only can
> hopefully reuse the %printer infrastructure), (b) might actually
> require more effort by the user to implement more such directives,
> and (c) might mainly be of interest for C grammars.
> 
> In C++ (and perhaps Java etc.), OTOH, I'd rather let the compiler do
> the selection; I might only ever use:
> 
>  %auto-action { build_default ($$, $^); } <*>;
> 
> to defer it all to a single overloaded and templated function.
> 
> So then how could this function be implemented? (Note, the following
> is just a collection of ideas, not tested and probably with a number
> of more or less hairy issues, but just to convey the idea. I'm
> assuming auto-move throughout.)
> 
> First a very general version that constructs $$ from the arguments
> if possible. This already covers case 3. above (if T has a default
> constructor), as well as case 4. with the side note (if T has a copy
> or move constructor), but also actions that must just pack their
> arguments in some simple structure (e.g. std::pair, std::struct or a
> user-defined structure) for later use, or that build some object
> using any of its constructors.
> 
>  // probably need enable_if<is_constructible ...> here ...
>  template<typename T, typename... R>
>  void build_default (T& t, R&&... r)
>  {
>    t = T { std::forward<R> (r)... };
>  }
> 
> Another overload could automatically build smart pointers (here,
> unique_ptr, but likewise for shared_ptr or similar) when applicable:
> 
>  template<typename T, typename... R>
>  void build_default (std::unique_ptr<T>& t, R&&... r)
>  {
>    t = std::make_unique (std::forward<R> (r)...);
>  }
> 
> When building containers (e.g. parameter lists), the rules that
> build the start (i.e., a container with 0 or 1 elements, depending
> on the grammar) should already be covered by the general version
> above using the default constructor or single-element list
> initialization. For rules that extend containers, we could have:
> 
>  // maybe enable_if<is_container<T>> ...
>  template<typename T, typename... R>
>  void build_default (T& t, T&& c, R&&... r)
>  {
>    t = std::move (c);
>    t.emplace_back (std::forward<R> (r)...);
>  }
> 
> Such basic and exemplary overlads could be shipped with Bison,
> either in a generated header (optionally), or a static header
> distributed with Bison (though I know Bison doesn't install any
> headers so far, so this would be something new), or at least in the
> documentation, so users would have the tricky parts taken care of
> already.
> 
> Auto-generating most actions (esp. for AST builders) might then
> become as simple as adding one "<*>" action, and declaring all
> symbols with suitable types (which is good style anyway), so the
> overloads can apply automatically.
> 
> User code could add more application-specific overloads where
> needed. E.g. in some of my grammars I have a number of rules like
> this, one for each precedence group of operators:
> 
>  expr: expr or_op  expr { $$ = build_binary_expr ($1, $2, $3); };
>  expr: expr and_op expr { $$ = build_binary_expr ($1, $2, $3); };
> 
> Then I could instead write just once:
> 
>  void build_default (Exp& t, Exp&& a, Op&& b, Exp&& c)
>  {
>    t = build_binary_expr (std::move (a), std::move (b), std::move (c));
>  }
> 
> In any case, user code keeps all flexibility. If the general
> overloads are not wanted for some rule, one can add a more specific
> overload for the types involved, and if that won't do, one can just
> write an explicit action like before which will disable all
> auto-generation for this rule.
> 
> To cover case 2. above, we'd just need a special syntax for "no $$"
> ("void" won't do with the syntax above, since "void &" is not
> valid). E.g., just dropping $$ from the parameter list in this case
> (and not covering this case with "<*>"), so e.g. to auto-generate
> rules to print everything that's discarded due to lack of an action,
> something like:
> 
>  %auto-action { do_print ($^); } <>;
> 
> :  The thing is...  We need more hands on Bison.
> 
> I know, and unfortunately I'll still be busy for the next few weeks,
> but I'm optimistic I can find some time to work on Bison next year.
> 
> This would be a feature of interest (and immediate use) to me, so I
> think I could work on this; esp. the "build_default" implementations
> (which is basically normal, though heavily templated, C++
> programming), including testing them with my grammars.
> 
> For the work required in Bison proper, I might need some help,
> though I hope it might be as simple as adding one new directive like
> "%printer".
> 
> Cheers,
> Frank
> 


reply via email to

[Prev in Thread] Current Thread [Next in Thread]