Re: GNU Bison: D language support

bison-patches

From:	H. S. Teoh
Subject:	Re: GNU Bison: D language support
Date:	Fri, 8 Feb 2019 11:12:29 -0800
User-agent:	Mutt/1.10.1 (2018-07-13)
On Fri, Feb 08, 2019 at 06:10:28AM +0100, Akim Demaille wrote:
> [HS and I agreed to move to public lists now.  TS, bison-patches is
> the right place to discuss code changes.  Please, resend your answer
> to the predecessor of this message as a reply to this message.  TIA!]
> -----------------------
[...]

Here it is.


On Thu, Feb 07, 2019 at 07:10:12PM +0100, Akim Demaille wrote:
> Hi HS!
> 
> (Is this the proper way to call you?)

Yes, it will do.


> All the comments should be public on bison's lists.  Are you ok that I
> repost this message there?

Yes, let's take the discussion there.  Which list do we use for such
discussions? bison-patches seems to be primarily for discussing actual
code changes, rather than discussion of this sort.


> > On 7 Feb 2019, at 16:51, H. S. Teoh <address@hidden> wrote:
[...]
> >> - In a similar vein, I'm wondering if the generated parser ought
> >>  to be a class at all, or is the inheritability of the parser a key
> >>  Bison feature?  Also, are language-specific directives supported /
> >>  encouraged?  If so, it might be worthwhile to let the user choose
> >>  whether to use a struct/template API vs. an OO class-based API.
> 
> I guess I don't know what you call a class here (as you certainly
> know class and struct are roughly equivalent in C++), but I read
> your comment as "class should be reserved to participants of
> inheritance".

Yes, in C++ struct and class are basically the same thing except for
default access permissions.  However, in D, they are more distinguished:

- Classes are by-reference types, allocated on the heap by default
  (though there are facilities to emplace them on the stack in special
  circumstances), and come equipped with a vtable for OO inheritance and
  runtime polymorphism, and a monitor for synchronisation. They also
  inherit from the universal base class Object, which provides some
  possibly useful default methods and an object factory.

- Structs are, in Andrei Alexandrescu's words, "glorified ints",
  by-value types that primarily live on the stack and/or in registers
  (though they can be allocated on the heap if need be). They do not
  have vtables, and must be manually synchronised if needed.  While it's
  possible to achieve a crude sort of "compile-time inheritance" with
  the "alias this" feature, runtime inheritance is not supported by the
  language and must be manually implemented if desired (though in such
  cases one would typically just switch to using a class instead).
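
To make the difference concrete, here is a minimal sketch. The type and
function names are made up for illustration and have nothing to do with
what Bison currently emits:

```d
// Hypothetical types, for illustration only.
struct StructParser { int state; }   // by-value type
class  ClassParser  { int state; }   // by-reference type, derives from Object

// Returns true if modifying a copy leaves the original untouched.
bool structCopyIsIndependent()
{
    StructParser a;          // no heap allocation needed
    StructParser b = a;      // copies the whole value
    b.state = 42;
    return a.state == 0;
}

// Returns true if a change made through a second reference is visible
// through the first one.
bool classCopyIsShared()
{
    auto a = new ClassParser;   // GC heap allocation by default
    ClassParser b = a;          // copies only the reference
    b.state = 42;
    return a.state == 42;
}

// Classes implicitly convert to Object; structs do not.
static assert( is(ClassParser : Object));
static assert(!is(StructParser : Object));

void main()
{
    assert(structCopyIsIndependent());
    assert(classCopyIsShared());
}
```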


> Well, I made no efforts to have the C++ parser derive from some
> base class.  Yet, there are a couple of "virtual" in the API, but
> let's consider them historical artifacts.  I don't think it makes
> much sense to have a hierarchy of parsers.

Yeah, I didn't think so either.  Because of this, I'm inclined to have
Bison emit a parser struct rather than a class -- at least by default.
If it's not too onerous I suppose we could make it a user-configurable
option.  But I don't anticipate anyone clamoring for that, so perhaps we
should just stick with struct.


> >> - On a more high-level note, I'm wondering how flexible the API of
> >>   the parser can be.  The main thought behind this is that given
> >>   enough flexibility, we may be able to target, e.g., @nogc, @safe,
> >>   pure, etc..  With @safe probably a pretty important target, if
> >>   it's possible to do so.  While this depends of course on the
> >>   exact code the user puts into the .y file, a worthy goal is to
> >>   make the emitted D code @safe (pure, etc.) by default unless the
> >>   user writes non-@safe code in the .y file.
> 
> I cannot comment on this.  But the generated parser should aim at
> the least constraints.  So the generated code itself should not
> require a GC, IMHO.

Makes sense.  Using a struct instead of a class would help towards not
requiring a GC. :-)

Supporting @safe is certainly a worthwhile goal, though it does impose
certain restrictions: unions that involve pointer members, for example,
are verboten in @safe. There's the @trusted escape-hatch for such
occasions, though I personally don't feel comfortable with the idea of
@trusted code that's auto-generated, since the point of @trusted is for
a human reviewer to vet the code for memory safety where the compiler
cannot prove it, and that's incompatible with auto-generation.
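
A small sketch of the union restriction, assuming a made-up Value type
as a stand-in for a semantic value (this is not Bison's actual output):

```d
// Hypothetical semantic value type with a pointer member that overlaps
// a plain integer.
union Value
{
    int  ival;
    int* pval;
}

// Reading the non-pointer member is accepted in @safe code.
@safe int readInt(ref Value v)
{
    return v.ival;
}

// Reading the overlapping pointer member is rejected in @safe code ...
static assert(!__traits(compiles,
    (ref Value v) @safe { return v.pval; }));

// ... but accepted in @system code, where the programmer is responsible
// for knowing which member is currently valid.
static assert( __traits(compiles,
    (ref Value v) @system { return v.pval; }));

void main()
{
    Value v;
    v.ival = 7;
    assert(readInt(v) == 7);
}
```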


> >> - How flexible can the lexer API be?  For example, currently
> >>  lexer.yyerror takes a string argument, which requires using
> >>  std.format in various places.  If permissible, I'd like to have
> >>  yyerror take a generic input range instead, so that we can avoid
> >>  the inherent memory allocation of std.format (e.g., if we wish to
> >>  target @nogc).
> 
> lexer.yyerror?  yyerror is expected to be part of the parser,
> not the scanner.

OK, I may have been confused by the calc.y example, where CalcLexer
declares a yyerror() method.


> It's painful that the interface of yyerror has to be declared
> by the user in C and C++, but, again, that's rather an historical
> scar.  You should aim at a fixed signature.

Makes sense.  D does allow static introspection of function signatures,
so potentially one approach could be for the generated code to detect
the signature of yyerror and adapt accordingly.
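
Roughly what I have in mind, as a sketch only. PlainLexer, LocatedLexer
and reportError are hypothetical names, and the two signatures are just
examples of what a user might declare:

```d
// Two hypothetical user-supplied types with different yyerror signatures.
struct PlainLexer
{
    string lastMessage;
    void yyerror(string msg) { lastMessage = msg; }
}

struct LocatedLexer
{
    int lastLine;
    string lastMessage;
    void yyerror(int line, string msg) { lastLine = line; lastMessage = msg; }
}

// Hypothetical helper: at compile time, pick the call form that the
// given type actually accepts.
void reportError(L)(ref L lexer, int line, string msg)
{
    static if (is(typeof(lexer.yyerror(line, msg))))
        lexer.yyerror(line, msg);      // signature with a location
    else static if (is(typeof(lexer.yyerror(msg))))
        lexer.yyerror(msg);            // message-only signature
    else
        static assert(false, L.stringof ~ " has no usable yyerror");
}

void main()
{
    PlainLexer a;
    reportError(a, 3, "syntax error");
    assert(a.lastMessage == "syntax error");

    LocatedLexer b;
    reportError(b, 3, "syntax error");
    assert(b.lastLine == 3 && b.lastMessage == "syntax error");
}
```

A fixed signature is of course simpler to maintain; this only shows that
the adaptive variant costs nothing at runtime.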


> >> - Also, is it possible to use exceptions instead of yyerror()?  Or
> >>   would that deviate too far from Bison's design?
> 
> That would be a misunderstanding of the purpose of yyerror.
> This function is called when there's a syntax error, and the
> error message must be passed to the user.  The implementation
> of yyerror then decides whether to print to stderr, syslog it,
> open a GUI, or raise an exception, why not.  But still, that
> would be a waste, because the parser may be able to recover
> from error, and gather even more error messages, which may be
> delivered eventually to the user after the parse has finished.

I see.  So it's basically a hook for user-defined code to handle errors
however the user sees fit.  Thanks for the clarification.


> >> - On a more general note, I'd like to make the parser/lexer APIs
> >>   range-based as much as possible, esp. when it comes to
> >>   string-handling.  But I'm just not sure how much the APIs are
> >>   expected to conform to the analogous C/C++/Java APIs.
> 
> Because in practice the maintenance falls on the shoulders of
> the Bison maintainers, we want the API to remain as similar as
> possible, without being unnatural to the host language.

Makes sense.

I was hoping for a Bison API more idiomatic to D, e.g., instead of
explicitly binding to a lexer object, the parser could simply receive an
input range of tokens (an input range in D is any type that supports the
iteration primitives .empty, .front, .popFront). This can default to
yylex, but the user would be able to pass any token source to the
parser, including pre-baked arrays of tokens, e.g. in a unittest to
ensure the parser handles certain specific token sequences correctly.
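
As a sketch of the idea, here is a toy "parser" that accepts any input
range of tokens. Token, TokenKind and parseSum are hypothetical and far
simpler than a real generated parser; the point is only that a plain
array of tokens can be fed in directly:

```d
import std.range.primitives : ElementType, empty, front, isInputRange,
                              popFront;

// Hypothetical token type.
enum TokenKind { num, plus, eof }
struct Token { TokenKind kind; int value; }

// Toy parser for "NUM (PLUS NUM)*": adds up the numbers.
// Returns false on a syntax error.
bool parseSum(R)(R tokens, out int result)
    if (isInputRange!R && is(ElementType!R == Token))
{
    bool expectNum = true;
    for (; !tokens.empty; tokens.popFront())
    {
        const tok = tokens.front;
        if (tok.kind == TokenKind.eof)
            break;
        if (expectNum != (tok.kind == TokenKind.num))
            return false;               // unexpected token
        if (expectNum)
            result += tok.value;
        expectNum = !expectNum;
    }
    return !expectNum;                  // input must end after a number
}

void main()
{
    int sum;
    // A pre-baked array of tokens, as one might write in a unittest.
    auto good = [Token(TokenKind.num, 1), Token(TokenKind.plus),
                 Token(TokenKind.num, 2), Token(TokenKind.eof)];
    assert(parseSum(good, sum) && sum == 3);

    auto bad = [Token(TokenKind.num, 1), Token(TokenKind.num, 2)];
    assert(!parseSum(bad, sum));
}
```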

And if output routines like yyerror could be made into output ranges
instead, we could eliminate std.format from the dependencies --
std.format is rather heavyweight, and some D users dislike using it. And
furthermore, the format() function requires GC allocation, which would
be a no-go if we wanted to support @nogc.
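
A sketch of what that could look like, assuming a hypothetical
fixed-size sink and a made-up writeSyntaxError helper. The message is
written to the sink piece by piece, so no string is ever built with
format():

```d
import std.range.primitives : put;

// Hypothetical sink: an output range of characters backed by a fixed
// buffer, so writing to it never allocates. Excess output is dropped.
struct FixedSink
{
    char[128] buf;
    size_t len;

    void put(const(char)[] s) @nogc nothrow @safe
    {
        foreach (c; s)
            if (len < buf.length)
                buf[len++] = c;
    }
}

// Hypothetical error writer: works with any sink that accepts strings.
void writeSyntaxError(Sink)(ref Sink sink, const(char)[] unexpected,
                            const(char)[] expected)
{
    put(sink, "syntax error, unexpected ");
    put(sink, unexpected);
    put(sink, ", expecting ");
    put(sink, expected);
}

void main()
{
    FixedSink sink;
    writeSyntaxError(sink, "'+'", "number");
    assert(sink.buf[0 .. sink.len] ==
           "syntax error, unexpected '+', expecting number");
}
```

The user could just as well pass a sink that writes to stderr or
collects messages in a list; the parser would not need to know.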


> >> - I wonder if YYSemanticType could use std.variant somehow instead
> >>   of a raw union, which would probably force the parser to be
> >>   @system.
> 
> A union is the natural storage: the parser knows the type of
> the current value, it does not need variants that duplicate
> this knowledge of the current type.

Point taken.  That probably means we can't support @safe. (Although I
just checked -- std.variant also doesn't support @safe, so we're not
much better off that way either.)


> >> - Can Bison handle UTF-8 lexer/parser rules?  D uses UTF-8 by
> >>   default, and it would be nice to leverage this support instead of
> >>   manually iterating over bytes, as is done in a few places.
> 
> Bison does not care about your encoding, it sits on top of a
> stream of tokens, not a stream of characters.  Again, because
> of history, it accepts bytes as tokens-of-the-poor, but it should
> not learn to read UTF-8, that's not its business.

What I had in mind when I wrote that was yytnamerr_(), which appears to
be used for formatting error messages.


> >> - Some minor points that should be easy to fix:
[...]
> >>   - D does support the #line directive.  So these should be emitted
> >>     as they are in C/C++. (I noticed they currently only appear as
> >>     comments.)
> 
> Sure.  This could easily be your first contribution :)  It does not
> require paperwork.

OK, I'll look into this.
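
For reference, a tiny sketch of the directive in D. The file name and
line number are invented; a real generated parser would use the
location of the corresponding action in the user's .y file:

```d
// Hypothetical fragment of generated code for an action that starts at
// line 42 of calc.y.
#line 42 "calc.y"
enum actionLine = __LINE__;   // this source line is now numbered 42

static assert(actionLine == 42);

void main() {}
```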


> > So you do not use Github's pull request system?  Or do you use that
> > in conjunction with the bison-patches mailing list?
> 
> GitHub is not Free Software, so "of course" we will never require it.
> And using the mailing lists is a long tradition at GNU.  So please,
> submit your patches there, for them to be discussed.  But feel free to
> fork my repo and submit PRs there, at least to get CI.
[...]

How would that work?  Just submit PRs to your github repo and get CI,
then post the patches and close the PRs?  Just wondering what the
currently accepted process is.


T

-- 
All problems are easy in retrospect.