bison-patches

Re: GNU Bison: D language support


From: H. S. Teoh
Subject: Re: GNU Bison: D language support
Date: Mon, 11 Feb 2019 10:15:31 -0800
User-agent: Mutt/1.10.1 (2018-07-13)

On Sun, Feb 10, 2019 at 11:05:43AM +0100, Akim Demaille wrote:
[...]
> > On 8 Feb 2019, at 20:12, H. S. Teoh <address@hidden> wrote:
[...]
> > If it's not too onerous I suppose we could make it a
> > user-configurable option.  But I don't anticipate anyone clamoring
> > for that, so perhaps we should just stick with struct.
> 
> I also agree here.  One problem we face in Bison is that we have many
> options already, and writing the test suite is itself a challenge.
> And it does happen that some combination is not tested, and behaves
> incorrectly.
> 
> We should avoid offering too many options, unless it is quite clear
> that we need to offer a specific behavior.  Likewise, the parser API
> should remain narrow IMHO.

I agree.


[...]
> > Supporting @safe is certainly a worthwhile goal, though it does
> > impose certain restrictions: unions that involve pointer members,
> > for example, are verboten in @safe. There's the @trusted
> > escape-hatch for such occasions, though I personally don't feel
> > comfortable with the idea of @trusted code that's auto-generated,
> > since the point of @trusted is for a human reviewer to vet the code
> > for memory safety where the compiler cannot prove it, and that's
> > incompatible with auto-generation.
> 
> There are tons of papers that prove the correctness of the algorithm
> implemented in Bison, so you should not feel worried about that.  Of
> course, proving an algorithm and proving an implementation are not the
> same thing :)  Besides, the user is free to mess around in her
> actions.  So if it's possible, what is purely generated code (without
> bits from the user) should probably be trusted.  When it comes to user
> actions, maybe we need to be more cautious.  An %define variable such
> as api.parser.trusted maybe?  so that the user can declare herself if
> she wants to claim so.

I apologize for not defining the terms I use before using them.

In D, @safe has a very specific and narrow meaning: memory safety, i.e.,
no buffer overruns, no memory corruption, and no reads through an
invalid pointer (which would amount to reading arbitrary memory).  It is
restricted to what the compiler can statically verify, and does not
extend to safety in other, broader senses of the word.

Inside @safe code, operations that may compromise memory safety are not
allowed, e.g., pointer arithmetic, casting pointers between different
types, assigning integer values to pointers, etc., or, more pertinently,
reading a pointer value from a union where it may overlap with integer
values (since then you could assign an arbitrary integer value to the
union and read it out as a pointer and dereference that, thereby
bypassing the @safe pointer restrictions).
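
To make these restrictions concrete, here is a minimal sketch.  The
union IntOrPtr and the helper setBits are hypothetical names made up for
this illustration; the static asserts only record what I would expect a
current D compiler to accept or reject.

```d
// Hypothetical union: an integer and a pointer share the same bytes.
union IntOrPtr
{
    size_t bits; // plain integer view
    int* ptr;    // pointer view overlapping the integer
}

// Writing the integer member is allowed in @safe code.
void setBits(ref IntOrPtr u, size_t v) @safe
{
    u.bits = v;
}

// Reading the overlapped pointer is rejected in @safe code ...
static assert(!__traits(compiles, (ref IntOrPtr u) @safe { return *u.ptr; }));
// ... but the same access compiles in @system code.
static assert( __traits(compiles, (ref IntOrPtr u) @system { return *u.ptr; }));
// Pointer arithmetic is likewise rejected in @safe code.
static assert(!__traits(compiles, (int* p) @safe { return p + 1; }));

void main() @safe
{
    IntOrPtr u;
    setBits(u, 42);
    assert(u.bits == 42);
}
```

This is the kind of overlap a union-based semantic value type would run
into if it mixed pointers with integers.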

Any code that doesn't conform to the restrictions of @safe is marked by
default as @system. For obvious reasons, @safe code is not allowed to
call @system code.

@safe is, of course, very restrictive, and sometimes undesirable in a
systems programming language like D purports to be. So there is the
concept of @trusted, which is a backdoor that allows the programmer to
tell the compiler, "I know you cannot statically verify that what I did
here is @safe, but I give you my promise that it is in fact @safe".
Marking a function as @trusted then allows @safe to call what would
otherwise be @system code.  The idea here is that code marked @trusted
would be limited in scope and human-verified to be memory-safe, with the
assumption that whenever @trusted code is modified, appropriate code
review processes would be in place to verify the memory-correctness of
the new version of the code.

In order for this to be practical, though, @trusted can only
realistically be applied to a small number of small pieces of code -- if
@trusted is applied too liberally, then it becomes impractical to review
everything by hand, and we lose the guarantees that @safe is supposed to
provide.
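
As a rough illustration of the intended scale of @trusted code, here is
a small sketch.  The functions sumRaw and sum are hypothetical, invented
for this example only; they are not part of Bison or of any library.

```d
// Hypothetical @system function: it indexes a raw pointer, which the
// compiler cannot verify, so it is not callable from @safe code.
int sumRaw(const(int)* p, size_t n) @system
{
    int total = 0;
    foreach (i; 0 .. n)
        total += p[i];
    return total;
}

// Small @trusted wrapper.  A human reviewer only has to check this one
// line: a slice always carries a length consistent with its pointer.
int sum(const(int)[] xs) @trusted
{
    return sumRaw(xs.ptr, xs.length);
}

// @safe code cannot call the @system function directly.
static assert(!__traits(compiles, () @safe { return sumRaw(null, 0); }));

void main() @safe
{
    int[] a = [1, 2, 3];
    assert(sum(a) == 6); // fine: goes through the @trusted wrapper
}
```

The wrapper is short enough to review by hand every time it changes,
which is exactly what a whole generated parser is not.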

This is why I am wary of marking an entire auto-generated parser as
@trusted.  The user has no practical way to verify the memory safety of
the entire parser every time Bison generates a new version of it from
the .y file.  Doing so would defeat the purpose of using Bison in the
first place -- if you had to manually review all the generated code, you
might as well have written the parser by hand yourself.  And even if you
trust Bison to generate correct code, a bug in Bison that somehow breaks
memory safety would make the @trusted label unwarranted: the user
program's @safe-ty would be compromised without warning, and the user
would have no recourse short of debugging Bison himself (and spending
the time and effort to understand the Bison code in order to do so).


[...]
> >> It's painful that the interface of yyerror has to be declared by
> >> the user in C and C++, but, again, that's rather an historical
> >> scar.  You should aim at a fixed signature.
> > 
> > Makes sense.  D does allow static introspection of function
> > signatures, so potentially one approach could be for the generated
> > code to detect the signature of yyerror and adapt accordingly.
> 
> I'm not sure you need this.  (So far) Bison calls yyerror only with a
> single string: it assembles the error message before passing it to
> yyerror.  So, at least for a start, you could keep yyerror's interface
> simple: possibly location, then string.

What I meant was that the generated Bison parser could automatically
detect the signature of yyerror, so that if it supported a location,
Bison would pass the location, otherwise, just a string.  It's really
quite simple:

        if ( ... /* error detected */) {
                // is(typeof(expr)) is true only if expr compiles
                static if (is(typeof(yyerror(Location.init, ""))))
                        yyerror(curr_location, error_msg);
                else static if (is(typeof(yyerror(""))))
                        yyerror(error_msg);
                else static assert(0, "Unsupported yyerror signature");
        }

The is(typeof(...)) construct basically tests if the given expression
has a valid type, which would only be true if the given function call
compiles successfully.  If the user declared yyerror with an
incompatible signature, the function call would (silently) fail to
compile, and is(typeof(...)) would evaluate to false, in which case the
next static condition is tried, and so on.

This lets the user declare yyerror either way, and the generated parser
would call it with the correct arguments without needing additional
declarations.
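
For anyone who wants to try the idea outside of a generated parser, here
is a self-contained sketch.  Everything in it (the Location struct, the
reportError helper, the two sample yyerror functions and the messages
array) is hypothetical and only stands in for what a D skeleton might
eventually provide.  It uses is(typeof(...)), which is true only when
the expression compiles.

```d
import std.conv : to;

// Hypothetical location type, standing in for whatever the parser uses.
struct Location { int line; int column; }

// Hypothetical helper: calls yyerror with whichever signature it supports.
void reportError(alias yyerror)(Location loc, string msg)
{
    static if (is(typeof(yyerror(Location.init, ""))))
        yyerror(loc, msg);
    else static if (is(typeof(yyerror(""))))
        yyerror(msg);
    else
        static assert(0, "Unsupported yyerror signature");
}

string[] messages; // collects what the sample handlers receive

// User-side handler that wants the location.
void yyerrorWithLocation(Location loc, string msg)
{
    messages ~= loc.line.to!string ~ ":" ~ loc.column.to!string ~ ": " ~ msg;
}

// User-side handler that only wants the message.
void yyerrorPlain(string msg)
{
    messages ~= msg;
}

void main()
{
    auto where = Location(3, 7);
    reportError!yyerrorWithLocation(where, "syntax error");
    reportError!yyerrorPlain(where, "syntax error");
    assert(messages == ["3:7: syntax error", "syntax error"]);
}
```

The selection happens entirely at compile time, so there is no runtime
cost for supporting both signatures.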


[...]
> >> Because in practice the maintenance falls on the shoulders of
> >> the Bison maintainers, we want the API to remain as alike as
> >> possible, without being unnatural to the host language.
> > 
> > Makes sense.
> > 
> > I was hoping for a Bison API more idiomatic to D,
> 
> I'm not saying it should not be!  I agree it should be idiomatic to D.
> But when there are different roughly equivalent options, I'd like to
> stick to the one used in the other backends.

Understood.


> > e.g., instead of explicitly binding to a lexer object, the parser
> > could simply receive an input range of tokens (an input range in D
> > is any type that supports the iteration primitives .empty, .front,
> > .popFront). This can default to yylex, but the user would be able to
> > pass any token source to the parser, including pre-baked arrays of
> > tokens, e.g. in a unittest to ensure the parser handles certain
> > specific token sequences correctly.
> 
> Beware of the test suite...
> 
> I agree that what you suggest sounds good though (and I actually have
> to mock this in the test suite on top of the yylex interface).
> 
> Let's follow your path, and see where it goes.

OK.
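
To show what I mean by accepting any token source, here is a toy
sketch.  The Token enum, the parseSum function (which only recognizes
"num (plus num)*") and the AlternatingTokens range are all hypothetical
and far simpler than a real generated parser; the point is only the
input range interface.

```d
import std.range.primitives : isInputRange, ElementType,
                              empty, front, popFront;

// Hypothetical token kinds for a toy grammar: num (plus num)*
enum Token { num, plus }

// Toy recognizer: accepts any input range whose elements are Tokens.
bool parseSum(R)(R tokens)
    if (isInputRange!R && is(ElementType!R == Token))
{
    if (tokens.empty || tokens.front != Token.num)
        return false;
    tokens.popFront();
    while (!tokens.empty)
    {
        if (tokens.front != Token.plus)
            return false;
        tokens.popFront();
        if (tokens.empty || tokens.front != Token.num)
            return false;
        tokens.popFront();
    }
    return true;
}

// Hypothetical lexer-like range: emits num, plus, num, ... on demand.
struct AlternatingTokens
{
    int remaining;         // how many tokens are left to emit
    bool nextIsNum = true; // which kind comes next

    @property bool empty() const { return remaining == 0; }
    @property Token front() const { return nextIsNum ? Token.num : Token.plus; }
    void popFront() { --remaining; nextIsNum = !nextIsNum; }
}

void main()
{
    // Pre-baked arrays of tokens, as one would use in a unittest.
    assert( parseSum([Token.num, Token.plus, Token.num]));
    assert(!parseSum([Token.num, Token.plus]));

    // A custom range works through the same interface.
    assert( parseSum(AlternatingTokens(3))); // num plus num
    assert(!parseSum(AlternatingTokens(2))); // num plus
}
```

A yylex-based source would just be one more type providing .empty,
.front and .popFront.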


[...]
> >> Bison does not care about your encoding, it sits on top of a
> >> stream of tokens, not a stream of characters.  Again, because
> >> of history, it accepts bytes as tokens-of-the-poor, but it should
> >> not learn to read UTF-8, that's not its business.
> > 
> > What I had in mind when I wrote that was yytnamerr_(), which appears
> > to be used for formatting error messages.
> 
> yytnamerr is an abomination.  We need to work on this (in all the
> languages) in the near future.  Don't focus on it too much right now,
> we will probably do something better.

Good to know.


> [GitHub vs bison-patches]
> > How would that work?  Just submit PRs to your github repo and get
> > CI, then post the patches and close the PRs?  Just wondering what
> > the currently accepted process is.
> 
> bison-patches is the proper place for the humans to discuss the patch
> until it is validated.  GitHub's PRs will provide you with a CI on
> travis-ci.org (we used to be on .com, I recently moved to .org where I
> can have five concurrent slaves instead of three), and will provide me
> with an easy means to import your work into master.
[...]

Ah, OK. I'll submit a GitHub PR then.


T

-- 
Those who don't understand D are condemned to reinvent it, poorly. --
Daniel N


