[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Enhancement request: enabling Variant in C parsers

From: Akim Demaille
Subject: Re: Enhancement request: enabling Variant in C parsers
Date: Wed, 24 Oct 2018 07:29:11 +0200

Hi Victor,

We’ve made some progress in the last couple of months.
Bison 3.2 is not released yet, but it should happen within
a couple of weeks.

> Le 19 août 2018 à 22:11, Victor Khomenko <address@hidden> a écrit :
> Hi Akim,
>>> I did look at it - this example scared me off c++ parsers :-(
>> Really???  Then I failed.  Can you be more specific?
> Ok, I'll try to make some suggestions for the manual and bison. Please note 
> that much of what I say might be due to lack of understanding - I’m just a 
> regular bison user, and don't understand the skeletons, internals, etc.

So you are the right audience :)  Something would be wrong if
you had to be a Bison contributor to be a Bison user.

> The context is that I have 8 bison C-style pure parsers in my program, most 
> of them are very simple, but a couple are fairly complicated, including a glr 
> one.

This is already quite advanced!  Do you happen to have some of them
living in the same application/library?  Or each time they are

> 1. The C++ section in the bison manual starts with the description of 
> generated files: position.hh, location.hh, stack.hh, file.hh and file.cc. 
> That's 5 generated files, so at this point many prospective users would just 
> run away screaming. I’d say the first three files are completely unnecessary 
> and can be dumped into file.hh (or even .cc)

We made some progress: you can have from 1 to 3 files.
Obviously parser.cc must always exist.  parser.hh is on demand
(and that’s not new, it’s been the case since a long time),
bound to %defines.  location.hh is nice to have if for instance
you tag your ast with locations (I know you don’t, but that’s
not typical).  In which case it’s nice to have a means to reach
the location type without having to include the whole parser.hh.
You control this with api.location.file.

> - note that the contents could be in some bison::detail namespace, so that 
> the types are exactly the same in all parsers and the linker can merge the 
> functions if they are declared as inline but were not actually inlined. The 
> work-around for multiple parsers suggested in Sect. (making one 
> parser generate these includes in master/ and the other parsers use them) is 
> ugly and introduces unwanted include and build dependencies (e.g. there could 
> be two independent libraries within a project, each using a parser; this 
> technique would make them dependent both for includes and the built order).

There’s not much to share.  The location types _can_ be different,
for instance in the way you handle filenames.  And build systems
are all about dependencies, so it should not be too much of a
problem to get things right.

However, you are totally free to design your own location type
and to ask Bison to use it.

> I’d say file.hh can also be suppressed in simple cases, e.g. if one replaces 
> the generated class by a function (see below) - it's then possible to declare 
> the parser function at the point of use.

Yes, just don’t pass %defines.  Like in C.

> 2. int yylex (semantic type* yylval, location type* yylloc, type1 arg1, ...) 
> [Method on parser]
> In C++ one would usually pass the parameters by references rather than 
> pointers - also to avoid handling null-pointers in yylex.

That would be nicer, you are right, but backward compatibility
won’t let us fix that.  Besides, this interface is wrong and
should not be used.  You should use symbol constructors and
return complete symbols rather than hoping for the returned
type and the semantic value to be consistant.

Actually, it would be a nice feature to add to C parsers too.

> 4. calc++ is a rather complicated example, for warming up it would be better 
> to use something simple. Then calc++ would be good to illustrate various 
> intricacies like mutual #include dependencies and a flex scanner.

The section about C++ was meant to be read by people who already
know about the C interface.  That could have been a bad choice.  I
do not plan to duplicate the whole chapters of introduction, they
still should be read, but I agree we needed a simpler introduction.
So I added one (Section 10.1.1, A Simple C++ Example).

> 3. the destructor of calcxx_driver should not be virtual;

I don’t know what you are referring to here.  It is not.
It’s not even explicit.

> in fact, the automatically generated destructor should be ok, so no need to 
> declare it at all. The constructor is not necessary either - since C++11 one 
> can give the default initial values to data members:
>    std::map<std::string, int> variables={ {"one", 1}, {"two", 2} };
>    trace_scanning=false;
>    trace_parsing=false;

We could modernize the example, sure, but it’s C++98, so that
was not an option.  I’m ok with that move.

> Maybe some other functions within the class, like error(), can be defined at 
> the point of declaration.

Bison cannot know how you want the error to be reported to the
user: stderr?  Or a log for a daemon?  Or an alert modal window
for an application with a GUI? etc.

> 4. int calcxx_driver::parse (const std::string& f);
> This is confusing, as the parser also has the function with the same name, 
> but no parameters. Maybe call it parse_file?

The idea is that the driver is the exposed interface to the
rest of the program, and ‘parse’ seems right.  But I could
live with parse_file if you really think it makes a difference.

> 5. scan_begin() and scan_end() could be removed to simplify the interface, 
> and inlined into parse(…).
> 6. implementing calcxx_driver::scan_begin() and calcxx_driver::scan_end() in 
> the lexer file is really bad - I guess one can easily implement them in the 
> calcxx_driver.cpp file - this does not cause any problems with #includes.

I disagree.  Flex’s interface is painful, and I don’t want to
have to have it generate a header file.  So you should leave
Flex-related implementation details there.  Here, it’s ‘just’
‘yyin’ but in more ambitious projects, you will probably need
to use yypush_buffer_state, yy_push_state, yy_switch_to_buffer,
yypop_buffer_state, etc.

My point here was to show the pattern I ended up following to
be able to use Flex in C++ effectively.

> 7. it would be better to use C++ streams rather than C-style fopen/fclose 
> (note that error() does use streams).

Sure.  But then again, that’s Flex, not Bison.

I have put a tarball with the current state of the doc and of
the examples in here:


Please, let me know what you think about it.

Thanks again for your detailed report.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]