Patches to support python: 2/5: Changes to existing bison files to suppo
From: Dennis Heimbigner
Subject: Patches to support python: 2/5: Changes to existing bison files to support python
Date: Tue, 03 Sep 2013 16:17:59 -0600
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)
From a54de3bf6958bbff1f19e3ceb8ea304bc4874a62 Mon Sep 17 00:00:00 2001
From: dmh <address@hidden>
Date: Tue, 3 Sep 2013 15:40:21 -0600
Subject: [PATCH 1/2] Add support for parsers using the python language. Part1:
changes to existing files
* NEWS: Mention Python
* bootstrap.conf: Add pythoncomp-script and pythonexec-script.
* configure.ac: Invoke gt_PYTHONCOMP and gt_PYTHONEXEC.
* data/local.mk: Add new files.
* doc/bison.texi: Add section on Python parsers and add
references to Python in various places.
* src/getargs.c (valid_languages): Add Python.
* src/getargs.h (struct bison_language): Update size of string fields.
* src/parse-gram.y : Python requires no-lines.
* tests/local.mk: Add python.at.
* tests/atlocal.in: Add CONF_PYTHON and CONF_PYTHONC.
* tests/local.at: Add python to various macros.
* tests/testsuite.at: Include python.at
* tests/javapush.at: Modified to remove macro name conflicts wrt python
---
NEWS | 10 +
bootstrap.conf | 1 +
build-aux/.gitignore | 1 +
configure.ac | 2 +
data/local.mk | 1 +
doc/bison.texi | 636 +++++++++++++++++++++++++++++++++++++++++++++++++--
m4/.gitignore | 1 +
src/getargs.c | 5 +
src/getargs.h | 5 +-
src/parse-gram.y | 7 +-
tests/atlocal.in | 3 +
tests/javapush.at | 26 +--
tests/local.at | 58 ++++-
tests/local.mk | 1 +
tests/testsuite.at | 3 +
15 files changed, 710 insertions(+), 50 deletions(-)
diff --git a/NEWS b/NEWS
index 07bf5a9..e5d2f00 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,16 @@ GNU Bison NEWS
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Python
+
+ Bison can now generate an LALR(1) parser in Python. The skeleton is
+ "data/lalr1.py".
+
+ See the new section "Python Parsers" in the Bison manual for details.
+
+ The current Python interface is experimental and may evolve. More user
+ feedback will help to stabilize it.
+ Contributed by Dennis Heimbigner
* Noteworthy changes in release 3.0 (2013-07-25) [stable]
diff --git a/bootstrap.conf b/bootstrap.conf
index c58470e..be53238 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -30,6 +30,7 @@ gnulib_modules='
obstack
obstack-printf
perror progname
+ pythonexec-script
quote quotearg
readme-release
realloc-posix
diff --git a/build-aux/.gitignore b/build-aux/.gitignore
index 2c8b6fe..b9cbeec 100644
--- a/build-aux/.gitignore
+++ b/build-aux/.gitignore
@@ -27,3 +27,4 @@
/warn-on-use.h
/ylwrap
/prefix-gnulib-mk
+/pythonexec.sh.in
diff --git a/configure.ac b/configure.ac
index f7319a1..140fb06 100644
--- a/configure.ac
+++ b/configure.ac
@@ -249,6 +249,8 @@ AC_SUBST([GCC])
gt_JAVACOMP([1.3], [1.4])
gt_JAVAEXEC
+gt_PYTHONEXEC
+
AC_CONFIG_FILES([Makefile
po/Makefile.in
doc/yacc.1])
diff --git a/data/local.mk b/data/local.mk
index b829052..65d9b4b 100644
--- a/data/local.mk
+++ b/data/local.mk
@@ -27,6 +27,7 @@ dist_pkgdata_DATA = \
data/java.m4 \
data/lalr1.cc \
data/lalr1.java \
+ data/lalr1.py \
data/location.cc \
data/stack.hh \
data/variant.hh \
diff --git a/doc/bison.texi b/doc/bison.texi
index 78d7d06..67db576 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -104,7 +104,7 @@ Reference sections:
messy for Bison to handle straightforwardly.
* Debugging:: Understanding or debugging Bison parsers.
* Invocation:: How to run Bison (to produce the parser
implementation).
-* Other Languages:: Creating C++ and Java parsers.
+* Other Languages:: Creating C++, Java, and Python parsers.
* FAQ:: Frequently Asked Questions
* Table of Symbols:: All the keywords of the Bison language are explained.
* Glossary:: Basic concepts are explained.
@@ -334,6 +334,7 @@ Parsers Written In Other Languages
* C++ Parsers:: The interface to generate C++ parser classes
* Java Parsers:: The interface to generate Java parser classes
+* Python Parsers:: The interface to generate Python parser classes
C++ Parsers
@@ -369,6 +370,20 @@ Java Parsers
* Java Push Parser Interface:: Instantiating and running the a push parser
* Java Differences:: Differences between C/C++ and Java Grammars
* Java Declarations Summary:: List of Bison declarations used with Java
+
+Python Parsers
+
+* Python Parser Structure:: Features/Bugs with python parsers
+* Python Bison Interface:: Asking for Python parser generation
+* Python Semantic Values:: %type and %token vs. Python
+* Python Location Values:: The position and location classes
+* Python Parser Interface:: Instantiating and running the parser
+* Python Error Reporting Interface: Reporting errors
+* Python Scanner Interface:: Specifying the scanner for the parser
+* Python Action Features:: Special features for use in actions
+* Python Push Parser Interface:: Instantiating and running a push parser
+* Python Differences:: Grammar Differences: C/C++/Java vs Python
+* Python Declarations Summary:: List of Bison declarations used with Python
Frequently Asked Questions
@@ -408,8 +423,8 @@ Bison is upward compatible with Yacc: all properly-written
Yacc
grammars ought to work with Bison with no change. Anyone familiar
with Yacc should be able to use Bison with little trouble. You need
to be fluent in C or C++ programming in order to use Bison or to
-understand this manual. Java is also supported as an experimental
-feature.
+understand this manual. Java and Python are also supported on an
+experimental basis.
We begin with tutorial chapters that explain the basic concepts of
using Bison and show three explained examples, each building on the
@@ -5527,7 +5542,7 @@ are chosen as if the grammar file were named
@address@hidden
@deffn {Directive} %language "@var{language}"
Specify the programming language for the generated parser. Currently
-supported languages include C, C++, and Java.
+supported languages include C, C++, Java, and Python.
@var{language} is case-insensitive.
@end deffn
@@ -5758,7 +5773,7 @@ The parser namespace is @code{foo} and @code{yylex} is
referenced as
@deffn {Directive} {%define api.location.type} @address@hidden@}
@itemize @bullet
-@item Language(s): C++, Java
+@item Language(s): C++, Java, Python
@item Purpose: Define the location type.
@xref{User Defined Location Type}.
@@ -6203,6 +6218,10 @@ after the usual contents of the parser header file.
Thus, the
unqualified form replaces @address@hidden@address@hidden for most purposes.
For Java, the default location is inside the parser class.
+
+For Python, the default location is at the end of the parser file,
+but before the epilogue.
+
@end deffn
@deffn {Directive} %code @var{qualifier} @address@hidden@}
@@ -6227,7 +6246,7 @@ qualifiers produce an error. Some of the accepted
qualifiers are:
@findex %code requires
@itemize @bullet
-@item Language(s): C, C++
+@item Language(s): C, C++, Python
@item Purpose: This is the best place to write dependency code required for
@code{YYSTYPE} and @code{YYLTYPE}. In other words, it's the best place to
@@ -6239,13 +6258,14 @@ definitions, then it is also the best place. However
you should rather
@item Location(s): The parser header file and the parser implementation file
before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE}
definitions.
+For python, this code is placed just after @code{%code imports}.
@end itemize
@item provides
@findex %code provides
@itemize @bullet
-@item Language(s): C, C++
+@item Language(s): C, C++, Python
@item Purpose: This is the best place to write additional definitions and
declarations that should be provided to other modules.
@@ -6253,13 +6273,15 @@ declarations that should be provided to other modules.
@item Location(s): The parser header file and the parser implementation
file after the Bison-generated @code{YYSTYPE}, @code{YYLTYPE}, and
token definitions.
+For python, this code is placed at the end of the parser file
+and before any plain @code{%code} directives.
@end itemize
@item top
@findex %code top
@itemize @bullet
-@item Language(s): C, C++
+@item Language(s): C, C++, Python
@item Purpose: The unqualified @code{%code} or @code{%code requires}
should usually be more appropriate than @code{%code top}. However,
@@ -6274,18 +6296,19 @@ parser implementation file. For example:
@end example
@item Location(s): Near the top of the parser implementation file.
+For python, this code is placed just before any @code{%code requires}
directive.
@end itemize
@item imports
@findex %code imports
@itemize @bullet
-@item Language(s): Java
+@item Language(s): Java, Python
-@item Purpose: This is the best place to write Java import directives.
+@item Purpose: This is the best place to write Java or Python import directives.
@item Location(s): The parser Java file after any Java package directive and
-before any class definitions.
+before any class definitions. For Python, it is at the beginning of the file.
@end itemize
@end table
@@ -6294,7 +6317,6 @@ technically skeleton-dependent. Writers of non-standard
skeletons
however should choose their locations consistently with the behavior
of the standard Bison skeletons.
-
@node Multiple Parsers
@section Multiple Parsers in the Same Program
@@ -9618,8 +9640,8 @@ the listing file. Eventually you will arrive at the
place where
something undesirable happens, and you will see which parts of the
grammar are to blame.
-The parser implementation file is a C/C++/Java program and you can use
-debuggers on it, but it's not easy to interpret what it is doing. The
+The parser implementation file is a C/C++/Java/Python program and you can
+use debuggers on it, but it's not easy to interpret what it is doing. The
parser function is a finite-state machine interpreter, and aside from
the actions it executes the same code over and over. Only the values
of variables show where in the grammar it is working.
@@ -10194,7 +10216,8 @@ any conflicting @code{%define} that may be added to the
grammar file.
@itemx address@hidden
Specify the programming language for the generated parser, as if
@code{%language} was specified (@pxref{Decl Summary, , Bison Declaration
-Summary}). Currently supported languages include C, C++, and Java.
+Summary}). Currently supported languages include C, C++, Java,
+and Python.
@var{language} is case-insensitive.
@item --locations
@@ -10364,6 +10387,7 @@ int yyparse (void);
@menu
* C++ Parsers:: The interface to generate C++ parser classes
* Java Parsers:: The interface to generate Java parser classes
+* Python Parsers:: The interface to generate Python parser classes
@end menu
@node C++ Parsers
@@ -11534,9 +11558,6 @@ state of the parser is always local to an instance of
the parser class.
Therefore, all Java parsers are ``pure'', and the @code{%pure-parser}
and @code{%define api.pure} directives do nothing when used in Java.
-Push parsers are currently unsupported in Java and @code{%define
-api.push-pull} have no effect.
-
GLR parsers are currently unsupported in Java. Do not use the
@code{glr-parser} directive.
@@ -12177,6 +12198,569 @@ The exceptions thrown by user-supplied parser actions
and
@xref{Java Parser Interface}.
@end deffn
+@node Python Parsers
+@section Python Parsers
+
+@menu
+* Python Parser Structure:: Features/Bugs with python parsers
+* Python Bison Interface:: Asking for Python parser generation
+* Python Semantic Values:: %type and %token vs. Python
+* Python Location Values:: The position and location classes
+* Python Parser Interface:: Instantiating and running the parser
+* Python Error Reporting Interface: Reporting errors
+* Python Scanner Interface:: Specifying the scanner for the parser
+* Python Action Features:: Special features for use in actions
+* Python Push Parser Interface:: Instantiating and running a push parser
+* Python Differences:: C/C++/Java Grammars versus Python Grammars
+* Python Declarations Summary:: List of Bison declarations used with Python
+@end menu
+
address@hidden Python Parser Structure
address@hidden Python Parser Structure
address@hidden - %language "Python"
+
+Python presents some significant challenges for Bison.
+Bison has the notion of a C-like language built
+deeply into its code, and Python violates a number
+of those notions.
+
+The most important difference is that line
+indentation has syntactic and semantic significance.
+This means a grammar file writer must be very
+careful about the indentation of inserted code.
+In practice, the writer may have to iterate on the
+indentation by inspecting the Python file that Bison
+produces in order to get it right.
+
+There are, however, some heuristics that can help.
+@itemize
+@item Never use tabs except inside string constants.
+@item Assume that the code inside a @code{%code} or @code{%code qualifier}
+is at indent level zero (0).
+@item Single line actions (e.g. @address@hidden@}}) should cause no problems.
+@item Multiple line actions should assume an indentation level greater
+than four (4).
+@end itemize
+
+In Python code, line breaks can also have syntactic and semantic
+significance. This should not be a problem as long as your code
+is valid Python.
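
The indentation heuristics above can be checked before running Bison. The following is a minimal sketch, not part of the patch: the helper names `reindent_action` and `compiles_inside_function` are hypothetical, and the sketch assumes only that an action body ends up nested inside a function at some fixed indentation level.

```python
import textwrap


def reindent_action(body, level=8):
    """Expand tabs, strip the common leading whitespace, and re-indent
    every non-blank line of `body` by `level` spaces."""
    cleaned = textwrap.dedent(body.expandtabs(4)).strip("\n")
    return textwrap.indent(cleaned, " " * level)


def compiles_inside_function(body):
    """Return True if `body` is syntactically valid Python once it is
    placed inside a function, which is roughly where an action lands."""
    wrapped = "def _action():\n" + reindent_action(body, 4) + "\n"
    try:
        compile(wrapped, "<action>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# A consistently indented multi-line action body.
good = """
    if x > 0:
        y = x
    else:
        y = -x
"""

# The third line is indented deeper than the block it follows.
bad = """
    if x > 0:
      y = x
        z = y
"""
```

Running `compiles_inside_function` on each multi-line action is a quick way to catch indentation mistakes without inspecting the generated parser file.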
+
+The Python skeleton was derived from the Java skeleton, but
+there are many differences.
+
+Python treats a Python file as defining a @code{module}
+with an associated scope. In effect, the module is the equivalent
+of the Java parser class. As a result, most definitions in the
+Python module (the file) are at the outer scope, and
+no inner class definitions are used. In particular,
+the @code{%token}s are defined as constants in the module and not,
+as in Java, in the Lexer class. Further, any lexer defined using
address@hidden lexer @address@hidden is installed in a top level class; it is
+not an inner class.
+
address@hidden Python Bison Interface
address@hidden Python Bison Interface
address@hidden - %language "Python"
+
+(The current Python interface is experimental and may evolve.
+More user feedback will help to stabilize it.)
+
+The Python parser skeletons are selected using the @code{%language "Python"}
+directive or the @option{-L python}/@option{--language=python} option.
+
address@hidden FIXME: Documented bug.
+When generating a Python parser, @code{bison @var{basename}.y} will
+create a single Python source file named @address@hidden
+containing the parser implementation. Using a grammar file without a
address@hidden suffix is currently broken. The basename of the parser
+implementation file can be changed by the @code{%file-prefix}
+directive or the @option{-b}/@option{--file-prefix} option. The
+entire parser implementation file name can be changed by the
address@hidden directive or the @option{-o}/@option{--output} option.
+The parser implementation file contains a single module for the parser.
+
+Like the Java parsers, Python parsers maintain the state of the
+parser local to an instance of the parser class.
+Therefore, all Python parsers are ``pure'', and the @code{%pure-parser}
+and @code{%define api.pure} directives do nothing when used in Python.
+
+GLR parsers are currently unsupported in Python. Do not use the
+@code{glr-parser} directive.
+
+No header file can be generated for Python parsers. Do not use the
+@code{%defines} directive or the @option{-d}/@option{--defines} options.
+
address@hidden FIXME: Possible code change.
+Currently, support for tracing is always compiled
+in. Thus the @samp{%define parse.trace} and @samp{%token-table}
+directives and the
+@option{-t}/@option{--debug} and @option{-k}/@option{--token-table}
+options have no effect. This may change in the future to eliminate
+unused code in the generated parser, so use @samp{%define parse.trace}
+explicitly
+if needed. Also, in the future the
+@code{%token-table} directive might enable a public interface to
+access the token names and codes.
+
address@hidden Python Semantic Values
address@hidden Python Semantic Values
address@hidden - No %union, %type/%token ignored
address@hidden - YYSTYPE ignored
address@hidden - Printer and destructor
+
+Python is a dynamically typed language, so
+there is no @code{%union} directive in Python parsers.
+The @code{%type} or @code{%token} directives may be used
+to help bison do type checking, but they are not
+reflected in the generated code.
+An important consequence is that
+type checking occurs at execution time.
+
address@hidden @example
address@hidden %type <Expression> expr assignment_expr term factor
address@hidden %type <Integer> number
address@hidden @end example
+
+The Python semantic stack is declared to have members of any type,
+which means that the types you specify can be anything.
address@hidden To improve the type safety of the parser, you can declare the
common
address@hidden superclass of all the semantic values using the @samp{%define
api.value.type}
address@hidden directive. For example, after the following declaration:
+
address@hidden @example
address@hidden %define api.value.type @address@hidden
address@hidden @end example
+
address@hidden @noindent
address@hidden any @code{%type} or @code{%token} specifying a semantic type
which
address@hidden is not a subclass of ASTNode, will cause a compile-time error.
+
address@hidden @c FIXME: Documented bug.
address@hidden Types used in the directives may be qualified with a package
name.
address@hidden Primitive data types are accepted for Python version 1.5 or
later. Note
address@hidden that in this case the autoboxing feature of Python 1.5 will be
used.
address@hidden Generic types may not be used; this is due to a limitation in the
address@hidden implementation of Bison, and may change in future releases.
+
+Python parsers do not support @code{%destructor}. This may eventually
+change to use the @code{del} statement, but until then, and as with any other
+python program, the programmer has to be careful to avoid circular references.
+
+In addition, python parsers do not support @code{%printer}, as
address@hidden()} can be used to print the semantic values. This
+however may change (in a backwards-compatible way) in future versions.
+
address@hidden Python Location Values
address@hidden Python Location Values
address@hidden - %locations
address@hidden - class Position
address@hidden - class Location
+
+When the directive @code{%locations} is used, the Python parser
+supports location tracking, see @ref{Tracking Locations}. An
+auxiliary user-defined class defines a @dfn{position}, a single point
+in a file; Bison itself defines a class representing
+a @dfn{location}, a range composed of a pair of positions (possibly
+spanning several files).
+The name is @code{Location} by default, and may also be
+renamed using @code{%define api.location.type @address@hidden@}}.
+
+The Location class treats the position as a completely opaque value.
+By default, the class name is @code{Position}, but this can be changed
+with @code{%define api.position.type @address@hidden@}}. This class must
+be supplied by the user.
+
address@hidden {Location} {Position} begin
address@hidden {Location} {Position} end
+The first, inclusive, position of the range, and the first beyond.
address@hidden deftypeivar
+
address@hidden {Constructor} {Location} {} Location (Position @var{loc})
+Create a @code{Location} denoting an empty range located at a given point.
address@hidden deftypeop
+
address@hidden {Constructor} {Location} {} Location (Position @var{begin},
Position @var{end})
+Create a @code{Location} from the endpoints of the range.
address@hidden deftypeop
+
address@hidden {Location} {String} __str__ ()
+Returns a string describing the range represented by the location. For
+this to work properly, the position class should override the
+@code{__eq__} and @code{__ne__} methods appropriately.
address@hidden deftypemethod
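
As an illustration of the user-supplied position class described above, here is a minimal sketch. The `line` and `column` fields are assumptions made for the example, and the `Location` class shown is only a stand-in written to demonstrate why `__eq__` and `__ne__` matter; it is not the class the skeleton generates.

```python
class Position(object):
    """User-supplied position: a single point in a file.
    The (line, column) representation is illustrative only."""

    def __init__(self, line=1, column=0):
        self.line = line
        self.column = column

    def __eq__(self, other):
        return (isinstance(other, Position)
                and (self.line, self.column) == (other.line, other.column))

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hashing consistent with the value-based __eq__.
        return hash((self.line, self.column))

    def __str__(self):
        return "%d.%d" % (self.line, self.column)


class Location(object):
    """Stand-in for the generated Location class (an assumption):
    a range made of a begin and an end position."""

    def __init__(self, begin, end=None):
        self.begin = begin
        self.end = begin if end is None else end

    def __str__(self):
        # An empty range is shown as a single point; this comparison is
        # why Position needs value-based __eq__ and __ne__.
        if self.begin == self.end:
            return str(self.begin)
        return "%s-%s" % (self.begin, self.end)
```

Without the equality methods, two distinct `Position` objects describing the same point would compare unequal, and an empty range would be rendered as if it spanned two positions.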
+
address@hidden Python Parser Interface
address@hidden Python Parser Interface
address@hidden - define parser_class_name
address@hidden - Ctor
address@hidden - parse, error, set_debug_level, debug_level, set_debug_stream,
address@hidden debug_stream.
address@hidden - Reporting errors
+
+The name of the generated parser class defaults to @code{YYParser}. The
address@hidden prefix may be changed using the @code{%name-prefix} directive
+or the @option{-p}/@option{--name-prefix} option. Alternatively, use
address@hidden parser_class_name @address@hidden@}} to give a custom name to
+the class.
+The superclass of the parser class can be specified with the
address@hidden extends} directive.
+
+The Python module name is, of course, defined by the output file name.
+
+The module defines a class, @code{Location}, that is used
+for location tracking (see @ref{Python Location Values}).
+Other than this class, and the members described in the
+interface below, all the other members and fields are preceded with a
address@hidden or @code{YY} prefix to avoid clashes with user code.
+
+The arguments to the parser class can be extended using the @code{%parse-param}
+directive. Each occurrence of the directive will add a field
+to the parser class's @code{__init__} function.
+
+The interface of the parser class is detailed below.
+
address@hidden {Constructor} {YYParser} {} def __init__ (@var{self},
address@hidden, @var{parse_param}, @dots{})
+Build a new parser object.
+Note that python allows only a single class constructor.
+The defined named parameters are as follows.
address@hidden @bullet
address@hidden self
+This is the conventional first argument that Python requires for all
+instance methods.
address@hidden @bullet
address@hidden yylexer
+Specify the yylex object. If @code{%code address@hidden@}}
+is specified, then this argument will be replaced with
+any specifications of @code{%lex-param @address@hidden
address@hidden parse_param
+Specify the parse parameter(s) (if any).
address@hidden itemize
+There are no other parameters, unless @code{%param}s and/or
+@code{%parse-param}s and/or @code{%lex-param}s are used.
+
+Use @code{%code init} for code added to the start of the constructor
+body. This is especially useful to initialize superclasses.
address@hidden itemize
address@hidden deftypemethod
+
address@hidden {YYParser} {bool} parse ()
+Run the syntactic analysis, and return @code{True} on success,
+@code{False} otherwise.
address@hidden deftypemethod
+
address@hidden {YYParser} {bool} getErrorVerbose ()
address@hidden {YYParser} {void} setErrorVerbose (@var{verbose})
+Get or set the option to produce verbose error messages. These are only
+available with @samp{%define parse.error verbose}, which also turns on
+verbose error messages.
address@hidden deftypemethod
+
address@hidden {YYParser} {void} yyerror (@var{msg} address@hidden,location}])
+Print an error message using the @code{yyerror} method defined
+in the Lexer instance given to the parser.
+The second argument is only defined if the @code{%locations}
+directive is specified and specifies a location for the error.
+It may be of (dynamic) type @code{Location}.
address@hidden deftypemethod
+
address@hidden {YYParser} {bool} recovering ()
+During the syntactic analysis, return @code{True} if recovering
+from a syntax error.
address@hidden Recovery}.
address@hidden deftypemethod
+
address@hidden {YYParser} {File} getDebugStream ()
address@hidden {YYParser} {void} setDebugStream (@var{file})
+Get or set the stream used for tracing the parsing. It defaults to
address@hidden
address@hidden deftypemethod
+
address@hidden {YYParser} {int} getDebugLevel ()
address@hidden {YYParser} {void} setDebugLevel (@var{l})
+Get or set the tracing level. Currently its value is either 0, no trace,
+or nonzero, full tracing.
address@hidden deftypemethod
+
address@hidden {Constant} {YYParser} {String} {bisonVersion}
address@hidden {Constant} {YYParser} {String} {bisonSkeleton}
+Identify the Bison version and skeleton used to generate this parser.
address@hidden deftypecv
+
address@hidden Reporting errors
address@hidden Reporting errors
address@hidden - yyerror
+
+The YYParser class must be passed an instance of @code{class Lexer}
+to use to report error messages.
+
address@hidden {Function} {void} yyerror (address@hidden,address@hidden)
+This function is defined by the user to emit an error message.
+The @var{location} parameter is omitted if location tracking
+is not active.
address@hidden deftypefn
+
address@hidden Python Scanner Interface
address@hidden Python Scanner Interface
address@hidden - %lex-param
address@hidden - Lexer interface
+
+Python lexer handling is more like C/C++ than like Java.
+This means, assuming pull parsing, that the parser class
+must always be passed a lexer function that it can invoke to
+obtain tokens and associated values and locations.
+
+As previously noted, the constants for the user-defined token names
+and the predefined @code{EOF} token are defined at the @code{module}
+level.
+
address@hidden {Function} {int, Object} yylex ()
+Python allows the return of multiple values, so a call to @code{yylex}
+returns a two element tuple containing
+first the token (an integer) and second the lval (an arbitrary python value).
+As with the Java Lexer class, and only if location tracking is enabled,
+position information is extracted using the methods @code{getStartPos}
+and @code{getEndPos}.
address@hidden deftypefn
+
+WARNING: the parser code is expecting the token to be an
+integer. Unlike Java and C, single character constants are not
+automatically cast to integers, so the lexer must do that if
+necessary. So, for example, returning 'a' is not the same as
+returning 97.
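
The tuple convention and the integer-token warning above can be sketched as follows. This is an illustrative scanner only: the token codes `EOF` and `NUM` and the helper `make_yylex` are assumptions made for the example, since the real constants are defined by the generated module.

```python
# Hypothetical token codes; a generated parser module defines the real ones.
EOF = 0
NUM = 258


def make_yylex(text):
    """Return a yylex() function over `text` that yields (token, lval)
    tuples, where token is always an integer."""
    pos = [0]  # one-element list so the closure can update it

    def yylex():
        # Skip whitespace between tokens.
        while pos[0] < len(text) and text[pos[0]].isspace():
            pos[0] += 1
        if pos[0] >= len(text):
            return (EOF, None)
        ch = text[pos[0]]
        if ch.isdigit():
            start = pos[0]
            while pos[0] < len(text) and text[pos[0]].isdigit():
                pos[0] += 1
            return (NUM, int(text[start:pos[0]]))
        pos[0] += 1
        # Single-character token: convert it explicitly with ord(),
        # because Python does not treat 'a' as the integer 97.
        return (ord(ch), ch)

    return yylex
```

A grammar that uses character literals such as `'+'` would therefore receive `ord('+')` from the scanner rather than the one-character string.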
+
address@hidden Python Action Features
address@hidden Special Features for Use in Python Actions
+
+The following special constructs are available
+for use in Python Actions.
+Other analogous C or Java action features are currently unavailable for Python.
+
address@hidden address@hidden
+The semantic value for the @var{n}th component of the current rule.
+This may not be assigned to.
address@hidden Semantic Values}.
address@hidden defvar
+
address@hidden $$
+The semantic value for the grouping made by the current rule.
address@hidden Semantic Values}.
address@hidden defvar
+
address@hidden @@@var{n}
+The location information of the @var{n}th component of the current rule.
+This may not be assigned to.
address@hidden Location Values}.
address@hidden defvar
+
address@hidden @@$
+The location information of the grouping made by the current rule.
address@hidden Location Values}.
address@hidden defvar
+
address@hidden {Statement} return YYABORT
+Return immediately from the parser, indicating failure.
address@hidden Parser Interface}.
address@hidden deftypefn
+
address@hidden {Statement} return YYACCEPT
+Return immediately from the parser, indicating success.
address@hidden Parser Interface}.
address@hidden deftypefn
+
address@hidden {Statement} return YYERROR
+Start error recovery (without printing an error message).
address@hidden Recovery}.
address@hidden deftypefn
+
address@hidden Python Push Parser Interface
address@hidden Python Push Parser Interface
address@hidden - define push_parse
address@hidden %define api.push-pull
+
+(The current push parsing interface is experimental and may evolve. More
+user feedback will help to stabilize it.)
+
+Normally, Bison generates a pull parser for Python.
+The following Bison declaration says that you want the parser to be a push
+parser (@pxref{%define Summary,,api.push-pull}):
+
address@hidden
+%define api.push-pull push
address@hidden example
+
+Most of the discussion about the Python pull Parser Interface,
+(@pxref{Python Parser Interface}) applies to the push parser interface as well.
+
+When generating a push parser, the method @code{push_parse} is created with
+the following signature (depending on whether locations are enabled).
+
address@hidden {YYParser} {void} push_parse (@var{token}, @var{lval} [,
@var{location}])
+The @var{token} is an integer matching one of the defined token constants.
+The @var{lval} is the associated value; use @code{None} if there is no value.
+The parameter @var{location} will be defined only if location tracking
+is enabled.
address@hidden deftypemethod
+
+The primary difference with respect to a pull parser is that the parser
+method @code{push_parse} is invoked repeatedly to parse each token. This
+function is available if either the "%define api.push-pull push" or
+"%define api.push-pull both" declaration is used
+(@pxref{%define Summary,,api.push-pull}).
+
+The value returned by the @code{push_parse} method is one of the following
+three constants: @code{YYABORT}, @code{YYACCEPT}, or @code{YYPUSH_MORE}.
+This new value, @code{YYPUSH_MORE}, may be returned if
+more input is required to finish parsing the grammar.
+
+If api.push-pull is declared as @code{both}, then the generated parser class
+will also implement the @code{parse} method. This method's body is a loop
+that repeatedly invokes the scanner and then passes the values obtained from
+the scanner to the @code{push_parse} method.
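
The loop described above can be sketched as follows. The sketch does not use a generated parser: `StubPushParser` is a stand-in that accepts any token stream ending in `EOF`, the numeric values of the three status constants are arbitrary, and locations are assumed to be disabled.

```python
# Illustrative values only; the generated module defines the real constants.
YYACCEPT, YYABORT, YYPUSH_MORE = 0, 1, 4
EOF = 0


class StubPushParser(object):
    """Stand-in for a generated push parser (an assumption): it records
    each token and accepts as soon as it sees EOF."""

    def __init__(self):
        self.seen = []

    def push_parse(self, token, lval):
        self.seen.append((token, lval))
        return YYACCEPT if token == EOF else YYPUSH_MORE


def parse(parser, yylex):
    """Pull-style driver built on push_parse: fetch one token at a time
    and push it until the parser stops asking for more input."""
    while True:
        token, lval = yylex()
        status = parser.push_parse(token, lval)
        if status != YYPUSH_MORE:
            return status == YYACCEPT
```

The same driver shape works when tokens arrive from an event source instead of a scanner: the caller simply invokes `push_parse` whenever a token becomes available.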
+
+As with the Java push-parser, there is one additional complication.
+Technically, the push parser does not need to know about the scanner
+(i.e. an object implementing the
address@hidden interface), but it does need access to the
address@hidden method. Currently, the @code{yyerror} method is defined in
+the @code{YYParser.Lexer} interface. Hence, an implementation of that
+interface is still required in order to provide an implementation of
address@hidden The current approach (and subject to change) is to require
+the @code{YYParser} constructor to be given an object implementing the
address@hidden interface. This object need only implement the
address@hidden method; the other methods can be stubbed since they will
+never be invoked. The simplest way to do this is to add a trivial scanner
+implementation to your grammar file using whatever implementation of
address@hidden is desired. The following code sample shows a simple way to
+accomplish this.
+
+@example
+%code lexer
+@{
+ def yylex () : return (0,None)
+ def yyerror (msg) : sys.stderr.write(msg+'\n')
+@}
+@end example
+
address@hidden Python Differences
address@hidden Differences between C/C++/Java and Python Grammars
+
+The different structure of the Python language forces several differences
+between grammars for other languages, and grammars designed for Python
+parsers. This section summarizes some of these differences.
+
address@hidden
address@hidden
+Python lacks a preprocessor, so the @code{YYERROR}, @code{YYACCEPT},
+@code{YYABORT} symbols (@pxref{Table of Symbols}) cannot obviously be
+macros. Instead, they should be preceded by @code{return} when they
+appear in an action. The actual definition of these symbols is
+opaque to the Bison grammar, and it might change in the future. The
+only meaningful operation that you can do, is to return them.
address@hidden Action Features}.
+
+Note that of these three symbols, only @code{YYACCEPT} and
address@hidden will cause a return from the @code{yyparse}
address@hidden parsers include the actions in a separate
+method than @code{yyparse} in order to have an intuitive syntax that
+corresponds to these C macros.}.
+
address@hidden
+Python is dynamically typed, so @code{%union}, @code{%type},
+and angle brackets on @code{%token}, @code{%type},
+@code{$@var{n}} and @code{$$} have no consequences for the
+generated code. They may still be useful to help Bison
+do some type checking. See @ref{Python Semantic Values} and
+@ref{Python Action Features}.
+
address@hidden
+Python does not (yet) contain a @code{switch} statement.
+Instead, a sequence of @code{if ... elif...else} statements
+is used to simulate a @code{switch}. This can have significant
+performance consequences for large grammars with many states.
+It should be noted that a dictionary mapping states to
address@hidden statements does not work because the action code
+modifies variables outside the action.
+
+@item
+Python supports exceptions, but there is no way to specify the exceptions
+raised by a function. So, any exception-related directive is ignored.
+
+@item
+Some prologue declarations have a different meaning than in C/C++ code
+and are more similar to Java code. The placement specified below
+is subject to change.
+
+@table @asis
+@item %code imports
+blocks are placed at the beginning of the Python source code. They may
+include copyright notices.
+
+@item unqualified @code{%code}
+blocks are placed at the end of the parser file, before any
+epilogue code.
+
+@item %code requires
+blocks are placed just after any @code{%code imports} blocks.
+@item %code provides
+blocks are placed preceding any unqualified @code{%code} blocks.
+@item %code top
+blocks are placed preceding any @code{%code requires} blocks.
+@end table
+
+Other @code{%code} blocks are not supported in Python parsers.
+In particular, @code{%@{ @dots{} %@}} blocks should not be used
+and may give an error in future versions of Bison.
+
+The epilogue has the same meaning as in C/C++/Java code and it can
+be used to define other classes used by the parser.
+The epilogue code is the last code in the produced Python parser file.
+@end itemize
+
+@node Python Declarations Summary
+@subsection Python Declarations Summary
+
+The following declarations are supported for Python and have the
+same meaning as in, for example, Java.
+@itemize
+@item %name-prefix "@var{prefix}"
+@item %parse-param @address@hidden @address@hidden
+@item %token <@var{type}> @var{token} @dots{}
+@item %type <@var{type}> @var{nonterminal} @dots{}
+@item %define api.location.type @address@hidden@}
+@item %define parser_class_name @address@hidden@}
+@item %define api.position.type @address@hidden@}
+@end itemize
+
+The following declarations are supported for Python
+but are slightly different.
+
+@deffn {Directive} {%language "Python"}
+Generate a Python class for the parser.
+@end deffn
+
+@deffn {Directive} %code @{ @var{code} @dots{} @}
+Described previously.
+@end deffn
+
+@deffn {Directive} {%code imports} @{ @var{code} @dots{} @}
+Described previously.
+@end deffn
+
+@deffn {Directive} {%code init} @{ @var{code} @dots{} @}
+Code inserted at the beginning of the parser constructor body.
+@end deffn
+
+@deffn {Directive} %% @var{code} @dots{}
+Described previously.
+@end deffn
+
+@deffn {Directive} {%define extends} @address@hidden@}
+The superclass of the parser class. Default is none (same as
+object).
+@end deffn
+
@c ================================================= FAQ
@@ -12550,8 +13134,8 @@ Will Bison ever have C++ and Java support? How about @var{insert your
favorite language here}?
@end quotation
-C++ and Java support is there now, and is documented. We'd love to add other
-languages; contributions are welcome.
+C++, Java, and Python support is there now, and is documented.
+We'd love to add other languages; contributions are welcome.
@node Beta Testing
@section Beta Testing
@@ -12935,8 +13519,10 @@ making @code{yyparse} return 1 immediately. The error reporting
function @code{yyerror} is not called. @xref{Parser Function, ,The
Parser Function @code{yyparse}}.
-For Java parsers, this functionality is invoked using @code{return YYABORT;}
+For Java and Python parsers,
+this functionality is invoked using @code{return YYABORT;}
instead.
+
@end deffn
@deffn {Macro} YYACCEPT
@@ -12944,8 +13530,8 @@ Macro to pretend that a complete utterance of the language has been
read, by making @code{yyparse} return 0 immediately.
@xref{Parser Function, ,The Parser Function @code{yyparse}}.
-For Java parsers, this functionality is invoked using @code{return YYACCEPT;}
-instead.
+For Java and Python parsers, this functionality is invoked using
+@code{return YYACCEPT;} instead.
@end deffn
@deffn {Macro} YYBACKUP
@@ -12988,8 +13574,8 @@ does not call @code{yyerror}, and does not print any message. If you
want to print an error message, call @code{yyerror} explicitly before
the @samp{YYERROR;} statement. @xref{Error Recovery}.
-For Java parsers, this functionality is invoked using @code{return YYERROR;}
-instead.
+For Java and Python parsers, this functionality is invoked using
+@code{return YYERROR;} instead.
@end deffn
@deffn {Function} yyerror
diff --git a/m4/.gitignore b/m4/.gitignore
index 5b7d363..f584e66 100644
--- a/m4/.gitignore
+++ b/m4/.gitignore
@@ -180,3 +180,4 @@
/obstack-printf.m4
/extern-inline.m4
/non-recursive-gnulib-prefix-hack.m4
+/pythonexec.m4
diff --git a/src/getargs.c b/src/getargs.c
index 1fd9cfa..26e69da 100644
--- a/src/getargs.c
+++ b/src/getargs.c
@@ -54,6 +54,7 @@ static struct bison_language const valid_languages[] = {
{ "c", "c-skel.m4", ".c", ".h", true },
{ "c++", "c++-skel.m4", ".cc", ".hh", true },
{ "java", "java-skel.m4", ".java", ".java", false },
+ { "python", "python-skel.m4", ".py", ".py", false },
{ "", "", "", "", false }
};
@@ -727,6 +728,10 @@ getargs (int argc, char *argv[])
usage (EXIT_FAILURE);
}
+ /* Python requires --no-lines */
+ if (c_strcasecmp ("python", language->language) == 0)
+ no_lines_flag = true;
+
current_file = grammar_file = uniqstr_new (argv[optind]);
MUSCLE_INSERT_C_STRING ("file_name", grammar_file);
}
diff --git a/src/getargs.h b/src/getargs.h
index 5d4dfb0..bf33f51 100644
--- a/src/getargs.h
+++ b/src/getargs.h
@@ -56,10 +56,11 @@ extern bool nondeterministic_parser;
/* --language. */
+/* The sizeof operands should be the longest strings across all supported languages. */
struct bison_language
{
- char language[sizeof "Java"];
- char skeleton[sizeof "java-skel.m4"];
+ char language[sizeof "Python"];
+ char skeleton[sizeof "python-skel.m4"];
char src_extension[sizeof ".java"];
char header_extension[sizeof ".java"];
bool add_tab;
diff --git a/src/parse-gram.y b/src/parse-gram.y
index 1ec4b4d..e29331e 100644
--- a/src/parse-gram.y
+++ b/src/parse-gram.y
@@ -34,6 +34,7 @@
#include "system.h"
#include "c-ctype.h"
+ #include "c-strcase.h"
#include "complain.h"
#include "conflicts.h"
#include "files.h"
@@ -316,7 +317,11 @@ prologue_declaration:
muscle_code_grow ("initial_action", translate_code ($2, @2, false), @2);
code_scanner_last_string_free ();
}
-| "%language" STRING { language_argmatch ($2, grammar_prio, @1); }
+| "%language" STRING { language_argmatch ($2, grammar_prio, @1);
+ /* Python requires --no-lines */
+ if (c_strcasecmp ("python", language->language) == 0)
+ no_lines_flag = true;
+ }
| "%name-prefix" STRING { spec_name_prefix = $2; }
| "%no-lines" { no_lines_flag = true; }
| "%nondeterministic-parser" { nondeterministic_parser = true; }
diff --git a/tests/atlocal.in b/tests/atlocal.in
index 19ecfd7..4a472c1 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -114,6 +114,9 @@ CONF_JAVAC='@CONF_JAVAC@'
# Empty if no Java VM was found
CONF_JAVA='@CONF_JAVA@'
+# Empty if no python was found
+CONF_PYTHON='@CONF_PYTHON@'
+
# We need egrep and perl.
: ${EGREP='@EGREP@'}
: ${PERL='@PERL@'}
diff --git a/tests/javapush.at b/tests/javapush.at
index 2f71053..fd55bef 100644
--- a/tests/javapush.at
+++ b/tests/javapush.at
@@ -43,7 +43,7 @@ AT_BANNER([[Java Push Parsing Tests]])
# Define a single copy of the trivial parser grammar.
# This is missing main(), so two versions
# are instantiated with different main() procedures.
-m4_define([AT_TRIVIAL_GRAMMAR],[
+m4_define([AT_JAVA_TRIVIAL_GRAMMAR],[
%define parser_class_name {YYParser}
%error-verbose
@@ -61,7 +61,7 @@ start: 'a' 'b' 'c' ;
# Define comon code across to be includede in
# class Main for the trivial parser tests.
-m4_define([AT_TRIVIAL_COMMON],[
+m4_define([AT_JAVA_TRIVIAL_COMMON],[
static class YYerror implements YYParser.Lexer
{
public Object getLVal() {return null;}
@@ -96,13 +96,13 @@ m4_define([AT_TRIVIAL_COMMON],[
}
])
-m4_define([AT_TRIVIAL_PARSER],[
- AT_TRIVIAL_GRAMMAR
+m4_define([AT_JAVA_TRIVIAL_PARSER],[
+ AT_JAVA_TRIVIAL_GRAMMAR
public class Main
{
- AT_TRIVIAL_COMMON
+ AT_JAVA_TRIVIAL_COMMON
static public void main (String[[]] argv)
throws IOException
@@ -133,13 +133,13 @@ m4_define([AT_TRIVIAL_PARSER],[
}
])
-m4_define([AT_TRIVIAL_PARSER_INITIAL_ACTION],[
- AT_TRIVIAL_GRAMMAR
+m4_define([AT_JAVA_TRIVIAL_PARSER_INITIAL_ACTION],[
+ AT_JAVA_TRIVIAL_GRAMMAR
public class Main
{
- AT_TRIVIAL_COMMON
+ AT_JAVA_TRIVIAL_COMMON
static public void main (String[[]] argv)
throws IOException
@@ -170,7 +170,7 @@ AT_BISON_OPTION_PUSHDEFS
AT_DATA([[input.y]],
[[%language "Java"
-]AT_TRIVIAL_PARSER[
+]AT_JAVA_TRIVIAL_PARSER[
]])
# Verify that the proper procedure(s) are generated for each case.
@@ -216,7 +216,7 @@ AT_DATA([[input.y]],[[%language "Java"
%initial-action {
System.err.println("Initial action invoked");
}
-]AT_TRIVIAL_PARSER_INITIAL_ACTION[
+]AT_JAVA_TRIVIAL_PARSER_INITIAL_ACTION[
]])
AT_BISON_OPTION_POPDEFS
AT_BISON_CHECK([[-Dapi.push-pull=push -o Main.java input.y]])
@@ -232,7 +232,7 @@ AT_CHECK_JAVA_GREP(
AT_CLEANUP
# Define a single copy of the Calculator grammar.
-m4_define([AT_CALC_BODY],[
+m4_define([AT_JAVA_CALC_BODY],[
%code imports {
import java.io.*;
}
@@ -389,7 +389,7 @@ public static void main (String[] argv)
}
-]AT_CALC_BODY[
+]AT_JAVA_CALC_BODY[
]])
@@ -691,7 +691,7 @@ public static void main (String[] argv)
}
}
-]AT_CALC_BODY[
+]AT_JAVA_CALC_BODY[
]])
diff --git a/tests/local.at b/tests/local.at
index 7948faa..96cdad7 100644
--- a/tests/local.at
+++ b/tests/local.at
@@ -147,11 +147,14 @@ m4_pushdef([AT_SKEL_CC_IF],
[m4_bmatch([$3], [%language "[Cc]\+\+"\|%skeleton "[a-z0-9]+\.cc"], [$1], [$2])])
m4_pushdef([AT_SKEL_JAVA_IF],
[m4_bmatch([$3], [%language "[Jj][Aa][Vv][Aa]"\|%skeleton "[a-z0-9]+\.java"], [$1], [$2])])
-# The target language: "c", "c++", or "java".
+m4_pushdef([AT_SKEL_PYTHON_IF],
+[m4_bmatch([$3], [%language "[Pp][Yy][Tt][Hh][Oo][Nn]"\|%skeleton "[a-z0-9]+\.py"],[$1], [$2])])
+# The target language: "c", "c++", "java", or "python".
m4_pushdef([AT_LANG],
-[AT_SKEL_JAVA_IF([java],
- [AT_SKEL_CC_IF([c++],
- [c])])])
+ [AT_SKEL_PYTHON_IF([python],
+ [AT_SKEL_JAVA_IF([java],
+ [AT_SKEL_CC_IF([c++],
+ [c])])])])
m4_pushdef([AT_GLR_IF],
[m4_bmatch([$3], [%glr-parser\|%skeleton "glr\..*"], [$1], [$2])])
m4_pushdef([AT_LALR1_CC_IF],
@@ -180,7 +183,7 @@ m4_pushdef([AT_PURE_IF],
[m4_bmatch([$3], [%define *api\.pure\|%pure-parser],
[m4_bmatch([$3], [%define *api\.pure *false], [$2], [$1])],
[$2])])
-# AT_NAME_PREFIX: also consider api.namespace.
+# AT_NAME_PREFIX: also consider api.namespace
m4_pushdef([AT_NAME_PREFIX],
[m4_bmatch([$3], [\(%define api\.\(namespace\|prefix\)\|%name-prefix\) .*],
[m4_bregexp([$3],
@@ -237,7 +240,6 @@ m4_pushdef([AT_YYLTYPE],
[AT_SKEL_CC_IF([AT_NAME_PREFIX[::parser::location_type]],
[AT_API_PREFIX[LTYPE]])])
-
AT_PURE_LEX_IF(
[m4_pushdef([AT_LOC], [(*llocp)])
m4_pushdef([AT_VAL], [(*lvalp)])
@@ -306,6 +308,7 @@ m4_popdef([AT_GLR_IF])
m4_popdef([AT_SKEL_CC_IF])
m4_popdef([AT_LANG])
m4_popdef([AT_SKEL_JAVA_IF])
+m4_popdef([AT_SKEL_PYTHON_IF])
m4_popdef([AT_GLR_CC_IF])
m4_popdef([AT_LALR1_CC_IF])
m4_popdef([AT_DEFINES_IF])
@@ -559,8 +562,6 @@ main (int argc, char const* argv[])
return p.parse ();
}]])
-
-
## ------ ##
## Java. ##
## ------ ##
@@ -591,6 +592,23 @@ m4_define([AT_MAIN_DEFINE(java)],
}]])
+## ------ ##
+## Python ##
+## ------ ##
+
+m4_define([AT_YYERROR_DEFINE(python)],
+[[def yyerror (s, location=None) :
+    if location is None :
+        sys.stderr.write (s + "\n")
+    else :
+        sys.stderr.write (repr (location) + ": " + s + "\n")
+]])
+
+m4_define([AT_MAIN_DEFINE(python)],
+[[def main (args) :
+    p = YYParser ()
+    p.parse ()
+]])
## --------------- ##
## Running Bison. ##
@@ -786,6 +804,16 @@ AT_SKIP_IF([[test -z "$CONF_JAVA"]])
AT_CHECK([[$SHELL ../../../javacomp.sh ]$1],
[[0]], [ignore], [ignore])])
+# AT_PYTHON_COMPILE(SOURCES)
+# ------------------------
+# Compile SOURCES into Python files. Skip the test if python
+# is not installed.
+m4_define([AT_PYTHON_COMPILE],
+[AT_KEYWORDS(python)
+AT_SKIP_IF([[test -z "$CONF_PYTHONC"]])
+AT_SKIP_IF([[test -z "$CONF_PYTHON"]])
+AT_CHECK([[$SHELL ../../../pythoncomp.sh ]$1],
+ [[0]], [ignore], [ignore])])
# AT_LANG_COMPILE(OUTPUT, [SOURCES = OUTPUT.c]
# --------------------------------------------
@@ -798,6 +826,7 @@ m4_define([AT_LANG_COMPILE], [AT_LANG_DISPATCH([$0], $@)])
m4_define([AT_LANG_COMPILE(c)], [AT_COMPILE([$1], [$2])])
m4_define([AT_LANG_COMPILE(c++)], [AT_COMPILE_CXX([$1], [$2])])
m4_define([AT_LANG_COMPILE(java)], [AT_JAVA_COMPILE([$1.java], [$2])])
+m4_define([AT_LANG_COMPILE(python)], [AT_PYTHON_COMPILE([$1.py], [$2])])
# AT_FULL_COMPILE(OUTPUT, [OTHER1], [OTHER2])
@@ -832,7 +861,13 @@ m4_define([AT_FULL_COMPILE(java)],
m4_ifval($2, [[$1-$2.java]]),
m4_ifval($3, [[$1-$3.java]])))])
-
+m4_define([AT_FULL_COMPILE(python)],
+[AT_BISON_CHECK([-o $1.py $1.y])
+ AT_LANG_COMPILE([$1],
+ m4_join([ ],
+ [$1.py],
+ m4_ifval($2, [[$1-$2.py]]),
+ m4_ifval($3, [[$1-$3.py]])))])
# AT_SKIP_IF_CANNOT_LINK_C_AND_CXX
@@ -895,6 +930,11 @@ AT_CHECK([sed >&2 -e '/^profiling:.*:Merge mismatch for summaries/d' stderr],
m4_define([AT_JAVA_PARSER_CHECK],
[AT_CHECK([$5[ $SHELL ../../../javaexec.sh ]$1], [$2], [$3], [$4])])
+# AT_PYTHON_PARSER_CHECK(COMMAND, EXIT-STATUS, EXPOUT, EXPERR, [PRE])
+# -----------------------------------------------------------------
+m4_define([AT_PYTHON_PARSER_CHECK],
+[AT_CHECK([$5[ $SHELL ../../../pythonexec.sh ]$1], [$2], [$3], [$4])])
+
# AT_TEST_TABLES_AND_PARSE(TITLE, COND-VALUE, TEST-SPEC,
# DECLS, GRAMMAR, INPUT,
diff --git a/tests/local.mk b/tests/local.mk
index 5f7fa45..1ee4064 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -59,6 +59,7 @@ TESTSUITE_AT = \
tests/output.at \
tests/package.m4 \
tests/push.at \
+ tests/python.at \
tests/reduce.at \
tests/regression.at \
tests/sets.at \
diff --git a/tests/testsuite.at b/tests/testsuite.at
index 47913ad..87846e5 100644
--- a/tests/testsuite.at
+++ b/tests/testsuite.at
@@ -78,3 +78,6 @@ m4_include([javapush.at])
m4_include([cxx-type.at])
# Regression tests
m4_include([glr-regression.at])
+
+# Python tests
+m4_include([python.at])
--
1.8.4.rc0.1.g8f6a3e5
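
A note on the "Python Differences" text in the doc/bison.texi hunk: it
says that a dictionary mapping states to actions does not work because
the action code modifies variables outside the action. The sketch below
is one plausible reading of that remark, not code taken from the
skeleton. The names run_with_if_chain, run_with_dict, yyval and the
toy rule numbers are all hypothetical and chosen only for illustration.

```python
# Illustration only: hypothetical names, not the output of data/lalr1.py.


def run_with_if_chain(rule, values):
    """Dispatch on the rule number with if/elif, as the manual describes."""
    yyval = None
    if rule == 1:        # e.g. expr: expr '+' expr
        yyval = values[0] + values[2]
    elif rule == 2:      # e.g. expr: NUMBER
        yyval = values[0]
    # The assignments above rebind the local yyval directly.
    return yyval


def run_with_dict(rule, values):
    """Dispatch through a dictionary of nested functions instead."""
    yyval = None

    def action1():
        # This creates a new local inside action1; the enclosing
        # yyval is left untouched.
        yyval = values[0] + values[2]

    def action2():
        yyval = values[0]

    {1: action1, 2: action2}[rule]()
    return yyval         # still None: the action could not rebind it


if __name__ == "__main__":
    print(run_with_if_chain(1, [2, "+", 3]))
    print(run_with_dict(1, [2, "+", 3]))
```

With verbatim action text pasted into each branch, the if/elif form
lets an action assign to names in the surrounding method, which the
dictionary form cannot do without further wrapping.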