From: | Darcy Shen |
Subject: | [Texmacs-dev] Recent Works on Programming Language Parsers |
Date: | Mon, 23 Mar 2020 02:36:03 +0800 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 |
I didn't use a higher-level abstraction like packrat-parser. In my opinion, for now, small low-level parsers written in C++ should work fine.
Most programming languages are similar in syntax, so a simple
abstraction is sufficient for syntax highlighting.
+ dot: new composition of parsers, string_parser, keyword_parser, operator_parser
+ cpp: new composition of parsers, string_parser
+ java/scala/python: old composition of parsers, string_parser
+ others: old composition of parsers
Almost all xxx_language.cpp files (except scheme_language.cpp) are derived from mathemagix_language.cpp.
I call this the old-style composition of parsers. It is not efficient enough, because we have to re-parse the code in `get_color`.
Look at typeset_prog_string in concat_text.cpp to see what `get_color` and `advance` actually do.
The new-style composition of parsers removes the unnecessary parsing in `get_color`.
It also aims to keep the (type, keyword) mapping in Scheme files. Please refer to `dot-lang.scm`.
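To illustrate the difference, here is a minimal sketch of the new-style idea: `advance` classifies the token while it scans it and caches the result, so `get_color` becomes a lookup rather than a second parse. The names `CachingScanner`, `TokenType`, `last`, and the returned color names are hypothetical and are not the actual TeXmacs classes. The keyword list is hard-coded here only for brevity (it would come from the Scheme side, as described above).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token classes for illustration only.
enum class TokenType { None, Keyword, Identifier, Number, Other };

class CachingScanner {
public:
  // Consume one token of `s` starting at `pos` and remember its type.
  // On return, `pos` points just past the consumed token.
  void advance(const std::string& s, std::size_t& pos) {
    assert(pos <= s.size());
    last_ = TokenType::None;
    if (pos == s.size()) return;
    const std::size_t start = pos;
    if (is_alpha(s[pos])) {
      while (pos < s.size() && is_alnum(s[pos])) ++pos;
      const std::string word = s.substr(start, pos - start);
      last_ = is_keyword(word) ? TokenType::Keyword : TokenType::Identifier;
    } else if (is_digit(s[pos])) {
      while (pos < s.size() && is_digit(s[pos])) ++pos;
      last_ = TokenType::Number;
    } else {
      ++pos;  // any other single character
      last_ = TokenType::Other;
    }
  }

  // No re-parsing: the color follows from the type cached by advance().
  std::string get_color() const {
    switch (last_) {
      case TokenType::Keyword:    return "keyword";
      case TokenType::Identifier: return "identifier";
      case TokenType::Number:     return "number";
      default:                    return "none";
    }
  }

  TokenType last() const { return last_; }

private:
  static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
  static bool is_alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
  static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

  // A few DOT keywords, hard-coded for this sketch.
  static bool is_keyword(const std::string& w) {
    return w == "graph" || w == "digraph" || w == "subgraph" ||
           w == "node" || w == "edge" || w == "strict";
  }

  TokenType last_ = TokenType::None;
};
```

A caller would alternate `advance` and `get_color` over a line, so each character is scanned once.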
Currently, the string parser only supports inline (single-line) strings. In fact, strings and multi-line comments are the same kind of construct:
they both have an opening and a corresponding closing.
The string parser will eventually support multi-line strings. Once that is ready, the multi-line comment parser will be implemented
shortly afterwards.
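As a rough sketch of why the two cases can share one implementation, the class below treats a string and a multi-line comment as the same thing: a span with an opening delimiter, a closing delimiter, and an optional escape character. It keeps an "inside" flag so that a span left open at the end of one line continues on the next. `DelimitedParser` and its members are hypothetical names for illustration, not the existing string_parser.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch: one parser for any span with an opening and a closing.
class DelimitedParser {
public:
  DelimitedParser(std::string open, std::string close, char escape = '\0')
      : open_(std::move(open)), close_(std::move(close)), escape_(escape) {}

  // Try to consume a span from `line` at `pos` (requires pos <= line.size()).
  // Returns false and leaves `pos` untouched if no span starts here.
  // Otherwise returns true with `pos` just past the closing delimiter,
  // or at the end of the line if the span is still open.
  bool parse(const std::string& line, std::size_t& pos) {
    if (!inside_) {
      if (line.compare(pos, open_.size(), open_) != 0) return false;
      pos += open_.size();
      inside_ = true;
    }
    while (pos < line.size()) {
      // Skip an escaped character, e.g. \" inside a string.
      if (escape_ != '\0' && line[pos] == escape_ && pos + 1 < line.size()) {
        pos += 2;
        continue;
      }
      if (line.compare(pos, close_.size(), close_) == 0) {
        pos += close_.size();
        inside_ = false;
        return true;
      }
      ++pos;
    }
    return true;  // end of line reached while still inside the span
  }

  // True while a span opened on an earlier line has not been closed yet.
  bool inside() const { return inside_; }

private:
  std::string open_;
  std::string close_;
  char escape_;
  bool inside_ = false;
};
```

Under these assumptions, a string parser would be `DelimitedParser("\"", "\"", '\\')` and a C-style comment parser would be `DelimitedParser("/*", "*/")`; only the delimiters differ.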
I will continue developing the newly supported languages (like dot). The goal is to make it extremely easy to
support a new language. Coloring schemes are a separate topic. In the next month or two, I will continue to
work on improving the xyz_parser and abc_language.
Darcy
2020/03/23