emacs-devel

Re: rx.el sexp regexp syntax (WAS: Off Topic)


From: Alan Mackenzie
Subject: Re: rx.el sexp regexp syntax (WAS: Off Topic)
Date: Fri, 25 May 2018 15:51:26 +0000
User-agent: Mutt/1.9.4 (2018-02-28)

Hello, Pierre.

On Fri, May 25, 2018 at 10:52:03 +0200, Pierre Neidhardt wrote:

> rx.el is one of the best concepts I've discovered in a long time.
> It's another instance of "Don't come up with a new (mini)language when
> Lisp can do better": it's easier to learn, more flexible, easier to
> write, much easier to read and as a consequence much more maintainable.

Much easier than what?  Than the putative mini-language that doesn't get
written?

> > Some people, when confronted with a problem, think "I know, I'll use
> > regular expressions." Now they have two problems.
> > -- Jamie Zawinski

> It's also much more "programmable" thanks to its `eval' expression.
> (It's possible to count!)

> See http://francismurillo.github.io/2017-03-30-Exploring-Emacs-rx-Macro/
> for some nice examples.

> I think it's high time we moved away from traditional regexps and
> embraced the concept of rx.el.  I'm thinking of implementing it for
> Guile.

There's nothing stopping anybody from using rx.el.  However, people have
mostly _not_ used it.  The "I think it's high time ...." suggests in
some way forcing people to use it.  Before mandating something like
this, I think we should find out why it's not already in common use.

> At the moment the rx.el implementation is built on top of Emacs regexps
> which are implemented in C.  I believe this does not use the power of
> Lisp as much as it could.

But would any alternative use the power of regexps?

> The traditional regexps work in two steps: first build a blackbox
> automaton from the string expression, then test if the input matches.

> Building the automaton is costly.  In C, we build it once and save the
> result in a variable so that every regexp match does not rebuild the
> automaton each time.

Emacs has a (moderately large) cache of compiled regexps, so building the
automata is done very rarely: possibly just once per regexp per Emacs
session.
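To illustrate why a cache makes the compilation cost rare, here is a minimal sketch in Python of a fixed-size, least-recently-used cache of compiled patterns. It is not Emacs's C implementation. The class name RegexpCache and the size of 8 are hypothetical choices made only for illustration.

```python
import re
from collections import OrderedDict


class RegexpCache:
    """Hypothetical fixed-size LRU cache of compiled regexps (illustration only)."""

    def __init__(self, size=8):
        self.size = size
        self.entries = OrderedDict()  # pattern string -> compiled pattern
        self.compilations = 0         # how many times we actually compiled

    def get(self, pattern):
        compiled = self.entries.get(pattern)
        if compiled is not None:
            self.entries.move_to_end(pattern)  # mark as most recently used
            return compiled
        compiled = re.compile(pattern)         # the expensive step
        self.compilations += 1
        self.entries[pattern] = compiled
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)   # evict least recently used
        return compiled

    def search(self, pattern, text):
        return self.get(pattern).search(text)


cache = RegexpCache()
for line in ["foo = 1", "bar = 2", "baz = 3"]:
    cache.search(r"^\w+ = \d+$", line)
print(cache.compilations)  # 1
```

The same pattern string is compiled once and then reused, so the build cost is paid only when a pattern is new or has been evicted.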

> In high-level languages, automatons are automatically cached to save the
> cost of building them.

Emacs Lisp does this too.

> The rx.el library/concept could alleviate this issue altogether: because
> we express the automaton directly in Lisp, the parsing step is not
> needed and thus the building cost could be tremendously reduced.

> So the rx.el building steps

>   rx expression -> regexp string -> C regexp automaton

> could boil down to simply

>   rx automaton

I don't see what you're trying to save, here.  At some stage, the regexp
source, in whatever form, needs to be converted to an automaton.
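A minimal sketch of the first stage of the rx pipeline, rx expression -> regexp string. It assumes a hypothetical encoding of rx forms as nested Python tuples. The form names are loosely modelled on rx.el (seq, or, zero-or-more, one-or-more, any). "repeat" is a made-up stand-in for counted repetition. The string it produces still has to be compiled into an automaton by the regexp engine, so this stage adds work rather than removing it.

```python
import re


def rx_to_regexp(form):
    """Translate a hypothetical rx-like s-expression into a regexp string.

    Forms (illustrative, not rx.el's exact vocabulary):
      "text"                  literal string, escaped
      ("seq", f1, f2, ...)    concatenation
      ("or", f1, f2, ...)     alternation
      ("zero-or-more", f)     f*
      ("one-or-more", f)      f+
      ("repeat", n, f)        exactly n occurrences of f
      ("any", "chars")        character class
    """
    if isinstance(form, str):
        return re.escape(form)
    op, *args = form
    if op == "seq":
        return "".join(rx_to_regexp(a) for a in args)
    if op == "or":
        return "(?:" + "|".join(rx_to_regexp(a) for a in args) + ")"
    if op in ("zero-or-more", "one-or-more"):
        suffix = "*" if op == "zero-or-more" else "+"
        return "(?:" + rx_to_regexp(args[0]) + ")" + suffix
    if op == "repeat":
        n, sub = args
        return "(?:" + rx_to_regexp(sub) + "){" + str(n) + "}"
    if op == "any":
        # Escape characters that are special inside a bracket expression.
        return "[" + "".join(re.escape(c) for c in args[0]) + "]"
    raise ValueError(f"unknown rx form: {op!r}")


digits = ("one-or-more", ("any", "0123456789"))
pattern = rx_to_regexp(("seq", "v", digits, ".", digits))
print(re.fullmatch(pattern, "v1.25") is not None)  # True
```

Because the expression is built by ordinary code, counts and sub-patterns can be computed (for example, the n in "repeat"). This is roughly what rx's `eval' form offers.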

Are you suggesting building an interpreter in Lisp to execute rx
expressions directly?
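One hypothetical reading of that suggestion is a matcher that walks the rx expression itself, with no intermediate string and no compiled automaton. The sketch below does this in Python with backtracking continuations. It assumes a made-up encoding of rx forms as nested tuples: strings are literals, plus ("seq", ...), ("or", ...), ("any", chars), ("zero-or-more", f), ("one-or-more", f) and ("repeat", n, f). Every character test goes through interpreter-level dispatch and closure calls. That is the kind of overhead behind the performance prediction below. No measurements are claimed here.

```python
def rx_match(form, text):
    """Return True if all of `text` matches the rx-like `form`.

    Hypothetical direct interpreter: walks the s-expression with
    backtracking and never builds a regexp string or automaton.
    """

    def m(form, i, k):
        # Try to match `form` at position i.  On success, call the
        # continuation k with the new position.  True if any path succeeds.
        if isinstance(form, str):
            return text.startswith(form, i) and k(i + len(form))
        op, *args = form
        if op == "seq":
            def step(idx, j):
                if idx == len(args):
                    return k(j)
                return m(args[idx], j, lambda j2: step(idx + 1, j2))
            return step(0, i)
        if op == "or":
            return any(m(a, i, k) for a in args)
        if op == "any":
            return i < len(text) and text[i] in args[0] and k(i + 1)
        if op == "repeat":
            n, sub = args
            def rep(count, j):
                if count == n:
                    return k(j)
                return m(sub, j, lambda j2: rep(count + 1, j2))
            return rep(0, i)
        if op in ("zero-or-more", "one-or-more"):
            sub = args[0]
            def star(j):
                # Greedy: try one more occurrence first, otherwise stop here.
                # The j2 > j guard prevents looping on empty matches.
                return m(sub, j, lambda j2: j2 > j and star(j2)) or k(j)
            if op == "one-or-more":
                return m(sub, i, star)
            return star(i)
        raise ValueError(f"unknown rx form: {op!r}")

    return m(form, 0, lambda j: j == len(text))


digits = ("one-or-more", ("any", "0123456789"))
print(rx_match(("seq", "v", digits, ".", digits), "v1.25"))  # True
```

The greedy "zero-or-more" has to back off when a later element fails. For example, ("seq", ("zero-or-more", ("any", "a")), "ab") matches "aaab" only after giving back one "a".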

> It would be interesting to compare the performance.  This also means
> that there would be no need for caching on behalf of the supporting
> language.

I predict that an rx interpreter built in Lisp would be two orders
of magnitude slower than the current regexp machine, where both the
construction of an automaton, and the byte-code interpreter which runs
it are written in C (and probably quite optimised C at that).

Regexp performance is critical to Emacs's performance in general.

> What do you think?

I think we will, in the main, carry on using conventional regular
expressions expressed as strings.  I can't get excited about rx syntax,
which I'm sure would be just as tedious, and possibly more difficult to
read than a standard regexp.  Analogously, as a musician, I read
standard musical notation (with sets of five lines and dots) far more
easily and fluently than I could any "simplified" system designed for
beginners, which would be bloated by comparison.

Regular expressions can be difficult.  I don't believe this difficulty
lies, in the main, in the compact notation used to express them.  Rather
it lies in the concepts and the semantics of the regexp elements, and
being able to express a "mental automaton" in regexp semantics.

> --
> Pierre Neidhardt

-- 
Alan Mackenzie (Nuremberg, Germany).

