emacs-devel

Re: rx.el sexp regexp syntax (WAS: Off Topic)


From: Peter Neidhardt
Subject: Re: rx.el sexp regexp syntax (WAS: Off Topic)
Date: Fri, 25 May 2018 22:35:06 +0200
User-agent: mu4e 1.0; emacs 26.1

Alan Mackenzie <address@hidden> writes:

> It may be part of the explanation.  But more salient, I think, is that
> hackers prefer powerful means of expression.  A single character in a
> string regexp has the power of a sexp in the corresponding rx regexp.
> Paul Graham (at http://www.paulgraham.com) has had quite a bit to say
> about this in the (distant) past.  Conciseness of expression is where
> it's at.

I think you are referring to this article:

        http://paulgraham.com/ineq.html

> Another easy test is the number of characters in a program, but this
> is not very good either; some languages (Perl, for example) just use
> shorter identifiers than others.
>
> I think a better measure of the size of a program would be the number
> of elements, where an element is anything that would be a distinct
> node if you drew a tree representing the source code. The name of a
> variable or function is an element; an integer or a floating-point
> number is an element; a segment of literal text is an element; an
> element of a pattern, or a format directive, is an element; a new
> block is an element. There are borderline cases (is -5 two elements or
> one?) but I think most of them are the same for every language, so
> they don't affect comparisons much.

With this definition, rx and regexp have the same length (except for
`eval').  "Conciseness in characters" is not what Paul Graham was
referring to.
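
To make the counting concrete, here is a minimal sketch.  It is my own
example, not one from this thread, and the `my-' names are made up for
illustration.  It writes the same pattern both ways; by my count each
form has roughly four elements in the sense quoted above.

```elisp
(require 'rx)

;; Hypothetical pattern: a run of digits that starts a word and ends a line.

;; String form, elements:  \<  [[:digit:]]  +  $
(defconst my-digits-string-re "\\<[[:digit:]]+$")

;; rx form, elements:  word-start  one-or-more  digit  line-end
(defconst my-digits-rx-re
  (rx word-start (one-or-more digit) line-end))

;; Both patterns should accept and reject the same inputs.
(list (string-match-p my-digits-string-re "abc 123")
      (string-match-p my-digits-rx-re "abc 123"))
```

The rx form is longer in characters, but the number of nodes in the
tree is about the same.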

Others might think differently, for instance those who prefer Perl to
Lisp.

In the end, this is all it boils down to: the "Unix" hacker culture
vs. the Lisp one.  The Unix tradition has long spread the use of
acronyms and shortcuts.  Lisp, on the other hand (especially Scheme),
puts a lot of emphasis on explicit full names.

My opinion is that acronyms and shortcuts were mostly useful in the
era of teletypes and limited terminals and shells.  Now we have
completion and fuzzy-search, for which explicit full names not only make
sense but are necessary.
(It's much more intuitive to search for "string compare" in Emacs
Lisp than "str cmp" in C.)

In the end, rx vs. regexp reflects the same mindset difference.

>> Have you used rx?
>
> No.  Neither have I used Cobol (much).

Cobol is not very relevant; let's focus on the discussion here.  Try
using rx on some mildly complex regular expressions; it could be
insightful for this discussion.

> You seem to want to increase the readability for beginners, for people
> who have laboriously to slog through an expression trying to make sense
> of each bit of it.  I don't think experienced regexp users have
> difficulty with the syntax.  I don't, for one.
>
> There was a time when people thought that
>
>     ADD 1 TO A GIVING B
>
> was more readable than
>
>     b = a + 1;

This is not what rx is about though.  Your example does not show any
change in structure.  rx does.

> Hexadecimal CPU codes aren't and aren't intended to be human-readable.
> String regular expressions are.

Well, "readable" is not black and white.  If we can have "more readable",
then even better.

> rx MUST be written over several lines and indented.  A string regexp, by
> contrast, usually fits onto a single line.

No, it does not have to be written over several lines.  I don't know
where you got that from.
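
As a small sketch (the pattern and the `my-' names are hypothetical,
chosen only for illustration), the same rx form can be written on one
line or on several, and the result is identical:

```elisp
(require 'rx)

;; Hypothetical pattern: an optional minus sign followed by digits.

;; Written on a single line:
(defconst my-int-one-line (rx (optional "-") (one-or-more digit)))

;; The same form spread over several lines, purely by choice:
(defconst my-int-multi-line
  (rx (optional "-")
      (one-or-more digit)))

;; The Lisp reader ignores the layout, so both yield the same regexp.
(equal my-int-one-line my-int-multi-line)
```

Splitting an rx form across lines is an option for long expressions,
not a requirement.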

That said, is "fitting onto a single line" necessarily good?

>> - rx does not require escaping any character with backslashes.  This
>>   is always a great source of confusion when switching from BRE to ERE,
>>   between different interpreters and when storing regexp in Lisp strings
>>   where backslashes must be escaped themselves for instance.
>
>> - Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have
>>   a trivial _English_ counterpart in rx: (respectively "word-start",
>>   nothing, "line-start" _and_ "not").
>
> The "English" counterpart used in rx is bulky and difficult to learn.
> Somehow, you've got to learn that it's "word-start" and not
> "word-beginning",

One could argue the same about "*" vs. "%".  But words that have a meaning
in a natural language are easier to remember than arbitrary symbols.
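
For comparison, here is a rough sketch of one pattern in both notations.
The pattern and the `my-' names are my own, hypothetical and for
illustration only.  It uses the rx names mentioned above: "line-start",
"word-start" and "not".

```elisp
(require 'rx)

;; Hypothetical pattern: the word TODO at the start of a line, then any
;; run of non-colon characters, then a colon.

;; String form: every backslash is doubled inside the Lisp string.
(defconst my-todo-string-re "^\\<TODO\\>[^:]*:")

;; rx form: line-start for ^, word-start for \<, word-end for \>,
;; and not for the ^ inside a character class.  No backslashes needed.
(defconst my-todo-rx-re
  (rx line-start word-start "TODO" word-end
      (zero-or-more (not (any ":")))
      ":"))

;; Both patterns should behave the same on these inputs.
(list (string-match-p my-todo-string-re "TODO fix this: now")
      (string-match-p my-todo-rx-re "TODO fix this: now"))
```

Note that "^" has two unrelated meanings in the string form, while rx
gives each meaning its own name.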

> that it's "not" and not "non", and so on.  This is more
> difficult than just learning \< and ^.  If your native language isn't
> English, it might be much more difficult.

All programmers learn some basic English, say, "if then else".  I don't
think that symbolic languages are easier to learn than natural languages
for human beings.

> Well, so far, on this list, two or three people have said they "like"
> rx.el.  Nobody has said "I'm going to be using rx.el in my programs from
> now on".

Which is precisely why we are talking about it.  To let people know,
pique their curiosity, let them try and report feedback.

"Not famous" does not equal bad quality.  That's why we need to
communicate to give good products a better chance.

--
Peter Neidhardt
