bug#36496: [PATCH] Describe the rx notation in the lisp manual


From: Mattias Engdegård
Subject: bug#36496: [PATCH] Describe the rx notation in the lisp manual
Date: Fri, 5 Jul 2019 16:13:52 +0200

On 4 July 2019 at 18:28, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> This is a large section.  The ELisp reference is already a large book,
> printed in two separate volumes.  So I think if we want to include
> this section, it will have to be on a separate file that is
> conditionally included @ifnottex.
> 
> Alternatively, we could make this a separate manual.

It is about 7-8 pages in all. One page could be saved by combining the 
character class descriptions with the existing ones; they are basically the 
same. However, that would probably preclude splitting the material into 
separate files or manuals.

The category names also take up about one page, but that information isn't 
available anywhere else, since those names are specific to rx. (It would be 
nice if the names were defined along with the categories, but that isn't the 
case at present.)

I would prefer @ifnottex to having a separate manual, since one of the points 
is to make rx feel like a part of elisp and a genuine, practical alternative to 
regexp strings rather than an add-on. For example, the "Complex Regexp Example" 
turned out to be a good place for an rx version.

The revised patch (attached) does not separate the contents, because I wanted 
to hear your opinion on the matter first.

>> The existing `rx' doc string can be left unchanged, or reduced to something 
>> more concise, perhaps without a description of the entire rx language but 
>> with a manual reference. Suggestions are welcome.
> 
> Yes, the doc string should be reduced to the summary of the
> constructs.

Good, let's do that when the changes to the manual are done.

>> +Bind the name @var{ref} to a submatch that matches @var{rx-expr}@enddots{}.
>   ^^^^^^^^^^^^^^^^^^^^^^^
> "Bind the symbol @var{ref}", no?

Yes, thank you.
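
For readers without the patch at hand, here is a minimal sketch of what binding a symbol to a submatch can look like. It assumes that the quoted sentence describes the (let REF RX-EXPR...) form accepted by the rx pattern in pcase, and that Emacs 27 or later is in use; my/split-pair is a hypothetical helper, not something from the patch.

```elisp
(require 'rx)   ; the `rx' pcase pattern is defined in rx.el

;; `my/split-pair' is a hypothetical helper used only for illustration.
(defun my/split-pair (s)
  "Return (KEY VALUE) if S looks like \"name=123\", else nil."
  (pcase s
    ;; Each (let SYMBOL RX...) binds SYMBOL to the text of its submatch.
    ((rx bos (let key (+ alpha)) "=" (let value (+ digit)) eos)
     (list key value))))

(my/split-pair "width=80")   ; → ("width" "80")
(my/split-pair "width")      ; → nil
```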

>> +or, using shorter synonyms and written more compactly,
> 
> This last line needs @noindent before it.

Added, and in another place.

>> +@table @asis
>> +@item @code{"some-string"}
> 
> Why @code{"..."} and not @samp{...}?  The latter will look better both
> in print and in Info format.

I looked at the result in all formats (pdf, info, html) and came to the 
opposite conclusion; it makes it clear that it's about a string literal. It's 
not a strongly held opinion, however.

>> +Corresponding string regexp: @samp{AB@dots{}} (subexpressions in sequence).
>                                ^^^^^^^^^^^^^^^^
> I think this should use @samp{@var{a}@var{b}@dots{}} instead. And
> likewise for the other "corresponding string regexps".  The reason is
> that neither A nor B stand for themselves, literally, they are
> meta-variables.

Right; I experimented again and ended up with @samp{@var{A}@var{B}@dots{}}. 
The upper-case variables looked much better in print and html.

>> +Match the @var{rx}s once or not at all.@*
> 
> "Match @var{rx} or an empty string" sounds better to me.

Much better, thank you. Changed in all places.
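
As a small illustration of the "rx or an empty string" reading, the sketch below uses opt, which I take to be one of the synonyms for this construct; the test strings are made up.

```elisp
(require 'rx)

;; (opt "u") matches either "u" or the empty string.
(let ((re (rx bos "colo" (opt "u") "r" eos)))
  (list (string-match-p re "color")
        (string-match-p re "colour")
        (string-match-p re "colouur")))
;; → (0 0 nil)
```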

>> +Match the @var{rx}s zero or more times, non-greedily.@*
> 
> I would add here a cross-reference to where greedy matching is
> described.

Done, with a separate subsubheading for the non-greedy stuff.
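
To make the greedy/non-greedy distinction concrete, here is a sketch comparing * with *? on the same input. The sample string is invented; the operators are the ordinary rx ones.

```elisp
(require 'rx)

(let ((s "<a><b>"))
  (list
   ;; Greedy: (* nonl) takes as much as it can, so the match runs to the last ">".
   (progn (string-match (rx "<" (* nonl) ">") s)
          (match-string 0 s))
   ;; Non-greedy: (*? nonl) takes as little as it can, stopping at the first ">".
   (progn (string-match (rx "<" (*? nonl) ">") s)
          (match-string 0 s))))
;; → ("<a><b>" "<a>")
```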

>> +@item @code{(any @var{charset}@dots{})}
> 
> Please don't call this "charset", as that term is already taken by a
> very different creature in Emacs.  I suggest "character set" instead.

Yes, I ended up using "set" since it's shorter and even better in this case.

>> +Each @var{charset} is a character, a string representing the set of
>> +its characters, a range or a character class.  A range is either a
>> +hyphen-separated string like @code{"A-Z"}, or a cons of characters
>> +like @code{(?A . ?Z)}.
> 
> Again, a cross-reference to where "character class" described would be
> good here, as would a @cindex entry for "character class in rx".

Done; the cross-reference is just a "see below" since it's very near.
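
A sketch of the argument kinds described above, combined in a single any form: a hyphen-separated range string, a cons range, a single character and a character class. The test strings are made up, and case-fold-search is bound to nil here only so that "A-Z" does not also match lower-case letters.

```elisp
(require 'rx)

(let ((case-fold-search nil)    ; otherwise "A-Z" would also match "a-z"
      (re (rx bos
              (+ (any "A-Z"       ; range written as a string
                      (?0 . ?9)   ; range written as a cons of characters
                      ?_          ; single character
                      blank))     ; character class
              eos)))
  (list (string-match-p re "FOO_42 BAR")
        (string-match-p re "foo")))
;; → (0 nil)
```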

>> +@item @code{space}, @code{whitespace}, @code{white}
>> +Match any character that has whitespace syntax.
> 
> Only ASCII or also non-ASCII?  This should be spelled out.

It's a matter of the syntax table; I used the exact formulation of the existing 
char class description.

>> +@xref{Syntax Class Table} for details.  Please note that
>                            ^
> Comma missing there.

Ah, yes. Apparently, a comma is inserted automatically in the TeX version, so 
that we get the desired "See Section XIV, page 123, for details"; this is 
documented. In the info and html versions there is no page number, so a comma 
doesn't feel like proper English: "See Section XIV, for details" has a distinct 
German tone to my ears.
An explicit comma after @xref seems to be common in the Emacs manuals, so rather 
than fight it out I castled the clauses.

Attachment: 0001-Describe-the-rx-notation-in-the-elisp-manual-bug-364.patch
Description: Binary data

