bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers

From:	David Fussner
Subject:	bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date:	Tue, 7 May 2024 14:15:33 +0100

Hi Stefan,

I apologize in advance if my reply gets lengthy.

> Also I don't clearly see how the above regexp distinguishes expl3 code
> from "normal" LaTeX code, so the comment should say something about it.

You are quite right that the regexp can (and does) match "normal"
LaTeX code, and I can see that this isn't acceptable as it stands. The
only way to be sure about ":" and "_" is to determine whether they're
inside a matched pair of \ExplSyntaxOn \ExplSyntaxOff macros (or in a
file which does something like \ProvidesExpl[File|Class]). The file
case can, I think, be sorted by modifying the syntax table as part of
setting up the relevant major modes. The temporary toggling of
ExplSyntax is trickier, but I have some proof-of-concept code that
adds a function to the `syntax-propertize-extend-region-functions'
hook that creates a list (`tex-expl-region-list`) of the start and end
of all such regions in a buffer and updates it whenever
`syntax-propertize` runs. In the `syntax-propertize-function` we test
whether hits are inside one of these regions and only change the
syntax when they are. (A very lightly-tested and incomplete patch on
top of my earlier patch is attached. It only applies to the "_" now,
but would need extending to the other sub-matches, too.)

> Currently we "skip" inappropriate underscores via
> `tex-font-lock-match-suscript` and/or by adding a particular `face` text
> property rather than via `syntax-table/propertize`.
>
> For algorithmic reasons, it's better to minimize the work done in
> `syntax-propertize-function` as much as possible (font-lock is more lazy
> than `syntax-propertize`), so I recommend you try and moving the above
> to font-lock rules.

The reason I've been using `syntax-propertize` rather than `font-lock`
is because the former may confer advantages when using
`xref-find-references`, but that in turn depends on how we decide to
define that function in the `tex-etags` backend. Please see below. In
any case, I think I can easily use `tex-expl-region-list` in a test
for how to fontify "_", so if you don't object to the addition of
`tex-expl-region-set` to the
`syntax-propertize-extend-region-functions' hook then we should be
able to get pretty close to a rigorous demarcation between "normal"
LaTeX and expl3 code in this context.

> > +                (setq-local syntax-propertize-function
> > +                            (eval
> > +                             `(tex-xref-syntax-function
> > +                               ,identifier ,beg ,end)))
>
> Why do we need to change `syntax-propertize-function` and why do we need
> `eval`?
>
> > +                (setq syntax-propertize--done 0)
>
> This is not sufficient.  You want to `syntax-ppss-flush-cache`.

We only need `eval` because I'm confused about the handling of macros
-- I have some code in progress to fix this. As for why we need to
change `syntax-propertize-function`, that's the core of the issue with
`xref-find-references`. In the current patch, the wrapper for the
standard `xref-backend-references` gathers file extensions and also
tests whether the search string begins and/or ends with a non-word,
non-symbol character. In `xref-references-in-directory` the only hits
offered to the user match (format "\\_<%s\\_>" ...), so I create a
bespoke `syntax-propertize-function` to ensure that the search string
matches that format. (Actually, I would need to improve that to cope
with searches for "\command" in code that looks like
"\let\command\othercommand" -- even when the "\" has the right syntax
the search fails because the "t" in "let" doesn't.)

My mental model for `syntax-propertize` was/is insufficient, as you
point out, so your improvement ensures that buffers return to the
status quo ante after the search is complete, but it's still an open
question whether we want to do this at all. I see at least 3 options:

1. The maximalist approach, which tries to ensure that any TeX symbol
may be searched for successfully, even if the syntax of its components
is inconvenient. My patch is a (faulty) attempt at this.

2. The minimalist approach, providing no bespoke
`syntax-propertize-function`, and accepting failure when searching
some strings, especially strings which aren't offered up by
default. (In the above example, "command" would be the default offered
up, so manual intervention is needed to search for "\command".) In
this case I'd be very keen to have the expl3 "_" and ":" code actually
in `syntax-propertize`, because then searches for expl3 constructs
(without the "\") would work. (I'd also be very keen on having
something similar in AUCTeX, though their current method works fine in
most files.)

3. The non-standard approach, the `tex-etags` backend calling a
variant of `project-find-regexp` instead of `xref-backend-references`
when someone presses M-?. We could supply file extensions to search,
as now, and allow the choice of projects and/or directories, as now,
but the output will always be very non-standard, more like
`xref-backend-apropos` than `xref-backend-references`. The syntax of
the search string won't matter, and the problem will be "too many
hits" rather than "too few or none".

If you have any thoughts on the matter I'd be all ears.

> > +(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)
>
> Maybe I have not followed the whole discussion closely enough, but at
> least to me the above "soon" is very unclear.
> I'll assume that this code will be removed before we install the patch.
> If not, please explain in the comment why this specific hack is needed
> and how it works.

As soon as AUCTeX has "*.[tT]e[xX]" in its contributions to
`semantic-symref-filepattern-alist` this will be redundant. As it
stands, not searching *.tex files for symbols in LaTeX-mode buffers
is kind of terrible.

> > +;; Setup AUCTeX modes (for testing purposes only).
> > +
> > +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
> > +
> > +(defun tex-set-auctex-xref-backend ()
> > +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
>
> I assume this will be sent to AUCTeX and is not meant to be in
> `tex-mode.el`, right?

Yes.

Please assume I agree with all of your other corrections and
clarifications, and that I'll modify the patch accordingly. Once
again, thank you for the careful review, and my apologies for
occupying too much of your time.

Best, David.

On Fri, 3 May 2024 at 15:11, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> Hi,
>
> Apparently I'm the `tex-mode.el` guy, so I tried to take a look.
>
> > diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
> > index 97c950267c6..d990a2dbfa9 100644
> > --- a/lisp/textmodes/tex-mode.el
> > +++ b/lisp/textmodes/tex-mode.el
> > @@ -695,7 +696,25 @@ tex-verbatim-environments
> >       ("\\\\\\(?:end\\|begin\\) *\\({[^\n{}]*}\\)"
> >        (1 (ignore
> >            (tex-env-mark (match-beginning 0)
> > -                        (match-beginning 1) (match-end 1))))))))
> > +                        (match-beginning 1) (match-end 1)))))
> > +     ;; The next two rules change the syntax of `:' and `_' in expl3
> > +     ;; constructs, so that `tex-font-lock-suscript' can fontify them
> > +     ;; more accurately.
> > +     ((concat "\\(\\(?:[\\\\[:space:]{]_\\|"
> > +              
> > "[\\\\{[:space:]][^][_[:space:][:cntrl:][:digit:]\\\\{}()/=]+\\)"
> > +              
> > "\\(?:_+\\(?:[^][[:space:][:cntrl:][:digit:]:\\\\{}()/#_=]+\\|"
> > +              "#+[1-9]\\)\\)+\\)\\([:_]?\\)")
>
> Can you add in the comment some URL pointing to some relevant expl3
> documentation which "explains" why the above regexp makes sense?
> Also I don't clearly see how the above regexp distinguishes expl3 code
> from "normal" LaTeX code, so the comment should say something about it.
>
> Side note: I'd avoid [:space:] whose exact meaning is rarely quite what
> we need.
> Side note: backslash doesn't need to be backslashed in [...].
>
> > +      (1 (ignore
> > +          (let* ((expr (buffer-substring-no-properties (match-beginning 1)
> > +                                                       (match-end 1)))
> > +                 (list (seq-positions expr ?_)))
> > +            (dolist (pos list)
> > +              (put-text-property (+ pos (match-beginning 1))
> > +                                 (1+ (+ pos (match-beginning 1)))
> > +                                 'syntax-table (string-to-syntax "_"))))))
> > +      (2 "_"))
> > +     ("\\\\[[:alpha:]]+\\(:\\)[[:alpha:][:space:]\n]"
> > +      (1 "_")))))
>
> Currently we "skip" inappropriate underscores via
> `tex-font-lock-match-suscript` and/or by adding a particular `face` text
> property rather than via `syntax-table/propertize`.
>
> For algorithmic reasons, it's better to minimize the work done in
> `syntax-propertize-function` as much as possible (font-lock is more lazy
> than `syntax-propertize`), so I recommend you try and moving the above
> to font-lock rules.
>
> > +(defvar tex-esc-and-group-chars '(?\\ ?{ ?})
> > +  "The current TeX escape and grouping characters.
>
> I recommend you backslash escape the { and } above (although it's not
> indispensable, `emacs-lisp-mode` will parse the code better).
> More importantly, the docstring doesn't explain what this list
> means/does.  E.g. does the order matter?  Can it be longer than 3 elements?
>
> From the current docstring I can't guess what would be the consequence
> of adding/removing elements to/from this list.
>
> > +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
> > +;; AUCTeX is doing the same for its modes.
> > +(defvar semantic-symref-filepattern-alist)
> > +(with-eval-after-load 'semantic/symref/grep
> > +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
> > +                     "*.bbl" "*.drv" "*.hva")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
>
> We know `semantic-symref-filepattern-alist` will exist when
> `semantic/symref/grep` is loaded, but not before, so I'd put the
> `defvar` inside the `with-eval-after-load`.
>
> > +;; Setup AUCTeX modes (for testing purposes only).
> > +
> > +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
> > +
> > +(defun tex-set-auctex-xref-backend ()
> > +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
>
> I assume this will be sent to AUCTeX and is not meant to be in
> `tex-mode.el`, right?
>
> > +;; `xref-find-references' currently may need this when called from a
> > +;; latex-mode buffer in order to search files or buffers with a .tex
> > +;; suffix (including the buffer from which it has been called).  We
> > +;; append it to `auto-mode-alist' so as not to interfere with the usual
> > +;; mode-setting apparatus.  Changes here and in AUCTeX should soon
> > +;; render it unnecessary.
> > +(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)
>
> Maybe I have not followed the whole discussion closely enough, but at
> least to me the above "soon" is very unclear.
> I'll assume that this code will be removed before we install the patch.
> If not, please explain in the comment why this specific hack is needed
> and how it works.
>
> > +(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) 
> > identifier)
> > +  "Find references of IDENTIFIER in TeX buffers and files."
> > +  (require 'semantic/symref/grep)
> > +  (let (bufs texbufs
> > +             (mode major-mode))
> > +    (dolist (buf (buffer-list))
> > +      (if (eq (buffer-local-value 'major-mode buf) mode)
> > +          (push buf bufs)
> > +        (when (string-match-p ".*\\.[tT]e[xX]" (buffer-name buf))
> > +          (push buf texbufs))))
> > +    (unless (seq-set-equal-p tex--buffers-list bufs)
> > +      (let* ((amalist (tex--collect-file-extensions))
> > +          (extlist (alist-get mode semantic-symref-filepattern-alist))
> > +          (extlist-new (seq-uniq
> > +                           (seq-union amalist extlist #'string-match-p))))
>
> After sinking the `defvar` above, you'll need to add a new `defvar` for
> `semantic-symref-filepattern-alist` just after the `require`.
>
> > +                (setq-local syntax-propertize-function
> > +                            (eval
> > +                             `(tex-xref-syntax-function
> > +                               ,identifier ,beg ,end)))
>
> Why do we need to change `syntax-propertize-function` and why do we need
> `eval`?
>
> > +                (setq syntax-propertize--done 0)
>
> This is not sufficient.  You want to `syntax-ppss-flush-cache`.
>
>
>         Stefan
>

0001-expl-region.patch
Description: Text Data

[Prev in Thread]

Current Thread

[Next in Thread]

bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, (continued)
- bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, Arash Esbati, 2024/05/02
  - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, David Fussner, 2024/05/02
- bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, Stefan Monnier, 2024/05/03
  - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, David Fussner, 2024/05/04
  - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, Arash Esbati, 2024/05/04
    - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, Stefan Monnier, 2024/05/04
    - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, Arash Esbati, 2024/05/04
  - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, David Fussner <=
  - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, David Fussner, 2024/05/15
    - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, Arash Esbati, 2024/05/16
    - bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers, David Fussner, 2024/05/16

Prev by Date: bug#67931: [PATCH] Use S/MIME key from content for mail signing via OpenSSL
Next by Date: bug#70784: Abolish string resizing
Previous by thread: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Next by thread: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Index(es):
- Date
- Thread