bug-gnu-emacs

bug#68246: 30.0.50; Add non-TS mode as extra parent of TS modes


From: Dmitry Gutov
Subject: bug#68246: 30.0.50; Add non-TS mode as extra parent of TS modes
Date: Thu, 18 Jan 2024 21:55:34 +0200
User-agent: Mozilla Thunderbird

On 18/01/2024 16:17, Stefan Monnier wrote:
away).  The language-specific version of major-mode-remap-alist looks
necessary after all.

It doesn't have to be specifically about languages.
It can just be a "default" set of "major mode" remappings.

Could be. And it's something we should probably add irrespective of the outcome of this discussion.

@@ -3206,10 +3209,10 @@ interpreter-mode-alist
        ("emacs" . emacs-lisp-mode)))
     "Alist mapping interpreter names to major modes.
   This is used for files whose first lines match `auto-mode-interpreter-regexp'.
-Each element looks like (REGEXP . MODE).
+Each element looks like (REGEXP . MODE-OR-LANGUAGE).
   If REGEXP matches the entire name (minus any directory part) of
   the interpreter specified in the first line of a script, enable
-major mode MODE.
+MODE-OR-LANGUAGE.
There's a similar need for "content type" rather than "language".  If we
want to mention "language" we should also take the opportunity to
mention other related categorizations like "content type".
Are "content type" and "language" going to be different things?
They seem the same to me.

I think there's the same kind of difference between "language" and
"content type" as between "language" and "major mode" :-)

A "content type" could be serviced by multiple "languages"? Still not sure how that would work. I mean, we could have content-type like text/html or application/json, but neither splits into two languages, really.

OTOH, the major mode can only run the language hook, I think, if any major
mode can correspond only to one language.

Not so.  A major mode can easily do

     (run-mode-hooks (compute-the-hook))

I guess that would mean that the language hook is not run automatically, and that each major mode would need explicit code to compute and run it.
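
To make the `(run-mode-hooks (compute-the-hook))` idea concrete, here is a minimal sketch of a hand-written major mode that computes a per-language hook and runs it explicitly. Every `demo-` name is hypothetical, as is the `demo-<LANG>-language-hook` naming scheme; none of this exists in Emacs, and the buffer-local language variable is only an assumption about where the language might be stored.

```elisp
;; Hypothetical sketch: every `demo-' name is invented for illustration.
(defvar-local demo-buffer-language nil
  "Language symbol for the current buffer, or nil if unknown.")
;; Keep the value across `kill-all-local-variables'.
(put 'demo-buffer-language 'permanent-local t)

(defun demo-language-hook (language)
  "Return the hook symbol assumed to belong to LANGUAGE."
  (intern (format "demo-%s-language-hook" language)))

(defun demo-generic-mode ()
  "Toy major mode that runs a hook computed from the buffer's language."
  (interactive)
  (kill-all-local-variables)
  (setq major-mode 'demo-generic-mode
        mode-name "Demo")
  ;; The language hook is not run automatically: the mode has to
  ;; compute it and pass it to `run-mode-hooks' itself.
  (if demo-buffer-language
      (run-mode-hooks (demo-language-hook demo-buffer-language)
                      'demo-generic-mode-hook)
    (run-mode-hooks 'demo-generic-mode-hook)))
```

With this shape, a function added to `demo-js-language-hook` runs only in buffers whose `demo-buffer-language` is `js`, and only because the mode function asks for it.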

Though I suppose if set-auto-mode-0 saves the currently "detected"
language somewhere, the major mode definitions could pick it up and
call the corresponding hook.

Major modes are not activated solely via `set-auto-mode-0`, so relying
on that is a crutch/hack, not something on which to base a design.

The major mode could compute which language it is for. But the algorithm could be undecidable if the buffer is not visiting a file yet, doesn't have an interpreter comment, etc. That's where the command set-buffer-language was supposed to come in handy.

I'm not comfortable enshrining the "-ts-mode" convention here.
We can still go the "strict" approach, where when no language is assigned,
we don't try to guess it.

I think the `<LANG>-mode` heuristic is acceptable, because it's been
*the* convention used in Emacs.

We are now getting a whole set of new modes for which this heuristic isn't going to work (the tree-sitter based ones), and that list will grow. Perhaps it would be more consistent to drop the heuristic if we don't manage to make it work, somehow, for both kinds of modes.
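
For reference, a name-based heuristic that also covers the tree-sitter modes could look roughly like the sketch below. The function name `demo-language-from-mode` is hypothetical, and the suffix stripping is exactly the kind of "-ts-mode" convention being debated here, not an agreed design.

```elisp
;; Hypothetical sketch of the naming heuristic under discussion.
(defun demo-language-from-mode (mode)
  "Guess a language name string from the major-mode symbol MODE.
Return nil when the name of MODE does not end in \"-mode\"."
  (let ((name (symbol-name mode)))
    (cond
     ;; Check the longer suffix first, otherwise `js-ts-mode' would
     ;; yield "js-ts".
     ((string-suffix-p "-ts-mode" name)
      (substring name 0 (- (length "-ts-mode"))))
     ((string-suffix-p "-mode" name)
      (substring name 0 (- (length "-mode")))))))
```

Note that this returns "js" for both `js-mode` and `js-ts-mode`, but it also returns "prog" for `prog-mode`, which is not a language: the name alone cannot tell the two kinds of mode apart.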

Also I think if we want a `buffer-language` function, it should not rely
on how the mode was installed (e.g. `set-auto-mode--last`) but only on
the major mode itself, i.e. something like
      (defun buffer-language ()
        (or buffer-language
Where would the buffer-language variable be set, if not inside
set-auto-mode-*?

In the major mode?

Then perhaps we won't need the fallbacks (the part that comes after 'or') - the major mode's setting of the language could perform those "heuristic based" computations as well.

            (some heuristic based on major-mode and/or derived-modes)))
If we're sure we don't want several languages to be able to refer to the
same major mode...

A major mode can

     (setq major-mode ...)

If/when such "generic" major modes become a thing, and the
`(setq major-mode ...)` hack becomes too inconvenient, we can devise a
better solution (e.g. extending/tweaking the way `derived-mode-*` work).

The major-mode could be fundamental-mode. If the language were to be specifiable through settings external to major modes, we could still do useful things while in fundamental-mode (e.g. do some useful editing with Eglot, provided it supports indentation and completion), or suggest which major modes to install from ELPA.

[ Of course, I already mentioned that I also suspect that there can/will
    be sometimes several languages (or none).  ]
I'm not clear on this. You mentioned complex cases - like an xml inside an
archive? But depending on the usage, only one of the languages might be
"active" at a given time.

But depending on what "the language/type/mode" is used for, we may not
care really about which language/type/mode is "active" but about which
languages/types/modes are applicable (e.g. for `.dir-locals.el`).

Would we really care that an xml file inside an archive gets both the archive-subfile-mode and the xml-mode dir-locals settings applied? Offhand, I would really expect the xml-mode settings only. Though the former could be a nice bonus in rare cases.

Perhaps dir-locals.el could get a syntax for specifying variables when specific minor modes are enabled as well.

+(defun set-buffer-language (language)
+  "Set the language of the current buffer.
+And switch the major mode appropriately."
+  (interactive
+   (list (let* ((ct (mapcan
+                     (lambda (pair) (and (keywordp (car pair))
+                                    (list (symbol-name (car pair)))))
+                     major-mode-remap-alist))
+                (lang (completing-read "Language: " ct)))
+           (and lang (intern lang)))))
+  (set-auto-mode-0 language))
I see several issues with this function (name and implementation), but
I wonder when we'd ever need such a thing.

It seemed like a missed opportunity not to provide a more high-level command
to switch to a specific language for the buffer. E.g. how we sometimes use
'M-x foo-major-mode' when a file type's been misdetected, or the buffer is
non-file-visiting (perhaps very temporary).

A command which does this with exhaustive completion across the configured
languages seems handy. At least that's my impression from briefly testing
it out.

We can do the same with major modes, of course (just `mapatoms` and filter out
the non-major modes), so feel free to add such a command, but it doesn't
seem like offering it for "languages" is particularly more useful than
offering it for "major modes".

If modes are annotated with their languages, the result could be almost as handy indeed, so maybe I will add such a command later.

Also, get-current-mode-for-language can be implemented in terms of
set-buffer-language (see my earlier email to Joao).

That seems to be a roundabout way to go about it.
`get-current-mode-for-language/type/mode` should be used by
`set-auto-mode` rather than the other way around, no?

If the major modes decide the language, and if we don't mind that this won't work without an installed/available major mode, yes.

<LANG>-mode is lexically indistinguishable from <NONLANG>-mode. If we used
the names like <LANG>-lang, at least one could tell whether one of the
parents of a given <foo>-mode is a language.

Other than for Eglot, where does this distinction matter?

I suppose it comes down to the ease of implementing interaction with any external tools that need to be passed a language name.

If a function get-language-for-mode is possible to implement, then you only need to store the mapping language->language-name-spelled-in-specific-way for a number of exceptions, whereas if instead of get-language-for-mode you only have the full hierarchy of modes, then the mode->correct-spelling will likely need to be exhaustive in all cases, in order not to match any parent modes (e.g. prog-mode) that don't denote a language. And when such mappings have to be exhaustive, support for any new language would also need to be done explicitly in all cases.

Eglot would need explicit mappings either way because the name of the language server program is always different (though they would be simplified), but something like 'rg -t js Foo' only needs the language name.
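
A sketch of the "exceptions only" mapping, assuming the language is already available as a symbol: all `demo-` names are hypothetical, and the `py` and `elisp` spellings are my recollection of ripgrep's type names, to be checked against `rg --type-list`.

```elisp
;; Hypothetical sketch: only languages whose spelling differs need an entry.
(defvar demo-rg-type-exceptions
  '((python . "py")
    (emacs-lisp . "elisp"))
  "Alist of (LANGUAGE . RG-TYPE) for names that rg spells differently.")

(defun demo-rg-type (language)
  "Return the string to pass to `rg -t' for the symbol LANGUAGE."
  (or (alist-get language demo-rg-type-exceptions)
      (symbol-name language)))

(defun demo-rg-command (language pattern)
  "Return an argument list that searches for PATTERN in LANGUAGE files."
  (list "rg" "-t" (demo-rg-type language) pattern))
```

A language that is missing from the alist falls through to its own name, so supporting a new language needs no extra entry unless the tool spells it differently.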

Another issue I see if we don't use something like
`derived-mode-add-parents` is that all the various places where we use
mode-indexing, such as `.dir-locals.el`, `ffap`, YASnippet, etc... will
need to be extended with a way to use "languages" as well, and then we
also need to define a sane precedence between settings that apply to
a given mode and settings that apply to a given language (setting for
`js-ts-mode` should presumably take precedence over settings for
`:js` which should take precedence over settings for `prog-mode`).
That's a good point: if "languages" as a separate notion gets added, it
would make sense to use them in more places (not 100% necessary, but good
for consistency). With the associated complexity that you mention.

And if it's not merged into the same hierarchy as major modes, how do
you get `:js` (i.e. "language") to be sometimes higher-precedence and
sometimes lower precedence than a mode?

I'm not sure it's a requirement. If we decide that languages are "above" major modes, it would make just as much sense to first apply language settings, and then those for the major mode and its parents. Even though a language is often more specific than prog-mode. It can be different for other hierarchies (e.g. js-base-mode would be "below" language). We wouldn't want to specify the parent for each language as well, right?

E.g. I suppose if js-language was a major mode which also inherits from prog-mode, a priority resolution algorithm could then decide that it also has lower priority when applying local variables for any modes which add js-language as their extra parent. But that seems like more work (both for the writers to implement and for the users to understand) for relatively minor gain.
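
To illustrate the two orderings, here is a small sketch that merges per-mode and per-language settings given an explicit precedence list. The function `demo-merge-settings` and the data layout are hypothetical and are not how `.dir-locals.el` is actually processed.

```elisp
;; Hypothetical sketch: precedence is just the order in which keys are applied.
(defun demo-merge-settings (settings order)
  "Merge variable settings from SETTINGS following ORDER.
SETTINGS is an alist of (KEY . ((VAR . VALUE) ...)), where KEY is a
mode symbol or a language keyword.  ORDER lists keys from lowest to
highest precedence; later keys override earlier ones."
  (let (result)
    (dolist (key order)
      (dolist (binding (alist-get key settings))
        (setf (alist-get (car binding) result) (cdr binding))))
    result))
```

With the order `(prog-mode :js js-ts-mode)` a `:js` setting overrides one from `prog-mode`, which is the precedence Stefan describes; with `(:js prog-mode js-ts-mode)` the language is applied first and `prog-mode` wins, which is the simpler "languages above major modes" scheme.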
