emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: access to parser stack in SMIE


From: Stephen Leake
Subject: Re: access to parser stack in SMIE
Date: Sat, 06 Oct 2012 14:55:39 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)

Stefan Monnier <address@hidden> writes:

> Your problem is one I also bumped into for the Modula-2 and Pascal modes
> (can't remember how I "solved" them nor to what extent the "solution"
> works).

Ok.

>> When all the tokens are properly disambiguated, SMIE can traverse
>> correctly from package "begin" to "package". But we can't do that while
>> disambiguating "begin"; that's circular.
>
> Actually, in some cases, it can be made to work: to disambiguate "begin"
> setup a loop that calls smie-backward-sexp repeatedly (starting from the
> position just before the "begin", of course) checking after each call
> whether the result lets us decide which of the two begins we're
> dealing with.

In the Ada use case I presented (package ... begin), that ends up going
all the way back to package at the beginning of the buffer, encountering
more "begin" tokens on the way; it's much more efficient to start there
in the first place.

>> It might also make sense to incorporate the refined keyword cache
>> mechanism into smie.
>
> Right, if we want to make the stack visible, then we also need to
> implement the cache.

Ok. Although different languages may want to cache different things, so
I'm not sure that would really be common code.

> Note that such a "forward full-parse with cache" approach has several
> downsides:
> - potential performance impact on long buffers.

I have not timed this on truly large code yet; I'll ask for examples.

The cache could be improved, by storing the stack with each token as
well; then when you need to do a full parse forward, you can start at
the previous valid token, not at the start of the buffer.

> - risk of the cache going out of sync.

Yes. I've already got an interactive `ada-indent-invalidate-cache' to
handle that, but the user will have to be aware. So far, the cache has
not gotten out of sync, but I haven't used this to write real code yet.

> - parse errors far away in an unrelated (earlier) part of the buffer
>   can prevent proper local indentation.  Parse errors can occur for lots
>   of reasons (e.g. temporarily incorrect code, incomplete parser, or
>   use of "exotic" language extension, use of a preprocessor, ...).

Yes; those are all reasons to stick with local parsing whenever
possible.

Hmm. "begin" occurs a lot in Ada code (every function body), so I
guess I can't really claim I'm not relying on forward full-parse much,
since I need it for every "begin".

> That doesn't mean it's not workable, but those downsides had better come
> with some significant upside.  One significant upside is that by only
> parsing forward we can use any other parsing technology, such as LALR,
> GLR, PEG, ...

A much more significant upside: it supports the language I need (Ada)!

Switching to another parsing technology just for the forward full-parse
means supporting a second grammar; that's a lot of work. It might be
simpler to switch to only forward full-parse (which I think is what you
are suggesting).

If I switch to only using forward full-parse, I'd have to do things
differently in the indentation rules as well. The key information they
need need is the location of the statement start for every indentation
point. So the forward parse could store the relevant statement start
(and maybe other stuff) with every token.

I could look at semantic, and see where that gets me. That would have
all the downsides listed here, without the benefit of being able to use
local parsing when it works. I've started that several times, and given
up because I had working examples of indentation engines that use smie,
or I figured out how to get smie to do what I need. But I guess it will
be worth giving it a more serious try. I should not stick with smie just
because I haven't learned semantic (how many times have I said that to
someone else? - sigh).

One upside of the semantic approach would be avoiding writing all the
token refining code; most is very simple, but some is pretty hairy, and
it can only get worse as I implement more of the Ada language.

> I.e. I think it's an interesting direction but it would be another package.

Hmm. The actual change to smie I'm asking for is very small (if we leave
out the cache); perhaps you could put that in, with a large "use at your own
risk" comment?

Otherwise, I can reimplement smie-next-sexp in ada-indent; there
have been times when I thought that would be a good idea anyway (it's
more complicated than I need). Then I would only be using smie for the
grammar generating code.

But I'll give semantic a serious try first; "today is a good day to
learn something new" :). 

-- 
-- Stephe



reply via email to

[Prev in Thread] Current Thread [Next in Thread]