emacs-orgmode
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: A formal grammar for Org


From: Tom Gillespie
Subject: Re: A formal grammar for Org
Date: Tue, 1 Jun 2021 02:53:15 -0700

Hi Jakob,
    Thank you for getting in touch. I had been meaning to after
someone pointed me to your repo in a reddit thread, but you beat me to
it. Replies in line. Best!
Tom

PS ccing this back to the list for the record.

On Tue, Jun 1, 2021 at 1:56 AM Jakob Schöttl <jschoett@gmail.com> wrote:
>
> Hi Tom,
>
> I came to your post at the mailing list from here:
> https://github.com/gagbo/LuaOrgParser/issues/1
> Sorry, I don't know, how I can answer on the mailing list when I don't have 
> received the original mail.

No worries, I never managed to figure that out either so I just
subscribed. Maybe by matching the subject as you do here and ccing the
list (attempting it in this email to see what happens)?

> We have a pretty similar project, org-parser[1]. It's also written in a Lisp 
> dialect, Clojure, but it uses instaparse instead of brag as parser library.

https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
to get it into my README as a reminder to myself to have a thorough
look at it, but have been occupied with other work since then.

> My idea was, to transform the formal grammar to a grammar.js for tree-sitter. 
> It would be so cool, if it could be generated from one formal specification.

Yes, that would be great. It would be a major step to have a couple of
grammars for org that can be used for stuff like this and compared to
each other, along with test cases that we can use to define correct
behavior.

One issue that I don't have a full understanding of at the
moment is how certain ambiguous forms will impact the ability to
transform directly into the tree sitter grammar.

The reason I mention
this is because I have had to move to a two phase parser in order to
deal with ambiguous parses.

Having not looked carefully at your
approach I don't know whether you have encountered similar issues. For
the tree sitter use case in particular I'm not entirely sure that the
ambiguity matters, but I haven't had a chance to look at it yet.

> Do you plan, in your parser, to do a transformation step from the raw parser 
> AST to a higher-level AST? E.g. the raw parser AST would parse a (:date  
> "2021-06-01") and the transformed AST would transform this to a higher-level 
> timestamp object.

Yes. I already do that to a certain extent in the expander
https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt (the
raw AST is hard to work with directly), but there will be more. I also
expect that I will add an intermediate step where the AST is
rearranged to account for aspects of org semantics that cannot be
captured by the context free part of the grammar.

After that step there are a number of potential conversions, one of which will
transform the AST into Racket structs, but I haven't made it quite
that far yet. That said, I think that in terms of defining a canonical
parse, I am aiming to do that in the transformed intermediate
s-expression representation because I think it will be easier to
define the correctness of certain user interactions on that form rather than
on the higher level object representation, even if the higher level
objects are ultimately used to actually implement that behavior.

> Do you have any automated tests for your parser?

Yes. See https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt
you can run them from the working directory via =raco test laundry=.
I haven't fully specified the expected AST (and transforms) in most
cases because I'm still hammering out details. In some cases I do
specify the parse that I expect, e.g. for headings I specify when
tags are expected in cases where there might be some ambiguity. If you
are looking for edge cases there are a number that are not yet in the
automated tests but that are in
https://github.com/tgbugs/laundry/blob/next/laundry/cursed.org because
they hit on some cases of extreme ambiguity and internal inconsistency
in the elisp implementation or on weird behavior under user
interaction (I also have some other test cases that haven't been
committed to the repo yet).

It would be great to align the grammars and the behavior using a set
of common test cases.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]