Re: Some Questions

guile-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Some Questions

From:	Michael Lucy
Subject:	Re: Some Questions
Date:	Mon, 29 Mar 2010 23:08:18 -0600

Hi,

On Mon, Mar 29, 2010 at 3:40 AM, Andy Wingo <address@hidden> wrote:
>
> Tree-IL is the right thing IMO, mostly because it allows you keep source
> location information, but it also allows you to express more precisely
> what you want to compile to. You don't want to run your compiler's
> output through the Scheme syntax expander, that's unnecessary.
>
> Also tree-il is the right place to hook into Guile's compiler
> infrastructure.
>

Ah, k.

>
> Ah, I should have responded to you before I responded to "No Itisnt"; if
> you're down for this, let's do it then. I suspect that it's a project
> that would take 2 months to do *well*:
>
>  * Compile PEG grammars at syntax time using a macro
>    - into state-machine-like lexically-bound closures
>    - see LPEG VM for example
>      http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
>    - potentially augment Guile's VM with needed instructions
>  * Procedural, LPEG-like interface?
>    - run the compiler at runtime?

So, I'm assuming "syntax time" means the grammars will be compiled as
soon as the source file is loaded and macros are parsed, while
"runtime" means the grammars will be compiled when a function using
them is called.  Do you mean you'd like to see both of these
supported?

As for the interface, the lisp tradition I'm most comfortable in calls
for transparent composing and decomposing of lists as the main way of
dealing with data, so I was thinking of representing uncompiled
grammars in a form similar to the example at the top of peg.scm and
then compiling this to something efficient (a choice between this and
more traditional string syntax is how CL-PPCRE approaches things:
http://weitz.de/cl-ppcre/#scan ).  For example, if I have a pattern
stored in *var* and I want a new pattern that's one or more
repetitions of that, I'd write `(,*var* *), or (list *var* '*).

It looks like LPEG's approach is functions that return opaque data and
other functions to combine these opaque data.  For example, if I have
a pattern stored in *var* and I want a new pattern that's one or more
repetitions of that, I'd write (one-or-more *var*).  The problem with
this approach is that you have to choose between verbosity or
confusion (e.g. LPEG uses * to concatenate things in the lpeg module,
but when present in a string for the re module it means what it
normally would in a PEG grammar).  I haven't finished reading the
paper, but I'm assuming the advantage to this approach is that the
opaque representation is already compiled, so doing several hundred
matches with one grammar won't require several hundred compilations.

It seems like we could get the best of both worlds by having
transparent list representations, functions to compose them (so
(one-or-more *var*) yields the same thing as `(,*var* *)), and a
compile-peg function that would take the list representation and
produce a compiled PEG parser.  All the peg functions would take
either the uncompiled or the compiled form (they would check at
runtime what they were dealing with), so in situations where speed is
important people can store the compiled grammar and re-use it.

>  * PEG grammars as text?
>    - Guile PEG language, parsed by PEG itself?

Once a parser's written, parsing strings into grammars sounds useful
and not too difficult.

>  * Benchmarking
>    - LPEG benchmarks would be a good first start
>  * Test suite
>  * Docs
>  * Stream parsing?
>
> If you finished the first tasks, figuring out an efficient way to do
> stream parsing could well take all the rest of the time. (The LPEG paper
> works on strings.)
>
> So if you're down with it, perhaps you can do the PEG stuff, and No
> Itisnt can work on Lua. There's definitely enough work for everyone :)
> Hopefully the GOOG comes through with funding.

Cool, PEG it is then.

>
>
> So, let me know what you think about PEG, if you think it's the right
> size for summer. I think it has great potential, and implementing it
> well (as a compiler and procedurally) will be of great use. Otherwise we
> can go ahead with Python, if that's to your liking.

How does this look for a timeline (unless my math is very horrible,
the "midterm" evaluations are 70% of the way through, so most of the
real work will be done by then):

The ~7 weeks before mid-term evaluations are due:
1 week defining the PEG syntax carefully and writing functions for
dealing with it
3 weeks making the PEG syntax compile to VM code
1 week writing tests / fixing bugs that tests show
1 week documenting things that weren't well enough documented on the way
1 week for things to go wrong in

The ~3 weeks after mid-term evaluations are in:
1 week writing useful add-ons (e.g. taking PEG grammar in strings)
1 week writing benchmarks and tuning the speed
1 week for cleaning things up and writing more documentation/examples

The 1 week after suggested pencils-down date:
More documentation/examples.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: Some Questions, Michael Lucy, 2010/03/29
- Re: Some Questions, Andy Wingo, 2010/03/29
  - Re: Some Questions, Michael Lucy <=

Prev by Date: Integration of lalr-scm
Next by Date: Re: Merging R6RS libraries?
Previous by thread: Re: Some Questions
Next by thread: Merging R6RS libraries?
Index(es):
- Date
- Thread