Eigenstate: myrddin-dev mailing list


Re: Parser Generator: Demo Exists.


And it's basically working usefully now.

I still want to upgrade to a more powerful parsing algorithm, but the bulk
of the machinery is done.

On Mon, 03 Aug 2015 09:11:25 -0500, Ryan Gonzalez <rymg19@xxxxxxxxx> wrote:

> So...maybe I'm just saying the obvious...
> 
> ...but have you ever considered just porting Berkeley YACC (byacc)? It's public domain, so you can license it however the heck you want, and I'm pretty sure that'll end up easier than trying to do the whole thing from scratch.
> 
> 
> On August 3, 2015 1:49:00 AM CDT, Ori Bernstein <ori@xxxxxxxxxxxxxx> wrote:
> >So, it's far from done, and is currently completely useless, but enough is
> >working that I feel like I can announce that it exists:
> >
> >    http://git.eigenstate.org/ori/mpgen.git
> >
> >This currently implements a lexer and an LR(0) parser generator; the plan
> >is to move to IELR.
> >
> >An example input is here:
> >
> >        %pkg parse
> >        %tok id = "/[a-z]*/"
> >        %tok Lbra = "("
> >        %tok Rbra = ")"
> >        %tok Plus = "+"
> >        %start expr
> >
> >        expr: term  {std.put("got a term\n")}
> >            | expr Plus term
> >            ;
> >
> >        term: id {std.put("got an id\n")}
> >            | Lbra expr Rbra
> >            ;
> >
> >        %myr {
> >        use std
> >        }
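To make the LR(0) part concrete, here is a rough Python sketch of an LR(0) recognizer for the example grammar above. It is not taken from mpgen: the names `GRAMMAR`, `closure`, `goto` and `parse` are made up for illustration, the states are computed on the fly rather than compiled into a table, and the end-of-input marker "$" is an assumption of the sketch. It only accepts or rejects, which mirrors the bool-returning parse() function.

```python
# Hypothetical encoding of the example grammar; rule 0 is the augmented
# start rule, with "$" standing in for end of input.
GRAMMAR = [
    ("start'", ("expr", "$")),
    ("expr", ("term",)),
    ("expr", ("expr", "Plus", "term")),
    ("term", ("id",)),
    ("term", ("Lbra", "expr", "Rbra")),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Expand a set of (rule index, dot position) items to an LR(0) state."""
    items = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(state, sym):
    """State reached from `state` on symbol `sym`, or None if there is none."""
    moved = [(r, d + 1) for r, d in state
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == sym]
    return closure(moved) if moved else None


def parse(tokens):
    """Return True if the sequence of token names is accepted."""
    stack = [closure([(0, 0)])]
    toks = list(tokens) + ["$"]
    pos = 0
    while True:
        state = stack[-1]
        done = [r for r, d in state if d == len(GRAMMAR[r][1])]
        if done:
            # LR(0): a state holding a completed item reduces with no lookahead.
            rule = done[0]
            if rule == 0:
                return True  # start' -> expr $ is complete: accept
            lhs, rhs = GRAMMAR[rule]
            del stack[len(stack) - len(rhs):]   # pop one state per rhs symbol
            stack.append(goto(stack[-1], lhs))  # then follow the goto on lhs
            continue
        nxt = goto(state, toks[pos])
        if nxt is None:
            return False  # no shift possible: syntax error
        stack.append(nxt)
        pos += 1
```

With this, parse(["Lbra", "id", "Plus", "id", "Rbra"]) accepts and parse(["id", "Plus"]) rejects. The sketch takes the first completed item it finds in a state, which is only safe because this particular grammar has no LR(0) conflicts; grammars that do are the reason to want something stronger than LR(0).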
> >
> >For documentation, you'll mostly have to read the source, but a
> >summary:
> >
> >    %pkg:
> >        optional; this sets the package that the parse() function is in.
> >
> >    %tok name = pat
> >        Creates a token with the name 'name', and pattern 'pat'. Two
> >        types of pattern are accepted:
> >          - String patterns. These are quoted strings that are interpreted
> >            verbatim: "foo.*" will match the exact string 'foo.*'.
> >          - Regex patterns. These are quoted in slashes, and are treated
> >            as regexes. Capture groups, of course, aren't supported, since
> >            we compile to a DFA.
> >
> >    %skip name = pat
> >        Skips tokens; useful for, e.g., ignoring whitespace. Should probably
> >        drop the name= bit.
> >
> >    %start sym
> >        Identifies the initial symbol that we use. This is mandatory.
> >
> >    %myr
> >        A single myrddin blob. This is injected at the end of the input
> >        (although, thanks to the order independence, the exact location
> >        is never important).
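For anyone who wants to see the string-versus-regex pattern distinction and %skip in action, here is a small Python sketch. It is an illustration only, not how mpgen works internally: mpgen compiles patterns to a DFA, whereas this leans on Python's backtracking `re` module, and the names `compile_pattern` and `tokenize` are made up. Picking the longest match across rules, with ties going to the earliest rule, is also an assumption.

```python
import re


def compile_pattern(pat):
    """Compile a %tok-style pattern.

    A pattern wrapped in slashes is treated as a regex; anything else is
    escaped so that it matches verbatim.
    """
    if len(pat) >= 2 and pat.startswith("/") and pat.endswith("/"):
        return re.compile(pat[1:-1])
    return re.compile(re.escape(pat))


def tokenize(text, toks, skips=()):
    """Split text into (name, lexeme) pairs, dropping anything a skip rule matches."""
    rules = [(name, compile_pattern(p), False) for name, p in toks]
    rules += [(name, compile_pattern(p), True) for name, p in skips]
    out = []
    pos = 0
    while pos < len(text):
        best = None
        for name, rx, skip in rules:
            m = rx.match(text, pos)
            # Ignore empty matches so that a pattern like [a-z]* cannot stall.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end(), skip)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        name, end, skip = best
        if not skip:
            out.append((name, text[pos:end]))
        pos = end
    return out
```

For example, tokenize("foo+(bar)", [("id", "/[a-z]+/"), ("Lbra", "("), ("Rbra", ")"), ("Plus", "+")]) yields the id, Plus, Lbra, id, Rbra tokens in order, while a string pattern such as "foo.*" only ever matches the literal text foo.* and nothing else.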
> >
> >A single function is exported:
> >
> >    const parse : (input : byte[:] -> bool)
> >
> >Things that are not supported yet:
> >
> >    - Usefully powerful grammars (lr0, hah)
> >    - Actions on reductions -- right now, the code snippets are no-ops.
> >    - Types and values
> >    - Unicode
> >    - Parsing from an input stream (currently uses a string)
> >    - Implicit tokens; I'd like to have strings usable directly in the
> >      grammar, instead of needing %tok directives.
> >    - Error handling; we just blow up.
> >
> >It's also buggy as fuck. So, 'tis a demo, not a useful tool. Yet.
> >
> >-- 
> >    Ori Bernstein
> 
> -- 
> Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.


-- 
    Ori Bernstein

References:
Parser Generator: Demo Exists.
    From: Ori Bernstein <ori@xxxxxxxxxxxxxx>
Re: Parser Generator: Demo Exists.
    From: Ryan Gonzalez <rymg19@xxxxxxxxx>