Eigenstate: myrddin-dev mailing list


Parser Generator: Demo Exists.


So, it's far from done, and is currently completely useless, but enough is
working that I feel like I can announce that it exists:

    http://git.eigenstate.org/ori/mpgen.git

This currently implements a lexer and an LR(0) parser generator; the plan
is to move to IELR.

An example input is here:

        %pkg parse
        %tok id = "/[a-z]*/"
        %tok Lbra = "("
        %tok Rbra = ")"
        %tok Plus = "+"
        %start expr

        expr: term  {std.put("got a term\n")}
            | expr Plus term
            ;

        term: id {std.put("got an id\n")}
            | Lbra expr Rbra
            ;

        %myr {
        use std
        }

For documentation, you'll mostly have to read the source, but a summary:

    %pkg name
        Optional. Sets the package that the parse() function is in.

    %tok name = pat
        Creates a token with the name 'name', and pattern 'pat'. Two
        types of pattern are accepted:
            - String patterns. These are quoted strings that are interpreted
              verbatim: "foo.*" will match the exact string 'foo.*'.
            - Regex patterns. These are quoted in slashes, and are treated
              as regexes. Capture groups, of course, aren't supported, since
              we compile to a DFA.
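
A minimal Python sketch of the two pattern kinds, for illustration only. The
helper name compile_pattern is hypothetical, and Python's re module is a
backtracking engine rather than the DFA that mpgen compiles to; the sketch
only shows the intended difference in matching.

```python
import re

def compile_pattern(pat):
    """Hypothetical helper: '/.../' is a regex, anything else is verbatim."""
    if len(pat) >= 2 and pat.startswith("/") and pat.endswith("/"):
        # Regex pattern: strip the slashes and compile the body as-is.
        return re.compile(pat[1:-1])
    # String pattern: escape every metacharacter so it matches literally.
    return re.compile(re.escape(pat))

verbatim = compile_pattern("foo.*")
regex = compile_pattern("/foo.*/")

print(bool(verbatim.fullmatch("foo.*")))    # the literal text matches
print(bool(verbatim.fullmatch("foobar")))   # '.*' is not a wildcard here
print(bool(regex.fullmatch("foobar")))      # '.*' is a wildcard here
```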

    %skip name = pat
        Skips any input matching 'pat'; useful for, e.g., ignoring
        whitespace. The 'name =' part should probably be dropped.
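
To make the effect of %skip concrete, here is a small Python tokenizer sketch.
It is not how mpgen's lexer is implemented. The rule table, the 'ws' skip
rule, and the longest-match policy are all assumptions made for the example,
and 'id' uses [a-z]+ rather than [a-z]* so that no rule matches the empty
string.

```python
import re

# (name, pattern, skip?) triples standing in for %tok and %skip directives.
RULES = [
    ("id",   re.compile(r"[a-z]+"),        False),
    ("Lbra", re.compile(re.escape("(")),   False),
    ("Rbra", re.compile(re.escape(")")),   False),
    ("Plus", re.compile(re.escape("+")),   False),
    ("ws",   re.compile(r"[ \t\n]+"),      True),   # hypothetical %skip rule
]

def tokenize(text):
    """Return (name, lexeme) pairs, discarding anything matched by a skip rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pat, skip in RULES:
            m = pat.match(text, pos)
            # Keep the longest non-empty match (assumed policy).
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end(), skip)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        name, end, skip = best
        if not skip:
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens

print(tokenize("(a + bc)"))
```

The whitespace is consumed like any other match, but never reaches the token
list, so the grammar does not need to mention it.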

    %start sym
        Identifies the initial symbol that we use. This is mandatory.

    %myr
        A single Myrddin blob. This is injected at the end of the input
        (although, thanks to the order independence, the exact location
        is never important).

A single function is exported:

    const parse : (input : byte[:] -> bool)
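
As a rough illustration of that contract (text in, accept/reject out), here is
a Python recognizer for the example grammar above. It is a hand-written
recursive-descent sketch, not the LR(0) tables mpgen generates. It takes a
Python str instead of byte[:], rewrites the left-recursive expr rule as a
loop, and requires at least one letter for 'id' even though the example
pattern is [a-z]*.

```python
def parse(text):
    """Return True if text is a sentence of the example grammar, else False."""
    pos = 0

    def term():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":      # term: Lbra expr Rbra
            pos += 1
            if not expr():
                return False
            if pos < len(text) and text[pos] == ")":
                pos += 1
                return True
            return False
        start = pos                                     # term: id
        while pos < len(text) and "a" <= text[pos] <= "z":
            pos += 1
        return pos > start

    def expr():
        nonlocal pos
        if not term():                                  # expr: term
            return False
        while pos < len(text) and text[pos] == "+":   # expr: expr Plus term
            pos += 1
            if not term():
                return False
        return True

    # Accept only if the whole input was consumed.
    return expr() and pos == len(text)

print(parse("(a+b)+c"))
print(parse("a+"))
```

Like the demo, this does no error reporting; a rejected input just yields
False.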

Things that are not supported yet:

    - Usefully powerful grammars (LR(0) only, hah)
    - Actions on reductions -- right now, the code snippets are no-ops.
    - Types and values
    - Unicode
    - Parsing from an input stream (currently uses a string)
    - Implicit tokens; I'd like to have strings usable directly in the
      grammar, instead of needing %tok directives.
    - Error handling; we just blow up.

It's also buggy as fuck. So, 'tis a demo, not a useful tool. Yet.

-- 
    Ori Bernstein
