Eigenstate : Myrddin C Binding Generation

Automatic C Binding Generation for Myrddin

The Myrddin C binding generator, mcbind, now exists.

Until recently, creating Myrddin bindings for C libraries was possible, but painful. The functions had to be wrapped by hand, one by one, taking care that the extern declarations in Myrddin matched the ones in C. A mismatch would not be reported; it would just fail silently.

This approach is still possible, but I've been working on a way to generate this kind of code automatically.

Building on qc, the C compiler written in Myrddin by Andrew Chambers, I've put together code that parses C headers and spits out the glue code.

It's very rough around the edges -- just barely past demoware -- but I can compile and link to significant modules like libsqlite3, as well as the basics like libc.

What Exists

The implementation lives here:

https://git.eigenstate.org/ori/mcbind.git/tree/

It installs a program named mcbind, which works like so:

 mcbind [-h?] [-b pkg] [-I inc] [-D def] [-l lib] file.h...
     -h      print this help message
     -?      print this help message
     -b pkg  generate bindings
     -I inc  add 'inc' to your include path
     -D def  define a macro in the preprocessor e.g. -Dfoo=bar
     -l lib  link against library 'lib'

You can use it like this:

mcbind -lc -lsqlite3 \
    -I /usr/include -I/usr/local/include \
    -D__GNUCLIKE_BUILTIN_STDARG \
    -b sqlite3 \
    sqlite3.h

Or you can use it from a bld.proj file:

bin foo =
    # The code that uses the bindings.
    # This is what you write.
    main.myr

    # The generated source code, from
    # the gen rule below.
    stdio.myr
    stdio.glue.c
;;

# Generate the source files.
gen stdio.myr stdio.glue.c {dep=stdio-incs.h} =
    mcbind -lc -I /usr/include -b stdio stdio-incs.h
;;

The sqlite3 command shown earlier generates a giant module with all of the sqlite3 types (and everything included by sqlite3.h). You can call it like this:

use std
use "sqlite3"

const main = {
    var rc, db, errmsg

    rc = sqlite3.sqlite3_open(("db.sqlite\0" : byte#), &db)
    if rc != 0
        std.fatal("error: {}\n", rc)
    ;;
    /* callback is a row handler defined elsewhere */
    rc = sqlite3.sqlite3_exec(db,
        ("SELECT * FROM table;\0" : byte#),
        (callback : sqlite3.cfunc#),
        (0 : void#), &errmsg)
    if rc != 0
        /* report the error, then free the message before exiting */
        std.fput(std.Err, "error: {} ({})\n",
            rc, std.cstrconvp(errmsg))
        sqlite3.sqlite3_free((errmsg : void#))
        std.exit(1)
    ;;
    sqlite3.sqlite3_close(db)
}

Unfortunately, the chosen example, sqlite3, needs a small amount of tweaking: the header declares some functions that are not in the binary, which causes missing symbols at link time. For now, these need to be removed by hand.

I've updated the mcbind example code to reflect the current state of the art, here: https://git.eigenstate.org/ori/cbind-example.git

It's been tested on one platform with a small handful of test cases, so expect bugs. However, barring bugs and warts, it's intended to work on all platforms except Plan 9, where the Myrddin ABI does not match up in any usable way with the C ABI.

Next Steps

There are a number of known rough edges. Strings need to be null terminated explicitly, since Myrddin allows slices into strings and a slice carries no terminator, and they need to be cast to a C string explicitly. We don't currently strip C namespace prefixes from functions, so you get sqlite3.sqlite3_foo instead of the preferred sqlite3.foo. Macros are currently ignored entirely, which means defined constants are often missing. Compiling the generated C code produces far too many warnings, some of which would be trivial to fix, for example by paying attention to 'const' on parameters.

Some of these, such as getting basic defined constants, are shallow problems that only require a bit of typing to solve.

In addition, there are a number of defines that trigger other defines, which are needed in order for some code to compile correctly. Over time, we'll accumulate these.

The harder part of the problem comes down to namespacing. I'm thinking of writing something inspired by the C++ module proposal, with changes to deal with the fact that Myrddin namespaces are kind of coupled to the symbol name.

This means that to create a binding, you'd write something along the lines of:

bind zlib_c {
    /*
      we don't want zlib_c.int32_t, we want cstd.int32_t
      to be shared with all C binding code
     */
    depend "cstd"
    include "zlib.h"
}

A more complicated example might go along these lines:

bind gtk {
    depend "cstd"

    /* use pkg-config to get the include path and headers */
    pkg "gtk+-3.0"
    header "gtk/gtk.h"

    /*
     * GTK includes headers for a lot of deps,
     * we should only export the ones we care
     * about.
     */
    export "@/gtk/*"

    /*
     * we already have gtk in the namespace, strip off
     * this prefix so we can do `gtk.foo` instead of
     * `gtk.gtk_foo`
     */
    prefix = gtk_
}
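The prefix rule itself is simple enough to sketch. The following C function is only an illustration of what a `prefix = gtk_` setting could do to each exported name; strip_prefix is a made-up helper, not part of mcbind, and the choice to leave a name alone when it consists of nothing but the prefix is my assumption.

```c
#include <string.h>

/*
 * Hypothetical helper: return `name` without a leading `prefix`.
 * The result points into `name`; nothing is allocated.
 * A name that does not start with the prefix, or that would
 * become empty, is returned unchanged.
 */
const char *
strip_prefix(const char *name, const char *prefix)
{
    size_t n = strlen(prefix);

    if (n > 0 && strncmp(name, prefix, n) == 0 && name[n] != '\0')
        return name + n;
    return name;
}
```

With this, strip_prefix("gtk_init", "gtk_") yields "init", while a name from another library, such as "g_free", passes through untouched.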

We might also want a way of mapping C names to improved ones and inserting automatic transformers between the types, so that char* and Myrddin strings are converted automatically at call sites.

This may take the approach of pointing the code generator at Myrddin source that implements the more specific functions:

const strstr = {str : byte[:]
    _strstr(cbindutil.cstr(str))
}

This leaves C strings, which I think are simply going to continue to be painful -- there's no good, general solution, so wrapping with functions is likely to be the best choice.
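To make the string problem concrete, here is a rough sketch in C of what any such wrapper has to do. A slice is a pointer plus a length, and nothing guarantees a terminator after the last byte, so the bytes have to be copied into a buffer with room for one. The struct slice type and slice_to_cstr are hypothetical names used for illustration; they are not Myrddin's actual slice layout or anything mcbind generates.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Myrddin byte[:]: pointer plus length. */
struct slice {
    const char *data;
    size_t len;
};

/*
 * Copy a slice into a freshly allocated, NUL-terminated buffer.
 * Returns NULL if allocation fails; the caller frees the result.
 */
char *
slice_to_cstr(struct slice s)
{
    char *buf = malloc(s.len + 1);

    if (buf == NULL)
        return NULL;
    if (s.len > 0)
        memcpy(buf, s.data, s.len);
    buf[s.len] = '\0';
    return buf;
}
```

The copy is the painful part: a slice into the middle of a larger string cannot be terminated in place without clobbering the byte that follows it.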

How It Works

The binding generator is implemented in pure Myrddin code. It's conceptually composed of three parts: a C parser, a C generator, and a Myr generator.

The C generator outputs a file named pkg.glue.c. The Myrddin build system already knew about .glue.c files, which are C source, compiled with a C compiler, but which follow some conventions to simplify the compilation and linking process.

The first of these conventions is that these files are standalone. They don't depend on any locally included headers, which means that there is no need for dependency tracking.

The second of these conventions is that they contain comments with the cflags and libraries that the C code uses:

/* CFLAGS: -I/usr/local/include -DFOO -DBAR */
/* LIBS: c x11 crypt */

The cflags are passed to the C compiler when compiling the .glue.c file to a .glue.o file. The libs are passed along to the final link, where they are added as dynamic dependencies.
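As an illustration of how little machinery this convention needs, here is a sketch in C that pulls the value out of one of these comments. It is not the build system's actual parser: glue_directive is a made-up name, and the matching rules (exactly one space after the comment opener and before the closer) are assumptions based on the two example lines above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the directive comment for `key`
 * (for example CFLAGS or LIBS) in `src` and copy its value
 * into `out`. Returns 0 on success, -1 if the directive is
 * missing or the value does not fit in `outsz` bytes.
 */
int
glue_directive(const char *src, const char *key, char *out, size_t outsz)
{
    char marker[64];
    const char *start, *end;
    size_t n;
    int len;

    /* Build the opening marker for this key. */
    len = snprintf(marker, sizeof marker, "/* %s: ", key);
    if (len < 0 || (size_t)len >= sizeof marker)
        return -1;

    start = strstr(src, marker);
    if (start == NULL)
        return -1;
    start += len;

    /* The value runs up to the closing comment marker. */
    end = strstr(start, " */");
    if (end == NULL)
        return -1;

    n = (size_t)(end - start);
    if (n >= outsz)
        return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}
```

Given the two example comments above, asking for CFLAGS returns the compiler flags and asking for LIBS returns the list of library names, which can then be split on spaces.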

Functions are wrapped up with name-mangled wrappers. The Myrddin name mangling scheme for functions is simply

namespace$function

So the output of the C generator looks something like:

int 
sqlite3$sqlite3_open(char (*_a0), sqlite3 (**_a1))
{
    return sqlite3_open(_a0, _a1);
}
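Putting the conventions together, a complete glue file for a single libc function might look roughly like the sketch below. The package name cstring is made up for illustration, and this is hand-written rather than actual mcbind output. Note that '$' in an identifier is not standard C; GCC and Clang accept it as an extension on common targets.

```c
/* CFLAGS: -I/usr/include */
/* LIBS: c */
#include <string.h>

/*
 * Wrapper exported under the mangled name for a hypothetical
 * Myrddin package "cstring"; it forwards straight to libc.
 */
size_t
cstring$strlen(const char *_a0)
{
    return strlen(_a0);
}
```

The file only includes a system header, so it stays standalone in the sense described earlier.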

The Myr generator creates a .myr file with extern declarations that forward to these implementations, something like:

pkg sqlite3 =
    extern const sqlite3_open : \
        (a0 : char#, a1 : sqlite3## -> int)
;;

The code for generating C declarations from the AST lives here. It is a fairly straightforward recursive walk of the AST.

The code for generating Myrddin glue is more complicated, due to C concepts that do not translate directly. The code needs to deal with incomplete types, tentative declarations, and colliding names. It lives here.
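For readers who don't have these corners of C in their head, the fragment below shows each of the three cases in a few lines of valid C. All of the names are made up for illustration, and the comments describe only what C allows, not how the generator resolves each case.

```c
/* Incomplete type: declared but never defined in this file,
 * so code can only pass around pointers to it. */
struct conn;

int
conn_is_null(struct conn *c)
{
    return c == 0;
}

/* Tentative definitions: the same object may be declared
 * several times in one file; together they define a single
 * zero-initialized int. */
int open_conns;
int open_conns;

/* Colliding names: struct tags and ordinary identifiers live
 * in separate namespaces, so a struct and a function can
 * share one name. */
struct point {
    int x;
};

struct point
point(int x)
{
    struct point p = { x };
    return p;
}
```

The last case is common in system headers: struct stat and the function stat() are both declared in sys/stat.h.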

The C parser exists here, again implemented in pure Myrddin.

It's pretty gratifying to be able to parse C well enough to generate bindings for system libraries -- without a single external dependency or library.