Eigenstate : libstd

Herin, I document Myrddin's standard library. It fits on one page. With time, hopefully the problematic parsimony will be solved.

Memory Management

/* single item allocation */
    generic alloc   : (-> @a#)
    generic zalloc  : (-> @a#)
    generic free    : (v : @a# -> void)

/* slice allocation */
generic slalloc : (len : size -> @a[:])
generic slzalloc    : (len : size -> @a[:])
generic slfree  : (sl : @a[:], len : size -> @a[:])

/* slice resizing */
generic slgrow  : (sl : @a[:], len : size -> @a[:])
generic slzgrow : (sl : @a[:], len : size -> @a[:])

/* byte allocation */
const bytealloc : (sz : size -> byte#)
const bytefree  : (m : byte#, sz : size -> void)

The functions in this section deal with memory. By convention, they will raise an error on failure to allocate memory, instead of trying to deal with the out of memory condition. On modern operating systems with overcommit, trying to correctly handle out of memory is futile. By convention, a zero slice [][:] is always safe to pass to the slice free or grow functions.

alloc allocates or free a single element of type @a, respectively. zalloc does the same, but zeros the memory allocated before returning it. free is used to return the memory allocated by these functions to the system.

slalloc allocates or frees a slice of len items of type @a. slzalloc does the same, but zeros the memory allocated before returning it. slfree is used to return the memory to the system.

slgrow resizes the slize sl to length len, allocating the appropriate amount of memory. slzgrow does the same, but any elements between the old and new length are zeroed, as in slzalloc.

bytealloc and bytefree are the low level raw-byte allocation interface, returning blobs of bytes. Since the only way to use more than one of these bytes is to cast to a different type, it is strongly recommended to avoid using these functions in user code.

String And Unicode Utils

const strcmp    : (a : byte[:], b : byte[:] -> order)
const strncmp   : (a : byte[:], b : byte[:], n : size -> order)
const streq : (a : byte[:], b : byte[:] -> bool)

const strfind   : (haystack : byte[:], needle : byte[:] -> option(size))

const strcat    : (a : byte[:], b : byte[:] -> byte[:])
const strjoin   : (strings : byte[:][:], delim : byte[:] -> byte[:])
const strsplit  : (s : byte[:], delim : byte[:] -> byte[:][:])

const strstrip  : (str : byte[:] -> byte[:])
const strfstrip : (str : byte[:] -> byte[:])
const strrstrip : (str : byte[:] -> byte[:])

These functions are documented twice because, although they are useful utility functions suitable for hash tables and such, they are also string utilities.

strcmp returns an order on two strings. strcmp does not handle normalization or canonicalization. strncmp is similar, although it only compares at most the first n bytes of the strings.

streq returns true if the strings would compare equal.

strfind finds the substring needle in the string haystack, and returns \Some indexif it finds it, or`None` if the string is not present.

strcat will join two strings, allocating a return buffer. It is equivalent to strjoin([a, b][:], "").

strjoin will take a slice of strings, and join them with the provided delimiter interjected between each element. So, for example, strjoin(["a", "b", "c"][:], ",") will return a new heap allocated string containing "a,b,c".

strsplit will split a string on a delimiter. It will allocate a slice to return the split segments in, however, the split segments will point into the original string. Duplicated delimiters will return an empty field. std.strsplit("a,b,,c", ",") will return the slice ["a","b","","c"][:].

strstrip and friends will strip spaces from a string. strstrip will remove both leading and trailing spaces. strfstrip will remove spaces scanning forward from the start of the string, and strrstrip will remove spaces scanning in reverse from the end of the string.

These functions return a subslice of the original input buffer, and do not allocate or modify their input.

Unicode types

const Badchar   : char = -1 castto(char)
const Maxcharlen : size = 4
const Maxcharval : char = 0x10FFFF

const charlen   : (chr : char -> size)
const encode    : (buf : byte[:], chr : char -> size)
const decode    : (buf : byte[:] -> char)
const striter   : (str : byte[:] -> [char, byte[:]])

Badchar is a value returned by libstd when it cannot decode a character. Maxcharlen is the maximum size of a utf8 sequence. Maxcharval is the maximum codepoint that is available in Unicode.

charlen returns the length of the character buffer that would be needed to encode a character chr.

encode and decode respectively convert a single character to a utf8 encoded byte buffer, and vice versa.

striter is a slightly tricky to use interface for iterating over unicode encoded strings. A better API is needed here.

Example

const walkstring = {str
    var c
    while str.len > 0
        (c, str) = std.striter(str)
        consume(c)
    ;;
}

Character Types

const isalpha   : (c : char -> bool)
const isnum     : (c : char -> bool)
const isalnum   : (c : char -> bool)
const isspace   : (c : char -> bool)
const islower   : (c : char -> bool)
const isupper   : (c : char -> bool)
const istitle   : (c : char -> bool)

All of the above functions take a char (ie, utf codepoint) and returns a boolean representing whether the predicate holds.

const tolower   : (c : char -> char)
const toupper   : (c : char -> char)
const totitle   : (c : char -> char)

These functions convert characters to upper, lower, or title case respectively. Because they work on a character by character basis, they are not always correct for unicode, where, for example, toupper(ß) should return SS. However, because SS is two codepoints, this API does not work correctly.

System Calls

type pid    = int64
type scno   = int64 /* syscall */
type fdopt  = int64 /* fd options */
type fd     = int64 /* fd */
type mprot  = int64 /* memory protection */
type mopt   = int64 /* memory mapping options */
type socktype   = int64 /* socket type */
type sockproto  = int64 /* socket protocol */
type sockfam    = uint16    /* socket family */
type whence = uint64

type clock = union
    `Clockrealtime
    `Clockmonotonic
    `Clockproccpu
    `Clockthreadcpu
    `Clockmonotonicraw
    `Clockrealtimecoarse
    `Clockmonotoniccoarse
    `Clockboottime
    `Clockrealtimealarm
    `Clockboottimealarm
;;

type timespec = struct
    sec : uint64
    nsec    : uint64
;;

type timeval = struct
    sec : uint64
    usec    : uint64
;;

type rusage = struct
    utime   : timeval   /* user time */
    stime   : timeval   /* system time */
;;

type statbuf = struct
     dev     : uint
     ino     : uint
     mode    : uint16
     nlink   : uint16
     uid     : uint16
     gid     : uint16
     rdev    : uint
     size    : uint
     blksize : uint
     blocks  : uint
     atime   : uint
     atimens : uint
     mtime   : uint
     mtimens : uint
     ctime   : uint
     ctimens : uint
     _unused1: uint
     _unused2: uint
;;

type utsname = struct
    system  : byte[65]
    node    : byte[65]
    release : byte[65]
    version : byte[65]
    machine : byte[65]
    domain  : byte[65]
;;

type sockaddr = struct
    fam : sockfam
    data    : byte[14]
;;

type sockaddr_in = struct
    fam : sockfam
    port    : uint16
    addr    : byte[4]
    zero    : byte[8]
;;

type sockaddr_storage = struct
    fam : sockfam
    __align : uint32
    __pad   : byte[112]
;;

These types represent the arguments and return values for a large number of system calls. They are used in the functions listed below.

extern const syscall    : (sc:scno, args:... -> int64)

Syscall() takes a system call number, arguments, returning -errno. It's up to the caller to match the correct ABI.

/* process management */
const exit  : (status:int -> void)
const getpid    : ( -> int64)
const kill  : (pid:int64, sig:int64 -> int64)
const fork  : (-> int64)
const wait4 : (pid:int64, loc:int32#, opt : int64, usage:rusage#    -> int64)
const waitpid   : (pid:int64, loc:int32#, opt : int64   -> int64)
const execv : (cmd : byte[:], args : byte[:][:] -> int64)
const execve    : (cmd : byte[:], args : byte[:][:], env : byte[:][:] -> int64)


/* fd manipulation */
const open  : (path:byte[:], opts:fdopt, mode:int64 -> fd)
const close : (fd:fd -> int64)
const creat : (path:byte[:], mode:int64 -> fd)
const read  : (fd:fd, buf:byte[:] -> size)
const write : (fd:fd, buf:byte[:] -> size)
const lseek : (fd:fd, off:uint64, whence:int64 -> int64)
const stat  : (path:byte[:], sb:statbuf# -> int64)
const fstat : (fd:fd, sb:statbuf# -> int64)
const mkdir : (path : byte[:], mode : int64 -> int64)

/* networking */
const socket    : (dom : sockfam, stype : socktype, proto : sockproto   -> fd)
const connect   : (sock : fd, addr : sockaddr#, len : size -> int)
const accept    : (sock : fd, addr : sockaddr#, len : size# -> fd)
const listen    : (sock : fd, backlog : int -> int)
const bind  : (sock : fd, addr : sockaddr#, len : size -> int)

/* memory mapping */
const munmap    : (addr:byte#, len:size -> int64)
const mmap  : (addr:byte#, len:size, prot:mprot, flags:mopt, fd:fd, off:off -> byte#)

/* time */
const clock_getres  : (clk : clock, ts : timespec# -> int32)
const clock_gettime : (clk : clock, ts : timespec# -> int32)
const clock_settime : (clk : clock, ts : timespec# -> int32)
const sleep : (time : uint64 -> int32)
const nanosleep : (req : timespec#, rem : timespec# -> int32)

/* system information */
const uname     : (buf : utsname# -> int)

All of these functions match the standard system calls documented in syscall-name(2). These are merely Myrddin wrappers for them, which will call the 'syscall()' function with the appropriate arguments. They may do some transformation to the arguments, or in some cases, wrap other system calls for cross platform compatibility. For example, Darwin (ie, OSX) does not have a uname() system call, so this code makes the appropriate sysctl requests to fill the data structure.

const Sys$(syscall_name) = ...

All the system call numbers are exposed from libstd. These system calls are platform specific, and probably should not be used directly.

Networking

DNS Resolution

type rectype = uint16

const DnsA  : rectype = 1  /* host address */
const DnsNS : rectype = 2  /* authoritative name server */
const DnsMD : rectype = 3  /* mail destination (Obsolete - use MX) */
const DnsMF : rectype = 4  /* mail forwarder (Obsolete - use MX) */
const DnsCNAME  : rectype = 5  /* canonical name for an alias */
const DnsSOA    : rectype = 6  /* marks the start of a zone of authority */
const DnsMB : rectype = 7  /* mailbox domain name (EXPERIMENTAL) */
const DnsMG : rectype = 8  /* mail group member (EXPERIMENTAL) */
const DnsMR : rectype = 9  /* mail rename domain name (EXPERIMENTAL) */
const DnsNULL   : rectype = 10 /* null RR (EXPERIMENTAL) */
const DnsWKS    : rectype = 11 /* well known service description */
const DnsPTR    : rectype = 12 /* domain name pointer */
const DnsHINFO  : rectype = 13 /* host information */
const DnsMINFO  : rectype = 14 /* mailbox or mail list information */
const DnsMX : rectype = 15 /* mail exchange */
const DnsTXT    : rectype = 16 /* text strings */
const DnsAAAA   : rectype = 28 /* ipv6 host address */


type resolveerr = union
    `Badhost
    `Badsrv
    `Badquery
    `Badresp
;;

type hostinfo = struct
    fam : sockfam
    stype   : socktype
    ttl : uint32
    addr    : netaddr
/*
    flags   : uint32
    addr    : sockaddr[:]
    canon   : byte[:]
*/
;;

const resolve   : (host : byte[:]   -> error(hostinfo[:], resolveerr))
const resolvemx : (host : byte[:]   -> error(hostinfo[:], resolveerr))
const resolverec    : (host : byte[:], t : rectype  -> error(hostinfo[:], resolveerr))

DNS is implemented in libstd through the resolve() family of functions. 'resolve()' defaults to looking up A records, although when IPv6 support lands fully, it will return both A and AAAA records.

resolvemx() will look up an MX record, and resolverec() will resolve any record type passed to it. This API will need to change in order to return all useful and relevant information.

Dialing Hosts

const dial  : (dialstr : byte[:] -> error(fd, byte[:]))

Dial wraps up the ugliness of raw BSD sockets, allowing you to connect to a host with a Plan 9 style dial string, of the form::

"proto!host-or-ip!port"

Protocol is one of tcp, udp, unix. No other protocols are currently supported, although adding support would be a good thing. Host-or-ip currently only supports DNS A records or IPv4 IP addresses. Ports may be any name in /etc/services, or a numeric port number.

Extrema

generic min : (a : @a::numeric, b : @a::numeric  -> @a::numeric)
generic max : (a : @a::numeric, b : @a::numeric  -> @a::numeric)

As one would expect, min() returns the minimum of it's two arguments, and max() returns the maximum of the two arguments, as determined by the '<' and > operators respectively. If the two values are equal, the first argument will be returned.

Slice Utilities

generic slcp    : (dst : @a[:], src : @a[:] -> void)
generic sldup   : (sl : @a[:] -> @a[:])
generic sleq    : (a : @a[:], b : @a[:] -> bool)
generic slfill  : (sl : @a[:], v : @a   -> @a[:])
generic sljoin  : (dst : @a[:], src : @a[:] -> @a[:])
generic slpush  : (sl : @a[:], elt : @a -> @a[:])
generic slinsert    : (sl : @a[:], idx : size, elt : @a -> @a[:])

These utility functions are used for manipulating slices of any type. By convention, if they return a value, that value is allocated on the heap. Any exceptions will be noted.

slcp copies elements from slice src to slice dst with a shallow copy. Both dst and src must be the same length, or the program will abort.

sldup duplicates a slice, allocating it on the heap with slalloc(), and then copying the elements over withslcp()`.

sleq compares two slices for equality, using the == operator elementwise.

slfill fills all elements in a slice with the value provided. This is useful to, for example, fill an array with default elements.

sljoin will take a slice dst, and append src to it destructively.

slpush will take a slice sl, and append elt to it destructively, growing if needed.

slinsert will take an element elt and insert it at an arbitrary index idx, shifting the other elements downwards in the array.

Misc Utility Functions

type order = union
    `Before
    `Equal
    `After
;;

/* comparison functions */
generic numcmp  : (a : @a, b : @a -> order)
const   strcmp  : (a : byte[:], b : byte[:] -> order)
const   strncmp : (a : byte[:], b : byte[:], n : size -> order)

/* hash functions */
const   strhash : (s : byte[:]  -> uint32)
generic ptrhash : (p : @a#  -> uint32)
generic inthash : (v : @a::(integral,numeric)   -> uint32)

/* equality functions */
generic inteq   : (a : @a::(integral,numeric), b : @a::(integral,numeric) -> bool)
const   streq   : (a : byte[:], b : byte[:] -> bool)
generic ptreq   : (a : @a#, b : @a# -> bool)

These utility functions are designed to be used as arguments to a variety of other functions, for example, generic algorithms, hash tables, and similar. They are implemented in a suitable way for some more common data types that a Myrddin program will use.

The comparison functions define an order on their two arguments, and will return \Beforeifa < b,`Equalifa == b, and`Afterif a > b`.

The hash functions will return a 32 bit, non cryptographic hash usable for hash tables, sets, and other program-internal hashing operations.

The equality functions will return true if a == b, and false in all other cases.

Generic Algorithms

Sorting

generic sort    : (sl:@a[:], cmp:(a:@a, b:@a -> order) -> @a[:])

sort() is a generic sort function that orders based on a user provided comparison function cmp. This function must return an order, where for any two elements, cmp(a, b) will return \Beforeifa < b, `Equalifa == b, and`Afterifa > b`. This ordering must be transitive, or algorithms will fail.

Error Handling

type error(@a, @b) = union
    `Success    @a
    `Failure    @b
;;

type option(@a) = union
    `Some @a
    `None
;;

Error handling in Myrddin is limited, and library support is merely down to providing two types for it. error(@a, @b) is used for where you want to indicate failure with a value. option(@a) is used for where you want to indicate that a value may or may not be present.

Option Parsing

type optctx = struct
    /* public variables */
    args    : byte[:][:]

;;

const optinit   : (optstr: byte[:], optargs : byte[:][:] -> optctx#)
const optnext   : (ctx : optctx# -> [char, byte[:]])
const optdone   : (ctx : optctx# -> bool)
const optfin    : (ctx : optctx# -> byte[:][:])

optctx functions are used for parsing command line arguments. They take an option string in the style of getopt(3). In other words, they take a string of characters abcde:f?g:. Each character represents a valid option, except for the two metacharacters : and ?. These represent a mandatory or optional argument, respectively.

Option parsing stops at the first -- but other than that, all options are order independent, may be repeated, and may be seen at any point in the program. Arguments that do not take arguments may be glommed together into a single argument. If an optional argument is not present, it is represented by an empty string. This should be fixed so that an option type is used.

For example, with the option string "abc:d?, the following parses will result: ::

-ab         => -a -b
-a -b           => -a -b
-abc            => Error, 's' has no argument
-abcd           => -a -b -c "d"
-abc str        => -a -b -c "str"
-ab -c str      => -a -b -c "str"
-ab -c str -d       => -a -b -c "str"
-ab -c str -d       => -a -b -c "str" -d
-ab -c str -d xyz   => -a -b -c "str" -d "xyz"

An example of the usage of these functions is below:

use std
const main = {args
    var optctx

    optctx = std.optbegin("abc:d?", args)
    while !std.optdone(optctx)
        match std.optnext(optctx)
        | ('a', _): std.put("Got option 'a'\n")
        | ('b', _): std.put("Got option 'b'\n")
        | ('c', str):   std.put("Got option 'c' with arg %s\n", str)
        | ('d', str):   std.put("Got option 'c' with arg %s\n", str)
        ;;
    ;;
}

Variadic Functions

const vastart   : (args : ...* -> valist)
generic vanext  : (ap : valist -> [@a, valist])

These functions are used to iterate through a variadic argument list. They have no type safety, and no good error handling. They can crash your program or silently pull out incorrect values if used incorrectly, and will be a pain to debug. There are a number of ideas to fix it so that they are at least runtime type safe, but for now, this is what you got.

vastart converts a variadic argument list type (...) to a valist type, which can be iterated for arguments. vanext will iterate through this, pulling one value out of the argument list, and returning it along side the rest of the list.

A usage example is below:

use std

const main = {
    /* easy as... */
    f(1, 2, 3, 'a', 'b', 'c')
}

const f = {args : ...
    var ival : int
    var cval : char
    var ap

    ap = std.vastart(&args)

    /* 3 integers */
    (ival, ap) = vanext(ap)
    std.put("first int: %i\n", ival)
    (ival, ap) = vanext(ap)
    std.put("second int: %i\n", ival)
    (ival, ap) = vanext(ap)
    std.put("third int: %i\n", ival)

    /* 3 chars */
    (cval, ap) = vanext(ap)
    std.put("first char: %c\n", cval)
    (cval, ap) = vanext(ap)
    std.put("second char: %c\n", cval)
    (cval, ap) = vanext(ap)
    std.put("third char: %c\n", cval)
}

Data Structures: Hash Tables

type htab(@k, @v)

generic mkht    : (h : (k : @k -> uint32), eq : (a : @k, b : @k -> bool) -> htab(@k, @v)#)
generic htfree  : (ht : htab(@k, @v)# -> void)

generic htput   : (ht : htab(@k, @v)#, k : @k, v : @v -> void)
generic htdel   : (ht : htab(@k, @v)#, k : @k -> void)
generic htget   : (ht : htab(@k, @v)#, k : @k -> option(@v))
generic htgetv  : (ht : htab(@k, @v)#, k : @k, fallback : @v-> @v)
generic hthas   : (ht : htab(@k, @v)#, k : @k -> bool)

generic htkeys  : (ht : htab(@k, @v)# -> @k[:])

mkht and htfree respectively allocate and free a hash table. The hash table will use the hash and comparsion functions provided. It does not allocate or free nodes.

htput, htdel, hthas, htget, and htgetv, respectively insert, delete, query the presence of, and retrieve keys from the hash table. htgetv allows retrieval from the hash table that will never fail, always returning a fallback.

htkeys will return a list of all keys currently in the hash table, allowing iteration over them.

Example

use std

const main = {
    var ht : std.htab(byte[:], int)#
    var keys

    ht = std.mkht(std.strhash, std.streq)
    std.htput(ht, "abc", 123)
    std.htput(ht, "xyz", 987)
    if !std.hthas(ht, "abc")
        std.die("our hash table is broken")
    ;;
    match std.htget(ht, "xyz")
    | `std.Some v:  std.put("Got val %i\n", val)
    | `std.None:    std.put("No value present\n")
    ;;
    std.put("Defaulted value: %i\n", std.htgetv(ht, "nope", 42))
    keys = std.htkeys(ht)
    for k in keys
        std.put("%s -> %i\n", k, std.htgetv(ht, k, -1))
    ;;
}

Data Structures: Bit Sets

type bitset

const mkbs  : (-> bitset#)
const bsfree    : (bs : bitset# -> void)
const bsdup : (bs : bitset# -> bitset#)

generic bsput   : (bs : bitset#, v : @a::(integral,numeric) -> void)
generic bsdel   : (bs : bitset#, v : @a::(integral,numeric) -> void)
generic bshas   : (bs : bitset#, v : @a::(integral,numeric) -> bool)

const bsdiff    : (a : bitset#, b : bitset# -> void)
const bsintersect   : (a : bitset#, b : bitset# -> void)
const bsunion   : (a : bitset#, b : bitset# -> void)

const bseq  : (a : bitset#, b : bitset# -> bool)
const bsissub   : (set : bitset#, sub : bitset# -> bool)

const bsclear   : (bs : bitset# -> bitset#)
const bscount   : (bs : bitset# -> bitset#)

Libstd ships with an implementation of simple bitsets. These bitsets perform best on relatively low valued, densely packed sets of integers, and take space proportional to the highest value inserted into them. They do no compression.

mkbs, and bsfree, respectively, allocate and free bitsets. bsdup duplicates a bitset, which must be bsfreeed after use.

bsput, bsdel, and bshas respectively insert, remove, and query for the presence of a value in the bitset.

bsdiff, bsintersect, and bsunion perform the set operations that you would expect them to, in O(n) time.

bseq checks to see if a is equal to b, whilebsissubchecks ifsubis a subset ofset`.

bsclear clears all set values in bs, returning an empty set.

bscount iterates over all the items in set, returning the total count of items.