Getting a copy
Myrddin runs on amd64 versions of Linux, OSX, FreeBSD, and 9front, with more ports warmly welcomed.
The easiest way to get an up-to-date copy of Myrddin is to install it from git. The language is still changing rapidly, and releases haven't made much sense to date (although it's getting there!).
To start off, download the dependencies. If you're on a Debian derivative, this should be sufficient:
sudo apt-get install bison gcc git make
Then, get the compiler:
git clone git://git.eigenstate.org/git/ori/mc.git
Building and installing it matches the traditional ./configure; make; make install dance:
cd mc # the directory you cloned into
You then run ./configure, as usual. I generally configure with --prefix=$HOME/bin, instead of the default /usr/local. If you chose a nonstandard prefix, make sure that $prefix/bin is in $PATH, or the binaries will not be found when you try to run commands.
./configure # probe the OS
make # build it
At this point, it would be good to make sure that the build you have works:
make check
This should succeed. Assuming that's the case, you can install.
make install
Optional: Editor Support
Shipped with Myrddin, but not installed by default, is autoindent and syntax highlighting support for vim. To set this up, copy the files into your ~/.vim directory. From within the git checkout:
mkdir -p ~/.vim # -p: no error if it already exists
cp -r support/vim/* ~/.vim
If you happen to like Howl, Ryan Gonzalez has contributed support for it, and it's available here: https://github.com/kirbyfan64/howl-myr. To install, follow the directions listed:
mkdir -p ~/.howl/bundles
cd ~/.howl/bundles
git clone https://github.com/kirbyfan64/howl-myr.git
Optional: Ctags Support
There's also a patched version of exuberant-ctags which can be used for indexing Myrddin code, available here: https://github.com/oridb/ctags-myr.git. This code can be built and installed with the following sequence of commands:
sudo apt-get build-dep exuberant-ctags
git clone https://github.com/oridb/ctags-myr.git
cd ctags-myr/
aclocal
automake
autoreconf
./configure
make
sudo make install
Hello, World
The way to learn a language is to write programs in it, and following the time-worn traditions of the greats, our first program is one that prints the following words: "Hello World".
Once we have accomplished that, we've got a good starting point: The system is set up, and everything is working. So, type in the text below, or run it online if you so desire:
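A minimal version of the program, reconstructed from the line-by-line explanation later in this section (a use of std, a constant named main, and a single call to std.put), would be:

```myrddin
use std

const main = {
    std.put("Hello World\n")
}
```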
Type it into a file. The name doesn't matter, as long as it ends with the .myr suffix. I'm going to put mine into hello.myr.
To compile it, the mbld program that ships with Myrddin can be run like so, from within the directory that contains your hello.myr file:
mbld -b hello hello.myr
Something that resembles the following output should show up on your screen:
hello...
6m hello.myr
ld -o hello hello.o -L/home/ori/bin/lib/myr -lstd -lsys
The details of the output may vary by system, but at the end of the process, an output binary will be created in the directory in which you ran mbld.
When you run the binary, you should get the following output:
$ ./hello
Hello World
The command mbld is the Myrddin build tool, and is responsible for invoking the appropriate compiler and linker commands in the correct order. It can be configured directly from the command line, or configured via a file. For simplicity, we will be using the command line configuration options.
The -b name option tells mbld to produce a binary named name, and the remainder of the arguments are the inputs to combine into the binary. Any arbitrary name can be selected.
Now that the code is working, an explanation is due. A Myrddin program consists of declarations, types, and statements. A function is a sequence of statements, enclosed by { and }, which are executed in order. In this example, there is one declaration -- the constant main, which is initialized with a simple function containing one statement. Normally, functions can be named anything, although the main function, when not in a namespace, is special, since every program begins executing at the beginning of main.
The first line of the program,
use std
brings a number of declarations into scope. These declarations include the function put, which is used to output data to the terminal. The contents of libstd are described in full in the library documentation.
The declaration of main is slightly interesting, because unlike most languages, there is no distinction in Myrddin between function declarations and variable declarations. A function declaration is simply a constant with a function assigned to it, and a function is always an unnamed list of statements within curly braces.
const main = {;...}
In this example, the main function takes no parameters, as signified by the lack of parameter names listed before the first ; in the function.
The body of the main function
std.put("Hello World\n")
contains a single statement. This statement is a function call, calling the function put defined within the library std, passing it a single parameter, the string "Hello World\n".
Within a function body, any number of statements can be placed. So, for example, if we had written
the code would have executed both statements in the sequence denoted in the text.
Another feature to note here is the text enclosed within /* and */. This text is known as a comment, and is ignored by the compiler. It is used so that you, the programmer, can make notes to yourself.
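As a sketch of the two-statement variant just described, the program might read as follows. The second message, "Goodbye World", is an invented placeholder; any second statement would illustrate the same point.

```myrddin
use std

const main = {
    /* the first statement */
    std.put("Hello World\n")
    /* the second statement (hypothetical text, chosen for illustration) */
    std.put("Goodbye World\n")
}
```

Run as written, this should print the two messages in the order in which the statements appear.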
More Complexity: Loops and Variables
The next program that we will write is going to introduce the concepts of variables, arithmetic, conditionals, and loops. The problem at hand is to print the word "Fizz" if a number is divisible by three, "Buzz" if the number is divisible by 5, and "Fizzbuzz" if it is divisible by both 3 and 5. It should run over the list of all integers from 1 to 100. An example run of the program should produce the following output:
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
Fizzbuzz
16
...
The program still consists of a single function named main, but the body of it is more complicated. The body uses a while loop to iterate over the values from 1 to 100, and a chain of if statements to decide what to do with the number that it is currently considering.
Comments
The first new feature we see reading this program is the text
/* the well known fizzbuzz program */
which is the syntax for a comment, ignored by the compiler. Comments in Myrddin may appear anywhere that whitespace may show up -- between declarations, at the end of lines, and so on. Comments may nest, allowing entire blocks of code to be commented out, even if they have comments in them already -- something quite useful for quickly disabling some code for testing.
/* a comment /* holding a comment */ inside */
Following this are declarations. These were glossed over when putting together the hello world program.
const main = {
var i
var start, end
Declarations are the fundamental building block of programs. Declarations are used to attach a name to a value. This value may either be constant, or it may be mutable -- in other words, it may be possible to change the value. Both cases of declared name are referred to, perhaps slightly confusingly, as variables. All variables, and indeed, all values in a Myrddin program, have a type attached.
To denote a value that may not be assigned to, the keyword const is used, as in const main. To denote a value that may be modified, the keyword var is used. Multiple variables may follow a single specifier, as we see on the third line of the sample:
var start, end
Generally, the Myrddin compiler can determine the types needed for a variable declaration without any help from the programmer. However, in some instances there is insufficient information for the compiler to determine what type a variable must have. For example, if we wrote a function with a variable that was never assigned to, we would get a compilation error, complaining about an ambiguous type:
In this case, the variable is useless and can safely be deleted. However, in some instances, we must annotate the variable with a type. This is done by adding a : after the name of the variable, followed by a type name. For example, we could decide that we want x in the above example to be a 64-bit integer:
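A sketch of such an annotated declaration, wrapped in a small program so that it can be compiled on its own. Placing it in main, the value 42, and the call to std.put are illustrative choices; only the var x : int64 line carries the point.

```myrddin
use std

const main = {
    var x : int64    /* explicit annotation: x is a 64-bit integer */

    x = 42           /* illustrative value */
    std.put("{}\n", x)
}
```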
Myrddin comes with a number of builtin types, covering integers, floating point values, characters, strings, and booleans. It also comes with facilities for building more complex types.
One quirk of Myrddin is that variables may be declared in any order, including after their use. The initialization for constants is done at compile time, and their values are fixed for the duration of the program. Variables, however, are initialized at their first use. For example:
This allows for functions to be mutually recursive, but can lead to confusing code otherwise. It is strongly recommended stylistically to declare variables before their use within a function.
The next feature we see is a group of assignments. An assignment statement consists of a value that can be assigned to on the left hand side, known as an lvalue, and an expression on the right hand side that computes something, known as an rvalue. Assignment statements will set or overwrite the value on their LHS to the value on the RHS.
start = 1
end = 100
i = start
Every expression statement is terminated by either a line ending or a semicolon. Myrddin treats newlines and ; as equivalent. Each expression is executed in sequence, one after the other.
We want to make the same decision for every number between the start and end point of the range we're considering. In this instance, we use a while loop, although we will explore better options presently.
while i <= end
    /* stuff */
;;
A while loop continuously repeats the code within the block, descriptively abridged as /* stuff */ in the snippet, until the condition becomes false. The condition is a single expression which evaluates to a boolean true or false. In this case, the loop will exit when i becomes greater than end, making the test i <= end false.
The body of a while loop begins on the line following the while keyword, and continues until the ;; which denotes the end of the block. This pattern is a common syntactic construct in Myrddin.
The body of the loop contains an if statement. An if statement will execute its body if the condition is true. An elif clause chains off of an if statement, executing only if the conditions for the if and any earlier elif clauses were false and its own condition is true. The final else clause acts as a catch-all, executing if none of the previous conditions in the if-elif chain was true.
if i % 5 == 0 && i % 3 == 0
    std.put("Fizzbuzz\n")
elif i % 3 == 0
    std.put("Fizz\n")
elif i % 5 == 0
    std.put("Buzz\n")
else
    std.put("{}\n", i)
;;
An if statement may be followed by zero or more elif clauses, and at most one else clause.
In the final else clause, we also notice one more feature of the std.put function. Each {} within the first argument to std.put() acts as a placeholder for one of the arguments that follow it. These arguments will be displayed in as sensible a way as Myrddin is capable of showing.
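To illustrate the placeholders with more than one argument, here is a small sketch. The message text and the numbers are made up for the example; only std.put and the {} placeholder come from the tutorial.

```myrddin
use std

const main = {
    /* each {} is filled, left to right, by the next argument */
    std.put("{} plus {} is {}\n", 2, 3, 2 + 3)
}
```

This should print the line "2 plus 3 is 5".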
The expression in the first if clause is also worth examining in more detail:
i % 5 == 0 && i % 3 == 0
How does this expression work? In Myrddin, expressions are evaluated from left to right, taking precedence into account. The operator with the lowest precedence in this expression is &&, which is a lazy logical and. It evaluates its left operand, i % 5 == 0, and only if it is true does it evaluate the right operand, i % 3 == 0. If both its left hand side and right hand side evaluate to true, then the entire expression evaluates to true. Otherwise, it evaluates to false.
The subexpression i % 5 == 0 first evaluates i % 5. That is, it computes the remainder of i divided by 5 using the modulus operator %. This result is then compared against the value 0, with the == operator returning true or false as needed.
The final statement in the while loop, i++, is another slightly odd expression. It is a postincrement expression, which is equivalent to evaluating i, and then after the expression is evaluated, adding one to it. It could be rewritten with the equivalent code
i
i = i + 1
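Assembled from the fragments discussed above, the complete program might look like the following sketch. The ordering of the pieces and the use std line are inferred from the walkthrough rather than copied from a listing.

```myrddin
use std

/* the well known fizzbuzz program */
const main = {
    var i
    var start, end

    start = 1
    end = 100
    i = start
    while i <= end
        if i % 5 == 0 && i % 3 == 0
            std.put("Fizzbuzz\n")
        elif i % 3 == 0
            std.put("Fizz\n")
        elif i % 5 == 0
            std.put("Buzz\n")
        else
            std.put("{}\n", i)
        ;;
        i++    /* move on to the next number */
    ;;
}
```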
Now, the program that we have written is not ideal. It would be better to modify it to use a for loop, and pattern match on the truth values:
This variant of the program accomplishes the same thing, but has a number of minor advantages and differences. The largest advantage is that it demonstrates some constructs that will show up elsewhere.
The first change of note is that the while loop has been replaced with a for loop. For loops come in two different forms. The one in use here is closely related to a while loop, and has the general structure:
for setup; condition; step
    /* body */
;;
The expression in setup is evaluated exactly once before the loop is run. The setup step can also contain an initialized declaration. Condition plays the same role in a for loop as it does in a while loop, and the step is performed once at the end of every iteration, after body has run. Therefore, this style of for loop can be easily transformed into a while loop like so:
setup
while condition
    /* body */
    step
;;
We have also replaced the chain of if statements with a match statement and a set of patterns. The expression
(i % 5 == 0, i % 3 == 0)
produces a sequence of values known as a "tuple". The first element of the tuple contains whether i is divisible by 5, and the second contains whether it is divisible by 3. The code then applies the following patterns and evaluates the first one that matches:
| (true, true): std.put("Fizzbuzz\n")
| (true, false): std.put("Buzz\n")
| (false, true): std.put("Fizz\n")
| (false, false): std.put("{}\n", i)
If both values in the tuple are true, then the first case matches. If the first value is true, and the second value is false, then the second case matches. And so on, through the list.
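Putting the for loop and the match together, the revised program might look like the sketch below. The bounds 1 and 100 come from the problem statement; the overall layout is an assumption. Note that the first tuple element tests divisibility by 5, so a (true, false) tuple corresponds to "Buzz".

```myrddin
use std

const main = {
    var i

    for i = 1; i <= 100; i++
        /* (divisible by 5, divisible by 3) */
        match (i % 5 == 0, i % 3 == 0)
        | (true, true): std.put("Fizzbuzz\n")
        | (true, false): std.put("Buzz\n")
        | (false, true): std.put("Fizz\n")
        | (false, false): std.put("{}\n", i)
        ;;
    ;;
}
```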
Match statements are fairly picky, and will not allow cases to go unmatched. They will also not allow duplicate cases which hide others. This means that if by some accident we had either forgotten a case, or mistyped it so that another case would have matched before the one we expected, the compiler would kindly tell us that we had made a mistake.
Pattern matches also allow us to extract values from data types. For example, we could use the pattern:
| (true, divisible): std.put("Fizz divisible={}\n", divisible)
Once all of the conflicting cases were resolved, the running program would inspect only the first value of the tuple when trying this case, and then store the second value into the variable divisible.
Types
Functions
Functions in Myrddin are just values that can be called, but beyond that they are simply constants that have values assigned to them.
Input and Output
So far, all of our programs have been self contained. We ran them, and they gave some output. But in order to do something useful without forcing recompilation of the program for every instance of the problem, you will find that reading input and producing output is very useful.
The next program will be similar to the unix program cat. However, instead of simply displaying the output, we will unescape characters.
use std

const main = {args : byte[:][:]
    for arg in args
        catfile(arg)
    ;;
}

const catfile = {path
    var buf : byte[4096] /* 4k buffer */
    var fd

    fd = std.try(std.open(path, std.Rdonly))
    while true
        match std.read(fd, buf[:])
        | `std.Ok 0: break
        | `std.Ok n: std.write(1, buf[:n])
        | `std.Fail e: std.fatal("failure reading {}: {}\n", path, e)
        ;;
    ;;
    std.close(fd)
}
This code has several notable differences. First is the signature of main:
const main = {args : byte[:][:]
Here, main is given an argument named args, which is of type byte[:][:]. This args argument to main is how the command line arguments are passed to a Myrddin program.
The notation type[:] denotes a slice whose elements are of the given type. Applying [:] twice to the byte type, we get a slice of slices of bytes. A slice is a sequence of values, laid out contiguously, with a length attached. Slices are reference types, and may refer to subsections of other slices or of arrays.
The next line,
for arg in args
demonstrates the second form a for loop may take. The general form of this type of for loop is as follows:
for PATTERN in ITERABLE
where pattern can be any pattern allowed by a match statement, and iterable is either a slice or an array. The most common pattern, however, is a "match all" pattern.
This program is also the first that has defined a function other than main. catfile is not strictly required, but it is generally good practice to break a program into smaller functions. The catfile function takes a single argument named path, which contains the path to the file to be printed. It uses the standard library function std.open() in order to open the file for reading.
Two variables are declared at the opening of the function.
var buf : byte[4096] /* 4k buffer */
var fd
The variables declared, buf and fd, represent the buffer that the file's contents are read into, and the file descriptor that will be used for reading.
The declaration of buf creates an array of 4096 bytes. This is where we will store the result of the read from the file. Arrays in Myrddin are of a fixed size, with the size kept as part of the type of the array. In other words, int[2] and int[4] are incompatible. This may sound restrictive, but in practice this interferes with little, as slices are used pervasively.
This code checks for a failure in opening the file by using std.try(). This is a convenience function that does little more than take a value wrapped in a std.result wrapper, check if it was successful, and unwrap it. If the result is a failure, it exits the program with a message. For small programs, this is convenient, although for long running programs it is better to explicitly check the error, as we do when using std.read in the loop that follows.
Next, we have a loop that repeats infinitely, using the while true construct. Until the loop is exited with a break statement, the contents will be repeated.
The bulk of the work is done in the match statement that is executed within the while loop.
match std.read(fd, buf[:])
| `std.Ok 0: break
| `std.Ok n: std.write(1, buf[:n])
| `std.Fail e: std.fatal("failure reading {}: {}\n", path, e)
;;
The first thing to notice is a new operator, the slice operator [:], which is being applied to the buffer buf. The slice operator takes a sliceable value on the left hand side, and a lower and upper bound inside the brackets, on either side of the colon. If these bounds are omitted, 0 is substituted for the lower bound, and lhs.len for the upper bound. The lower bound is inclusive, and the upper bound is exclusive, for symmetry with arrays.
The first use of this operator, therefore, takes a slice of the whole buf array, to pass to the read function. When passing it to the write function, however, we only want to write the bytes that were read, so we slice up to the length read.
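The bound rules described above can be tried out in isolation with a sketch like the following. The 8-byte array and the names whole, head, and tail are hypothetical, chosen only for illustration.

```myrddin
use std

const main = {
    var buf : byte[8]    /* hypothetical 8-byte array */
    var whole, head, tail

    whole = buf[:]       /* both bounds omitted: same as buf[0:buf.len] */
    head = buf[:3]       /* elements 0, 1 and 2; the upper bound is excluded */
    tail = buf[3:]       /* element 3 through the end of the array */
    std.put("{} {} {}\n", whole.len, head.len, tail.len)
}
```

Given the inclusive lower bound and exclusive upper bound, the three lengths printed should be 8, 3 and 5.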
The next thing to look at is the match statement, and how it works. The signature of the std.read function, as described in the manual, is:
const read : (fd : std.fd, buf : byte[:] -> result(std.size, std.errno))
The parameters to this function are fairly clear: fd is a file descriptor, and buf is a buffer to read into. The function is a thin wrapper around the read call provided by many common operating systems. The return value, however, is something we have not yet seen, and warrants elaboration.
The return type, std.result(std.size, std.errno), is a parameterized type. It is defined as follows:
type result(@a, @b) = union
    `Ok @a
    `Fail @b
;;
So, std.result is a parameterized union type, which is tagged. It can hold one of two values:
`Ok @a
Or
`Fail @b
Where the types @a and @b are substituted in as parameters. The match statement we are examining checks the tag of the union, and considers both the tag and the value contained in it. Combining this with the documentation for the std.read function, we know that on end of file, it returns a 0 for the length of the read. Therefore, the first case in the match,
| `std.Ok 0: break
will break out of the loop when we reach the end of the input file. The second case,
| `std.Ok n: std.write(1, buf[:n])
matches a `std.Ok where the value is anything. It does this because the name n used in the pattern is not a constant or variable that is in scope, which means that it is interpreted as a wildcard that binds the value in the union. This means that when the body of the pattern,
std.write(1, buf[:n])
is executed, the n used is the number of bytes that we have successfully read. And finally, if we encounter an error reading the file, then the last case has us covered.
| `std.Fail e: std.fatal("failure reading {}: {}\n", path, e)
will match a failed read, using e as the wildcard to bind the value. When the body of the statement is executed, the program will exit with a message describing what went wrong reading the file.
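The same three-way match can be tried without touching any files. The sketch below defines its own small tagged union; the names outcome, Done, Broken, and describe are hypothetical and are not part of libstd. The cases mirror the ones above: a constant pattern, a binding pattern for the same tag, and a binding pattern for the other tag.

```myrddin
use std

/* hypothetical union, standing in for std.result */
type outcome = union
    `Done int
    `Broken byte[:]
;;

const describe = {o : outcome
    match o
    | `Done 0: std.put("done, nothing read\n")
    | `Done n: std.put("done, read {}\n", n)
    | `Broken msg: std.put("broken: {}\n", msg)
    ;;
}

const main = {
    describe(`Done 0)
    describe(`Done 12)
    describe(`Broken "disk on fire")
}
```

Because `Done 0 is listed before `Done n, the zero case is tried first, and n binds only the remaining values.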
Match statements require all cases to be covered, and the program will fail to compile if there is a value that could potentially be matched against which will not be handled by at least one case. It is also an error to have multiple cases which will match the exact same values, because one of them will necessarily be redundant.
If there is a reason to want to ignore a value, you can use a wildcard as the fallback case. The blackhole wildcard (_) is traditionally used for this. So, for example, if it was desirable to ignore all errors, the match statement in the loop could be rewritten to look like:
match std.read(fd, buf[:])
| `std.Ok 0: break
| `std.Ok n: std.write(1, buf[:n])
| _: /* ignore anything else */
;;