Porting Myrddin to a New OS
This guide describes the starting point for porting the Myrddin programming language to a new operating system. It does not cover the porting of the compiler to a new target CPU.
The Compiler:
Because Myrddin lives in its own little world, there is very little to do in the compiler. It depends on a few external programs: a GNU 'as'-style assembler, an 'ld'-style linker, and the traditional Unix 'ar'. You need to tell the compiler the command names for these, which are defined in the 'configure' script.
An example for OSX is below:
    *Darwin*)
        echo '#define Asmcmd "as -g -o %s %s"' >> config.h
        echo '#define Symprefix "_"' >> config.h
        echo 'export SYS=osx' >> config.mk
        ;;
The 'Asmcmd' define in config.h gives the command used to run the system assembler, which currently must accept GNU syntax. The 'Symprefix' define gives the prefix that the platform expects on mangled symbol names (an underscore on OSX). The 'SYS=osx' setting in config.mk is used by libstd to decide which system-specific files to compile into the library.
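As a rough sketch of what adding a new system to 'configure' might look like, the snippet below wraps the case statement in a function. The function name 'emit_config', the system name 'MyOS', the 'SYS=myos' value, and the empty 'Symprefix' are all hypothetical placeholders, not taken from the real script. The Darwin branch is copied from the example above.

```shell
# Sketch only: emit_config and the MyOS branch are hypothetical.
# Appends the per-OS settings to config.h and config.mk in the
# current directory, based on a `uname`-style system name.
emit_config() {
    case "$1" in
    *Darwin*)
        # Taken from the OSX example in this guide.
        echo '#define Asmcmd "as -g -o %s %s"' >> config.h
        echo '#define Symprefix "_"' >> config.h
        echo 'export SYS=osx' >> config.mk
        ;;
    *MyOS*)
        # Hypothetical new port. Assumes a GNU-syntax 'as' and a
        # platform that does not prefix symbol names.
        echo '#define Asmcmd "as -g -o %s %s"' >> config.h
        echo '#define Symprefix ""' >> config.h
        echo 'export SYS=myos' >> config.mk
        ;;
    *)
        echo "unknown system: $1" >&2
        return 1
        ;;
    esac
}
```

It would be called as 'emit_config "$(uname)"'. The value chosen for SYS matters beyond the compiler, since it selects the '-$(SYS)' files in libstd described later in this guide.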
For systems that do not provide both a POSIX-like API and a gas-like assembler, there is a bit more work needed. Porting to a new assembler would involve changing the instruction definitions in 6/insns.def, and writing the compiler against a non-POSIX API would involve wrapping a small number of system calls.
Because the compiler's generated code does not generally interact with code outside of the Myrddin ecosystem, there is very little OS dependence within the compiler itself.
Myrbuild:
Myrbuild is currently implemented twice, once in C in mc/, and once in Myrddin in its own repository. Both should be ported if needed, although it is quite likely that nothing will need to be done. Currently, the only OS-specific code is for OSX, used to silence a linker warning.
The dependence on the system linker is kind of ugly, and cross compiling is currently not possible. Long term, I would like to remove the dependency on the system linker, and provide our own portable one, with a final post-processing pass that converts to platform-native binaries.
The Libraries:
The only library that should need to be ported as things stand is libstd, which is shipped with the compiler. This libstd exposes a lowest common denominator interface to the system, as well as whatever system calls are needed to export system-specific functionality from other libraries. Unfortunately, there currently is nothing specific separating the two, which means it takes some discipline to write code portably. This should probably be fixed at some point, and the organization of libstd/ should be reconsidered.
Libstd has a number of files that need to be ported:
start-$(SYS).s:
This file contains the init code for the program. This code will:
- Set up the program arguments as a slice of UTF-8 encoded byte slices, as opposed to what the OS gives it (usually, a char** and int size).
- Do the same for the environment pointer, as well as stashing it in a variable called 'std._environment' (using whatever name mangling is used for the OS)
- If it is useful, store a copy of the raw environment pointer in 'std.__cenvp'. On Linux and OSX, this is used for some exec variants that don't care to do anything special with the environment.
- Call main()
This file will likely be quite similar across multiple POSIX-like OSes, and may benefit from being split into a common portion if it turns out to be heavily duplicated.
syscall-$(SYS).s:
This file contains the code for entering the kernel. It saves the registers that the kernel is allowed to clobber, sets up the arguments for the kernel, and does the system call. It then restores the saved registers and returns.
This code may actually turn out to be common across all amd64 sysv unixes, and we may want to factor this file out into a shared one as well.
sys-$(SYS).myr:
This is the file that tends to be a chore to port. It is designed to provide an interface to all of the operating system's useful system calls, and enough of a compatibility wrapper around them to allow the rest of libstd to use them portably when needed.
The structures passed to the kernel need to be carefully checked against the C headers of the target operating system to ensure compatibility, as do the system call numbers and flags.
I strongly suggest finding the header containing all the system call numbers, and applying sed to it to convert it. On OSX, the header is in /usr/include/sys/syscall.h. On Linux, it is in /usr/include/bits/asm-generic/syscall.h.
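A minimal sketch of the sed approach follows. It assumes the header contains lines of the form '#define SYS_name number', and it assumes the desired output is a declaration of the form 'const Sysname : scno = number'. The function name 'syscall_consts' and that output format are illustrative; adjust the replacement to match the declarations in an existing sys-$(SYS).myr.

```shell
# Sketch only: the function name and the output format are assumptions.
# Reads a C header on stdin and prints one constant per
# "#define <prefix>name number" line. Other lines are dropped.
# The optional first argument is the macro prefix (default: SYS_);
# Linux headers use __NR_ instead.
syscall_consts() {
    prefix="${1:-SYS_}"
    sed -n -E "s/^#define[[:space:]]+${prefix}([A-Za-z0-9_]+)[[:space:]]+([0-9]+).*/const Sys\\1 : scno = \\2/p"
}

# Example:
#   syscall_consts < /usr/include/sys/syscall.h
```

The output still needs to be reviewed by hand: defines that are not system calls (such as a maximum syscall count) will be converted as well, and numbers defined in terms of other macros will be skipped.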
waitstatus-$(SYS).myr:
The result of wait() is highly OS-specific, so it is implemented in its own file.
Perhaps it should be inlined into sys-$(SYS).myr, and wait() should stop returning raw integer values.
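To illustrate the kind of decoding this file has to do, here is the arithmetic for the traditional Unix status layout used by Linux and OSX, written as a small shell function. The function name 'decode_waitstatus' is hypothetical, and a new target may pack the bits differently, so check its wait macros in the C headers before relying on this layout.

```shell
# Sketch only: assumes the traditional layout, where the low 7 bits
# hold the terminating signal (0 for a normal exit, 127 for a stopped
# process) and bits 8-15 hold the exit code or the stop signal.
decode_waitstatus() {
    status=$1
    low=$(( status & 127 ))            # low 7 bits
    high=$(( (status >> 8) & 255 ))    # bits 8-15
    if [ "$low" -eq 0 ]; then
        echo "exited $high"
    elif [ "$low" -eq 127 ]; then
        echo "stopped $high"
    else
        # Bit 7 (the core dump flag) is masked off by the & 127 above.
        echo "signaled $low"
    fi
}
```

For example, a raw status of 256 decodes as an exit with code 1, and a raw status of 9 decodes as termination by signal 9.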
dir-$(SYS).myr:
Directory reading is also quite OS-specific, with different calls and different types returned from the kernel. This file implements a portable directory reading interface.
ifreq-$(SYS).myr:
This provides ifreq structs for networking ioctl calls. These are not entirely portable, and the ones that are may be defined differently for each OS that is being ported to.