Porting Myrddin to Plan 9
Porting Myrddin to new systems is not especially difficult. It is especially easy when porting to a Posixy system that uses the GNU toolchain, because all of the parts are in place. However, the amount of system specific code that Myrddin depends on is small, and porting to more exotic systems is still easy.
There were a number of parts to the Plan 9 port. First, the backend needed to be taught how to generate Plan 9 assembly and object files. Second, the startup code needed to be written. At that point, a file that does no input and output will run.
From there, the libraries needed to be ported. They were initially designed with a clean separation between system specific code and shared code, and porting to Plan 9 only helped increase the quality of this separation.
The Differences
Plan 9 is not Unix. It isn't even Posix. It bears a family resemblance, but it does many things very differently. To produce a compiler that is natively usable takes some work. To make that compiler produce binaries that behave natively is a bit more work on top of that.
The largest differences that need to be handled:
- The C compiler, assembler and linker are completely different. Myrddin currently depends on the system toolchain to generate working binaries, and needs to be adapted to use the Plan 9 assembler and linker.
- Plan 9 system calls are completely different from the Posix calls. This means that the standard library needs fairly different implementations of a number of primitive calls.
- Plan 9 does not use the SysV ABI for system calls. This means that the existing system call code is not going to work, and will need to be ported.
- Relatedly, the Plan 9 system startup sequence is completely different from Unix. The only real commonality is that argc and argv get passed as arguments. The environment comes in through the file system in /env, instead of as a third argument to main. The top of stack contains some control data that needs to be stashed. And the entry point is called _main instead of _start.
- The file system layout is not unixy. Things get installed into different locations. In order to feel native, Myrddin needs to respect this.
- Many things that are handled in libraries on Unix are done via system services in Plan 9. In order to be a good citizen, Myrddin should use the system services.
Looking at this list, it turns out that the work isn't so hard. The biggest problems with porting involve dealing with the non-posix system calls. The rest have more to do with being a good citizen than with actual technical challenges.
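To make the /env difference concrete, here is a minimal C sketch of looking up an environment variable by reading a file, which is how Plan 9 exposes the environment. The function name env_read and the envdir parameter are made up for illustration, and the sketch uses portable stdio rather than the raw Plan 9 system calls a runtime would use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the variable `name` from the environment directory `envdir`
 * ("/env" on Plan 9) into buf, NUL-terminating it.
 * Returns the number of bytes read, or -1 if the variable is missing,
 * the name is unusable, or the path does not fit.
 * (Hypothetical helper, not part of Myrddin.)
 */
int env_read(const char *envdir, const char *name, char *buf, size_t cap)
{
    char path[512];
    FILE *f;
    size_t nread;
    int n;

    assert(envdir != NULL && name != NULL && buf != NULL);
    /* A variable is a single file, so its name cannot contain a slash. */
    if (cap == 0 || name[0] == '\0' || strchr(name, '/') != NULL)
        return -1;
    n = snprintf(path, sizeof path, "%s/%s", envdir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    nread = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[nread] = '\0';
    return (int)nread;
}
```

On Plan 9, env_read("/env", "objtype", buf, sizeof buf) would fetch the same value that a Unix program gets from getenv("objtype"). Taking the directory as a parameter keeps the sketch testable on other systems.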
The Compiler
While it's strictly possible to do the port with a cross compiler, it feels a great deal less clunky to do the work natively on the target system. Plan 9 comes with APE, the ANSI/POSIX Environment. It also comes with Posix make, which doesn't deal with the GNU makefiles that I ship with. A port of GNU Make is available. However, the right way was to ship with mkfiles instead.
APE is neither especially comprehensive nor solidly implemented, and I ran into a few issues with missing C99 headers such as stdint.h, as well as a minor bug with printf implementing %lld format strings incorrectly.
However, the 9front community picked up my fixes extremely quickly.
As mentioned earlier, APE's version of Make is extremely basic, and supported approximately none of the libraries I had put together. Using GNU make is possible, but it isn't available by default. And while the APE compiler does follow the Posix specification, it's different enough that simply reusing the existing Makefiles would be uncomfortable.
The answer is to use the native build system, mk. Because Plan 9 comes with templates for building software, it takes very little effort to write mkfiles that cover the use cases. They end up looking something like:
    </$objtype/mkfile
    TARG=6m
    OFILES=\
        blob.$O\
        ben.$O\
        ...
    LIB=../util/libutil.a ../parse/libparse.a ../mi/libmi.a
    HFILES=asm.h ../parse/parse.h ../mi/mi.h \
        .../config.h insns.def regs.def
    BIN=/$objtype/bin
    </sys/src/cmd/mkone
    uninstall:V:
        rm -f /$objtype/bin/$TARG
With that out of the way, getting the code running was straightforward. Plan 9 yacc accepted the grammar Myrddin uses out of the box. The C compilers warned on a few things that GCC allowed through. I fixed the warnings, and had a compiler that would generate Linux binaries on Plan 9.
The Assembly Backend
The first order of business when porting a compiler is making sure it can generate binaries for the target platform. Plan 9's toolchain is rather different from the toolchains found on most other platforms. The C ABI is different too, but because I don't link against C, I decided that I didn't care. I use the same SysV-ish ABI on both systems.
Because Myrddin doesn't care about the ABI, instructions are instructions. Porting the assembly generation was largely formatting the same data differently. Another format string was added to insns.def. A formatter was also added for blobs to write out the rather harebrained data format that Plan 9 uses.
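The idea of carrying one format string per assembler dialect can be sketched in C as a small table. This is only an illustration: the struct, the enum, the rows, and the placeholder letters are assumptions, not the actual contents of insns.def.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum syntax { SYNTAX_GAS, SYNTAX_PLAN9 };

/* One row per instruction, with a template for each assembler dialect.
 * Rows and placeholder letters are illustrative, not copied from insns.def. */
struct insn_fmt {
    const char *name;   /* internal instruction name */
    const char *gas;    /* AT&T-style template */
    const char *plan9;  /* Plan 9-style template */
};

static const struct insn_fmt insn_table[] = {
    {"mov",  "mov%t %x,%x", "MOV%T %X,%X"},
    {"push", "push%t %r",   "PUSH%T %R"},
    {"ret",  "ret",         "RET"},
};

/* Return the template for `name` in the requested dialect, or NULL. */
const char *insn_template(const char *name, enum syntax s)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof insn_table / sizeof insn_table[0]; i++) {
        if (strcmp(insn_table[i].name, name) == 0)
            return s == SYNTAX_GAS ? insn_table[i].gas : insn_table[i].plan9;
    }
    return NULL;
}
```

With a layout like this, the instruction selector stays dialect-agnostic, and only the final printing step decides which column to read.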
For example, if I were compiling the following code:
    const fn = {
        -> 42
    }
The assembly output on a system that uses AT&T syntax would look like:
    .text
    fn:
        pushq %rbp
        movq %rsp,%rbp
        movl $42,%eax
    .Lret0:
    .L0:
        movq %rbp,%rsp
        popq %rbp
        ret
However, when compiling for the Plan 9 assemblers, it would look like:
    TEXT fn<>+0(SB),$0
        PUSHQ BP
        MOVQ SP,BP
        MOVL $42,AX
    .Lret0:
    .L0:
        MOVQ BP,SP
        POPQ BP
        RET
The assembly format is documented best at http://9p.io/sys/doc/asm.html, although it bears a very strong family resemblance to the Golang assemblers, https://golang.org/doc/asm.
Instruction names are ALLCAPS. They mostly share the same operand ordering with the AT&T syntax, with some exceptions. For example, it took some time to realize that the CMP instruction did not match the argument order of the SUB instruction. As a result, the format strings in insns.def grew indexes after the %, allowing me to specify "CMP%T %2R,%1X".
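Here is a rough C sketch of how indexed placeholders can reorder operands. It assumes that %T expands to a size suffix, that a digit after % selects an operand by 1-based position, and that a placeholder without a digit takes the next operand in sequence. The function name expand_insn is hypothetical, and the sketch ignores what the class letters (R, X) actually mean in the real formatter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand an instruction template into out (capacity cap).
 *   %T      -> the size suffix
 *   %<n><c> -> operand n, counting from 1
 *   %<c>    -> the next operand in sequence
 * Returns 0 on success, -1 on a malformed template, a bad operand
 * index, or insufficient space. (Assumed semantics, for illustration.)
 */
int expand_insn(char *out, size_t cap, const char *tmpl,
                const char *suffix, const char *const *ops, size_t nops)
{
    size_t len = 0, next = 0;
    const char *p;

    assert(out != NULL && tmpl != NULL && suffix != NULL);
    if (cap == 0)
        return -1;
    for (p = tmpl; *p != '\0'; p++) {
        char one[2] = {0, 0};
        const char *piece;
        size_t n, idx;

        if (*p != '%') {
            one[0] = *p;            /* literal character */
            piece = one;
        } else if (*++p == 'T') {
            piece = suffix;         /* size suffix */
        } else {
            if (isdigit((unsigned char)*p)) {
                if (*p == '0')
                    return -1;      /* indexes start at 1 */
                idx = (size_t)(*p - '1');
                p++;                /* move on to the class letter */
            } else {
                idx = next++;       /* sequential operand */
            }
            if (*p == '\0' || idx >= nops)
                return -1;
            piece = ops[idx];
        }
        n = strlen(piece);
        if (len + n + 1 > cap)
            return -1;
        memcpy(out + len, piece, n);
        len += n;
    }
    out[len] = '\0';
    return 0;
}
```

Given operands {"AX", "BX"} and suffix "Q", the template "SUB%T %R,%X" expands to "SUBQ AX,BX", while "CMP%T %2R,%1X" expands to "CMPQ BX,AX". The same operand list serves both instructions, and only the template decides the printed order.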
After this work, the generation of assembly code was split between gen.c, which initializes the output state and drives the assembly generation; gengas.c, which generates code for systems using assemblers compatible with AT&T syntax; and genp9.c, which generates Plan 9 syntax.
Eventually, this will go away when Myrddin moves to its own cross compiling toolchain, but that's the state of the world right now.
The Startup Code
At this point, the compiler could generate code in the format that's needed. The assembler accepted the .s files that the Myrddin compiler generated, and converted them to object files. Unlike other systems, the object files have the architecture number as the extension (in the case of amd64, obj.6), to allow cross-platform builds. However, if you try to link them into a binary, the linker will complain about the entry point (main) not being defined.
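As a small illustration of the naming scheme, the following C sketch maps a Plan 9 $objtype to its toolchain character and builds an object file name from it. The helper names arch_char and obj_name are made up for this example, and the table lists only three architectures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the toolchain character for a Plan 9 $objtype, or 0 if the
 * architecture is not in this (deliberately short) table. */
char arch_char(const char *objtype)
{
    static const struct { const char *objtype; char c; } archs[] = {
        {"arm", '5'}, {"amd64", '6'}, {"386", '8'},
    };
    size_t i;

    assert(objtype != NULL);
    for (i = 0; i < sizeof archs / sizeof archs[0]; i++) {
        if (strcmp(archs[i].objtype, objtype) == 0)
            return archs[i].c;
    }
    return 0;
}

/* Write "<stem>.<arch char>" into out. Returns 0 on success, -1 on an
 * unknown architecture or a buffer that is too small. */
int obj_name(char *out, size_t cap, const char *stem, const char *objtype)
{
    char c = arch_char(objtype);
    int n;

    assert(out != NULL && stem != NULL);
    if (c == 0)
        return -1;
    n = snprintf(out, cap, "%s.%c", stem, c);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Because the extension encodes the architecture, objects built for several targets can sit side by side in one source directory.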
So, we need to define it. The first thing the code needs to do is stash away the information that we will need later in the program's run.
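First off,

    TEXT _main(SB), 1, $(2*8+NPRIVATES*8)
        MOVQ AX, sys$tosptr(SB)
        LEAQ 16(SP), AX
        MOVQ AX, _privates(SB)
        MOVL $NPRIVATES, _nprivates(SB)

This code stores %rax, which holds the top of stack pointer on entry, into the variable named sys$tosptr. It then records the address of the privates area in _privates and the number of privates in _nprivates.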
The startup code as a whole has the following jobs:

- Set up command line arguments provided by the OS. They need to be converted into a slice of Myrddin strings so that they can be passed to main().
- Any information needed by the syscall library needs to be stashed somewhere that libsys can find it. In this case, that means stashing the top of stack pointer in sys$tosptr, and stashing the privates in _privates.
- The __init__ function that is generated by the Myrddin compiler should be called. This function is what invokes all the various module __init__ functions, so they can do their setup.
- The abort functions (currently just _rt$abort_oob) need to be implemented. Because Plan 9 does some nice things when a program crashes, all this code needs to do is abort.

        MOVL inargc-8(FP), R13
        LEAQ inargv+0(FP), R14
        MOVQ R13, AX
        IMULQ $16,AX
        SUBQ AX,SP
        MOVQ SP,DX
        MOVQ R13, AX
        MOVQ R14, BX
        MOVQ SP, CX
        CALL cvt(SB)
        PUSHQ R13
        PUSHQ DX
        XORQ BP,BP
        CALL __init__(SB)
        CALL main(SB)
        POPQ DX
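The argument conversion step can be modeled in C as follows. This is a sketch under one assumption: that a Myrddin string is a pointer plus a length, which would account for the 16 bytes per argument that the startup code reserves on the stack. The names struct slice and cvt_args are hypothetical and are not the real cvt routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout of a Myrddin byte slice: a pointer and a length,
 * which is 16 bytes on amd64. */
struct slice {
    const char *data;
    size_t len;
};

/* Fill out[0..argc) with one slice per NUL-terminated C argument.
 * The caller provides room for argc slices, much as the startup code
 * reserves argc*16 bytes of stack before calling its converter. */
void cvt_args(int argc, char *const *argv, struct slice *out)
{
    int i;

    assert(argc >= 0 && argv != NULL && out != NULL);
    for (i = 0; i < argc; i++) {
        out[i].data = argv[i];          /* share the bytes, do not copy */
        out[i].len = strlen(argv[i]);   /* the length replaces the NUL */
    }
}
```

The strings themselves are not copied. Each slice points at the bytes the kernel already placed on the stack, so the only new storage is the array of pointer and length pairs.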
The System Calls
The Libraries
Debugging
The Code