Threading APIs Part 1: Thread Spawning
Threads are a common part of programming today, and while pthreads have won on Unix as the API that most userspace code interacts with, the mechanisms for implementing them are quite different across various OSes.
As part of developing the Myrddin standard library, I've been implementing threads from the ground up, on a number of operating systems. The different choices made by these systems have been fascinating.
The code for threading is usable, but is in a state of flux.
Thread Creation
Thread creation varies widely between the systems that I've worked with. Linux has a general clone system call, likely inspired by Plan 9's rfork, which does much the same. The BSDs and OSX tend toward dedicated thread-spawning calls instead of general functions.
Plan 9
Plan 9, unsurprisingly, is the simplest and most elegant of the systems I support.
Threads are created using the rfork() system call, which is the Plan 9 replacement for fork. The rfork() call allows the caller to specify what is shared between the parent and child processes. The stack is mapped at a fixed location, and may not be shared between the parent and child process. This makes spawning and thread-local variables especially simple, but means that pointers to variables on the stack may not be shared between threads.
The code for creating a thread is simply:
const spawn = {fn
	match sys.rfork(sys.Rfproc | sys.Rfmem | sys.Rfnowait)
	| 0:
		fn()
		std.exit(0)
	| -1:	-> `std.Err "unable to spawn thread"
	| thr:	-> `std.Ok (thr : tid)
	;;
}
There's nothing more to it: call rfork to create a process, and then call the function that was passed. Unlike all of the other operating systems, there is no assembly needed.
Linux
Threads on Linux are created with the clone(2) system call. Clone is very similar to fork(); however, it lets the caller specify which resources are shared between the parent and the process that is created. This is used for everything from creating containers to creating threads. For a container, more or less nothing is shared. For a thread, more or less everything is shared.
The kernel call for clone looks similar to fork, but with a few parameters to control the options:
long clone(uint64_t flags, void *child_stack,
	int *ptid, uint64_t newtls,
	int *ctid);
However, because the stack is not set up when this is called, this function must be invoked from assembly. Until the stack is initialized, any code that uses memory will segfault. This is why the C library provides a function-based wrapper, as does the Myrddin system call library:
extern const fnclone : ( \
	flags : cloneopt, stk : byte#, \
	ptid : pid#, tls : byte#, \
	ctid : pid#, ptreg : byte#, \
	fn : void# \
	-> pid)
It would also be possible to set up the stack manually so that the data needed to call the function is in place ahead of time, but that's ugly code that I decided to avoid. When creating a thread, the stack must be allocated by the creator of the thread, a theme that is repeated on most operating systems.
The code used in Myrddin for spawning a thread is:
const spawnstk = {fn, sz
	var stk : byte#, tid, ctid, ret
	var szp, fp, tos

	stk = getstk(sz)
	if stk == sys.Mapbad
		-> `std.Err `Estk
	;;
	/* assumption: initstack returns the top of the prepared stack */
	tos = initstack(stk, fn, sz)
	ret = sys.fnclone(Thrflag, \
		(tos : byte#), \
		&tid, (0 : byte#), \
		&ctid, (0 : byte#), \
		(startthread : void#))
	if ret < 0
		-> `std.Err `Espawn
	;;
	-> `std.Ok (ret : tid)
}
FreeBSD
Unlike Linux, FreeBSD uses a system call dedicated to creating threads, and which does nothing else. This call, unsurprisingly, is called thr_new. The call takes a struct describing the thread parameters, and the size of the parameter struct. From reading the code, the size of the struct is used for versioning reasons.
struct thr_param {
	void (*start_func)(void *);
	void *arg;
	char *stack_base;
	size_t stack_size;
	char *tls_base;
	size_t tls_size;
	long *child_tid;
	long *parent_tid;
	int flags;
	struct rtprio *rtp;
};

int thr_new(struct thr_param *param, int param_size);
Because this function is specific to thread creation, it is fairly simple to use correctly. The relevant entries in the creation parameters are initialized to what you want, and then you call the system call. Note that while the stack is passed as a (ptr, size) pair, the stack should be pointing to the top of the stack.
OpenBSD
Similar to FreeBSD, OpenBSD has a system call specifically to create threads. Similar to Linux and Plan 9, the API is based around fork instead of passing in a bunch of parameters and letting the kernel start the thread.
The API is below:
struct __tfork {
	void *tf_tcb;	/* TCB address for new thread */
	pid_t *tf_tid;	/* where to write child's thread ID */
	void *tf_stack;	/* stack address for new thread */
};

pid_t __tfork(const struct __tfork *params, size_t psize);
However, similar to clone, because the stack is not set up when the thread is created, the system call is only usable from assembly, so it must be called through an assembly stub. One is provided by OpenBSD's libc, in the form of __tfork_thread().
OSX
OSX is hands down the most pointlessly quirky of the operating systems used. This is also a running theme. The OSX kernel has a system call named bsdthread_register, which must be called before any threads are spawned.
The parameters for this aren't entirely
Windows
Fucked if I know. I've written more code for Plan 9 than for Windows. Anyone want to write something here?