Eigenstate : Pushfilter

Stupid Unix Tricks: Implementing Pushfilter()

Pushfilter is a function I cooked up that installs a program as a filter on any file descriptor. For example:

pushfilter(STDIN_FILENO, "tr", "[a-z]", "[A-Z]", NULL);

will install tr [a-z] [A-Z] on to stdin as a filter, making any subsequent reads from stdin get uppercased by the tr program. I'm not sure yet if this is a useless parlor trick, or a generally useful tool. Regardless, it's a good demonstration of process and fd manipulation on Unix.

The Code

The code lives on eigenstate git, and is mirrored on GitHub.

The API:

#include <unistd.h>
#include <stdarg.h>

pid_t pushfilter(int fd, char *cmd, ...);
pid_t pushfilterv(int fd, char *cmd, va_list ap);
pid_t pushfilterl(int fd, char **cmd);

Pushfilter() applies the command cmd ... to the file descriptor fd. The command is spawned such that it receives the original contents of fd on stdin, and is expected to write the desired output to stdout. This is generally seamless, although behind the scenes the file descriptor is replaced with the read end of a pipe, so a descriptor that was seekable before the call will not be afterwards.
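To make the seekability caveat concrete, here is a minimal sketch of a check a caller could run before and after installing a filter. The helper name isseekable is hypothetical and not part of the pushfilter API; it relies only on the standard behaviour that lseek() fails on pipes.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd supports lseek(), 0 otherwise.
 * Pipes, FIFOs and sockets fail with ESPIPE, so an fd that has had a
 * filter pushed onto it is expected to report 0 here. */
int
isseekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```

A regular file should report 1 before the call to pushfilter() and 0 afterwards, since the descriptor then refers to a pipe.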

The pushfilterv and pushfilterl variants have identical semantics, but accept the command parameter as a varargs list and a null-terminated array, respectively, for convenience.
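As a rough illustration of how a varargs list can be turned into the null-terminated array that pushfilterl expects, here is a sketch. The names collectargs and collectargsl are hypothetical and are not taken from the actual source, which may do this differently.

```c
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical helper: copies cmd plus the NULL-terminated va_list into
 * a freshly allocated argv-style array. The caller frees the array (not
 * the strings). Returns NULL on allocation failure. */
char **
collectargs(char *cmd, va_list ap)
{
    size_t n = 0, cap = 8;
    char **argv, **tmp;
    char *arg;

    argv = malloc(cap * sizeof *argv);
    if (argv == NULL)
        return NULL;
    argv[n++] = cmd;
    do {
        arg = va_arg(ap, char *);
        /* grow before storing so the terminating NULL always fits */
        if (n == cap) {
            cap *= 2;
            tmp = realloc(argv, cap * sizeof *argv);
            if (tmp == NULL) {
                free(argv);
                return NULL;
            }
            argv = tmp;
        }
        argv[n++] = arg;
    } while (arg != NULL);
    return argv;
}

/* Variadic front end, shaped like pushfilter(): wraps the va_list form. */
char **
collectargsl(char *cmd, ...)
{
    va_list ap;
    char **argv;

    va_start(ap, cmd);
    argv = collectargs(cmd, ap);
    va_end(ap);
    return argv;
}
```

A wrapper built this way would pass the resulting array to pushfilterl and free it afterwards. When calling a variadic function like this, writing the sentinel as (char *)NULL rather than bare NULL is the portable choice, since NULL may be defined as a plain integer 0.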

The Implementation:

There are a couple of trivial wrapper functions to allow for conveniently calling this code, but the interesting part of the implementation is below:

pid_t
pushfilterl(int fd, char **cmd)
{
    int outfd[2];
    pid_t pid;

    if (pipe(outfd) != 0)
        return -1;
    pid = fork();
    /* error */
    if (pid == -1){
        /* don't leak the pipe if fork fails */
        close(outfd[0]);
        close(outfd[1]);
        return -1;
    /* child */
    } else if (pid == 0) {
        /* input becomes child stdin */
        if (dup2(fd, 0) == -1)
            _exit(1);
        /* don't double close stdin */
        if (fd != 0)
            close(fd);

        /* output becomes child stdout */
        if (dup2(outfd[1], 1) == -1)
            _exit(1);

        /* close unused fds */
        close(outfd[0]);
        close(outfd[1]);

        execvp(cmd[0], cmd);
        /* if exec returns, we have an error; use _exit so the child
         * doesn't flush stdio buffers inherited from the parent */
        _exit(1);
    /* parent */
    } else {
        if (dup2(outfd[0], fd) == -1) {
            /* don't leak the pipe on failure */
            close(outfd[0]);
            close(outfd[1]);
            return -1;
        }
        close(outfd[0]);
        close(outfd[1]);
    }
    return pid;
}

The way this code works is fairly straightforward. First, a pipe is created to replace the file descriptor that was passed in. Then a child process is forked. The child dups the original file descriptor onto its stdin and closes the now-redundant original. A bit of care is needed here, as it's perfectly valid to pass stdin itself. In that case dup2 is a no-op, and closing fd would close the child's only reference to its input.

A similar process is applied to stdout and the write end of the pipe, which gives us a process hooked up like so:

          +-------------------+
          |                   |
-- fd --> |[0]    Filter   [1]| -- fd' -->
          |                   |
          +-------------------+

The parent process now needs to clean up a little bit. It moves the read end of the pipe over to the fd that was passed in, replacing it in place. Finally, it closes the unused fds that came with the pipe and returns the child pid to the caller.

A similar process could be applied to filtering an output file descriptor, and in fact most of the code could be shared. Doing this is left as an exercise for the reader.