Eigenstate : Libds

Introduction to Libds

libds is a simple, small data structure library with a number of commonly used data structures that I have found useful when writing code in the past. It includes:

While it is not a thread safe library, all APIs are reentrant, which means that if as long as accesses to the data structures are not shared across threads, it is safe for multiple threads to access the APIs concurrently.

Getting The Source

The code lives in Eigenstate's Git repository

git clone git://git.eigenstate.org/git/ori/libds.git

Which is browsable online at http://git.eigenstate.org/ori/libds.git

In order to install, you use the same configure scripts as Automake would want, although this code uses a simple custom makefile.

cd libds/
./configure
make
make install 

It uses the standard pkgconfig machinery to make it easy to integrate with most build systems. The pkg-config name is libds-1.0. In the most basic example, building from the command line should be as simple as this:

cc -o my-binary my-source.c `pkg-config --cflags --libs libds-1.0` 

Slab Allocator

Source: ds-alloc.c

typedef struct _DSAlloc DSAlloc;

DSAlloc *ds_allocatornew         (size_t sz);
void       *ds_alloc             (DSAlloc *a);
void       *ds_zalloc            (DSAlloc *a);
void        ds_free              (DSAlloc *a, void *dat);

DSAlloc *ds_allocatornew(size_t sz) will create a new allocator, that hands out objects of size sz. All objects allocated from within an allocator are of the same size, which allows the allocations to happen with very little waste, very little per object overhead, and a high resistance to fragmentation. This means, however, that you need a separate allocator for each object size.

Benchmarking I did in 2010 showed that the performance for this allocator was roughly double that of glibc's malloc() for small object sizes, with far less tracking overhead. However, this is due to the specificity of the API, and not due to any magical optimization that is done in this library.

void *ds_alloc(DSAlloc *a) will create a new value from the allocator a, much like malloc. The size is determined by the allocator that is passed to the function.

void *ds_zalloc(DSAlloc *a) is similar to dsalloc, except that it returns a zeroed structure. void dsfree(DSAlloc *a, void *dat) frees a value that was allocated from a. The allocator that you free on must be the one that the allocation initially came from.

Example

#include <ds.h>
struct Stuff {
    int value;
    int another_value;
};

int main(int int argc, char **argv)
{
    DSAlloc *listnode_alloc;
    ListNode *head, *n;
    size_t i;

    head = NULL;
    listnode_alloc = ds_allocatornew(sizeof(ListNode));
    for (i = 0; i < 1000; i++) {
        n = ds_alloc(listnode_alloc);
        ds_free(listnode_alloc);
    }
}

Bitsets

Source: ds-bitset.c

typedef struct DSBitset DSBitset;

/* creation */
DSBitset *ds_bsnew      (void);
void      ds_bsfree     (DSBitset *bs);
DSBitset *ds_bsdup      (DSBitset *bs);

/* manipulation */
DSBitset *ds_bsclear    (DSBitset *bs);
size_t    ds_bscount    (DSBitset *bs);
int       ds_bsiter     (DSBitset *bs, size_t *elt);

/* insertion */
void      ds_bsput      (DSBitset *bs, size_t elt)
void      ds_bsdel      (DSBitset *bs, size_t elt)
int       ds_bshas      (DSBitset *bs, size_t elt);k

/* set operations */
void      ds_bsdiff     (DSBitset *a,  DSBitset *b);
void      ds_bsintersect(DSBitset *a, DSBitset *b)
void      ds_bsunion    (DSBitset *a,  DSBitset *b);

/* predicates */
int       ds_bseq       (DSBitset *a, DSBitset *b)
int       ds_bsissubset (DSBitset *set, DSBitset *sub)

ds_bsnew, ds_bsdup, and ds_bsfree will respectively create a new bit set, duplicate an existing bit set, and free the storage associated with a bitset

ds_bsclear will take an existing bitset, and clear it's contents.

ds_bscount will count the number of elements that exist in the bitset. This is an O(n) operation.

ds_bsiter is a slightly tricky function to iterate over the contents of a bitset. It returns true immediately if 'elt' is in the bitset, and does not modify the value. Otherwise it seeks forward to the next value held in the bitset and stores it in elt. If there are no more values, it returns false to stop iteration. This means that you need to increment elt every time you iterate through the loop. Example usage is below:

for (size_t i = 0; bsiter(set, &i); i++)
    use(i); 

ds_bsput and ds_bsdel respectively insert or remove elt from the set bs

`ds_bshas returns 1 if the element elt is in the set bs, or 0 if it is not.

dsbsdiff, dsbsintersect, and ds_bsunion respectively find the set difference, intersection, or union between the two bitsets a and b

dsbseq and dsbsissubset return 1 if the sets are equal or subsets, respectively, and 0 if they are not.

Example

#include <ds.h>
#include <assert.h>
int main(int argc, char **argv)
{
    DSBitset *bs, *other;

    bs = ds_bsnew()
    ds_bsput(bs, 123)
    assert(ds_bshas(bs, 123))
    other = ds_bsdup(bs)
    ds_bsput(bs, 234)
    ds_bsdiff(bs, other);
} 

Hash Tables

source: ds-htab.c

typedef struct _DSHtab DSHtab;

DSHtab      *ds_hnew         (DSHashFn hash, DSCmpFn cmp_key);
DSHtab      *ds_hnewhold     (DSHashFn hash, DSCmpFn cmp_key,
                              DSHoldFn holdk, DSFreeFn freek,
                              DSHoldFn holdv, DSFreeFn freev);
void         ds_hfree        (DSHtab *ht);

void         ds_hput         (DSHtab *ht, void *key, void *data);
void         ds_hdel         (DSHtab *ht, void *key);
void         ds_hset         (DSHtab *ht, void *key, void *data);
void        *ds_hget         (DSHtab *ht, void *key);
int          ds_hhas         (DSHtab *ht, void *key);


DSHtab      *ds_hclone       (DSHtab *ht);
void       **ds_hkeys         (DSHtab *ht);
void       **ds_hvals         (DSHtab *ht);
int          ds_hcount        (DSHtab *ht); 
Documentation for each function

DSHtab      *ds_hnew         (DSHashFn hash, DSCmpFn cmp_key);
DSHtab      *ds_hnewhold     (DSHashFn hash, DSCmpFn cmp_key,
                              DSHoldFn holdk, DSFreeFn freek,
                              DSHoldFn holdv, DSFreeFn freev);
void         ds_hfree        (DSHtab *ht);

These functions create a new hash table. Both variants take both a hash function 'hash' and a comparison function 'cmpkey'. These functions are called to hash and compare the functions. A number of useful hash functions such as dsstrhash, ds_ptrhash, and so on are included in the library.

The ds_hnewhold() hash table constructor takes an additional four function pointers: holdk, freek, holdv, freev. These function pointers are used for memory management, holding and releasing a value. They may be called multiple times per value, so they should be implemented with something along the lines of refcounting.

dshnew() will not call any retain or release functions, and is equivalent to dshnewhold(cmp, eq, NULL, NULL, NULL, NULL)

ds_hfree() releases the storage associated with the hash table, and will call the release functions on each data element still in the hash table, if provided.

ds_hput, ds_hdel, ds_hset, ds_hget, and ds_hhas insert, remove, and query the hash table for elements. If hold and free functions are defined, these will be called on insertion and removal of elements from the hash table. ds_hput assumes that the entry being inserted is not already in the table. use ds_hset if you want to insert duplicate elements.

ds_hclone will duplicate a hash table, calling the hold functions on the data elements if provided.

ds_hkeys and ds_hvals will return a copy of the keys and values, respectively, in the hash table. The array returned is allocated with 'malloc()' and must be freed.

ds_hcount counts the number of entries in the hash table.

Utility Hash Functions

uint32_t ds_strhash     (void *data);
int      ds_streq       (void *a, void *b);
uint32_t ds_ptrhash     (void *ptr);
int      ds_ptreq       (void *a, void *b);
uint32_t ds_directhash  (void *p);
int      ds_directeq    (void *a, void *b); 

These functions implement hashes for common data types, suitable for use in the hash table implementation above.

Linked Lists

typedef struct _DSList DSList;

struct _DSList {
    DSList *prev;
    DSList *next;
    void  *data;
};

typedef void  (*DSListMapFn)(void *cont, void *data);
typedef void *(*DSListEachFn)(void *cont, void *data);

DSList *ds_lmap              (DSList *head, DSListMapFn fn, void *data);
void    ds_lforeach          (DSList *head, DSListIterFn fn, void *data);

DSList  *ds_linsert           (DSList *head, void *data);
DSList  *ds_lpreinsert        (DSList *head, void *data);
DSList  *ds_lprepend          (DSList *head, void *data);
DSList  *ds_lappend           (DSList *head, void *data);
DSList  *ds_ldup              (DSList *head);

DSList *ds_ljoin           (DSList *a, DSList *b);
DSList *ds_ltappend        (DSList *head, DSList **tail, void *data);

DSList *ds_lcmpsearch      (DSList *haystack, void *needle, DSCmpFn cmp);
DSList *ds_lsearch         (DSList *haystack, void *needle);
int     ds_llen            (DSList *l);
DSList *ds_ltail           (DSList *l);

DSList *ds_ldel            (DSList *l, void *dat);
DSList *ds_lndel           (DSList *l, DSList *n);
DSList *ds_lndel_full      (DSList *l, DSList *n, DSFreeFn fnc);

void  ds_lfree             (DSList *l);

void  ds_lfree_full        (DSList       *l, DSFreeFn    fnc);

The data structure 'DSList' is publically exposed, and it's members may be touched directly for convenience.

The empty list is represented by NULL, and any place a list is passed, NULL is a valid value.

ds_lmap and ds_lforeach and the associated function pointer types are used for iterating and mapping over the list elements. For the most part, I find that it's more idiomatic in C to just loop over the lists, though.

ds_linsert and ds_lpreinsert insert an element before and after the node 'head', respectively. Note that the node 'elt' is not necessarily the head of the list, and you may insert into the list at any point with these functions. These functions return the new head of the sublist produced.

ds_lprepend and ds_lappend are similar in concept to their friends dslinsert() and dslprepend() above, however, they will seek to the start or end of the list respectively before doing the insert. They return the new head of the list.

ds_ltappend appends a value into the list 'head'. If 'tail' is passed and is non-null, this function assumes it is the actual tail of the list, and inserts after it. After inserting, it updates the tail pointer to the newly appended node. This allows for O(1) list appends. If tail is null, then this function seeks to the end of the list in O(n) time, and appends as normal.

ds_ldup will duplicate a list.

ds_ljoin will append list 'b' to the end of list 'a'. It does not copy the nodes in 'b', not does it seek to the head of it. It does, however, seek to the end of list 'a'.

ds_lcmpsearch and ds_lsearch iterate through the entire list searching for a value. dslsearch() searches for the node containing a pointer by value. dslcmpsearch() takes a function pointer which it uses to compare the values.

ds_llen counts the number of nodes in the list in O(n) time.

ds_ltail finds the tail of the list 'l' in O(n) time.

ds_ldel, ds_lndel, and ds_lndel_full remove links from lists. dslndel() will simply unlink the node, splicing the list back around it. dsldel() will search the list by pointer value, similar to dslsearch(), and when it finds the node it will unlink it, as in dslndel.

. ds_lndel_full is similar to ds_lndel, however, it takes a function to call over the data element in order to free it.

ds_lfree and ds_lfree_full will free a full list. Similar to dslndel and dslndelfull, dslfree will free only the node data, while dslfreefull will take a free function to call over each node in turn.

Malloc utility functions

void *ds_zmalloc(size_t sz);
void *ds_xmalloc(size_t sz);
void *ds_zrealloc(void *mem, size_t oldsz, size_t sz);
void *ds_xrealloc(void *mem, size_t sz); 

These are utility wrappers for malloc(), calloc(), and realloc() that will abort on out of memory. Since most programs can't actually do anything sane on out of memory, and modern OSes overcommit anyways, making testing the failure paths nearly impossible, this is usually the right thing to do.

Memory allocated with these functions comes from malloc internally, and should be freed with free()

dsxmalloc() is similar to malloc: you give it a size, it gives you a chunk of memory. dszmalloc is similar to ds_malloc, only it clears the memory.

dsxrealloc is similar to realloc: you give it a valid pointer (or NULL), and it will return a resized chunk of memory, or NULL if the size of memory is zero. In other words, dsxrealloc(NULL, size) is equivalent to dsxalloc(size), and dsxrealloc(ptr, NULL) is equivalent to free(ptr).

dszrealloc() is similar to dsxrealloc(), only the values between oldsz and sz are all set to zero.