Patchwork [03,of,22] radixlink: add C implementation

login
register
mail settings
Submitter Jun Wu
Date June 4, 2017, 11:59 p.m.
Message ID <d492628229c58f8417a8.1496620755@x1c>
Download mbox | patch
Permalink /patch/21180/
State Accepted
Headers show

Comments

Jun Wu - June 4, 2017, 11:59 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1496550656 25200
#      Sat Jun 03 21:30:56 2017 -0700
# Node ID d492628229c58f8417a8b5925a614e26a16465af
# Parent  e8e8d713e4b774f6894ee65723e7fdc12bf8a101
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r d492628229c5
radixlink: add C implementation

This patch implements the radixlink class in C.

The core algorithm and data structure was written in a separated pure C file
so it could be reused outside Python environment.

Partial examples of reading and writing in pure C:

    radixlink_buffer_t index = readfile("prefix");
    radixlink_buffer_t link = readfile("prefix.l");

    /* read */
    uint32_t loff, v;
    radixlink_buffer_t key = { "key", 3 };
    if (radixlink_index_find(&index, &key, &loff) != 0)
        abort();
    while (loff) {
        if (radixlink_link_read(&link, &loff, &v) != 0)
            abort();
        printf("  value: %u\n", (unsigned) v);
    }

    /* insert (key, value) */
    static void resize(radixlink_buffer_t *buf, uint32_t newsize) {
        void *p = realloc(buf->buf, newsize);
        if (p) { buf->buf = p; buf->size = newsize; }
    }
    uint32_t loff, ioff;
    if (radixlink_index_findorcreate(index, key, &loff, &ioff, resize) != 0)
        abort();
    if (radixlink_link_append(link, &loff, value, resize) != 0)
        abort();
    if (radixlink_index_writelink(index, ioff, loff) != 0)
        abort();

"valgrind --tool=memcheck --leak-check=yes python2 test-radixlink.py" does
not report any possible leak with stack containing radixlink.c [1].

[1]: python2 was a debug build of 2.7.13: --with-pydebug --without-pymalloc
     --with-valgrind
Gregory Szorc - June 6, 2017, 5:58 a.m.
On Sun, Jun 4, 2017 at 4:59 PM, Jun Wu <quark@fb.com> wrote:

> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1496550656 25200
> #      Sat Jun 03 21:30:56 2017 -0700
> # Node ID d492628229c58f8417a8b5925a614e26a16465af
> # Parent  e8e8d713e4b774f6894ee65723e7fdc12bf8a101
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> d492628229c5
> radixlink: add C implementation
>

Only a very quick review. Nothing glaringly wrong from a CPython
perspective. Although a more thorough examination is needed.

On reading this code, I was once again reminded of Mercurial's style of
omitting braces from one-line conditional blocks. I detest this style. See
Apple's "goto fail" bug for why. I would prefer we not continue that
practice in new code.

It is somewhat weird seeing the C implementation so early in the series.
I'd favor deferring this patch then landing the C implementation after the
feature is implemented. That way we can introduce the C code along with
perf numbers and more robust test coverage. But if it's ready, it's ready:
I don't want to cause too much work for you to refactor.


>
> This patch implements the radixlink class in C.
>
> The core algorithm and data structure was written in a separated pure C
> file
> so it could be reused outside Python environment.
>
> Partial examples of reading and writing in pure C:
>
>     radixlink_buffer_t index = readfile("prefix");
>     radixlink_buffer_t link = readfile("prefix.l");
>
>     /* read */
>     uint32_t loff, v;
>     radixlink_buffer_t key = { "key", 3 };
>     if (radixlink_index_find(&index, &key, &loff) != 0)
>         abort();
>     while (loff) {
>         if (radixlink_link_read(&link, &loff, &v) != 0)
>             abort();
>         printf("  value: %u\n", (unsigned) v);
>     }
>
>     /* insert (key, value) */
>     static void resize(radixlink_buffer_t *buf, uint32_t newsize) {
>         void *p = realloc(buf->buf, newsize);
>         if (p) { buf->buf = p; buf->size = newsize; }
>     }
>     uint32_t loff, ioff;
>     if (radixlink_index_findorcreate(index, key, &loff, &ioff, resize) !=
> 0)
>         abort();
>     if (radixlink_link_append(link, &loff, value, resize) != 0)
>         abort();
>     if (radixlink_index_writelink(index, ioff, loff) != 0)
>         abort();
>
> "valgrind --tool=memcheck --leak-check=yes python2 test-radixlink.py" does
> not report any possible leak with stack containing radixlink.c [1].
>
> [1]: python2 was a debug build of 2.7.13: --with-pydebug --without-pymalloc
>      --with-valgrind
>
> diff --git a/mercurial/cext/radixlink.c b/mercurial/cext/radixlink.c
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/cext/radixlink.c
> @@ -0,0 +1,314 @@
> +/* radixlink.c - Python wrapper for C radixlink implementation
> + *
> + * Copyright 2017 Facebook, Inc.
> + *
> + * This software may be used and distributed according to the terms of the
> + * GNU General Public License version 2 or any later version. */
> +
> +#include <Python.h>
> +#include <stdio.h>
> +
> +#include "radixlink.h"
> +
> +typedef struct {
> +       PyObject_HEAD
> +       radixlink_buffer_t data;
> +} radixlinkObject;
> +
> +static const uint32_t MIN_DATA_SIZE = 76; /* header (4B) + radix-entry
> (72B) */
> +
> +static int radixlink_contains(radixlinkObject *self, PyObject *keyobj)
> +{
> +       radixlink_buffer_t k;
> +       Py_ssize_t klen = 0;
> +       uint32_t loff = 0;
> +
> +       if (PyBytes_AsStringAndSize(keyobj, (char **)&k.buf, &klen) == -1)
> +               return -1;
> +       k.size = (uint32_t)klen;
> +       if (radixlink_index_find(&self->data, &k, &loff) == -1)
> +               return -1;
> +       return loff > 0;
> +}
> +
> +static PyObject *radixlink_getitem(radixlinkObject *self, PyObject
> *keyobj)
> +{
> +       radixlink_buffer_t k;
> +       Py_ssize_t klen = 0;
> +       PyObject *pyvalues = NULL;
> +       uint32_t loff = 0;
> +
> +       if (PyBytes_AsStringAndSize(keyobj, (char **)&k.buf, &klen) == -1)
> +               return NULL;
> +       k.size = (uint32_t)klen;
> +       if (radixlink_index_find(&self->data, &k, &loff) == -1)
> +               return NULL;
> +       if (loff == 0)
> +               return PyErr_Format(PyExc_KeyError, "No such entry");
> +
> +       pyvalues = PyList_New(0);
>

Appending lists can be relatively expensive. Do you think it is worth
scanning the linked list so we can pre-allocate enough slots in the PyList?


> +       if (!pyvalues)
> +               goto bail;
>

This can just return NULL. Then you can change bail to Py_DECREF since
pyvalues is guaranteed to be !NULL.


> +
> +       while (loff) {
> +               PyObject *pyvalue;
> +               uint32_t value;
> +               int r;
> +               if (radixlink_link_read(&self->data, &loff, &value) != 0)
> +                       goto bail;
> +               pyvalue = PyInt_FromLong((long)value);
> +               if (!pyvalue)
> +                       goto bail;
> +               r = PyList_Append(pyvalues, pyvalue);
> +               Py_DECREF(pyvalue);
> +               if (r == -1)
> +                       goto bail;
> +       }
> +       return pyvalues;
> +bail:
> +       Py_XDECREF(pyvalues);
> +       return NULL;
> +}
> +
> +static void resize(radixlink_buffer_t *buf, uint32_t newsize) {
> +       void *p = PyMem_Realloc(buf->buf, (size_t)newsize);
> +       if (p) {
> +               buf->buf = p;
> +               buf->size = newsize;
> +       }
>

This function nor callers appear to handle the case where PyMem_Realloc
fails.


> +}
> +
> +static PyObject *radixlink_insert(radixlinkObject *self, PyObject *args)
> +{
> +       radixlink_buffer_t k;
> +       Py_ssize_t klen = 0;
> +       unsigned int v;
> +       uint32_t loff, ioff;
> +
> +       if (!PyArg_ParseTuple(args, "s#I", &k.buf, &klen, &v))
>

You'll want to conditionalize this to use y# on Python 3 to limit to bytes
types.


> +               return NULL;
> +       k.size = (uint32_t)klen;
> +
> +       if (radixlink_index_findorcreate(&self->data, &k, &loff, &ioff,
> resize)
> +                       != 0)
> +               return NULL;
> +       if (radixlink_link_append(&self->data, &loff, (uint32_t)v,
> resize) != 0)
> +               return NULL;
> +       if (radixlink_index_writelink(&self->data, ioff, loff) != 0)
> +               return NULL;
> +       Py_RETURN_NONE;
> +}
> +
> +static PyObject *radixlink_getsourceoftruthsize(radixlinkObject *self,
> +               void *context)
> +{
> +       uint32_t v;
> +       (void)context;
>

What's this argument for? Future patch?


> +       if (self->data.size < 4)
> +               return NULL;
> +       v = getbe32((const char *)self->data.buf);
> +       return PyInt_FromLong((long)v);
> +}
> +
> +static int radixlink_setsourceoftruthsize(radixlinkObject *self,
> +               PyObject *value, void *context)
> +{
> +       uint32_t origv;
> +       long v;
> +       (void)context;
> +       v = PyInt_AsLong(value);
> +       if (self->data.size < 4 || v < 0)
> +               return -1;
> +       origv = getbe32((const char *)self->data.buf);
> +       if (origv != (uint32_t)v)
> +               putbe32((uint32_t)v, (char *)self->data.buf);
> +       return 0;
> +}
> +
> +static PyObject *radixlink_getdata(radixlinkObject *self, void *context)
> +{
> +       (void)context;
> +       return PyBytes_FromStringAndSize((const char *)self->data.buf,
> +                       (Py_ssize_t)self->data.size);
> +}
> +static int radixlink_setdata(radixlinkObject *self, PyObject *value,
> +               void *context)
> +{
> +       radixlink_buffer_t buf = { NULL, 0 };
> +       Py_buffer view = { NULL };
> +       (void)context;
> +       if (value == NULL || PyObject_Not(value)) {
> +               /* minimal empty buffer */
> +               view.len = MIN_DATA_SIZE;
> +               view.buf = NULL;
> +       } else {
> +               /* copy from a buffer object */
> +               if (!PyObject_CheckBuffer(value)) {
> +                       PyErr_SetString(PyExc_TypeError, "need a buffer");
> +                       goto bail;
> +               }
> +               if (PyObject_GetBuffer(value, &view, PyBUF_SIMPLE) != 0)
> +                       goto bail;
> +       }
> +       buf.buf = PyMem_Malloc((size_t)view.len);
> +       if (!buf.buf) {
> +               PyErr_SetNone(PyExc_MemoryError);
> +               goto bail;
> +       }
> +       if (view.buf) {
> +               memcpy(buf.buf, view.buf, (size_t)view.len);
> +               PyBuffer_Release(&view);
> +       } else {
> +               memset(buf.buf, 0, (size_t)view.len);
> +       }
> +       buf.size = (uint32_t)view.len;
> +       PyMem_Free(self->data.buf);
> +       self->data = buf;
> +       return 0;
> +bail:
> +       if (view.buf)
> +               PyBuffer_Release(&view);
> +       return -1;
> +
> +}
> +
> +static Py_ssize_t radixlink_length(radixlinkObject *self)
> +{
> +       return (Py_ssize_t)self->data.size;
> +}
> +
> +static int radixlink_init(radixlinkObject *self, PyObject *args)
> +{
> +       PyObject *data = NULL;
> +       self->data.buf = NULL;
> +       self->data.size = 0;
> +
> +       if (!PyArg_ParseTuple(args, "|O", &data))
> +               return -1;
> +
> +       return radixlink_setdata(self, data, NULL);
> +}
> +
> +static void radixlink_dealloc(radixlinkObject *self)
> +{
> +       PyMem_Free(self->data.buf);
> +       self->ob_type->tp_free((PyObject *)self);
>

Does Py_TPFLAGS_BASETYPE need to be set? If not, don't bother and call
PyObject_Del() instead.


> +}
> +
> +static PySequenceMethods radixlink_sequence_methods = {
> +       (lenfunc)radixlink_length,      /* sq_length */
> +       0,                              /* sq_concat */
> +       0,                              /* sq_repeat */
> +       0,                              /* sq_item */
> +       0,                              /* sq_slice */
> +       0,                              /* sq_ass_item */
> +       0,                              /* sq_ass_slice */
> +       (objobjproc)radixlink_contains, /* sq_contains */
> +       0,                              /* sq_inplace_concat */
> +       0,                              /* sq_inplace_repeat */
> +};
> +
> +static PyMappingMethods radixlink_mapping_methods = {
> +       (lenfunc)radixlink_length,       /* mp_length */
> +       (binaryfunc)radixlink_getitem,   /* mp_subscript */
> +       NULL,                            /* mp_ass_subscript */
> +};
> +
> +static PyMethodDef radixlink_methods[] = {
> +       {"insert", (PyCFunction)radixlink_insert, METH_VARARGS,
> +        "insert value (uint32) at the beginning of the list for given
> key\n"},
> +       {NULL},
> +};
> +
> +static PyGetSetDef radixlink_getset[] = {
> +       {"sourceoftruthsize", (getter)radixlink_getsourceoftruthsize,
> +        (setter)radixlink_setsourceoftruthsize, "sourceoftruthsize",
> NULL},
> +       {"data", (getter)radixlink_getdata, (setter)radixlink_setdata,
> "data",
> +        NULL},
> +       {NULL},
> +};
> +
> +static PyTypeObject radixlinkType = {
> +       PyObject_HEAD_INIT(NULL)
> +       0,                                        /* ob_size */
> +       "radixlink.radixlink",                    /* tp_name */
>

"mercurial.radixlink.radixlink" (yes, we make this mistake elsewhere).


> +       sizeof(radixlinkObject),                  /* tp_basicsize */
> +       0,                                        /* tp_itemsize */
> +       (destructor)radixlink_dealloc,            /* tp_dealloc */
> +       0,                                        /* tp_print */
> +       0,                                        /* tp_getattr */
> +       0,                                        /* tp_setattr */
> +       0,                                        /* tp_compare */
> +       0,                                        /* tp_repr */
> +       0,                                        /* tp_as_number */
> +       &radixlink_sequence_methods,              /* tp_as_sequence */
> +       &radixlink_mapping_methods,               /* tp_as_mapping */
> +       0,                                        /* tp_hash */
> +       0,                                        /* tp_call */
> +       0,                                        /* tp_str */
> +       0,                                        /* tp_getattro */
> +       0,                                        /* tp_setattro */
> +       0,                                        /* tp_as_buffer */
> +       Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
> +       "A radixlink object",                     /* tp_doc */
> +       0,                                        /* tp_traverse */
> +       0,                                        /* tp_clear */
> +       0,                                        /* tp_richcompare */
> +       0,                                        /* tp_weaklistoffset */
> +       0,                                        /* tp_iter */
> +       0,                                        /* tp_iternext */
> +       radixlink_methods,                        /* tp_methods */
> +       0,                                        /* tp_members */
> +       radixlink_getset,                         /* tp_getset */
> +       0,                                        /* tp_base */
> +       0,                                        /* tp_dict */
> +       0,                                        /* tp_descr_get */
> +       0,                                        /* tp_descr_set */
> +       0,                                        /* tp_dictoffset */
> +       (initproc)radixlink_init,                 /* tp_init */
> +       0,                                        /* tp_alloc */
> +};
> +
> +static PyMethodDef methods[] = {{NULL}};
> +
> +static char radixlink_doc[] = ("multimap<bytes, uint32> based on radix
> tree "
> +               "and linked list\n");
> +
> +static const int version = 1;
> +
> +static int postinit(PyObject *m)
> +{
> +       Py_INCREF(&radixlinkType);
> +       radixlinkType.tp_new = PyType_GenericNew;
>

Pretty sure you can assign this directly in PyTypeObject. The tp_new slot
is immediately after tp_alloc. I'm not sure why you would defer doing this,
but CPython itself is inconsistent. So maybe I'm missing something.


> +       if (PyType_Ready(&radixlinkType) < 0)
> +               return -1;
> +       if (PyModule_AddObject(m, "radixlink", (PyObject *)&radixlinkType)
> != 0)
> +               return -1;
> +       if (PyModule_AddIntConstant(m, "version", version) != 0)
> +               return -1;
> +       return 0;
> +}
> +
> +#ifdef IS_PY3K
> +static struct PyModuleDef radixlink_module = {
> +       PyModuleDef_HEAD_INIT,
> +       "radixlink",
> +       radixlink_doc,
> +       -1,
> +       methods,
> +};
> +
> +PyMODINIT_FUNC PyInit_radixlink(void)
> +{
> +       PyObject *m = PyModule_Create(&radixlink_module);
> +       if (postinit(m) != 0)
> +               return NULL;
> +       return m;
> +}
> +#else
> +PyMODINIT_FUNC initradixlink(void)
> +{
> +       PyObject *m = Py_InitModule3("radixlink", methods, radixlink_doc);
> +       (void)postinit(m);
> +}
> +#endif
> diff --git a/mercurial/policy.py b/mercurial/policy.py
> --- a/mercurial/policy.py
> +++ b/mercurial/policy.py
> @@ -77,4 +77,5 @@ def _importfrom(pkgname, modname):
>      (r'cext', r'osutil'): 1,
>      (r'cext', r'parsers'): 1,
> +    (r'cext', r'radixlink'): 1,
>  }
>
> diff --git a/mercurial/radixlink.c b/mercurial/radixlink.c
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/radixlink.c
>

I suspect we'll want to put all C code in mercurial/cext. Althought I'm not
100% sure if we want to put a wall between pure C and Python C or what.


> @@ -0,0 +1,270 @@
> +/* radixlink.c - on-disk radixtree index pointing to linked list data
> + *
> + * Copyright 2017 Facebook, Inc.
> + *
> + * This software may be used and distributed according to the terms of the
> + * GNU General Public License version 2 or any later version. */
> +
> +#include <assert.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include "radixlink.h"
> +
> +static const uint32_t INDEX_HEADER_SIZE = 4;
> +static const uint32_t INDEX_KEY_SIZE = 4;
> +static const uint32_t INDEX_OFFSET_SIZE = 4;
> +static const uint32_t LINK_OFFSET_SIZE = 4;
> +static const uint32_t LINK_VALUE_SIZE = 4;
> +
> +/* INDEX_KEY_SIZE + LINK_OFFSET_SIZE + INDEX_OFFSET_SIZE * 16 */
> +static const uint32_t INDEX_RADIX_ENTRY_SIZE = 72;
> +
> +/* Return -1 on buffer overflow, or write *pvalue (if not NULL) and
> return 0 */
> +static inline int safereadu32(radixlink_buffer_t *buf, uint32_t offset,
> +                             uint32_t *pvalue)
> +{
> +       assert(buf);
> +       if (pvalue == NULL)
> +               return 0;
> +       if (buf->size < offset + 4)
> +               return -1;
> +       *pvalue = getbe32((const char *)(buf->buf + offset));
> +       return 0;
> +}
> +
> +/* Return -1 on buffer overflow, or write *buf and return 0 */
> +static inline int safewriteu32(radixlink_buffer_t *buf, uint32_t offset,
> +                              uint32_t value)
> +{
> +       assert(buf);
> +       if (buf->size < offset + 4)
> +               return -1;
> +       putbe32(value, (char *)(buf->buf + offset));
> +       return 0;
> +}
> +
> +/* Get base16 from a base256 key. Caller responsible for memory safety. */
> +static inline uint8_t getb16(uint8_t b256[], uint32_t index)
> +{
> +       uint8_t v = b256[index / 2];
> +       return index & 1 ? (v & 0xf) : (v >> 4);
> +}
> +
> +/* Compare base256 and base16 buffer. Return 0 on equal, -1 otherwise.
> + * Caller responsible for memory boundary check. */
> +static inline int b16cmp(radixlink_buffer_t *b256, uint32_t b256offset,
> +               radixlink_buffer_t *b16)
> +{
> +       uint32_t i;
> +       assert(b256 && b16);
> +       if (b256->size * 2 != b16->size + b256offset)
> +               return -1;
> +       for (i = 0; i < b16->size; ++i) {
> +               uint8_t v = getb16(b256->buf, i + b256offset);
> +               if (v != b16->buf[i])
> +                       return -1;
> +       }
> +       return 0;
> +}
> +
> +static inline int readindexentry(radixlink_buffer_t *index, uint32_t
> offset,
> +               uint32_t *pklen, uint32_t *plinkoffset, uint32_t
> *pindexoffset)
> +{
> +       uint32_t poff = offset + INDEX_KEY_SIZE;
> +       /* index-entry := radix-entry | leaf-entry
> +          radix-entry := '\0' * 4 + link-offset + index-offset (4B) * 16
> +          leaf-entry  := key-length (4B, > 0) + link-offset + key */
> +       if (safereadu32(index, offset, pklen) != 0)
> +               return -1;
> +       if (safereadu32(index, poff, plinkoffset) != 0)
> +               return -1;
> +       if (pindexoffset != NULL)
> +               *pindexoffset = poff;
> +       return 0;
> +}
> +
> +static inline int splitleaf(radixlink_buffer_t *index,
> +               uint32_t ioff, uint32_t klen, uint32_t loff,
> +               radixlink_resize_func resize)
> +{
> +       uint32_t noff, size, koff;
> +       uint8_t b16;
> +
> +       /* require klen, loff to avoid reading from index again */
> +       assert(klen > 0);
> +       koff = ioff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE;
> +       noff = index->size; /* new leaf entry offset */
> +       size = INDEX_KEY_SIZE + LINK_OFFSET_SIZE + klen - 1; /* entry size
> */
> +       if (size < INDEX_RADIX_ENTRY_SIZE)
> +               size = INDEX_RADIX_ENTRY_SIZE;
> +
> +       resize(index, noff + size);
> +       if (index->size < noff + size)
> +               return -1;
> +       memset(index->buf + noff, 0, size);
> +
> +       /* copy remaining key to new location */
> +       putbe32(klen - 1, (char *)(index->buf + noff));
> +       putbe32(loff, (char *)(index->buf + noff + INDEX_KEY_SIZE));
> +       memcpy(index->buf + noff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE,
> +                       index->buf + koff + 1, klen - 1);
> +       b16 = index->buf[koff];
> +
> +       /* convert entry at ioff to a radix entry */
> +       memset(index->buf + ioff, 0, INDEX_RADIX_ENTRY_SIZE);
> +       putbe32(noff, (char *)(index->buf + koff + INDEX_OFFSET_SIZE *
> b16));
> +       return 0;
> +}
> +
> +static inline int appendleaf(radixlink_buffer_t *index,
> +               radixlink_buffer_t *key, uint32_t kidx, uint32_t *poffset,
> +               radixlink_resize_func resize)
> +{
> +       uint32_t noff, koff, size, ksize, klen, i;
> +       assert(index && key && poffset && resize);
> +       noff = index->size;
> +       ksize = key->size * 2; /* size if represented in base16 */
> +       assert(ksize >= kidx);
> +       klen = ksize - kidx;
> +       size = INDEX_KEY_SIZE + LINK_OFFSET_SIZE + klen;
> +       if (size < INDEX_RADIX_ENTRY_SIZE)
> +               size = INDEX_RADIX_ENTRY_SIZE;
> +       resize(index, noff + size);
> +       if (index->size < noff + size)
> +               return -1;
> +       memset(index->buf + noff, 0, size);
> +       putbe32(klen, (char *)(index->buf + noff));
> +       koff = noff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE;
> +       for (i = kidx; i < ksize; ++i) {
> +               uint32_t b = getb16(key->buf, i);
> +               index->buf[koff] = b;
> +               ++koff;
> +       }
> +       *poffset = noff;
> +       return 0;
> +}
> +
> +static inline int followindex(radixlink_buffer_t *index,
> +               radixlink_buffer_t *key, uint32_t *pkidx, uint32_t *pioff,
> +               uint32_t *ppoff, radixlink_resize_func *presize)
> +{
> +       uint32_t ioff, loff, kidx, klen, koff, poff;
> +       uint8_t b16;
> +       assert(index && key && pkidx && pioff);
> +       ioff = *pioff;
> +       kidx = *pkidx;
> +       if (readindexentry(index, ioff, &klen, &loff, NULL) != 0)
> +               return -1;
> +       koff = ioff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE;
> +       /* leaf entry */
> +       if (klen > 0) {
> +               radixlink_buffer_t b16;
> +               if (index->size < koff + klen)
> +                       return -1; /* overflow */
> +               b16.buf = index->buf + koff;
> +               b16.size = klen;
> +               if (b16cmp(key, kidx, &b16) == 0) {
> +                       *pkidx = key->size * 2 + 1;
> +                       return 0;
> +               }
> +               if (presize == NULL) {
> +                       *pioff = 0;
> +                       return 0;
> +               }
> +               /* if presize is provided, split the leaf entry */
> +               splitleaf(index, ioff, klen, loff, *presize);
> +       }
> +       /* radix entry */
> +       *pkidx = kidx + 1;
> +       if (kidx >= key->size * 2)
> +               return 0;
> +       b16 = getb16(key->buf, kidx);
> +       poff = koff + (uint32_t)b16 * INDEX_OFFSET_SIZE;
> +       if (safereadu32(index, poff, pioff) != 0)
> +               return -1;
> +       if (ppoff != NULL)
> +               *ppoff = poff;
> +       return 0;
> +}
> +
> +int radixlink_index_find(radixlink_buffer_t *index, radixlink_buffer_t
> *key,
> +               uint32_t *plinkoffset)
> +{
> +       uint32_t ioff, ksize, kidx;
> +       assert(index && key && plinkoffset);
> +       ioff = INDEX_HEADER_SIZE;
> +       ksize = key->size * 2;
> +       for (kidx = 0; kidx <= ksize;) {
> +               if (followindex(index, key, &kidx, &ioff, NULL, NULL) != 0)
> +                       return -1;
> +               if (ioff == 0) {
> +                       *plinkoffset = 0;
> +                       return 0;
> +               }
> +       }
> +       if (readindexentry(index, ioff, NULL, plinkoffset, NULL) != 0)
> +               return -1;
> +       return 0;
> +}
> +
> +int radixlink_index_findorcreate(radixlink_buffer_t *index,
> +               radixlink_buffer_t *key, uint32_t *plinkoffset,
> +               uint32_t *pindexoffset, radixlink_resize_func resize)
> +{
> +       uint32_t ioff, ksize, kidx;
> +       assert(index && key && plinkoffset && pindexoffset && resize);
> +       ioff = INDEX_HEADER_SIZE;
> +       ksize = key->size * 2;
> +       for (kidx = 0; kidx <= ksize;) {
> +               uint32_t poff = 0;
> +               if (followindex(index, key, &kidx, &ioff, &poff, resize)
> != 0)
> +                       return -1;
> +               if (ioff == 0) {
> +                       /* create new leaf */
> +                       assert(poff != 0);
> +                       if (appendleaf(index, key, kidx, &ioff, resize) !=
> 0)
> +                               return -1;
> +                       if (safewriteu32(index, poff, ioff) != 0)
> +                               return -1;
> +                       break;
> +               }
> +       }
> +       if (readindexentry(index, ioff, NULL, plinkoffset, pindexoffset)
> != 0)
> +               return -1;
> +       return 0;
> +}
> +
> +int radixlink_index_writelink(radixlink_buffer_t *index, uint32_t
> indexoffset,
> +               uint32_t linkoffset)
> +{
> +       return safewriteu32(index, indexoffset, linkoffset);
> +}
> +
> +int radixlink_link_read(radixlink_buffer_t *link, uint32_t *plinkoffset,
> +               uint32_t *pvalue)
> +{
> +       uint32_t nextoffset;
> +       assert(link && plinkoffset && pvalue);
> +       if (safereadu32(link, *plinkoffset + 4, pvalue) != 0)
> +               return -1;
> +       if (safereadu32(link, *plinkoffset, &nextoffset) != 0)
> +               return -1;
> +       *plinkoffset = nextoffset;
> +       return 0;
> +}
> +
> +int radixlink_link_append(radixlink_buffer_t *link, uint32_t
> *plinkoffset,
> +               uint32_t value, radixlink_resize_func resize)
> +{
> +       uint32_t offset, size;
> +       assert(link && plinkoffset);
> +       offset = link->size;
> +       size = LINK_OFFSET_SIZE + LINK_VALUE_SIZE;
> +       resize(link, offset + size);
> +       if (safewriteu32(link, offset + LINK_OFFSET_SIZE, value) != 0)
> +               return -1;
> +       if (safewriteu32(link, offset, *plinkoffset) != 0)
> +               return -1;
> +       *plinkoffset = offset;
> +       return 0;
> +}
> diff --git a/mercurial/radixlink.h b/mercurial/radixlink.h
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/radixlink.h
> @@ -0,0 +1,57 @@
> +#ifndef _HG_RADIXLINK_H_
> +#define _HG_RADIXLINK_H_
> +
> +/* See radixlink.py for the radixlink file format. In short, "index" is a
> radix
> + * tree with pointing to "link" entries. "link" consists of linked lists.
> + * "index" and "link" could be separate buffers or a mixed buffer. */
> +
> +#include "bitmanipulation.h"
> +
> +/* A customized buffer struct allows us to dynamically resize the buffer
> and be
> + * able to do boundary check at all time. */
> +typedef struct {
> +       uint8_t *buf;
> +       /* not size_t because offset (could be size) is u32 in file format
> */
> +       uint32_t size;
> +} radixlink_buffer_t;
> +
> +/* Function signature to resize a buffer. radixlink does not care about
> how to
> + * resize a buffer. The underlying implementation could be realloc or
> mremap or
> + * something else. It must update "buf->size" to "newsize" on success. If
> the
> + * underlying implementation needs to store extra information like
> "capacity"
> + * to make resize more efficiently, it might use "((uint32_t *)(buf) -
> 4)" or
> + * elsewhere to do the trick. */
> +typedef void radixlink_resize_func(radixlink_buffer_t *buf, uint32_t
> newsize);
> +
> +/* Given the index buffer, and a key, lookup the offset in link buffer.
> + * On success, return 0 and store the offset to *plinkoffset. *plinkoffset
> + * could be 0 which means "not found". On error, return -1 and index
> buffer
> + * should be considered as corrupted. */
> +int radixlink_index_find(radixlink_buffer_t *index,
> +               radixlink_buffer_t *key, uint32_t *plinkoffset);
> +
> +/* Like radixlink_index_find, but create the entry on demand */
> +int radixlink_index_findorcreate(radixlink_buffer_t *index,
> +               radixlink_buffer_t *key, uint32_t *plinkoffset,
> +               uint32_t *pindexoffset, radixlink_resize_func resize);
> +
> +/* Update index at given offset to point to given link offset.
> indexoffset and
> + * linkoffset must be values returned by radixlink_index_findorcreate or
> + * radixlink_link_append. */
> +int radixlink_index_writelink(radixlink_buffer_t *index, uint32_t
> indexoffset,
> +               uint32_t linkoffset);
> +
> +/* Given the link buffer and the offset (*ploffset), read an uint32 value
> and
> + * the next offset in the linked list.
> + * On success, return 0 and *ploffset will be the next offset and *pvalue
> will
> + * be the value. If *ploffset is 0, the list should be considered as
> "ended".
> + * On error, return -1 and link buffer should be considered corrupted. */
> +int radixlink_link_read(radixlink_buffer_t *link, uint32_t *plinkoffset,
> +               uint32_t *pvalue);
> +
> +/* Append an entry (nextlinkoffset, value) to link buffer. *plinkoffset
> is used
> + * as both input (nextlinkoffset) and output (offset of the new entry) */
> +int radixlink_link_append(radixlink_buffer_t *link, uint32_t
> *plinkoffset,
> +               uint32_t value, radixlink_resize_func resize);
> +
> +#endif
> diff --git a/setup.py b/setup.py
> --- a/setup.py
> +++ b/setup.py
> @@ -660,4 +660,8 @@ extmodules = [
>                extra_link_args=osutil_ldflags,
>                depends=common_depends),
> +    Extension('mercurial.cext.radixlink', ['mercurial/radixlink.c',
> +                                           'mercurial/cext/radixlink.c'],
> +              include_dirs=common_include_dirs,
> +              depends=common_depends + ['mercurial/radixlink.h']),
>      Extension('hgext.fsmonitor.pywatchman.bser',
>                ['hgext/fsmonitor/pywatchman/bser.c']),
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - June 6, 2017, 3:58 p.m.
On Sun, 4 Jun 2017 16:59:15 -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1496550656 25200
> #      Sat Jun 03 21:30:56 2017 -0700
> # Node ID d492628229c58f8417a8b5925a614e26a16465af
> # Parent  e8e8d713e4b774f6894ee65723e7fdc12bf8a101
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r d492628229c5
> radixlink: add C implementation

> +++ b/mercurial/cext/radixlink.c
> @@ -0,0 +1,314 @@
> +/* radixlink.c - Python wrapper for C radixlink implementation
> + *
> + * Copyright 2017 Facebook, Inc.
> + *
> + * This software may be used and distributed according to the terms of the
> + * GNU General Public License version 2 or any later version. */
> +
> +#include <Python.h>
> +#include <stdio.h>

(not yet read the code at all)

I think new code should be PY_SSIZE_T_CLEAN.
Jun Wu - June 7, 2017, 1:35 a.m.
Excerpts from Gregory Szorc's message of 2017-06-05 22:58:49 -0700:
> On Sun, Jun 4, 2017 at 4:59 PM, Jun Wu <quark@fb.com> wrote:
> 
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1496550656 25200
> > #      Sat Jun 03 21:30:56 2017 -0700
> > # Node ID d492628229c58f8417a8b5925a614e26a16465af
> > # Parent  e8e8d713e4b774f6894ee65723e7fdc12bf8a101
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r
> > d492628229c5
> > radixlink: add C implementation
> >
> 
> Only a very quick review. Nothing glaringly wrong from a CPython
> perspective. Although a more thorough examination is needed.
> 
> On reading this code, I was once again reminded of Mercurial's style of
> omitting braces from one-line conditional blocks. I detest this style. See
> Apple's "goto fail" bug for why. I would prefer we not continue that
> practice in new code.

I could make the code longer :)

> It is somewhat weird seeing the C implementation so early in the series.
> I'd favor deferring this patch then landing the C implementation after the
> feature is implemented. That way we can introduce the C code along with
> perf numbers and more robust test coverage. But if it's ready, it's ready:
> I don't want to cause too much work for you to refactor.

The test includes a case to compare binary compatibility between C and pure.
I think radixlink series are logically separate from obsstore so I put
radixlink patches together.

I'm basically relying on the tests and valgrind to find out potential
issues. There are some hand-made cases to explore strange code paths.

> > +       pyvalues = PyList_New(0);
> >
> 
> Appending lists can be relatively expensive. Do you think it is worth
> scanning the linked list so we can pre-allocate enough slots in the PyList?

Usually the list is expected to be short, like 1-2 items. So a list seems
fine. If it becomes longer we could switch to a generator in the future.

> > +       if (!pyvalues)
> > +               goto bail;
> >
> 
> This can just return NULL. Then you can change bail to Py_DECREF since
> pyvalues is guaranteed to be !NULL.

I'd like the compiler to do that optimization for me so the code is easier
to reason about.

> > +static void resize(radixlink_buffer_t *buf, uint32_t newsize) {
> > +       void *p = PyMem_Realloc(buf->buf, (size_t)newsize);
> > +       if (p) {
> > +               buf->buf = p;
> > +               buf->size = newsize;
> > +       }
> >
> 
> This function nor callers appear to handle the case where PyMem_Realloc
> fails.

It "handles" it by not changing "buf->size". The radixlink C layer will
return -1 when it sees a buffer overflow.

> > +}
> > +
> > +static PyObject *radixlink_insert(radixlinkObject *self, PyObject *args)
> > +{
> > +       radixlink_buffer_t k;
> > +       Py_ssize_t klen = 0;
> > +       unsigned int v;
> > +       uint32_t loff, ioff;
> > +
> > +       if (!PyArg_ParseTuple(args, "s#I", &k.buf, &klen, &v))
> >
> 
> You'll want to conditionalize this to use y# on Python 3 to limit to bytes
> types.

Good to know! It seems undocumented.

> > +static PyObject *radixlink_getsourceoftruthsize(radixlinkObject *self,
> > +               void *context)
> > +{
> > +       uint32_t v;
> > +       (void)context;
> >
> 
> What's this argument for? Future patch?

It's the signature of "(getter)". I think it might be better to just be
explicit that Python defines this.

>[...] 
> Does Py_TPFLAGS_BASETYPE need to be set? If not, don't bother and call
> PyObject_Del() instead.

I think subclassing a C type is important for performance. So __getitem__,
__contains__ won't need hash table lookups.

> [...]
> "mercurial.radixlink.radixlink" (yes, we make this mistake elsewhere).

Nice catch!

> [...]
> > +       radixlinkType.tp_new = PyType_GenericNew;
> 
> Pretty sure you can assign this directly in PyTypeObject. The tp_new slot
> is immediately after tp_alloc. I'm not sure why you would defer doing this,
> but CPython itself is inconsistent. So maybe I'm missing something.

Seems some Python documentation is outdated.

> > diff --git a/mercurial/radixlink.c b/mercurial/radixlink.c
> > new file mode 100644
> > --- /dev/null
> > +++ b/mercurial/radixlink.c
> >
> 
> I suspect we'll want to put all C code in mercurial/cext. Althought I'm
> not 100% sure if we want to put a wall between pure C and Python C or
> what.

It seems to me that "ext" stands for "(Python) extensions". cffi also
depends on pure C code. So "mercurial" seems good as the common ancestor of
"mercurial/cext" and "mercurial/cffi".

Patch

diff --git a/mercurial/cext/radixlink.c b/mercurial/cext/radixlink.c
new file mode 100644
--- /dev/null
+++ b/mercurial/cext/radixlink.c
@@ -0,0 +1,314 @@ 
+/* radixlink.c - Python wrapper for C radixlink implementation
+ *
+ * Copyright 2017 Facebook, Inc.
+ *
+ * This software may be used and distributed according to the terms of the
+ * GNU General Public License version 2 or any later version. */
+
+#include <Python.h>
+#include <stdio.h>
+
+#include "radixlink.h"
+
+typedef struct {
+	PyObject_HEAD
+	radixlink_buffer_t data;
+} radixlinkObject;
+
+static const uint32_t MIN_DATA_SIZE = 76; /* header (4B) + radix-entry (72B) */
+
+static int radixlink_contains(radixlinkObject *self, PyObject *keyobj)
+{
+	radixlink_buffer_t k;
+	Py_ssize_t klen = 0;
+	uint32_t loff = 0;
+
+	if (PyBytes_AsStringAndSize(keyobj, (char **)&k.buf, &klen) == -1)
+		return -1;
+	k.size = (uint32_t)klen;
+	if (radixlink_index_find(&self->data, &k, &loff) == -1)
+		return -1;
+	return loff > 0;
+}
+
+static PyObject *radixlink_getitem(radixlinkObject *self, PyObject *keyobj)
+{
+	radixlink_buffer_t k;
+	Py_ssize_t klen = 0;
+	PyObject *pyvalues = NULL;
+	uint32_t loff = 0;
+
+	if (PyBytes_AsStringAndSize(keyobj, (char **)&k.buf, &klen) == -1)
+		return NULL;
+	k.size = (uint32_t)klen;
+	if (radixlink_index_find(&self->data, &k, &loff) == -1)
+		return NULL;
+	if (loff == 0)
+		return PyErr_Format(PyExc_KeyError, "No such entry");
+
+	pyvalues = PyList_New(0);
+	if (!pyvalues)
+		goto bail;
+
+	while (loff) {
+		PyObject *pyvalue;
+		uint32_t value;
+		int r;
+		if (radixlink_link_read(&self->data, &loff, &value) != 0)
+			goto bail;
+		pyvalue = PyInt_FromLong((long)value);
+		if (!pyvalue)
+			goto bail;
+		r = PyList_Append(pyvalues, pyvalue);
+		Py_DECREF(pyvalue);
+		if (r == -1)
+			goto bail;
+	}
+	return pyvalues;
+bail:
+	Py_XDECREF(pyvalues);
+	return NULL;
+}
+
+static void resize(radixlink_buffer_t *buf, uint32_t newsize) {
+	void *p = PyMem_Realloc(buf->buf, (size_t)newsize);
+	if (p) {
+		buf->buf = p;
+		buf->size = newsize;
+	}
+}
+
+static PyObject *radixlink_insert(radixlinkObject *self, PyObject *args)
+{
+	radixlink_buffer_t k;
+	Py_ssize_t klen = 0;
+	unsigned int v;
+	uint32_t loff, ioff;
+
+	if (!PyArg_ParseTuple(args, "s#I", &k.buf, &klen, &v))
+		return NULL;
+	k.size = (uint32_t)klen;
+
+	if (radixlink_index_findorcreate(&self->data, &k, &loff, &ioff, resize)
+			!= 0)
+		return NULL;
+	if (radixlink_link_append(&self->data, &loff, (uint32_t)v, resize) != 0)
+		return NULL;
+	if (radixlink_index_writelink(&self->data, ioff, loff) != 0)
+		return NULL;
+	Py_RETURN_NONE;
+}
+
+static PyObject *radixlink_getsourceoftruthsize(radixlinkObject *self,
+		void *context)
+{
+	uint32_t v;
+	(void)context;
+	if (self->data.size < 4)
+		return NULL;
+	v = getbe32((const char *)self->data.buf);
+	return PyInt_FromLong((long)v);
+}
+
+static int radixlink_setsourceoftruthsize(radixlinkObject *self,
+		PyObject *value, void *context)
+{
+	uint32_t origv;
+	long v;
+	(void)context;
+	v = PyInt_AsLong(value);
+	if (self->data.size < 4 || v < 0)
+		return -1;
+	origv = getbe32((const char *)self->data.buf);
+	if (origv != (uint32_t)v)
+		putbe32((uint32_t)v, (char *)self->data.buf);
+	return 0;
+}
+
+static PyObject *radixlink_getdata(radixlinkObject *self, void *context)
+{
+	(void)context;
+	return PyBytes_FromStringAndSize((const char *)self->data.buf,
+			(Py_ssize_t)self->data.size);
+}
+static int radixlink_setdata(radixlinkObject *self, PyObject *value,
+		void *context)
+{
+	radixlink_buffer_t buf = { NULL, 0 };
+	Py_buffer view = { NULL };
+	(void)context;
+	if (value == NULL || PyObject_Not(value)) {
+		/* minimal empty buffer */
+		view.len = MIN_DATA_SIZE;
+		view.buf = NULL;
+	} else {
+		/* copy from a buffer object */
+		if (!PyObject_CheckBuffer(value)) {
+			PyErr_SetString(PyExc_TypeError, "need a buffer");
+			goto bail;
+		}
+		if (PyObject_GetBuffer(value, &view, PyBUF_SIMPLE) != 0)
+			goto bail;
+	}
+	buf.buf = PyMem_Malloc((size_t)view.len);
+	if (!buf.buf) {
+		PyErr_SetNone(PyExc_MemoryError);
+		goto bail;
+	}
+	if (view.buf) {
+		memcpy(buf.buf, view.buf, (size_t)view.len);
+		PyBuffer_Release(&view);
+	} else {
+		memset(buf.buf, 0, (size_t)view.len);
+	}
+	buf.size = (uint32_t)view.len;
+	PyMem_Free(self->data.buf);
+	self->data = buf;
+	return 0;
+bail:
+	if (view.buf)
+		PyBuffer_Release(&view);
+	return -1;
+
+}
+
+static Py_ssize_t radixlink_length(radixlinkObject *self)
+{
+	return (Py_ssize_t)self->data.size;
+}
+
+static int radixlink_init(radixlinkObject *self, PyObject *args)
+{
+	PyObject *data = NULL;
+	self->data.buf = NULL;
+	self->data.size = 0;
+
+	if (!PyArg_ParseTuple(args, "|O", &data))
+		return -1;
+
+	return radixlink_setdata(self, data, NULL);
+}
+
+static void radixlink_dealloc(radixlinkObject *self)
+{
+	PyMem_Free(self->data.buf);
+	self->ob_type->tp_free((PyObject *)self);
+}
+
+static PySequenceMethods radixlink_sequence_methods = {
+	(lenfunc)radixlink_length,      /* sq_length */
+	0,                              /* sq_concat */
+	0,                              /* sq_repeat */
+	0,                              /* sq_item */
+	0,                              /* sq_slice */
+	0,                              /* sq_ass_item */
+	0,                              /* sq_ass_slice */
+	(objobjproc)radixlink_contains, /* sq_contains */
+	0,                              /* sq_inplace_concat */
+	0,                              /* sq_inplace_repeat */
+};
+
+static PyMappingMethods radixlink_mapping_methods = {
+	(lenfunc)radixlink_length,       /* mp_length */
+	(binaryfunc)radixlink_getitem,   /* mp_subscript */
+	NULL,                            /* mp_ass_subscript */
+};
+
+static PyMethodDef radixlink_methods[] = {
+	{"insert", (PyCFunction)radixlink_insert, METH_VARARGS,
+	 "insert value (uint32) at the beginning of the list for given key\n"},
+	{NULL},
+};
+
+static PyGetSetDef radixlink_getset[] = {
+	{"sourceoftruthsize", (getter)radixlink_getsourceoftruthsize,
+	 (setter)radixlink_setsourceoftruthsize, "sourceoftruthsize", NULL},
+	{"data", (getter)radixlink_getdata, (setter)radixlink_setdata, "data",
+	 NULL},
+	{NULL},
+};
+
+static PyTypeObject radixlinkType = {
+	PyObject_HEAD_INIT(NULL)
+	0,                                        /* ob_size */
+	"radixlink.radixlink",                    /* tp_name */
+	sizeof(radixlinkObject),                  /* tp_basicsize */
+	0,                                        /* tp_itemsize */
+	(destructor)radixlink_dealloc,            /* tp_dealloc */
+	0,                                        /* tp_print */
+	0,                                        /* tp_getattr */
+	0,                                        /* tp_setattr */
+	0,                                        /* tp_compare */
+	0,                                        /* tp_repr */
+	0,                                        /* tp_as_number */
+	&radixlink_sequence_methods,              /* tp_as_sequence */
+	&radixlink_mapping_methods,               /* tp_as_mapping */
+	0,                                        /* tp_hash */
+	0,                                        /* tp_call */
+	0,                                        /* tp_str */
+	0,                                        /* tp_getattro */
+	0,                                        /* tp_setattro */
+	0,                                        /* tp_as_buffer */
+	Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
+	"A radixlink object",                     /* tp_doc */
+	0,                                        /* tp_traverse */
+	0,                                        /* tp_clear */
+	0,                                        /* tp_richcompare */
+	0,                                        /* tp_weaklistoffset */
+	0,                                        /* tp_iter */
+	0,                                        /* tp_iternext */
+	radixlink_methods,                        /* tp_methods */
+	0,                                        /* tp_members */
+	radixlink_getset,                         /* tp_getset */
+	0,                                        /* tp_base */
+	0,                                        /* tp_dict */
+	0,                                        /* tp_descr_get */
+	0,                                        /* tp_descr_set */
+	0,                                        /* tp_dictoffset */
+	(initproc)radixlink_init,                 /* tp_init */
+	0,                                        /* tp_alloc */
+};
+
+static PyMethodDef methods[] = {{NULL}};
+
+static char radixlink_doc[] = ("multimap<bytes, uint32> based on radix tree "
+		"and linked list\n");
+
+static const int version = 1;
+
+static int postinit(PyObject *m)
+{
+	Py_INCREF(&radixlinkType);
+	radixlinkType.tp_new = PyType_GenericNew;
+	if (PyType_Ready(&radixlinkType) < 0)
+		return -1;
+	if (PyModule_AddObject(m, "radixlink", (PyObject *)&radixlinkType) != 0)
+		return -1;
+	if (PyModule_AddIntConstant(m, "version", version) != 0)
+		return -1;
+	return 0;
+}
+
+#ifdef IS_PY3K
+static struct PyModuleDef radixlink_module = {
+	PyModuleDef_HEAD_INIT,
+	"radixlink",
+	radixlink_doc,
+	-1,
+	methods,
+};
+
+PyMODINIT_FUNC PyInit_radixlink(void)
+{
+	PyObject *m = PyModule_Create(&radixlink_module);
+	if (postinit(m) != 0)
+		return NULL;
+	return m;
+}
+#else
+PyMODINIT_FUNC initradixlink(void)
+{
+	PyObject *m = Py_InitModule3("radixlink", methods, radixlink_doc);
+	(void)postinit(m);
+}
+#endif
diff --git a/mercurial/policy.py b/mercurial/policy.py
--- a/mercurial/policy.py
+++ b/mercurial/policy.py
@@ -77,4 +77,5 @@  def _importfrom(pkgname, modname):
     (r'cext', r'osutil'): 1,
     (r'cext', r'parsers'): 1,
+    (r'cext', r'radixlink'): 1,
 }
 
diff --git a/mercurial/radixlink.c b/mercurial/radixlink.c
new file mode 100644
--- /dev/null
+++ b/mercurial/radixlink.c
@@ -0,0 +1,270 @@ 
+/* radixlink.c - on-disk radixtree index pointing to linked list data
+ *
+ * Copyright 2017 Facebook, Inc.
+ *
+ * This software may be used and distributed according to the terms of the
+ * GNU General Public License version 2 or any later version. */
+
+#include <assert.h>
+#include <stdlib.h>
+#include <string.h>
+#include "radixlink.h"
+
+static const uint32_t INDEX_HEADER_SIZE = 4;
+static const uint32_t INDEX_KEY_SIZE = 4;
+static const uint32_t INDEX_OFFSET_SIZE = 4;
+static const uint32_t LINK_OFFSET_SIZE = 4;
+static const uint32_t LINK_VALUE_SIZE = 4;
+
+/* INDEX_KEY_SIZE + LINK_OFFSET_SIZE + INDEX_OFFSET_SIZE * 16 */
+static const uint32_t INDEX_RADIX_ENTRY_SIZE = 72;
+
+/* Return -1 on buffer overflow, or write *pvalue (if not NULL) and return 0 */
+static inline int safereadu32(radixlink_buffer_t *buf, uint32_t offset,
+			      uint32_t *pvalue)
+{
+	assert(buf);
+	if (pvalue == NULL)
+		return 0;
+	if (buf->size < offset + 4)
+		return -1;
+	*pvalue = getbe32((const char *)(buf->buf + offset));
+	return 0;
+}
+
+/* Return -1 on buffer overflow, or write *buf and return 0 */
+static inline int safewriteu32(radixlink_buffer_t *buf, uint32_t offset,
+			       uint32_t value)
+{
+	assert(buf);
+	if (buf->size < offset + 4)
+		return -1;
+	putbe32(value, (char *)(buf->buf + offset));
+	return 0;
+}
+
+/* Get base16 from a base256 key. Caller responsible for memory safety. */
+static inline uint8_t getb16(uint8_t b256[], uint32_t index)
+{
+	uint8_t v = b256[index / 2];
+	return index & 1 ? (v & 0xf) : (v >> 4);
+}
+
+/* Compare base256 and base16 buffer. Return 0 on equal, -1 otherwise.
+ * Caller responsible for memory boundary check. */
+static inline int b16cmp(radixlink_buffer_t *b256, uint32_t b256offset,
+		radixlink_buffer_t *b16)
+{
+	uint32_t i;
+	assert(b256 && b16);
+	if (b256->size * 2 != b16->size + b256offset)
+		return -1;
+	for (i = 0; i < b16->size; ++i) {
+		uint8_t v = getb16(b256->buf, i + b256offset);
+		if (v != b16->buf[i])
+			return -1;
+	}
+	return 0;
+}
+
+static inline int readindexentry(radixlink_buffer_t *index, uint32_t offset,
+		uint32_t *pklen, uint32_t *plinkoffset, uint32_t *pindexoffset)
+{
+	uint32_t poff = offset + INDEX_KEY_SIZE;
+	/* index-entry := radix-entry | leaf-entry
+	   radix-entry := '\0' * 4 + link-offset + index-offset (4B) * 16
+	   leaf-entry  := key-length (4B, > 0) + link-offset + key */
+	if (safereadu32(index, offset, pklen) != 0)
+		return -1;
+	if (safereadu32(index, poff, plinkoffset) != 0)
+		return -1;
+	if (pindexoffset != NULL)
+		*pindexoffset = poff;
+	return 0;
+}
+
+static inline int splitleaf(radixlink_buffer_t *index,
+		uint32_t ioff, uint32_t klen, uint32_t loff,
+		radixlink_resize_func resize)
+{
+	uint32_t noff, size, koff;
+	uint8_t b16;
+
+	/* require klen, loff to avoid reading from index again */
+	assert(klen > 0);
+	koff = ioff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE;
+	noff = index->size; /* new leaf entry offset */
+	size = INDEX_KEY_SIZE + LINK_OFFSET_SIZE + klen - 1; /* entry size */
+	if (size < INDEX_RADIX_ENTRY_SIZE)
+		size = INDEX_RADIX_ENTRY_SIZE;
+
+	resize(index, noff + size);
+	if (index->size < noff + size)
+		return -1;
+	memset(index->buf + noff, 0, size);
+
+	/* copy remaining key to new location */
+	putbe32(klen - 1, (char *)(index->buf + noff));
+	putbe32(loff, (char *)(index->buf + noff + INDEX_KEY_SIZE));
+	memcpy(index->buf + noff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE,
+			index->buf + koff + 1, klen - 1);
+	b16 = index->buf[koff];
+
+	/* convert entry at ioff to a radix entry */
+	memset(index->buf + ioff, 0, INDEX_RADIX_ENTRY_SIZE);
+	putbe32(noff, (char *)(index->buf + koff + INDEX_OFFSET_SIZE * b16));
+	return 0;
+}
+
+static inline int appendleaf(radixlink_buffer_t *index,
+		radixlink_buffer_t *key, uint32_t kidx, uint32_t *poffset,
+		radixlink_resize_func resize)
+{
+	uint32_t noff, koff, size, ksize, klen, i;
+	assert(index && key && poffset && resize);
+	noff = index->size;
+	ksize = key->size * 2; /* size if represented in base16 */
+	assert(ksize >= kidx);
+	klen = ksize - kidx;
+	size = INDEX_KEY_SIZE + LINK_OFFSET_SIZE + klen;
+	if (size < INDEX_RADIX_ENTRY_SIZE)
+		size = INDEX_RADIX_ENTRY_SIZE;
+	resize(index, noff + size);
+	if (index->size < noff + size)
+		return -1;
+	memset(index->buf + noff, 0, size);
+	putbe32(klen, (char *)(index->buf + noff));
+	koff = noff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE;
+	for (i = kidx; i < ksize; ++i) {
+		uint32_t b = getb16(key->buf, i);
+		index->buf[koff] = b;
+		++koff;
+	}
+	*poffset = noff;
+	return 0;
+}
+
+static inline int followindex(radixlink_buffer_t *index,
+		radixlink_buffer_t *key, uint32_t *pkidx, uint32_t *pioff,
+		uint32_t *ppoff, radixlink_resize_func *presize)
+{
+	uint32_t ioff, loff, kidx, klen, koff, poff;
+	uint8_t b16;
+	assert(index && key && pkidx && pioff);
+	ioff = *pioff;
+	kidx = *pkidx;
+	if (readindexentry(index, ioff, &klen, &loff, NULL) != 0)
+		return -1;
+	koff = ioff + INDEX_KEY_SIZE + LINK_OFFSET_SIZE;
+	/* leaf entry */
+	if (klen > 0) {
+		radixlink_buffer_t b16;
+		if (index->size < koff + klen)
+			return -1; /* overflow */
+		b16.buf = index->buf + koff;
+		b16.size = klen;
+		if (b16cmp(key, kidx, &b16) == 0) {
+			*pkidx = key->size * 2 + 1;
+			return 0;
+		}
+		if (presize == NULL) {
+			*pioff = 0;
+			return 0;
+		}
+		/* if presize is provided, split the leaf entry */
+		splitleaf(index, ioff, klen, loff, *presize);
+	}
+	/* radix entry */
+	*pkidx = kidx + 1;
+	if (kidx >= key->size * 2)
+		return 0;
+	b16 = getb16(key->buf, kidx);
+	poff = koff + (uint32_t)b16 * INDEX_OFFSET_SIZE;
+	if (safereadu32(index, poff, pioff) != 0)
+		return -1;
+	if (ppoff != NULL)
+		*ppoff = poff;
+	return 0;
+}
+
+int radixlink_index_find(radixlink_buffer_t *index, radixlink_buffer_t *key,
+		uint32_t *plinkoffset)
+{
+	uint32_t ioff, ksize, kidx;
+	assert(index && key && plinkoffset);
+	ioff = INDEX_HEADER_SIZE;
+	ksize = key->size * 2;
+	for (kidx = 0; kidx <= ksize;) {
+		if (followindex(index, key, &kidx, &ioff, NULL, NULL) != 0)
+			return -1;
+		if (ioff == 0) {
+			*plinkoffset = 0;
+			return 0;
+		}
+	}
+	if (readindexentry(index, ioff, NULL, plinkoffset, NULL) != 0)
+		return -1;
+	return 0;
+}
+
+int radixlink_index_findorcreate(radixlink_buffer_t *index,
+		radixlink_buffer_t *key, uint32_t *plinkoffset,
+		uint32_t *pindexoffset, radixlink_resize_func resize)
+{
+	uint32_t ioff, ksize, kidx;
+	assert(index && key && plinkoffset && pindexoffset && resize);
+	ioff = INDEX_HEADER_SIZE;
+	ksize = key->size * 2;
+	for (kidx = 0; kidx <= ksize;) {
+		uint32_t poff = 0;
+		if (followindex(index, key, &kidx, &ioff, &poff, resize) != 0)
+			return -1;
+		if (ioff == 0) {
+			/* create new leaf */
+			assert(poff != 0);
+			if (appendleaf(index, key, kidx, &ioff, resize) != 0)
+				return -1;
+			if (safewriteu32(index, poff, ioff) != 0)
+				return -1;
+			break;
+		}
+	}
+	if (readindexentry(index, ioff, NULL, plinkoffset, pindexoffset) != 0)
+		return -1;
+	return 0;
+}
+
+int radixlink_index_writelink(radixlink_buffer_t *index, uint32_t indexoffset,
+		uint32_t linkoffset)
+{
+	return safewriteu32(index, indexoffset, linkoffset);
+}
+
+int radixlink_link_read(radixlink_buffer_t *link, uint32_t *plinkoffset,
+		uint32_t *pvalue)
+{
+	uint32_t nextoffset;
+	assert(link && plinkoffset && pvalue);
+	if (safereadu32(link, *plinkoffset + 4, pvalue) != 0)
+		return -1;
+	if (safereadu32(link, *plinkoffset, &nextoffset) != 0)
+		return -1;
+	*plinkoffset = nextoffset;
+	return 0;
+}
+
+int radixlink_link_append(radixlink_buffer_t *link, uint32_t *plinkoffset,
+		uint32_t value, radixlink_resize_func resize)
+{
+	uint32_t offset, size;
+	assert(link && plinkoffset);
+	offset = link->size;
+	size = LINK_OFFSET_SIZE + LINK_VALUE_SIZE;
+	resize(link, offset + size);
+	if (safewriteu32(link, offset + LINK_OFFSET_SIZE, value) != 0)
+		return -1;
+	if (safewriteu32(link, offset, *plinkoffset) != 0)
+		return -1;
+	*plinkoffset = offset;
+	return 0;
+}
diff --git a/mercurial/radixlink.h b/mercurial/radixlink.h
new file mode 100644
--- /dev/null
+++ b/mercurial/radixlink.h
@@ -0,0 +1,57 @@ 
+#ifndef _HG_RADIXLINK_H_
+#define _HG_RADIXLINK_H_
+
+/* See radixlink.py for the radixlink file format. In short, "index" is a radix
+ * tree with pointing to "link" entries. "link" consists of linked lists.
+ * "index" and "link" could be separate buffers or a mixed buffer. */
+
+#include "bitmanipulation.h"
+
+/* A customized buffer struct allows us to dynamically resize the buffer and be
+ * able to do boundary check at all time. */
+typedef struct {
+	uint8_t *buf;
+	/* not size_t because offset (could be size) is u32 in file format */
+	uint32_t size;
+} radixlink_buffer_t;
+
+/* Function signature to resize a buffer. radixlink does not care about how to
+ * resize a buffer. The underlying implementation could be realloc or mremap or
+ * something else. It must update "buf->size" to "newsize" on success. If the
+ * underlying implementation needs to store extra information like "capacity"
+ * to make resize more efficiently, it might use "((uint32_t *)(buf) - 4)" or
+ * elsewhere to do the trick. */
+typedef void radixlink_resize_func(radixlink_buffer_t *buf, uint32_t newsize);
+
+/* Given the index buffer, and a key, lookup the offset in link buffer.
+ * On success, return 0 and store the offset to *plinkoffset. *plinkoffset
+ * could be 0 which means "not found". On error, return -1 and index buffer
+ * should be considered as corrupted. */
+int radixlink_index_find(radixlink_buffer_t *index,
+		radixlink_buffer_t *key, uint32_t *plinkoffset);
+
+/* Like radixlink_index_find, but create the entry on demand */
+int radixlink_index_findorcreate(radixlink_buffer_t *index,
+		radixlink_buffer_t *key, uint32_t *plinkoffset,
+		uint32_t *pindexoffset, radixlink_resize_func resize);
+
+/* Update index at given offset to point to given link offset. indexoffset and
+ * linkoffset must be values returned by radixlink_index_findorcreate or
+ * radixlink_link_append. */
+int radixlink_index_writelink(radixlink_buffer_t *index, uint32_t indexoffset,
+		uint32_t linkoffset);
+
+/* Given the link buffer and the offset (*ploffset), read an uint32 value and
+ * the next offset in the linked list.
+ * On success, return 0 and *ploffset will be the next offset and *pvalue will
+ * be the value. If *ploffset is 0, the list should be considered as "ended".
+ * On error, return -1 and link buffer should be considered corrupted. */
+int radixlink_link_read(radixlink_buffer_t *link, uint32_t *plinkoffset,
+		uint32_t *pvalue);
+
+/* Append an entry (nextlinkoffset, value) to link buffer. *plinkoffset is used
+ * as both input (nextlinkoffset) and output (offset of the new entry) */
+int radixlink_link_append(radixlink_buffer_t *link, uint32_t *plinkoffset,
+		uint32_t value, radixlink_resize_func resize);
+
+#endif
diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -660,4 +660,8 @@  extmodules = [
               extra_link_args=osutil_ldflags,
               depends=common_depends),
+    Extension('mercurial.cext.radixlink', ['mercurial/radixlink.c',
+                                           'mercurial/cext/radixlink.c'],
+              include_dirs=common_include_dirs,
+              depends=common_depends + ['mercurial/radixlink.h']),
     Extension('hgext.fsmonitor.pywatchman.bser',
               ['hgext/fsmonitor/pywatchman/bser.c']),