Patchwork [5,of,7,V3] sparse-revlog: introduce native (C) implementation of slicechunktodensity

Submitter Boris Feld
Date Nov. 19, 2018, 9:42 a.m.
Message ID <b365764b9915e600a5c7.1542620537@localhost.localdomain>
Permalink /patch/36644/
State Superseded

Comments

Boris Feld - Nov. 19, 2018, 9:42 a.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1542276598 -3600
#      Thu Nov 15 11:09:58 2018 +0100
# Node ID b365764b9915e600a5c7c273f244f65e2240e00a
# Parent  f9ce23d5d3aeec9a9e8589ec83d089b2df83da82
# EXP-Topic sparse-perf
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r b365764b9915
sparse-revlog: introduce native (C) implementation of slicechunktodensity

This is a C implementation of `_slicechunktodensity` in the
`mercurial/revlogutils/deltas.py` file.

The algorithm involves a lot of integer manipulation and low-level access to
index data. Having a C implementation of it brings a large performance
improvement. See a later changeset in this series for details.
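The slicing decision implemented by the patch boils down to comparing the
chain's payload against the span of bytes a single read would cover. A
standalone sketch of that density test (plain arrays stand in for the revlog
index; `struct rev`, `chain_density` and `dense_enough` are illustrative
names, not Mercurial's):

```c
#include <stdint.h>

/* A revision's location in the data file (illustrative layout). */
struct rev {
	int64_t start;
	int64_t length;
};

/* Density of reading revs[0..num-1] as one contiguous segment:
 * payload bytes divided by the span from the first byte of the
 * first rev to the last byte of the last rev. */
static double chain_density(const struct rev *revs, int num)
{
	int64_t payload = 0;
	int64_t span;
	int i;
	for (i = 0; i < num; i++)
		payload += revs[i].length;
	span = revs[num - 1].start + revs[num - 1].length - revs[0].start;
	if (span <= 0)
		return 1.0;
	return (double)payload / (double)span;
}

/* 1 if the chain is dense enough to be read as a single segment,
 * i.e. no slicing is needed. */
static int dense_enough(const struct rev *revs, int num, double target)
{
	return chain_density(revs, num) >= target;
}
```

When the density falls below the target, the real implementation goes on to
sort the gaps by size and skips the largest ones first, since those improve
the density the most per skipped gap.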
Yuya Nishihara - Nov. 19, 2018, 12:54 p.m.
On Mon, 19 Nov 2018 10:42:17 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1542276598 -3600
> #      Thu Nov 15 11:09:58 2018 +0100
> # Node ID b365764b9915e600a5c7c273f244f65e2240e00a
> # Parent  f9ce23d5d3aeec9a9e8589ec83d089b2df83da82
> # EXP-Topic sparse-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r b365764b9915
> sparse-revlog: introduce native (C) implementation of slicechunktodensity

> +		Py_ssize_t revstart;
> +		Py_ssize_t revsize;
> +		if (!index_get_start(self, revs[i], &revstart)) {
> +			goto bail;

Be aware that Py_ssize_t * != long *. I think Py_ssize_t is the right type
here and in other places computing offsets or indices.

There are a few more callers which pass a Py_ssize_t * as a long * argument.
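The mismatch is easy to reproduce with stand-in typedefs: on LLP64 platforms
(64-bit Windows), long is 32-bit while Py_ssize_t is 64-bit, so a callee
writing through a long * updates only half of a Py_ssize_t object. A hedged
sketch (the typedefs and helper merely simulate the situation; they are not
CPython types):

```c
#include <stdint.h>
#include <string.h>

typedef int64_t fake_py_ssize_t; /* stands in for Py_ssize_t */
typedef int32_t fake_long;       /* stands in for LLP64 long */

/* A converter written against a long-sized output argument
 * (in the spirit of pylong_to_long). */
static int to_long(int value, fake_long *out)
{
	fake_long v = (fake_long)value;
	memcpy(out, &v, sizeof(v)); /* writes only sizeof(long) bytes */
	return 1;
}

/* Returns 1 if passing a wide variable's address as long * left it
 * corrupted: only half of its bytes were written. */
static int mismatch_corrupts(void)
{
	fake_py_ssize_t offset = -1; /* all bytes 0xff */
	to_long(42, (fake_long *)&offset); /* half the object untouched */
	return offset != 42;
}
```

On a platform where the two types have the same width the call would happen
to work, which is exactly why this class of bug stays hidden until the code
runs on LLP64.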
Boris Feld - Nov. 19, 2018, 5:02 p.m.
On 19/11/2018 13:54, Yuya Nishihara wrote:
> On Mon, 19 Nov 2018 10:42:17 +0100, Boris Feld wrote:
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1542276598 -3600
>> #      Thu Nov 15 11:09:58 2018 +0100
>> # Node ID b365764b9915e600a5c7c273f244f65e2240e00a
>> # Parent  f9ce23d5d3aeec9a9e8589ec83d089b2df83da82
>> # EXP-Topic sparse-perf
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r b365764b9915
>> sparse-revlog: introduce native (C) implementation of slicechunktodensity
>> +		Py_ssize_t revstart;
>> +		Py_ssize_t revsize;
>> +		if (!index_get_start(self, revs[i], &revstart)) {
>> +			goto bail;
> Be aware that Py_ssize_t * != long *. I think Py_ssize_t is the right type
> here and other places computing offsets or indices.
>
> There are a few more callers which passes Py_ssize_t * as long *argument.

Looking more closely at this, it seems like the offset can be a full unsigned
64-bit integer. Py_ssize_t can be smaller than that. So I think we
should keep offset, length and segmentspan as explicit uint64_t, and use
PyLong_AsUnsignedLongLong if conversion is needed.

Does that seem right to you? Shall I update the series in that direction?

Because we use an unsigned type, it is inconvenient to return a -1 value.
That is why I went for the boolean (same as pylong_to_long).

Can you confirm the following:
1) using uint64_t for data offsets and lengths
2) using Py_ssize_t for everything else
3) handling errors as booleans (using true/false)

Cheers,

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Nov. 20, 2018, 11:59 a.m.
On Mon, 19 Nov 2018 18:02:16 +0100, Boris FELD wrote:
> On 19/11/2018 13:54, Yuya Nishihara wrote:
> > On Mon, 19 Nov 2018 10:42:17 +0100, Boris Feld wrote:
> >> # HG changeset patch
> >> # User Boris Feld <boris.feld@octobus.net>
> >> # Date 1542276598 -3600
> >> #      Thu Nov 15 11:09:58 2018 +0100
> >> # Node ID b365764b9915e600a5c7c273f244f65e2240e00a
> >> # Parent  f9ce23d5d3aeec9a9e8589ec83d089b2df83da82
> >> # EXP-Topic sparse-perf
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r b365764b9915
> >> sparse-revlog: introduce native (C) implementation of slicechunktodensity
> >> +		Py_ssize_t revstart;
> >> +		Py_ssize_t revsize;
> >> +		if (!index_get_start(self, revs[i], &revstart)) {
> >> +			goto bail;
> > Be aware that Py_ssize_t * != long *. I think Py_ssize_t is the right type
> > here and other places computing offsets or indices.
> >
> > There are a few more callers which passes Py_ssize_t * as long *argument.
> 
> Looking more closely to this, it seems like offset can be full unsigned
> 64 bits integer. Py_ssize_t can be smaller than that. So I think we
> should keep offset, length and segmentspan as explicit uint64_t. And use
> PyLong_AsUnsignedLongLong if conversion is needed.
> 
> Does it seem right to you? shall I update the series in that directory?

Yep, that's probably more correct. I was afraid of using PY_LONG_LONG which
isn't always available, but we appear to depend on it. The tuple_format
contains "K" on 32bit/LLP64 platforms, which wouldn't work if PY_LONG_LONG
weren't there.

> Because we use unsigned type, it is inconvenient to return -1 value.
> That is why I went for the boolean (same as pylong_to_long).

Well, the return value has no relation to the output argument. It can be
0/-1 int.
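The two conventions can be put side by side with a hypothetical accessor
(not Mercurial code): the success/failure channel is the return value, which
is independent of the unsigned out-parameter, so 0/-1 works just as well as
a boolean even though the result itself cannot encode -1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Boolean convention: true on success, false on error. */
static bool get_offset_bool(int rev, uint64_t *out)
{
	if (rev < 0)
		return false;
	*out = (uint64_t)rev * 100u; /* dummy computation */
	return true;
}

/* CPython-style convention: 0 on success, -1 on error. */
static int get_offset_int(int rev, uint64_t *out)
{
	if (rev < 0)
		return -1;
	*out = (uint64_t)rev * 100u; /* dummy computation */
	return 0;
}
```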

> Can you confirm the following:
> 1) using uint64 for data offset and length

Nit: use the same type as index_get() at the lowest layer. (i.e.
uint64_t for offset, and int for lengths.) And use the widest type while
computing (e.g. maybe signed int64_t in index_segment_span().) That's just
my opinion.
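That suggestion could look roughly like this: accessors return the types the
index actually stores (uint64_t offsets, int lengths), while arithmetic such
as a segment span is carried out in the widest signed type. The flat arrays
and simplified return-value signatures below are illustrative, not
Mercurial's real index layout.

```c
#include <stdint.h>

static const uint64_t starts[] = {0, 100, 250, 900};
static const int lengths[] = {100, 150, 650, 50};

static uint64_t index_get_start(int rev)
{
	return starts[rev];
}

static int index_get_length(int rev)
{
	return lengths[rev];
}

/* Distance from the first byte of start_rev to the last byte of
 * end_rev, computed in int64_t so mixed signed/unsigned arithmetic
 * cannot wrap around. */
static int64_t index_segment_span(int start_rev, int end_rev)
{
	int64_t end = (int64_t)index_get_start(end_rev) +
	              (int64_t)index_get_length(end_rev);
	return end - (int64_t)index_get_start(start_rev);
}
```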

> 2) using Py_ssize_t for everything else

Seems fine.

> 3) Handle error as booleans (using true/false)

See above. But I'm okay with booleans if you prefer.
Boris Feld - Nov. 20, 2018, 8:44 p.m.
On 20/11/2018 12:59, Yuya Nishihara wrote:
> On Mon, 19 Nov 2018 18:02:16 +0100, Boris FELD wrote:
>> On 19/11/2018 13:54, Yuya Nishihara wrote:
>>> On Mon, 19 Nov 2018 10:42:17 +0100, Boris Feld wrote:
>>>> # HG changeset patch
>>>> # User Boris Feld <boris.feld@octobus.net>
>>>> # Date 1542276598 -3600
>>>> #      Thu Nov 15 11:09:58 2018 +0100
>>>> # Node ID b365764b9915e600a5c7c273f244f65e2240e00a
>>>> # Parent  f9ce23d5d3aeec9a9e8589ec83d089b2df83da82
>>>> # EXP-Topic sparse-perf
>>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r b365764b9915
>>>> sparse-revlog: introduce native (C) implementation of slicechunktodensity
>>>> +		Py_ssize_t revstart;
>>>> +		Py_ssize_t revsize;
>>>> +		if (!index_get_start(self, revs[i], &revstart)) {
>>>> +			goto bail;
>>> Be aware that Py_ssize_t * != long *. I think Py_ssize_t is the right type
>>> here and other places computing offsets or indices.
>>>
>>> There are a few more callers which passes Py_ssize_t * as long *argument.
>> Looking more closely to this, it seems like offset can be full unsigned
>> 64 bits integer. Py_ssize_t can be smaller than that. So I think we
>> should keep offset, length and segmentspan as explicit uint64_t. And use
>> PyLong_AsUnsignedLongLong if conversion is needed.
>>
>> Does it seem right to you? shall I update the series in that directory?
> Yep, that's probably more correct. I was afraid of using PY_LONG_LONG which
> isn't always available, but we appear to depend on it. The tuple_format
> contains "K" on 32bit/LLP64 platforms, which wouldn't work if PY_LONG_LONG
> weren't there.
>
>> Because we use unsigned type, it is inconvenient to return -1 value.
>> That is why I went for the boolean (same as pylong_to_long).
> Well, the return value has no relation to the output argument. It can be
> 0/-1 int.
>
>> Can you confirm the following:
>> 1) using uint64 for data offset and length
> Nit: use the same type as index_get() at the lowest layer. (i.e.
> uint64_t for offset, and int for lengths.) And use the widest type while
> computing (e.g. maybe signed int64_t in index_segment_span().) That's just
> my opinion.

I've taken your feedback into consideration and made the following changes:

- `int64_t` for `index_get_start`: the raw value is a uint64_t shifted right
by 16 bits, so it fits in an int64_t. Using a signed type allows a normal
return, using values < 0 for errors.

- `int` for `index_get_length`, as you advised (similar to `index_get`),
also using negative values for error signaling.

The PyInt/PyLong difference in Python 2 cost me a couple of points of
sanity down the line. You'll probably have some feedback about the way
it got handled.
>
>> 2) using Py_ssize_t for everything else
> Seems fine.
>
>> 3) Handle error as booleans (using true/false)
> See above. But I'm okay with booleans if you prefer.
Since we moved to signed types for all functions, I moved back to < 0
error checking.
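That final scheme can be sketched in isolation: in revlog's packing the low
16 bits of the raw 64-bit field hold the revision flags, so the usable
offset is (raw >> 16) and always fits a signed int64_t, leaving negative
values free to signal errors. The array layout and names below are
illustrative only.

```c
#include <stdint.h>

/* Raw offset_flags fields: offset in the high 48 bits, flags in the
 * low 16 (flags left as zero here for simplicity). */
static const uint64_t raw_offsets[] = {
    0ull << 16,
    100ull << 16,
    250ull << 16,
};
static const int num_revs = 3;

/* Returns the data-file offset of rev, or a value < 0 on error.
 * Because the stored offset is a uint64_t shifted right by 16, the
 * result always fits in int64_t and can never collide with the
 * negative error range. */
static int64_t index_get_start(int rev)
{
	if (rev < 0 || rev >= num_revs)
		return -1;
	return (int64_t)(raw_offsets[rev] >> 16);
}
```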

Patch

diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c
+++ b/mercurial/cext/revlog.c
@@ -11,6 +11,7 @@ 
 #include <assert.h>
 #include <ctype.h>
 #include <stddef.h>
+#include <stdlib.h>
 #include <string.h>
 
 #include "bitmanipulation.h"
@@ -1068,6 +1069,230 @@  static bool trim_endidx(indexObject *sel
 	return 1;
 }
 
+struct Gap {
+	Py_ssize_t size;
+	Py_ssize_t idx;
+};
+
+static int gap_compare(const void *left, const void *right)
+{
+	const struct Gap *l_left = ((const struct Gap *)left);
+	const struct Gap *l_right = ((const struct Gap *)right);
+	if (l_left->size < l_right->size) {
+		return -1;
+	} else if (l_left->size > l_right->size) {
+		return 1;
+	}
+	return 0;
+}
+static int Py_ssize_t_compare(const void *left, const void *right)
+{
+	const Py_ssize_t l_left = *(const Py_ssize_t *)left;
+	const Py_ssize_t l_right = *(const Py_ssize_t *)right;
+	if (l_left < l_right) {
+		return -1;
+	} else if (l_left > l_right) {
+		return 1;
+	}
+	return 0;
+}
+
+static PyObject *index_slicechunktodensity(indexObject *self, PyObject *args)
+{
+	/* method arguments */
+	PyObject *list_revs = NULL; /* revisions in the chain */
+	double targetdensity = 0;   /* min density to achieve */
+	Py_ssize_t mingapsize = 0;  /* threshold to ignore gaps */
+
+	/* other core variables */
+	Py_ssize_t i;            /* used for various iteration */
+	PyObject *result = NULL; /* the final return of the function */
+
+	/* generic information about the delta chain being sliced */
+	Py_ssize_t num_revs = 0;     /* size of the full delta chain */
+	Py_ssize_t *revs = NULL;     /* native array of revisions in the chain */
+	Py_ssize_t chainpayload = 0; /* sum of all delta in the chain */
+	Py_ssize_t deltachainspan =
+	    0; /* distance from first byte to last byte */
+
+	/* variable used for slicing the delta chain */
+	Py_ssize_t readdata =
+	    0;              /* amount of data currently planned to be read */
+	double density = 0; /* ratio of payload data to data read */
+	struct Gap *gaps = NULL; /* array of notable gaps in the chain */
+	Py_ssize_t num_gaps =
+	    0; /* total number of notable gaps recorded so far */
+	Py_ssize_t *selected_indices = NULL; /* indices of gap skipped over */
+	Py_ssize_t num_selected = 0;         /* number of gaps skipped */
+	PyObject *chunk = NULL;              /* individual slice */
+	PyObject *allchunks = NULL;          /* all slices */
+
+	/* parsing argument */
+	if (!PyArg_ParseTuple(args, "O!dl", &PyList_Type, &list_revs,
+	                      &targetdensity, &mingapsize)) {
+		goto bail;
+	}
+
+	/* If the delta chain contains a single element, we do not need slicing
+	 */
+	num_revs = PyList_GET_SIZE(list_revs);
+	if (num_revs <= 1) {
+		result = PyTuple_Pack(1, list_revs);
+		goto done;
+	}
+
+	/* Turn the python list into a native integer array (for efficiency) */
+	revs = (Py_ssize_t *)calloc(num_revs, sizeof(Py_ssize_t));
+	if (revs == NULL) {
+		PyErr_NoMemory();
+		goto bail;
+	}
+	Py_ssize_t idxlen = index_length(self);
+	for (i = 0; i < num_revs; i++) {
+		Py_ssize_t revnum;
+		if (!pylong_to_long(PyList_GET_ITEM(list_revs, i), &revnum)) {
+			goto bail;
+		}
+		if (revnum == -1 && PyErr_Occurred()) {
+			goto bail;
+		}
+		if (revnum < 0 || revnum >= idxlen) {
+			PyErr_SetString(PyExc_IndexError, "index out of range");
+			goto bail;
+		}
+		revs[i] = revnum;
+	}
+
+	/* Compute and check various property of the unsliced delta chain */
+	if (!index_segment_span(self, revs[0], revs[num_revs - 1],
+	                        &deltachainspan)) {
+		goto bail;
+	}
+
+	if (deltachainspan <= mingapsize) {
+		result = PyTuple_Pack(1, list_revs);
+		goto done;
+	}
+	chainpayload = 0;
+	for (i = 0; i < num_revs; i++) {
+		long tmp;
+		if (!index_get_length(self, revs[i], &tmp)) {
+			goto bail;
+		}
+		chainpayload += tmp;
+	}
+
+	readdata = deltachainspan;
+	density = 1.0;
+
+	if (0 < deltachainspan) {
+		density = (double)chainpayload / (double)deltachainspan;
+	};
+
+	if (density >= targetdensity) {
+		result = PyTuple_Pack(1, list_revs);
+		goto done;
+	}
+
+	/* if chain is too sparse, look for relevant gaps */
+	gaps = (struct Gap *)calloc(num_revs, sizeof(struct Gap));
+	if (gaps == NULL) {
+		PyErr_NoMemory();
+		goto bail;
+	}
+
+	Py_ssize_t previous_end = -1;
+	for (i = 0; i < num_revs; i++) {
+		Py_ssize_t revstart;
+		Py_ssize_t revsize;
+		if (!index_get_start(self, revs[i], &revstart)) {
+			goto bail;
+		};
+		if (!index_get_length(self, revs[i], &revsize)) {
+			goto bail;
+		};
+		if (revsize <= 0) {
+			continue;
+		}
+		if (previous_end >= 0) {
+			Py_ssize_t gapsize = revstart - previous_end;
+			if (gapsize > mingapsize) {
+				gaps[num_gaps].size = gapsize;
+				gaps[num_gaps].idx = i;
+				num_gaps += 1;
+			}
+		}
+		previous_end = revstart + revsize;
+	}
+	if (num_gaps == 0) {
+		result = PyTuple_Pack(1, list_revs);
+		goto done;
+	}
+	qsort(gaps, num_gaps, sizeof(struct Gap), &gap_compare);
+
+	/* Slice the largest gap first, they improve the density the most */
+	selected_indices =
+	    (Py_ssize_t *)malloc((num_gaps + 1) * sizeof(Py_ssize_t));
+	if (selected_indices == NULL) {
+		PyErr_NoMemory();
+		goto bail;
+	}
+
+	for (i = num_gaps - 1; i >= 0; i--) {
+		selected_indices[num_selected] = gaps[i].idx;
+		readdata -= gaps[i].size;
+		num_selected += 1;
+		if (readdata <= 0) {
+			density = 1.0;
+		} else {
+			density = (double)chainpayload / (double)readdata;
+		}
+		if (density >= targetdensity) {
+			break;
+		}
+	}
+	qsort(selected_indices, num_selected, sizeof(Py_ssize_t),
+	      &Py_ssize_t_compare);
+
+	/* create the resulting slice */
+	allchunks = PyList_New(0);
+	if (allchunks == NULL) {
+		goto bail;
+	}
+	Py_ssize_t previdx = 0;
+	Py_ssize_t endidx;
+	selected_indices[num_selected] = num_revs;
+	for (i = 0; i <= num_selected; i++) {
+		Py_ssize_t idx = selected_indices[i];
+		if (!trim_endidx(self, revs, previdx, idx, &endidx)) {
+			goto bail;
+		}
+		if (previdx < endidx) {
+			chunk = PyList_GetSlice(list_revs, previdx, endidx);
+			if (chunk == NULL) {
+				goto bail;
+			}
+			if (PyList_Append(allchunks, chunk) == -1) {
+				goto bail;
+			}
+			Py_DECREF(chunk);
+			chunk = NULL;
+		}
+		previdx = idx;
+	}
+	result = allchunks;
+	goto done;
+
+bail:
+	Py_XDECREF(allchunks);
+	Py_XDECREF(chunk);
+done:
+	free(revs);
+	free(gaps);
+	free(selected_indices);
+	return result;
+}
+
 static inline int nt_level(const char *node, Py_ssize_t level)
 {
 	int v = node[level >> 1];
@@ -2300,6 +2525,8 @@  static PyMethodDef index_methods[] = {
      "get filtered head revisions"}, /* Can always do filtering */
     {"deltachain", (PyCFunction)index_deltachain, METH_VARARGS,
      "determine revisions with deltas to reconstruct fulltext"},
+    {"slicechunktodensity", (PyCFunction)index_slicechunktodensity,
+     METH_VARARGS, "slice a chain of revisions into chunks of good density"},
     {"append", (PyCFunction)index_append, METH_O, "append an index entry"},
     {"partialmatch", (PyCFunction)index_partialmatch, METH_VARARGS,
      "match a potentially ambiguous node ID"},