Patchwork rev: C implementation of delta chain resolution

login
register
mail settings
Submitter Gregory Szorc
Date June 25, 2017, 7:42 p.m.
Message ID <9e98095e4daca1f22c70.1498419773@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/21712/
State Accepted
Headers show

Comments

Gregory Szorc - June 25, 2017, 7:42 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1498419694 25200
#      Sun Jun 25 12:41:34 2017 -0700
# Node ID 9e98095e4daca1f22c70f5374d1110c81be98cc4
# Parent  8e3021fd1a44e48a4720bb6fa4538fba399ad213
rev: C implementation of delta chain resolution

I've seen revlog._deltachain() appear in a number of performance
profiles. I suspect there are 2 reasons for this:

1. Delta chain resolution performs many index lookups, thus triggering
   population of index tuples. Creating possibly tens of thousands of
   PyObject will have overhead.
2. Delta chain resolution is a tight loop.

By moving delta chain resolution to C, we can defer instantiation
of full index entry tuples and make the loop faster courtesy of
not running in Python.

We can measure the impact to delta chain resolution via
`hg perflogrevision` using the mozilla-central repo with a recent
manifest having delta chain length of 33726:

$ hg perfrevlogrevision -m 364895
! full
! wall 0.367585 comb 0.370000 user 0.340000 sys 0.030000 (best of 27)
! wall 0.357581 comb 0.360000 user 0.350000 sys 0.010000 (best of 28)
! deltachain
! wall 0.010644 comb 0.010000 user 0.010000 sys 0.000000 (best of 270)
! wall 0.000292 comb 0.000000 user 0.000000 sys 0.000000 (best of 8729)

$ hg perfrevlogrevision --cache -m 364895
! deltachain
! wall 0.003904 comb 0.000000 user 0.000000 sys 0.000000 (best of 712)
! wall 0.000284 comb 0.000000 user 0.000000 sys 0.000000 (best of 9926)

The first test measures savings from both not instantiating index
entries and moving to C. The second test (which doesn't clear the
index caches) essentially isolates the benefits of moving from Python
to C. It still shows a 13.7x speedup (versus 36.4x). And there are
multiple milliseconds of savings within the critical path for resolving
revision data. I think that justifies the existence of C code.

A more striking example of the benefits of this change can be
demonstrated by timing `hg debugdeltachain -m` for the mozilla-central
repo:

$ time hg debugdeltachain -m > /dev/null
before:        1057.4s
after:          503.3s
PyPy2.7 5.8.0:  220.0s

It's worth noting that the C code isn't as optimal as it could be.
We're still instantiating a new PyObject for every revision. A future
optimization would be to reuse the PyObject on the cached index tuple.
We could potentially also get wins by using a memory array of raw
integers. There is also room for a delta chain cache on revlog
instances. Of course, the best optimization is to implement revlog
reading outside of Python so Python doesn't need to be concerned
about the relatively expensive index entries and operations on them.
Yuya Nishihara - June 27, 2017, 2:24 p.m.
On Sun, 25 Jun 2017 12:42:53 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1498419694 25200
> #      Sun Jun 25 12:41:34 2017 -0700
> # Node ID 9e98095e4daca1f22c70f5374d1110c81be98cc4
> # Parent  8e3021fd1a44e48a4720bb6fa4538fba399ad213
> rev: C implementation of delta chain resolution

Generally looks good to me as this is a direct reimplementation of Python
version.

> --- a/mercurial/cext/revlog.c
> +++ b/mercurial/cext/revlog.c
> @@ -816,6 +816,139 @@ bail:
>  	return NULL;
>  }
>  
> +static inline int index_baserev(indexObject *self, int rev)
> +{
> +	const char *data;
> +
> +	if (rev >= self->length - 1) {
> +		PyObject *tuple = PyList_GET_ITEM(self->added,
> +			rev - self->length + 1);
> +		return (int)PyInt_AS_LONG(PyTuple_GET_ITEM(tuple, 3));
> +	}
> +	else {
> +		data = index_deref(self, rev);
> +		if (data == NULL) {
> +			return -2;

Since index_deref() sets MemoryError by itself, callers of index_baserev()
should just return NULL on -2.

> +static PyObject *index_deltachain(indexObject *self, PyObject *args)
> +{
> +	int rev, generaldelta;
> +	PyObject *stoparg;
> +	int stoprev, iterrev, baserev = -1;
> +	int stopped;
> +	PyObject *chain = NULL, *value = NULL, *result = NULL;
> +	const Py_ssize_t length = index_length(self);
> +
> +	if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) {
> +		return NULL;
> +	}
> +
> +	if (PyInt_Check(stoparg)) {
> +		stoprev = (int)PyInt_AsLong(stoparg);
> +		if (stoprev == -1 && PyErr_Occurred()) {
> +			return NULL;
> +		}
> +	}
> +	else if (stoparg == Py_None) {
> +		stoprev = -2;
> +	}
> +	else {
> +		PyErr_SetString(PyExc_ValueError,
> +			"stoprev must be integer or None");
> +		return NULL;
> +	}
> +
> +	if (rev < 0 || rev >= length - 1) {
> +		PyErr_SetString(PyExc_ValueError, "revlog index out of range");
> +		return NULL;
> +	}
> +
> +	chain = PyList_New(0);
> +	if (chain == NULL) {
> +		return NULL;
> +	}
> +
> +	baserev = index_baserev(self, rev);
> +
> +	/* This should never happen. */
> +	if (baserev == -2) {
> +		PyErr_SetString(PyExc_IndexError, "unable to resolve data");
> +		goto bail;
> +	}
> +
> +	iterrev = rev;
> +
> +	while (iterrev != baserev && iterrev != stoprev) {
> +		value = PyInt_FromLong(iterrev);

Perhaps the scope of "value" could be narrowed as it isn't processed at the
global bail routine.

> +		if (value == NULL) {
> +			goto bail;
> +		}
> +		if (PyList_Append(chain, value)) {
> +			Py_DECREF(value);
> +			goto bail;
> +		}
> +		Py_DECREF(value);
> +
> +		if (generaldelta) {
> +			iterrev = baserev;
> +		}
> +		else {
> +			iterrev--;
> +		}
> +
> +		if (iterrev < 0) {
> +			break;
> +		}
> +
> +		if (iterrev >= length) {

Maybe 'iterrev >= length - 1'? index_baserev() seems not to handle the index
of the null entry, and the real null revision should be caught by the "break"
above.

> +			PyErr_SetString(PyExc_IndexError, "revision outside index");
> +			return NULL;
> +		}
Jun Wu - June 30, 2017, 3:45 a.m.
Looks good to me. I have some minor comments, which don't affect correctness
and only count for 2% performance difference. So nothing serious. I'd
suggest queue this now, and we can improve later.

Excerpts from 's message of 2017-06-25 12:42:53 -0700:
> +static inline int index_baserev(indexObject *self, int rev)

Might worth a comment that "rev" range check must be done by the caller. So
new users of this function will be less likely to cause SIGSEGV.

> +    if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) {

"generaldelta" could be part of indexObject (passed via initfunc) to avoid
"self._generaldelta" in Python code which is a hash lookup. That also makes
it possible to have a same signature with self._deltachain.

> [...]
> +    /* This should never happen. */
> +    if (baserev == -2) {

Might use "<= -2" to rule out other strange cases.

> +        # Try C implementation.
> +        try:
> +            return self.index.deltachain(rev, stoprev, self._generaldelta)
> +        except AttributeError:
> +            pass
> +

I think it's better to replace the function instead of calling "try" each
time. Like in revlog.__init__:

  try:
      self._deltachain = self.index.deltachain
  except AttributeError:
      pass

With the above changes (avoid passing generaldelta and replace the method
directly), perfrevlogrevision --cache deltachain gets 2% improvement.
Smaller than I excepted, so might be not that worthy to do the change.
Yuya Nishihara - July 1, 2017, 3:39 a.m.
On Tue, 27 Jun 2017 23:24:43 +0900, Yuya Nishihara wrote:
> On Sun, 25 Jun 2017 12:42:53 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1498419694 25200
> > #      Sun Jun 25 12:41:34 2017 -0700
> > # Node ID 9e98095e4daca1f22c70f5374d1110c81be98cc4
> > # Parent  8e3021fd1a44e48a4720bb6fa4538fba399ad213
> > rev: C implementation of delta chain resolution

> > +		if (value == NULL) {
> > +			goto bail;
> > +		}
> > +		if (PyList_Append(chain, value)) {
> > +			Py_DECREF(value);
> > +			goto bail;
> > +		}
> > +		Py_DECREF(value);
> > +
> > +		if (generaldelta) {
> > +			iterrev = baserev;
> > +		}
> > +		else {
> > +			iterrev--;
> > +		}
> > +
> > +		if (iterrev < 0) {
> > +			break;
> > +		}
> > +
> > +		if (iterrev >= length) {
> 
> Maybe 'iterrev >= length - 1'? index_baserev() seems not to handle the index
> of the null entry, and the real null revision should be caught by the "break"
> above.

Fixed this in flight (because it could lead to SEGV) and queued, thanks.

Please send follow ups for the other nits.
Gregory Szorc - July 2, 2017, 3:02 a.m.
On Thu, Jun 29, 2017 at 8:45 PM, Jun Wu <quark@fb.com> wrote:

> Looks good to me. I have some minor comments, which don't affect
> correctness
> and only count for 2% performance difference. So nothing serious. I'd
> suggest queue this now, and we can improve later.
>
> Excerpts from 's message of 2017-06-25 12:42:53 -0700:
> > +static inline int index_baserev(indexObject *self, int rev)
>
> Might worth a comment that "rev" range check must be done by the caller. So
> new users of this function will be less likely to cause SIGSEGV.
>
> > +    if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) {
>
> "generaldelta" could be part of indexObject (passed via initfunc) to avoid
> "self._generaldelta" in Python code which is a hash lookup. That also makes
> it possible to have a same signature with self._deltachain.
>
> > [...]
> > +    /* This should never happen. */
> > +    if (baserev == -2) {
>
> Might use "<= -2" to rule out other strange cases.
>
> > +        # Try C implementation.
> > +        try:
> > +            return self.index.deltachain(rev, stoprev,
> self._generaldelta)
> > +        except AttributeError:
> > +            pass
> > +
>
> I think it's better to replace the function instead of calling "try" each
> time. Like in revlog.__init__:
>
>   try:
>       self._deltachain = self.index.deltachain
>   except AttributeError:
>       pass
>
> With the above changes (avoid passing generaldelta and replace the method
> directly), perfrevlogrevision --cache deltachain gets 2% improvement.
> Smaller than I excepted, so might be not that worthy to do the change.
>

I agree with this feedback to make the signature identical and not having
to check attribute presence on every invocation. However, it is beyond my
effort to benefit ratio, so I'm not going to implement it.

Patch

diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c
+++ b/mercurial/cext/revlog.c
@@ -816,6 +816,139 @@  bail:
 	return NULL;
 }
 
+static inline int index_baserev(indexObject *self, int rev)
+{
+	const char *data;
+
+	if (rev >= self->length - 1) {
+		PyObject *tuple = PyList_GET_ITEM(self->added,
+			rev - self->length + 1);
+		return (int)PyInt_AS_LONG(PyTuple_GET_ITEM(tuple, 3));
+	}
+	else {
+		data = index_deref(self, rev);
+		if (data == NULL) {
+			return -2;
+		}
+
+		return getbe32(data + 16);
+	}
+}
+
+static PyObject *index_deltachain(indexObject *self, PyObject *args)
+{
+	int rev, generaldelta;
+	PyObject *stoparg;
+	int stoprev, iterrev, baserev = -1;
+	int stopped;
+	PyObject *chain = NULL, *value = NULL, *result = NULL;
+	const Py_ssize_t length = index_length(self);
+
+	if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) {
+		return NULL;
+	}
+
+	if (PyInt_Check(stoparg)) {
+		stoprev = (int)PyInt_AsLong(stoparg);
+		if (stoprev == -1 && PyErr_Occurred()) {
+			return NULL;
+		}
+	}
+	else if (stoparg == Py_None) {
+		stoprev = -2;
+	}
+	else {
+		PyErr_SetString(PyExc_ValueError,
+			"stoprev must be integer or None");
+		return NULL;
+	}
+
+	if (rev < 0 || rev >= length - 1) {
+		PyErr_SetString(PyExc_ValueError, "revlog index out of range");
+		return NULL;
+	}
+
+	chain = PyList_New(0);
+	if (chain == NULL) {
+		return NULL;
+	}
+
+	baserev = index_baserev(self, rev);
+
+	/* This should never happen. */
+	if (baserev == -2) {
+		PyErr_SetString(PyExc_IndexError, "unable to resolve data");
+		goto bail;
+	}
+
+	iterrev = rev;
+
+	while (iterrev != baserev && iterrev != stoprev) {
+		value = PyInt_FromLong(iterrev);
+		if (value == NULL) {
+			goto bail;
+		}
+		if (PyList_Append(chain, value)) {
+			Py_DECREF(value);
+			goto bail;
+		}
+		Py_DECREF(value);
+
+		if (generaldelta) {
+			iterrev = baserev;
+		}
+		else {
+			iterrev--;
+		}
+
+		if (iterrev < 0) {
+			break;
+		}
+
+		if (iterrev >= length) {
+			PyErr_SetString(PyExc_IndexError, "revision outside index");
+			return NULL;
+		}
+
+		baserev = index_baserev(self, iterrev);
+
+		/* This should never happen. */
+		if (baserev == -2) {
+			PyErr_SetString(PyExc_IndexError, "unable to resolve data");
+			goto bail;
+		}
+	}
+
+	if (iterrev == stoprev) {
+		stopped = 1;
+	}
+	else {
+		value = PyInt_FromLong(iterrev);
+		if (value == NULL) {
+			goto bail;
+		}
+		if (PyList_Append(chain, value)) {
+			Py_DECREF(value);
+			goto bail;
+		}
+		Py_DECREF(value);
+
+		stopped = 0;
+	}
+
+	if (PyList_Reverse(chain)) {
+		goto bail;
+	}
+
+	result = Py_BuildValue("OO", chain, stopped ? Py_True : Py_False);
+	Py_DECREF(chain);
+	return result;
+
+bail:
+	Py_DECREF(chain);
+	return NULL;
+}
+
 static inline int nt_level(const char *node, Py_ssize_t level)
 {
 	int v = node[level>>1];
@@ -1828,6 +1961,8 @@  static PyMethodDef index_methods[] = {
 	 "get head revisions"}, /* Can do filtering since 3.2 */
 	{"headrevsfiltered", (PyCFunction)index_headrevs, METH_VARARGS,
 	 "get filtered head revisions"}, /* Can always do filtering */
+	{"deltachain", (PyCFunction)index_deltachain, METH_VARARGS,
+	 "determine revisions with deltas to reconstruct fulltext"},
 	{"insert", (PyCFunction)index_insert, METH_VARARGS,
 	 "insert an index entry"},
 	{"partialmatch", (PyCFunction)index_partialmatch, METH_VARARGS,
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -570,6 +570,12 @@  class revlog(object):
         revs in ascending order and ``stopped`` is a bool indicating whether
         ``stoprev`` was hit.
         """
+        # Try C implementation.
+        try:
+            return self.index.deltachain(rev, stoprev, self._generaldelta)
+        except AttributeError:
+            pass
+
         chain = []
 
         # Alias to prevent attribute lookup in tight loop.