Submitter | Gregory Szorc |
---|---|
Date | June 25, 2017, 7:42 p.m. |
Message ID | <9e98095e4daca1f22c70.1498419773@ubuntu-vm-main> |
Download | mbox | patch |
Permalink | /patch/21712/ |
State | Accepted |
Headers | show |
Comments
On Sun, 25 Jun 2017 12:42:53 -0700, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1498419694 25200 > # Sun Jun 25 12:41:34 2017 -0700 > # Node ID 9e98095e4daca1f22c70f5374d1110c81be98cc4 > # Parent 8e3021fd1a44e48a4720bb6fa4538fba399ad213 > rev: C implementation of delta chain resolution Generally looks good to me as this is a direct reimplementation of Python version. > --- a/mercurial/cext/revlog.c > +++ b/mercurial/cext/revlog.c > @@ -816,6 +816,139 @@ bail: > return NULL; > } > > +static inline int index_baserev(indexObject *self, int rev) > +{ > + const char *data; > + > + if (rev >= self->length - 1) { > + PyObject *tuple = PyList_GET_ITEM(self->added, > + rev - self->length + 1); > + return (int)PyInt_AS_LONG(PyTuple_GET_ITEM(tuple, 3)); > + } > + else { > + data = index_deref(self, rev); > + if (data == NULL) { > + return -2; Since index_deref() sets MemoryError by itself, callers of index_baserev() should just return NULL on -2. > +static PyObject *index_deltachain(indexObject *self, PyObject *args) > +{ > + int rev, generaldelta; > + PyObject *stoparg; > + int stoprev, iterrev, baserev = -1; > + int stopped; > + PyObject *chain = NULL, *value = NULL, *result = NULL; > + const Py_ssize_t length = index_length(self); > + > + if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) { > + return NULL; > + } > + > + if (PyInt_Check(stoparg)) { > + stoprev = (int)PyInt_AsLong(stoparg); > + if (stoprev == -1 && PyErr_Occurred()) { > + return NULL; > + } > + } > + else if (stoparg == Py_None) { > + stoprev = -2; > + } > + else { > + PyErr_SetString(PyExc_ValueError, > + "stoprev must be integer or None"); > + return NULL; > + } > + > + if (rev < 0 || rev >= length - 1) { > + PyErr_SetString(PyExc_ValueError, "revlog index out of range"); > + return NULL; > + } > + > + chain = PyList_New(0); > + if (chain == NULL) { > + return NULL; > + } > + > + baserev = index_baserev(self, rev); > + > + /* This should never happen. */ > + if (baserev == -2) { > + PyErr_SetString(PyExc_IndexError, "unable to resolve data"); > + goto bail; > + } > + > + iterrev = rev; > + > + while (iterrev != baserev && iterrev != stoprev) { > + value = PyInt_FromLong(iterrev); Perhaps the scope of "value" could be narrowed as it isn't processed at the global bail routine. > + if (value == NULL) { > + goto bail; > + } > + if (PyList_Append(chain, value)) { > + Py_DECREF(value); > + goto bail; > + } > + Py_DECREF(value); > + > + if (generaldelta) { > + iterrev = baserev; > + } > + else { > + iterrev--; > + } > + > + if (iterrev < 0) { > + break; > + } > + > + if (iterrev >= length) { Maybe 'iterrev >= length - 1'? index_baserev() seems not to handle the index of the null entry, and the real null revision should be caught by the "break" above. > + PyErr_SetString(PyExc_IndexError, "revision outside index"); > + return NULL; > + }
Looks good to me. I have some minor comments, which don't affect correctness and only count for 2% performance difference. So nothing serious. I'd suggest queue this now, and we can improve later. Excerpts from 's message of 2017-06-25 12:42:53 -0700: > +static inline int index_baserev(indexObject *self, int rev) Might worth a comment that "rev" range check must be done by the caller. So new users of this function will be less likely to cause SIGSEGV. > + if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) { "generaldelta" could be part of indexObject (passed via initfunc) to avoid "self._generaldelta" in Python code which is a hash lookup. That also makes it possible to have a same signature with self._deltachain. > [...] > + /* This should never happen. */ > + if (baserev == -2) { Might use "<= -2" to rule out other strange cases. > + # Try C implementation. > + try: > + return self.index.deltachain(rev, stoprev, self._generaldelta) > + except AttributeError: > + pass > + I think it's better to replace the function instead of calling "try" each time. Like in revlog.__init__: try: self._deltachain = self.index.deltachain except AttributeError: pass With the above changes (avoid passing generaldelta and replace the method directly), perfrevlogrevision --cache deltachain gets 2% improvement. Smaller than I excepted, so might be not that worthy to do the change.
On Tue, 27 Jun 2017 23:24:43 +0900, Yuya Nishihara wrote: > On Sun, 25 Jun 2017 12:42:53 -0700, Gregory Szorc wrote: > > # HG changeset patch > > # User Gregory Szorc <gregory.szorc@gmail.com> > > # Date 1498419694 25200 > > # Sun Jun 25 12:41:34 2017 -0700 > > # Node ID 9e98095e4daca1f22c70f5374d1110c81be98cc4 > > # Parent 8e3021fd1a44e48a4720bb6fa4538fba399ad213 > > rev: C implementation of delta chain resolution > > + if (value == NULL) { > > + goto bail; > > + } > > + if (PyList_Append(chain, value)) { > > + Py_DECREF(value); > > + goto bail; > > + } > > + Py_DECREF(value); > > + > > + if (generaldelta) { > > + iterrev = baserev; > > + } > > + else { > > + iterrev--; > > + } > > + > > + if (iterrev < 0) { > > + break; > > + } > > + > > + if (iterrev >= length) { > > Maybe 'iterrev >= length - 1'? index_baserev() seems not to handle the index > of the null entry, and the real null revision should be caught by the "break" > above. Fixed this in flight (because it could lead to SEGV) and queued, thanks. Please send follow ups for the other nits.
On Thu, Jun 29, 2017 at 8:45 PM, Jun Wu <quark@fb.com> wrote: > Looks good to me. I have some minor comments, which don't affect > correctness > and only count for 2% performance difference. So nothing serious. I'd > suggest queue this now, and we can improve later. > > Excerpts from 's message of 2017-06-25 12:42:53 -0700: > > +static inline int index_baserev(indexObject *self, int rev) > > Might worth a comment that "rev" range check must be done by the caller. So > new users of this function will be less likely to cause SIGSEGV. > > > + if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) { > > "generaldelta" could be part of indexObject (passed via initfunc) to avoid > "self._generaldelta" in Python code which is a hash lookup. That also makes > it possible to have a same signature with self._deltachain. > > > [...] > > + /* This should never happen. */ > > + if (baserev == -2) { > > Might use "<= -2" to rule out other strange cases. > > > + # Try C implementation. > > + try: > > + return self.index.deltachain(rev, stoprev, > self._generaldelta) > > + except AttributeError: > > + pass > > + > > I think it's better to replace the function instead of calling "try" each > time. Like in revlog.__init__: > > try: > self._deltachain = self.index.deltachain > except AttributeError: > pass > > With the above changes (avoid passing generaldelta and replace the method > directly), perfrevlogrevision --cache deltachain gets 2% improvement. > Smaller than I excepted, so might be not that worthy to do the change. > I agree with this feedback to make the signature identical and not having to check attribute presence on every invocation. However, it is beyond my effort to benefit ratio, so I'm not going to implement it.
Patch
diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c --- a/mercurial/cext/revlog.c +++ b/mercurial/cext/revlog.c @@ -816,6 +816,139 @@ bail: return NULL; } +static inline int index_baserev(indexObject *self, int rev) +{ + const char *data; + + if (rev >= self->length - 1) { + PyObject *tuple = PyList_GET_ITEM(self->added, + rev - self->length + 1); + return (int)PyInt_AS_LONG(PyTuple_GET_ITEM(tuple, 3)); + } + else { + data = index_deref(self, rev); + if (data == NULL) { + return -2; + } + + return getbe32(data + 16); + } +} + +static PyObject *index_deltachain(indexObject *self, PyObject *args) +{ + int rev, generaldelta; + PyObject *stoparg; + int stoprev, iterrev, baserev = -1; + int stopped; + PyObject *chain = NULL, *value = NULL, *result = NULL; + const Py_ssize_t length = index_length(self); + + if (!PyArg_ParseTuple(args, "iOi", &rev, &stoparg, &generaldelta)) { + return NULL; + } + + if (PyInt_Check(stoparg)) { + stoprev = (int)PyInt_AsLong(stoparg); + if (stoprev == -1 && PyErr_Occurred()) { + return NULL; + } + } + else if (stoparg == Py_None) { + stoprev = -2; + } + else { + PyErr_SetString(PyExc_ValueError, + "stoprev must be integer or None"); + return NULL; + } + + if (rev < 0 || rev >= length - 1) { + PyErr_SetString(PyExc_ValueError, "revlog index out of range"); + return NULL; + } + + chain = PyList_New(0); + if (chain == NULL) { + return NULL; + } + + baserev = index_baserev(self, rev); + + /* This should never happen. */ + if (baserev == -2) { + PyErr_SetString(PyExc_IndexError, "unable to resolve data"); + goto bail; + } + + iterrev = rev; + + while (iterrev != baserev && iterrev != stoprev) { + value = PyInt_FromLong(iterrev); + if (value == NULL) { + goto bail; + } + if (PyList_Append(chain, value)) { + Py_DECREF(value); + goto bail; + } + Py_DECREF(value); + + if (generaldelta) { + iterrev = baserev; + } + else { + iterrev--; + } + + if (iterrev < 0) { + break; + } + + if (iterrev >= length) { + PyErr_SetString(PyExc_IndexError, "revision outside index"); + return NULL; + } + + baserev = index_baserev(self, iterrev); + + /* This should never happen. */ + if (baserev == -2) { + PyErr_SetString(PyExc_IndexError, "unable to resolve data"); + goto bail; + } + } + + if (iterrev == stoprev) { + stopped = 1; + } + else { + value = PyInt_FromLong(iterrev); + if (value == NULL) { + goto bail; + } + if (PyList_Append(chain, value)) { + Py_DECREF(value); + goto bail; + } + Py_DECREF(value); + + stopped = 0; + } + + if (PyList_Reverse(chain)) { + goto bail; + } + + result = Py_BuildValue("OO", chain, stopped ? Py_True : Py_False); + Py_DECREF(chain); + return result; + +bail: + Py_DECREF(chain); + return NULL; +} + static inline int nt_level(const char *node, Py_ssize_t level) { int v = node[level>>1]; @@ -1828,6 +1961,8 @@ static PyMethodDef index_methods[] = { "get head revisions"}, /* Can do filtering since 3.2 */ {"headrevsfiltered", (PyCFunction)index_headrevs, METH_VARARGS, "get filtered head revisions"}, /* Can always do filtering */ + {"deltachain", (PyCFunction)index_deltachain, METH_VARARGS, + "determine revisions with deltas to reconstruct fulltext"}, {"insert", (PyCFunction)index_insert, METH_VARARGS, "insert an index entry"}, {"partialmatch", (PyCFunction)index_partialmatch, METH_VARARGS, diff --git a/mercurial/revlog.py b/mercurial/revlog.py --- a/mercurial/revlog.py +++ b/mercurial/revlog.py @@ -570,6 +570,12 @@ class revlog(object): revs in ascending order and ``stopped`` is a bool indicating whether ``stoprev`` was hit. """ + # Try C implementation. + try: + return self.index.deltachain(rev, stoprev, self._generaldelta) + except AttributeError: + pass + chain = [] # Alias to prevent attribute lookup in tight loop.