Patchwork [4,of,6] obsolete: add native version of addsuccessors()

login
register
mail settings
Submitter Martin von Zweigbergk
Date Feb. 4, 2015, 4:01 a.m.
Message ID <c6828c4acf92b415ee72.1423022471@martinvonz.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/7652/
State Superseded
Headers show

Comments

Martin von Zweigbergk - Feb. 4, 2015, 4:01 a.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1421909060 28800
#      Wed Jan 21 22:44:20 2015 -0800
# Node ID c6828c4acf92b415ee722371d781b28870f17e7e
# Parent  cd65758885ab3cc6649001fa9aa68105d818323a
obsolete: add native version of addsuccessors()

This speeds up 'hg status' from 0.438s to 0.425s.
Martin von Zweigbergk - Feb. 4, 2015, 4:05 a.m.
On Tue Feb 03 2015 at 8:01:14 PM Martin von Zweigbergk <
martinvonz@google.com> wrote:

> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1421909060 28800
> #      Wed Jan 21 22:44:20 2015 -0800
> # Node ID c6828c4acf92b415ee722371d781b28870f17e7e
> # Parent  cd65758885ab3cc6649001fa9aa68105d818323a
> obsolete: add native version of addsuccessors()
>
> This speeds up 'hg status' from 0.438s to 0.425s.
>

Is this enough make it worth the extra code? I consider 3/6 worth its cost,
but I'm not sure about 4/6-6/6.

Pierre-Yves, I don't even know what a good test case for 5/6 and 6/6 would
be. When are precursors and children used?


> diff -r cd65758885ab -r c6828c4acf92 mercurial/obsolete.py
> --- a/mercurial/obsolete.py     Tue Jan 20 22:34:57 2015 -0800
> +++ b/mercurial/obsolete.py     Wed Jan 21 22:44:20 2015 -0800
> @@ -607,6 +607,11 @@
>
>      @util.nogc
>      def _addsuccessors(self, markers):
> +        native = getattr(parsers, 'addsuccessors', None)
> +        if native:
> +            native(markers, self._successors)
> +            return
> +
>          for mark in markers:
>              self._successors.setdefault(mark[0], set()).add(mark)
>
> diff -r cd65758885ab -r c6828c4acf92 mercurial/parsers.c
> --- a/mercurial/parsers.c       Tue Jan 20 22:34:57 2015 -0800
> +++ b/mercurial/parsers.c       Wed Jan 21 22:44:20 2015 -0800
> @@ -2294,6 +2294,46 @@
>         return NULL;
>  }
>
> +static PyObject *setdefaultemptyset(PyObject *dict, PyObject *key) {
> +       PyObject *set = PyDict_GetItem(dict, key);
> +       if (!set) {
> +               set = PySet_New(NULL);
> +               if (!set) {
> +                       return NULL;
> +               }
> +               PyDict_SetItem(dict, key, set);
> +               Py_DECREF(set);
> +       }
> +       return set;
> +}
> +
> +static PyObject *addsuccessors(PyObject *self, PyObject *args) {
> +       PyObject *markers;
> +       PyObject *successors;
> +       PyObject *iterator;
> +       PyObject *marker;
> +
> +       if (!PyArg_ParseTuple(args, "OO", &markers, &successors)) {
> +               return NULL;
> +       }
> +
> +       iterator = PyObject_GetIter(markers);
> +       if (!iterator) {
> +               return NULL;
> +       }
> +       while ((marker = PyIter_Next(iterator))) {
> +               PyObject *precursor = PyTuple_GetItem(marker, 0);
> +               PyObject *set = setdefaultemptyset(successors, precursor);
> +               if (!set) {
> +                       return NULL;
> +               }
> +               PySet_Add(set, marker);
> +               Py_DECREF(marker);
> +       }
> +       Py_DECREF(iterator);
> +       return Py_None;
> +}
> +
>  static char parsers_doc[] = "Efficient content parsing.";
>
>  PyObject *encodedir(PyObject *self, PyObject *args);
> @@ -2311,6 +2351,8 @@
>         {"lowerencode", lowerencode, METH_VARARGS, "lower-encode a
> path\n"},
>         {"fm1readmarkers", fm1readmarkers, METH_VARARGS,
>                         "parse v1 obsolete markers\n"},
> +       {"addsuccessors", addsuccessors, METH_VARARGS,
> +                       "updates successors from markers\n"},
>         {NULL, NULL}
>  };
>
>
Matt Mackall - Feb. 4, 2015, 10:09 p.m.
On Wed, 2015-02-04 at 04:05 +0000, Martin von Zweigbergk wrote:
> On Tue Feb 03 2015 at 8:01:14 PM Martin von Zweigbergk <
> martinvonz@google.com> wrote:
> 
> > # HG changeset patch
> > # User Martin von Zweigbergk <martinvonz@google.com>
> > # Date 1421909060 28800
> > #      Wed Jan 21 22:44:20 2015 -0800
> > # Node ID c6828c4acf92b415ee722371d781b28870f17e7e
> > # Parent  cd65758885ab3cc6649001fa9aa68105d818323a
> > obsolete: add native version of addsuccessors()
> >
> > This speeds up 'hg status' from 0.438s to 0.425s.
> >
> 
> Is this enough make it worth the extra code? I consider 3/6 worth its cost,
> but I'm not sure about 4/6-6/6.

One wonders why status cares about markers _at all_. Something is wrong
here.

In the past, I've considered making a bulk function 4x faster to be a
good threshold for going to C. I think the real win here would be
delaying actual parsing of markers into corresponding Python objects.
This isn't smelling like a big enough win.

We might want to look at duplicating the magic the index parser uses. It
reads a whole revlog index into memory and then lazily emulates the
Python containers we had before.
Martin von Zweigbergk - Feb. 4, 2015, 10:18 p.m.
On Wed Feb 04 2015 at 2:09:25 PM Matt Mackall <mpm@selenic.com> wrote:

> On Wed, 2015-02-04 at 04:05 +0000, Martin von Zweigbergk wrote:
> > On Tue Feb 03 2015 at 8:01:14 PM Martin von Zweigbergk <
> > martinvonz@google.com> wrote:
> >
> > > # HG changeset patch
> > > # User Martin von Zweigbergk <martinvonz@google.com>
> > > # Date 1421909060 28800
> > > #      Wed Jan 21 22:44:20 2015 -0800
> > > # Node ID c6828c4acf92b415ee722371d781b28870f17e7e
> > > # Parent  cd65758885ab3cc6649001fa9aa68105d818323a
> > > obsolete: add native version of addsuccessors()
> > >
> > > This speeds up 'hg status' from 0.438s to 0.425s.
> > >
> >
> > Is this enough make it worth the extra code? I consider 3/6 worth its
> cost,
> > but I'm not sure about 4/6-6/6.
>
> One wonders why status cares about markers _at all_. Something is wrong
> here.
>
> In the past, I've considered making a bulk function 4x faster to be a
> good threshold for going to C. I think the real win here would be
> delaying actual parsing of markers into corresponding Python objects.
> This isn't smelling like a big enough win.
>
> We might want to look at duplicating the magic the index parser uses. It
> reads a whole revlog index into memory and then lazily emulates the
> Python containers we had before.
>
> --
> Mathematics is the supreme nostalgia of our time.
>
>
>
Yes, I agree. I'm with with dropping these last 3 patches for now. I have a
few patches that stop 'hg status' and 'hg diff' from reading the markers,
but they're probably not the right approach.

It wasn't quite clear to me what you think about the first 3. They would
still be useful for e.g. "hg log --limit 10" or so, which does need to read
markers, but which doesn't need to populate precursors and children.
Matt Mackall - Feb. 4, 2015, 10:48 p.m.
On Wed, 2015-02-04 at 22:18 +0000, Martin von Zweigbergk wrote:
> On Wed Feb 04 2015 at 2:09:25 PM Matt Mackall <mpm@selenic.com> wrote:
> 
> > On Wed, 2015-02-04 at 04:05 +0000, Martin von Zweigbergk wrote:
> > > On Tue Feb 03 2015 at 8:01:14 PM Martin von Zweigbergk <
> > > martinvonz@google.com> wrote:
> > >
> > > > # HG changeset patch
> > > > # User Martin von Zweigbergk <martinvonz@google.com>
> > > > # Date 1421909060 28800
> > > > #      Wed Jan 21 22:44:20 2015 -0800
> > > > # Node ID c6828c4acf92b415ee722371d781b28870f17e7e
> > > > # Parent  cd65758885ab3cc6649001fa9aa68105d818323a
> > > > obsolete: add native version of addsuccessors()
> > > >
> > > > This speeds up 'hg status' from 0.438s to 0.425s.
> > > >
> > >
> > > Is this enough make it worth the extra code? I consider 3/6 worth its
> > cost,
> > > but I'm not sure about 4/6-6/6.
> >
> > One wonders why status cares about markers _at all_. Something is wrong
> > here.
> >
> > In the past, I've considered making a bulk function 4x faster to be a
> > good threshold for going to C. I think the real win here would be
> > delaying actual parsing of markers into corresponding Python objects.
> > This isn't smelling like a big enough win.
> >
> > We might want to look at duplicating the magic the index parser uses. It
> > reads a whole revlog index into memory and then lazily emulates the
> > Python containers we had before.
> >
> > --
> > Mathematics is the supreme nostalgia of our time.
> >
> >
> >
> Yes, I agree. I'm with with dropping these last 3 patches for now. I have a
> few patches that stop 'hg status' and 'hg diff' from reading the markers,
> but they're probably not the right approach.
> 
> It wasn't quite clear to me what you think about the first 3. They would
> still be useful for e.g. "hg log --limit 10" or so, which does need to read
> markers, but which doesn't need to populate precursors and children.

I'm leaning towards deferring these a bit, since we've just landed some
big C code here and have some other low-hanging fruit to think about. As
your PoC status patch shows, it's a much bigger win for a frequent
command to just avoid loading markers entirely.
Matt Mackall - Feb. 4, 2015, 10:50 p.m.
On Wed, 2015-02-04 at 16:48 -0600, Matt Mackall wrote:
> On Wed, 2015-02-04 at 22:18 +0000, Martin von Zweigbergk wrote:
> > On Wed Feb 04 2015 at 2:09:25 PM Matt Mackall <mpm@selenic.com> wrote:
> > 
> > > On Wed, 2015-02-04 at 04:05 +0000, Martin von Zweigbergk wrote:
> > > > On Tue Feb 03 2015 at 8:01:14 PM Martin von Zweigbergk <
> > > > martinvonz@google.com> wrote:
> > > >
> > > > > # HG changeset patch
> > > > > # User Martin von Zweigbergk <martinvonz@google.com>
> > > > > # Date 1421909060 28800
> > > > > #      Wed Jan 21 22:44:20 2015 -0800
> > > > > # Node ID c6828c4acf92b415ee722371d781b28870f17e7e
> > > > > # Parent  cd65758885ab3cc6649001fa9aa68105d818323a
> > > > > obsolete: add native version of addsuccessors()
> > > > >
> > > > > This speeds up 'hg status' from 0.438s to 0.425s.
> > > > >
> > > >
> > > > Is this enough make it worth the extra code? I consider 3/6 worth its
> > > cost,
> > > > but I'm not sure about 4/6-6/6.
> > >
> > > One wonders why status cares about markers _at all_. Something is wrong
> > > here.
> > >
> > > In the past, I've considered making a bulk function 4x faster to be a
> > > good threshold for going to C. I think the real win here would be
> > > delaying actual parsing of markers into corresponding Python objects.
> > > This isn't smelling like a big enough win.
> > >
> > > We might want to look at duplicating the magic the index parser uses. It
> > > reads a whole revlog index into memory and then lazily emulates the
> > > Python containers we had before.
> > >
> > > --
> > > Mathematics is the supreme nostalgia of our time.
> > >
> > >
> > >
> > Yes, I agree. I'm with with dropping these last 3 patches for now. I have a
> > few patches that stop 'hg status' and 'hg diff' from reading the markers,
> > but they're probably not the right approach.
> > 
> > It wasn't quite clear to me what you think about the first 3. They would
> > still be useful for e.g. "hg log --limit 10" or so, which does need to read
> > markers, but which doesn't need to populate precursors and children.
> 
> I'm leaning towards deferring these a bit, since we've just landed some
> big C code here and have some other low-hanging fruit to think about.

Oops, I take that back, I misremembered seeing another chunk of C in the
first three patches. Still on my plate.

Patch

diff -r cd65758885ab -r c6828c4acf92 mercurial/obsolete.py
--- a/mercurial/obsolete.py	Tue Jan 20 22:34:57 2015 -0800
+++ b/mercurial/obsolete.py	Wed Jan 21 22:44:20 2015 -0800
@@ -607,6 +607,11 @@ 
 
     @util.nogc
     def _addsuccessors(self, markers):
+        native = getattr(parsers, 'addsuccessors', None)
+        if native:
+            native(markers, self._successors)
+            return
+
         for mark in markers:
             self._successors.setdefault(mark[0], set()).add(mark)
 
diff -r cd65758885ab -r c6828c4acf92 mercurial/parsers.c
--- a/mercurial/parsers.c	Tue Jan 20 22:34:57 2015 -0800
+++ b/mercurial/parsers.c	Wed Jan 21 22:44:20 2015 -0800
@@ -2294,6 +2294,46 @@ 
 	return NULL;
 }
 
+static PyObject *setdefaultemptyset(PyObject *dict, PyObject *key) {
+	PyObject *set = PyDict_GetItem(dict, key);
+	if (!set) {
+		set = PySet_New(NULL);
+		if (!set) {
+			return NULL;
+		}
+		PyDict_SetItem(dict, key, set);
+		Py_DECREF(set);
+	}
+	return set;
+}
+
+static PyObject *addsuccessors(PyObject *self, PyObject *args) {
+	PyObject *markers;
+	PyObject *successors;
+	PyObject *iterator;
+	PyObject *marker;
+
+	if (!PyArg_ParseTuple(args, "OO", &markers, &successors)) {
+		return NULL;
+	}
+
+	iterator = PyObject_GetIter(markers);
+	if (!iterator) {
+		return NULL;
+	}
+	while ((marker = PyIter_Next(iterator))) {
+		PyObject *precursor = PyTuple_GetItem(marker, 0);
+		PyObject *set = setdefaultemptyset(successors, precursor);
+		if (!set) {
+			return NULL;
+		}
+		PySet_Add(set, marker);
+		Py_DECREF(marker);
+	}
+	Py_DECREF(iterator);
+	return Py_None;
+}
+
 static char parsers_doc[] = "Efficient content parsing.";
 
 PyObject *encodedir(PyObject *self, PyObject *args);
@@ -2311,6 +2351,8 @@ 
 	{"lowerencode", lowerencode, METH_VARARGS, "lower-encode a path\n"},
 	{"fm1readmarkers", fm1readmarkers, METH_VARARGS,
 			"parse v1 obsolete markers\n"},
+	{"addsuccessors", addsuccessors, METH_VARARGS,
+			"updates successors from markers\n"},
 	{NULL, NULL}
 };