Patchwork [STABLE] parsers: raise an unambiguous type during index lookup (issue4451)

login
register
mail settings
Submitter Gregory Szorc
Date June 10, 2015, 4:50 p.m.
Message ID <1b4b923551c0a37c129b.1433955031@gps-mbp.local>
Download mbox | patch
Permalink /patch/9581/
State Superseded
Headers show

Comments

Gregory Szorc - June 10, 2015, 4:50 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1433955009 25200
#      Wed Jun 10 09:50:09 2015 -0700
# Branch stable
# Node ID 1b4b923551c0a37c129b1d1c530448685e5ca2a2
# Parent  7298da81f5a9f64ebbdef2b2195585a65da0f99e
parsers: raise an unambiguous type during index lookup (issue4451)

Previously, index lookups would attempt to raise RevlogError when the
lookup failed. This worked in most circumstances. However, in
environments where there could be multiple Python interpreters, this
could fail.

The way we were raising the error was to cache a reference to
RevlogError in a static variable. This assumed that the
"mercurial.error" module was only loaded once and there was only a
single copy of it floating around in memory. Unfortunately, in
some situations - including certain mod_wsgi configurations - this
was not the case: the "mercurial.error" module could be reloaded.
It was possible for a "RevlogError" reference from the first interpreter
to be used by a second interpreter. While the underlying thing
was a "mercurial.error.RevlogError," the object IDs were different, so
the Python code in revlog.py was failing to catch the exception! This
error has existed since the C index lookup code was implemented in
changeset e8d37b78acfb, which was first released in Mercurial 2.2 in
2012.
http://emptysqua.re/blog/python-c-extensions-and-mod-wsgi/#static-variables-are-shared
contains more details.

This patch changes the index lookup code to raise IndexError - a
built-in exception - instead. This side-steps the problem of finding
a reference to a Python-defined type.

The actual exception used shouldn't really matter, as it is only
relevant in a single call site and that site re-raises a completely
different exception. In other words, the exception merely needs to be a
signal.

Because I've seen C extensions get out of sync with the Python code
that interfaces with them, I retained support for catching RevlogError
in the Python code. This is arguably not necessary.

This is my first Python C extension patch. Furthermore, I'm not too
familiar with the revlog code and am not sure if there is more than 1
consumer of the changed C API. Please give extra scrutiny to this patch.
Augie Fackler - June 10, 2015, 5:29 p.m.
On Wed, Jun 10, 2015 at 09:50:31AM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1433955009 25200
> #      Wed Jun 10 09:50:09 2015 -0700
> # Branch stable
> # Node ID 1b4b923551c0a37c129b1d1c530448685e5ca2a2
> # Parent  7298da81f5a9f64ebbdef2b2195585a65da0f99e
> parsers: raise an unambiguous type during index lookup (issue4451)

The patch as stated looks fine, however, I do have one proposal below
that might be of interest.

[snip]

>
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -1482,45 +1482,8 @@ static int index_find_node(indexObject *
>               return rev;
>       return -2;
>  }
>
> -static PyObject *raise_revlog_error(void)
> -{
> -	static PyObject *errclass;

We could just dispense with static and load the error every time. This
doesn't look like a path we should hit super-frequently in normal
operations.

> -	PyObject *mod = NULL, *errobj;
> -
> -	if (errclass == NULL) {

[snip]
Gregory Szorc - June 10, 2015, 5:34 p.m.
On Wed, Jun 10, 2015 at 10:29 AM, Augie Fackler <raf@durin42.com> wrote:

> On Wed, Jun 10, 2015 at 09:50:31AM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1433955009 25200
> > #      Wed Jun 10 09:50:09 2015 -0700
> > # Branch stable
> > # Node ID 1b4b923551c0a37c129b1d1c530448685e5ca2a2
> > # Parent  7298da81f5a9f64ebbdef2b2195585a65da0f99e
> > parsers: raise an unambiguous type during index lookup (issue4451)
>
> The patch as stated looks fine, however, I do have one proposal below
> that might be of interest.
>
> [snip]
>
> >
> > diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> > --- a/mercurial/parsers.c
> > +++ b/mercurial/parsers.c
> > @@ -1482,45 +1482,8 @@ static int index_find_node(indexObject *
> >               return rev;
> >       return -2;
> >  }
> >
> > -static PyObject *raise_revlog_error(void)
> > -{
> > -     static PyObject *errclass;
>
> We could just dispense with static and load the error every time. This
> doesn't look like a path we should hit super-frequently in normal
> operations.


Perhaps. But as I mentioned in the commit message, the actual type doesn't
really matter. Raising RevlogError from C feels like extra, unjustified
complexity to me. But I'll restore it if that's what is wanted.
Augie Fackler - June 10, 2015, 5:36 p.m.
On Wed, Jun 10, 2015 at 1:34 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> On Wed, Jun 10, 2015 at 10:29 AM, Augie Fackler <raf@durin42.com> wrote:
>>
>> On Wed, Jun 10, 2015 at 09:50:31AM -0700, Gregory Szorc wrote:
>> > # HG changeset patch
>> > # User Gregory Szorc <gregory.szorc@gmail.com>
>> > # Date 1433955009 25200
>> > #      Wed Jun 10 09:50:09 2015 -0700
>> > # Branch stable
>> > # Node ID 1b4b923551c0a37c129b1d1c530448685e5ca2a2
>> > # Parent  7298da81f5a9f64ebbdef2b2195585a65da0f99e
>> > parsers: raise an unambiguous type during index lookup (issue4451)
>>
>> The patch as stated looks fine, however, I do have one proposal below
>> that might be of interest.
>>
>> [snip]
>>
>> >
>> > diff --git a/mercurial/parsers.c b/mercurial/parsers.c
>> > --- a/mercurial/parsers.c
>> > +++ b/mercurial/parsers.c
>> > @@ -1482,45 +1482,8 @@ static int index_find_node(indexObject *
>> >               return rev;
>> >       return -2;
>> >  }
>> >
>> > -static PyObject *raise_revlog_error(void)
>> > -{
>> > -     static PyObject *errclass;
>>
>> We could just dispense with static and load the error every time. This
>> doesn't look like a path we should hit super-frequently in normal
>> operations.
>
>
> Perhaps. But as I mentioned in the commit message, the actual type doesn't
> really matter. Raising RevlogError from C feels like extra, unjustified
> complexity to me. But I'll restore it if that's what is wanted.

I don't feel strongly - I'll leave that up to others. My pedantic side
is twitchy about IndexError for this case, but I also know that I'm
probably being too picky. If marmoute doesn't mind the end result of
this patch, I can certainly live with it. :)
Matt Mackall - June 10, 2015, 5:45 p.m.
On Wed, 2015-06-10 at 09:50 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1433955009 25200
> #      Wed Jun 10 09:50:09 2015 -0700
> # Branch stable
> # Node ID 1b4b923551c0a37c129b1d1c530448685e5ca2a2
> # Parent  7298da81f5a9f64ebbdef2b2195585a65da0f99e
> parsers: raise an unambiguous type during index lookup (issue4451)
> 
> Previously, index lookups would attempt to raise RevlogError when the
> lookup failed. This worked in most circumstances. However, in
> environments where there could be multiple Python interpreters, this
> could fail.

I know I suggested this approach yesterday, but it breaks our C/Python
coupling rule. In particular, if I compile this version and update
backwards without recompiling to bisect something, I'm now broken.

Also, the simplest and lowest-risk fix is simply to drop static, right?
That's the kind of fix we strongly prefer on the stable branch.
Gregory Szorc - June 10, 2015, 8:44 p.m.
On Wed, Jun 10, 2015 at 10:45 AM, Matt Mackall <mpm@selenic.com> wrote:

> On Wed, 2015-06-10 at 09:50 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1433955009 25200
> > #      Wed Jun 10 09:50:09 2015 -0700
> > # Branch stable
> > # Node ID 1b4b923551c0a37c129b1d1c530448685e5ca2a2
> > # Parent  7298da81f5a9f64ebbdef2b2195585a65da0f99e
> > parsers: raise an unambiguous type during index lookup (issue4451)
> >
> > Previously, index lookups would attempt to raise RevlogError when the
> > lookup failed. This worked in most circumstances. However, in
> > environments where there could be multiple Python interpreters, this
> > could fail.
>
> I know I suggested this approach yesterday, but it breaks our C/Python
> coupling rule. In particular, if I compile this version and update
> backwards without recompiling to bisect something, I'm now broken.
>
> Also, the simplest and lowest-risk fix is simply to drop static, right?
> That's the kind of fix we strongly prefer on the stable branch.
>
>
Yes, dropping static is one fix (and really the only one if we want old
Python compatible with new C extension). Although unclear performance
impact - I'm not sure how frequently we look up unknown nodes. Presumably
not a lot?

I'll attempt a patch to remove static once I grok how refcounting works
from C.

Patch

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -1482,45 +1482,8 @@  static int index_find_node(indexObject *
 		return rev;
 	return -2;
 }
 
-static PyObject *raise_revlog_error(void)
-{
-	static PyObject *errclass;
-	PyObject *mod = NULL, *errobj;
-
-	if (errclass == NULL) {
-		PyObject *dict;
-
-		mod = PyImport_ImportModule("mercurial.error");
-		if (mod == NULL)
-			goto classfail;
-
-		dict = PyModule_GetDict(mod);
-		if (dict == NULL)
-			goto classfail;
-
-		errclass = PyDict_GetItemString(dict, "RevlogError");
-		if (errclass == NULL) {
-			PyErr_SetString(PyExc_SystemError,
-					"could not find RevlogError");
-			goto classfail;
-		}
-		Py_INCREF(errclass);
-		Py_DECREF(mod);
-	}
-
-	errobj = PyObject_CallFunction(errclass, NULL);
-	if (errobj == NULL)
-		return NULL;
-	PyErr_SetObject(errclass, errobj);
-	return errobj;
-
-classfail:
-	Py_XDECREF(mod);
-	return NULL;
-}
-
 static PyObject *index_getitem(indexObject *self, PyObject *value)
 {
 	char *node;
 	Py_ssize_t nodelen;
@@ -1533,10 +1496,12 @@  static PyObject *index_getitem(indexObje
 		return NULL;
 	rev = index_find_node(self, node, nodelen);
 	if (rev >= -1)
 		return PyInt_FromLong(rev);
-	if (rev == -2)
-		raise_revlog_error();
+	if (rev == -2) {
+		PyErr_SetNone(PyExc_IndexError);
+		return NULL;
+	}
 	return NULL;
 }
 
 static int nt_partialmatch(indexObject *self, const char *node,
@@ -1593,9 +1558,10 @@  static PyObject *index_partialmatch(inde
 	rev = nt_partialmatch(self, node, nodelen);
 
 	switch (rev) {
 	case -4:
-		raise_revlog_error();
+		PyErr_SetNone(PyExc_IndexError);
+		return NULL;
 	case -3:
 		return NULL;
 	case -2:
 		Py_RETURN_NONE;
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -317,9 +317,11 @@  class revlog(object):
         try:
             return self._nodecache[node]
         except TypeError:
             raise
-        except RevlogError:
+        # Old versions of the C extension raise RevlogError. New versions
+        # raise IndexError.
+        except (IndexError, RevlogError):
             # parsers.c radix tree lookup failed
             raise LookupError(node, self.indexfile, _('no node'))
         except KeyError:
             # pure python cache lookup failed