Patchwork [5,of,7,RFC] parsers: create a dedicated type for an obs marker

login
register
mail settings
Submitter Gregory Szorc
Date March 14, 2017, 5:15 a.m.
Message ID <b4159d4c3683f03b6962.1489468547@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/19321/
State Accepted
Headers show

Comments

Gregory Szorc - March 14, 2017, 5:15 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1489467734 25200
#      Mon Mar 13 22:02:14 2017 -0700
# Node ID b4159d4c3683f03b6962991f318b080d849d6ba3
# Parent  d600bd4edd62b3ee74730f1282e53b9d596bbaec
[RFC] parsers: create a dedicated type for an obs marker

Currently, obs markers are represented by the tuple type. This
works. But as subsequent patches show, we can implement magic
to make things faster.

We introduce a dedicated type to represent a single obs marker.
It behaves pretty much like a tuple, or at least similar enough
for most functionality to work.

We implement a constructor function. It currently just dispatches
to the existing fm1readmarker() function then extracts values
from the returned tuple and destroys the tuple. Obviously this
is inefficient and will be improved in subsequent commits. But
you have to start somewhere.

Surprisingly, `hg perfloadmarkers` shows a speedup with this patch:

! result: 130341
! wall 0.120487 comb 0.130000 user 0.120000 sys 0.010000 (best of 79)
! wall 0.085715 comb 0.080000 user 0.060000 sys 0.020000 (best of 100

I'm not really sure why. I'm kinda wondering if something is
obviously wrong.

TODO tests fail. I think this is due to a missing (rich) comparison
function on the type.
Yuya Nishihara - March 16, 2017, 2:34 p.m.
On Mon, 13 Mar 2017 22:15:47 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1489467734 25200
> #      Mon Mar 13 22:02:14 2017 -0700
> # Node ID b4159d4c3683f03b6962991f318b080d849d6ba3
> # Parent  d600bd4edd62b3ee74730f1282e53b9d596bbaec
> [RFC] parsers: create a dedicated type for an obs marker

I haven't reviewed these carefully, but I like the direction. fm1readmarkers()
creates a bunch of boxed objects, which would have somewhat significant cost.

Can we split obsmarker handling to new module? It's getting bigger.
Gregory Szorc - March 16, 2017, 4:17 p.m.
On Thu, Mar 16, 2017 at 7:34 AM, Yuya Nishihara <yuya@tcha.org> wrote:

> On Mon, 13 Mar 2017 22:15:47 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1489467734 25200
> > #      Mon Mar 13 22:02:14 2017 -0700
> > # Node ID b4159d4c3683f03b6962991f318b080d849d6ba3
> > # Parent  d600bd4edd62b3ee74730f1282e53b9d596bbaec
> > [RFC] parsers: create a dedicated type for an obs marker
>
> I haven't reviewed these carefully, but I like the direction.
> fm1readmarkers()
> creates a bunch of boxed objects, which would have somewhat significant
> cost.
>
> Can we split obsmarker handling to new module? It's getting bigger.
>

Yes, I was considering that.

But instead of creating N+1 Python modules, I'd like to move us in the
direction of a single module built from multiple .c files. This is the
approach I've taken in python-zstandard, for example. I believe this
approach is superior because having a single compilation unit facilitates
code reuse, inlining between source files, smaller binaries, and better
performance.
Augie Fackler - March 16, 2017, 4:51 p.m.
On Thu, Mar 16, 2017 at 09:17:05AM -0700, Gregory Szorc wrote:
> On Thu, Mar 16, 2017 at 7:34 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>
> > On Mon, 13 Mar 2017 22:15:47 -0700, Gregory Szorc wrote:
> > > # HG changeset patch
> > > # User Gregory Szorc <gregory.szorc@gmail.com>
> > > # Date 1489467734 25200
> > > #      Mon Mar 13 22:02:14 2017 -0700
> > > # Node ID b4159d4c3683f03b6962991f318b080d849d6ba3
> > > # Parent  d600bd4edd62b3ee74730f1282e53b9d596bbaec
> > > [RFC] parsers: create a dedicated type for an obs marker
> >
> > I haven't reviewed these carefully, but I like the direction.
> > fm1readmarkers()
> > creates a bunch of boxed objects, which would have somewhat significant
> > cost.
> >
> > Can we split obsmarker handling to new module? It's getting bigger.
> >
>
> Yes, I was considering that.
>
> But instead of creating N+1 Python modules, I'd like to move us in the
> direction of a single module built from multiple .c files. This is the
> approach I've taken in python-zstandard, for example. I believe this
> approach is superior because having a single compilation unit facilitates
> code reuse, inlining between source files, smaller binaries, and better
> performance.
>

+1 for multiple .c files that compile into parsers.so for this.

Also, making a type in C is probably unambiguously the right thing for
handling our data formats anyplace it's viable, for exactly this
reason.

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -2668,6 +2668,192 @@  static PyObject *readshas(
 	return list;
 }
 
+/*
+ * The obsmarker1Object and obsmarker1Type represents a version 1 obsolescence
+ * marker. The type behaves like a tuple.
+ */
+
+typedef struct {
+	PyObject_HEAD
+
+	/* Tuple elements */
+	PyObject *precursors;
+	PyObject *successors;
+	PyObject *flags;
+	PyObject *metadata;
+	PyObject *time;
+	PyObject *parents;
+} obsmarker1Object;
+
+PyTypeObject obsmarker1Type;
+
+static void obsmarker1_dealloc(obsmarker1Object *self) {
+	Py_XDECREF(self->precursors);
+	Py_XDECREF(self->successors);
+	Py_XDECREF(self->flags);
+	Py_XDECREF(self->metadata);
+	Py_XDECREF(self->time);
+	Py_XDECREF(self->parents);
+
+	PyObject_Del(self);
+}
+
+static PyObject *fm1readmarker(const char *databegin, const char *dataend,
+			       uint32_t *msize);
+
+/**
+ * Construct an obsmarker1Object from raw data.
+ */
+static obsmarker1Object * Obsmarker1_FromData(const char *databegin,
+					      const char *dataend,
+					      uint32_t *msize) {
+
+	PyObject *tuple;
+	obsmarker1Object *result = NULL;
+
+	/* For now, we call the existing function to parse to a tuple
+	   and convert to our new type. */
+	tuple = fm1readmarker(databegin, dataend, msize);
+	if (tuple == NULL) {
+		return NULL;
+	}
+
+	result = (obsmarker1Object *)PyObject_CallObject(
+		(PyObject *)&obsmarker1Type, NULL);
+	if (result == NULL) {
+		goto bail;
+	}
+
+	result->precursors = PyTuple_GET_ITEM(tuple, 0);
+	Py_INCREF(result->precursors);
+	result->successors = PyTuple_GET_ITEM(tuple, 1);
+	Py_INCREF(result->successors);
+	result->flags = PyTuple_GET_ITEM(tuple, 2);
+	Py_INCREF(result->flags);
+	result->metadata = PyTuple_GET_ITEM(tuple, 3);
+	Py_INCREF(result->metadata);
+	result->time = PyTuple_GET_ITEM(tuple, 4);
+	Py_INCREF(result->time);
+	result->parents = PyTuple_GET_ITEM(tuple, 5);
+	Py_INCREF(result->parents);
+
+	Py_CLEAR(tuple);
+
+bail:
+	Py_XDECREF(tuple);
+
+	return result;
+}
+
+static Py_ssize_t obsmarker1_length(obsmarker1Object *self)
+{
+	return 6;
+}
+
+static PyObject *obsmarker1_item(obsmarker1Object *self, Py_ssize_t i)
+{
+	switch (i) {
+	case 0:
+		Py_INCREF(self->precursors);
+		return self->precursors;
+	case 1:
+		Py_INCREF(self->successors);
+		return self->successors;
+	case 2:
+		Py_INCREF(self->flags);
+		return self->flags;
+	case 3:
+		Py_INCREF(self->metadata);
+		return self->metadata;
+	case 4:
+		Py_INCREF(self->time);
+		return self->time;
+	case 5:
+		Py_INCREF(self->parents);
+		return self->parents;
+	default:
+		PyErr_SetString(PyExc_IndexError, "index out of range");
+		return NULL;
+	}
+}
+
+static PySequenceMethods obsmarker1_sequence_methods = {
+	(lenfunc)obsmarker1_length, /* sq_length */
+	0, /* sq_concat */
+	0, /* sq_repeat */
+	(ssizeargfunc)obsmarker1_item, /* sq_item */
+	0, /* sq_ass_item */
+	0, /* sq_contains */
+	0, /* sq_inplace_concat */
+	0, /* sq_inplace_repeat */
+};
+
+static PyObject * obsmarker1_subscript(obsmarker1Object *self, PyObject *item)
+{
+	if (PyIndex_Check(item)) {
+		Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError);
+
+		if (i == -1 && PyErr_Occurred()) {
+			return NULL;
+		}
+
+		if (i < 0) {
+			i += 6;
+		}
+
+		return obsmarker1_item(self, i);
+	}
+
+	return NULL;
+}
+
+static PyMappingMethods obsmarker1_mapping_methods = {
+	(lenfunc)obsmarker1_length,
+	(binaryfunc)obsmarker1_subscript,
+	0,
+};
+
+PyTypeObject obsmarker1Type = {
+	PyVarObject_HEAD_INIT(NULL, 0)
+	"parsers.obsmarker1", /* tp_name */
+	sizeof(obsmarker1Object), /* tp_basicsize */
+	0, /* tp_itemsize */
+	(destructor)obsmarker1_dealloc, /* tp_dealloc */
+	0, /* tp_print */
+	0, /* tp_getattr */
+	0, /* tp_setattr */
+	0, /* tp_compare */
+	0, /* tp_repr */
+	0, /* tp_as_number */
+	&obsmarker1_sequence_methods, /* tp_as_sequence */
+	&obsmarker1_mapping_methods, /* tp_as_mapping */
+	0, /* tp_hash */
+	0, /* tp_call */
+	0, /* tp_str */
+	0, /* tp_getattro */
+	0, /* tp_setattro */
+	0, /* tp_as_buffer */
+	Py_TPFLAGS_DEFAULT, /* tp_flags */
+	"a v1 obsmarker", /* tp_doc */
+	0, /* tp_traverse */
+	0, /* tp_clear */
+	0, /* tp_richcompare */
+	0, /* tp_weaklistoffset */
+	0, /* tp_iter */
+	0, /* tp_iternext */
+	0, /* tp_methods */
+	0, /* tp_members */
+	0, /* tp_getset */
+	0, /* tp_base */
+	0, /* tp_dict */
+	0, /* tp_descr_get */
+	0, /* tp_descr_set */
+	0, /* tp_dictoffset */
+	0, /* tp_init */
+	0, /* tp_alloc */
+	PyType_GenericNew, /* tp_new */
+};
+
 static PyObject *fm1readmarker(const char *databegin, const char *dataend,
 			       uint32_t *msize)
 {
@@ -2825,7 +3011,8 @@  static PyObject *fm1readmarkers(PyObject
 	while (offset < stop) {
 		uint32_t msize;
 		void *tmpbuffer;
-		PyObject *record = fm1readmarker(data, dataend, &msize);
+		PyObject *record = (PyObject *)Obsmarker1_FromData(data,
+			dataend, &msize);
 		if (!record) {
 			goto bail;
 		}
@@ -2927,6 +3114,12 @@  static void module_init(PyObject *mod)
 				  -1, -1, -1, -1, nullid, 20);
 	if (nullentry)
 		PyObject_GC_UnTrack(nullentry);
+
+	if (PyType_Ready(&obsmarker1Type) < 0) {
+		return;
+	}
+	Py_INCREF(&obsmarker1Type);
+	PyModule_AddObject(mod, "obsmarker1", (PyObject *)&obsmarker1Type);
 }
 
 static int check_python_version(void)