Patchwork [3,of,6,V2] rust: exposing in parsers module

login
register
mail settings
Submitter Georges Racinet
Date Oct. 9, 2018, 3:22 p.m.
Message ID <cf5c799e65a1225538fa.1539098567@purity.tombe.racinet.fr>
Download mbox | patch
Permalink /patch/35581/
State Superseded
Headers show

Comments

Georges Racinet - Oct. 9, 2018, 3:22 p.m.
# HG changeset patch
# User Georges Racinet <gracinet@anybox.fr>
# Date 1538060175 -7200
#      Thu Sep 27 16:56:15 2018 +0200
# Node ID cf5c799e65a1225538fa1246887e2efd94c09acc
# Parent  81b8781de6fad514634713fa2cb9f10c320d1af3
# EXP-Topic rustancestors-rfc
rust: exposing in parsers module

To build with the Rust code, set the HGWITHRUSTEXT
environment variable.

At this point, it's possible to instantiate and use
a rustlazyancestors object from a Python interpreter,
provided one points LD_LIBRARY_PATH to
  mercurial/rust/target/release

The changes in setup.py are obviously a quick hack,
just good enough to test/bench without much
refactoring. We'd be happy to improve on that with
help from the community.

With respect to the plans at
https://www.mercurial-scm.org/wiki/OxidationPlan
this would probably qualify as "roll our own FFI".
Also, it doesn't quite meet the target of getting
rid of C code, since it brings actually more, yet:

- the new C code does nothing else than parsing
  arguments and calling Rust functions.
  In particular, there's no fancy allocation involved.
- subsequent changes could rewrite more of revlog.c, this
  time resulting in an overall decrease of C code and
  unsafety.
Yuya Nishihara - Oct. 12, 2018, 5:05 a.m.
On Tue, 09 Oct 2018 17:22:47 +0200, Georges Racinet wrote:
> # HG changeset patch
> # User Georges Racinet <gracinet@anybox.fr>
> # Date 1538060175 -7200
> #      Thu Sep 27 16:56:15 2018 +0200
> # Node ID cf5c799e65a1225538fa1246887e2efd94c09acc
> # Parent  81b8781de6fad514634713fa2cb9f10c320d1af3
> # EXP-Topic rustancestors-rfc
> rust: exposing in parsers module

Note that we use tabs in C source.

> +static int rustla_init(rustlazyancestorsObject *self,
> +                       PyObject *args) {
> +  PyObject *initrevsarg = NULL;
> +  PyObject *inclusivearg = NULL;
> +  long stoprev = 0;
> +  long *initrevs = NULL;
> +  int inclusive = 0;
> +  Py_ssize_t i;
> +
> +  indexObject *index;
> +  if (!PyArg_ParseTuple(args, "O!O!lO!",
> +                        &indexType, &index,
> +                        &PyList_Type, &initrevsarg,
> +                        &stoprev,
> +                        &PyBool_Type, &inclusivearg))
> +    return -1;
> +
> +  Py_INCREF(index);
> +  self->index = index;
> +
> +  if (inclusivearg == Py_True)
> +    inclusive = 1;
> +
> +  Py_ssize_t linit = PyList_GET_SIZE(initrevsarg);
> +
> +  initrevs = (long*)malloc(linit * sizeof(long));

linit * sizeof(long) can overflow. Using calloc() would be probably safer,
and should be okay here as initrevs are small.

> +
> +  if (initrevs == NULL) {
> +    PyErr_NoMemory();
> +    goto bail;
> +  }
> +
> +  for (i=0; i<linit; i++) {
> +    initrevs[i] = PyInt_AsLong(PyList_GET_ITEM(initrevsarg, i));

PyInt_AsLong() may fail if a list item isn't an integer.

> +  }
> +  self->iter = rustlazyancestors_init(index,
> +                                      index_get_parents,
> +                                      linit, initrevs,
> +                                      stoprev, inclusive);
> +  if (self->iter == NULL)
> +    /* if this is because of GraphError::ParentOutOfRange
> +       index_get_parents() has already set the proper ValueError */
> +    goto bail;
> +
> +  free(initrevs);
> +  return 0;
> +
> +bail:
> +  Py_XDECREF(index);

Perhaps self->index shouldn't be decrefed here since it's still alive and
decrefed again at rustla_dealloc().

> +  free(initrevs);
> +  return -1;
> +};
> +
> +static void rustla_dealloc(rustlazyancestorsObject *self)
> +{
> +  Py_XDECREF(self->index);
> +  if (self->iter != NULL) { /* can happen if rustla_init failed */
> +    rustlazyancestors_drop(self->iter);
> +  }
> +  PyObject_Del(self);
> +}
Georges Racinet - Oct. 12, 2018, 9:33 a.m.
On 10/12/2018 07:05 AM, Yuya Nishihara wrote:
>> +
>> +  if (initrevs == NULL) {
>> +    PyErr_NoMemory();
>> +    goto bail;
>> +  }
>> +
>> +  for (i=0; i<linit; i++) {
>> +    initrevs[i] = PyInt_AsLong(PyList_GET_ITEM(initrevsarg, i));
> PyInt_AsLong() may fail if a list item isn't an integer.
>
In our particular case, since the convention for failure is that
PyInt_AsLong returns -1, that's equivalent to adding the null revision,
which is filterered right away on the Rust side.

Reference: https://docs.python.org/2/c-api/int.html#c.PyInt_AsLong

It's a bit of coincidental programming, but I think a simple comment is
less dangerous than writing more C to shorten initrevs from there (not
sure that code will live for long anyway).
Yuya Nishihara - Oct. 12, 2018, 10:16 a.m.
On Fri, 12 Oct 2018 11:33:51 +0200, Georges Racinet wrote:
> On 10/12/2018 07:05 AM, Yuya Nishihara wrote:
> >> +
> >> +  if (initrevs == NULL) {
> >> +    PyErr_NoMemory();
> >> +    goto bail;
> >> +  }
> >> +
> >> +  for (i=0; i<linit; i++) {
> >> +    initrevs[i] = PyInt_AsLong(PyList_GET_ITEM(initrevsarg, i));
> > PyInt_AsLong() may fail if a list item isn't an integer.
> >
> In our particular case, since the convention for failure is that
> PyInt_AsLong returns -1, that's equivalent to adding the null revision,
> which is filterered right away on the Rust side.
> 
> Reference: https://docs.python.org/2/c-api/int.html#c.PyInt_AsLong
>
> It's a bit of coincidental programming, but I think a simple comment is
> less dangerous than writing more C to shorten initrevs from there (not
> sure that code will live for long anyway).

Yes, but I think reporting the error is better than silently ignoring it.

  if (PyErr_Occurred())
      goto bail;

Patch

diff -r 81b8781de6fa -r cf5c799e65a1 mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c	Thu Sep 27 16:51:36 2018 +0200
+++ b/mercurial/cext/revlog.c	Thu Sep 27 16:56:15 2018 +0200
@@ -2290,6 +2290,151 @@ 
 	return NULL;
 }
 
+#ifdef WITH_RUST
+
+/* rustlazyancestors: iteration over ancestors implemented in Rust
+ *
+ * This class holds a reference to an index and to the Rust iterator.
+ */
+typedef struct rustlazyancestorsObjectStruct rustlazyancestorsObject;
+
+struct rustlazyancestorsObjectStruct {
+	PyObject_HEAD
+	/* Type-specific fields go here. */
+        indexObject *index;    /* Ref kept to avoid GC'ing the index */
+        void *iter;        /* Rust iterator */
+};
+
+/* FFI exposed from Rust code */
+rustlazyancestorsObject *rustlazyancestors_init(
+        indexObject *index,
+        /* to pass index_get_parents() */
+        int (*)(indexObject *, Py_ssize_t, int*, int),
+        /* intrevs vector */
+        int initrevslen, long *initrevs,
+        long stoprev,
+        int inclusive);
+void rustlazyancestors_drop(rustlazyancestorsObject *self);
+int rustlazyancestors_next(rustlazyancestorsObject *self);
+
+/* CPython instance methods */
+static int rustla_init(rustlazyancestorsObject *self,
+                       PyObject *args) {
+  PyObject *initrevsarg = NULL;
+  PyObject *inclusivearg = NULL;
+  long stoprev = 0;
+  long *initrevs = NULL;
+  int inclusive = 0;
+  Py_ssize_t i;
+
+  indexObject *index;
+  if (!PyArg_ParseTuple(args, "O!O!lO!",
+                        &indexType, &index,
+                        &PyList_Type, &initrevsarg,
+                        &stoprev,
+                        &PyBool_Type, &inclusivearg))
+    return -1;
+
+  Py_INCREF(index);
+  self->index = index;
+
+  if (inclusivearg == Py_True)
+    inclusive = 1;
+
+  Py_ssize_t linit = PyList_GET_SIZE(initrevsarg);
+
+  initrevs = (long*)malloc(linit * sizeof(long));
+
+  if (initrevs == NULL) {
+    PyErr_NoMemory();
+    goto bail;
+  }
+
+  for (i=0; i<linit; i++) {
+    initrevs[i] = PyInt_AsLong(PyList_GET_ITEM(initrevsarg, i));
+  }
+  self->iter = rustlazyancestors_init(index,
+                                      index_get_parents,
+                                      linit, initrevs,
+                                      stoprev, inclusive);
+  if (self->iter == NULL)
+    /* if this is because of GraphError::ParentOutOfRange
+       index_get_parents() has already set the proper ValueError */
+    goto bail;
+
+  free(initrevs);
+  return 0;
+
+bail:
+  Py_XDECREF(index);
+  free(initrevs);
+  return -1;
+};
+
+static void rustla_dealloc(rustlazyancestorsObject *self)
+{
+  Py_XDECREF(self->index);
+  if (self->iter != NULL) { /* can happen if rustla_init failed */
+    rustlazyancestors_drop(self->iter);
+  }
+  PyObject_Del(self);
+}
+
+static PyObject *rustla_next(rustlazyancestorsObject *self) {
+  int res = rustlazyancestors_next(self->iter);
+  if (res == -1) {
+    /* Setting an explicit exception seems unnecessary
+       as examples from Python source code (Objects/rangeobjets.c and
+       Modules/_io/stringio.c) seem to demonstrate.
+       If that's false, then it would look like:
+          PyErr_SetNone(PyExc_StopIterationError);
+    */
+    return NULL;
+  }
+  return PyInt_FromLong(res);
+}
+
+static PyTypeObject rustlazyancestorsType = {
+	PyVarObject_HEAD_INIT(NULL, 0) /* header */
+	"parsers.rustlazyancestors",           /* tp_name */
+	sizeof(rustlazyancestorsObject),       /* tp_basicsize */
+	0,                         /* tp_itemsize */
+	(destructor)rustla_dealloc, /* tp_dealloc */
+	0,                         /* tp_print */
+	0,                         /* tp_getattr */
+	0,                         /* tp_setattr */
+	0,                         /* tp_compare */
+	0,                         /* tp_repr */
+	0,                         /* tp_as_number */
+	0,                         /* tp_as_sequence */
+	0,                         /* tp_as_mapping */
+	0,                         /* tp_hash */
+	0,                         /* tp_call */
+	0,                         /* tp_str */
+	0,                         /* tp_getattro */
+	0,                         /* tp_setattro */
+	0,                         /* tp_as_buffer */
+	Py_TPFLAGS_DEFAULT,        /* tp_flags */
+	"Iterator over ancestors, implemented in Rust", /* tp_doc */
+	0,                         /* tp_traverse */
+	0,                         /* tp_clear */
+	0,                         /* tp_richcompare */
+	0,                         /* tp_weaklistoffset */
+	0,                         /* tp_iter */
+	(iternextfunc)rustla_next, /* tp_iternext */
+	0,                         /* tp_methods */
+	0,                         /* tp_members */
+	0,                         /* tp_getset */
+	0,                         /* tp_base */
+	0,                         /* tp_dict */
+	0,                         /* tp_descr_get */
+	0,                         /* tp_descr_set */
+	0,                         /* tp_dictoffset */
+	(initproc)rustla_init,     /* tp_init */
+	0,                         /* tp_alloc */
+};
+#endif /* WITH_RUST */
+
 void revlog_module_init(PyObject *mod)
 {
 	indexType.tp_new = PyType_GenericNew;
@@ -2310,4 +2455,14 @@ 
 	}
 	if (nullentry)
 		PyObject_GC_UnTrack(nullentry);
+
+#ifdef WITH_RUST
+        rustlazyancestorsType.tp_new = PyType_GenericNew;
+	if (PyType_Ready(&rustlazyancestorsType) < 0)
+		return;
+        Py_INCREF(&rustlazyancestorsType);
+	PyModule_AddObject(mod, "rustlazyancestors",
+                           (PyObject *)&rustlazyancestorsType);
+#endif
+
 }
diff -r 81b8781de6fa -r cf5c799e65a1 setup.py
--- a/setup.py	Thu Sep 27 16:51:36 2018 +0200
+++ b/setup.py	Thu Sep 27 16:56:15 2018 +0200
@@ -132,6 +132,8 @@ 
 
 ispypy = "PyPy" in sys.version
 
+iswithrustextensions = 'HGWITHRUSTEXT' in os.environ
+
 import ctypes
 import stat, subprocess, time
 import re
@@ -460,6 +462,8 @@ 
         return build_ext.build_extensions(self)
 
     def build_extension(self, ext):
+        if isinstance(ext, RustExtension):
+            ext.rustbuild()
         try:
             build_ext.build_extension(self, ext)
         except CCompilerError:
@@ -884,6 +888,53 @@ 
     'mercurial/thirdparty/xdiff/xutils.h',
 ]
 
+class RustExtension(Extension):
+    """A C Extension, conditionnally enhanced with Rust code.
+
+    if iswithrustextensions is False, does nothing else than plain Extension
+    """
+
+    def __init__(self, mpath, sources, libname, rustdir, **kw):
+        Extension.__init__(self, mpath, sources, **kw)
+        if not iswithrustextensions:
+            return
+        self.rustdir = rustdir
+        self.libraries.append(libname)
+        self.extra_compile_args.append('-DWITH_RUST')
+
+        # adding Rust source and control files to depends so that the extension
+        # gets rebuilt if they've changed
+        self.depends.append(os.path.join(rustdir, 'Cargo.toml'))
+        cargo_lock = os.path.join(rustdir, 'Cargo.lock')
+        if os.path.exists(cargo_lock):
+            self.depends.append(cargo_lock)
+        for dirpath, subdir, fnames in os.walk(os.path.join(rustdir, 'src')):
+            self.depends.extend(os.path.join(dirpath, fname)
+                                for fname in fnames
+                                if os.path.splitext(fname)[1] == '.rs')
+
+    def rustbuild(self):
+        if not iswithrustextensions:
+            return
+        env = os.environ.copy()
+        if 'HGTEST_RESTOREENV' in env:
+            # Mercurial tests change HOME to a temporary directory,
+            # but, if installed with rustup, the Rust toolchain needs
+            # HOME to be correct (otherwise the 'no default toolchain'
+            # error message is issued and the build fails).
+            # This happens currently with test-hghave.t, which does
+            # invoke this build.
+
+            # Unix only fix (os.path.expanduser not really reliable if
+            # HOME is shadowed like this)
+            import pwd
+            env['HOME'] = pwd.getpwuid(os.getuid()).pw_dir
+
+        subprocess.check_call(['cargo', 'build', '-vv', '--release'],
+                              env=env, cwd=self.rustdir)
+        self.library_dirs.append(
+            os.path.join(self.rustdir, 'target', 'release'))
+
 extmodules = [
     Extension('mercurial.cext.base85', ['mercurial/cext/base85.c'],
               include_dirs=common_include_dirs,
@@ -896,14 +947,19 @@ 
                                         'mercurial/cext/mpatch.c'],
               include_dirs=common_include_dirs,
               depends=common_depends),
-    Extension('mercurial.cext.parsers', ['mercurial/cext/charencode.c',
-                                         'mercurial/cext/dirs.c',
-                                         'mercurial/cext/manifest.c',
-                                         'mercurial/cext/parsers.c',
-                                         'mercurial/cext/pathencode.c',
-                                         'mercurial/cext/revlog.c'],
-              include_dirs=common_include_dirs,
-              depends=common_depends + ['mercurial/cext/charencode.h']),
+    RustExtension('mercurial.cext.parsers', ['mercurial/cext/charencode.c',
+                                             'mercurial/cext/dirs.c',
+                                             'mercurial/cext/manifest.c',
+                                             'mercurial/cext/parsers.c',
+                                             'mercurial/cext/pathencode.c',
+                                             'mercurial/cext/revlog.c'],
+                  'hgancestors',
+                  'mercurial/rust',
+                  include_dirs=common_include_dirs,
+                  depends=common_depends + ['mercurial/cext/charencode.h',
+                                            'mercurial/rust/src/lib.rs',
+                                            'mercurial/rust/src/ancestors.rs',
+                                            'mercurial/rust/src/cpython.rs']),
     Extension('mercurial.cext.osutil', ['mercurial/cext/osutil.c'],
               include_dirs=common_include_dirs,
               extra_compile_args=osutil_cflags,