Patchwork D7815: sha1dc: initial implementation of Python extension (NOT READY)

login
register
mail settings
Submitter phabricator
Date Jan. 8, 2020, 9:27 p.m.
Message ID <differential-rev-PHID-DREV-esjn3r4hwvudo5n6lwwm-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44216/
State Superseded
Headers show

Comments

phabricator - Jan. 8, 2020, 9:27 p.m.
durin42 created this revision.
Herald added subscribers: mercurial-devel, mjpieters.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This is not yet ready: there's some bug (probably a simple oversight)
  that causes it to segfault inside PyType_Ready() during import of the
  module.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7815

AFFECTED FILES
  mercurial/thirdparty/sha1dc/cext.c
  setup.py

CHANGE DETAILS




To: durin42, #hg-reviewers
Cc: mjpieters, mercurial-devel
phabricator - Jan. 8, 2020, 9:51 p.m.
spectral added inline comments.

INLINE COMMENTS

> cext.c:24
> +} pysha1ctx;
> +/* clang-format oon */
> +

Nit: "clang-format on" (typo: s/oon/on/)

> cext.c:104
> +	 "Return a copy of the hash object."},
> +};
> +

I think you need a {NULL} entry here, or else it's going to walk off the end of this array when creating the type.

> cext.c:148
> +
> +static PyMethodDef methods[] = {};
> +

You may also need a NULL entry here, haven't tracked down where this is used.

> cext.c:152
> +{
> +	sha1ctxType.tp_new = PyType_GenericNew;
> +	if (PyType_Ready(&sha1ctxType) < 0)

why do this here instead of above?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7815/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7815

To: durin42, #hg-reviewers
Cc: spectral, mjpieters, mercurial-devel
phabricator - Jan. 8, 2020, 10:41 p.m.
durin42 added inline comments.
durin42 marked 4 inline comments as done.

INLINE COMMENTS

> spectral wrote in cext.c:24
> Nit: "clang-format on" (typo: s/oon/on/)

Good catch!

> spectral wrote in cext.c:104
> I think you need a {NULL} entry here, or else it's going to walk off the end of this array when creating the type.

Sigh, yep.

> spectral wrote in cext.c:152
> why do this here instead of above?

Not sure. This is just how I've always seen it done...

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7815/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7815

To: durin42, #hg-reviewers
Cc: spectral, mjpieters, mercurial-devel
phabricator - Jan. 9, 2020, 4:29 a.m.
indygreg added a comment.


  This seems mostly good to me.

INLINE COMMENTS

> cext.c:68
> +	if (SHA1DCFinal(hash_out, &ctx)) {
> +		PyErr_SetString(PyExc_OverflowError,
> +				"sha1 collision attack detected");

I'm not super keen on overloading `OverflowError` here. Or is this how `hashlib` works in the Python standard library?

> cext.c:92
> +	static const char hexdigit[] = "0123456789abcdef";
> +	for (int i = 0; i < 20; ++i) {
> +		hexhash[i * 2] = hexdigit[hash[i] >> 4];

I didn't verify this is correct because I didn't want to think too hard. Test coverage for parity with `hashlib` would be appreciated.

> cext.c:107
> +	}
> +	clone->ctx = self->ctx;
> +	return (PyObject *)clone;

Should this be a `memcpy` or some such? Or is this opaque type safe to copy by value? Test coverage for this demonstrating that a seeded hasher which is copied can properly diverge would be appreciated.

> cext.c:172
> +	sha1ctxType.tp_new = PyType_GenericNew;
> +	if (PyType_Ready(&sha1ctxType) < 0)
> +		return;

Nit: please always use braces.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7815/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7815

To: durin42, #hg-reviewers
Cc: indygreg, spectral, mjpieters, mercurial-devel
phabricator - Jan. 13, 2020, 9:55 p.m.
durin42 added a comment.
durin42 marked 3 inline comments as done.


  Good call on tests, I had the return type of `hexdigest()` wrong. Please do let me know if you think of any missing cases here.

INLINE COMMENTS

> indygreg wrote in cext.c:68
> I'm not super keen on overloading `OverflowError` here. Or is this how `hashlib` works in the Python standard library?

This is new functionality compared to hashlib (this is the "zomg someone is breaking ur sha1" error case), so there's not anything comparable. I didn't really want to go to the effort of defining our own custom exception type for a case we should never see in the wild.

> indygreg wrote in cext.c:107
> Should this be a `memcpy` or some such? Or is this opaque type safe to copy by value? Test coverage for this demonstrating that a seeded hasher which is copied can properly diverge would be appreciated.

I verified by inspection (when writing this) that copying by value is okay here (no pointers, etc), and we now have test coverage that will catch any regression here.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7815/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7815

To: durin42, #hg-reviewers
Cc: indygreg, spectral, mjpieters, mercurial-devel

Patch

diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -1481,6 +1481,14 @@ 
         ],
     ),
     Extension(
+        'mercurial.thirdparty.sha1dc',
+        [
+            'mercurial/thirdparty/sha1dc/cext.c',
+            'mercurial/thirdparty/sha1dc/lib/sha1.c',
+            'mercurial/thirdparty/sha1dc/lib/ubc_check.c',
+        ],
+    ),
+    Extension(
         'hgext.fsmonitor.pywatchman.bser', ['hgext/fsmonitor/pywatchman/bser.c']
     ),
     RustStandaloneExtension(
diff --git a/mercurial/thirdparty/sha1dc/cext.c b/mercurial/thirdparty/sha1dc/cext.c
new file mode 100644
--- /dev/null
+++ b/mercurial/thirdparty/sha1dc/cext.c
@@ -0,0 +1,178 @@ 
+#include <Python.h>
+
+#include "lib/sha1.h"
+
+#if PY_MAJOR_VERSION >= 3
+#define IS_PY3K
+#endif
+
+/* helper to switch things like string literal depending on Python version */
+#ifdef IS_PY3K
+#define PY23(py2, py3) py3
+#else
+#define PY23(py2, py3) py2
+#endif
+
+static char sha1collisiondetection_doc[] =
+    "Efficient detection of SHA1 collision constructs.";
+
+/* clang-format off */
+typedef struct {
+	PyObject_HEAD
+	SHA1_CTX ctx;
+} pysha1ctx;
+/* clang-format oon */
+
+static int pysha1ctx_init(pysha1ctx *self, PyObject *args)
+{
+	const char* data = NULL;
+	Py_ssize_t len;
+
+	SHA1DCInit(&(self->ctx));
+	/* We don't want "safe" sha1s, wherein sha1dc can give you a
+	   different hash for something that's trying to give you a
+	   collision. We just want to detect collisions.
+	 */
+	SHA1DCSetSafeHash(&(self->ctx), 0);
+	if (!PyArg_ParseTuple(args, PY23("|s#", "|y#"), &data, &len)) {
+		return -1;
+	}
+	if (data) {
+		SHA1DCUpdate(&(self->ctx), data, len);
+	}
+	return 0;
+}
+
+static void pysha1ctx_dealloc(pysha1ctx *self) {
+	PyObject_Del(self);
+}
+
+static PyObject *pysha1ctx_update(pysha1ctx *self, PyObject *args) {
+	const char* data;
+	Py_ssize_t len;
+	if (!PyArg_ParseTuple(args, PY23("s#", "y#"), &data, &len)) {
+		return NULL;
+	}
+	SHA1DCUpdate(&(self->ctx), data, len);
+	Py_RETURN_NONE;
+}
+
+static PyObject *pysha1ctx_digest(pysha1ctx *self) {
+	unsigned char hash[20];
+	if (SHA1DCFinal(hash, &(self->ctx))) {
+		PyErr_SetString(PyExc_OverflowError, "sha1 collision attack detected");
+		return NULL;
+	}
+	return PyBytes_FromStringAndSize((char *)hash, 20);
+}
+
+static PyObject *pysha1ctx_hexdigest(pysha1ctx *self) {
+	unsigned char hash[20];
+	if (SHA1DCFinal(hash, &(self->ctx))) {
+		PyErr_SetString(PyExc_OverflowError, "sha1 collision attack detected");
+		return NULL;
+	}
+	char hexhash[40];
+	static const char hexdigit[] = "0123456789abcdef";
+	for (int i = 0 ; i < 20 ; ++i) {
+		hexhash[i*2] = hexdigit[hash[i] >> 4];
+		hexhash[i*2 + 1] = hexdigit[hash[i] & 15];
+	}
+	return PyBytes_FromStringAndSize(hexhash, 40);
+}
+
+static PyTypeObject sha1ctxType;
+
+static PyObject *pysha1ctx_copy(pysha1ctx *self) {
+	pysha1ctx *clone = (pysha1ctx *)PyObject_New(pysha1ctx, &sha1ctxType);
+	if (!clone) {
+		return NULL;
+	}
+	clone->ctx = self->ctx;
+	return (PyObject *)clone;
+}
+
+static PyMethodDef pysha1ctx_methods[] = {
+	{"update", (PyCFunction)pysha1ctx_update, METH_O,
+	 "Update this hash object's state with the provided bytes."},
+	{"digest", (PyCFunction)pysha1ctx_digest, METH_NOARGS,
+	 "Return the digest value as a string of binary data."},
+	{"hexdigest",(PyCFunction)pysha1ctx_hexdigest, METH_NOARGS,
+	 "Return the digest value as a string of hexadecimal digits."},
+	{"copy", (PyCFunction)pysha1ctx_copy, METH_NOARGS,
+	 "Return a copy of the hash object."},
+};
+
+/* clang-format off */
+static PyTypeObject sha1ctxType = {
+	PyVarObject_HEAD_INIT(NULL, 0)                    /* header */
+	"sha1dc.sha1",                                    /* tp_name */
+	sizeof(pysha1ctx),                                /* tp_basicsize */
+	0,                                                /* tp_itemsize */
+	(destructor)pysha1ctx_dealloc,                    /* tp_dealloc */
+	0,                                                /* tp_print */
+	0,                                                /* tp_getattr */
+	0,                                                /* tp_setattr */
+	0,                                                /* tp_compare */
+	0,                                                /* tp_repr */
+	0,                                                /* tp_as_number */
+	0,                                                /* tp_as_sequence */
+	0,                                                /* tp_as_mapping */
+	0,                                                /* tp_hash */
+	0,                                                /* tp_call */
+	0,                                                /* tp_str */
+	0,                                                /* tp_getattro */
+	0,                                                /* tp_setattro */
+	0,                                                /* tp_as_buffer */
+	Py_TPFLAGS_DEFAULT,                               /* tp_flags */
+	"sha1 implementation that looks for collisions",  /* tp_doc */
+	0,                                                /* tp_traverse */
+	0,                                                /* tp_clear */
+	0,                                                /* tp_richcompare */
+	0,                                                /* tp_weaklistoffset */
+	0,                                                /* tp_iter */
+	0,                                                /* tp_iternext */
+	pysha1ctx_methods,                                /* tp_methods */
+	0,                                                /* tp_members */
+	0,                                                /* tp_getset */
+	0,                                                /* tp_base */
+	0,                                                /* tp_dict */
+	0,                                                /* tp_descr_get */
+	0,                                                /* tp_descr_set */
+	0,                                                /* tp_dictoffset */
+	(initproc)pysha1ctx_init,                         /* tp_init */
+	0,                                                /* tp_alloc */
+};
+/* clang-format on */
+
+static PyMethodDef methods[] = {};
+
+static void module_init(PyObject *mod)
+{
+	sha1ctxType.tp_new = PyType_GenericNew;
+	if (PyType_Ready(&sha1ctxType) < 0)
+		return;
+	Py_INCREF(&sha1ctxType);
+
+	PyModule_AddObject(mod, "sha1", (PyObject *)&sha1ctxType);
+}
+
+#ifdef IS_PY3K
+static struct PyModuleDef sha1collisiondetection_module = {
+    PyModuleDef_HEAD_INIT, "sha1collisiondetection", sha1collisiondetection_doc,
+    -1, methods};
+
+PyMODINIT_FUNC PyInit_sha1dc(void)
+{
+	PyObject *mod = PyModule_Create(&sha1collisiondetection_module);
+	module_init(mod);
+	return mod;
+}
+#else
+PyMODINIT_FUNC initsha1dc(void)
+{
+	PyObject *mod = Py_InitModule3("sha1collisiondetection", methods,
+				       sha1collisiondetection_doc);
+	module_init(mod);
+}
+#endif