Patchwork [1,of,7,resend] store: implement lowerencode in C

login
register
mail settings
Submitter Bryan O'Sullivan
Date Dec. 12, 2012, 9:10 p.m.
Message ID <e4631ea16083476a48cc.1355346656@australite.local>
Download mbox | patch
Permalink /patch/70/
State Accepted
Commit 0459c6555f69ef47c044ca333fa7ff9204e9fa0a
Delegated to: Matt Mackall
Headers show

Comments

Bryan O'Sullivan - Dec. 12, 2012, 9:10 p.m.
# HG changeset patch
# User Bryan O'Sullivan <bryano at fb.com>
# Date 1355346573 28800
# Node ID e4631ea16083476a48cc350de24c540d7fe0fad5
# Parent  23a40e2a21ddea875ba4e6b603569297bde0b683
store: implement lowerencode in C
Adrian Buehlmann - Dec. 14, 2012, 10:34 a.m.
May I ask a stupid question: Why are you resending this?

Sometimes I find such resends somewhat difficult and messy.

I had your old patch list marked in my inbox, then you pushed
736f1c09f321 to crew without (I think) further mention.

It's like when you discuss something with someone and then that person
suddenly does something on the side. That sort of behavior IMHO feels a
bit like further comments are unwanted. I do know that this is not the
case. I'm just trying to express my thinking.

I do like your probabilistic test idea and I think I had proposed to
_include_ such a test already for the codebase we currently have in the
official tree (i.e. before changing anything further on the code side).

I'm currently still quite a bit puzzled by what you pushed with
736f1c09f321.

It seems to me that the new test-pathencode.py doesn't really do
anything yet?

main() calls runtests().

runtests is:

def runtests(rng, seed, count):
    nerrs = 0
    for p in genpath(rng, count):
        hybridencode(p)
    return nerrs

hybridencode is:

def hybridencode(path):
    return store._hybridencode(path, True)

I don't see any comparison?

Am I overlooking something?
Bryan O'Sullivan - Dec. 14, 2012, 6:21 p.m.
On Fri, Dec 14, 2012 at 2:34 AM, Adrian Buehlmann <adrian at cadifra.com>wrote:

> May I ask a stupid question: Why are you resending this?
>

It needed a small bugfix to do with our friend Py_ssize_t, but also it's
been a month since I submitted the original patchset.


> I'm currently still quite a bit puzzled by what you pushed with
> 736f1c09f321.
>

It's more or less a no-op for now, and since it's not a piece of core
functionality and wasn't going to do any harm upstream, I judged it as no
harm to push it. Think of it as a balancing act between trying to get easy
things done and getting tricky things reviewed, especially given that we
are extremely bandwidth-deprived on the reviewing side.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20121214/a18a91f6/attachment.html>
Adrian Buehlmann - Dec. 14, 2012, 6:28 p.m.
On 2012-12-14 19:21, Bryan O'Sullivan wrote:
> On Fri, Dec 14, 2012 at 2:34 AM, Adrian Buehlmann <adrian at cadifra.com
> <mailto:adrian at cadifra.com>> wrote:
> 
>     May I ask a stupid question: Why are you resending this?
> 
> 
> It needed a small bugfix to do with our friend Py_ssize_t, but also it's
> been a month since I submitted the original patchset.

But then, it's not strictly a "resend" but rather a V2.

If you change your patches and label it as a resend, it's confusing.

>     I'm currently still quite a bit puzzled by what you pushed with
>     736f1c09f321.
> 
> 
> It's more or less a no-op for now, and since it's not a piece of core
> functionality and wasn't going to do any harm upstream, I judged it as
> no harm to push it. Think of it as a balancing act between trying to get
> easy things done and getting tricky things reviewed, especially given
> that we are extremely bandwidth-deprived on the reviewing side.

Can we have it working (=testing) for the current codebase first, please?

Patch

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -1508,6 +1508,7 @@  static char parsers_doc[] = "Efficient c
 
 PyObject *encodedir(PyObject *self, PyObject *args);
 PyObject *pathencode(PyObject *self, PyObject *args);
+PyObject *lowerencode(PyObject *self, PyObject *args);
 
 static PyMethodDef methods[] = {
 	{"pack_dirstate", pack_dirstate, METH_VARARGS, "pack a dirstate\n"},
@@ -1516,6 +1517,7 @@  static PyMethodDef methods[] = {
 	{"parse_index2", parse_index2, METH_VARARGS, "parse a revlog index\n"},
 	{"encodedir", encodedir, METH_VARARGS, "encodedir a path\n"},
 	{"pathencode", pathencode, METH_VARARGS, "fncache-encode a path\n"},
+	{"lowerencode", lowerencode, METH_VARARGS, "lower-encode a path\n"},
 	{NULL, NULL}
 };
 
diff --git a/mercurial/pathencode.c b/mercurial/pathencode.c
--- a/mercurial/pathencode.c
+++ b/mercurial/pathencode.c
@@ -15,6 +15,7 @@ 
  * required.
  */
 
+#define PY_SSIZE_T_CLEAN
 #include <Python.h>
 #include <assert.h>
 #include <ctype.h>
@@ -481,6 +482,47 @@  static Py_ssize_t basicencode(char *dest
 
 static const Py_ssize_t maxstorepathlen = 120;
 
+static Py_ssize_t _lowerencode(char *dest, size_t destsize,
+			       const char *src, Py_ssize_t len)
+{
+	static const uint32_t onebyte[8] = {
+		1, 0x2bfffbfb, 0xe8000001, 0x2fffffff
+	};
+
+	static const uint32_t lower[8] = { 0, 0, 0x7fffffe };
+
+	Py_ssize_t i, destlen = 0;
+
+	for (i = 0; i < len; i++) {
+		if (inset(onebyte, src[i]))
+			charcopy(dest, &destlen, destsize, src[i]);
+		else if (inset(lower, src[i]))
+			charcopy(dest, &destlen, destsize, src[i] + 32);
+		else
+			escape3(dest, &destlen, destsize, src[i]);
+	}
+
+	return destlen;
+}
+
+PyObject *lowerencode(PyObject *self, PyObject *args)
+{
+	char *path;
+	Py_ssize_t len, newlen;
+	PyObject *ret;
+
+	if (!PyArg_ParseTuple(args, "s#:lowerencode", &path, &len))
+		return NULL;
+
+	newlen = _lowerencode(NULL, 0, path, len);
+	ret = PyString_FromStringAndSize(NULL, newlen);
+	if (ret)
+		newlen = _lowerencode(PyString_AS_STRING(ret), newlen,
+				      path, len);
+
+	return ret;
+}
+
 /*
  * We currently implement only basic encoding.
  *
diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -132,7 +132,7 @@  def _buildlowerencodefun():
         cmap[chr(x)] = chr(x).lower()
     return lambda s: "".join([cmap[c] for c in s])
 
-lowerencode = _buildlowerencodefun()
+lowerencode = getattr(parsers, 'lowerencode', None) or _buildlowerencodefun()
 
 # Windows reserved names: con, prn, aux, nul, com1..com9, lpt1..lpt9
 _winres3 = ('aux', 'con', 'prn', 'nul') # length 3