Patchwork D3499: revlog: use node tree (native code) for shortest() calculation

login
register
mail settings
Submitter phabricator
Date May 8, 2018, 6:07 p.m.
Message ID <differential-rev-PHID-DREV-wmqdaqnsmgmpgaqjsuho-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/31385/
State Superseded
Headers show

Comments

phabricator - May 8, 2018, 6:07 p.m.
martinvonz created this revision.
Herald added a reviewer: indygreg.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  I want to rewrite revlog.shortest() to disambiguate only among hex
  nodeids and then disambiguate the result with revnums at a higher
  level (in scmutil). However, that would slow down `hg log -T
  '{shortest(node,1)}\n'` from 5.0s to 6.8s, which I wasn't sure would
  be acceptable. So this patch makes revlog.shortest() use the node tree
  for finding the length of the shortest prefix that's unambiguous among
  nodeids. Once that has been found, it makes it longer until it is also
  not ambiguous with a revnum.
  
  This speeds up `hg log -T '{shortest(node,1)}\n'` from 5.0s to 4.0s.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3499

AFFECTED FILES
  mercurial/cext/revlog.c
  mercurial/revlog.py

CHANGE DETAILS




To: martinvonz, indygreg, #hg-reviewers
Cc: mercurial-devel
Yuya Nishihara - May 10, 2018, 1:08 p.m.
Looks generally good, but can you fix your editor to not do Google indent?

I'm not sure if we should bump the module version in this case, but I would
do that just for sanity.

> +static int nt_shortest(indexObject *self, const char *node)
> +{
> +  if (nt_init(self) == -1)
> +    return -3;
> +  if (nt_populate(self) == -1)
> +    return -3;
> +
> +  int level, off;

Declaration has to be moved to top.

> +  for (level = off = 0; level < 40; level++) {
> +    int k = nt_level(node, level);
> +    nodetree *n = &self->nt[off];
> +    int v = n->children[k];
> +    if (v < 0) {
> +      v = -(v + 1);
> +      const char *n = index_node(self, v);

Here, too.
phabricator - May 10, 2018, 1:08 p.m.
yuja added a comment.


  Looks generally good, but can you fix your editor to not do Google indent?
  
  I'm not sure if we should bump the module version in this case, but I would
  do that just for sanity.
  
  > +static int nt_shortest(indexObject *self, const char *node)
  >  +{
  >  +  if (nt_init(self) == -1)
  >  +    return -3;
  >  +  if (nt_populate(self) == -1)
  >  +    return -3;
  >  +
  >  +  int level, off;
  
  Declaration has to be moved to top.
  
  > +  for (level = off = 0; level < 40; level++) {
  >  +    int k = nt_level(node, level);
  >  +    nodetree *n = &self->nt[off];
  >  +    int v = n->children[k];
  >  +    if (v < 0) {
  >  +      v = -(v + 1);
  >  +      const char *n = index_node(self, v);
  
  Here, too.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3499

To: martinvonz, indygreg, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - May 10, 2018, 4:01 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D3499#55772, @yuja wrote:
  
  > Looks generally good, but can you fix your editor to not do Google indent?
  
  
  Wow, we still use tabs?! Okay, will change.
  
  > I'm not sure if we should bump the module version in this case, but I would
  >  do that just for sanity.
  
  Done, I think, but please check that I did it right.
  
  > 
  > 
  >> +static int nt_shortest(indexObject *self, const char *node)
  >>  +{
  >>  +  if (nt_init(self) == -1)
  >>  +    return -3;
  >>  +  if (nt_populate(self) == -1)
  >>  +    return -3;
  >>  +
  >>  +  int level, off;
  > 
  > Declaration has to be moved to top.
  
  Done.
  
  > 
  > 
  >> +  for (level = off = 0; level < 40; level++) {
  >>  +    int k = nt_level(node, level);
  >>  +    nodetree *n = &self->nt[off];
  >>  +    int v = n->children[k];
  >>  +    if (v < 0) {
  >>  +      v = -(v + 1);
  >>  +      const char *n = index_node(self, v);
  > 
  > Here, too.
  
  Done.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3499

To: martinvonz, indygreg, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - May 11, 2018, 7:31 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D3499#55925, @yuja wrote:
  
  > > +static int nt_shortest(indexObject *self, const char *node)
  > >  +{
  > >  +	int level, off;
  > >  +
  > >  +	if (nt_init(self) == -1)
  > >  +		return -3;
  > >  +	if (nt_populate(self) == -1)
  > >  +		return -3;
  > >  +
  > >  +	for (level = off = 0; level < 40; level++) {
  > >  +		int k, v;
  > >  +		nodetree *n = &self->nt[off];
  > >  +		k = nt_level(node, level);
  > >  +		v = n->children[k];
  > >  +		if (v < 0) {
  > >  +			v = -(v + 1);
  > >  +			const char *n = index_node(self, v);
  >
  > Perhaps we should check if n == NULL. index_node_existing() might be more
  >  appropriate.
  
  
  Oops, funny how I missed that after just having added index_node_existing. Thanks for noticing.
  
  > Can you send a followup?
  
  Will do. Feel free to fold in.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3499

To: martinvonz, indygreg, #hg-reviewers
Cc: yuja, mercurial-devel

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1526,7 +1526,33 @@ 
                 raise LookupError(node, self.indexfile, _('no node'))
             return not isrev(prefix)
 
+        def maybewdir(prefix):
+            return all(c == 'f' for c in prefix)
+
         hexnode = hex(node)
+
+        def disambiguate(hexnode, minlength):
+            for length in range(minlength, 41):
+                prefix = hexnode[:length]
+                if not isrev(prefix) and not maybewdir(prefix):
+                    return prefix
+
+        if not getattr(self, 'filteredrevs', None):
+            try:
+                length = max(self.index.shortest(node), minlength)
+                return disambiguate(hexnode, length)
+            except RevlogError:
+                if node == wdirid:
+                    for length in range(minlength, 41):
+                        prefix = hexnode[:length]
+                        if isvalid(prefix):
+                            return prefix
+                else:
+                    raise LookupError(node, self.indexfile, _('no node'))
+            except AttributeError:
+                # Fall through to pure code
+                pass
+
         shortest = hexnode
         startlength = max(6, minlength)
         length = startlength
diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c
+++ b/mercurial/cext/revlog.c
@@ -1259,6 +1259,52 @@ 
 	return nt_find(self, node, nodelen, 1);
 }
 
+/*
+ * Find the length of the shortest unique prefix of node.
+ *
+ * Return values:
+ *
+ *   -3: error (exception set)
+ *   -2: not found (no exception set)
+ * rest: length of shortest prefix
+ */
+static int nt_shortest(indexObject *self, const char *node)
+{
+  if (nt_init(self) == -1)
+    return -3;
+  if (nt_populate(self) == -1)
+    return -3;
+
+  int level, off;
+
+  for (level = off = 0; level < 40; level++) {
+    int k = nt_level(node, level);
+    nodetree *n = &self->nt[off];
+    int v = n->children[k];
+    if (v < 0) {
+      v = -(v + 1);
+      const char *n = index_node(self, v);
+      if (memcmp(node, n, 20) != 0)
+        /*
+         * Found a unique prefix, but it wasn't for the requested node (i.e
+         * the requested node does not exist).
+         */
+        return -2;
+      return level + 1;
+    }
+    if (v == 0)
+      return -2;
+    off = v;
+  }
+  /*
+   * The node was still not unique after 40 hex digits, so this won't happen.
+   * Also, if we get here, then there's a programming error in this file that
+   * made us insert a node longer than 40 hex digits.
+   */
+  PyErr_SetString(PyExc_Exception, "broken node tree");
+  return -3;
+}
+
 static PyObject *index_partialmatch(indexObject *self, PyObject *args)
 {
 	const char *fullnode;
@@ -1307,6 +1353,29 @@ 
 	return PyBytes_FromStringAndSize(fullnode, 20);
 }
 
+static PyObject *index_shortest(indexObject *self, PyObject *args)
+{
+  Py_ssize_t nodelen;
+  PyObject *val;
+  char *node;
+  int length;
+
+  if (!PyArg_ParseTuple(args, "O", &val))
+    return NULL;
+  if (node_check(val, &node, &nodelen) == -1)
+    return NULL;
+
+  self->ntlookups++;
+  length = nt_shortest(self, node);
+  if (length == -3)
+    return NULL;
+  if (length == -2) {
+    raise_revlog_error();
+    return NULL;
+  }
+  return PyInt_FromLong(length);
+}
+
 static PyObject *index_m_get(indexObject *self, PyObject *args)
 {
 	Py_ssize_t nodelen;
@@ -1995,6 +2064,8 @@ 
 	 "insert an index entry"},
 	{"partialmatch", (PyCFunction)index_partialmatch, METH_VARARGS,
 	 "match a potentially ambiguous node ID"},
+	{"shortest", (PyCFunction)index_shortest, METH_VARARGS,
+	 "find length of shortest hex nodeid of a binary ID"},
 	{"stats", (PyCFunction)index_stats, METH_NOARGS,
 	 "stats for the index"},
 	{NULL} /* Sentinel */