Patchwork [1,of,6,bid,merge,v2] ancestors: return all common ancestor heads - not just the most distant

Submitter Mads Kiilerich
Date April 7, 2014, 12:18 a.m.
Message ID <6ab1d2340403e890b071.1396829937@localhost.localdomain>
Permalink /patch/4236/
State Accepted

Comments

Mads Kiilerich - April 7, 2014, 12:18 a.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1393278134 -3600
#      Mon Feb 24 22:42:14 2014 +0100
# Node ID 6ab1d2340403e890b0719b6b8c6771171cf55204
# Parent  12f161f08d744f0e4b6eef9c905670afb5c24dd4
ancestors: return all common ancestor heads - not just the most distant

Mercurial had a heuristic for picking the "greatest" among multiple "heads of
common ancestors". It was stable and optimized and worked, but:

* There could still be a tie where the caller had to use another heuristic for
  picking which one to use.
* Low level functions should be generic and without hardcoded policies.
* We do (also) want all the common ancestor heads at the higher levels.
* The algorithm for finding the greatest was complex in code and computation.

We thus trim the ancestors function so it no longer tries to find the best
heads but returns all of them.

This will only make a real difference in rare situations. The new and more
generic functionality for getting all the common ancestor heads can be used for
other purposes.

The lack of test-suite changes shows that we don't have any test cases where
the special heuristic for picking the greatest common ancestor head makes any
difference (or that the high-level heuristic of picking one arbitrarily is
just as good).
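The semantics the commit message describes can be sketched in a few lines of plain Python. This is an illustrative sketch only, not Mercurial's optimized implementation; it spells out what the trimmed ancestors() returns, heads(::a and ::b and ...), with no "pick the deepest" tie-breaking:

```python
def ancestor_set(pfunc, node):
    """All ancestors of node, node included. pfunc maps a rev to its parents."""
    seen = {node}
    stack = [node]
    while stack:
        for p in pfunc(stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def common_ancestor_heads(pfunc, *nodes):
    """Every head of the common ancestor set - possibly more than one."""
    common = set.intersection(*(ancestor_set(pfunc, n) for n in nodes))
    # a common ancestor is not a head if another common ancestor descends
    # from it, i.e. if it is a parent of some node in the set
    return common - {p for n in common for p in pfunc(n)}

# criss-cross merge: revs 3 and 4 both have parents 1 and 2, so there are
# two common ancestor heads and no principled single "best" one to return
parents = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [1, 2]}
heads = common_ancestor_heads(parents.__getitem__, 3, 4)
print(sorted(heads))  # [1, 2]
```

The criss-cross case at the end is exactly the "rare situation" the message mentions: the deleted heuristic would have picked one of the two heads; the trimmed function hands both to the caller.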
Matt Mackall - April 17, 2014, 3:03 p.m.
On Mon, 2014-04-07 at 02:18 +0200, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1393278134 -3600
> #      Mon Feb 24 22:42:14 2014 +0100
> # Node ID 6ab1d2340403e890b0719b6b8c6771171cf55204
> # Parent  12f161f08d744f0e4b6eef9c905670afb5c24dd4
> ancestors: return all common ancestor heads - not just the most distant

We're going to need the old single-ancestor merge algorithm around for a
few releases. That algorithm needs to choose a "best" ancestor and the
heuristic it uses for that is the one you're deleting. So this approach
is simply not going to work. We'll probably need to add another function
or flag.

(In fact, we're probably always going to need an algorithm that gives us
a single "best" ancestor.)
Mads Kiilerich - April 17, 2014, 3:11 p.m.
On 04/17/2014 05:03 PM, Matt Mackall wrote:
> On Mon, 2014-04-07 at 02:18 +0200, Mads Kiilerich wrote:
>> # HG changeset patch
>> # User Mads Kiilerich <madski@unity3d.com>
>> # Date 1393278134 -3600
>> #      Mon Feb 24 22:42:14 2014 +0100
>> # Node ID 6ab1d2340403e890b0719b6b8c6771171cf55204
>> # Parent  12f161f08d744f0e4b6eef9c905670afb5c24dd4
>> ancestors: return all common ancestor heads - not just the most distant
> We're going to need the old single-ancestor merge algorithm around for a
> few releases. That algorithm needs to choose a "best" ancestor and the
> heuristic it uses for that is the one you're deleting. So this approach
> is simply not going to work. We'll probably need to add another function
> or flag.
>
> (In fact, we're probably always going to need an algorithm that gives us
> a single "best" ancestor.)

Do you consider it essential to preserve exactly the existing "best" 
heuristics?

I would suggest something like 
http://markmail.org/message/myqbmm6gsfcgw6am (and the previous patch) 
for a more transparent and manageable way of deciding which common 
ancestor head is "best".

/Mads
Matt Mackall - April 17, 2014, 3:17 p.m.
On Thu, 2014-04-17 at 17:11 +0200, Mads Kiilerich wrote:
> On 04/17/2014 05:03 PM, Matt Mackall wrote:
> > On Mon, 2014-04-07 at 02:18 +0200, Mads Kiilerich wrote:
> >> # HG changeset patch
> >> # User Mads Kiilerich <madski@unity3d.com>
> >> # Date 1393278134 -3600
> >> #      Mon Feb 24 22:42:14 2014 +0100
> >> # Node ID 6ab1d2340403e890b0719b6b8c6771171cf55204
> >> # Parent  12f161f08d744f0e4b6eef9c905670afb5c24dd4
> >> ancestors: return all common ancestor heads - not just the most distant
> > We're going to need the old single-ancestor merge algorithm around for a
> > few releases. That algorithm needs to choose a "best" ancestor and the
> > heuristic it uses for that is the one you're deleting. So this approach
> > is simply not going to work. We'll probably need to add another function
> > or flag.
> >
> > (In fact, we're probably always going to need an algorithm that gives us
> > a single "best" ancestor.)
> 
> Do you consider it essential to preserve exactly the existing "best" 
> heuristics?

The distance from root algorithm that we have today must continue to be
available, yes. So wholesale deleting the distance algorithm is a
non-starter.

But I'm totally open to something like this:

revlog.ancestor -> the current algorithm, but possibly overrideable
revlog.commonancestors -> heads(::revs)

I'm afraid this'll have to wait until May though.
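A minimal sketch of that split might look like the following. The two names come from the mail above; the bodies are purely illustrative stand-ins (longest path from the root plays the role of the current "distance from root" heuristic):

```python
def _ancestor_set(pfunc, rev):
    """All ancestors of rev, rev included. pfunc maps a rev to its parents."""
    seen = {rev}
    stack = [rev]
    while stack:
        for p in pfunc(stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def commonancestors(pfunc, *revs):
    """heads(::revs): all common ancestor heads, no tie-breaking."""
    common = set.intersection(*(_ancestor_set(pfunc, r) for r in revs))
    return common - {p for n in common for p in pfunc(n)}

def ancestor(pfunc, a, b):
    """A single 'best' common ancestor, kept for the old merge algorithm:
    here, the head whose path from the root is longest."""
    memo = {}
    def depth(r):
        if r not in memo:
            memo[r] = 1 + max((depth(p) for p in pfunc(r)), default=0)
        return memo[r]
    heads = commonancestors(pfunc, a, b)
    return max(heads, key=depth) if heads else -1

# 1 and 5 are both common ancestor heads of 3 and 4; 5 sits deeper,
# so the single-answer function prefers it
parents = {0: [], 1: [0], 2: [0], 5: [2], 3: [1, 5], 4: [1, 5]}
pf = parents.__getitem__
print(sorted(commonancestors(pf, 3, 4)), ancestor(pf, 3, 4))  # [1, 5] 5
```

The point of the split is that callers that can cope with multiple heads use commonancestors() directly, while the old single-ancestor merge path keeps a deterministic ancestor().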

Patch

diff --git a/mercurial/ancestor.py b/mercurial/ancestor.py
--- a/mercurial/ancestor.py
+++ b/mercurial/ancestor.py
@@ -11,8 +11,8 @@  from node import nullrev
 
 def ancestors(pfunc, *orignodes):
     """
-    Returns the common ancestors of a and b that are furthest from a
-    root (as measured by longest path).
+    Returns a set with the heads of all common ancestors of all orignodes,
+    heads(::orignodes[0] and ::orignodes[1] and ...) .
 
     pfunc must return a list of parent vertices for a given vertex.
     """
@@ -67,67 +67,9 @@  def ancestors(pfunc, *orignodes):
                     seen[p] = sv
         return gca
 
-    def deepest(nodes):
-        interesting = {}
-        count = max(nodes) + 1
-        depth = [0] * count
-        seen = [0] * count
-        mapping = []
-        for (i, n) in enumerate(sorted(nodes)):
-            depth[n] = 1
-            b = 1 << i
-            seen[n] = b
-            interesting[b] = 1
-            mapping.append((b, n))
-        nv = count - 1
-        while nv >= 0 and len(interesting) > 1:
-            v = nv
-            nv -= 1
-            dv = depth[v]
-            if dv == 0:
-                continue
-            sv = seen[v]
-            for p in pfunc(v):
-                if p == nullrev:
-                    continue
-                dp = depth[p]
-                nsp = sp = seen[p]
-                if dp <= dv:
-                    depth[p] = dv + 1
-                    if sp != sv:
-                        interesting[sv] += 1
-                        nsp = seen[p] = sv
-                        if sp:
-                            interesting[sp] -= 1
-                            if interesting[sp] == 0:
-                                del interesting[sp]
-                elif dv == dp - 1:
-                    nsp = sp | sv
-                    if nsp == sp:
-                        continue
-                    seen[p] = nsp
-                    interesting.setdefault(nsp, 0)
-                    interesting[nsp] += 1
-                    interesting[sp] -= 1
-                    if interesting[sp] == 0:
-                        del interesting[sp]
-            interesting[sv] -= 1
-            if interesting[sv] == 0:
-                del interesting[sv]
-
-        if len(interesting) != 1:
-            return []
-
-        k = 0
-        for i in interesting:
-            k |= i
-        return set(n for (i, n) in mapping if k & i)
-
     gca = candidates(orignodes)
 
-    if len(gca) <= 1:
-        return gca
-    return deepest(gca)
+    return gca
 
 def genericancestor(a, b, pfunc):
     """
diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -1290,162 +1290,8 @@  bail:
 }
 
 /*
- * Given a disjoint set of revs, return the subset with the longest
- * path to the root.
- */
-static PyObject *find_deepest(indexObject *self, PyObject *revs)
-{
-	const Py_ssize_t revcount = PyList_GET_SIZE(revs);
-	static const Py_ssize_t capacity = 24;
-	int *depth, *interesting = NULL;
-	int i, j, v, ninteresting;
-	PyObject *dict = NULL, *keys;
-	long *seen = NULL;
-	int maxrev = -1;
-	long final;
-
-	if (revcount > capacity) {
-		PyErr_Format(PyExc_OverflowError,
-			     "bitset size (%ld) > capacity (%ld)",
-			     (long)revcount, (long)capacity);
-		return NULL;
-	}
-
-	for (i = 0; i < revcount; i++) {
-		int n = (int)PyInt_AsLong(PyList_GET_ITEM(revs, i));
-		if (n > maxrev)
-			maxrev = n;
-	}
-
-	depth = calloc(sizeof(*depth), maxrev + 1);
-	if (depth == NULL)
-		return PyErr_NoMemory();
-
-	seen = calloc(sizeof(*seen), maxrev + 1);
-	if (seen == NULL) {
-		PyErr_NoMemory();
-		goto bail;
-	}
-
-	interesting = calloc(sizeof(*interesting), 2 << revcount);
-	if (interesting == NULL) {
-		PyErr_NoMemory();
-		goto bail;
-	}
-
-	if (PyList_Sort(revs) == -1)
-		goto bail;
-
-	for (i = 0; i < revcount; i++) {
-		int n = (int)PyInt_AsLong(PyList_GET_ITEM(revs, i));
-		long b = 1l << i;
-		depth[n] = 1;
-		seen[n] = b;
-		interesting[b] = 1;
-	}
-
-	ninteresting = (int)revcount;
-
-	for (v = maxrev; v >= 0 && ninteresting > 1; v--) {
-		int dv = depth[v];
-		int parents[2];
-		long sv;
-
-		if (dv == 0)
-			continue;
-
-		sv = seen[v];
-		index_get_parents(self, v, parents);
-
-		for (i = 0; i < 2; i++) {
-			int p = parents[i];
-			long nsp, sp;
-			int dp;
-
-			if (p == -1)
-				continue;
-
-			dp = depth[p];
-			nsp = sp = seen[p];
-			if (dp <= dv) {
-				depth[p] = dv + 1;
-				if (sp != sv) {
-					interesting[sv] += 1;
-					nsp = seen[p] = sv;
-					if (sp) {
-						interesting[sp] -= 1;
-						if (interesting[sp] == 0)
-							ninteresting -= 1;
-					}
-				}
-			}
-			else if (dv == dp - 1) {
-				nsp = sp | sv;
-				if (nsp == sp)
-					continue;
-				seen[p] = nsp;
-				interesting[sp] -= 1;
-				if (interesting[sp] == 0 && interesting[nsp] > 0)
-					ninteresting -= 1;
-				interesting[nsp] += 1;
-			}
-		}
-		interesting[sv] -= 1;
-		if (interesting[sv] == 0)
-			ninteresting -= 1;
-	}
-
-	final = 0;
-	j = ninteresting;
-	for (i = 0; i < (int)(2 << revcount) && j > 0; i++) {
-		if (interesting[i] == 0)
-			continue;
-		final |= i;
-		j -= 1;
-	}
-	if (final == 0)
-		return PyList_New(0);
-
-	dict = PyDict_New();
-	if (dict == NULL)
-		goto bail;
-
-	for (i = 0; i < revcount; i++) {
-		PyObject *key;
-
-		if ((final & (1 << i)) == 0)
-			continue;
-
-		key = PyList_GET_ITEM(revs, i);
-		Py_INCREF(key);
-		Py_INCREF(Py_None);
-		if (PyDict_SetItem(dict, key, Py_None) == -1) {
-			Py_DECREF(key);
-			Py_DECREF(Py_None);
-			goto bail;
-		}
-	}
-
-	keys = PyDict_Keys(dict);
-
-	free(depth);
-	free(seen);
-	free(interesting);
-	Py_DECREF(dict);
-
-	return keys;
-bail:
-	free(depth);
-	free(seen);
-	free(interesting);
-	Py_XDECREF(dict);
-
-	return NULL;
-}
-
-/*
- * Given a (possibly overlapping) set of revs, return the greatest
- * common ancestors: those with the longest path to the root.
+ * Given a (possibly overlapping) set of revs, return all the greatest
+ * common ancestors: heads(::a and ::b and ...)
  */
 static PyObject *index_ancestors(indexObject *self, PyObject *args)
 {
@@ -1524,11 +1370,8 @@  static PyObject *index_ancestors(indexOb
 	if (gca == NULL)
 		goto bail;
 
-	if (PyList_GET_SIZE(gca) <= 1) {
-		ret = gca;
-		Py_INCREF(gca);
-	}
-	else ret = find_deepest(self, gca);
+	ret = gca;
+	Py_INCREF(gca);
 
 done:
 	free(revs);