Patchwork [1,of,7] ancestors: actually iterate over ancestors in topological order (issue5979)

login
register
mail settings
Submitter Boris Feld
Date Sept. 7, 2018, 3:04 p.m.
Message ID <eb80c721aea9715e23dc.1536332645@localhost.localdomain>
Download mbox | patch
Permalink /patch/34393/
State Accepted
Headers show

Comments

Boris Feld - Sept. 7, 2018, 3:04 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1536267628 14400
#      Thu Sep 06 17:00:28 2018 -0400
# Node ID eb80c721aea9715e23dc35cdd119428aa120ea93
# Parent  ab452995eafffa69c34e863e4d8c03e163d8f3ad
# EXP-Topic issue5979
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r eb80c721aea9
ancestors: actually iterate over ancestors in topological order (issue5979)

This code previously used a dequeue logic, the first ancestors seen were the
first ancestors to be emitted. In branching/merging situations, it can result
in ancestors being yielded before their descendants, breaking the object
contract.

We got affected by this issue while working on the copy tracing code. At about
the same time, Axel Hecht <axel@mozilla.com> reported the issue and provided
the test case used in this changeset. Thanks Axel.

Running `hg perfancestors` does not show a significant difference between the
old and the new version.
via Mercurial-devel - Sept. 10, 2018, 6:36 a.m.
On Fri, Sep 7, 2018 at 8:08 AM Boris Feld <boris.feld@octobus.net> wrote:

> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1536267628 14400
> #      Thu Sep 06 17:00:28 2018 -0400
> # Node ID eb80c721aea9715e23dc35cdd119428aa120ea93
> # Parent  ab452995eafffa69c34e863e4d8c03e163d8f3ad
> # EXP-Topic issue5979
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> eb80c721aea9
> ancestors: actually iterate over ancestors in topological order (issue5979)
>
> This code previously used a dequeue logic, the first ancestors seen were
> the
> first ancestors to be emitted. In branching/merging situations, it can
> result
> in ancestors being yielded before their descendants, breaking the object
> contract.
>
> We got affected by this issue while working on the copy tracing code. At
> about
> the same time, Axel Hecht <axel@mozilla.com> reported the issue and
> provided
> the test case used in this changeset. Thanks Axel.
>
> Running `hg perfancestors` does not show a significant difference between
> the
> old and the new version.
>

In my hg repo, it went from 0.047230 to 0.093331, which is clearly
significant. Maybe you made a last-minute change that made it slower? Or is
my repo just different from yours?
Yuya Nishihara - Sept. 10, 2018, 1:41 p.m.
On Sun, 9 Sep 2018 23:36:15 -0700, Martin von Zweigbergk via Mercurial-devel wrote:
> On Fri, Sep 7, 2018 at 8:08 AM Boris Feld <boris.feld@octobus.net> wrote:
> > # HG changeset patch
> > # User Boris Feld <boris.feld@octobus.net>
> > # Date 1536267628 14400
> > #      Thu Sep 06 17:00:28 2018 -0400
> > # Node ID eb80c721aea9715e23dc35cdd119428aa120ea93
> > # Parent  ab452995eafffa69c34e863e4d8c03e163d8f3ad
> > # EXP-Topic issue5979
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> > eb80c721aea9
> > ancestors: actually iterate over ancestors in topological order (issue5979)
> >
> > This code previously used a dequeue logic, the first ancestors seen were
> > the
> > first ancestors to be emitted. In branching/merging situations, it can
> > result
> > in ancestors being yielded before their descendants, breaking the object
> > contract.
> >
> > We got affected by this issue while working on the copy tracing code. At
> > about
> > the same time, Axel Hecht <axel@mozilla.com> reported the issue and
> > provided
> > the test case used in this changeset. Thanks Axel.
> >
> > Running `hg perfancestors` does not show a significant difference between
> > the
> > old and the new version.
> >
> 
> In my hg repo, it went from 0.047230 to 0.093331, which is clearly
> significant. Maybe you made a last-minute change that made it slower? Or is
> my repo just different from yours?

I got a similar result. The heapq version is ~2x slower than the original
buggy implementation. It seems I made it to 1-1.5x slowness by optimizing
out heappush/heappop for contiguous cases (i.e. overwrite visit[0] if p1
is known to be the next largest revision.) I'll send the patches if I can
be sure that's correct.

Patch

diff --git a/mercurial/ancestor.py b/mercurial/ancestor.py
--- a/mercurial/ancestor.py
+++ b/mercurial/ancestor.py
@@ -7,7 +7,6 @@ 
 
 from __future__ import absolute_import
 
-import collections
 import heapq
 
 from .node import nullrev
@@ -305,14 +304,15 @@  class lazyancestors(object):
         """Generate the ancestors of _initrevs in reverse topological order.
 
         If inclusive is False, yield a sequence of revision numbers starting
-        with the parents of each revision in revs, i.e., each revision is *not*
-        considered an ancestor of itself.  Results are in breadth-first order:
-        parents of each rev in revs, then parents of those, etc.
+        with the parents of each revision in revs, i.e., each revision is
+        *not* considered an ancestor of itself. Results are emitted in reverse
+        revision number order. That order is also topological: a child is
+        always emitted before its parent.
 
         If inclusive is True, yield all the revs first (ignoring stoprev),
-        then yield all the ancestors of revs as when inclusive is False.
-        If an element in revs is an ancestor of a different rev it is not
-        yielded again."""
+        then yield all the ancestors of revs as when inclusive is False. If an
+        element in revs is an ancestor of a different rev it is not yielded
+        again."""
         seen = set()
         revs = self._initrevs
         if self._inclusive:
@@ -322,17 +322,27 @@  class lazyancestors(object):
 
         parentrevs = self._parentrevs
         stoprev = self._stoprev
-        visit = collections.deque(revs)
+        visit = []
+        heapq.heapify(visit)
+        schedule = heapq.heappush
+        nextitem = heapq.heappop
+        see = seen.add
+        see(nullrev)
 
-        see = seen.add
-        schedule = visit.append
+        for r in revs:
+            for parent in parentrevs(r):
+                if parent not in seen:
+                    schedule(visit, -parent)
+                    see(parent)
 
         while visit:
-            for parent in parentrevs(visit.popleft()):
-                if parent >= stoprev and parent not in seen:
-                    schedule(parent)
-                    see(parent)
-                    yield parent
+            current = -nextitem(visit)
+            if current >= stoprev:
+                yield current
+                for parent in parentrevs(current):
+                    if parent not in seen:
+                        schedule(visit, -parent)
+                        see(parent)
 
     def __contains__(self, target):
         """Test whether target is an ancestor of self._initrevs."""
diff --git a/tests/test-ancestor.py.out b/tests/test-ancestor.py.out
--- a/tests/test-ancestor.py.out
+++ b/tests/test-ancestor.py.out
@@ -3,16 +3,16 @@  membership: []
 iteration:  []
 % lazy ancestor set for [11, 13], stoprev = 0, inclusive = False
 membership: [7, 8, 3, 4, 1, 0]
-iteration:  [3, 7, 8, 1, 4, 0, 2]
+iteration:  [8, 7, 4, 3, 2, 1, 0]
 % lazy ancestor set for [1, 3], stoprev = 0, inclusive = False
 membership: [1, 0]
-iteration:  [0, 1]
+iteration:  [1, 0]
 % lazy ancestor set for [11, 13], stoprev = 0, inclusive = True
 membership: [11, 13, 7, 8, 3, 4, 1, 0]
-iteration:  [11, 13, 3, 7, 8, 1, 4, 0, 2]
+iteration:  [11, 13, 8, 7, 4, 3, 2, 1, 0]
 % lazy ancestor set for [11, 13], stoprev = 6, inclusive = False
 membership: [7, 8]
-iteration:  [7, 8]
+iteration:  [8, 7]
 % lazy ancestor set for [11, 13], stoprev = 6, inclusive = True
 membership: [11, 13, 7, 8]
-iteration:  [11, 13, 7, 8]
+iteration:  [11, 13, 8, 7]
diff --git a/tests/test-issue5979.t b/tests/test-issue5979.t
new file mode 100644
--- /dev/null
+++ b/tests/test-issue5979.t
@@ -0,0 +1,34 @@ 
+  $ hg init r1
+  $ cd r1
+  $ hg ci --config ui.allowemptycommit=true -m c0
+  $ hg ci --config ui.allowemptycommit=true -m c1
+  $ hg ci --config ui.allowemptycommit=true -m c2
+  $ hg co -q 0
+  $ hg ci --config ui.allowemptycommit=true -m c3
+  created new head
+  $ hg co -q 3
+  $ hg merge --quiet
+  $ hg ci --config ui.allowemptycommit=true -m c4
+
+  $ hg log -G -T'{desc}'
+  @    c4
+  |\
+  | o  c3
+  | |
+  o |  c2
+  | |
+  o |  c1
+  |/
+  o  c0
+  
+
+  >>> from mercurial.ui import ui
+  >>> from mercurial.hg import repository
+  >>> repo = repository(ui())
+  >>> for anc in repo.changelog.ancestors([4], inclusive=True):
+  ...   print(anc)
+  4
+  3
+  2
+  1
+  0
diff --git a/tests/test-revlog-ancestry.py.out b/tests/test-revlog-ancestry.py.out
--- a/tests/test-revlog-ancestry.py.out
+++ b/tests/test-revlog-ancestry.py.out
@@ -1,13 +1,13 @@ 
 Ancestors of 5
 4 2 0 
 Ancestors of 6 and 5
-3 4 2 1 0 
+4 3 2 1 0 
 Ancestors of 5 and 4
 4 2 0 
 Ancestors of 7, stop at 6
 6 
 Ancestors of 7, including revs
-7 6 5 3 4 2 1 0 
+7 6 5 4 3 2 1 0 
 Ancestors of 7, 5 and 3, including revs
 7 5 3 6 4 2 1 0