Patchwork [05,of,16] ancestor.missingancestors: don't discard from basesvisit

login
register
mail settings
Submitter Siddharth Agarwal
Date Nov. 16, 2014, 9:17 a.m.
Message ID <cb87c30f2674da84818c.1416129427@devbig136.prn2.facebook.com>
Download mbox | patch
Permalink /patch/6751/
State Accepted
Commit 194508240dc93eda1d473fd067cf0d663c25ef08
Headers show

Comments

Siddharth Agarwal - Nov. 16, 2014, 9:17 a.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1415993632 28800
#      Fri Nov 14 11:33:52 2014 -0800
# Node ID cb87c30f2674da84818c9f3f416b41e493ed9422
# Parent  60263a5502a8dd6a23e77b13e4ca44af688ad0dd
ancestor.missingancestors: don't discard from basesvisit

We only actually care about whether revsvisit is empty, so we can let
basesvisit grow to arbitrary size.

It turns out that this actually helps performance. For a large repo with
hundreds of thousands of commits, hg perfrevset 'only(0, tip)' (basically the
worst case, involving a full DAG traversal) goes from 1.63 seconds to 1.50. hg
perfrevset 'only(tip, 0)' remains unchanged at 1.98 seconds.

Patch

diff --git a/mercurial/ancestor.py b/mercurial/ancestor.py
--- a/mercurial/ancestor.py
+++ b/mercurial/ancestor.py
@@ -155,20 +155,19 @@ 
     start = max(max(revsvisit), max(basesvisit))
     bothvisit = revsvisit.intersection(basesvisit)
     revsvisit.difference_update(bothvisit)
-    basesvisit.difference_update(bothvisit)
     # At this point, we hold the invariants that:
     # - revsvisit is the set of nodes we know are an ancestor of at least one
     #   of the nodes in revs
     # - basesvisit is the same for bases
     # - bothvisit is the set of nodes we know are ancestors of at least one of
-    #   the nodes in revs and one of the nodes in bases
-    # - a node may be in none or one, but not more, of revsvisit, basesvisit
-    #   and bothvisit at any given time
+    #   the nodes in revs and one of the nodes in bases, and that are smaller
+    #   than curr. bothvisit and revsvisit are mutually exclusive, but bothvisit
+    #   is a subset of basesvisit.
     # Now we walk down in reverse topo order, adding parents of nodes already
-    # visited to the sets while maintaining the invariants. When a node is
-    # found in both revsvisit and basesvisit, it is removed from them and
-    # added to bothvisit instead. When revsvisit becomes empty, there are no
-    # more ancestors of revs that aren't also ancestors of bases, so exit.
+    # visited to the sets while maintaining the invariants. When a node is found
+    # in both revsvisit and basesvisit, it is removed from revsvisit and added
+    # to bothvisit. When revsvisit becomes empty, there are no more ancestors of
+    # revs that aren't also ancestors of bases, so exit.
 
     missing = []
     for curr in xrange(start, nullrev, -1):
@@ -177,18 +176,17 @@ 
 
         if curr in bothvisit:
             bothvisit.remove(curr)
-            # curr's parents might have made it into revsvisit or basesvisit
-            # through another path
+            # curr's parents might have made it into revsvisit through another
+            # path
             for p in pfunc(curr):
                 revsvisit.discard(p)
-                basesvisit.discard(p)
+                basesvisit.add(p)
                 bothvisit.add(p)
             continue
 
-        # curr will never be in both revsvisit and basesvisit, since if it
-        # were it'd have been pushed to bothvisit
         if curr in revsvisit:
             missing.append(curr)
+            revsvisit.remove(curr)
             thisvisit = revsvisit
             othervisit = basesvisit
         elif curr in basesvisit:
@@ -198,7 +196,6 @@ 
             # not an ancestor of revs or bases: ignore
             continue
 
-        thisvisit.remove(curr)
         for p in pfunc(curr):
             if p == nullrev:
                 pass
@@ -206,7 +203,7 @@ 
                 # p is implicitly in thisvisit. This means p is or should be
                 # in bothvisit
                 revsvisit.discard(p)
-                basesvisit.discard(p)
+                basesvisit.add(p)
                 bothvisit.add(p)
             else:
                 # visit later