Patchwork [1,of,2] graft: find ancestors of destination lazily

login
register
mail settings
Submitter Siddharth Agarwal
Date April 7, 2013, 3:15 a.m.
Message ID <aff777b0929f7c26ffe9.1365304523@sid0x220>
Download mbox | patch
Permalink /patch/1257/
State Accepted
Commit 177774e4b04a88023278369f2aecd993640ea363
Headers show

Comments

Siddharth Agarwal - April 7, 2013, 3:15 a.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1365303917 25200
#      Sat Apr 06 20:05:17 2013 -0700
# Node ID aff777b0929f7c26ffe97f226389271dd3d353d4
# Parent  85707a2ede81f521041ae7c089e1cadc745a072d
graft: find ancestors of destination lazily

When the revisions to graft are numerically close to the destination, this
avoids one walk up the DAG, which for a repository with over 470,000
changesets translates to around 1.1 seconds.
Augie Fackler - April 7, 2013, 8:01 p.m.
On Apr 6, 2013, at 11:15 PM, Siddharth Agarwal <sid0@fb.com> wrote:

> +    ancestors = repo.changelog.ancestors([crev], inclusive=True)

Could this set be very large, and be worth set()ing before doing all those membership checks?
Siddharth Agarwal - April 8, 2013, 2:33 a.m.
On 04/07/2013 01:01 PM, Augie Fackler wrote:
>
> On Apr 6, 2013, at 11:15 PM, Siddharth Agarwal <sid0@fb.com 
> <mailto:sid0@fb.com>> wrote:
>
>> +    ancestors = repo.changelog.ancestors([crev], inclusive=True)
>
> Could this set be very large, and be worth set()ing before doing all 
> those membership checks?

The entire point is to not set() this :) This computes ancestors only as 
far as it needs to, based on the fact that rev nums define a topo order. 
For a roughly 10% worst case overhead this gets us a several orders of 
magnitude improvement in common cases. See the perf measurements in 
http://selenic.com/repo/hg/file/f7f8159caad3/mercurial/ancestor.py.
Augie Fackler - April 8, 2013, 1:32 p.m.
On Sun, Apr 7, 2013 at 10:33 PM, Siddharth Agarwal <sid0@fb.com> wrote:
>>> +    ancestors = repo.changelog.ancestors([crev], inclusive=True)
>>
>>
>> Could this set be very large, and be worth set()ing before doing all those
>> membership checks?
>
>
> The entire point is to not set() this :) This computes ancestors only as far
> as it needs to, based on the fact that rev nums define a topo order. For a
> roughly 10% worst case overhead this gets us a several orders of magnitude
> improvement in common cases. See the perf measurements in
> http://selenic.com/repo/hg/file/f7f8159caad3/mercurial/ancestor.py.


Ah, clever. That might be worth a follow-up with a comment, unless
y'all have some performance dashboard that'd catch an idiot like me
regressing this.
Siddharth Agarwal - April 8, 2013, 10:59 p.m.
On 04/08/2013 06:32 AM, Augie Fackler wrote:
> Ah, clever. That might be worth a follow-up with a comment, unless
> y'all have some performance dashboard that'd catch an idiot like me
> regressing this.

A regression here would be pretty easy to catch on our perf dashboard.

Patch

diff -r 85707a2ede81 -r aff777b0929f mercurial/commands.py
--- a/mercurial/commands.py	Thu Mar 28 00:30:40 2013 -0700
+++ b/mercurial/commands.py	Sat Apr 06 20:05:17 2013 -0700
@@ -2917,9 +2917,13 @@  def graft(ui, repo, *revs, **opts):
         return -1
 
     # check for ancestors of dest branch
-    for rev in repo.revs('::. and %ld', revs):
-        ui.warn(_('skipping ancestor revision %s\n') % rev)
-        revs.remove(rev)
+    crev = repo['.'].rev()
+    ancestors = repo.changelog.ancestors([crev], inclusive=True)
+    # don't mutate while iterating, create a copy
+    for rev in list(revs):
+        if rev in ancestors:
+            ui.warn(_('skipping ancestor revision %s\n') % rev)
+            revs.remove(rev)
     if not revs:
         return -1