Submitter | Martin von Zweigbergk |
---|---|
Date | Feb. 27, 2015, 10:46 p.m. |
Message ID | <b3bcf58446fdebf3672e.1425077211@martinvonz.mtv.corp.google.com> |
Download | mbox | patch |
Permalink | /patch/7855/ |
State | Accepted |
Commit | 61aadba2396e07b3a6f2b19a65d4d4f0b8a0024f |
Headers | show |
Comments
On 2/27/15, 2:46 PM, "Martin von Zweigbergk" <martinvonz@google.com> wrote: ># HG changeset patch ># User Martin von Zweigbergk <martinvonz@google.com> ># Date 1425074581 28800 ># Fri Feb 27 14:03:01 2015 -0800 ># Node ID b3bcf58446fdebf3672edbbc55c24509e549eb22 ># Parent 89f810fb00184d3a1dd49412d0c3256a596ddca8 >copies: calculate 'bothnew' from manifestdict.filesnotin() > >In the same spirit as the previous change, let's now calculate the >'bothnew' variable using manifestdict.filesnotin().5D > >diff -r 89f810fb0018 -r b3bcf58446fd mercurial/copies.py >--- a/mercurial/copies.py Fri Feb 27 14:02:30 2015 -0800 >+++ b/mercurial/copies.py Fri Feb 27 14:03:01 2015 -0800 >@@ -302,7 +302,9 @@ > else: > diverge2.update(fl) # reverse map for below > >- bothnew = sorted([d for d in m1 if d in m2 and d not in ma]) >+ addedinm1 = m1.filesnotin(ma) >+ addedinm2 = m2.filesnotin(ma) >+ bothnew = sorted(addedinm1 & addedinm2) I was concerned about perf here (since we¹re constructing sets when we used to not, and we¹re iterating over both m1 and m2 when we used to not), but I ran the numbers and the new stuff is actually faster. To diff million file manifests with 5k files different each, previously it was 0.36s and now it¹s 0.23. The script for posterity: #!/bin/env python import time fm1 = list(xrange(0, 990000)) fm2 = list(xrange(0, 990000)) fma = list(xrange(0, 990000)) fm1.extend(range(1000000, 1005000)) fm2.extend(range(1005000, 1010000)) fma.extend(range(1010000, 1015000)) m1 = set(fm1) m2 = set(fm2) ma = set(fma) start = time.time() old = sorted([d for d in fm1 if d in m2 and d not in ma]) print ("Old %s" % str(time.time() - start)) def doit(d1, d2): missing = set(d1) missing.difference_update(d2) return missing start = time.time() addedinm1 = doit(fm1, fma) addedinm2 = doit(fm2, fma) bothnew = sorted(addedinm1 & addedinm2) print ("New %s" % str(time.time() - start)) ~/local> ./testset.py Old 0.343271970749 New 0.231050014496
On Fri, Feb 27, 2015 at 5:08 PM Durham Goode <durham@fb.com> wrote: > > > On 2/27/15, 2:46 PM, "Martin von Zweigbergk" <martinvonz@google.com> > wrote: > > ># HG changeset patch > ># User Martin von Zweigbergk <martinvonz@google.com> > ># Date 1425074581 28800 > ># Fri Feb 27 14:03:01 2015 -0800 > ># Node ID b3bcf58446fdebf3672edbbc55c24509e549eb22 > ># Parent 89f810fb00184d3a1dd49412d0c3256a596ddca8 > >copies: calculate 'bothnew' from manifestdict.filesnotin() > > > >In the same spirit as the previous change, let's now calculate the > >'bothnew' variable using manifestdict.filesnotin().5D > > > >diff -r 89f810fb0018 -r b3bcf58446fd mercurial/copies.py > >--- a/mercurial/copies.py Fri Feb 27 14:02:30 2015 -0800 > >+++ b/mercurial/copies.py Fri Feb 27 14:03:01 2015 -0800 > >@@ -302,7 +302,9 @@ > > else: > > diverge2.update(fl) # reverse map for below > > > >- bothnew = sorted([d for d in m1 if d in m2 and d not in ma]) > >+ addedinm1 = m1.filesnotin(ma) > >+ addedinm2 = m2.filesnotin(ma) > >+ bothnew = sorted(addedinm1 & addedinm2) > > I was concerned about perf here (since we¹re constructing sets when we > used to not, and we¹re iterating over both m1 and m2 when we used to not), > but I ran the numbers and the new stuff is actually faster. To diff > million file manifests with 5k files different each, previously it was > 0.36s and now it¹s 0.23. > Great! And thanks for doing my job for me! :-) Even if it were a little slower, it should hopefully have been outweighed after 4/4. But even better that it's already after 3/4. Also, to whomever applies this, feel free to drop (or not) the "5D" at the end of the message.
Patch
diff -r 89f810fb0018 -r b3bcf58446fd mercurial/copies.py --- a/mercurial/copies.py Fri Feb 27 14:02:30 2015 -0800 +++ b/mercurial/copies.py Fri Feb 27 14:03:01 2015 -0800 @@ -302,7 +302,9 @@ else: diverge2.update(fl) # reverse map for below - bothnew = sorted([d for d in m1 if d in m2 and d not in ma]) + addedinm1 = m1.filesnotin(ma) + addedinm2 = m2.filesnotin(ma) + bothnew = sorted(addedinm1 & addedinm2) if bothnew: repo.ui.debug(" unmatched files new in both:\n %s\n" % "\n ".join(bothnew))