Submitter | Durham Goode |
---|---|
Date | Feb. 5, 2016, 9:23 p.m. |
Message ID | <64726d7baee7e6a32780.1454707436@dev8486.prn1.facebook.com> |
Download | mbox | patch |
Permalink | /patch/13021/ |
State | Accepted |
Headers | show |
Comments
On Fri, Feb 05, 2016 at 01:23:56PM -0800, Durham Goode wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1454707404 28800 > # Fri Feb 05 13:23:24 2016 -0800 > # Node ID 64726d7baee7e6a32780a31e861cb5e9af8aefc6 > # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 > copies: optimize forward copy detection logic for rebases Dropped v1 and queued this one, many thanks! > > Forward copy detection (i.e. detecting what files have been moved/copied in > commit X since ancestor Y) previously required diff'ing the manifests of both X > and Y. This was expensive since it required reading both entire manifests and > doing a set difference (they weren't already in a set because of the > lazymanifest work). This cost almost 1 second on very large repositories, and > happens N times for a rebase of N commits. > > This patch optimizes it for the case of rebase. In a rebase, we are comparing a > commit against it's immediate parent, and therefore we can know what files > changed by looking at ctx.files(). This let's us drastically decrease the size > of the set comparison, and makes it O(# of changes) instead of O(size of > manifest). This makes it take 1ms instead of 1000ms. > > diff --git a/mercurial/copies.py b/mercurial/copies.py > --- a/mercurial/copies.py > +++ b/mercurial/copies.py > @@ -10,7 +10,9 @@ from __future__ import absolute_import > import heapq > > from . import ( > + node, > pathutil, > + scmutil, > util, > ) > > @@ -175,7 +177,18 @@ def _forwardcopies(a, b, match=None): > # we currently don't try to find where old files went, too expensive > # this means we can miss a case like 'hg rm b; hg cp a b' > cm = {} > - missing = _computeforwardmissing(a, b, match=match) > + > + # Computing the forward missing is quite expensive on large manifests, since > + # it compares the entire manifests. We can optimize it in the common use > + # case of computing what copies are in a commit versus its parent (like > + # during a rebase or histedit). Note, we exclude merge commits from this > + # optimization, since the ctx.files() for a merge commit is not correct for > + # this comparison. > + forwardmissingmatch = match > + if not match and b.p1() == a and b.p2().node() == node.nullid: > + forwardmissingmatch = scmutil.matchfiles(a._repo, b.files()) > + missing = _computeforwardmissing(a, b, match=forwardmissingmatch) > + > ancestrycontext = a._repo.changelog.ancestors([b.rev()], inclusive=True) > for f in missing: > fctx = b[f] > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Patch
diff --git a/mercurial/copies.py b/mercurial/copies.py --- a/mercurial/copies.py +++ b/mercurial/copies.py @@ -10,7 +10,9 @@ from __future__ import absolute_import import heapq from . import ( + node, pathutil, + scmutil, util, ) @@ -175,7 +177,18 @@ def _forwardcopies(a, b, match=None): # we currently don't try to find where old files went, too expensive # this means we can miss a case like 'hg rm b; hg cp a b' cm = {} - missing = _computeforwardmissing(a, b, match=match) + + # Computing the forward missing is quite expensive on large manifests, since + # it compares the entire manifests. We can optimize it in the common use + # case of computing what copies are in a commit versus its parent (like + # during a rebase or histedit). Note, we exclude merge commits from this + # optimization, since the ctx.files() for a merge commit is not correct for + # this comparison. + forwardmissingmatch = match + if not match and b.p1() == a and b.p2().node() == node.nullid: + forwardmissingmatch = scmutil.matchfiles(a._repo, b.files()) + missing = _computeforwardmissing(a, b, match=forwardmissingmatch) + ancestrycontext = a._repo.changelog.ancestors([b.rev()], inclusive=True) for f in missing: fctx = b[f]