Submitter | Durham Goode |
---|---|
Date | Feb. 5, 2016, 7:48 p.m. |
Message ID | <6eca9fcd1a9cce2f07a2.1454701690@dev8486.prn1.facebook.com> |
Download | mbox | patch |
Permalink | /patch/13015/ |
State | Superseded |
Commit | d4247c306d82d6bfe22722c026f2c7675a0c33fa |
Headers | show |
Comments
> This patch optimizes it for the case of rebase. In a rebase, we are comparing a > commit against it's immediate parent, its
On Fri, Feb 05, 2016 at 11:48:10AM -0800, Durham Goode wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1454701571 28800 > # Fri Feb 05 11:46:11 2016 -0800 > # Node ID 6eca9fcd1a9cce2f07a212a9c0eaf05dacd082df > # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 > copies: optimize forward copy detection logic for rebases This looks great to me. I've spent a few minutes trying to puzzle out a case where it would be wrong, and I can't. As such: queued! > > Forward copy detection (i.e. detecting what files have been moved/copied in > commit X since ancestor Y) previously required diff'ing the manifests of both X > and Y. This was expensive since it required reading both entire manifests and > doing a set difference (they weren't already in a set because of the > lazymanifest work). This cost almost 1 second on very large repositories, and > happens N times for a rebase of N commits. > > This patch optimizes it for the case of rebase. In a rebase, we are comparing a > commit against it's immediate parent, and therefore we can know what files > changed by looking at ctx.files(). This let's us drastically decrease the size > of the set comparison, and makes it O(# of changes) instead of O(size of > manifest). This makes it take 1ms instead of 1000ms. > > diff --git a/mercurial/copies.py b/mercurial/copies.py > --- a/mercurial/copies.py > +++ b/mercurial/copies.py > @@ -11,6 +11,7 @@ import heapq > > from . import ( > pathutil, > + scmutil, > util, > ) > > @@ -175,7 +176,16 @@ def _forwardcopies(a, b, match=None): > # we currently don't try to find where old files went, too expensive > # this means we can miss a case like 'hg rm b; hg cp a b' > cm = {} > - missing = _computeforwardmissing(a, b, match=match) > + > + # Computing the forward missing is quite expensive on large manifests, since > + # it compares the entire manifests. We can optimize it in the common use > + # case of computing what copies are in a commit versus its parent (like > + # during a rebase or histedit). > + forwardmissingmatch = match > + if not match and b.p1() == a: > + forwardmissingmatch = scmutil.matchfiles(a._repo, b.files()) > + missing = _computeforwardmissing(a, b, match=forwardmissingmatch) > + > ancestrycontext = a._repo.changelog.ancestors([b.rev()], inclusive=True) > for f in missing: > fctx = b[f] > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On Fri, Feb 5, 2016 at 12:51 PM, Augie Fackler <raf@durin42.com> wrote: > On Fri, Feb 05, 2016 at 11:48:10AM -0800, Durham Goode wrote: >> # HG changeset patch >> # User Durham Goode <durham@fb.com> >> # Date 1454701571 28800 >> # Fri Feb 05 11:46:11 2016 -0800 >> # Node ID 6eca9fcd1a9cce2f07a212a9c0eaf05dacd082df >> # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 >> copies: optimize forward copy detection logic for rebases > > This looks great to me. I've spent a few minutes trying to puzzle out > a case where it would be wrong, and I can't. As such: queued! How about when b is a merge?
On Fri, Feb 05, 2016 at 12:53:17PM -0800, Martin von Zweigbergk wrote: > On Fri, Feb 5, 2016 at 12:51 PM, Augie Fackler <raf@durin42.com> wrote: > > On Fri, Feb 05, 2016 at 11:48:10AM -0800, Durham Goode wrote: > >> # HG changeset patch > >> # User Durham Goode <durham@fb.com> > >> # Date 1454701571 28800 > >> # Fri Feb 05 11:46:11 2016 -0800 > >> # Node ID 6eca9fcd1a9cce2f07a212a9c0eaf05dacd082df > >> # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 > >> copies: optimize forward copy detection logic for rebases > > > > This looks great to me. I've spent a few minutes trying to puzzle out > > a case where it would be wrong, and I can't. As such: queued! > > How about when b is a merge? I thought about that, but if a file was newly copied in the merge, wouldn't we have previously walked over a revision in which it was copied, or else it'd be in the changed files list in the merge? (I could be wrong, the reasoning here is somewhat subtle.)
On 2/5/16 1:00 PM, Augie Fackler wrote: > On Fri, Feb 05, 2016 at 12:53:17PM -0800, Martin von Zweigbergk wrote: >> On Fri, Feb 5, 2016 at 12:51 PM, Augie Fackler <raf@durin42.com> wrote: >>> On Fri, Feb 05, 2016 at 11:48:10AM -0800, Durham Goode wrote: >>>> # HG changeset patch >>>> # User Durham Goode <durham@fb.com> >>>> # Date 1454701571 28800 >>>> # Fri Feb 05 11:46:11 2016 -0800 >>>> # Node ID 6eca9fcd1a9cce2f07a212a9c0eaf05dacd082df >>>> # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 >>>> copies: optimize forward copy detection logic for rebases >>> This looks great to me. I've spent a few minutes trying to puzzle out >>> a case where it would be wrong, and I can't. As such: queued! >> How about when b is a merge? > I thought about that, but if a file was newly copied in the merge, > wouldn't we have previously walked over a revision in which it was > copied, or else it'd be in the changed files list in the merge? > > (I could be wrong, the reasoning here is somewhat subtle.) Hmm, I think it will be a problem with a merge. In a merge, I believe the ctx.files() only returns files that are different from both p1 and p2. But in this case we care about files that are different from p1 (B), so files that are from p2 but are different from p1, won't get returned. Which is a behavior change. I can resend with excluding merge commits.
Patch
diff --git a/mercurial/copies.py b/mercurial/copies.py --- a/mercurial/copies.py +++ b/mercurial/copies.py @@ -11,6 +11,7 @@ import heapq from . import ( pathutil, + scmutil, util, ) @@ -175,7 +176,16 @@ def _forwardcopies(a, b, match=None): # we currently don't try to find where old files went, too expensive # this means we can miss a case like 'hg rm b; hg cp a b' cm = {} - missing = _computeforwardmissing(a, b, match=match) + + # Computing the forward missing is quite expensive on large manifests, since + # it compares the entire manifests. We can optimize it in the common use + # case of computing what copies are in a commit versus its parent (like + # during a rebase or histedit). + forwardmissingmatch = match + if not match and b.p1() == a: + forwardmissingmatch = scmutil.matchfiles(a._repo, b.files()) + missing = _computeforwardmissing(a, b, match=forwardmissingmatch) + ancestrycontext = a._repo.changelog.ancestors([b.rev()], inclusive=True) for f in missing: fctx = b[f]