Submitter | Mads Kiilerich |
---|---|
Date | Dec. 11, 2014, 8:27 p.m. |
Message ID | <0fcb69be20206cfe1582.1418329628@ssl.google-analytics.com> |
Download | mbox | patch |
Permalink | /patch/7054/ |
State | Changes Requested |
Delegated to: | Pierre-Yves David |
Headers | show |
Comments
On Thu, 2014-12-11 at 21:27 +0100, Mads Kiilerich wrote: > # HG changeset patch > # User Mads Kiilerich <madski@unity3d.com> > # Date 1418329283 -3600 > # Thu Dec 11 21:21:23 2014 +0100 > # Branch stable > # Node ID 0fcb69be20206cfe1582244afd93185f2c056e91 > # Parent 837f7c70551db57aef3a80dac4b50129bb647529 > copies: drop _tracefile limit when finding copy sources in actx manifest FYI, I'm sitting on this until I have several hours to think about it, as this will be a pretty big performance problem for some people.
On 12/15/2014 11:26 PM, Matt Mackall wrote: > On Thu, 2014-12-11 at 21:27 +0100, Mads Kiilerich wrote: >> # HG changeset patch >> # User Mads Kiilerich <madski@unity3d.com> >> # Date 1418329283 -3600 >> # Thu Dec 11 21:21:23 2014 +0100 >> # Branch stable >> # Node ID 0fcb69be20206cfe1582244afd93185f2c056e91 >> # Parent 837f7c70551db57aef3a80dac4b50129bb647529 >> copies: drop _tracefile limit when finding copy sources in actx manifest > FYI, I'm sitting on this until I have several hours to think about it, > as this will be a pretty big performance problem for some people. Yes, I figure that would be the potential worst case for a change like this. But as I tried to argue, I think it only will be a problem if a fctx with a very long ancestor chain is merged into a very short diff range ... and the long chain will only be wrong if a file that lived in the past but is removed in actx somehow is restored in the diff range so the fctx iteration will visit actx ancestors. The question is how big the performance problem will be and for how many people ... and whether it is necessary / worth it for correctness. I guess a similar scenario can happen with different order in changelog and filelog and thus not just can be filed under "we know linkrev sometimes cause malfunction" but also must be filed under "consequences of history modification". /Mads
On 12/16/2014 01:57 AM, Mads Kiilerich wrote: > On 12/15/2014 11:26 PM, Matt Mackall wrote: >> On Thu, 2014-12-11 at 21:27 +0100, Mads Kiilerich wrote: >>> # HG changeset patch >>> # User Mads Kiilerich <madski@unity3d.com> >>> # Date 1418329283 -3600 >>> # Thu Dec 11 21:21:23 2014 +0100 >>> # Branch stable >>> # Node ID 0fcb69be20206cfe1582244afd93185f2c056e91 >>> # Parent 837f7c70551db57aef3a80dac4b50129bb647529 >>> copies: drop _tracefile limit when finding copy sources in actx >>> manifest >> FYI, I'm sitting on this until I have several hours to think about it, >> as this will be a pretty big performance problem for some people. > > Yes, I figure that would be the potential worst case for a change like > this. > > But as I tried to argue, I think it only will be a problem if a fctx > with a very long ancestor chain is merged into a very short diff range > ... and the long chain will only be wrong if a file that lived in the > past but is removed in actx somehow is restored in the diff range so > the fctx iteration will visit actx ancestors. The question is how big > the performance problem will be and for how many people ... and > whether it is necessary / worth it for correctness. > > I guess a similar scenario can happen with different order in > changelog and filelog and thus not just can be filed under "we know > linkrev sometimes cause malfunction" but also must be filed under > "consequences of history modification". > > /Mads Second thoughts: The most common worst case must be when a file with a long history has been deleted but is restored again by a merge and you do a diff from a revision where it is missing to a revision where it is present again. That would usually be after a "deleted/changed" conflict. In that case it is "obvious" and well-defined that we shouldn't be searching for actx files in actx ancestors. However, right now, the limit doesn't work reliably because linkrev aliasing and I still think it would be better to not use any limit. I still think the impact will be quite limited. /Mads
On Thu, 2014-12-11 at 21:27 +0100, Mads Kiilerich wrote: > # HG changeset patch > # User Mads Kiilerich <madski@unity3d.com> > # Date 1418329283 -3600 > # Thu Dec 11 21:21:23 2014 +0100 > # Branch stable > # Node ID 0fcb69be20206cfe1582244afd93185f2c056e91 > # Parent 837f7c70551db57aef3a80dac4b50129bb647529 > copies: drop _tracefile limit when finding copy sources in actx manifest Pierre-Yves tells me he's hot on the trail for a fix for this (but is also sick so hasn't finished his patches). Going to drop this for now.
On 12/29/2014 02:26 PM, Matt Mackall wrote: > On Thu, 2014-12-11 at 21:27 +0100, Mads Kiilerich wrote: >> # HG changeset patch >> # User Mads Kiilerich <madski@unity3d.com> >> # Date 1418329283 -3600 >> # Thu Dec 11 21:21:23 2014 +0100 >> # Branch stable >> # Node ID 0fcb69be20206cfe1582244afd93185f2c056e91 >> # Parent 837f7c70551db57aef3a80dac4b50129bb647529 >> copies: drop _tracefile limit when finding copy sources in actx manifest > > Pierre-Yves tells me he's hot on the trail for a fix for this (but is > also sick so hasn't finished his patches). Going to drop this for now. So looking closer to the test case in the first patch of this series, I discovered that the merge itself was not the source of the issue. And that good old linkrev-shadowing bug was making the rename detection fails. Re-using some of the idea Matt brought up in december, I poke at solving this kind of linkrev-induced misbehavior. I think I come with a performance acceptable solution that have been patchbombed on the list. One of the patch contains the reduced version of your original test case. Check the addition to http://www.selenic.com/pipermail/mercurial-devel/2014-December/065080.html at the very end of this email: http://www.selenic.com/pipermail/mercurial-devel/2014-December/065080.html However, showing that linkrev have bug does not prove that rename + merges is not buggy. So I tested various cases that looked pathological to me and could not get anything failing in a way similar to want I understand of this series. The short version is that any merge that involved file rename on one side will result in the creation of a new file revision duplication the rename information. So throwing more merges in the mixes did not made the situation more complicated. I can excavate my tests if someone is interested. In the process I found a merge + rename related bugs, but it was actually already fixed: http://bz.selenic.com/show_bug.cgi?id=3908
On 12/30/2014 11:32 PM, Pierre-Yves David wrote: > > > On 12/29/2014 02:26 PM, Matt Mackall wrote: >> On Thu, 2014-12-11 at 21:27 +0100, Mads Kiilerich wrote: >>> # HG changeset patch >>> # User Mads Kiilerich <madski@unity3d.com> >>> # Date 1418329283 -3600 >>> # Thu Dec 11 21:21:23 2014 +0100 >>> # Branch stable >>> # Node ID 0fcb69be20206cfe1582244afd93185f2c056e91 >>> # Parent 837f7c70551db57aef3a80dac4b50129bb647529 >>> copies: drop _tracefile limit when finding copy sources in actx >>> manifest >> >> Pierre-Yves tells me he's hot on the trail for a fix for this (but is >> also sick so hasn't finished his patches). Going to drop this for now. > > So looking closer to the test case in the first patch of this series, > I discovered that the merge itself was not the source of the issue. > And that good old linkrev-shadowing bug was making the rename > detection fails. Correct. The patch with the test do not try to make a (possibly wrong or at least debatable) analysis of the problem. It just adds a test case for something that we don't have any other test coverage for. The commit message do give a hint that linkrev should be blamed for problem. > Re-using some of the idea Matt brought up in december, I poke at > solving this kind of linkrev-induced misbehavior. I think I come with > a performance acceptable solution that have been patchbombed on the list. Yes, fixing linkrev would also make the test case work. A linkrev fix is however not suitable for stable. I think the fix I propose here is suitable for stable, even if it will be replaced by another fix later. Long term, with linkrev fixed, I still think we want another kind of limit than the one I propose to remove here. Using a revision number for pruning will in some situations be worthless, depending on which topological sorting the repo has. If we want the pruning, we can do better. (I'm still not entirely convinced that the pruning is worth it, considered the complexity and cost of it and how often it will make any significant difference.) Starting a slightly different place: Looking at copies.py from the point of view of how "common ancestor heads" are used and considering how it will work if we have multiple such ancestors, I think it would be better if we kept the "path" to the ancestor instead of repeatedly iterating the DAG while searching for the same ancestor. Similarly, if we want pruning in _tracefile, we should keep the "::fctx-::actx" set and make sure we don't iterate the filelog DAG outside that set. That could give a pruning that was independent of the topological sorting and it would often be more efficient than just using a revision number. So: Also long term, with linkrev fixed, I don't think we want the old _tracefile limit and this change is still relevant. /Mads
On 01/01/2015 01:31 PM, Mads Kiilerich wrote: > On 12/30/2014 11:32 PM, Pierre-Yves David wrote: >> >> >> On 12/29/2014 02:26 PM, Matt Mackall wrote: >>> On Thu, 2014-12-11 at 21:27 +0100, Mads Kiilerich wrote: >>>> # HG changeset patch >>>> # User Mads Kiilerich <madski@unity3d.com> >>>> # Date 1418329283 -3600 >>>> # Thu Dec 11 21:21:23 2014 +0100 >>>> # Branch stable >>>> # Node ID 0fcb69be20206cfe1582244afd93185f2c056e91 >>>> # Parent 837f7c70551db57aef3a80dac4b50129bb647529 >>>> copies: drop _tracefile limit when finding copy sources in actx >>>> manifest >>> >>> Pierre-Yves tells me he's hot on the trail for a fix for this (but is >>> also sick so hasn't finished his patches). Going to drop this for now. >> >> So looking closer to the test case in the first patch of this series, >> I discovered that the merge itself was not the source of the issue. >> And that good old linkrev-shadowing bug was making the rename >> detection fails. > > Correct. The patch with the test do not try to make a (possibly wrong or > at least debatable) analysis of the problem. It just adds a test case > for something that we don't have any other test coverage for. The commit > message do give a hint that linkrev should be blamed for problem. > >> Re-using some of the idea Matt brought up in december, I poke at >> solving this kind of linkrev-induced misbehavior. I think I come with >> a performance acceptable solution that have been patchbombed on the list. > > Yes, fixing linkrev would also make the test case work. A linkrev fix is > however not suitable for stable. I think the fix I propose here is > suitable for stable, even if it will be replaced by another fix later. There is not stable release scheduled until the next feature release. And this feature release contains the linkrev fixes. > Long term, with linkrev fixed, I still think we want another kind of > limit than the one I propose to remove here. Using a revision number for > pruning will in some situations be worthless, depending on which > topological sorting the repo has. If we want the pruning, we can do > better. (I'm still not entirely convinced that the pruning is worth it, > considered the complexity and cost of it and how often it will make any > significant difference.) > > Starting a slightly different place: Looking at copies.py from the point > of view of how "common ancestor heads" are used and considering how it > will work if we have multiple such ancestors, I think it would be better > if we kept the "path" to the ancestor instead of repeatedly iterating > the DAG while searching for the same ancestor. > > Similarly, if we want pruning in _tracefile, we should keep the > "::fctx-::actx" set and make sure we don't iterate the filelog DAG > outside that set. That could give a pruning that was independent of the > topological sorting and it would often be more efficient than just using > a revision number. > > So: Also long term, with linkrev fixed, I don't think we want the old > _tracefile limit and this change is still relevant. I think you are right when saying the limit could be improved. We can probably save some ancestors computation there. I disagree about dropping the limit. negative result for rename would full tree traversals and the resulting complexity would be much worth than the current one. But I may get something wrong here.
Patch
diff --git a/mercurial/copies.py b/mercurial/copies.py --- a/mercurial/copies.py +++ b/mercurial/copies.py @@ -126,15 +126,13 @@ return t -def _tracefile(fctx, am, limit=-1): +def _tracefile(fctx, am): '''return file context that is the ancestor of fctx present in ancestor - manifest am, stopping after the first ancestor lower than limit''' + manifest am''' for f in fctx.ancestors(): if am.get(f.path(), None) == f.filenode(): return f - if f.rev() < limit: - return None def _dirstatecopies(d): ds = d._repo.dirstate @@ -156,11 +154,6 @@ # short-circuit to avoid issues with merge states return _dirstatecopies(w) - # files might have to be traced back to the fctx parent of the last - # one-side-only changeset, but not further back than that - limit = _findlimit(a._repo, a.rev(), b.rev()) - if limit is None: - limit = -1 am = a.manifest() # find where new files came from @@ -171,7 +164,7 @@ missing.difference_update(a.manifest().iterkeys()) for f in missing: - ofctx = _tracefile(b[f], am, limit) + ofctx = _tracefile(b[f], am) if ofctx: cm[f] = ofctx.path() diff --git a/tests/test-mv-cp-st-diff.t b/tests/test-mv-cp-st-diff.t --- a/tests/test-mv-cp-st-diff.t +++ b/tests/test-mv-cp-st-diff.t @@ -1649,11 +1649,10 @@ o 0 empty f $ hg diff --git -r dev-start -r dev - diff --git a/f b/f - deleted file mode 100644 - diff --git a/renamed b/renamed - new file mode 100644 - --- /dev/null + diff --git a/f b/renamed + rename from f + rename to renamed + --- a/f +++ b/renamed @@ -0,0 +1,1 @@ +change