Submitter | Martin von Zweigbergk |
---|---|
Date | Jan. 11, 2016, 5:47 a.m. |
Message ID | <ab1516b36e44bfb32ed6.1452491240@waste.org> |
Download | mbox | patch |
Permalink | /patch/12655/ |
State | Accepted |
Headers | show |
Comments
On Sun, Jan 10, 2016 at 11:47:20PM -0600, Martin von Zweigbergk wrote: > # HG changeset patch > # User Martin von Zweigbergk <martinvonz@google.com> > # Date 1451931209 28800 > # Mon Jan 04 10:13:29 2016 -0800 > # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add > # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 > status: change + back out == clean (API) Queued these, thanks. > > After backing out a change, so the file contents is equal to a > previous revision of itself, we currently report the status between > the two equal revisions as modified. This is because > context._buildstatus() reports any file whose new nodeid is not equal > to _newnode as modified. That magic nodeid is given only to files > added or modified in the working directory, so any file whose nodeid > has changed between two revisions will be reported as modified. > > Fix by simply comparing the file contents for all cases where the > nodeid changed, whether they are in the working copy or committed. > > Marking with (API) as it subtly changes the semantics of the method. > > diff --git a/mercurial/context.py b/mercurial/context.py > --- a/mercurial/context.py > +++ b/mercurial/context.py > @@ -140,15 +140,9 @@ > added.append(fn) > elif node2 is None: > removed.append(fn) > - elif node2 != _newnode: > - # The file was not a new file in mf2, so an entry > - # from diff is really a difference. > - modified.append(fn) > elif flag1 != flag2: > modified.append(fn) > elif self[fn].cmp(other[fn]): > - # node2 was newnode, but the working file doesn't > - # match the one in mf1. > modified.append(fn) > else: > clean.append(fn) > diff --git a/tests/test-status.t b/tests/test-status.t > --- a/tests/test-status.t > +++ b/tests/test-status.t > @@ -388,6 +388,14 @@ > > #endif > > +reverted and commit change should appear clean > + > + $ hg revert -r 0 . > + reverting file > + $ hg ci -m a > + $ hg status -A --rev 0 --rev 2 > + C file > + > $ cd .. > > hg status of binary file starting with '\1\n', a separator for metadata: > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel
On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk <martinvonz@google.com> wrote: > # HG changeset patch > # User Martin von Zweigbergk <martinvonz@google.com> > # Date 1451931209 28800 > # Mon Jan 04 10:13:29 2016 -0800 > # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add > # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 > status: change + back out == clean (API) > > After backing out a change, so the file contents is equal to a > previous revision of itself, we currently report the status between > the two equal revisions as modified. This is because > context._buildstatus() reports any file whose new nodeid is not equal > to _newnode as modified. That magic nodeid is given only to files > added or modified in the working directory, so any file whose nodeid > has changed between two revisions will be reported as modified. > > Fix by simply comparing the file contents for all cases where the > nodeid changed, whether they are in the working copy or committed. > I didn't check the performance impact of this until now, sorry :-( Since we used to assume that different nodeid implied different contents (and thus reported some clean files as modified), we avoided a lot of content comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla repo (>7k files) goes from <1s to >30s. Thoughts? Back out this change and accept that "hg status" may report some false modifications?
On Tue, Jan 12, 2016 at 10:44 AM Martin von Zweigbergk < martinvonz@google.com> wrote: > On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk < > martinvonz@google.com> wrote: > >> # HG changeset patch >> # User Martin von Zweigbergk <martinvonz@google.com> >> # Date 1451931209 28800 >> # Mon Jan 04 10:13:29 2016 -0800 >> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add >> # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 >> status: change + back out == clean (API) >> >> After backing out a change, so the file contents is equal to a >> previous revision of itself, we currently report the status between >> the two equal revisions as modified. This is because >> context._buildstatus() reports any file whose new nodeid is not equal >> to _newnode as modified. That magic nodeid is given only to files >> added or modified in the working directory, so any file whose nodeid >> has changed between two revisions will be reported as modified. >> >> Fix by simply comparing the file contents for all cases where the >> nodeid changed, whether they are in the working copy or committed. >> > > I didn't check the performance impact of this until now, sorry :-( Since > we used to assume that different nodeid implied different contents (and > thus reported some clean files as modified), we avoided a lot of content > comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla > repo (>7k files) goes from <1s to >30s. > I should have said that that is with "somewhat cold disk cache" (I'm not allowed to clear caches on my machine even as sudo). With warm disk, it goes from 0.4s to 1.4s. > > Thoughts? Back out this change and accept that "hg status" may report some > false modifications? >
> On Jan 12, 2016, at 13:44, Martin von Zweigbergk <martinvonz@google.com> wrote: > > > On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk <martinvonz@google.com <mailto:martinvonz@google.com>> wrote: > # HG changeset patch > # User Martin von Zweigbergk <martinvonz@google.com <mailto:martinvonz@google.com>> > # Date 1451931209 28800 > # Mon Jan 04 10:13:29 2016 -0800 > # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add > # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 > status: change + back out == clean (API) > > After backing out a change, so the file contents is equal to a > previous revision of itself, we currently report the status between > the two equal revisions as modified. This is because > context._buildstatus() reports any file whose new nodeid is not equal > to _newnode as modified. That magic nodeid is given only to files > added or modified in the working directory, so any file whose nodeid > has changed between two revisions will be reported as modified. > > Fix by simply comparing the file contents for all cases where the > nodeid changed, whether they are in the working copy or committed. > > I didn't check the performance impact of this until now, sorry :-( Since we used to assume that different nodeid implied different contents (and thus reported some clean files as modified), we avoided a lot of content comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla repo (>7k files) goes from <1s to >30s. > > Thoughts? Back out this change and accept that "hg status" may report some false modifications? I think that's probably the least horrible result sadly. > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com <mailto:Mercurial-devel@selenic.com> > https://selenic.com/mailman/listinfo/mercurial-devel <https://selenic.com/mailman/listinfo/mercurial-devel>
On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com> wrote: > On Jan 12, 2016, at 13:44, Martin von Zweigbergk <martinvonz@google.com> > wrote: > > > On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk < > martinvonz@google.com> wrote: > >> # HG changeset patch >> # User Martin von Zweigbergk <martinvonz@google.com> >> # Date 1451931209 28800 >> # Mon Jan 04 10:13:29 2016 -0800 >> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add >> # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 >> status: change + back out == clean (API) >> >> After backing out a change, so the file contents is equal to a >> previous revision of itself, we currently report the status between >> the two equal revisions as modified. This is because >> context._buildstatus() reports any file whose new nodeid is not equal >> to _newnode as modified. That magic nodeid is given only to files >> added or modified in the working directory, so any file whose nodeid >> has changed between two revisions will be reported as modified. >> >> Fix by simply comparing the file contents for all cases where the >> nodeid changed, whether they are in the working copy or committed. >> > > I didn't check the performance impact of this until now, sorry :-( Since > we used to assume that different nodeid implied different contents (and > thus reported some clean files as modified), we avoided a lot of content > comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla > repo (>7k files) goes from <1s to >30s. > > Thoughts? Back out this change and accept that "hg status" may report some > false modifications? > > > I think that's probably the least horrible result sadly. > I'll send a patch or two soon. >
On 1/12/16 12:55 PM, Martin von Zweigbergk wrote: > > On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com > <mailto:raf@durin42.com>> wrote: > >> On Jan 12, 2016, at 13:44, Martin von Zweigbergk >> <martinvonz@google.com <mailto:martinvonz@google.com>> wrote: >> >> >> On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk >> <martinvonz@google.com <mailto:martinvonz@google.com>> wrote: >> >> # HG changeset patch >> # User Martin von Zweigbergk <martinvonz@google.com >> <mailto:martinvonz@google.com>> >> # Date 1451931209 28800 >> # Mon Jan 04 10:13:29 2016 -0800 >> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add >> # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 >> status: change + back out == clean (API) >> >> After backing out a change, so the file contents is equal to a >> previous revision of itself, we currently report the status >> between >> the two equal revisions as modified. This is because >> context._buildstatus() reports any file whose new nodeid is >> not equal >> to _newnode as modified. That magic nodeid is given only to files >> added or modified in the working directory, so any file whose >> nodeid >> has changed between two revisions will be reported as modified. >> >> Fix by simply comparing the file contents for all cases where the >> nodeid changed, whether they are in the working copy or >> committed. >> >> >> I didn't check the performance impact of this until now, sorry >> :-( Since we used to assume that different nodeid implied >> different contents (and thus reported some clean files as >> modified), we avoided a lot of content comparisons. Bad cases >> like "hg st --rev .~1000 --rev ." on the mozilla repo (>7k files) >> goes from <1s to >30s. >> >> Thoughts? Back out this change and accept that "hg status" may >> report some false modifications? > > I think that's probably the least horrible result sadly. > > > I'll send a patch or two soon. The perf consequence of this change is a bit disappointing. It also affects remotefilelog repos, since previously we would be able to compute status based almost entirely on the manifest contents, but now we need the before and after file contents of every file. Was this actively breaking something before? I can change remotefilelog's implementation of filelog.cmp to return False if the nodes are different, but I'd rather not diverge from core like that if possible.
On Mon, Jan 25, 2016 at 1:54 PM Durham Goode <durham@fb.com> wrote: > > > On 1/12/16 12:55 PM, Martin von Zweigbergk wrote: > > > On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com> wrote: > >> On Jan 12, 2016, at 13:44, Martin von Zweigbergk <martinvonz@google.com> >> wrote: >> >> >> On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk < >> martinvonz@google.com> wrote: >> >>> # HG changeset patch >>> # User Martin von Zweigbergk < <martinvonz@google.com> >>> martinvonz@google.com> >>> # Date 1451931209 28800 >>> # Mon Jan 04 10:13:29 2016 -0800 >>> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add >>> # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 >>> status: change + back out == clean (API) >>> >>> After backing out a change, so the file contents is equal to a >>> previous revision of itself, we currently report the status between >>> the two equal revisions as modified. This is because >>> context._buildstatus() reports any file whose new nodeid is not equal >>> to _newnode as modified. That magic nodeid is given only to files >>> added or modified in the working directory, so any file whose nodeid >>> has changed between two revisions will be reported as modified. >>> >>> Fix by simply comparing the file contents for all cases where the >>> nodeid changed, whether they are in the working copy or committed. >>> >> >> I didn't check the performance impact of this until now, sorry :-( Since >> we used to assume that different nodeid implied different contents (and >> thus reported some clean files as modified), we avoided a lot of content >> comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla >> repo (>7k files) goes from <1s to >30s. >> >> Thoughts? Back out this change and accept that "hg status" may report >> some false modifications? >> >> >> I think that's probably the least horrible result sadly. >> > > I'll send a patch or two soon. > > > The perf consequence of this change is a bit disappointing. It also > affects remotefilelog repos, since previously we would be able to compute > status based almost entirely on the manifest contents, but now we need the > before and after file contents of every file. > > Was this actively breaking something before? I can change remotefilelog's > implementation of filelog.cmp to return False if the nodes are different, > but I'd rather not diverge from core like that if possible. > From reading your reply, it seems like you did not see that I backed out this change (54522bbe1597). Did you cut a release between this patch and the backout?
On 1/25/16 1:59 PM, Martin von Zweigbergk wrote: > > > On Mon, Jan 25, 2016 at 1:54 PM Durham Goode <durham@fb.com > <mailto:durham@fb.com>> wrote: > > > > On 1/12/16 12:55 PM, Martin von Zweigbergk wrote: >> >> On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com >> <mailto:raf@durin42.com>> wrote: >> >>> On Jan 12, 2016, at 13:44, Martin von Zweigbergk >>> <martinvonz@google.com <mailto:martinvonz@google.com>> wrote: >>> >>> >>> On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk >>> <martinvonz@google.com <mailto:martinvonz@google.com>> wrote: >>> >>> # HG changeset patch >>> # User Martin von Zweigbergk <martinvonz@google.com >>> <mailto:martinvonz@google.com>> >>> # Date 1451931209 28800 >>> # Mon Jan 04 10:13:29 2016 -0800 >>> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add >>> # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3 >>> status: change + back out == clean (API) >>> >>> After backing out a change, so the file contents is >>> equal to a >>> previous revision of itself, we currently report the >>> status between >>> the two equal revisions as modified. This is because >>> context._buildstatus() reports any file whose new nodeid >>> is not equal >>> to _newnode as modified. That magic nodeid is given only >>> to files >>> added or modified in the working directory, so any file >>> whose nodeid >>> has changed between two revisions will be reported as >>> modified. >>> >>> Fix by simply comparing the file contents for all cases >>> where the >>> nodeid changed, whether they are in the working copy or >>> committed. >>> >>> >>> I didn't check the performance impact of this until now, >>> sorry :-( Since we used to assume that different nodeid >>> implied different contents (and thus reported some clean >>> files as modified), we avoided a lot of content comparisons. >>> Bad cases like "hg st --rev .~1000 --rev ." on the mozilla >>> repo (>7k files) goes from <1s to >30s. >>> >>> Thoughts? Back out this change and accept that "hg status" >>> may report some false modifications? >> >> I think that's probably the least horrible result sadly. >> >> >> I'll send a patch or two soon. > > The perf consequence of this change is a bit disappointing. It > also affects remotefilelog repos, since previously we would be > able to compute status based almost entirely on the manifest > contents, but now we need the before and after file contents of > every file. > > Was this actively breaking something before? I can change > remotefilelog's implementation of filelog.cmp to return False if > the nodes are different, but I'd rather not diverge from core like > that if possible. > > > From reading your reply, it seems like you did not see that I backed > out this change (54522bbe1597). Did you cut a release between this > patch and the backout? Ah, ok. I misread your final message about sending a patch or two soon. Yes, we cut a branch and that's what I'm testing. I'll cherry-pick in the fix. Thanks for pointing that out.
Patch
diff --git a/mercurial/context.py b/mercurial/context.py --- a/mercurial/context.py +++ b/mercurial/context.py @@ -140,15 +140,9 @@ added.append(fn) elif node2 is None: removed.append(fn) - elif node2 != _newnode: - # The file was not a new file in mf2, so an entry - # from diff is really a difference. - modified.append(fn) elif flag1 != flag2: modified.append(fn) elif self[fn].cmp(other[fn]): - # node2 was newnode, but the working file doesn't - # match the one in mf1. modified.append(fn) else: clean.append(fn) diff --git a/tests/test-status.t b/tests/test-status.t --- a/tests/test-status.t +++ b/tests/test-status.t @@ -388,6 +388,14 @@ #endif +reverted and commit change should appear clean + + $ hg revert -r 0 . + reverting file + $ hg ci -m a + $ hg status -A --rev 0 --rev 2 + C file + $ cd .. hg status of binary file starting with '\1\n', a separator for metadata: