Patchwork [4,of,4] status: change + back out == clean (API)

login
register
mail settings
Submitter Martin von Zweigbergk
Date Jan. 11, 2016, 5:47 a.m.
Message ID <ab1516b36e44bfb32ed6.1452491240@waste.org>
Download mbox | patch
Permalink /patch/12655/
State Accepted
Headers show

Comments

Martin von Zweigbergk - Jan. 11, 2016, 5:47 a.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1451931209 28800
#      Mon Jan 04 10:13:29 2016 -0800
# Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
# Parent  4ea59d0f00be7bc44f1968c073a44274d67fa4f3
status: change + back out == clean (API)

After backing out a change, so the file contents is equal to a
previous revision of itself, we currently report the status between
the two equal revisions as modified. This is because
context._buildstatus() reports any file whose new nodeid is not equal
to _newnode as modified. That magic nodeid is given only to files
added or modified in the working directory, so any file whose nodeid
has changed between two revisions will be reported as modified.

Fix by simply comparing the file contents for all cases where the
nodeid changed, whether they are in the working copy or committed.

Marking with (API) as it subtly changes the semantics of the method.
Augie Fackler - Jan. 12, 2016, 3:18 a.m.
On Sun, Jan 10, 2016 at 11:47:20PM -0600, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1451931209 28800
> #      Mon Jan 04 10:13:29 2016 -0800
> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
> # Parent  4ea59d0f00be7bc44f1968c073a44274d67fa4f3
> status: change + back out == clean (API)

Queued these, thanks.

>
> After backing out a change, so the file contents is equal to a
> previous revision of itself, we currently report the status between
> the two equal revisions as modified. This is because
> context._buildstatus() reports any file whose new nodeid is not equal
> to _newnode as modified. That magic nodeid is given only to files
> added or modified in the working directory, so any file whose nodeid
> has changed between two revisions will be reported as modified.
>
> Fix by simply comparing the file contents for all cases where the
> nodeid changed, whether they are in the working copy or committed.
>
> Marking with (API) as it subtly changes the semantics of the method.
>
> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -140,15 +140,9 @@
>                  added.append(fn)
>              elif node2 is None:
>                  removed.append(fn)
> -            elif node2 != _newnode:
> -                # The file was not a new file in mf2, so an entry
> -                # from diff is really a difference.
> -                modified.append(fn)
>              elif flag1 != flag2:
>                  modified.append(fn)
>              elif self[fn].cmp(other[fn]):
> -                # node2 was newnode, but the working file doesn't
> -                # match the one in mf1.
>                  modified.append(fn)
>              else:
>                  clean.append(fn)
> diff --git a/tests/test-status.t b/tests/test-status.t
> --- a/tests/test-status.t
> +++ b/tests/test-status.t
> @@ -388,6 +388,14 @@
>
>  #endif
>
> +reverted and commit change should appear clean
> +
> +  $ hg revert -r 0 .
> +  reverting file
> +  $ hg ci -m a
> +  $ hg status -A --rev 0 --rev 2
> +  C file
> +
>    $ cd ..
>
>  hg status of binary file starting with '\1\n', a separator for metadata:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
Martin von Zweigbergk - Jan. 12, 2016, 6:44 p.m.
On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk <martinvonz@google.com>
wrote:

> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1451931209 28800
> #      Mon Jan 04 10:13:29 2016 -0800
> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
> # Parent  4ea59d0f00be7bc44f1968c073a44274d67fa4f3
> status: change + back out == clean (API)
>
> After backing out a change, so the file contents is equal to a
> previous revision of itself, we currently report the status between
> the two equal revisions as modified. This is because
> context._buildstatus() reports any file whose new nodeid is not equal
> to _newnode as modified. That magic nodeid is given only to files
> added or modified in the working directory, so any file whose nodeid
> has changed between two revisions will be reported as modified.
>
> Fix by simply comparing the file contents for all cases where the
> nodeid changed, whether they are in the working copy or committed.
>

I didn't check the performance impact of this until now, sorry :-( Since we
used to assume that different nodeid implied different contents (and thus
reported some clean files as modified), we avoided a lot of content
comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla
repo (>7k files) goes from <1s to >30s.

Thoughts? Back out this change and accept that "hg status" may report some
false modifications?
Martin von Zweigbergk - Jan. 12, 2016, 6:46 p.m.
On Tue, Jan 12, 2016 at 10:44 AM Martin von Zweigbergk <
martinvonz@google.com> wrote:

> On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk <
> martinvonz@google.com> wrote:
>
>> # HG changeset patch
>> # User Martin von Zweigbergk <martinvonz@google.com>
>> # Date 1451931209 28800
>> #      Mon Jan 04 10:13:29 2016 -0800
>> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
>> # Parent  4ea59d0f00be7bc44f1968c073a44274d67fa4f3
>> status: change + back out == clean (API)
>>
>> After backing out a change, so the file contents is equal to a
>> previous revision of itself, we currently report the status between
>> the two equal revisions as modified. This is because
>> context._buildstatus() reports any file whose new nodeid is not equal
>> to _newnode as modified. That magic nodeid is given only to files
>> added or modified in the working directory, so any file whose nodeid
>> has changed between two revisions will be reported as modified.
>>
>> Fix by simply comparing the file contents for all cases where the
>> nodeid changed, whether they are in the working copy or committed.
>>
>
> I didn't check the performance impact of this until now, sorry :-( Since
> we used to assume that different nodeid implied different contents (and
> thus reported some clean files as modified), we avoided a lot of content
> comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla
> repo (>7k files) goes from <1s to >30s.
>

I should have said that that is with "somewhat cold disk cache" (I'm not
allowed to clear caches on my machine even as sudo). With warm disk, it
goes from 0.4s to 1.4s.


>
> Thoughts? Back out this change and accept that "hg status" may report some
> false modifications?
>
Augie Fackler - Jan. 12, 2016, 7:39 p.m.
> On Jan 12, 2016, at 13:44, Martin von Zweigbergk <martinvonz@google.com> wrote:
> 
> 
> On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk <martinvonz@google.com <mailto:martinvonz@google.com>> wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com <mailto:martinvonz@google.com>>
> # Date 1451931209 28800
> #      Mon Jan 04 10:13:29 2016 -0800
> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
> # Parent  4ea59d0f00be7bc44f1968c073a44274d67fa4f3
> status: change + back out == clean (API)
> 
> After backing out a change, so the file contents is equal to a
> previous revision of itself, we currently report the status between
> the two equal revisions as modified. This is because
> context._buildstatus() reports any file whose new nodeid is not equal
> to _newnode as modified. That magic nodeid is given only to files
> added or modified in the working directory, so any file whose nodeid
> has changed between two revisions will be reported as modified.
> 
> Fix by simply comparing the file contents for all cases where the
> nodeid changed, whether they are in the working copy or committed.
> 
> I didn't check the performance impact of this until now, sorry :-( Since we used to assume that different nodeid implied different contents (and thus reported some clean files as modified), we avoided a lot of content comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla repo (>7k files) goes from <1s to >30s.
> 
> Thoughts? Back out this change and accept that "hg status" may report some false modifications?

I think that's probably the least horrible result sadly.

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com <mailto:Mercurial-devel@selenic.com>
> https://selenic.com/mailman/listinfo/mercurial-devel <https://selenic.com/mailman/listinfo/mercurial-devel>
Martin von Zweigbergk - Jan. 12, 2016, 8:55 p.m.
On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com> wrote:

> On Jan 12, 2016, at 13:44, Martin von Zweigbergk <martinvonz@google.com>
> wrote:
>
>
> On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk <
> martinvonz@google.com> wrote:
>
>> # HG changeset patch
>> # User Martin von Zweigbergk <martinvonz@google.com>
>> # Date 1451931209 28800
>> #      Mon Jan 04 10:13:29 2016 -0800
>> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
>> # Parent  4ea59d0f00be7bc44f1968c073a44274d67fa4f3
>> status: change + back out == clean (API)
>>
>> After backing out a change, so the file contents is equal to a
>> previous revision of itself, we currently report the status between
>> the two equal revisions as modified. This is because
>> context._buildstatus() reports any file whose new nodeid is not equal
>> to _newnode as modified. That magic nodeid is given only to files
>> added or modified in the working directory, so any file whose nodeid
>> has changed between two revisions will be reported as modified.
>>
>> Fix by simply comparing the file contents for all cases where the
>> nodeid changed, whether they are in the working copy or committed.
>>
>
> I didn't check the performance impact of this until now, sorry :-( Since
> we used to assume that different nodeid implied different contents (and
> thus reported some clean files as modified), we avoided a lot of content
> comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla
> repo (>7k files) goes from <1s to >30s.
>
> Thoughts? Back out this change and accept that "hg status" may report some
> false modifications?
>
>
> I think that's probably the least horrible result sadly.
>

I'll send a patch or two soon.

>
Durham Goode - Jan. 25, 2016, 9:54 p.m.
On 1/12/16 12:55 PM, Martin von Zweigbergk wrote:
>
> On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com 
> <mailto:raf@durin42.com>> wrote:
>
>>     On Jan 12, 2016, at 13:44, Martin von Zweigbergk
>>     <martinvonz@google.com <mailto:martinvonz@google.com>> wrote:
>>
>>
>>     On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk
>>     <martinvonz@google.com <mailto:martinvonz@google.com>> wrote:
>>
>>         # HG changeset patch
>>         # User Martin von Zweigbergk <martinvonz@google.com
>>         <mailto:martinvonz@google.com>>
>>         # Date 1451931209 28800
>>         #      Mon Jan 04 10:13:29 2016 -0800
>>         # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
>>         # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3
>>         status: change + back out == clean (API)
>>
>>         After backing out a change, so the file contents is equal to a
>>         previous revision of itself, we currently report the status
>>         between
>>         the two equal revisions as modified. This is because
>>         context._buildstatus() reports any file whose new nodeid is
>>         not equal
>>         to _newnode as modified. That magic nodeid is given only to files
>>         added or modified in the working directory, so any file whose
>>         nodeid
>>         has changed between two revisions will be reported as modified.
>>
>>         Fix by simply comparing the file contents for all cases where the
>>         nodeid changed, whether they are in the working copy or
>>         committed.
>>
>>
>>     I didn't check the performance impact of this until now, sorry
>>     :-( Since we used to assume that different nodeid implied
>>     different contents (and thus reported some clean files as
>>     modified), we avoided a lot of content comparisons. Bad cases
>>     like "hg st --rev .~1000 --rev ." on the mozilla repo (>7k files)
>>     goes from <1s to >30s.
>>
>>     Thoughts? Back out this change and accept that "hg status" may
>>     report some false modifications?
>
>     I think that's probably the least horrible result sadly.
>
>
> I'll send a patch or two soon.

The perf consequence of this change is a bit disappointing.  It also 
affects remotefilelog repos, since previously we would be able to 
compute status based almost entirely on the manifest contents, but now 
we need the before and after file contents of every file.

Was this actively breaking something before?  I can change 
remotefilelog's implementation of filelog.cmp to return False if the 
nodes are different, but I'd rather not diverge from core like that if 
possible.
Martin von Zweigbergk - Jan. 25, 2016, 9:59 p.m.
On Mon, Jan 25, 2016 at 1:54 PM Durham Goode <durham@fb.com> wrote:

>
>
> On 1/12/16 12:55 PM, Martin von Zweigbergk wrote:
>
>
> On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com> wrote:
>
>> On Jan 12, 2016, at 13:44, Martin von Zweigbergk <martinvonz@google.com>
>> wrote:
>>
>>
>> On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk <
>> martinvonz@google.com> wrote:
>>
>>> # HG changeset patch
>>> # User Martin von Zweigbergk < <martinvonz@google.com>
>>> martinvonz@google.com>
>>> # Date 1451931209 28800
>>> #      Mon Jan 04 10:13:29 2016 -0800
>>> # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
>>> # Parent  4ea59d0f00be7bc44f1968c073a44274d67fa4f3
>>> status: change + back out == clean (API)
>>>
>>> After backing out a change, so the file contents is equal to a
>>> previous revision of itself, we currently report the status between
>>> the two equal revisions as modified. This is because
>>> context._buildstatus() reports any file whose new nodeid is not equal
>>> to _newnode as modified. That magic nodeid is given only to files
>>> added or modified in the working directory, so any file whose nodeid
>>> has changed between two revisions will be reported as modified.
>>>
>>> Fix by simply comparing the file contents for all cases where the
>>> nodeid changed, whether they are in the working copy or committed.
>>>
>>
>> I didn't check the performance impact of this until now, sorry :-( Since
>> we used to assume that different nodeid implied different contents (and
>> thus reported some clean files as modified), we avoided a lot of content
>> comparisons. Bad cases like "hg st --rev .~1000 --rev ." on the mozilla
>> repo (>7k files) goes from <1s to >30s.
>>
>> Thoughts? Back out this change and accept that "hg status" may report
>> some false modifications?
>>
>>
>> I think that's probably the least horrible result sadly.
>>
>
> I'll send a patch or two soon.
>
>
> The perf consequence of this change is a bit disappointing.  It also
> affects remotefilelog repos, since previously we would be able to compute
> status based almost entirely on the manifest contents, but now we need the
> before and after file contents of every file.
>
> Was this actively breaking something before?  I can change remotefilelog's
> implementation of filelog.cmp to return False if the nodes are different,
> but I'd rather not diverge from core like that if possible.
>

From reading your reply, it seems like you did not see that I backed out
this change (54522bbe1597). Did you cut a release between this patch and
the backout?
Durham Goode - Jan. 25, 2016, 10:19 p.m.
On 1/25/16 1:59 PM, Martin von Zweigbergk wrote:
>
>
> On Mon, Jan 25, 2016 at 1:54 PM Durham Goode <durham@fb.com 
> <mailto:durham@fb.com>> wrote:
>
>
>
>     On 1/12/16 12:55 PM, Martin von Zweigbergk wrote:
>>
>>     On Tue, Jan 12, 2016 at 11:39 AM Augie Fackler <raf@durin42.com
>>     <mailto:raf@durin42.com>> wrote:
>>
>>>         On Jan 12, 2016, at 13:44, Martin von Zweigbergk
>>>         <martinvonz@google.com <mailto:martinvonz@google.com>> wrote:
>>>
>>>
>>>         On Sun, Jan 10, 2016 at 9:51 PM Martin von Zweigbergk
>>>         <martinvonz@google.com <mailto:martinvonz@google.com>> wrote:
>>>
>>>             # HG changeset patch
>>>             # User Martin von Zweigbergk <martinvonz@google.com
>>>             <mailto:martinvonz@google.com>>
>>>             # Date 1451931209 28800
>>>             #      Mon Jan 04 10:13:29 2016 -0800
>>>             # Node ID ab1516b36e44bfb32ed6f354a9dae2f2f6300add
>>>             # Parent 4ea59d0f00be7bc44f1968c073a44274d67fa4f3
>>>             status: change + back out == clean (API)
>>>
>>>             After backing out a change, so the file contents is
>>>             equal to a
>>>             previous revision of itself, we currently report the
>>>             status between
>>>             the two equal revisions as modified. This is because
>>>             context._buildstatus() reports any file whose new nodeid
>>>             is not equal
>>>             to _newnode as modified. That magic nodeid is given only
>>>             to files
>>>             added or modified in the working directory, so any file
>>>             whose nodeid
>>>             has changed between two revisions will be reported as
>>>             modified.
>>>
>>>             Fix by simply comparing the file contents for all cases
>>>             where the
>>>             nodeid changed, whether they are in the working copy or
>>>             committed.
>>>
>>>
>>>         I didn't check the performance impact of this until now,
>>>         sorry :-( Since we used to assume that different nodeid
>>>         implied different contents (and thus reported some clean
>>>         files as modified), we avoided a lot of content comparisons.
>>>         Bad cases like "hg st --rev .~1000 --rev ." on the mozilla
>>>         repo (>7k files) goes from <1s to >30s.
>>>
>>>         Thoughts? Back out this change and accept that "hg status"
>>>         may report some false modifications?
>>
>>         I think that's probably the least horrible result sadly.
>>
>>
>>     I'll send a patch or two soon.
>
>     The perf consequence of this change is a bit disappointing.  It
>     also affects remotefilelog repos, since previously we would be
>     able to compute status based almost entirely on the manifest
>     contents, but now we need the before and after file contents of
>     every file.
>
>     Was this actively breaking something before?  I can change
>     remotefilelog's implementation of filelog.cmp to return False if
>     the nodes are different, but I'd rather not diverge from core like
>     that if possible.
>
>
> From reading your reply, it seems like you did not see that I backed 
> out this change (54522bbe1597). Did you cut a release between this 
> patch and the backout?
Ah, ok.  I misread your final message about sending a patch or two 
soon.  Yes, we cut a branch and that's what I'm testing.  I'll 
cherry-pick in the fix.  Thanks for pointing that out.

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -140,15 +140,9 @@ 
                 added.append(fn)
             elif node2 is None:
                 removed.append(fn)
-            elif node2 != _newnode:
-                # The file was not a new file in mf2, so an entry
-                # from diff is really a difference.
-                modified.append(fn)
             elif flag1 != flag2:
                 modified.append(fn)
             elif self[fn].cmp(other[fn]):
-                # node2 was newnode, but the working file doesn't
-                # match the one in mf1.
                 modified.append(fn)
             else:
                 clean.append(fn)
diff --git a/tests/test-status.t b/tests/test-status.t
--- a/tests/test-status.t
+++ b/tests/test-status.t
@@ -388,6 +388,14 @@ 
 
 #endif
 
+reverted and commit change should appear clean
+
+  $ hg revert -r 0 .
+  reverting file
+  $ hg ci -m a
+  $ hg status -A --rev 0 --rev 2
+  C file
+
   $ cd ..
 
 hg status of binary file starting with '\1\n', a separator for metadata: