Patchwork [stable] dirstate.walk: don't report same file stat multiple times

login
register
mail settings
Submitter Martin von Zweigbergk
Date April 5, 2015, 9:38 p.m.
Message ID <df4b2a1b51a30bd4b021.1428269896@martinvonz.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/8504/
State Accepted
Headers show

Comments

Martin von Zweigbergk - April 5, 2015, 9:38 p.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1428209652 25200
#      Sat Apr 04 21:54:12 2015 -0700
# Branch stable
# Node ID df4b2a1b51a30bd4b021b479d871864cce4886e9
# Parent  2e2e9a0750f91a6fe0ad88e4de34f8efefdcab08
dirstate.walk: don't report same file stat multiple times

dirstate.walk() generates pairs of filename and a stat-like
object. After "hg mv foo Foo", it generates one pair for "foo" and one
for "Foo", as it should. However, on case-insensitive file systems,
when it tries to stat to get the disk state as well, it gets the same
stat result for both names. This confuses at least
scmutil._interestingfiles(), making it think that "foo" was forgotten
rather than removed. That, in turn, makes "hg addremove" add "foo"
back, resulting in both cases in the dirstate, as reported in
issue4590.

This change only takes care of the "if unknown" branch. A similar fix
should perhaps be applied to the other branch.
Matt Harbison - April 6, 2015, 1:06 a.m.
On Sun, 05 Apr 2015 17:38:16 -0400, Martin von Zweigbergk  
<martinvonz@google.com> wrote:

> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1428209652 25200
> #      Sat Apr 04 21:54:12 2015 -0700
> # Branch stable
> # Node ID df4b2a1b51a30bd4b021b479d871864cce4886e9
> # Parent  2e2e9a0750f91a6fe0ad88e4de34f8efefdcab08
> dirstate.walk: don't report same file stat multiple times
>
> dirstate.walk() generates pairs of filename and a stat-like
> object. After "hg mv foo Foo", it generates one pair for "foo" and one
> for "Foo", as it should. However, on case-insensitive file systems,
> when it tries to stat to get the disk state as well, it gets the same
> stat result for both names. This confuses at least
> scmutil._interestingfiles(), making it think that "foo" was forgotten
> rather than removed. That, in turn, makes "hg addremove" add "foo"
> back, resulting in both cases in the dirstate, as reported in
> issue4590.
>
> This change only takes care of the "if unknown" branch. A similar fix
> should perhaps be applied to the other branch.
>
> diff -r 2e2e9a0750f9 -r df4b2a1b51a3 mercurial/dirstate.py
> --- a/mercurial/dirstate.py	Tue Mar 31 20:20:17 2015 -0300
> +++ b/mercurial/dirstate.py	Sat Apr 04 21:54:12 2015 -0700
> @@ -793,9 +793,15 @@
>                  audit_path = pathutil.pathauditor(self._root)
>                 for nf in iter(visit):
> +                    # If a stat for the same file was already added  
> with a
> +                    # different case, don't add one for this, since  
> that would
> +                    # make it appear as if the file exists under both  
> names
> +                    # on disk.
> +                    if normalize and normalize(nf, True, True) in  
> results:
> +                        results[nf] = None
>                      # Report ignored items in the dmap as long as they  
> are not
>                      # under a symlink directory.
> -                    if audit_path.check(nf):
> +                    elif audit_path.check(nf):
>                          try:
>                              results[nf] = lstat(join(nf))
>                              # file was just ignored, no links, and  
> exists
> diff -r 2e2e9a0750f9 -r df4b2a1b51a3 tests/test-casefolding.t
> --- a/tests/test-casefolding.t	Tue Mar 31 20:20:17 2015 -0300
> +++ b/tests/test-casefolding.t	Sat Apr 04 21:54:12 2015 -0700
> @@ -37,6 +37,15 @@
>    $ hg mv A a
>    $ hg st
> +addremove after case-changing rename has no effect (issue4590)
> +
> +  $ hg mv a A
> +  $ hg addremove
> +  recording removal of a as rename to A (100% similar)

Any idea why this is printing out?  I would understand if the test did a  
plain mv, but shouldn't 'hg mv' be what definitively records the  
removal/rename of 'a'?

> +  $ hg revert --all
> +  forgetting A
> +  undeleting a
> +
>  test changing case of path components
>   $ mkdir D
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Martin von Zweigbergk - April 6, 2015, 4:21 a.m.
On Apr 5, 2015 6:07 PM, "Matt Harbison" <mharbison72@gmail.com> wrote:

> On Sun, 05 Apr 2015 17:38:16 -0400, Martin von Zweigbergk <
> martinvonz@google.com> wrote:
>
>  # HG changeset patch
>> # User Martin von Zweigbergk <martinvonz@google.com>
>> # Date 1428209652 25200
>> #      Sat Apr 04 21:54:12 2015 -0700
>> # Branch stable
>> # Node ID df4b2a1b51a30bd4b021b479d871864cce4886e9
>> # Parent  2e2e9a0750f91a6fe0ad88e4de34f8efefdcab08
>> dirstate.walk: don't report same file stat multiple times
>>
>> dirstate.walk() generates pairs of filename and a stat-like
>> object. After "hg mv foo Foo", it generates one pair for "foo" and one
>> for "Foo", as it should. However, on case-insensitive file systems,
>> when it tries to stat to get the disk state as well, it gets the same
>> stat result for both names. This confuses at least
>> scmutil._interestingfiles(), making it think that "foo" was forgotten
>> rather than removed. That, in turn, makes "hg addremove" add "foo"
>> back, resulting in both cases in the dirstate, as reported in
>> issue4590.
>>
>> This change only takes care of the "if unknown" branch. A similar fix
>> should perhaps be applied to the other branch.
>>
>> diff -r 2e2e9a0750f9 -r df4b2a1b51a3 mercurial/dirstate.py
>> --- a/mercurial/dirstate.py     Tue Mar 31 20:20:17 2015 -0300
>> +++ b/mercurial/dirstate.py     Sat Apr 04 21:54:12 2015 -0700
>> @@ -793,9 +793,15 @@
>>                  audit_path = pathutil.pathauditor(self._root)
>>                 for nf in iter(visit):
>> +                    # If a stat for the same file was already added with
>> a
>> +                    # different case, don't add one for this, since that
>> would
>> +                    # make it appear as if the file exists under both
>> names
>> +                    # on disk.
>> +                    if normalize and normalize(nf, True, True) in
>> results:
>> +                        results[nf] = None
>>                      # Report ignored items in the dmap as long as they
>> are not
>>                      # under a symlink directory.
>> -                    if audit_path.check(nf):
>> +                    elif audit_path.check(nf):
>>                          try:
>>                              results[nf] = lstat(join(nf))
>>                              # file was just ignored, no links, and exists
>> diff -r 2e2e9a0750f9 -r df4b2a1b51a3 tests/test-casefolding.t
>> --- a/tests/test-casefolding.t  Tue Mar 31 20:20:17 2015 -0300
>> +++ b/tests/test-casefolding.t  Sat Apr 04 21:54:12 2015 -0700
>> @@ -37,6 +37,15 @@
>>    $ hg mv A a
>>    $ hg st
>> +addremove after case-changing rename has no effect (issue4590)
>> +
>> +  $ hg mv a A
>> +  $ hg addremove
>> +  recording removal of a as rename to A (100% similar)
>>
>
> Any idea why this is printing out?  I would understand if the test did a
> plain mv, but shouldn't 'hg mv' be what definitively records the
> removal/rename of 'a'?
>

Sorry, I forgot to mention that part, so thanks for asking. I agree that
it's a little surprising, but it's actually unrelated to case. We should
probably add a test (outside of this patch) with "hg mv foo bar; hg
addremove" that shows this behavior.


>  +  $ hg revert --all
>> +  forgetting A
>> +  undeleting a
>> +
>>  test changing case of path components
>>   $ mkdir D
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel
>>
>
Matt Mackall - April 6, 2015, 8:45 p.m.
On Sun, 2015-04-05 at 14:38 -0700, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1428209652 25200
> #      Sat Apr 04 21:54:12 2015 -0700
> # Branch stable
> # Node ID df4b2a1b51a30bd4b021b479d871864cce4886e9
> # Parent  2e2e9a0750f91a6fe0ad88e4de34f8efefdcab08
> dirstate.walk: don't report same file stat multiple times

Queued for stable, thanks.

Patch

diff -r 2e2e9a0750f9 -r df4b2a1b51a3 mercurial/dirstate.py
--- a/mercurial/dirstate.py	Tue Mar 31 20:20:17 2015 -0300
+++ b/mercurial/dirstate.py	Sat Apr 04 21:54:12 2015 -0700
@@ -793,9 +793,15 @@ 
                 audit_path = pathutil.pathauditor(self._root)
 
                 for nf in iter(visit):
+                    # If a stat for the same file was already added with a
+                    # different case, don't add one for this, since that would
+                    # make it appear as if the file exists under both names
+                    # on disk.
+                    if normalize and normalize(nf, True, True) in results:
+                        results[nf] = None
                     # Report ignored items in the dmap as long as they are not
                     # under a symlink directory.
-                    if audit_path.check(nf):
+                    elif audit_path.check(nf):
                         try:
                             results[nf] = lstat(join(nf))
                             # file was just ignored, no links, and exists
diff -r 2e2e9a0750f9 -r df4b2a1b51a3 tests/test-casefolding.t
--- a/tests/test-casefolding.t	Tue Mar 31 20:20:17 2015 -0300
+++ b/tests/test-casefolding.t	Sat Apr 04 21:54:12 2015 -0700
@@ -37,6 +37,15 @@ 
   $ hg mv A a
   $ hg st
 
+addremove after case-changing rename has no effect (issue4590)
+
+  $ hg mv a A
+  $ hg addremove
+  recording removal of a as rename to A (100% similar)
+  $ hg revert --all
+  forgetting A
+  undeleting a
+
 test changing case of path components
 
   $ mkdir D