Patchwork [STABLE] icasefs: improve rename awareness of case-folding collision detection (issue3452)

login
register
mail settings
Submitter Katsunori FUJIWARA
Date Feb. 1, 2013, 1:44 a.m.
Message ID <3eca4bf6f77f64a83835.1359683069@juju>
Download mbox | patch
Permalink /patch/782/
State Superseded
Headers show

Comments

Katsunori FUJIWARA - Feb. 1, 2013, 1:44 a.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1359682698 -32400
# Branch stable
# Node ID 3eca4bf6f77f64a83835466f9b07022f5152a219
# Parent  2a1fac3650a5b4d650198604c82ab59969500374
icasefs: improve rename awareness of case-folding collision detection (issue3452)

Before this patch, merging at (6) in the example below is aborted for
case-folding collision unexpectedly, because 'a' at (4) and 'A' at (5)
aren't recognized as source/destination of renaming.

  (0) -- (2) -- (4) -------
      \      \             \
       \      \             \
         (1) -- (3) -- (5) -- (6)

  0: add file 'a'
  1: rename from 'a' to 'A'
  2: add file 'x'
  3: merge
  4: modify 'a'
  5: modify 'A'
  6: merge

"copies.pathcopies()" used for case-folding collision detection scans
filelog entries only for the revisions which are grater than common
ancestor of merged revisions: in the example above, renaming from 'a'
to 'A' at (1) is not scanned in the merging at (6), because common
ancestor of merged revisions is (2) in this case.

"copies.mergecopies()" can detect renaming in such case.

But in the other hand, in the case merging branches without
modification of renamed file (= issue3370 case), like (3) in the
example above, "copies.mergecopies()" can't detect renaming.

So, this patch uses both "copies.pathcopies()" and
"copies.mergecopies()".

To prevent "copies.mergecopies()" from being called in
"manifestmerge()" again, "_checkcollision()" returns result of it, and
"manifestmerge()" reuses it, if it is invoked in "_checkcollision()".
Mads Kiilerich - Feb. 1, 2013, 10:25 a.m.
On 02/01/2013 02:44 AM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1359682698 -32400
> # Branch stable
> # Node ID 3eca4bf6f77f64a83835466f9b07022f5152a219
> # Parent  2a1fac3650a5b4d650198604c82ab59969500374
> icasefs: improve rename awareness of case-folding collision detection (issue3452)
>
> Before this patch, merging at (6) in the example below is aborted for
> case-folding collision unexpectedly, because 'a' at (4) and 'A' at (5)
> aren't recognized as source/destination of renaming.
>
>    (0) -- (2) -- (4) -------
>        \      \             \
>         \      \             \
>           (1) -- (3) -- (5) -- (6)
>
>    0: add file 'a'
>    1: rename from 'a' to 'A'
>    2: add file 'x'
>    3: merge
>    4: modify 'a'
>    5: modify 'A'
>    6: merge
>
> "copies.pathcopies()" used for case-folding collision detection scans
> filelog entries only for the revisions which are grater than common
> ancestor of merged revisions: in the example above, renaming from 'a'
> to 'A' at (1) is not scanned in the merging at (6), because common
> ancestor of merged revisions is (2) in this case.
>
> "copies.mergecopies()" can detect renaming in such case.
>
> But in the other hand, in the case merging branches without
> modification of renamed file (= issue3370 case), like (3) in the
> example above, "copies.mergecopies()" can't detect renaming.
>
> So, this patch uses both "copies.pathcopies()" and
> "copies.mergecopies()".
>
> To prevent "copies.mergecopies()" from being called in
> "manifestmerge()" again, "_checkcollision()" returns result of it, and
> "manifestmerge()" reuses it, if it is invoked in "_checkcollision()".

A tricky one ...

It is also not exactly a one-liner patch. Considering the size of the 
patch, the frequency this case is hit, the impact when it is hit, and 
the timing: I don't think this is for stable now. Doing it right seems 
to require some refactorings - and it would be better to do it right.

Some overall comments:

It is a bit messy that it calls copies.mergecopies from different places 
far from each other. A first improvement to the patch would be to 
calculate it and pass it as a parameter to _checkcollision.

It is also not obvious why we have the separation between 
calculateupdates and manifestmerge. You demonstrate that we sometimes 
need copies.mergecopies in calculateupdates, sometimes in manifestmerge. 
I think it would be better to start with a refactoring that merges 
calculateupdates into manifestmerge. It seems like that would simplify 
things.

And ...

> diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
> --- a/hgext/largefiles/overrides.py
> +++ b/hgext/largefiles/overrides.py
> @@ -360,8 +360,9 @@
>   # Finally, the merge.applyupdates function will then take care of
>   # writing the files into the working copy and lfcommands.updatelfiles
>   # will update the largefiles.
> -def overridemanifestmerge(origfn, repo, p1, p2, pa, overwrite, partial):
> -    actions = origfn(repo, p1, p2, pa, overwrite, partial)
> +def overridemanifestmerge(origfn, repo, p1, p2, pa, overwrite, partial,
> +                          mergecopies=None):
> +    actions = origfn(repo, p1, p2, pa, overwrite, partial, mergecopies)
>       processed = []
>   
>       for action in actions:
> diff --git a/mercurial/merge.py b/mercurial/merge.py
> --- a/mercurial/merge.py
> +++ b/mercurial/merge.py
> @@ -141,21 +141,33 @@
>       if extractxs:
>           wctx, actx = extractxs
>           # class to delay looking up copy mapping
> -        class pathcopies(object):
> +        class copymap(object):
> +            def __init__(self):
> +                self._mergecopies = None
>               @util.propertycache
> -            def map(self):
> +            def pathcopies(self):
>                   # {dst@mctx: src@wctx} copy mapping
>                   return copies.pathcopies(wctx, mctx)
> -        pc = pathcopies()
> +            @util.propertycache
> +            def mergecopies(self):
> +                self._mergecopies = copies.mergecopies(wctx._repo,
> +                                                       wctx, mctx, actx)
> +                return self._mergecopies[0]

Calling the property mergecopies but only returning mergecopies[0] seems 
a bit confusing.

> +            def find(self, f1, f2):
> +                return (self.pathcopies.get(f1) == f2 or
> +                        self.mergecopies.get(f1) == f2 or
> +                        self.mergecopies.get(f2) == f1)
> +        cm = copymap()
>   
>           for fn in wctx:
>               fold = util.normcase(fn)
>               mfn = folded.get(fold, None)
> -            if (mfn and mfn != fn and pc.map.get(mfn) != fn and
> +            if (mfn and mfn != fn and not cm.find(mfn, fn) and
>                   _remains(fn, wctx.manifest(), actx.manifest(), True) and
>                   _remains(mfn, mctx.manifest(), actx.manifest())):
>                   raise util.Abort(_("case-folding collision between %s and %s")
>                                    % (mfn, fn))
> +        return cm._mergecopies

Why make it private when you access it from the outside anyway?

>   
>   def _forgetremoved(wctx, mctx, branchmerge):
>       """
> @@ -185,7 +197,7 @@
>   
>       return actions
>   
> -def manifestmerge(repo, p1, p2, pa, overwrite, partial):
> +def manifestmerge(repo, p1, p2, pa, overwrite, partial, mergecopies=None):
>       """
>       Merge p1 and p2 with ancestor pa and generate merge action list
>   
> @@ -204,7 +216,7 @@
>       elif pa == p2: # backwards
>           pa = p1.p1()
>       elif pa and repo.ui.configbool("merge", "followcopies", True):
> -        ret = copies.mergecopies(repo, p1, p2, pa)
> +        ret = mergecopies or copies.mergecopies(repo, p1, p2, pa)
>           copy, movewithdir, diverge, renamedelete = ret
>           for of, fl in diverge.iteritems():
>               act("divergent renames", "dr", of, fl)
> @@ -440,13 +452,14 @@
>       "Calculate the actions needed to merge mctx into tctx"
>       actions = []
>       folding = not util.checkcase(repo.path)
> +    mergecopies = None
>       if folding:
>           # collision check is not needed for clean update
>           if (not branchmerge and
>               (force or not tctx.dirty(missing=True, branch=False))):
>               _checkcollision(mctx, None)
>           else:
> -            _checkcollision(mctx, (tctx, ancestor))
> +            mergecopies = _checkcollision(mctx, (tctx, ancestor))
>       if not force:
>           _checkunknown(repo, tctx, mctx)
>       if tctx.rev() is None:
> @@ -454,7 +467,8 @@
>       actions += manifestmerge(repo, tctx, mctx,
>                                ancestor,
>                                force and not branchmerge,
> -                             partial)
> +                             partial,
> +                             mergecopies)
>       return actions
>   
>   def recordupdates(repo, actions, branchmerge):
> diff --git a/tests/test-casecollision-merge.t b/tests/test-casecollision-merge.t
> --- a/tests/test-casecollision-merge.t
> +++ b/tests/test-casecollision-merge.t
> @@ -22,33 +22,53 @@
>     $ hg commit -m '#1'
>     $ hg update 0
>     1 files updated, 0 files merged, 1 files removed, 0 files unresolved
> -  $ echo 'modified at #2' > a
> +  $ echo x > x
> +  $ hg add x
>     $ hg commit -m '#2'
>     created new head
>   
> -  $ hg merge
> -  merging a and A to A
> -  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
> -  (branch merge, don't forget to commit)
> +  $ hg merge -q
> +  $ hg status -A
> +  M A
> +  R a
> +  C x
> +
> +  $ hg update -q --clean 1
> +  $ hg merge -q
> +  $ hg status -A
> +  M x
> +  C A
> +
> +additional test for issue3452:
> +
> +  $ hg commit -m '#3'
> +
> +  $ hg update -q --clean 2
> +  $ echo 'a@4' > a
> +  $ hg commit -m '#4'
> +  created new head
> +
> +  $ hg update -q --clean 3
> +  $ echo 'A@5' > A
> +  $ hg commit -m '#5'
> +
> +  $ hg merge -q --tool internal:other 4
>     $ hg status -A
>     M A
>       a
> -  R a
> +  C x
>     $ cat A
> -  modified at #2
> +  a@4
>   
> -  $ hg update --clean 1
> -  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
> -  $ hg merge
> -  merging A and a to A
> -  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
> -  (branch merge, don't forget to commit)
> -  $ hg status -A
> +  $ hg update -q --clean 4
> +  $ hg merge -q --tool internal:other 5
> +  $ hg statu -A

Reusing tests is fine, but refactoring tests while doing a fix makes it 
very hard to see whether the fix actually introduces a regression. 
Please separate them somehow - for example by an initial 'refactor test' 
or 'improve test coverage' patch.

>     M A
>       a
> +  R a
> +  C x
>     $ cat A
> -  modified at #2
> -
> +  A@5
>     $ cd ..
>   
>   (2) colliding file is not related to collided file

/Mads
Matt Mackall - Feb. 1, 2013, 7:06 p.m.
On Fri, 2013-02-01 at 10:44 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1359682698 -32400
> # Branch stable
> # Node ID 3eca4bf6f77f64a83835466f9b07022f5152a219
> # Parent  2a1fac3650a5b4d650198604c82ab59969500374
> icasefs: improve rename awareness of case-folding collision detection (issue3452)

Going to hold off on this one. Mergecopies() needs to die, it is broken
in a number of ways. I think we need a deeper rethink of our
collision-detection strategy.

Patch

diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
--- a/hgext/largefiles/overrides.py
+++ b/hgext/largefiles/overrides.py
@@ -360,8 +360,9 @@ 
 # Finally, the merge.applyupdates function will then take care of
 # writing the files into the working copy and lfcommands.updatelfiles
 # will update the largefiles.
-def overridemanifestmerge(origfn, repo, p1, p2, pa, overwrite, partial):
-    actions = origfn(repo, p1, p2, pa, overwrite, partial)
+def overridemanifestmerge(origfn, repo, p1, p2, pa, overwrite, partial,
+                          mergecopies=None):
+    actions = origfn(repo, p1, p2, pa, overwrite, partial, mergecopies)
     processed = []
 
     for action in actions:
diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -141,21 +141,33 @@ 
     if extractxs:
         wctx, actx = extractxs
         # class to delay looking up copy mapping
-        class pathcopies(object):
+        class copymap(object):
+            def __init__(self):
+                self._mergecopies = None
             @util.propertycache
-            def map(self):
+            def pathcopies(self):
                 # {dst@mctx: src@wctx} copy mapping
                 return copies.pathcopies(wctx, mctx)
-        pc = pathcopies()
+            @util.propertycache
+            def mergecopies(self):
+                self._mergecopies = copies.mergecopies(wctx._repo,
+                                                       wctx, mctx, actx)
+                return self._mergecopies[0]
+            def find(self, f1, f2):
+                return (self.pathcopies.get(f1) == f2 or
+                        self.mergecopies.get(f1) == f2 or
+                        self.mergecopies.get(f2) == f1)
+        cm = copymap()
 
         for fn in wctx:
             fold = util.normcase(fn)
             mfn = folded.get(fold, None)
-            if (mfn and mfn != fn and pc.map.get(mfn) != fn and
+            if (mfn and mfn != fn and not cm.find(mfn, fn) and
                 _remains(fn, wctx.manifest(), actx.manifest(), True) and
                 _remains(mfn, mctx.manifest(), actx.manifest())):
                 raise util.Abort(_("case-folding collision between %s and %s")
                                  % (mfn, fn))
+        return cm._mergecopies
 
 def _forgetremoved(wctx, mctx, branchmerge):
     """
@@ -185,7 +197,7 @@ 
 
     return actions
 
-def manifestmerge(repo, p1, p2, pa, overwrite, partial):
+def manifestmerge(repo, p1, p2, pa, overwrite, partial, mergecopies=None):
     """
     Merge p1 and p2 with ancestor pa and generate merge action list
 
@@ -204,7 +216,7 @@ 
     elif pa == p2: # backwards
         pa = p1.p1()
     elif pa and repo.ui.configbool("merge", "followcopies", True):
-        ret = copies.mergecopies(repo, p1, p2, pa)
+        ret = mergecopies or copies.mergecopies(repo, p1, p2, pa)
         copy, movewithdir, diverge, renamedelete = ret
         for of, fl in diverge.iteritems():
             act("divergent renames", "dr", of, fl)
@@ -440,13 +452,14 @@ 
     "Calculate the actions needed to merge mctx into tctx"
     actions = []
     folding = not util.checkcase(repo.path)
+    mergecopies = None
     if folding:
         # collision check is not needed for clean update
         if (not branchmerge and
             (force or not tctx.dirty(missing=True, branch=False))):
             _checkcollision(mctx, None)
         else:
-            _checkcollision(mctx, (tctx, ancestor))
+            mergecopies = _checkcollision(mctx, (tctx, ancestor))
     if not force:
         _checkunknown(repo, tctx, mctx)
     if tctx.rev() is None:
@@ -454,7 +467,8 @@ 
     actions += manifestmerge(repo, tctx, mctx,
                              ancestor,
                              force and not branchmerge,
-                             partial)
+                             partial,
+                             mergecopies)
     return actions
 
 def recordupdates(repo, actions, branchmerge):
diff --git a/tests/test-casecollision-merge.t b/tests/test-casecollision-merge.t
--- a/tests/test-casecollision-merge.t
+++ b/tests/test-casecollision-merge.t
@@ -22,33 +22,53 @@ 
   $ hg commit -m '#1'
   $ hg update 0
   1 files updated, 0 files merged, 1 files removed, 0 files unresolved
-  $ echo 'modified at #2' > a
+  $ echo x > x
+  $ hg add x
   $ hg commit -m '#2'
   created new head
 
-  $ hg merge
-  merging a and A to A
-  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
-  (branch merge, don't forget to commit)
+  $ hg merge -q
+  $ hg status -A
+  M A
+  R a
+  C x
+
+  $ hg update -q --clean 1
+  $ hg merge -q
+  $ hg status -A
+  M x
+  C A
+
+additional test for issue3452:
+
+  $ hg commit -m '#3'
+
+  $ hg update -q --clean 2
+  $ echo 'a@4' > a
+  $ hg commit -m '#4'
+  created new head
+
+  $ hg update -q --clean 3
+  $ echo 'A@5' > A
+  $ hg commit -m '#5'
+
+  $ hg merge -q --tool internal:other 4
   $ hg status -A
   M A
     a
-  R a
+  C x
   $ cat A
-  modified at #2
+  a@4
 
-  $ hg update --clean 1
-  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
-  $ hg merge
-  merging A and a to A
-  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
-  (branch merge, don't forget to commit)
-  $ hg status -A
+  $ hg update -q --clean 4
+  $ hg merge -q --tool internal:other 5
+  $ hg statu -A
   M A
     a
+  R a
+  C x
   $ cat A
-  modified at #2
-
+  A@5
   $ cd ..
 
 (2) colliding file is not related to collided file