Patchwork D1056: context: add a fast-comparision path between arbitraryfilectx and workingfilectx

login
register
mail settings
Submitter phabricator
Date Oct. 13, 2017, 7:45 p.m.
Message ID <differential-rev-PHID-DREV-c5ef77rk62jkiwshpu42-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/24850/
State Superseded
Headers show

Comments

phabricator - Oct. 13, 2017, 7:45 p.m.
phillco created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1056

AFFECTED FILES
  mercurial/context.py

CHANGE DETAILS




To: phillco, #hg-reviewers
Cc: mercurial-devel
phabricator - Oct. 16, 2017, 10:57 a.m.
ryanmce added a comment.


  ruh-roh

INLINE COMMENTS

> context.py:2566
> +            # Add a fast-path for merge if both sides are disk-backed.
> +            # Note that filecmp uses the opposite return values as cmp.
> +            return not filecmp.cmp(self.path(), self._repo.wjoin(fctx.path()))

Ugh, haha

> context.py:2567
> +            # Note that filecmp uses the opposite return values as cmp.
> +            return not filecmp.cmp(self.path(), self._repo.wjoin(fctx.path()))
> +        return self.data() != fctx.data()

Why is this sufficient? Can't the contents be the same even if the paths are different?

I think you can only fastpath if the paths are the same, otherwise you have to fall back to data comparison.

This is already queued, but I think we need to drop it if I'm right here.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1056

To: phillco, #hg-reviewers, durin42
Cc: ryanmce, mercurial-devel
phabricator - Oct. 16, 2017, 5:34 p.m.
phillco added inline comments.

INLINE COMMENTS

> ryanmce wrote in context.py:2567
> Why is this sufficient? Can't the contents be the same even if the paths are different?
> 
> I think you can only fastpath if the paths are the same, otherwise you have to fall back to data comparison.
> 
> This is already queued, but I think we need to drop it if I'm right here.

Ryan and I talked offline -- but surprisingly, the default `filectx.cmp` function only compares contents:

  1:~/1$ repo['.']['A'].cmp(repo['.']['B'])
  Out[1]: False
  
  2:~/1$ repo['.']['A'].cmp(repo['.']['A'])
  Out[2]: False
  
  3:~/1$ repo['.']['A'].cmp(repo['.']['C'])
  Out[3]: True

(here, `A` and `B` have the same content).

And `filecmp` seems to behave the same way:

  7:~/1$ filecmp.cmp('A', 'B')
  Out[7]: True
  
  8:~/1$ filecmp.cmp('A', 'A')
  Out[8]: True
  
  9:~/1$ filecmp.cmp('A', 'C')
  Out[9]: False

That doesn't mean that it's wrong, but I think it's consistent.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1056

To: phillco, #hg-reviewers, durin42
Cc: ryanmce, mercurial-devel
phabricator - Oct. 16, 2017, 6:03 p.m.
phillco added inline comments.

INLINE COMMENTS

> phillco wrote in context.py:2567
> Ryan and I talked offline -- but surprisingly, the default `filectx.cmp` function only compares contents:
> 
>   1:~/1$ repo['.']['A'].cmp(repo['.']['B'])
>   Out[1]: False
>   
>   2:~/1$ repo['.']['A'].cmp(repo['.']['A'])
>   Out[2]: False
>   
>   3:~/1$ repo['.']['A'].cmp(repo['.']['C'])
>   Out[3]: True
> 
> (here, `A` and `B` have the same content).
> 
> And `filecmp` seems to behave the same way:
> 
>   7:~/1$ filecmp.cmp('A', 'B')
>   Out[7]: True
>   
>   8:~/1$ filecmp.cmp('A', 'A')
>   Out[8]: True
>   
>   9:~/1$ filecmp.cmp('A', 'C')
>   Out[9]: False
> 
> That doesn't mean that it's wrong, but I think it's consistent.

Ah, but some experimenting revealed that `filecmp` does follow symlinks, but our `filectx` comparators do not. Otherwise, both `filecmp` and `filectx.cmp` ignore any flag differences.

Thus, you can demonstrate a discrepancy:

  A contains "foo"
  real_A contains "A"
  sym_A link to A
  
  repo['.']['real_A'].cmp(repo['.']['sym_A']) # claims the same
  filecmp.cmp('real_A', 'sym_A') # claims a difference, because "foo" != "A"

Note this simpler case doesn't trigger the discrepancy, because the linked file is otherwise identical:

  A contains "A"
  sym_A link to A
  
  repo['.']['A'].cmp(repo['.']['sym_A']) # claims the same
  filecmp.cmp('A', 'sym_A') # claims the same

The easiest fix is just to skip the fast-comparison path if either side is a symlink, and that's what I'll do unless others have other ideas. (@ryanmce thought we should think about what this API _should_ do).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1056

To: phillco, #hg-reviewers, durin42
Cc: ryanmce, mercurial-devel
phabricator - Oct. 16, 2017, 7:01 p.m.
durin42 added a comment.


  For now, send a follow-up to not do that fast-path if it's a symlink, and we can reason more carefully about this API during the freeze with an eye towards landing something better in 4.5...

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1056

To: phillco, #hg-reviewers, durin42
Cc: ryanmce, mercurial-devel
phabricator - Oct. 16, 2017, 7:36 p.m.
phillco added a comment.


  Done: https://phab.mercurial-scm.org/D1122.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1056

To: phillco, #hg-reviewers, durin42
Cc: ryanmce, mercurial-devel

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -8,6 +8,7 @@ 
 from __future__ import absolute_import
 
 import errno
+import filecmp
 import os
 import re
 import stat
@@ -2545,11 +2546,17 @@ 
     """Allows you to use filectx-like functions on a file in an arbitrary
     location on disk, possibly not in the working directory.
     """
-    def __init__(self, path):
+    def __init__(self, path, repo=None):
+        # Repo is optional because contrib/simplemerge uses this class.
+        self._repo = repo
         self._path = path
 
-    def cmp(self, otherfilectx):
-        return self.data() != otherfilectx.data()
+    def cmp(self, fctx):
+        if isinstance(fctx, workingfilectx) and self._repo:
+            # Add a fast-path for merge if both sides are disk-backed.
+            # Note that filecmp uses the opposite return values as cmp.
+            return not filecmp.cmp(self.path(), self._repo.wjoin(fctx.path()))
+        return self.data() != fctx.data()
 
     def path(self):
         return self._path