Patchwork [6,of,8,sparse,V2] dirstate: move validation of files in sparse checkout from extension

login
register
mail settings
Submitter Gregory Szorc
Date July 11, 2017, 4:57 a.m.
Message ID <7b09c7c8f8457dd96f45.1499749025@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/22218/
State Accepted
Headers show

Comments

Gregory Szorc - July 11, 2017, 4:57 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1499554173 25200
#      Sat Jul 08 15:49:33 2017 -0700
# Node ID 7b09c7c8f8457dd96f45334dad13a0ceaae92a18
# Parent  03eed39f7af2db1678c2d39a2a7342e5c66b0804
dirstate: move validation of files in sparse checkout from extension

The sparse extension wraps various dirstate methods to validate that
the path being modified in dirstate exists in the sparse checkout.
This commit moves that code to dirstate itself.

As part of the move, the error messages were tweaked slightly.
And the code for the validation was reworked slightly. The most
notable change is the removal of the "f is not None" check.
This check has instead been inlined into copy(), which is the only
place where it should be relevant.

I'm not crazy about the references to sparse's UI adjustments
leaking into core. But what can you do.

I'm not a dirstate expert and am not sure if this is the ideal
way to hook in path validation into dirstate now that the code is in
core. But this does preserve the behavior of the extension. We can
always refactor later.

Unless the sparse feature is enabled via the sparse extension,
the matcher will be an alwaysmatcher and the new code will
essentially no-op. This change does add the overhead of calling
a few functions, however. If the modified methods are called in
tight loops (this can occur for working directory updates) this could
have a performance impact. I haven't explicitly measured the perf
impact because I view this code as temporary until it can be
refactored. The end state should essentially be an attribute
lookup, so I'm not too worried about long-term perf impact.
Katsunori FUJIWARA - July 11, 2017, 6:01 p.m.
At Mon, 10 Jul 2017 21:57:05 -0700,
Gregory Szorc wrote:
> 
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1499554173 25200
> #      Sat Jul 08 15:49:33 2017 -0700
> # Node ID 7b09c7c8f8457dd96f45334dad13a0ceaae92a18
> # Parent  03eed39f7af2db1678c2d39a2a7342e5c66b0804
> dirstate: move validation of files in sparse checkout from extension
> 
> The sparse extension wraps various dirstate methods to validate that
> the path being modified in dirstate exists in the sparse checkout.
> This commit moves that code to dirstate itself.
> 
> As part of the move, the error messages were tweaked slightly.
> And the code for the validation was reworked slightly. The most
> notable change is the removal of the "f is not None" check.
> This check has instead been inlined into copy(), which is the only
> place where it should be relevant.
> 
> I'm not crazy about the references to sparse's UI adjustments
> leaking into core. But what can you do.
> 
> I'm not a dirstate expert and am not sure if this is the ideal
> way to hook in path validation into dirstate now that the code is in
> core. But this does preserve the behavior of the extension. We can
> always refactor later.
> 
> Unless the sparse feature is enabled via the sparse extension,
> the matcher will be an alwaysmatcher and the new code will
> essentially no-op. This change does add the overhead of calling
> a few functions, however. If the modified methods are called in
> tight loops (this can occur for working directory updates) this could
> have a performance impact. I haven't explicitly measured the perf
> impact because I view this code as temporary until it can be
> refactored. The end state should essentially be an attribute
> lookup, so I'm not too worried about long-term perf impact.
> 
> diff --git a/hgext/sparse.py b/hgext/sparse.py
> --- a/hgext/sparse.py
> +++ b/hgext/sparse.py
> @@ -240,23 +240,6 @@ def _setupdirstate(ui):
>          return orig(self, parent, allfiles, changedfiles)
>      extensions.wrapfunction(dirstate.dirstate, 'rebuild', _rebuild)
>  
> -    # Prevent adding files that are outside the sparse checkout
> -    editfuncs = ['normal', 'add', 'normallookup', 'copy', 'remove', 'merge']
> -    hint = _('include file with `hg debugsparse --include <pattern>` or use ' +
> -             '`hg add -s <file>` to include file directory while adding')
> -    for func in editfuncs:
> -        def _wrapper(orig, self, *args):
> -            sparsematch = self._sparsematcher
> -            if not sparsematch.always():
> -                for f in args:
> -                    if (f is not None and not sparsematch(f) and
> -                        f not in self):
> -                        raise error.Abort(_("cannot add '%s' - it is outside "
> -                                            "the sparse checkout") % f,
> -                                          hint=hint)
> -            return orig(self, *args)
> -        extensions.wrapfunction(dirstate.dirstate, func, _wrapper)
> -
>  @command('^debugsparse', [
>      ('I', 'include', False, _('include files in the sparse checkout')),
>      ('X', 'exclude', False, _('exclude files in the sparse checkout')),
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -68,6 +68,10 @@ def nonnormalentries(dmap):
>                  otherparent.add(fname)
>          return nonnorm, otherparent
>  
> +ADD_SPARSE_HINT = _("include file with 'hg debugsparse --include <pattern>' "
> +                    "or use 'hg add --sparse <file>' to include file "
> +                    "directory while adding")
> +
>  class dirstate(object):
>  
>      def __init__(self, opener, ui, root, validate, sparsematchfn):
> @@ -273,6 +277,18 @@ class dirstate(object):
>          # it's safe because f is always a relative path
>          return self._rootdir + f
>  
> +    def _ensuresparsematch(self, f):
> +        """Verifies a path is part of the sparse config and aborts if not."""
> +        matcher = self._sparsematcher
> +
> +        if matcher.always():
> +            return
> +
> +        if not matcher(f) and f not in self._map:
> +            raise error.Abort(_("cannot add '%s' because it is outside the "
> +                                "sparse checkout") % f,
> +                              hint=ADD_SPARSE_HINT)
> +

Nit picking for later refactoring.

"cannot add 'XXXX'" seems uncomfortable, when this code path is
executed for removed file. This should:

  1. use more generic phrase ("manage", "operate', 'involve' or so on), or
  2. switch message according to operation type

"use 'hg add --sparse ...." in ADD_SPARSE_HINT, too.

BTW:

  - normal() always uses _addpath()
  - add() always uses _addpath()
  - merge() uses
    - normallookup()
    - otherparent(), which always uses _addpath()
  - normallookup()
    If "self._pl[1] != nullid and f in self._map" route is
    executed, "f in self._map" means that specified file is already
    managed in dirstate, and it should be detected before.

    Therefore, normallookup() always uses _addpath(), in practice

  - remove() always uses _droppath()

  - copy()
    Both "source" and "dest" should exist in dirstate before copy()

Therefore, how about these?

  - check "inside sparse" in _addpath() and _droppath(),
    and raise Abort, SparseError (as Martin mentioned), or so

    We can switch message and hint here.

    Strictly speaking, this allows lstat() on a file outside sparse in
    normal() case. But caller (merge/update, revert, and so on) should
    avoid it.

  - check "inside sparse" in code paths below,
    and raise ProgrammingError or so (use "assert" ?)

    - "self._pl[1] != nullid and f in self._map" route in normallookup()
    - at the beginning of copy()


>      def flagfunc(self, buildfallback):
>          if self._checklink and self._checkexec:
>              def f(x):
> @@ -514,6 +530,11 @@ class dirstate(object):
>          """Mark dest as a copy of source. Unmark dest if source is None."""
>          if source == dest:
>              return
> +
> +        if source is not None:
> +            self._ensuresparsematch(source)
> +        self._ensuresparsematch(dest)
> +
>          self._dirty = True
>          if source is not None:
>              self._copymap[dest] = source
> @@ -565,6 +586,7 @@ class dirstate(object):
>  
>      def normal(self, f):
>          '''Mark a file normal and clean.'''
> +        self._ensuresparsematch(f)
>          s = os.lstat(self._join(f))
>          mtime = s.st_mtime
>          self._addpath(f, 'n', s.st_mode,
> @@ -581,6 +603,8 @@ class dirstate(object):
>  
>      def normallookup(self, f):
>          '''Mark a file normal, but possibly dirty.'''
> +        self._ensuresparsematch(f)
> +
>          if self._pl[1] != nullid and f in self._map:
>              # if there is a merge going on and the file was either
>              # in state 'm' (-1) or coming from other parent (-2) before
> @@ -620,12 +644,14 @@ class dirstate(object):
>  
>      def add(self, f):
>          '''Mark a file added.'''
> +        self._ensuresparsematch(f)
>          self._addpath(f, 'a', 0, -1, -1)
>          if f in self._copymap:
>              del self._copymap[f]
>  
>      def remove(self, f):
>          '''Mark a file removed.'''
> +        self._ensuresparsematch(f)
>          self._dirty = True
>          self._droppath(f)
>          size = 0
> @@ -644,6 +670,8 @@ class dirstate(object):
>  
>      def merge(self, f):
>          '''Mark a file merged.'''
> +        self._ensuresparsematch(f)
> +
>          if self._pl[1] == nullid:
>              return self.normallookup(f)
>          return self.otherparent(f)
> diff --git a/tests/test-sparse.t b/tests/test-sparse.t
> --- a/tests/test-sparse.t
> +++ b/tests/test-sparse.t
> @@ -98,8 +98,8 @@ Verify status only shows included files
>  Adding an excluded file should fail
>  
>    $ hg add hide3
> -  abort: cannot add 'hide3' - it is outside the sparse checkout
> -  (include file with `hg debugsparse --include <pattern>` or use `hg add -s <file>` to include file directory while adding)
> +  abort: cannot add 'hide3' because it is outside the sparse checkout
> +  (include file with 'hg debugsparse --include <pattern>' or use 'hg add --sparse <file>' to include file directory while adding)
>    [255]
>  
>  Verify deleting sparseness while a file has changes fails
> @@ -265,8 +265,8 @@ Test that add -s adds dirs to sparse pro
>    $ touch add/foo
>    $ touch add/bar
>    $ hg add add/foo
> -  abort: cannot add 'add/foo' - it is outside the sparse checkout
> -  (include file with `hg debugsparse --include <pattern>` or use `hg add -s <file>` to include file directory while adding)
> +  abort: cannot add 'add/foo' because it is outside the sparse checkout
> +  (include file with 'hg debugsparse --include <pattern>' or use 'hg add --sparse <file>' to include file directory while adding)
>    [255]
>    $ hg add -s add/foo
>    $ hg st
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/hgext/sparse.py b/hgext/sparse.py
--- a/hgext/sparse.py
+++ b/hgext/sparse.py
@@ -240,23 +240,6 @@  def _setupdirstate(ui):
         return orig(self, parent, allfiles, changedfiles)
     extensions.wrapfunction(dirstate.dirstate, 'rebuild', _rebuild)
 
-    # Prevent adding files that are outside the sparse checkout
-    editfuncs = ['normal', 'add', 'normallookup', 'copy', 'remove', 'merge']
-    hint = _('include file with `hg debugsparse --include <pattern>` or use ' +
-             '`hg add -s <file>` to include file directory while adding')
-    for func in editfuncs:
-        def _wrapper(orig, self, *args):
-            sparsematch = self._sparsematcher
-            if not sparsematch.always():
-                for f in args:
-                    if (f is not None and not sparsematch(f) and
-                        f not in self):
-                        raise error.Abort(_("cannot add '%s' - it is outside "
-                                            "the sparse checkout") % f,
-                                          hint=hint)
-            return orig(self, *args)
-        extensions.wrapfunction(dirstate.dirstate, func, _wrapper)
-
 @command('^debugsparse', [
     ('I', 'include', False, _('include files in the sparse checkout')),
     ('X', 'exclude', False, _('exclude files in the sparse checkout')),
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -68,6 +68,10 @@  def nonnormalentries(dmap):
                 otherparent.add(fname)
         return nonnorm, otherparent
 
+ADD_SPARSE_HINT = _("include file with 'hg debugsparse --include <pattern>' "
+                    "or use 'hg add --sparse <file>' to include file "
+                    "directory while adding")
+
 class dirstate(object):
 
     def __init__(self, opener, ui, root, validate, sparsematchfn):
@@ -273,6 +277,18 @@  class dirstate(object):
         # it's safe because f is always a relative path
         return self._rootdir + f
 
+    def _ensuresparsematch(self, f):
+        """Verifies a path is part of the sparse config and aborts if not."""
+        matcher = self._sparsematcher
+
+        if matcher.always():
+            return
+
+        if not matcher(f) and f not in self._map:
+            raise error.Abort(_("cannot add '%s' because it is outside the "
+                                "sparse checkout") % f,
+                              hint=ADD_SPARSE_HINT)
+
     def flagfunc(self, buildfallback):
         if self._checklink and self._checkexec:
             def f(x):
@@ -514,6 +530,11 @@  class dirstate(object):
         """Mark dest as a copy of source. Unmark dest if source is None."""
         if source == dest:
             return
+
+        if source is not None:
+            self._ensuresparsematch(source)
+        self._ensuresparsematch(dest)
+
         self._dirty = True
         if source is not None:
             self._copymap[dest] = source
@@ -565,6 +586,7 @@  class dirstate(object):
 
     def normal(self, f):
         '''Mark a file normal and clean.'''
+        self._ensuresparsematch(f)
         s = os.lstat(self._join(f))
         mtime = s.st_mtime
         self._addpath(f, 'n', s.st_mode,
@@ -581,6 +603,8 @@  class dirstate(object):
 
     def normallookup(self, f):
         '''Mark a file normal, but possibly dirty.'''
+        self._ensuresparsematch(f)
+
         if self._pl[1] != nullid and f in self._map:
             # if there is a merge going on and the file was either
             # in state 'm' (-1) or coming from other parent (-2) before
@@ -620,12 +644,14 @@  class dirstate(object):
 
     def add(self, f):
         '''Mark a file added.'''
+        self._ensuresparsematch(f)
         self._addpath(f, 'a', 0, -1, -1)
         if f in self._copymap:
             del self._copymap[f]
 
     def remove(self, f):
         '''Mark a file removed.'''
+        self._ensuresparsematch(f)
         self._dirty = True
         self._droppath(f)
         size = 0
@@ -644,6 +670,8 @@  class dirstate(object):
 
     def merge(self, f):
         '''Mark a file merged.'''
+        self._ensuresparsematch(f)
+
         if self._pl[1] == nullid:
             return self.normallookup(f)
         return self.otherparent(f)
diff --git a/tests/test-sparse.t b/tests/test-sparse.t
--- a/tests/test-sparse.t
+++ b/tests/test-sparse.t
@@ -98,8 +98,8 @@  Verify status only shows included files
 Adding an excluded file should fail
 
   $ hg add hide3
-  abort: cannot add 'hide3' - it is outside the sparse checkout
-  (include file with `hg debugsparse --include <pattern>` or use `hg add -s <file>` to include file directory while adding)
+  abort: cannot add 'hide3' because it is outside the sparse checkout
+  (include file with 'hg debugsparse --include <pattern>' or use 'hg add --sparse <file>' to include file directory while adding)
   [255]
 
 Verify deleting sparseness while a file has changes fails
@@ -265,8 +265,8 @@  Test that add -s adds dirs to sparse pro
   $ touch add/foo
   $ touch add/bar
   $ hg add add/foo
-  abort: cannot add 'add/foo' - it is outside the sparse checkout
-  (include file with `hg debugsparse --include <pattern>` or use `hg add -s <file>` to include file directory while adding)
+  abort: cannot add 'add/foo' because it is outside the sparse checkout
+  (include file with 'hg debugsparse --include <pattern>' or use 'hg add --sparse <file>' to include file directory while adding)
   [255]
   $ hg add -s add/foo
   $ hg st