Patchwork [3,of,7] vfs: create copy at renaming to avoid file stat ambiguity if needed

login
register
mail settings
Submitter Katsunori FUJIWARA
Date June 9, 2017, 4:08 a.m.
Message ID <650b77396c6ea684d7ff.1496981303@speaknoevil>
Download mbox | patch
Permalink /patch/21264/
State Accepted
Headers show

Comments

Katsunori FUJIWARA - June 9, 2017, 4:08 a.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1496980698 -32400
#      Fri Jun 09 12:58:18 2017 +0900
# Node ID 650b77396c6ea684d7ffca6c5e0921482eaffd49
# Parent  5ae5c4d392374e41d489d87a9d7c1a85920e0702
vfs: create copy at renaming to avoid file stat ambiguity if needed

In order to fix issue5418, bff5ccbe5ead made vfs.rename(checkambig=True)
omit advancing mtime of renamed file, if renamed file is owned by
another (EPERM is raised in this case).

But this omission causes rewinding mtime at restoration in such
situation, and makes avoiding file stat ambiguity difficult, because
ExactCacheValidationPlan assumes that mtime should be advanced, if a
file is changed in same ctime.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan

Ambiguity of file stat also requires issue5584 to be fixed with other
than file stat, but "hash of file", "generation ID" and so on were
already rejected ideas (please see original RFC linked from "Outline
of issue" in ExactCacheValidationPlan page).

This omission occurs:

  - only for non append-only files (dirstate, bookmarks, and phaseroots), and
  - only if previous transaction is rollbacked by another user

The latter means "sharing a repository clone via group permission".
This is reasonable usecase, but not ordinary for many users, IMHO.
"hg rollback" itself has been deprecated since Mercurial 2.7, too.

Therefore, increasing the cost at rollbacking previous transaction
executed by another a little seems reasonable, for avoidance of file
stat ambiguity.

This patch does:

  - create copy of (already renamed) source file, if advancing mtime
    fails for EPERM
  - rename from copied file to destination file, and
  - advance mtime of renamed file, which is now owned by current user

This patch also factors "self.join(src)" out to reduce redundancy.
Yuya Nishihara - June 11, 2017, 3:23 a.m.
On Fri, 09 Jun 2017 13:08:23 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1496980698 -32400
> #      Fri Jun 09 12:58:18 2017 +0900
> # Node ID 650b77396c6ea684d7ffca6c5e0921482eaffd49
> # Parent  5ae5c4d392374e41d489d87a9d7c1a85920e0702
> vfs: create copy at renaming to avoid file stat ambiguity if needed

> diff --git a/mercurial/vfs.py b/mercurial/vfs.py
> --- a/mercurial/vfs.py
> +++ b/mercurial/vfs.py
> @@ -174,6 +174,7 @@ class abstractvfs(object):
>          only if destination file is guarded by any lock
>          (e.g. repo.lock or repo.wlock).
>          """
> +        srcpath = self.join(src)
>          dstpath = self.join(dst)
>          oldstat = checkambig and util.filestat(dstpath)
>          if oldstat and oldstat.stat:
> @@ -184,9 +185,13 @@ class abstractvfs(object):
>                      # stat of renamed file is ambiguous to original one
>                      return ret, newstat.avoidambig(dpath, oldstat)
>                  return ret, True
> -            ret, avoided = dorename(self.join(src), dstpath)
> +            ret, avoided = dorename(srcpath, dstpath)
> +            if not avoided:
> +                # simply copy to change owner of srcpath (see issue5418)
> +                util.copyfile(dstpath, srcpath)

Is there any chance that the srcpath, which was freed by the previous rename,
could be reused by another process? I'm just asking because I don't know it's
possible in practice.
Katsunori FUJIWARA - June 11, 2017, 9:51 p.m.
At Sun, 11 Jun 2017 12:23:34 +0900,
Yuya Nishihara wrote:
> 
> On Fri, 09 Jun 2017 13:08:23 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1496980698 -32400
> > #      Fri Jun 09 12:58:18 2017 +0900
> > # Node ID 650b77396c6ea684d7ffca6c5e0921482eaffd49
> > # Parent  5ae5c4d392374e41d489d87a9d7c1a85920e0702
> > vfs: create copy at renaming to avoid file stat ambiguity if needed
> 
> > diff --git a/mercurial/vfs.py b/mercurial/vfs.py
> > --- a/mercurial/vfs.py
> > +++ b/mercurial/vfs.py
> > @@ -174,6 +174,7 @@ class abstractvfs(object):
> >          only if destination file is guarded by any lock
> >          (e.g. repo.lock or repo.wlock).
> >          """
> > +        srcpath = self.join(src)
> >          dstpath = self.join(dst)
> >          oldstat = checkambig and util.filestat(dstpath)
> >          if oldstat and oldstat.stat:
> > @@ -184,9 +185,13 @@ class abstractvfs(object):
> >                      # stat of renamed file is ambiguous to original one
> >                      return ret, newstat.avoidambig(dpath, oldstat)
> >                  return ret, True
> > -            ret, avoided = dorename(self.join(src), dstpath)
> > +            ret, avoided = dorename(srcpath, dstpath)
> > +            if not avoided:
> > +                # simply copy to change owner of srcpath (see issue5418)
> > +                util.copyfile(dstpath, srcpath)
> 
> Is there any chance that the srcpath, which was freed by the previous rename,
> could be reused by another process? I'm just asking because I don't know it's
> possible in practice.

vfs.rename(checkambig=True) is invoked only by:

  - changelog._finalize(), inside (s)lock
  - localrepo._rollback() for bookmarks and phaseroots, inside (s)lock
  - dirstate.restorebackup(), inside wlock

In these cases, "src" of renaming can't be created without acquisition
of corresponded lock.

Therefore, there is no chance of reusing original "src" by another
process, in practice.

But should I revise current docstring below with "only if both source
and destination files are guarded ...." or so, for future readers ?

        checkambig argument is used with util.filestat, and is useful
        only if destination file is guarded by any lock
        (e.g. repo.lock or repo.wlock).
Yuya Nishihara - June 12, 2017, 12:21 p.m.
On Mon, 12 Jun 2017 06:51:30 +0900, FUJIWARA Katsunori wrote:
> At Sun, 11 Jun 2017 12:23:34 +0900,
> Yuya Nishihara wrote:
> > 
> > On Fri, 09 Jun 2017 13:08:23 +0900, FUJIWARA Katsunori wrote:
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > > # Date 1496980698 -32400
> > > #      Fri Jun 09 12:58:18 2017 +0900
> > > # Node ID 650b77396c6ea684d7ffca6c5e0921482eaffd49
> > > # Parent  5ae5c4d392374e41d489d87a9d7c1a85920e0702
> > > vfs: create copy at renaming to avoid file stat ambiguity if needed
> > 
> > > diff --git a/mercurial/vfs.py b/mercurial/vfs.py
> > > --- a/mercurial/vfs.py
> > > +++ b/mercurial/vfs.py
> > > @@ -174,6 +174,7 @@ class abstractvfs(object):
> > >          only if destination file is guarded by any lock
> > >          (e.g. repo.lock or repo.wlock).
> > >          """
> > > +        srcpath = self.join(src)
> > >          dstpath = self.join(dst)
> > >          oldstat = checkambig and util.filestat(dstpath)
> > >          if oldstat and oldstat.stat:
> > > @@ -184,9 +185,13 @@ class abstractvfs(object):
> > >                      # stat of renamed file is ambiguous to original one
> > >                      return ret, newstat.avoidambig(dpath, oldstat)
> > >                  return ret, True
> > > -            ret, avoided = dorename(self.join(src), dstpath)
> > > +            ret, avoided = dorename(srcpath, dstpath)
> > > +            if not avoided:
> > > +                # simply copy to change owner of srcpath (see issue5418)
> > > +                util.copyfile(dstpath, srcpath)
> > 
> > Is there any chance that the srcpath, which was freed by the previous rename,
> > could be reused by another process? I'm just asking because I don't know it's
> > possible in practice.
> 
> vfs.rename(checkambig=True) is invoked only by:
> 
>   - changelog._finalize(), inside (s)lock
>   - localrepo._rollback() for bookmarks and phaseroots, inside (s)lock
>   - dirstate.restorebackup(), inside wlock
> 
> In these cases, "src" of renaming can't be created without acquisition
> of corresponded lock.
> 
> Therefore, there is no chance of reusing original "src" by another
> process, in practice.

Thanks, that made me feel safer.

> But should I revise current docstring below with "only if both source
> and destination files are guarded ...." or so, for future readers ?
> 
>         checkambig argument is used with util.filestat, and is useful
>         only if destination file is guarded by any lock
>         (e.g. repo.lock or repo.wlock).

Or maybe we could turn it to safer side by using tempfile.mkstemp() ?

Patch

diff --git a/mercurial/vfs.py b/mercurial/vfs.py
--- a/mercurial/vfs.py
+++ b/mercurial/vfs.py
@@ -174,6 +174,7 @@  class abstractvfs(object):
         only if destination file is guarded by any lock
         (e.g. repo.lock or repo.wlock).
         """
+        srcpath = self.join(src)
         dstpath = self.join(dst)
         oldstat = checkambig and util.filestat(dstpath)
         if oldstat and oldstat.stat:
@@ -184,9 +185,13 @@  class abstractvfs(object):
                     # stat of renamed file is ambiguous to original one
                     return ret, newstat.avoidambig(dpath, oldstat)
                 return ret, True
-            ret, avoided = dorename(self.join(src), dstpath)
+            ret, avoided = dorename(srcpath, dstpath)
+            if not avoided:
+                # simply copy to change owner of srcpath (see issue5418)
+                util.copyfile(dstpath, srcpath)
+                ret, avoided = dorename(srcpath, dstpath)
             return ret
-        return util.rename(self.join(src), dstpath)
+        return util.rename(srcpath, dstpath)
 
     def readlink(self, path):
         return os.readlink(self.join(path))