Patchwork [2,of,3,V2] hardlink: duplicate hardlink detection for copying files and directories

login
register
mail settings
Submitter Jun Wu
Date March 29, 2017, 7:42 p.m.
Message ID <b4e6f395c7940676c56b.1490816537@x1c>
Download mbox | patch
Permalink /patch/19824/
State Accepted
Headers show

Comments

Jun Wu - March 29, 2017, 7:42 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1490815606 25200
#      Wed Mar 29 12:26:46 2017 -0700
# Node ID b4e6f395c7940676c56b6f5308e203aa861d8bbe
# Parent  b1ef68e4196e01f723b78746d752f60e46e33cc0
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r b4e6f395c794
hardlink: duplicate hardlink detection for copying files and directories

A later patch will change one of them so they diverge.
via Mercurial-devel - April 3, 2017, 6:20 p.m.
On Wed, Mar 29, 2017 at 12:42 PM, Jun Wu <quark@fb.com> wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1490815606 25200
> #      Wed Mar 29 12:26:46 2017 -0700
> # Node ID b4e6f395c7940676c56b6f5308e203aa861d8bbe
> # Parent  b1ef68e4196e01f723b78746d752f60e46e33cc0
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r b4e6f395c794
> hardlink: duplicate hardlink detection for copying files and directories
>
> A later patch will change one of them so they diverge.
>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -1127,12 +1127,11 @@ def copyfiles(src, dst, hardlink=None, p
>      num = 0
>
> -    if hardlink is None:
> -        hardlink = (os.stat(src).st_dev ==
> -                    os.stat(os.path.dirname(dst)).st_dev)
> -
>      gettopic = lambda: hardlink and _('linking') or _('copying')
> -    topic = gettopic()
>
>      if os.path.isdir(src):
> +        if hardlink is None:
> +            hardlink = (os.stat(src).st_dev ==
> +                        os.stat(os.path.dirname(dst)).st_dev)

The next patch changes the condition below. It seems to make sense to
also drop the os.path.dirname() call here, no? I would have just sent
a patch for it if I trusted I could test it quickly.

> +        topic = gettopic()
>          os.mkdir(dst)
>          for name, kind in osutil.listdir(src):
> @@ -1145,4 +1144,9 @@ def copyfiles(src, dst, hardlink=None, p
>              num += n
>      else:
> +        if hardlink is None:
> +            hardlink = (os.stat(src).st_dev ==
> +                        os.stat(os.path.dirname(dst)).st_dev)
> +        topic = gettopic()
> +
>          if hardlink:
>              try:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Jun Wu - April 3, 2017, 6:40 p.m.
Excerpts from Martin von Zweigbergk's message of 2017-04-03 11:20:40 -0700:
> On Wed, Mar 29, 2017 at 12:42 PM, Jun Wu <quark@fb.com> wrote:
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1490815606 25200
> > #      Wed Mar 29 12:26:46 2017 -0700
> > # Node ID b4e6f395c7940676c56b6f5308e203aa861d8bbe
> > # Parent  b1ef68e4196e01f723b78746d752f60e46e33cc0
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r b4e6f395c794
> > hardlink: duplicate hardlink detection for copying files and directories
> >
> > A later patch will change one of them so they diverge.
> >
> > diff --git a/mercurial/util.py b/mercurial/util.py
> > --- a/mercurial/util.py
> > +++ b/mercurial/util.py
> > @@ -1127,12 +1127,11 @@ def copyfiles(src, dst, hardlink=None, p
> >      num = 0
> >
> > -    if hardlink is None:
> > -        hardlink = (os.stat(src).st_dev ==
> > -                    os.stat(os.path.dirname(dst)).st_dev)
> > -
> >      gettopic = lambda: hardlink and _('linking') or _('copying')
> > -    topic = gettopic()
> >
> >      if os.path.isdir(src):
> > +        if hardlink is None:
> > +            hardlink = (os.stat(src).st_dev ==
> > +                        os.stat(os.path.dirname(dst)).st_dev)
> 
> The next patch changes the condition below. It seems to make sense to
> also drop the os.path.dirname() call here, no? I would have just sent
> a patch for it if I trusted I could test it quickly.

I think the most sane behavior is to not test "hardlink" (i.e. leave
"hardlink = None") when copying a directory. Because filesystems usually do
not support hardlink a directory, and even if it supports, it may cause
unwanted effects later.

> 
> > +        topic = gettopic()
> >          os.mkdir(dst)
> >          for name, kind in osutil.listdir(src):
> > @@ -1145,4 +1144,9 @@ def copyfiles(src, dst, hardlink=None, p
> >              num += n
> >      else:
> > +        if hardlink is None:
> > +            hardlink = (os.stat(src).st_dev ==
> > +                        os.stat(os.path.dirname(dst)).st_dev)
> > +        topic = gettopic()
> > +
> >          if hardlink:
> >              try:
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
via Mercurial-devel - April 3, 2017, 6:41 p.m.
Sounds good. Will you send a patch?

On Mon, Apr 3, 2017, 11:40 Jun Wu <quark@fb.com> wrote:

> Excerpts from Martin von Zweigbergk's message of 2017-04-03 11:20:40 -0700:
> > On Wed, Mar 29, 2017 at 12:42 PM, Jun Wu <quark@fb.com> wrote:
> > > # HG changeset patch
> > > # User Jun Wu <quark@fb.com>
> > > # Date 1490815606 25200
> > > #      Wed Mar 29 12:26:46 2017 -0700
> > > # Node ID b4e6f395c7940676c56b6f5308e203aa861d8bbe
> > > # Parent  b1ef68e4196e01f723b78746d752f60e46e33cc0
> > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r
> b4e6f395c794
> > > hardlink: duplicate hardlink detection for copying files and
> directories
> > >
> > > A later patch will change one of them so they diverge.
> > >
> > > diff --git a/mercurial/util.py b/mercurial/util.py
> > > --- a/mercurial/util.py
> > > +++ b/mercurial/util.py
> > > @@ -1127,12 +1127,11 @@ def copyfiles(src, dst, hardlink=None, p
> > >      num = 0
> > >
> > > -    if hardlink is None:
> > > -        hardlink = (os.stat(src).st_dev ==
> > > -                    os.stat(os.path.dirname(dst)).st_dev)
> > > -
> > >      gettopic = lambda: hardlink and _('linking') or _('copying')
> > > -    topic = gettopic()
> > >
> > >      if os.path.isdir(src):
> > > +        if hardlink is None:
> > > +            hardlink = (os.stat(src).st_dev ==
> > > +                        os.stat(os.path.dirname(dst)).st_dev)
> >
> > The next patch changes the condition below. It seems to make sense to
> > also drop the os.path.dirname() call here, no? I would have just sent
> > a patch for it if I trusted I could test it quickly.
>
> I think the most sane behavior is to not test "hardlink" (i.e. leave
> "hardlink = None") when copying a directory. Because filesystems usually do
> not support hardlink a directory, and even if it supports, it may cause
> unwanted effects later.
>
> >
> > > +        topic = gettopic()
> > >          os.mkdir(dst)
> > >          for name, kind in osutil.listdir(src):
> > > @@ -1145,4 +1144,9 @@ def copyfiles(src, dst, hardlink=None, p
> > >              num += n
> > >      else:
> > > +        if hardlink is None:
> > > +            hardlink = (os.stat(src).st_dev ==
> > > +                        os.stat(os.path.dirname(dst)).st_dev)
> > > +        topic = gettopic()
> > > +
> > >          if hardlink:
> > >              try:
> > > _______________________________________________
> > > Mercurial-devel mailing list
> > > Mercurial-devel@mercurial-scm.org
> > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Jun Wu - April 3, 2017, 8:20 p.m.
Excerpts from Martin von Zweigbergk's message of 2017-04-03 18:41:48 +0000:
> Sounds good. Will you send a patch?

I actually tried that before the current fix. But it's not that trivial
because of interaction with "progess", and "hardlink".

I think some API change is needed to fix it cleanly. I will think about it
again and send the fix if API change is simple.
Yuya Nishihara - April 4, 2017, 2:51 p.m.
On Mon, 3 Apr 2017 13:20:45 -0700, Jun Wu wrote:
> Excerpts from Martin von Zweigbergk's message of 2017-04-03 18:41:48 +0000:
> > Sounds good. Will you send a patch?
> 
> I actually tried that before the current fix. But it's not that trivial
> because of interaction with "progess", and "hardlink".
> 
> I think some API change is needed to fix it cleanly. I will think about it
> again and send the fix if API change is simple.

Last time I read that code, I felt it could be simpler if we'd stop using
recursion. I haven't tried though.

Patch

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -1127,12 +1127,11 @@  def copyfiles(src, dst, hardlink=None, p
     num = 0
 
-    if hardlink is None:
-        hardlink = (os.stat(src).st_dev ==
-                    os.stat(os.path.dirname(dst)).st_dev)
-
     gettopic = lambda: hardlink and _('linking') or _('copying')
-    topic = gettopic()
 
     if os.path.isdir(src):
+        if hardlink is None:
+            hardlink = (os.stat(src).st_dev ==
+                        os.stat(os.path.dirname(dst)).st_dev)
+        topic = gettopic()
         os.mkdir(dst)
         for name, kind in osutil.listdir(src):
@@ -1145,4 +1144,9 @@  def copyfiles(src, dst, hardlink=None, p
             num += n
     else:
+        if hardlink is None:
+            hardlink = (os.stat(src).st_dev ==
+                        os.stat(os.path.dirname(dst)).st_dev)
+        topic = gettopic()
+
         if hardlink:
             try: