Patchwork convert: config option for git rename limit

login
register
mail settings
Submitter Gregory Szorc
Date Dec. 18, 2016, 8:54 p.m.
Message ID <34f99a6983df082fc56f.1482094447@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/17955/
State Accepted
Headers show

Comments

Gregory Szorc - Dec. 18, 2016, 8:54 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1482094400 28800
#      Sun Dec 18 12:53:20 2016 -0800
# Node ID 34f99a6983df082fc56f50e8aab9dd5f1bbe6676
# Parent  935092e525b0ee5656d0830162a1c2adf8248de3
convert: config option for git rename limit

By default, Git applies rename and copy detection to 400 files. The
diff.renamelimit config option and -l argument to diff commands can
override this.

As part of converting some repositories in the wild, I was hitting
the default limit. Unfortunately, the warnings that Git prints in this
scenario are swallowed because the process running functionality in
common.py redirects stderr to /dev/null by default. This seems like
a bug, but a bug for another day.

This commit establishes a config option to send the rename limit
through to `git diff-tree`. The added tests demonstrate a too-low
rename limit doesn't result in copy metadata being recorded.
Pierre-Yves David - Dec. 21, 2016, 2:31 p.m.
On 12/18/2016 09:54 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1482094400 28800
> #      Sun Dec 18 12:53:20 2016 -0800
> # Node ID 34f99a6983df082fc56f50e8aab9dd5f1bbe6676
> # Parent  935092e525b0ee5656d0830162a1c2adf8248de3
> convert: config option for git rename limit
>
> By default, Git applies rename and copy detection to 400 files. The
> diff.renamelimit config option and -l argument to diff commands can
> override this.

Looks reasonable, pushed.

Cheers,
Danek Duvall - Dec. 23, 2016, 8:08 p.m.
I got this in my test run last night:

    --- .../tests/test-convert-git.t
    +++ .../tests/test-convert-git.t.err
    @@ -397,7 +397,7 @@
       A bar-copy0
	 bar
       A bar-copy1
    -    bar
    +    bar-copied2

Haven't tracked it down yet, but I thought I'd make a note in case it was
obvious to you.  My git version is 2.7.4, if that's relevant here.

Thanks,
Danek
Gregory Szorc - Dec. 23, 2016, 8:42 p.m.
Yeah, I'm seeing something similar with Git 2.11 on OS X. Strangely, I
don't think I saw this on my Linux machine.

On Fri, Dec 23, 2016 at 1:08 PM, Danek Duvall <danek.duvall@oracle.com>
wrote:

> I got this in my test run last night:
>
>     --- .../tests/test-convert-git.t
>     +++ .../tests/test-convert-git.t.err
>     @@ -397,7 +397,7 @@
>        A bar-copy0
>          bar
>        A bar-copy1
>     -    bar
>     +    bar-copied2
>
> Haven't tracked it down yet, but I thought I'd make a note in case it was
> obvious to you.  My git version is 2.7.4, if that's relevant here.
>
> Thanks,
> Danek
>
Pierre-Yves David - Dec. 23, 2016, 10:34 p.m.
On 12/23/2016 09:42 PM, Gregory Szorc wrote:
> Yeah, I'm seeing something similar with Git 2.11 on OS X. Strangely, I
> don't think I saw this on my Linux machine.
>
> On Fri, Dec 23, 2016 at 1:08 PM, Danek Duvall <danek.duvall@oracle.com
> <mailto:danek.duvall@oracle.com>> wrote:
>
>     I got this in my test run last night:
>
>         --- .../tests/test-convert-git.t
>         +++ .../tests/test-convert-git.t.err
>         @@ -397,7 +397,7 @@
>            A bar-copy0
>              bar
>            A bar-copy1
>         -    bar
>         +    bar-copied2
>
>     Haven't tracked it down yet, but I thought I'd make a note in case
>     it was
>     obvious to you.  My git version is 2.7.4, if that's relevant here.
>
>     Thanks,
>     Danek


Should we drop that patch in the mean time ?

Cheers,
Gregory Szorc - Dec. 24, 2016, 5:47 p.m.
On Fri, Dec 23, 2016 at 3:34 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 12/23/2016 09:42 PM, Gregory Szorc wrote:
>
>> Yeah, I'm seeing something similar with Git 2.11 on OS X. Strangely, I
>> don't think I saw this on my Linux machine.
>>
>> On Fri, Dec 23, 2016 at 1:08 PM, Danek Duvall <danek.duvall@oracle.com
>> <mailto:danek.duvall@oracle.com>> wrote:
>>
>>     I got this in my test run last night:
>>
>>         --- .../tests/test-convert-git.t
>>         +++ .../tests/test-convert-git.t.err
>>         @@ -397,7 +397,7 @@
>>            A bar-copy0
>>              bar
>>            A bar-copy1
>>         -    bar
>>         +    bar-copied2
>>
>>     Haven't tracked it down yet, but I thought I'd make a note in case
>>     it was
>>     obvious to you.  My git version is 2.7.4, if that's relevant here.
>>
>>     Thanks,
>>     Danek
>>
>
>
> Should we drop that patch in the mean time ?
>

I just patchbombed a fix.

Patch

diff --git a/hgext/convert/__init__.py b/hgext/convert/__init__.py
--- a/hgext/convert/__init__.py
+++ b/hgext/convert/__init__.py
@@ -320,6 +320,13 @@  def convert(ui, src, dest=None, revmapfi
         is very expensive for large projects, and is only effective when
         ``convert.git.similarity`` is greater than 0. The default is False.
 
+    :convert.git.renamelimit: perform rename and copy detection up to this
+        many changed files in a commit. Increasing this will make rename
+        and copy detection more accurate but will significantly slow down
+        computation on large projects. The option is only relevant if
+        ``convert.git.similarity`` is greater than 0. The default is
+        ``400``.
+
     :convert.git.remoteprefix: remote refs are converted as bookmarks with
         ``convert.git.remoteprefix`` as a prefix followed by a /. The default
         is 'remote'.
diff --git a/hgext/convert/git.py b/hgext/convert/git.py
--- a/hgext/convert/git.py
+++ b/hgext/convert/git.py
@@ -78,6 +78,10 @@  class convert_git(common.converter_sourc
                                              False)
             if findcopiesharder:
                 self.simopt.append('--find-copies-harder')
+
+            renamelimit = ui.configint('convert', 'git.renamelimit',
+                                       default=400)
+            self.simopt.append('-l%d' % renamelimit)
         else:
             self.simopt = []
 
diff --git a/tests/test-convert-git.t b/tests/test-convert-git.t
--- a/tests/test-convert-git.t
+++ b/tests/test-convert-git.t
@@ -374,6 +374,31 @@  source, the copy source took the content
   A bar-copied2
     bar
 
+renamelimit config option works
+
+  $ cd git-repo2
+  $ cp bar bar-copy0
+  $ echo 0 >> bar-copy0
+  $ cp bar bar-copy1
+  $ echo 1 >> bar-copy1
+  $ git add bar-copy0 bar-copy1
+  $ commit -a -m 'copy bar 2 times'
+  $ cd ..
+
+  $ hg -q convert --config convert.git.renamelimit=1 \
+  > --config convert.git.findcopiesharder=true --datesort git-repo2 fullrepo2
+  $ hg -R fullrepo2 status -C --change master
+  A bar-copy0
+  A bar-copy1
+
+  $ hg -q convert --config convert.git.renamelimit=100 \
+  > --config convert.git.findcopiesharder=true --datesort git-repo2 fullrepo3
+  $ hg -R fullrepo3 status -C --change master
+  A bar-copy0
+    bar
+  A bar-copy1
+    bar
+
 test binary conversion (issue1359)
 
   $ count=19
diff --git a/tests/test-convert.t b/tests/test-convert.t
--- a/tests/test-convert.t
+++ b/tests/test-convert.t
@@ -261,6 +261,13 @@ 
                     for large projects, and is only effective when
                     "convert.git.similarity" is greater than 0. The default is
                     False.
+      convert.git.renamelimit
+                    perform rename and copy detection up to this many changed
+                    files in a commit. Increasing this will make rename and copy
+                    detection more accurate but will significantly slow down
+                    computation on large projects. The option is only relevant
+                    if "convert.git.similarity" is greater than 0. The default
+                    is "400".
       convert.git.remoteprefix
                     remote refs are converted as bookmarks with
                     "convert.git.remoteprefix" as a prefix followed by a /. The