Patchwork automv: use 95 as the default similarity threshold

Submitter Martijn Pieters
Date Feb. 16, 2016, 4:20 p.m.
Message ID <8a113a93013288f1aacb.1455639634@mjpieters-mbp>
Permalink /patch/13229/
State Accepted

Comments

Martijn Pieters - Feb. 16, 2016, 4:20 p.m.
# HG changeset patch
# User Martijn Pieters <mjpieters@fb.com>
# Date 1455638312 0
#      Tue Feb 16 15:58:32 2016 +0000
# Node ID 8a113a93013288f1aacb0016ec93e61b8ff3ed08
# Parent  c2e526103b29a5091cf139da302b99ee674957f2
automv: use 95 as the default similarity threshold.

The motivation for the change from 100 to 95 is included in a comment.

* Updated the tests to include a change to a moved file that still should be
  caught as a move.

* Use ui.configint() to handle non-integer configuration entries more
  gracefully. Also complain if a similarity outside of the acceptable range
  is set.
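
The validation described in the last bullet can be sketched outside of Mercurial as follows. This is only an illustration: `parse_similarity` and `DEFAULT_SIMILARITY` are hypothetical names, and `ValueError` stands in for the `error.Abort` that the patch raises. The patch itself relies on `ui.configint()` for the integer parsing.

```python
# Hypothetical stand-alone sketch; not part of the patch.
DEFAULT_SIMILARITY = 95


def parse_similarity(raw, default=DEFAULT_SIMILARITY):
    """Return an integer percentage in [0, 100] or raise ValueError.

    ``raw`` is the configured value as a string, or None when unset.
    """
    if raw is None:
        # Option not configured: fall back to the default.
        return default
    try:
        threshold = int(raw)
    except ValueError:
        raise ValueError(
            'automv.similarity is not a valid integer: %r' % (raw,))
    if not 0 <= threshold <= 100:
        # Same message as the abort added by the patch.
        raise ValueError('automv.similarity must be between 0 and 100')
    return threshold
```

With this sketch, `parse_similarity(None)` falls back to 95, while `parse_similarity('110')` is rejected with the same message the new test expects.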
Augie Fackler - Feb. 22, 2016, 9:20 p.m.
On Tue, Feb 16, 2016 at 04:20:34PM +0000, Martijn Pieters wrote:
> # HG changeset patch
> # User Martijn Pieters <mjpieters@fb.com>
> # Date 1455638312 0
> #      Tue Feb 16 15:58:32 2016 +0000
> # Node ID 8a113a93013288f1aacb0016ec93e61b8ff3ed08
> # Parent  c2e526103b29a5091cf139da302b99ee674957f2
> automv: use 95 as the default similarity threshold.

queued this, thanks


Patch

diff --git a/hgext/automv.py b/hgext/automv.py
--- a/hgext/automv.py
+++ b/hgext/automv.py
@@ -11,14 +11,25 @@ 
 
 The threshold at which a file is considered a move can be set with the
 ``automv.similarity`` config option. This option takes a percentage between 0
-(disabled) and 100 (files must be identical), the default is 100.
+(disabled) and 100 (files must be identical), the default is 95.
 
 """
+
+# Using 95 as a default similarity is based on an analysis of the mercurial
+# repositories of the cpython, mozilla-central & mercurial repositories, as
+# well as 2 very large facebook repositories. At 95 50% of all potential
+# missed moves would be caught, as well as correspond with 87% of all
+# explicitly marked moves.  Together, 80% of moved files are 95% similar or
+# more.
+#
+# See http://markmail.org/thread/5pxnljesvufvom57 for context.
+
 from __future__ import absolute_import
 
 from mercurial import (
     commands,
     copies,
+    error,
     extensions,
     scmutil,
     similar
@@ -37,7 +48,9 @@ 
     renames = None
     disabled = opts.pop('no_automv', False)
     if not disabled:
-        threshold = float(ui.config('automv', 'similarity', '100'))
+        threshold = ui.configint('automv', 'similarity', 95)
+        if not 0 <= threshold <= 100:
+            raise error.Abort(_('automv.similarity must be between 0 and 100'))
         if threshold > 0:
             match = scmutil.match(repo[None], pats, opts)
             added, removed = _interestingfiles(repo, match)
diff --git a/tests/test-automv.t b/tests/test-automv.t
--- a/tests/test-automv.t
+++ b/tests/test-automv.t
@@ -13,7 +13,7 @@ 
 
 Test automv command for commit
 
-  $ echo 'foo' > a.txt
+  $ printf 'foo\nbar\nbaz\n' > a.txt
   $ hg add a.txt
   $ hg commit -m 'init repo with a'
 
@@ -37,6 +37,24 @@ 
   $ mv a.txt b.txt
   $ hg rm a.txt
   $ hg add b.txt
+  $ printf '\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit -m 'msg'
+  detected move of 1 files
+  created new head
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
   $ printf '\nfoo\n' >> b.txt
   $ hg status -C
   A b.txt
@@ -161,6 +179,29 @@ 
   $ mv a.txt b.txt
   $ hg rm a.txt
   $ hg add b.txt
+  $ printf '\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended'
+  detected move of 1 files
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
   $ printf '\nfoo\n' >> b.txt
   $ hg status -C
   A b.txt
@@ -285,3 +326,13 @@ 
   $ hg status --change . -C
   A b.txt
   R a.txt
+
+error conditions
+
+  $ cat >> $HGRCPATH << EOF
+  > [automv]
+  > similarity=110
+  > EOF
+  $ hg commit -m 'revision to amend to'
+  abort: automv.similarity must be between 0 and 100
+  [255]
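
To see why the updated tests treat an appended blank line as a move but an appended `\nfoo\n` as a plain add/remove, the following rough sketch computes a line-based similarity percentage. It is an approximation under stated assumptions: `similarity_percent` is a hypothetical helper, `difflib` stands in for Mercurial's own diff machinery, and the score (twice the matched bytes over the combined length) is assumed here rather than taken from the patch.

```python
import difflib


def similarity_percent(old, new):
    """Approximate similarity of two texts as a percentage.

    Assumption: score = 2 * bytes in matching lines / (len(old) + len(new)).
    """
    if not old and not new:
        return 100.0
    old_lines = old.splitlines(True)
    new_lines = new.splitlines(True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines,
                                      autojunk=False)
    equal = 0
    for block in matcher.get_matching_blocks():
        # Count the length of every line inside a matching block.
        for line in old_lines[block.a:block.a + block.size]:
            equal += len(line)
    return 100.0 * 2 * equal / (len(old) + len(new))


original = 'foo\nbar\nbaz\n'          # contents of a.txt in the tests
print(similarity_percent(original, original + '\n'))       # 96.0
print(similarity_percent(original, original + '\nfoo\n'))  # about 82.8
```

Under these assumptions the first variant scores 96, just above the new default of 95, while the second falls well below it, which matches the behaviour the tests check.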