Patchwork automv: new experimental extension

login
register
mail settings
Submitter Martijn Pieters
Date Feb. 4, 2016, 7:47 p.m.
Message ID <b1878a8f4543825e3f40.1454615261@mjpieters-mbp>
Download mbox | patch
Permalink /patch/12985/
State Superseded
Delegated to: Augie Fackler
Headers show

Comments

Martijn Pieters - Feb. 4, 2016, 7:47 p.m.
# HG changeset patch
# User Martijn Pieters <mjpieters@fb.com>
# Date 1454615245 0
#      Thu Feb 04 19:47:25 2016 +0000
# Node ID b1878a8f4543825e3f4052dc54323c2a0dbc46f1
# Parent  c6d86560175c18e1e130e1ccebb89b7045096452
automv: new experimental extension

Automatically detect moves and record them at commit time.

This extension was originally developed at
https://bitbucket.org/facebook/hg-experimental
Augie Fackler - Feb. 5, 2016, 9:17 p.m.
On Thu, Feb 04, 2016 at 07:47:41PM +0000, Martijn Pieters wrote:
> # HG changeset patch
> # User Martijn Pieters <mjpieters@fb.com>
> # Date 1454615245 0
> #      Thu Feb 04 19:47:25 2016 +0000
> # Node ID b1878a8f4543825e3f4052dc54323c2a0dbc46f1
> # Parent  c6d86560175c18e1e130e1ccebb89b7045096452
> automv: new experimental extension
>
> Automatically detect moves and record them at commit time.
>
> This extension was originally developed at
> https://bitbucket.org/facebook/hg-experimental

Probably worth mentioning that 0.75 was chosen somewhat arbitrarily,
and that it might be worthwhile as a future endeavor to try and look
at real-world repo histories to figure out a better number?

>
> diff --git a/hgext/automv.py b/hgext/automv.py
> new file mode 100644
> --- /dev/null
> +++ b/hgext/automv.py
> @@ -0,0 +1,79 @@
> +# automv.py
> +#
> +# Copyright 2013-2016 Facebook, Inc.
> +#
> +# This software may be used and distributed according to the terms of the
> +# GNU General Public License version 2 or any later version.
> +"""Check for unrecorded moves at commit time (EXPERIMENTAL)
> +
> +This extension checks at commit/amend time if any of the committed files
> +comes from an unrecorded mv.
> +
> +The threshold at which a file is considered a move can be set with the
> +``automv.similarity`` config option; the default value is 0.75.
> +
> +"""
> +import mercurial.commands
> +import mercurial.copies
> +import mercurial.extensions
> +import mercurial.scmutil
> +import mercurial.similar

any reason not to do this as

from mercurial import (
  commands,
  copies,
  extensions,
  scmutil,
  similar,
  )

? That's the pattern we've been using most places, and I'm not at all
used to leaving the mercurial. on the module names.


[snip rest of implementation and test which looks fine]

> diff --git a/tests/test-check-py3-compat.t b/tests/test-check-py3-compat.t
> --- a/tests/test-check-py3-compat.t
> +++ b/tests/test-check-py3-compat.t
> @@ -33,6 +33,7 @@
>    doc/hgmanpage.py not using absolute_import
>    hgext/__init__.py not using absolute_import
>    hgext/acl.py not using absolute_import
> +  hgext/automv.py not using absolute_import

Can I convince you to conform to the absolute_import format so we are
one step closer to Python 3?

>    hgext/blackbox.py not using absolute_import
>    hgext/bugzilla.py not using absolute_import
>    hgext/censor.py not using absolute_import
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Martijn Pieters - Feb. 5, 2016, 10:11 p.m.
On 5 Feb 2016, at 21:17, Augie Fackler <raf@durin42.com> wrote:
> Probably worth mentioning that 0.75 was chosen somewhat arbitrarily,
> and that it might be worthwhile as a future endeavor to try and look
> at real-world repo histories to figure out a better number?

I'm actually thinking switching to 1.0 just like automv might be an idea here, just for consistency.

>> 
>> +import mercurial.commands
>> +import mercurial.copies
>> +import mercurial.extensions
>> +import mercurial.scmutil
>> +import mercurial.similar
> 
> any reason not to do this as
> 
> from mercurial import (
>  commands,
>  copies,
>  extensions,
>  scmutil,
>  similar,
>  )
> 
> ? That's the pattern we've been using most places, and I'm not at all
> used to leaving the mercurial. on the module names.

Because:

On Tue, Feb 2, 2016 at 10:15 PM, Matt Mackall <mpm@selenic.com> wrote:
> "_" is one of the few symbol-level imports we like, because of its ubiquity.
> Others should be avoided.

Matt asked me to, and me being new and not having too much of an opinion on it acquiesced. Then again, I may have misunderstood and his comment was purely about from mercurial.extensions import wrap command .

I also appreciate the irony that a Googler prefers the old style, which actually goes against the Google Python style guide ;-).

> [snip rest of implementation and test which looks fine]
> 
>> diff --git a/tests/test-check-py3-compat.t b/tests/test-check-py3-compat.t
>> --- a/tests/test-check-py3-compat.t
>> +++ b/tests/test-check-py3-compat.t
>> @@ -33,6 +33,7 @@
>>   doc/hgmanpage.py not using absolute_import
>>   hgext/__init__.py not using absolute_import
>>   hgext/acl.py not using absolute_import
>> +  hgext/automv.py not using absolute_import
> 
> Can I convince you to conform to the absolute_import format so we are
> one step closer to Python 3?

The test applies only to core mercurial, not extensions, which are 'external'. The imports *are* absolute.

--
Martijn Pieters
Augie Fackler - Feb. 5, 2016, 10:20 p.m.
On Fri, Feb 05, 2016 at 10:11:49PM +0000, Martijn Pieters wrote:
> On 5 Feb 2016, at 21:17, Augie Fackler <raf@durin42.com> wrote:
> > Probably worth mentioning that 0.75 was chosen somewhat arbitrarily,
> > and that it might be worthwhile as a future endeavor to try and look
> > at real-world repo histories to figure out a better number?
>
> I'm actually thinking switching to 1.0 just like automv might be an idea here, just for consistency.

Perhaps (I assume you meant addremove). We could always start with
1.0, and then try and do something smart using data from various repo
histories later to try and pick a more sensible default.

>
> >>
> >> +import mercurial.commands
> >> +import mercurial.copies
> >> +import mercurial.extensions
> >> +import mercurial.scmutil
> >> +import mercurial.similar
> >
> > any reason not to do this as
> >
> > from mercurial import (
> >  commands,
> >  copies,
> >  extensions,
> >  scmutil,
> >  similar,
> >  )
> >
> > ? That's the pattern we've been using most places, and I'm not at all
> > used to leaving the mercurial. on the module names.
>
> Because:
>
> On Tue, Feb 2, 2016 at 10:15 PM, Matt Mackall <mpm@selenic.com> wrote:
> > "_" is one of the few symbol-level imports we like, because of its ubiquity.
> > Others should be avoided.
>
> Matt asked me to, and me being new and not having too much of an opinion on it acquiesced. Then again, I may have misunderstood and his comment was purely about from mercurial.extensions import wrap command .

Yeah, import modules, but not functions. :)

>
> I also appreciate the irony that a Googler prefers the old style, which actually goes against the Google Python style guide ;-).

Actually, our current style guide allows "from thing import module",
where you always do the from ... import ... so that you end up with no
dots in your module names in the
code. (https://google.github.io/styleguide/pyguide.html#Imports)

I can't remember the last time I saw something like import x.y and
then referencing something as x.y.z.


>
> > [snip rest of implementation and test which looks fine]
> >
> >> diff --git a/tests/test-check-py3-compat.t b/tests/test-check-py3-compat.t
> >> --- a/tests/test-check-py3-compat.t
> >> +++ b/tests/test-check-py3-compat.t
> >> @@ -33,6 +33,7 @@
> >>   doc/hgmanpage.py not using absolute_import
> >>   hgext/__init__.py not using absolute_import
> >>   hgext/acl.py not using absolute_import
> >> +  hgext/automv.py not using absolute_import
> >
> > Can I convince you to conform to the absolute_import format so we are
> > one step closer to Python 3?
>
> The test applies only to core mercurial, not extensions, which are 'external'. The imports *are* absolute.

Right, but you could add from __future__ import absolute_import and
suppress the warning ;)

>
> --
> Martijn Pieters
>
Martijn Pieters - Feb. 8, 2016, 1:56 p.m.
On Fri, Feb 5, 2016 at 10:20 PM, Augie Fackler <raf@durin42.com> wrote:
>> I'm actually thinking switching to 1.0 just like automv might be an idea here, just for consistency.
>
> Perhaps (I assume you meant addremove). We could always start with
> 1.0, and then try and do something smart using data from various repo
> histories later to try and pick a more sensible default.

Sounds like a sensible approach. I'll look into seeing what kind of
data we can glean from ours.

> Yeah, import modules, but not functions. :)

So obvious now.

> Actually, our current style guide allows "from thing import module",
> where you always do the from ... import ... so that you end up with no
> dots in your module names in the
> code. (https://google.github.io/styleguide/pyguide.html#Imports)
>
> I can't remember the last time I saw something like import x.y and
> then referencing something as x.y.z.

It was a while ago; code that was for a project acquired by Google was
reviewed by a G engineer and they asked to use the full names (which I
pushed back on). Looking at the guide now I see no such requirement.
But we digress.

>> The test applies only to core mercurial, not extensions, which are 'external'. The imports *are* absolute.
>
> Right, but you could add from __future__ import absolute_import and
> suppress the warning ;)

Duh. Facepalm.

Patch

diff --git a/hgext/automv.py b/hgext/automv.py
new file mode 100644
--- /dev/null
+++ b/hgext/automv.py
@@ -0,0 +1,79 @@ 
+# automv.py
+#
+# Copyright 2013-2016 Facebook, Inc.
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+"""Check for unrecorded moves at commit time (EXPERIMENTAL)
+
+This extension checks at commit/amend time if any of the committed files
+comes from an unrecorded mv.
+
+The threshold at which a file is considered a move can be set with the
+``automv.similarity`` config option; the default value is 0.75.
+
+"""
+import mercurial.commands
+import mercurial.copies
+import mercurial.extensions
+import mercurial.scmutil
+import mercurial.similar
+from mercurial.i18n import _
+
+def extsetup(ui):
+    entry = mercurial.extensions.wrapcommand(
+        mercurial.commands.table, 'commit', mvcheck)
+    entry[1].append(
+        ('', 'no-automv', None,
+         _('disable automatic file move detection')))
+
+def mvcheck(orig, ui, repo, *pats, **opts):
+    disabled = opts.pop('no_automv', False)
+    if not disabled:
+        threshold = float(ui.config('automv', 'similarity', '0.75'))
+        if threshold > 0:
+            match = mercurial.scmutil.match(repo[None], pats, opts)
+            added, removed = _interestingfiles(repo, match)
+            renames = _findrenames(repo, match, added, removed, threshold)
+            _markchanges(repo, renames)
+
+    # developer config: automv.testmode
+    if not ui.configbool('automv', 'testmode'):
+        return orig(ui, repo, *pats, **opts)
+
+def _interestingfiles(repo, matcher):
+    stat = repo.status(repo['.'], repo[None], matcher)
+    added = stat[1]
+    removed = stat[2]
+
+    copy = mercurial.copies._forwardcopies(repo['.'], repo[None], matcher)
+    # remove the copy files for which we already have copy info
+    added = [f for f in added if f not in copy]
+
+    return added, removed
+
+def _findrenames(repo, matcher, added, removed, similarity):
+    """Find renames from removed files of the current commit/amend files
+    to the added ones"""
+    renames = {}
+    if similarity > 0:
+        for src, dst, score in mercurial.similar.findrenames(
+                repo, added, removed, similarity):
+            if repo.ui.verbose:
+                repo.ui.status(
+                    _('detected move of %s as %s (%d%% similar)\n') % (
+                        matcher.rel(src), matcher.rel(dst), score * 100))
+            renames[dst] = src
+    if renames:
+        repo.ui.status(_('detected move of %d files\n') % len(renames))
+    return renames
+
+def _markchanges(repo, renames):
+    """Marks the files in renames as copied."""
+    wctx = repo[None]
+    wlock = repo.wlock()
+    try:
+        for dst, src in renames.iteritems():
+            wctx.copy(src, dst)
+    finally:
+        wlock.release()
diff --git a/tests/test-automv.t b/tests/test-automv.t
new file mode 100644
--- /dev/null
+++ b/tests/test-automv.t
@@ -0,0 +1,328 @@ 
+  $ cat >> $HGRCPATH << EOF
+  > [extensions]
+  > automv=
+  > rebase=
+  > EOF
+
+Setup repo
+
+  $ hg init repo
+  $ cd repo
+
+Test automv command for commit
+
+  $ echo 'foo' > a.txt
+  $ hg add a.txt
+  $ hg commit -m 'init repo with a'
+
+mv/rm/add
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit -m 'msg'
+  detected move of 1 files
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ printf '\n\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit -m 'msg'
+  detected move of 1 files
+  created new head
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ printf '\nfoo\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit -m 'msg'
+  created new head
+  $ hg status --change . -C
+  A b.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv/rm/add/modif/changethreshold
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ printf '\nfoo\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --config automv.similarity='0.6' -m 'msg'
+  detected move of 1 files
+  created new head
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv
+  $ mv a.txt b.txt
+  $ hg status -C
+  ! a.txt
+  ? b.txt
+  $ hg commit -m 'msg'
+  nothing changed (1 missing files, see 'hg status')
+  [1]
+  $ hg status -C
+  ! a.txt
+  ? b.txt
+  $ hg revert -aqC
+  $ rm b.txt
+
+mv/rm/add/notincommitfiles
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ echo 'bar' > c.txt
+  $ hg add c.txt
+  $ hg status -C
+  A b.txt
+  A c.txt
+  R a.txt
+  $ hg commit c.txt -m 'msg'
+  created new head
+  $ hg status --change . -C
+  A c.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg up -r 0
+  0 files updated, 0 files merged, 1 files removed, 0 files unresolved
+  $ hg rm a.txt
+  $ echo 'bar' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'msg'
+  detected move of 1 files
+  created new head
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv/rm/add/--no-automv
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --no-automv -m 'msg'
+  created new head
+  $ hg status --change . -C
+  A b.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+
+Test automv command for commit --amend
+
+mv/rm/add
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended'
+  detected move of 1 files
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ printf '\n\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended'
+  detected move of 1 files
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ printf '\nfoo\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended'
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv/rm/add/modif/changethreshold
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ printf '\nfoo\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend --config automv.similarity='0.6' -m 'amended'
+  detected move of 1 files
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg status -C
+  ! a.txt
+  ? b.txt
+  $ hg commit --amend -m 'amended'
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status -C
+  ! a.txt
+  ? b.txt
+  $ hg up -Cr 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv/rm/add/notincommitfiles
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ echo 'bar' > d.txt
+  $ hg add d.txt
+  $ hg status -C
+  A b.txt
+  A d.txt
+  R a.txt
+  $ hg commit --amend -m 'amended' d.txt
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A c.txt
+  A d.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended'
+  detected move of 1 files
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  A d.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 3 files removed, 0 files unresolved
+
+mv/rm/add/--no-automv
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended' --no-automv
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+
+mv/rm/commit/add/amend
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg status -C
+  R a.txt
+  ? b.txt
+  $ hg commit -m "removed a"
+  $ hg add b.txt
+  $ hg commit --amend -m 'amended'
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+  R a.txt
diff --git a/tests/test-check-py3-compat.t b/tests/test-check-py3-compat.t
--- a/tests/test-check-py3-compat.t
+++ b/tests/test-check-py3-compat.t
@@ -33,6 +33,7 @@ 
   doc/hgmanpage.py not using absolute_import
   hgext/__init__.py not using absolute_import
   hgext/acl.py not using absolute_import
+  hgext/automv.py not using absolute_import
   hgext/blackbox.py not using absolute_import
   hgext/bugzilla.py not using absolute_import
   hgext/censor.py not using absolute_import