Patchwork [RFC] extdiff: add support for subrepo diff

login
register
mail settings
Submitter Mathias De Maré
Date Feb. 7, 2015, 8:56 a.m.
Message ID <dd4c16684d072b2eb35d.1423299369@mathias-Latitude-E6540>
Download mbox | patch
Permalink /patch/7761/
State RFC, archived
Headers show

Comments

Mathias De Maré - Feb. 7, 2015, 8:56 a.m.
# HG changeset patch
# User Mathias De Maré <mathias.demare@gmail.com>
# Date 1423299130 -3600
#      Sam Feb 07 09:52:10 2015 +0100
# Node ID dd4c16684d072b2eb35d0ef5f6e567eb9ea5b430
# Parent  ff5caa8dfd993680d9602ca6ebb14da9de10d5f4
extdiff: add support for subrepo diff

Sending as RFC, I'd like to get some general feedback on
how to progress with this patch.

Up to this point, extdiff did not have a '--subrepos' flag.
This patch makes it possible to view changes in subrepos
as well, though not in sub-subrepos.

No special treatment of executable files
or links is done at this point.
Yuya Nishihara - Feb. 7, 2015, 10:03 a.m.
On Sat, 07 Feb 2015 09:56:09 +0100, Mathias De Maré wrote:
> # HG changeset patch
> # User Mathias De Maré <mathias.demare@gmail.com>
> # Date 1423299130 -3600
> #      Sam Feb 07 09:52:10 2015 +0100
> # Node ID dd4c16684d072b2eb35d0ef5f6e567eb9ea5b430
> # Parent  ff5caa8dfd993680d9602ca6ebb14da9de10d5f4
> extdiff: add support for subrepo diff
> 
> Sending as RFC, I'd like to get some general feedback on
> how to progress with this patch.
> 
> Up to this point, extdiff did not have a '--subrepos' flag.
> This patch makes it possible to view changes in subrepos
> as well, though not in sub-subrepos.
> 
> No special treatment of executable files
> or links is done at this point.
> 
> diff --git a/hgext/extdiff.py b/hgext/extdiff.py
> --- a/hgext/extdiff.py
> +++ b/hgext/extdiff.py
> @@ -69,7 +69,7 @@ cmdtable = {}
>  command = cmdutil.command(cmdtable)
>  testedwith = 'internal'
>  
> -def snapshot(ui, repo, files, node, tmproot):
> +def snapshot(ui, repo, files, subs, node, tmproot):
>      '''snapshot files as of some revision
>      if not using snapshot, -I/-X does not work and recursive diff
>      in tools like kdiff3 and meld displays too many files.'''
> @@ -107,8 +107,40 @@ def snapshot(ui, repo, files, node, tmpr
>          if node is None:
>              fns_and_mtime.append((dest, repo.wjoin(fn),
>                                    os.lstat(dest).st_mtime))
> +
> +    for s in subs:
> +        for fn in sorted(subs[s]):
> +            wfn = util.pconvert(fn)
> +            ws = util.pconvert(s)
> +            wfnroot = os.path.join(ws, wfn)
> +            matcher = scmutil.match(repo[node], ['path:' + wfnroot],
> +                                    {"exact": True})
> +            dest = os.path.join(base, wfnroot)
> +            destdir = os.path.dirname(dest)
> +            if not os.path.exists(destdir):
> +                os.makedirs(destdir)
> +            if node is None:
> +                util.copyfile(wfnroot, dest)
> +                fns_and_mtime.append((dest, repo.wjoin(ws),
> +                                      os.lstat(dest).st_mtime))
> +            else:
> +                cmdutil.cat(ui, repo, repo[node], matcher, '', output=dest)
>      return dirname, fns_and_mtime

Maybe extdiff.snapshot() can share code with "hg archive" that supports
--subrepos.
Matt Harbison - Feb. 7, 2015, 4:58 p.m.
On Sat, 07 Feb 2015 05:03:09 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sat, 07 Feb 2015 09:56:09 +0100, Mathias De Maré wrote:
>> # HG changeset patch
>> # User Mathias De Maré <mathias.demare@gmail.com>
>> # Date 1423299130 -3600
>> #      Sam Feb 07 09:52:10 2015 +0100
>> # Node ID dd4c16684d072b2eb35d0ef5f6e567eb9ea5b430
>> # Parent  ff5caa8dfd993680d9602ca6ebb14da9de10d5f4
>> extdiff: add support for subrepo diff
>>
>
> Maybe extdiff.snapshot() can share code with "hg archive" that supports
> --subrepos.

I toyed with a similar idea a year or two ago where the extdiff command  
basically did 'repo.status -S <pats>' and then 'archival.archive -S -I  
<what_status_returned>' to build the tree.  If extdiff is taught to use  
archive -S with the proper matcher, it should be able to handle all  
subrepos, since they all support archive already, and archive already  
knows how to recurse.  (It doesn't look like svn supports status though,  
so that may be an archive everything from svn proposition).

The only potential snags I saw were:

1) Archive prints out a message that says "archiving...".  Probably not a  
big deal, but a user might wonder why it is archiving.  The progress  
message archive spits out might be nice for large repos.

2) Archive supports largefiles.  While this may be useful in some cases,  
it could slow down building the trees to compare.  I'm not sure if we  
should have the largefiles extension add a --large to extdiff, and  
refactor the largefile overrides for archive accordingly.  I'm looking at  
refactoring them anyway, since they are mostly a copy/paste of core, yet  
don't handle everything core does.

--Matt
Mathias De Maré - Feb. 8, 2015, 11:24 a.m.
On Sat, Feb 7, 2015 at 5:58 PM, Matt Harbison <mharbison72@gmail.com> wrote:

> On Sat, 07 Feb 2015 05:03:09 -0500, Yuya Nishihara <yuya@tcha.org> wrote:
>
>  On Sat, 07 Feb 2015 09:56:09 +0100, Mathias De Maré wrote:
>>
>>> # HG changeset patch
>>> # User Mathias De Maré <mathias.demare@gmail.com>
>>> # Date 1423299130 -3600
>>> #      Sam Feb 07 09:52:10 2015 +0100
>>> # Node ID dd4c16684d072b2eb35d0ef5f6e567eb9ea5b430
>>> # Parent  ff5caa8dfd993680d9602ca6ebb14da9de10d5f4
>>> extdiff: add support for subrepo diff
>>>
>>>
>> Maybe extdiff.snapshot() can share code with "hg archive" that supports
>> --subrepos.
>>
>
> I toyed with a similar idea a year or two ago where the extdiff command
> basically did 'repo.status -S <pats>' and then 'archival.archive -S -I
> <what_status_returned>' to build the tree.  If extdiff is taught to use
> archive -S with the proper matcher, it should be able to handle all
> subrepos, since they all support archive already, and archive already knows
> how to recurse.  (It doesn't look like svn supports status though, so that
> may be an archive everything from svn proposition).
>
This sounds like a very good idea. Do you still have the code for those
changes? If not, I'll see if I can take this approach myself (will take me
a while though).

>
> The only potential snags I saw were:
>
> 1) Archive prints out a message that says "archiving...".  Probably not a
> big deal, but a user might wonder why it is archiving.  The progress
> message archive spits out might be nice for large repos.
>
Like you say, it doesn't seem like a major issue. If it's not ideal,
perhaps I could change the message for extdiff?

>
> 2) Archive supports largefiles.  While this may be useful in some cases,
> it could slow down building the trees to compare.  I'm not sure if we
> should have the largefiles extension add a --large to extdiff, and refactor
> the largefile overrides for archive accordingly.  I'm looking at
> refactoring them anyway, since they are mostly a copy/paste of core, yet
> don't handle everything core does.
>
I'll leave this part alone for now, if that's okay, since you mention
you're looking at refactoring this.

Greetings,
Mathias

>
> --Matt
>
Matt Harbison - Feb. 8, 2015, 6:50 p.m.
On Sun, 08 Feb 2015 06:24:41 -0500, Mathias De Maré  
<mathias.demare@gmail.com> wrote:

> On Sat, Feb 7, 2015 at 5:58 PM, Matt Harbison <mharbison72@gmail.com>  
> wrote:
>
>> On Sat, 07 Feb 2015 05:03:09 -0500, Yuya Nishihara <yuya@tcha.org>  
>> wrote:
>>
>>  On Sat, 07 Feb 2015 09:56:09 +0100, Mathias De Maré wrote:
>>>
>>>> # HG changeset patch
>>>> # User Mathias De Maré <mathias.demare@gmail.com>
>>>> # Date 1423299130 -3600
>>>> #      Sam Feb 07 09:52:10 2015 +0100
>>>> # Node ID dd4c16684d072b2eb35d0ef5f6e567eb9ea5b430
>>>> # Parent  ff5caa8dfd993680d9602ca6ebb14da9de10d5f4
>>>> extdiff: add support for subrepo diff
>>>>
>>>>
>>> Maybe extdiff.snapshot() can share code with "hg archive" that supports
>>> --subrepos.
>>>
>>
>> I toyed with a similar idea a year or two ago where the extdiff command
>> basically did 'repo.status -S <pats>' and then 'archival.archive -S -I
>> <what_status_returned>' to build the tree.  If extdiff is taught to use
>> archive -S with the proper matcher, it should be able to handle all
>> subrepos, since they all support archive already, and archive already  
>> knows
>> how to recurse.  (It doesn't look like svn supports status though, so  
>> that
>> may be an archive everything from svn proposition).
>>
> This sounds like a very good idea. Do you still have the code for those
> changes? If not, I'll see if I can take this approach myself (will take  
> me
> a while though).

I see something related at the top of my patch queue.  Let me look it over  
and see if the comments still apply (it's from July 2012, so a lot has  
changed since).

>>
>> The only potential snags I saw were:
>>
>> 1) Archive prints out a message that says "archiving...".  Probably not  
>> a
>> big deal, but a user might wonder why it is archiving.  The progress
>> message archive spits out might be nice for large repos.
>>
> Like you say, it doesn't seem like a major issue. If it's not ideal,
> perhaps I could change the message for extdiff?

Maybe, but I never got to the point of really using it to see if it was an  
issue.  It kinda just felt like I was abusing archive, so I'm glad to see  
others think it is a good idea.


>>
>> 2) Archive supports largefiles.  While this may be useful in some cases,
>> it could slow down building the trees to compare.  I'm not sure if we
>> should have the largefiles extension add a --large to extdiff, and  
>> refactor
>> the largefile overrides for archive accordingly.  I'm looking at
>> refactoring them anyway, since they are mostly a copy/paste of core, yet
>> don't handle everything core does.
>>
> I'll leave this part alone for now, if that's okay, since you mention
> you're looking at refactoring this.

That's probably OK.  The archive command basically works today (though the  
exit code for -I/-X failures doesn't match core, so be aware if you are  
checking exit codes.)  I'm fine with a wait and see approach to the idea  
that you have to opt into largefiles, but make sure that you note the new  
functionality in the commit message when you switch to archive- it's a  
subtle change and other reviewers may have opinions.

>
> Greetings,
> Mathias
>
>>
>> --Matt

Patch

diff --git a/hgext/extdiff.py b/hgext/extdiff.py
--- a/hgext/extdiff.py
+++ b/hgext/extdiff.py
@@ -69,7 +69,7 @@  cmdtable = {}
 command = cmdutil.command(cmdtable)
 testedwith = 'internal'
 
-def snapshot(ui, repo, files, node, tmproot):
+def snapshot(ui, repo, files, subs, node, tmproot):
     '''snapshot files as of some revision
     if not using snapshot, -I/-X does not work and recursive diff
     in tools like kdiff3 and meld displays too many files.'''
@@ -107,8 +107,40 @@  def snapshot(ui, repo, files, node, tmpr
         if node is None:
             fns_and_mtime.append((dest, repo.wjoin(fn),
                                   os.lstat(dest).st_mtime))
+
+    for s in subs:
+        for fn in sorted(subs[s]):
+            wfn = util.pconvert(fn)
+            ws = util.pconvert(s)
+            wfnroot = os.path.join(ws, wfn)
+            matcher = scmutil.match(repo[node], ['path:' + wfnroot],
+                                    {"exact": True})
+            dest = os.path.join(base, wfnroot)
+            destdir = os.path.dirname(dest)
+            if not os.path.exists(destdir):
+                os.makedirs(destdir)
+            if node is None:
+                util.copyfile(wfnroot, dest)
+                fns_and_mtime.append((dest, repo.wjoin(ws),
+                                      os.lstat(dest).st_mtime))
+            else:
+                cmdutil.cat(ui, repo, repo[node], matcher, '', output=dest)
     return dirname, fns_and_mtime
 
+def calcsubfiles(own, other):
+    '''Calculate the relevant changed files in the subrepos
+    '''
+    subfiles = dict()
+    for s in own:
+        (smodown, saddown, sremown) = own[s]
+        if s in other:
+            (smodother, saddother, sremother) = other[s]
+            files = smodown | sremown | ((smodother | saddother) - saddown)
+        else:
+            files = smodown | sremown
+        subfiles[s] = files
+    return subfiles
+
 def dodiff(ui, repo, cmdline, pats, opts):
     '''Do the actual diff:
 
@@ -120,6 +152,7 @@  def dodiff(ui, repo, cmdline, pats, opts
 
     revs = opts.get('rev')
     change = opts.get('change')
+    subrepos = opts.get('subrepos')
     do3way = '$parent2' in cmdline
 
     if revs and change:
@@ -148,18 +181,51 @@  def dodiff(ui, repo, cmdline, pats, opts
         mod_b, add_b, rem_b = set(), set(), set()
     modadd = mod_a | add_a | mod_b | add_b
     common = modadd | rem_a | rem_b
-    if not common:
+
+    subsa = {}
+    subsb = {}
+    subs = {}
+
+    subchanges = False
+    subsmodadd = {}
+    if subrepos:
+        targetsubs = sorted(s for s in repo[node2].substate if matcher(s))
+        for s in targetsubs:
+            if s in repo[node1a].substate:
+                substate = map(set, repo[node1a].sub(s).status(node2))
+                subsa[s] = substate[:3]
+                subs[s] = substate[:3]
+            if do3way and s in repo[node1b].substate:
+                substate = map(set, repo[node1b].sub(s).status(node2))
+                subsb[s] = substate[:3]
+                if s not in subs:
+                    subs[s] = (set(), set(), set())
+                (smod, sadd, srem) = subs[s]
+                (bmod, badd, brem) = substate[:3]
+                subs[s] = (smod | bmod, sadd | badd, srem | brem)
+            if s in subs:
+                (smod, sadd, srem) = subs[s]
+                subsmodadd[s] = smod | sadd
+                if smod | sadd | srem:
+                    subchanges = True
+
+    subsafiles = calcsubfiles(subsa, subsb)
+    subsbfiles = calcsubfiles(subsb, subsa)
+
+    if not common and not subchanges:
         return 0
 
     tmproot = tempfile.mkdtemp(prefix='extdiff.')
     try:
         # Always make a copy of node1a (and node1b, if applicable)
         dir1a_files = mod_a | rem_a | ((mod_b | add_b) - add_a)
-        dir1a = snapshot(ui, repo, dir1a_files, node1a, tmproot)[0]
+        dir1a = snapshot(ui, repo, dir1a_files,
+                         subsafiles, node1a, tmproot)[0]
         rev1a = '@%d' % repo[node1a].rev()
         if do3way:
             dir1b_files = mod_b | rem_b | ((mod_a | add_a) - add_b)
-            dir1b = snapshot(ui, repo, dir1b_files, node1b, tmproot)[0]
+            dir1b = snapshot(ui, repo, dir1b_files,
+                             subsbfiles, node1b, tmproot)[0]
             rev1b = '@%d' % repo[node1b].rev()
         else:
             dir1b = None
@@ -171,14 +237,15 @@  def dodiff(ui, repo, cmdline, pats, opts
         dir2root = ''
         rev2 = ''
         if node2:
-            dir2 = snapshot(ui, repo, modadd, node2, tmproot)[0]
+            dir2 = snapshot(ui, repo, modadd, subsmodadd, node2, tmproot)[0]
             rev2 = '@%d' % repo[node2].rev()
-        elif len(common) > 1:
+        elif len(common) > 1 or subchanges:
             #we only actually need to get the files to copy back to
             #the working dir in this case (because the other cases
             #are: diffing 2 revisions or single file -- in which case
             #the file is already directly passed to the diff tool).
-            dir2, fns_and_mtime = snapshot(ui, repo, modadd, None, tmproot)
+            dir2, fns_and_mtime = snapshot(ui, repo, modadd,
+                                           subsmodadd, None, tmproot)
         else:
             # This lets the diff tool open the changed file directly
             dir2 = ''
@@ -190,7 +257,7 @@  def dodiff(ui, repo, cmdline, pats, opts
 
         # If only one change, diff the files instead of the directories
         # Handle bogus modifies correctly by checking if the files exist
-        if len(common) == 1:
+        if len(common) == 1 and not subchanges:
             common_file = util.localpath(common.pop())
             dir1a = os.path.join(tmproot, dir1a, common_file)
             label1a = common_file + rev1a
@@ -246,7 +313,7 @@  def dodiff(ui, repo, cmdline, pats, opts
      _('pass option to comparison program'), _('OPT')),
     ('r', 'rev', [], _('revision'), _('REV')),
     ('c', 'change', '', _('change made by revision'), _('REV')),
-    ] + commands.walkopts,
+    ] + commands.walkopts + commands.subrepoopts,
     _('hg extdiff [OPT]... [FILE]...'),
     inferrepo=True)
 def extdiff(ui, repo, *pats, **opts):
diff --git a/tests/test-extdiff.t b/tests/test-extdiff.t
--- a/tests/test-extdiff.t
+++ b/tests/test-extdiff.t
@@ -48,6 +48,7 @@  Should diff cloned directories:
    -c --change REV          change made by revision
    -I --include PATTERN [+] include names matching the given patterns
    -X --exclude PATTERN [+] exclude names matching the given patterns
+   -S --subrepos            recurse into subrepositories
   
   (some details hidden, use --verbose to show complete help)
 
@@ -322,3 +323,52 @@  Test symlinks handling (issue1909)
   $ cd ..
 
 #endif
+
+Test subrepos
+  $ hg init toprepo
+  $ cd toprepo
+  $ echo foo > foo
+  $ hg add
+  adding foo
+  $ hg init subrepo
+  $ echo bar > subrepo/bar
+  $ echo foobar > subrepo/foobar
+  $ hg -R subrepo add
+  adding subrepo/bar
+  adding subrepo/foobar
+  $ hg -R subrepo commit -m "subrepo bar"
+  $ echo "subrepo = subrepo" > .hgsub
+  $ hg add .hgsub
+  $ hg commit -m "added subrepo and foo"
+
+ignore subrepository during the initial add
+(to avoid diffing an entire subrepository, similar behaviour as for diff)
+  $ hg extdiff -S --change . -p diff
+  Only in toprepo.935947dcc4e9: .hgsub
+  Only in toprepo.935947dcc4e9: .hgsubstate
+  Only in toprepo.935947dcc4e9: foo
+  [1]
+
+correctly diff changes in later subrepository changes
+  $ cat > subrepo/bar << EOF
+  > bar
+  > foo
+  > foobar
+  > EOF
+  $ echo foo > subrepo/foo
+  $ rm subrepo/foobar
+  $ hg -R subrepo addremove
+  adding foo
+  removing foobar
+  $ hg st -S
+  M subrepo/bar
+  A subrepo/foo
+  R subrepo/foobar
+  $ hg extdiff -S -p diff -o "-rn"
+  diff -rn toprepo.935947dcc4e9/subrepo/bar toprepo/subrepo/bar
+  a1 2
+  foo
+  foobar
+  Only in toprepo/subrepo: foo
+  Only in toprepo.935947dcc4e9/subrepo: foobar
+  [1]
diff --git a/tests/test-extension.t b/tests/test-extension.t
--- a/tests/test-extension.t
+++ b/tests/test-extension.t
@@ -394,6 +394,7 @@  Extension module help vs command help:
    -c --change REV          change made by revision
    -I --include PATTERN [+] include names matching the given patterns
    -X --exclude PATTERN [+] exclude names matching the given patterns
+   -S --subrepos            recurse into subrepositories
   
   (some details hidden, use --verbose to show complete help)