Patchwork [5,of,5,v3] graft: apply get-only changes in-memory

login
register
mail settings
Submitter David Schleimer
Date Dec. 4, 2012, 8:58 p.m.
Message ID <df518d2b8f24b4b5aa0c.1354654720@dev010.prn1.facebook.com>
Download mbox | patch
Permalink /patch/18/
State Superseded
Delegated to: David Schleimer
Headers show

Comments

David Schleimer - Dec. 4, 2012, 8:58 p.m.
# HG changeset patch
# User David Schleimer <dschleimer at fb.com>
# Date 1354654458 28800
# Node ID df518d2b8f24b4b5aa0c0271c368e045b1b28885
# Parent  7cc8b808bfcb3a282cdcfeacd73a0583efc4b58a
graft: apply get-only changes in-memory

This changes the graft code to apply the simplest-possible changes
in-memory.  Specifically, changes where all we need to do is set the
contents of files to the version in the commit being grafted.  This
means that no files were deleted and no files were modified on by both
the target branch and the change we're grafting.

Based on one of our real-world branches, with 497 grafts,
approximately 80% of cherry-picked commits will fall into the fast
case here.  This patch takes a graft with all 497 revisions on the
branch in question from 5299 seconds to 3251 seconds.

The revisions that fall in the fast case were more common near the
beginning of the branch than the end.  Longer lived branches are
likely to see less of an improvement from this patch, but they should
be unlikely to see a regression.

For single revision grafts and grafts with file-level merges, this
patch does not provide any benefits.  It does not appear to be a
regression for these cases however.

Times taken for small grafts, with and without file merges:
revs | file-merges | before | after
---------------------------------------
1    | no          | 16     | 16
2    | no          | 25     | 17
1    | yes         | 17     | 17
2    | yes         | 29     | 29

This change is at worst neutral, and at best a significant speedup.
The longer the chain of grafts without file-level merges, the larger
the speedup will be.
Matt Mackall - Dec. 4, 2012, 10:59 p.m.
On Tue, 2012-12-04 at 12:58 -0800, David Schleimer wrote:
> # HG changeset patch
> # User David Schleimer <dschleimer at fb.com>
> # Date 1354654458 28800
> # Node ID df518d2b8f24b4b5aa0c0271c368e045b1b28885
> # Parent  7cc8b808bfcb3a282cdcfeacd73a0583efc4b58a
> graft: apply get-only changes in-memory

Still too big, I'm afraid.

I recommend a first step here is to change update() to calculate+apply.

We're going to want to do this apply-in-memory trick elsewhere (eg
rebase, evolve), so it makes sense to have ALL of the apply-in-memory
magic in merge.py to minimize the complexity at the use sites. Something
like:

actionlist = mergemod.calculate()
if maybeinmemory:
   result = mergemod.applyinmemory(actionlist)
if not maybeinmemory or result is bad:
   if working directory lags:
      update(current)
   result = mergemod.applytodisk(actionlist)

..and then leave it up to applyinmemory() to know what it can and cannot
do. Ideally, all the other details (like decoupling 'current' from '.'
and fiddling with extra) are staged so that the above is done in a short
patch of its own.

Patch

diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -2840,7 +2840,10 @@ 
             source = ctx.extra().get('source')
             if not source:
                 source = ctx.hex()
-            extra = {'source': source}
+            extra = {
+                'source': source,
+                'branch': current.extra().get('branch', 'default'),
+                }
             user = ctx.user()
             if opts.get('user'):
                 user = opts['user']
@@ -2857,34 +2860,84 @@ 
                 try:
                     # ui.forcemerge is an internal variable, do not document
                     repo.ui.setconfig('ui', 'forcemerge', opts.get('tool', ''))
-                    stats = mergemod.update(repo, ctx.node(), True, True, False,
-                                            ctx.p1().node())
+                    action = mergemod.calculateupdates(repo, current, ctx,
+                                                       ctx.p1(),
+                                                       branchmerge=True,
+                                                       force=True,
+                                                       partial=False)
                 finally:
                     repo.ui.setconfig('ui', 'forcemerge', '')
-                # report any conflicts
-                if stats and stats[3] > 0:
-                    # write out state for --continue
-                    nodelines = [repo[rev].hex() + "\n" for rev in revs[pos:]]
-                    repo.opener.write('graftstate', ''.join(nodelines))
-                    raise util.Abort(
-                        _("unresolved conflicts, can't continue"),
-                        hint=_('use hg resolve and hg graft --continue'))
+
+                filelist, filectxfn = mergemod.memapplyinfo(ctx, action)
+                memoryapply = bool(filelist)
+
+                if memoryapply:
+                    repo.ui.debug("performing (graft) merge in-memory\n")
+                    memctx = context.memctx(repo, (current.node(), None),
+                                            text=message,
+                                            files=filelist,
+                                            filectxfn=filectxfn,
+                                            user=user,
+                                            date=date,
+                                            extra=extra)
+                else:
+                    repo.ui.debug("performing (graft) merge on-disk\n")
+                    try:
+                        # first update to the current rev if we've
+                        # commited anything since the last update
+                        if current.node() != repo['.'].node():
+                            mergemod.update(repo, current.node(),
+                                            branchmerge=False,
+                                            force=True,
+                                            partial=None)
+                        # ui.forcemerge is an internal variable, do not document
+                        repo.ui.setconfig('ui', 'forcemerge',
+                                          opts.get('tool', ''))
+                        stats = mergemod.applyupdates(repo, action,
+                                                      wctx=repo[None],
+                                                      mctx=ctx,
+                                                      actx=ctx.p1(),
+                                                      overwrite=False)
+                        repo.setparents(current.node(), ctx.node())
+                        mergemod.recordupdates(repo, action, branchmerge=True)
+                    finally:
+                        repo.ui.setconfig('ui', 'forcemerge', '')
+                    # report any conflicts
+                    if stats and stats[3] > 0:
+                        # write out state for --continue
+                        nodelines = [repo[rev].hex() + "\n"
+                                     for rev in revs[pos:]]
+                        repo.opener.write('graftstate', ''.join(nodelines))
+                        raise util.Abort(
+                            _("unresolved conflicts, can't continue"),
+                            hint=_('use hg resolve and hg graft --continue'))
+
+            if cont or not memoryapply:
+                cont = False
+                repo.setparents(current.node(), nullid)
+                repo.dirstate.write()
+                # fix up dirstate for copies and renames
+                cmdutil.duplicatecopies(repo, ctx.rev(), ctx.p1().rev())
+
+                # commit
+                node = repo.commit(text=message, user=user,
+                                   date=date, extra=extra, editor=editor)
+
             else:
                 cont = False
-
-            # drop the second merge parent
-            repo.setparents(current.node(), nullid)
-            repo.dirstate.write()
-            # fix up dirstate for copies and renames
-            cmdutil.duplicatecopies(repo, ctx.rev(), ctx.p1().rev())
-
-            # commit
-            node = repo.commit(text=message, user=user,
-                        date=date, extra=extra, editor=editor)
+                node = repo.commitctx(memctx)
+
             if node is None:
                 ui.status(_('graft for revision %s is empty\n') % ctx.rev())
             else:
                 current = repo[node]
+        # if we've committed at least one revision, update to the most recent
+        if current.node() != repo['.'].node():
+            mergemod.update(repo, current.node(),
+                            branchmerge=False,
+                            force=True,
+                            partial=False)
+
     finally:
         wlock.release()
 
diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -651,3 +651,43 @@ 
     if not partial:
         repo.hook('update', parent1=xp1, parent2=xp2, error=stats[3])
     return stats
+
+
+def memapplyinfo(mctx, action):
+    """Extract a list of files and a filectxfunction from an action list
+
+    This function is for translating from the merge-internal action
+    list format to a list of files, and a file context function,
+    suitable for constructing a memctx object.
+
+    Please note that not all actions are supported for in-memory
+    commits.  This function will return a None, None tuple if you pass
+    it a list with an unsupprted action.
+
+    When successful, this function will return a (filelist, filectxfn) tuple.
+    """
+
+    memactions = set(['g'])
+    copymap = copies.pathcopies(mctx.p1(), mctx)
+
+    actionmap = {}
+    filelist = []
+    for a in action:
+        actionmap[a[0]] = a
+        filelist.append(a[0])
+        if a[1] not in memactions:
+            return None, None
+
+
+    def filectxfn(repo, memctx, path):
+        a = actionmap[path]
+        flags = a[2]
+        islink = 'l' in flags
+        isexec = 'x' in flags
+        mfilectx = mctx.filectx(path)
+        return context.memfilectx(path, mfilectx.data(),
+                                  islink=islink,
+                                  isexec=isexec,
+                                  copied=copymap.get(path))
+
+    return filelist, filectxfn
diff --git a/tests/test-graft.t b/tests/test-graft.t
--- a/tests/test-graft.t
+++ b/tests/test-graft.t
@@ -135,8 +135,9 @@ 
     checking for directory renames
   resolving manifests
    overwrite: False, partial: False
-   ancestor: 68795b066622, local: ef0ef43d49e7+, remote: 5d205f8b35b6
+   ancestor: 68795b066622, local: ef0ef43d49e7, remote: 5d205f8b35b6
    b: local copied/moved to a -> m
+  performing (graft) merge on-disk
   preserving b for resolve of b
   updating: b 1/1 files (100.00%)
   picked tool 'internal:merge' for b (binary False symlink False)
@@ -148,18 +149,24 @@ 
     searching for copies back to rev 1
   resolving manifests
    overwrite: False, partial: False
-   ancestor: 4c60f11aa304, local: 6b9e5368ca4e+, remote: 97f8bfe72746
+   ancestor: 4c60f11aa304, local: 6b9e5368ca4e, remote: 97f8bfe72746
    e: remote is newer -> g
-  updating: e 1/1 files (100.00%)
-  getting e
+  performing (graft) merge in-memory
   e
   grafting revision 4
     searching for copies back to rev 1
   resolving manifests
    overwrite: False, partial: False
-   ancestor: 4c60f11aa304, local: 1905859650ec+, remote: 9c233e8e184d
+   ancestor: 4c60f11aa304, local: 1905859650ec, remote: 9c233e8e184d
    e: versions differ -> m
    d: remote is newer -> g
+  performing (graft) merge on-disk
+  resolving manifests
+   overwrite: True, partial: False
+   ancestor: 6b9e5368ca4e+, local: 6b9e5368ca4e+, remote: 1905859650ec
+   e: remote is newer -> g
+  updating: e 1/1 files (100.00%)
+  getting e
   preserving e for resolve of e
   updating: d 1/2 files (50.00%)
   getting d