Patchwork strip: make --keep option not set all dirstate times to 0

login
register
mail settings
Submitter Durham Goode
Date March 7, 2013, 4:23 a.m.
Message ID <83dd9435821427a1ce46.1362630212@dev350.prn1.facebook.com>
Download mbox | patch
Permalink /patch/1081/
State Superseded
Commit e74704c33e24b841301b91508a4275e9571998fa
Headers show

Comments

Durham Goode - March 7, 2013, 4:23 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1362629589 28800
#      Wed Mar 06 20:13:09 2013 -0800
# Node ID 83dd9435821427a1ce4646d02fd23fdd44fd34d1
# Parent  6aca4d1c744ed8f6c1305525ded7590abaa72d06
strip: make --keep option not set all dirstate times to 0

hg strip -k was using dirstate.rebuild() which reset all the dirstate
entries timestamps to 0.  This meant that the next time hg status was
run every file was considered to be 'unsure', which caused it to do
expensive read operations on every filelog. On a repo with >150,000
files it took 70 seconds when everything was in memory.  From a cold
cache it took several minutes.

The fix is to only reset files that have changed between the working
context and the destination context.

For reference, --keep means the working directory is left alone during
the strip. We have users wanting to use this operation to store their
work-in-progress as a commit on a branch while they go work on another
branch, then come back later and be able to uncommit that work and
continue working.  They currently use 'git reset HARD^' to accomplish
this in git.
Bryan O'Sullivan - March 8, 2013, 6:44 p.m.
On Wed, Mar 6, 2013 at 8:23 PM, Durham Goode <durham@fb.com> wrote:

> +            changedfiles = []
> +            for f, m, args, msg in actions:
> +                # blindy reset the files, regardless of actual action
> necessary
> +                changedfiles.append(f)
>

changedfiles = [a[0] for a in actions]
Bryan O'Sullivan - March 8, 2013, 7 p.m.
On Fri, Mar 8, 2013 at 10:51 AM, Kevin Bullock <
kbullock+mercurial@ringworld.org> wrote:

> Indeed. The list comprehension is usually faster, right?


I'd expect it to be maybe 20% faster in this case, but the reduced number
of lines is nice, too.

Patch

diff --git a/hgext/mq.py b/hgext/mq.py
--- a/hgext/mq.py
+++ b/hgext/mq.py
@@ -63,7 +63,7 @@ 
 from mercurial.node import bin, hex, short, nullid, nullrev
 from mercurial.lock import release
 from mercurial import commands, cmdutil, hg, scmutil, util, revset
-from mercurial import repair, extensions, error, phases
+from mercurial import repair, extensions, error, phases, merge
 from mercurial import patch as patchmod
 import os, re, errno, shutil
 
@@ -3037,7 +3037,18 @@ 
         wlock = repo.wlock()
         try:
             urev = repo.mq.qparents(repo, revs[0])
-            repo.dirstate.rebuild(urev, repo[urev].manifest())
+            uctx = repo[urev]
+
+            # only reset the dirstate for files that would actually change
+            # between the working context and uctx
+            actions = merge.calculateupdates(repo, repo[None], uctx, uctx,
+                                             False, True, None)
+            changedfiles = []
+            for f, m, args, msg in actions:
+                # blindy reset the files, regardless of actual action necessary
+                changedfiles.append(f)
+
+            repo.dirstate.rebuild(urev, uctx.manifest(), changedfiles)
             repo.dirstate.write()
             update = False
         finally:
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -498,13 +498,18 @@ 
         self._lastnormaltime = 0
         self._dirty = True
 
-    def rebuild(self, parent, files):
+    def rebuild(self, parent, allfiles, changedfiles=None):
+        changedfiles = changedfiles or allfiles
+        oldmap = self._map
         self.clear()
-        for f in files:
-            if 'x' in files.flags(f):
-                self._map[f] = ('n', 0777, -1, 0)
+        for f in allfiles:
+            if f not in changedfiles:
+                self._map[f] = oldmap[f]
             else:
-                self._map[f] = ('n', 0666, -1, 0)
+                if 'x' in allfiles.flags(f):
+                    self._map[f] = ('n', 0777, -1, 0)
+                else:
+                    self._map[f] = ('n', 0666, -1, 0)
         self._pl = (parent, nullid)
         self._dirty = True