dirstate: add --minimal flag to debugrebuilddirstate

Submitter Durham Goode
Date Aug. 13, 2015, 2:50 a.m.
Message ID <fc181ceca7961caadaad.1439434201@dev2000.prn2.facebook.com>
Permalink /patch/10199/
State Accepted

Comments

Durham Goode - Aug. 13, 2015, 2:50 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1439433861 25200
#      Wed Aug 12 19:44:21 2015 -0700
# Node ID fc181ceca7961caadaad137efbe0472f092c998d
# Parent  a5f62af2951729db578347fec0c5a52bb3229db9
dirstate: add --minimal flag to debugrebuilddirstate

On repositories with hundreds of thousands of files, hg debugrebuilddirstate
causes every dirstate entry to be marked lookup, and the next hg status can take
many minutes.

This adds a --minimal flag that rebuilds only the parts of the dirstate that
are inconsistent with the parent manifest. It follows two rules:

1) If a file is in the dirstate but not in the parent manifest, and it is not
marked 'add', the entry is bogus and is dropped.

2) If a file is in the parent manifest but not in the dirstate at all, it is
added to the dirstate and marked as lookup.

This allows us to fix repositories where the dirstate doesn't match the manifest
much more quickly.
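
For illustration only, the two rules can be sketched outside of Mercurial. In
this sketch a plain dict stands in for the dirstate and a set for the parent
manifest. The function name minimal_rebuild and the 'lookup' marker are
hypothetical stand-ins, not Mercurial APIs. The real patch calls
dirstate.drop() and dirstate.normallookup() instead.

```python
def minimal_rebuild(dirstate, manifest_files):
    """Return a repaired copy of a toy dirstate.

    dirstate:       dict mapping filename -> state ('n', 'a', 'r', ...)
    manifest_files: set of filenames in the parent manifest

    Both are simplified stand-ins for the real Mercurial objects.
    """
    fixed = dict(dirstate)
    for name in set(dirstate) | set(manifest_files):
        in_dirstate = name in dirstate
        in_manifest = name in manifest_files

        if in_dirstate and not in_manifest and dirstate[name] != 'a':
            # Rule 1: tracked but unknown to the parent and not a pending
            # add, so the entry is dropped.
            del fixed[name]
        elif in_manifest and not in_dirstate:
            # Rule 2: known to the parent but missing from the dirstate,
            # so it is re-added and flagged for a content check.
            fixed[name] = 'lookup'
        # Every other entry is left untouched.
    return fixed


example = minimal_rebuild(
    {'kept.txt': 'n', 'new.txt': 'a', 'ghost.txt': 'n'},
    {'kept.txt', 'missing.txt'},
)
```

Here only ghost.txt and missing.txt are touched. Entries that are already
consistent keep their state, which is why the next hg status does not have to
re-check every file.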

Tested by artificially adding bad dirstate entries (via code) for both cases
above.
Matt Mackall - Aug. 13, 2015, 10:39 p.m.
On Wed, 2015-08-12 at 19:50 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1439433861 25200
> #      Wed Aug 12 19:44:21 2015 -0700
> # Node ID fc181ceca7961caadaad137efbe0472f092c998d
> # Parent  a5f62af2951729db578347fec0c5a52bb3229db9
> dirstate: add --minimal flag to debugrebuilddirstate

Queued for default, thanks. Would really like some sort of test for this
though.

Patch

diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -2700,9 +2700,12 @@  def debugpvec(ui, repo, a, b=None):
               pa.distance(pb), rel))
 
 @command('debugrebuilddirstate|debugrebuildstate',
-    [('r', 'rev', '', _('revision to rebuild to'), _('REV'))],
+    [('r', 'rev', '', _('revision to rebuild to'), _('REV')),
+     ('', 'minimal', None, _('only rebuild files that are inconsistent with '
+                             'the working copy parent')),
+    ],
     _('[-r REV]'))
-def debugrebuilddirstate(ui, repo, rev):
+def debugrebuilddirstate(ui, repo, rev, **opts):
     """rebuild the dirstate as it would look like for the given revision
 
     If no revision is specified the first current parent will be used.
@@ -2711,13 +2714,33 @@  def debugrebuilddirstate(ui, repo, rev):
     The actual working directory content or existing dirstate
     information such as adds or removes is not considered.
 
+    ``minimal`` will only rebuild the dirstate status for files that claim to be
+    tracked but are not in the parent manifest, or that exist in the parent
+    manifest but are not in the dirstate. It will not change adds, removes, or
+    modified files that are in the working copy parent.
+
     One use of this command is to make the next :hg:`status` invocation
     check the actual file content.
     """
     ctx = scmutil.revsingle(repo, rev)
     wlock = repo.wlock()
     try:
-        repo.dirstate.rebuild(ctx.node(), ctx.manifest())
+        dirstate = repo.dirstate
+
+        # See command doc for what minimal does.
+        if opts.get('minimal'):
+            dirstatefiles = set(dirstate)
+            ctxfiles = set(ctx.manifest().keys())
+            for file in (dirstatefiles | ctxfiles):
+                indirstate = file in dirstatefiles
+                inctx = file in ctxfiles
+
+                if indirstate and not inctx and dirstate[file] != 'a':
+                    dirstate.drop(file)
+                elif inctx and not indirstate:
+                    dirstate.normallookup(file)
+        else:
+            dirstate.rebuild(ctx.node(), ctx.manifest())
     finally:
         wlock.release()