Patchwork dirstate: speed up repeated missing directory checks

login
register
mail settings
Submitter Martin von Zweigbergk
Date Nov. 20, 2014, 5:53 a.m.
Message ID <0eff6940a2efac28fbbf.1416462807@martinvonz.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/6807/
State Accepted
Commit a179db3db9b96b38c10c491e6e7e7ad5f40a7787
Headers show

Comments

Martin von Zweigbergk - Nov. 20, 2014, 5:53 a.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1416460026 28800
#      Wed Nov 19 21:07:06 2014 -0800
# Node ID 0eff6940a2efac28fbbf4180592a0f4e1404744e
# Parent  cc83cac41619b32375b28f821237fb0747010b80
dirstate: speed up repeated missing directory checks

In a mozilla repo with tip at bb3ff09f52fe,

  hg update tip~1000 && time hg revert -nq -r tip .

displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision
100000, it did not finish in 12 minutes.

Revert calls dirstate.status() with a matcher that matches each file
in the target revision. The main problem [1] lies in
dirstate._walkexplicit(), which looks for matching deleted directories
by checking whether each path is prefix of any path in the
dirstate. With m files in the dirstate and n files in the target
revision that are not in the dirstate, this is clearly O(m*n). Let's
improve by keeping a lazily initialized set of all the directories in
the dirstate, so the time becomes O(m+n).

After this patch, the 4:20 minutes become 5.5 s, while for a single
missing path, it slows down from 1.092 s to 1.150 s (best of 4). The
>12 min case becomes 5.8 s.

 [1] A narrower optimization would be to make revert take the fast
     path for '.' and '--all'.
Pierre-Yves David - Nov. 20, 2014, 6:48 a.m.
On 11/19/2014 09:53 PM, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1416460026 28800
> #      Wed Nov 19 21:07:06 2014 -0800
> # Node ID 0eff6940a2efac28fbbf4180592a0f4e1404744e
> # Parent  cc83cac41619b32375b28f821237fb0747010b80
> dirstate: speed up repeated missing directory checks

Nice and nice, pushed to the clowncopter.
Martin von Zweigbergk - Nov. 20, 2014, 6:59 a.m.
Hmm... that 'break' should have been dropped :-( It won't impact timings,
so no need to update description. Will you fix or do you want a resend?
Either way, might as well also change the 'self._map' to 'dmap' for
consistency.

On Wed Nov 19 2014 at 10:49:01 PM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 11/19/2014 09:53 PM, Martin von Zweigbergk wrote:
> > # HG changeset patch
> > # User Martin von Zweigbergk <martinvonz@google.com>
> > # Date 1416460026 28800
> > #      Wed Nov 19 21:07:06 2014 -0800
> > # Node ID 0eff6940a2efac28fbbf4180592a0f4e1404744e
> > # Parent  cc83cac41619b32375b28f821237fb0747010b80
> > dirstate: speed up repeated missing directory checks
>
> Nice and nice, pushed to the clowncopter.
>
> --
> Pierre-Yves David
>
Pierre-Yves David - Nov. 20, 2014, 7:15 a.m.
On 11/19/2014 10:59 PM, Martin von Zweigbergk wrote:
> Hmm... that 'break' should have been dropped :-( It won't impact
> timings, so no need to update description. Will you fix or do you want a
> resend? Either way, might as well also change the 'self._map' to 'dmap'
> for consistency.

patch update on the clowncopter

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -629,6 +629,7 @@ 
         results = dict.fromkeys(subrepos)
         results['.hg'] = None
 
+        alldirs = None
         for ff in files:
             if normalize:
                 nf = normalize(normpath(ff), False, True)
@@ -657,13 +658,13 @@ 
                 if nf in dmap: # does it exactly match a missing file?
                     results[nf] = None
                 else: # does it match a missing directory?
-                    prefix = nf + "/"
-                    for fn in dmap:
-                        if fn.startswith(prefix):
-                            if matchedir:
-                                matchedir(nf)
-                            notfoundadd(nf)
-                            break
+                    if alldirs is None:
+                        alldirs = scmutil.dirs(self._map)
+                    if nf in alldirs:
+                        if matchedir:
+                            matchedir(nf)
+                        notfoundadd(nf)
+                        break
                     else:
                         badfn(ff, inst.strerror)