# Patchwork [2,of,5,V2] strip: add faster revlog strip computation

Submitter Durham Goode Nov. 13, 2013, 9:31 p.m. <05ca3c1dac66f205bf65.1384378263@dev350.prn1.facebook.com> mbox | patch /patch/2937/ Superseded 5fc2ae1c631b94ea83b2fe078c8acfea5aa55fbf show

Durham Goode - Nov. 13, 2013, 9:31 p.m.
```# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1384216969 28800
#      Mon Nov 11 16:42:49 2013 -0800
# Node ID 05ca3c1dac66f205bf65eaf8c6695dae4f4531f7
# Parent  e243df7f91498e4423c5854c3f38322854304760
strip: add faster revlog strip computation

The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top.  Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.

The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.

This makes amend on a large repo go from 8.7 seconds to 6 seconds.
```

## Patch

```diff --git a/mercurial/repair.py b/mercurial/repair.py
--- a/mercurial/repair.py
+++ b/mercurial/repair.py
@@ -38,16 +38,8 @@
"""return the changesets which will be broken by the truncation"""
s = set()
def collectone(revlog):
-        # find the truncation point of the revlog
-            if lrev >= striprev:
-                break
-        # see if any revision after this point has a linkrev
-        # less than striprev (those will be broken by strip)
-            if lrev < striprev:
+        _, brokenset = revlog.getstrippoint(striprev)
+        s.update([revlog.linkrev(r) for r in brokenset])

collectone(repo.manifest)
for fname in files:
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1285,6 +1285,49 @@

return content

+        """find the minimum rev that must be stripped to strip the linkrev
+
+        Returns a tuple containing the minimum rev and a set of all revs that
+        have linkrevs that will be broken by this strip.
+        """
+        brokenrevs = set()
+
+        # This algorithm involves walking down the rev graph, starting at the
+        # it this way, we walk the minimum numbers of revs for the strip,
+        # which improves perf in large repos.
+                break
+
+        strippoint = len(self)
+
+
+            for p in self.parentrevs(strippoint):
+                if p != nullrev:
+
+            # Since the revs are topologically sorted according to linkrev,
+            # no more revs that could have a linkrev greater than minlink.
+            # So we can stop walking.
+                    break
+
+        return strippoint, brokenrevs
+
"""truncate the revlog on the first revision with a linkrev >= minlink

@@ -1302,10 +1345,8 @@
if len(self) == 0:
return

-        for rev in self: