Patchwork [1,of,8] _addrevision: refactor out the selection of candidate revisions

login
register
mail settings
Submitter Paul Morelle
Date Jan. 14, 2018, 10:28 a.m.
Message ID <7526dfca3d32e7c51864.1515925697@taranis.localdomain>
Download mbox | patch
Permalink /patch/26733/
State Accepted
Headers show

Comments

Paul Morelle - Jan. 14, 2018, 10:28 a.m.
# HG changeset patch
# User Paul Morelle <paul.morelle@octobus.net>
# Date 1515668342 -3600
#      Thu Jan 11 11:59:02 2018 +0100
# Node ID 7526dfca3d32e7c51864c21de2c2f4735c4cade6
# Parent  4b68ca118d8d316cff1fbfe260e8fdb0dae3e26a
# EXP-Topic refactor-revlog
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 7526dfca3d32
_addrevision: refactor out the selection of candidate revisions

The new function will be useful to retrieve all the revisions which will be
needed to determine the best delta, and parallelize the computation of the
necessary diffs.
Gregory Szorc - Jan. 14, 2018, 8:46 p.m.
On Sun, Jan 14, 2018 at 2:28 AM, Paul Morelle <paul.morelle@octobus.net>
wrote:

> # HG changeset patch
> # User Paul Morelle <paul.morelle@octobus.net>
> # Date 1515668342 -3600
> #      Thu Jan 11 11:59:02 2018 +0100
> # Node ID 7526dfca3d32e7c51864c21de2c2f4735c4cade6
> # Parent  4b68ca118d8d316cff1fbfe260e8fdb0dae3e26a
> # EXP-Topic refactor-revlog
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 7526dfca3d32
> _addrevision: refactor out the selection of candidate revisions
>

This refactor seems to preserve the existing behavior AFAICT. The code is
not easy to read. But that was a preexisting problem. Hopefully the end
state of this series improves matters more. If not, that can be done in
follow-ups.

Queued part 1.

Will look at the subsequent parts soon...

Also, please use "revlog:" in the commit message.


>
> The new function will be useful to retrieve all the revisions which will be
> needed to determine the best delta, and parallelize the computation of the
> necessary diffs.
>
> diff -r 4b68ca118d8d -r 7526dfca3d32 mercurial/revlog.py
> --- a/mercurial/revlog.py       Thu Jan 11 11:57:59 2018 +0000
> +++ b/mercurial/revlog.py       Thu Jan 11 11:59:02 2018 +0100
> @@ -1844,6 +1844,44 @@
>
>          return True
>
> +    def _getcandidaterevs(self, p1, p2, cachedelta):
> +        """
> +        Provides revisions that present an interest to be diffed against,
> +        grouped by level of easiness.
> +        """
>

I'd appreciate a bit more documentation around the intended use of the
generator stream. Essentially, it is emitting iterables of revs and the
first group that yields a rev with a suitable delta is used and the client
stops processing. That's kinda a weird pattern. But it is how the existing
logic works. Using a generator is a clever way to codify that logic.


> +        curr = len(self)
> +        prev = curr - 1
> +        p1r, p2r = self.rev(p1), self.rev(p2)
>

curr, p1r, and p2r are already calculated by the caller. I'd pass them into
this function. Please send a follow-up.


> +
> +        # should we try to build a delta?
> +        if prev != nullrev and self.storedeltachains:
>

We could change this to `if pre == nullrev or not self.storedeltachains:
return` and dedent the following block. That would make things a bit easier
to read.


> +            tested = set()
> +            # This condition is true most of the time when processing
> +            # changegroup data into a generaldelta repo. The only time it
> +            # isn't true is if this is the first revision in a delta chain
> +            # or if ``format.generaldelta=true`` disabled
> ``lazydeltabase``.
> +            if cachedelta and self._generaldelta and self._lazydeltabase:
> +                # Assume what we received from the server is a good choice
> +                # build delta will reuse the cache
> +                yield (cachedelta[0],)
> +                tested.add(cachedelta[0])
> +
> +            if self._generaldelta:
> +                # exclude already lazy tested base if any
> +                parents = [p for p in (p1r, p2r)
> +                           if p != nullrev and p not in tested]
> +                if parents and not self._aggressivemergedeltas:
> +                    # Pick whichever parent is closer to us (to minimize
> the
> +                    # chance of having to build a fulltext).
> +                    parents = [max(parents)]
> +                tested.update(parents)
> +                yield parents
> +
> +            if prev not in tested:
> +                # other approach failed try against prev to hopefully
> save us a
> +                # fulltext.
> +                yield (prev,)
>

Having looked at this code in detail as part of the review, it seems
nonsensical to me to emit prev as a candidate revision when generaldelta is
being used. I'd refactor the code to use separate branches for generaldelta
and non-generaldelta scenarios. But this can be done as follow-up (if it
isn't addressed by later patches in this series).


>      def _addrevision(self, node, rawtext, transaction, link, p1, p2,
> flags,
>                       cachedelta, ifh, dfh, alwayscache=False):
>          """internal function to add revisions to the log
> @@ -1943,42 +1981,16 @@
>          else:
>              textlen = len(rawtext)
>
> -        # should we try to build a delta?
> -        if prev != nullrev and self.storedeltachains:
> -            tested = set()
> -            # This condition is true most of the time when processing
> -            # changegroup data into a generaldelta repo. The only time it
> -            # isn't true is if this is the first revision in a delta chain
> -            # or if ``format.generaldelta=true`` disabled
> ``lazydeltabase``.
> -            if cachedelta and self._generaldelta and self._lazydeltabase:
> -                # Assume what we received from the server is a good choice
> -                # build delta will reuse the cache
> -                candidatedelta = builddelta(cachedelta[0])
> -                tested.add(cachedelta[0])
> +        for candidaterevs in self._getcandidaterevs(p1, p2, cachedelta):
> +            nominateddeltas = []
> +            for candidaterev in candidaterevs:
> +                candidatedelta = builddelta(candidaterev)
>                  if self._isgooddelta(candidatedelta, textlen):
> -                    delta = candidatedelta
> -            if delta is None and self._generaldelta:
> -                # exclude already lazy tested base if any
> -                parents = [p for p in (p1r, p2r)
> -                           if p != nullrev and p not in tested]
> -                if parents and not self._aggressivemergedeltas:
> -                    # Pick whichever parent is closer to us (to minimize
> the
> -                    # chance of having to build a fulltext).
> -                    parents = [max(parents)]
> -                tested.update(parents)
> -                pdeltas = []
> -                for p in parents:
> -                    pd = builddelta(p)
> -                    if self._isgooddelta(pd, textlen):
> -                        pdeltas.append(pd)
> -                if pdeltas:
> -                    delta = min(pdeltas, key=lambda x: x[1])
> -            if delta is None and prev not in tested:
> -                # other approach failed try against prev to hopefully
> save us a
> -                # fulltext.
> -                candidatedelta = builddelta(prev)
> -                if self._isgooddelta(candidatedelta, textlen):
> -                    delta = candidatedelta
> +                    nominateddeltas.append(candidatedelta)
> +            if nominateddeltas:
> +                delta = min(nominateddeltas, key=lambda x: x[1])
> +                break
> +
>          if delta is not None:
>              dist, l, data, base, chainbase, chainlen, compresseddeltalen
> = delta
>          else:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>

Patch

diff -r 4b68ca118d8d -r 7526dfca3d32 mercurial/revlog.py
--- a/mercurial/revlog.py	Thu Jan 11 11:57:59 2018 +0000
+++ b/mercurial/revlog.py	Thu Jan 11 11:59:02 2018 +0100
@@ -1844,6 +1844,44 @@ 
 
         return True
 
+    def _getcandidaterevs(self, p1, p2, cachedelta):
+        """
+        Provides revisions that present an interest to be diffed against,
+        grouped by level of easiness.
+        """
+        curr = len(self)
+        prev = curr - 1
+        p1r, p2r = self.rev(p1), self.rev(p2)
+
+        # should we try to build a delta?
+        if prev != nullrev and self.storedeltachains:
+            tested = set()
+            # This condition is true most of the time when processing
+            # changegroup data into a generaldelta repo. The only time it
+            # isn't true is if this is the first revision in a delta chain
+            # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
+            if cachedelta and self._generaldelta and self._lazydeltabase:
+                # Assume what we received from the server is a good choice
+                # build delta will reuse the cache
+                yield (cachedelta[0],)
+                tested.add(cachedelta[0])
+
+            if self._generaldelta:
+                # exclude already lazy tested base if any
+                parents = [p for p in (p1r, p2r)
+                           if p != nullrev and p not in tested]
+                if parents and not self._aggressivemergedeltas:
+                    # Pick whichever parent is closer to us (to minimize the
+                    # chance of having to build a fulltext).
+                    parents = [max(parents)]
+                tested.update(parents)
+                yield parents
+
+            if prev not in tested:
+                # other approach failed try against prev to hopefully save us a
+                # fulltext.
+                yield (prev,)
+
     def _addrevision(self, node, rawtext, transaction, link, p1, p2, flags,
                      cachedelta, ifh, dfh, alwayscache=False):
         """internal function to add revisions to the log
@@ -1943,42 +1981,16 @@ 
         else:
             textlen = len(rawtext)
 
-        # should we try to build a delta?
-        if prev != nullrev and self.storedeltachains:
-            tested = set()
-            # This condition is true most of the time when processing
-            # changegroup data into a generaldelta repo. The only time it
-            # isn't true is if this is the first revision in a delta chain
-            # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
-            if cachedelta and self._generaldelta and self._lazydeltabase:
-                # Assume what we received from the server is a good choice
-                # build delta will reuse the cache
-                candidatedelta = builddelta(cachedelta[0])
-                tested.add(cachedelta[0])
+        for candidaterevs in self._getcandidaterevs(p1, p2, cachedelta):
+            nominateddeltas = []
+            for candidaterev in candidaterevs:
+                candidatedelta = builddelta(candidaterev)
                 if self._isgooddelta(candidatedelta, textlen):
-                    delta = candidatedelta
-            if delta is None and self._generaldelta:
-                # exclude already lazy tested base if any
-                parents = [p for p in (p1r, p2r)
-                           if p != nullrev and p not in tested]
-                if parents and not self._aggressivemergedeltas:
-                    # Pick whichever parent is closer to us (to minimize the
-                    # chance of having to build a fulltext).
-                    parents = [max(parents)]
-                tested.update(parents)
-                pdeltas = []
-                for p in parents:
-                    pd = builddelta(p)
-                    if self._isgooddelta(pd, textlen):
-                        pdeltas.append(pd)
-                if pdeltas:
-                    delta = min(pdeltas, key=lambda x: x[1])
-            if delta is None and prev not in tested:
-                # other approach failed try against prev to hopefully save us a
-                # fulltext.
-                candidatedelta = builddelta(prev)
-                if self._isgooddelta(candidatedelta, textlen):
-                    delta = candidatedelta
+                    nominateddeltas.append(candidatedelta)
+            if nominateddeltas:
+                delta = min(nominateddeltas, key=lambda x: x[1])
+                break
+
         if delta is not None:
             dist, l, data, base, chainbase, chainlen, compresseddeltalen = delta
         else: