Patchwork [2,of,8] _addrevision: choose between ifh and dfh once for all

login
register
mail settings
Submitter Paul Morelle
Date Jan. 14, 2018, 10:28 a.m.
Message ID <84eb864137a7b27e2357.1515925698@taranis.localdomain>
Download mbox | patch
Permalink /patch/26736/
State Accepted
Headers show

Comments

Paul Morelle - Jan. 14, 2018, 10:28 a.m.
# HG changeset patch
# User Paul Morelle <paul.morelle@octobus.net>
# Date 1515771775 -3600
#      Fri Jan 12 16:42:55 2018 +0100
# Node ID 84eb864137a7b27e2357eb4f6d465f726670dc98
# Parent  7526dfca3d32e7c51864c21de2c2f4735c4cade6
# EXP-Topic refactor-revlog
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 84eb864137a7
_addrevision: choose between ifh and dfh once for all
Gregory Szorc - Jan. 14, 2018, 9:08 p.m.
On Sun, Jan 14, 2018 at 2:28 AM, Paul Morelle <paul.morelle@octobus.net>
wrote:

> # HG changeset patch
> # User Paul Morelle <paul.morelle@octobus.net>
> # Date 1515771775 -3600
> #      Fri Jan 12 16:42:55 2018 +0100
> # Node ID 84eb864137a7b27e2357eb4f6d465f726670dc98
> # Parent  7526dfca3d32e7c51864c21de2c2f4735c4cade6
> # EXP-Topic refactor-revlog
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 84eb864137a7
> _addrevision: choose between ifh and dfh once for all
>

Queued parts 2-8. The entire series is now queued.

FWIW, I was thinking about enabling aggressivemergedeltas by default. Perf
work around bdiff optimization in the past ~1 year has made it fast enough
that only very large fulltexts have noticeable performance loss from
enabling the feature. If you make delta generation faster in the remainder
of this series, I think there should be little reason to not enable
aggressivemergedeltas by default.


>
> diff -r 7526dfca3d32 -r 84eb864137a7 mercurial/revlog.py
> --- a/mercurial/revlog.py       Thu Jan 11 11:59:02 2018 +0100
> +++ b/mercurial/revlog.py       Fri Jan 12 16:42:55 2018 +0100
> @@ -1901,6 +1901,11 @@
>              raise RevlogError(_("%s: attempt to add wdir revision") %
>                                (self.indexfile))
>
> +        if self._inline:
> +            fh = ifh
> +        else:
> +            fh = dfh
> +
>          btext = [rawtext]
>          def buildtext():
>              if btext[0] is not None:
> @@ -1915,10 +1920,6 @@
>                                                         len(delta) - hlen):
>                  btext[0] = delta[hlen:]
>              else:
> -                if self._inline:
> -                    fh = ifh
> -                else:
> -                    fh = dfh
>                  basetext = self.revision(baserev, _df=fh, raw=True)
>                  btext[0] = mdiff.patch(basetext, delta)
>
> @@ -1947,10 +1948,6 @@
>                      header = mdiff.replacediffheader(self.rawsize(rev),
> len(t))
>                      delta = header + t
>                  else:
> -                    if self._inline:
> -                        fh = ifh
> -                    else:
> -                        fh = dfh
>                      ptext = self.revision(rev, _df=fh, raw=True)
>                      delta = mdiff.textdiff(ptext, t)
>              header, data = self.compress(delta)
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Paul Morelle - Jan. 15, 2018, 8:39 a.m.
On 01/14/2018 10:08 PM, Gregory Szorc wrote:
> On Sun, Jan 14, 2018 at 2:28 AM, Paul Morelle
> <paul.morelle@octobus.net <mailto:paul.morelle@octobus.net>> wrote:
>
>     # HG changeset patch
>     # User Paul Morelle <paul.morelle@octobus.net
>     <mailto:paul.morelle@octobus.net>>
>     # Date 1515771775 -3600
>     #      Fri Jan 12 16:42:55 2018 +0100
>     # Node ID 84eb864137a7b27e2357eb4f6d465f726670dc98
>     # Parent  7526dfca3d32e7c51864c21de2c2f4735c4cade6
>     # EXP-Topic refactor-revlog
>     # Available At https://bitbucket.org/octobus/mercurial-devel/
>     <https://bitbucket.org/octobus/mercurial-devel/>
>     #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/
>     <https://bitbucket.org/octobus/mercurial-devel/> -r 84eb864137a7
>     _addrevision: choose between ifh and dfh once for all
>
>
> Queued parts 2-8. The entire series is now queued.
>
> FWIW, I was thinking about enabling aggressivemergedeltas by default.
> Perf work around bdiff optimization in the past ~1 year has made it
> fast enough that only very large fulltexts have noticeable performance
> loss from enabling the feature. If you make delta generation faster in
> the remainder of this series, I think there should be little reason to
> not enable aggressivemergedeltas by default.
>  
Hello Gregory, and thank you for your review,

The idea would be indeed to increase delta generation speed.
In order to do so, it would be parallelized into threads (see
af25237be091 for the implementation in perfbdiff), which would store the
deltas in a buffer ready to be used by _addrevision and consor.
Once a revision is done, the buffer would be notified, and the
corresponding slot would be freed for another revision's deltas.

This series is a first step to reorganize the code a bit in order to be
able to separate things more easily between the threads and the main
process; however I am not sure to reach the goal in time for the freeze.

And yes, the real aim would be to generalize aggressivemergedeltas, and
even later to evolve the algorithm so that other revisions may be used
too (intermediate semi-fulltexts, ...).
However, the parallelization should be disabled if only one logical CPU
is available, as it would just slow things.
And in this case, the current behavior should be maintained.
As nowadays most computers have multiple threads, it shouldn't affect
most users, but codewise it may be a burden.
My idea would be to use an object for delta computation, which could be
replaced by a threaded version if multiple CPUs are found.

Do you have any comments about these ideas?

Have a nice day!

Paul
Gregory Szorc - Jan. 15, 2018, 6:59 p.m.
On Mon, Jan 15, 2018 at 12:39 AM, Paul Morelle <paul.morelle@octobus.net>
wrote:

> On 01/14/2018 10:08 PM, Gregory Szorc wrote:
>
> On Sun, Jan 14, 2018 at 2:28 AM, Paul Morelle <paul.morelle@octobus.net>
> wrote:
>
>> # HG changeset patch
>> # User Paul Morelle <paul.morelle@octobus.net>
>> # Date 1515771775 -3600
>> #      Fri Jan 12 16:42:55 2018 +0100
>> # Node ID 84eb864137a7b27e2357eb4f6d465f726670dc98
>> # Parent  7526dfca3d32e7c51864c21de2c2f4735c4cade6
>> # EXP-Topic refactor-revlog
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
>> 84eb864137a7
>> _addrevision: choose between ifh and dfh once for all
>>
>
> Queued parts 2-8. The entire series is now queued.
>
> FWIW, I was thinking about enabling aggressivemergedeltas by default. Perf
> work around bdiff optimization in the past ~1 year has made it fast enough
> that only very large fulltexts have noticeable performance loss from
> enabling the feature. If you make delta generation faster in the remainder
> of this series, I think there should be little reason to not enable
> aggressivemergedeltas by default.
>
>
> Hello Gregory, and thank you for your review,
>
> The idea would be indeed to increase delta generation speed.
> In order to do so, it would be parallelized into threads (see af25237be091
> for the implementation in perfbdiff), which would store the deltas in a
> buffer ready to be used by _addrevision and consor.
> Once a revision is done, the buffer would be notified, and the
> corresponding slot would be freed for another revision's deltas.
>
> This series is a first step to reorganize the code a bit in order to be
> able to separate things more easily between the threads and the main
> process; however I am not sure to reach the goal in time for the freeze.
>
> And yes, the real aim would be to generalize aggressivemergedeltas, and
> even later to evolve the algorithm so that other revisions may be used too
> (intermediate semi-fulltexts, ...).
> However, the parallelization should be disabled if only one logical CPU is
> available, as it would just slow things.
> And in this case, the current behavior should be maintained.
> As nowadays most computers have multiple threads, it shouldn't affect most
> users, but codewise it may be a burden.
> My idea would be to use an object for delta computation, which could be
> replaced by a threaded version if multiple CPUs are found.
>
> Do you have any comments about these ideas?
>

That all sounds like good work.

I'd like to add that I'd *really* like to get a Rust implementation of
revlogs. I'm starting to chip away at the edges of that problem. But you
should consider it vaporware at this time. In other words, don't let me
discourage you from making improvements to the Python code.

Patch

diff -r 7526dfca3d32 -r 84eb864137a7 mercurial/revlog.py
--- a/mercurial/revlog.py	Thu Jan 11 11:59:02 2018 +0100
+++ b/mercurial/revlog.py	Fri Jan 12 16:42:55 2018 +0100
@@ -1901,6 +1901,11 @@ 
             raise RevlogError(_("%s: attempt to add wdir revision") %
                               (self.indexfile))
 
+        if self._inline:
+            fh = ifh
+        else:
+            fh = dfh
+
         btext = [rawtext]
         def buildtext():
             if btext[0] is not None:
@@ -1915,10 +1920,6 @@ 
                                                        len(delta) - hlen):
                 btext[0] = delta[hlen:]
             else:
-                if self._inline:
-                    fh = ifh
-                else:
-                    fh = dfh
                 basetext = self.revision(baserev, _df=fh, raw=True)
                 btext[0] = mdiff.patch(basetext, delta)
 
@@ -1947,10 +1948,6 @@ 
                     header = mdiff.replacediffheader(self.rawsize(rev), len(t))
                     delta = header + t
                 else:
-                    if self._inline:
-                        fh = ifh
-                    else:
-                        fh = dfh
                     ptext = self.revision(rev, _df=fh, raw=True)
                     delta = mdiff.textdiff(ptext, t)
             header, data = self.compress(delta)