Patchwork [01,of,13] sparse-read: target density of 50% instead of 25%

login
register
mail settings
Submitter Boris Feld
Date July 10, 2018, 1:27 p.m.
Message ID <c24db8e9c40d26bd1069.1531229230@FB-lair>
Download mbox | patch
Permalink /patch/32742/
State Accepted
Headers show

Comments

Boris Feld - July 10, 2018, 1:27 p.m.
# HG changeset patch
# User Paul Morelle <paul.morelle@octobus.net>
# Date 1529680344 -7200
#      Fri Jun 22 17:12:24 2018 +0200
# Node ID c24db8e9c40d26bd106954c7186d1ff30dad6c19
# Parent  9b077e5fa8ba8d89568d07a4815685d5db4147fe
# EXP-Topic write-for-sparse-read
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r c24db8e9c40d
sparse-read: target density of 50% instead of 25%

The target density value is wrong. The default target chain span is
4*text-length. However, the target max chain payload is 2*text-length. So
default target density should be 50% (2/4) not 25% (1/4).
Gregory Szorc - July 10, 2018, 10:37 p.m.
On Tue, Jul 10, 2018 at 6:27 AM, Boris Feld <boris.feld@octobus.net> wrote:

> # HG changeset patch
> # User Paul Morelle <paul.morelle@octobus.net>
> # Date 1529680344 -7200
> #      Fri Jun 22 17:12:24 2018 +0200
> # Node ID c24db8e9c40d26bd106954c7186d1ff30dad6c19
> # Parent  9b077e5fa8ba8d89568d07a4815685d5db4147fe
> # EXP-Topic write-for-sparse-read
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> c24db8e9c40d
> sparse-read: target density of 50% instead of 25%
>

I queued this series.

One of the patches introduced a double space after a "<". I cleaned this up
in flight.

I'm not super keen on introducing the test revlog class and the inline
doctests. test-revlog.py might be a better home for those tests. But I
don't want to block landing on that.

Since you are experimenting with revlog reading strategies, if you have
time, it would be interesting to see if memory mapping the revlog has any
performance benefits. Obviously that will increase memory requirements. The
OS will reserve memory for the size of the mmap segment. But it shouldn't
actually be allocated until the underlying bytes are accessed. I'd like to
think that on machines with gigabytes of memory, this would "just work" and
would be no worse than our current strategy where we read a bunch of chunks
in memory. If Python properly supports 0-copy on reads from a mmap()
segment and we can throw away the mmap after resolving a revision, I'd
expect to see a performance improvement because I/O is handled by the
memory system and not Python. But we won't know until we measure it!


>
> The target density value is wrong. The default target chain span is
> 4*text-length. However, the target max chain payload is 2*text-length. So
> default target density should be 50% (2/4) not 25% (1/4).
>
> diff --git a/mercurial/configitems.py b/mercurial/configitems.py
> --- a/mercurial/configitems.py
> +++ b/mercurial/configitems.py
> @@ -594,7 +594,7 @@ coreconfigitem('experimental', 'sparse-r
>      default=False,
>  )
>  coreconfigitem('experimental', 'sparse-read.density-threshold',
> -    default=0.25,
> +    default=0.50,
>  )
>  coreconfigitem('experimental', 'sparse-read.min-gap-size',
>      default='256K',
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -625,7 +625,7 @@ class revlog(object):
>          self._compengine = 'zlib'
>          self._maxdeltachainspan = -1
>          self._withsparseread = False
> -        self._srdensitythreshold = 0.25
> +        self._srdensitythreshold = 0.50
>          self._srmingapsize = 262144
>
>          mmapindexthreshold = None
>
Boris Feld - July 13, 2018, 9:16 a.m.
On 11/07/2018 00:37, Gregory Szorc wrote:
> On Tue, Jul 10, 2018 at 6:27 AM, Boris Feld <boris.feld@octobus.net 
> <mailto:boris.feld@octobus.net>> wrote:
>
>     # HG changeset patch
>     # User Paul Morelle <paul.morelle@octobus.net
>     <mailto:paul.morelle@octobus.net>>
>     # Date 1529680344 -7200
>     #      Fri Jun 22 17:12:24 2018 +0200
>     # Node ID c24db8e9c40d26bd106954c7186d1ff30dad6c19
>     # Parent  9b077e5fa8ba8d89568d07a4815685d5db4147fe
>     # EXP-Topic write-for-sparse-read
>     # Available At https://bitbucket.org/octobus/mercurial-devel/
>     <https://bitbucket.org/octobus/mercurial-devel/>
>     #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/
>     <https://bitbucket.org/octobus/mercurial-devel/> -r c24db8e9c40d
>     sparse-read: target density of 50% instead of 25%
>
>
> I queued this series.
>
> One of the patches introduced a double space after a "<". I cleaned 
> this up in flight.
>
> I'm not super keen on introducing the test revlog class and the inline 
> doctests. test-revlog.py might be a better home for those tests. But I 
> don't want to block landing on that.
I'm not a fan of mixing test class and actual code either. However, I 
think the tests are more readable as doctest. Maybe we should have some 
`mercurial.doctestutil` module or something similar for the test class?
>
> Since you are experimenting with revlog reading strategies, if you 
> have time, it would be interesting to see if memory mapping the revlog 
> has any performance benefits. Obviously that will increase memory 
> requirements. The OS will reserve memory for the size of the mmap 
> segment. But it shouldn't actually be allocated until the underlying 
> bytes are accessed. I'd like to think that on machines with gigabytes 
> of memory, this would "just work" and would be no worse than our 
> current strategy where we read a bunch of chunks in memory. If Python 
> properly supports 0-copy on reads from a mmap() segment and we can 
> throw away the mmap after resolving a revision, I'd expect to see a 
> performance improvement because I/O is handled by the memory system 
> and not Python. But we won't know until we measure it!

That's an interesting idea, we are adding it to our list of potential 
improvements. mmap can have tricky side-effects so it will need careful 
investigation. Using mmap seems to give good results for indexes, so it 
could help for data too.

We have already some done and reflections started on several potential 
improvements, so we will focus on these experiments first. Here is a 
roughly sorted list of what we want to look at and experiment around revlog:

* writing "sparse-revlog" to work around pathological case around 
chain-span limitation (currently in testing/benchmark)
   (this replace experimental.maxdeltachainspan)

* Chain length improvement
   (improving snapshot reuse, intermediate, snapshots, etc)

* mmapindex enabled by default,

* zstd as compression for storage,

* mmap for reading delta (your proposal).

Unfortunately, validating each of these experiment takes time, for 
converting + writing code + running benchmarks + ... We are about 1 or 2 
weeks minimum per experiment, so progress will be slow paced.

Cheers,

Patch

diff --git a/mercurial/configitems.py b/mercurial/configitems.py
--- a/mercurial/configitems.py
+++ b/mercurial/configitems.py
@@ -594,7 +594,7 @@  coreconfigitem('experimental', 'sparse-r
     default=False,
 )
 coreconfigitem('experimental', 'sparse-read.density-threshold',
-    default=0.25,
+    default=0.50,
 )
 coreconfigitem('experimental', 'sparse-read.min-gap-size',
     default='256K',
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -625,7 +625,7 @@  class revlog(object):
         self._compengine = 'zlib'
         self._maxdeltachainspan = -1
         self._withsparseread = False
-        self._srdensitythreshold = 0.25
+        self._srdensitythreshold = 0.50
         self._srmingapsize = 262144
 
         mmapindexthreshold = None