Patchwork [01,of,14,V3] util: move 'readexactly' in the util module

login
register
mail settings
Submitter Boris Feld
Date Jan. 19, 2018, 11:47 p.m.
Message ID <15f7795f96a5f9acb3ed.1516405626@FB>
Download mbox | patch
Permalink /patch/26987/
State Accepted
Headers show

Comments

Boris Feld - Jan. 19, 2018, 11:47 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1516391495 -3600
#      Fri Jan 19 20:51:35 2018 +0100
# Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
# Parent  de32acb24949c0e3633de373d1c6c8c814faa804
# EXP-Topic b2-stream
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 15f7795f96a5
util: move 'readexactly' in the util module

This function is used in multiple place, having it in util would be better.
(existing caller will be migrated in another series)
Augie Fackler - Jan. 20, 2018, 1:55 a.m.
On Sat, Jan 20, 2018 at 12:47:06AM +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1516391495 -3600
> #      Fri Jan 19 20:51:35 2018 +0100
> # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
> # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
> # EXP-Topic b2-stream
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 15f7795f96a5
> util: move 'readexactly' in the util module

Comment for 4.6 cycle: util is a dumpster and is a bad place for
this. Please plan to begin the 4.6 cycle by moving this to a
netutil.py or similar so we can get util.py on a diet.

>
> This function is used in multiple place, having it in util would be better.
> (existing caller will be migrated in another series)
>
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -32,14 +32,7 @@ from . import (
>  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
>  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
>
> -def readexactly(stream, n):
> -    '''read n bytes from stream.read and abort if less was available'''
> -    s = stream.read(n)
> -    if len(s) < n:
> -        raise error.Abort(_("stream ended unexpectedly"
> -                           " (got %d bytes, expected %d)")
> -                          % (len(s), n))
> -    return s
> +readexactly = util.readexactly
>
>  def getchunk(stream):
>      """return the next chunk from stream as a string"""
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx, others=None):
>          fn = '%s~%s~%s' % (f, tag, n)
>          if fn not in ctx and fn not in others:
>              return fn
> +
> +def readexactly(stream, n):
> +    '''read n bytes from stream.read and abort if less was available'''
> +    s = stream.read(n)
> +    if len(s) < n:
> +        raise error.Abort(_("stream ended unexpectedly"
> +                           " (got %d bytes, expected %d)")
> +                          % (len(s), n))
> +    return s
Gregory Szorc - Jan. 20, 2018, 8:22 p.m.
On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld@octobus.net> wrote:

> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1516391495 -3600
> #      Fri Jan 19 20:51:35 2018 +0100
> # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
> # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
> # EXP-Topic b2-stream
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 15f7795f96a5
> util: move 'readexactly' in the util module
>

I was swamped the last few days and didn't have time to look at this series
even though I really wanted to. I reviewed it this morning and am generally
happy with it. However, I'd like to make some BC changes to the wire
protocol so we don't ship things in the wire protocol that we'll need to
support indefinitely. A number of other changes to make this a more useful
feature can be done post RC, during freeze.

*High priority issues* (should be addressed before RC)

HTTP server is sending bundle2 response with compression. That's a
significant reason why this format is slower than the old format. I also
think bundle2 overhead is coming into play on the server. I need to profile
once compression is removed from the equation. I'm pretty sure this change
can be done during freeze since it requires the server to drop compression
for this bundle2 response. But I'm not 100% about that given how
compression is negotiated between client and server. Would prefer this
change make it in before RC.

The new "stream" bundle2 part has a "version" parameter. I don't like
versioning bundle2 parts. I think the behavior of bundle2 parts should be
consistent over time. If we need to introduce a new version of something or
change processing semantics, we should be introducing a new bundle2 part
and associate new/different behavior with the part. Attempting content
negotiation within a named bundle2 part is extra complication and will lead
to increased protocol chaos down the road. This should be changed before RC
since it impacts the wire protocol.

Requirements string in bundle2 part should ideally use the same pre-encoded
normalization as bundlespecs for consistency with the rest of Mercurial.
Since it touches the wire protocol, this needs to change before RC.

*Medium priority issues* (ideally addressed before RC)

I don't think there's an easy way to generate bundle2 bundles with stream
parts. We need this to support clonebundles. This will likely require
bundlespec changes. I would highly prefer to get this in before RC so we
can do end-to-end integration testing with clonebundles.

*Lesser priority issues* (can be fixed post RC)

streamclone._emit() should be renamed to _emitv2() for consistency with
other functions in file.

I think streamclone.applybundlev2() should handle writing requirements, not
the bundle2 part handler. Similar logic is already in
streamclone.maybeperformlegacystreamclone().

I need to look at this in more detail, but 133a678673cb ("allow bundle'2
stream clone") is a bit suspect. It's correctness may depend on your
interpretation of whether server.disablefullbundle applies only to
changegroup bundles. I can't recall what existing behavior is with regards
to stream clone. Need to double check.

test-clone-uncompressed.t wants a comment about legacy stream clone phases
being buggy. We should also decide what we want to do about issue #5648
since bundle2 stream clone solves the problem. I'm tempted to mark #5648 as
resolved once bundle2 stream clone is enabled by default.

In 56c30b31afbe:
@@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
         kwargs['cg'] = False
         kwargs['stream'] = True
         pullop.stepsdone.add('changegroup')
+        pullop.stepsdone.add('phases')

Do we know for sure that phases were processed? It depends on what the
server sent, no? I think this code may assume the behavior of the official
Mercurial server and may not be compatible with other implementation.

The use of temporary files for non-append-only files and the context
manager and copy primitive associated with that code feels overly complex.
Can we just slurp file content into memory? We already store the filenames
and offsets in memory. We already scale memory use linearly with number of
files in the repo. I don't see a huge problem doing the same for files that
are proportional to number of changesets, branches, tags, and heads. That
being said, the code is already landed. Some maybe we just keep it around.
If we do keep it around, I think it could use some refactoring to improve
readability. The amount of functions being called is a bit much for what
the code is doing, IMO.

The code around managing multiple vfs instances feels a bit complex. I
don't think we need a dict of vfs instances. We only have 2. Let's inline
their usage and do away with all the context manager and dictionary lookup
complications.

Related to vfs, we're not using the "expectedcount" argument with
vfs.backgroundclose(). Ideally, the part header defines the number of files
in each vfs so we can pass the proper value. But, I think it's fine to use
the total expected file count as the hint to the store vfs and don't use
background close for the cache vfs (since it will almost certainly never
have enough entries to warrant background closing).

I'm pretty happy with the feature. It is currently behind an experimental
config flag. I think we can make it enabled by default in 4.5 if we fix the
more important issues.

*Other observations *

The new format seems to use less CPU on the client than the old format when
doing HTTP exchange. 30-31s versus 25-26s for mozilla-unified. And that's
with zstd in play. I haven't profiled, but this is likely due to not using
readline() and transferring caches and phases data (I believe we avoid a
linear changelog scan post transfer now since I believe something before
needed to reconstruct the branch cache).

Here are some example numbers on my i7-6700K.

legacy http
server:
8.44user 3.51system 0:51.21elapsed 23%CPU (0avgtext+0avgdata
171184maxresident)k
1144inputs+0outputs (1major+73642minor)pagefaults 0swaps

client:
30.79user 8.47system 0:42.47elapsed 92%CPU (0avgtext+0avgdata
197244maxresident)k
19288inputs+7039984outputs (2major+94838minor)pagefaults 0swaps

v2 http
server:
37.69user 8.01system 1:56.61elapsed 39%CPU (0avgtext+0avgdata
236808maxresident)k
6836840inputs+21288outputs (2major+82612minor)pagefaults 0swaps

client:
25.82user 8.32system 1:47.83elapsed 31%CPU (0avgtext+0avgdata
209164maxresident)k
60848inputs+7061216outputs (2major+78863minor)pagefaults 0swaps


>
> This function is used in multiple place, having it in util would be better.
> (existing caller will be migrated in another series)
>
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -32,14 +32,7 @@ from . import (
>  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
>  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
>
> -def readexactly(stream, n):
> -    '''read n bytes from stream.read and abort if less was available'''
> -    s = stream.read(n)
> -    if len(s) < n:
> -        raise error.Abort(_("stream ended unexpectedly"
> -                           " (got %d bytes, expected %d)")
> -                          % (len(s), n))
> -    return s
> +readexactly = util.readexactly
>
>  def getchunk(stream):
>      """return the next chunk from stream as a string"""
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx, others=None):
>          fn = '%s~%s~%s' % (f, tag, n)
>          if fn not in ctx and fn not in others:
>              return fn
> +
> +def readexactly(stream, n):
> +    '''read n bytes from stream.read and abort if less was available'''
> +    s = stream.read(n)
> +    if len(s) < n:
> +        raise error.Abort(_("stream ended unexpectedly"
> +                           " (got %d bytes, expected %d)")
> +                          % (len(s), n))
> +    return s
>
Gregory Szorc - Jan. 20, 2018, 9:01 p.m.
On Sat, Jan 20, 2018 at 12:22 PM, Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld@octobus.net>
> wrote:
>
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1516391495 -3600
>> #      Fri Jan 19 20:51:35 2018 +0100
>> # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
>> # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
>> # EXP-Topic b2-stream
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
>> 15f7795f96a5
>> util: move 'readexactly' in the util module
>>
>
> I was swamped the last few days and didn't have time to look at this
> series even though I really wanted to. I reviewed it this morning and am
> generally happy with it. However, I'd like to make some BC changes to the
> wire protocol so we don't ship things in the wire protocol that we'll need
> to support indefinitely. A number of other changes to make this a more
> useful feature can be done post RC, during freeze.
>
> *High priority issues* (should be addressed before RC)
>
> HTTP server is sending bundle2 response with compression. That's a
> significant reason why this format is slower than the old format. I also
> think bundle2 overhead is coming into play on the server. I need to profile
> once compression is removed from the equation. I'm pretty sure this change
> can be done during freeze since it requires the server to drop compression
> for this bundle2 response. But I'm not 100% about that given how
> compression is negotiated between client and server. Would prefer this
> change make it in before RC.
>
> The new "stream" bundle2 part has a "version" parameter. I don't like
> versioning bundle2 parts. I think the behavior of bundle2 parts should be
> consistent over time. If we need to introduce a new version of something or
> change processing semantics, we should be introducing a new bundle2 part
> and associate new/different behavior with the part. Attempting content
> negotiation within a named bundle2 part is extra complication and will lead
> to increased protocol chaos down the road. This should be changed before RC
> since it impacts the wire protocol.
>

First, I should add that I'm working on patches to fix at least the
higher-priority issues. I should have them sent in an hour or two.

Related to the bundle2 part and capabilities exchange, the existing code
isn't future compatibility because the client doesn't currently advertise
its support for stream versions to the server unless a config option is
set. Right now, it must be implied that if a client sends stream=1 to
getbundle without sending a client capability, they support at most the
"stream" part with "version=v2". The client should always advertise its
part/version support so a future server knows which version to send back to
the client. It isn't critical this be addressed before RC. But leaving it
as is makes the protocol less explicit and more difficult to reason about.
I'll attempt a patch.


>
> Requirements string in bundle2 part should ideally use the same
> pre-encoded normalization as bundlespecs for consistency with the rest of
> Mercurial. Since it touches the wire protocol, this needs to change before
> RC.
>
> *Medium priority issues* (ideally addressed before RC)
>
> I don't think there's an easy way to generate bundle2 bundles with stream
> parts. We need this to support clonebundles. This will likely require
> bundlespec changes. I would highly prefer to get this in before RC so we
> can do end-to-end integration testing with clonebundles.
>
> *Lesser priority issues* (can be fixed post RC)
>
> streamclone._emit() should be renamed to _emitv2() for consistency with
> other functions in file.
>
> I think streamclone.applybundlev2() should handle writing requirements,
> not the bundle2 part handler. Similar logic is already in streamclone.
> maybeperformlegacystreamclone().
>
> I need to look at this in more detail, but 133a678673cb ("allow bundle'2
> stream clone") is a bit suspect. It's correctness may depend on your
> interpretation of whether server.disablefullbundle applies only to
> changegroup bundles. I can't recall what existing behavior is with regards
> to stream clone. Need to double check.
>
> test-clone-uncompressed.t wants a comment about legacy stream clone phases
> being buggy. We should also decide what we want to do about issue #5648
> since bundle2 stream clone solves the problem. I'm tempted to mark #5648 as
> resolved once bundle2 stream clone is enabled by default.
>
> In 56c30b31afbe:
> @@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
>          kwargs['cg'] = False
>          kwargs['stream'] = True
>          pullop.stepsdone.add('changegroup')
> +        pullop.stepsdone.add('phases')
>
> Do we know for sure that phases were processed? It depends on what the
> server sent, no? I think this code may assume the behavior of the official
> Mercurial server and may not be compatible with other implementation.
>
> The use of temporary files for non-append-only files and the context
> manager and copy primitive associated with that code feels overly complex.
> Can we just slurp file content into memory? We already store the filenames
> and offsets in memory. We already scale memory use linearly with number of
> files in the repo. I don't see a huge problem doing the same for files that
> are proportional to number of changesets, branches, tags, and heads. That
> being said, the code is already landed. Some maybe we just keep it around.
> If we do keep it around, I think it could use some refactoring to improve
> readability. The amount of functions being called is a bit much for what
> the code is doing, IMO.
>
> The code around managing multiple vfs instances feels a bit complex. I
> don't think we need a dict of vfs instances. We only have 2. Let's inline
> their usage and do away with all the context manager and dictionary lookup
> complications.
>
> Related to vfs, we're not using the "expectedcount" argument with
> vfs.backgroundclose(). Ideally, the part header defines the number of files
> in each vfs so we can pass the proper value. But, I think it's fine to use
> the total expected file count as the hint to the store vfs and don't use
> background close for the cache vfs (since it will almost certainly never
> have enough entries to warrant background closing).
>
> I'm pretty happy with the feature. It is currently behind an experimental
> config flag. I think we can make it enabled by default in 4.5 if we fix the
> more important issues.
>
> *Other observations *
>
> The new format seems to use less CPU on the client than the old format
> when doing HTTP exchange. 30-31s versus 25-26s for mozilla-unified. And
> that's with zstd in play. I haven't profiled, but this is likely due to not
> using readline() and transferring caches and phases data (I believe we
> avoid a linear changelog scan post transfer now since I believe something
> before needed to reconstruct the branch cache).
>
> Here are some example numbers on my i7-6700K.
>
> legacy http
> server:
> 8.44user 3.51system 0:51.21elapsed 23%CPU (0avgtext+0avgdata
> 171184maxresident)k
> 1144inputs+0outputs (1major+73642minor)pagefaults 0swaps
>
> client:
> 30.79user 8.47system 0:42.47elapsed 92%CPU (0avgtext+0avgdata
> 197244maxresident)k
> 19288inputs+7039984outputs (2major+94838minor)pagefaults 0swaps
>
> v2 http
> server:
> 37.69user 8.01system 1:56.61elapsed 39%CPU (0avgtext+0avgdata
> 236808maxresident)k
> 6836840inputs+21288outputs (2major+82612minor)pagefaults 0swaps
>
> client:
> 25.82user 8.32system 1:47.83elapsed 31%CPU (0avgtext+0avgdata
> 209164maxresident)k
> 60848inputs+7061216outputs (2major+78863minor)pagefaults 0swaps
>
>
>>
>> This function is used in multiple place, having it in util would be
>> better.
>> (existing caller will be migrated in another series)
>>
>> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
>> --- a/mercurial/changegroup.py
>> +++ b/mercurial/changegroup.py
>> @@ -32,14 +32,7 @@ from . import (
>>  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
>>  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
>>
>> -def readexactly(stream, n):
>> -    '''read n bytes from stream.read and abort if less was available'''
>> -    s = stream.read(n)
>> -    if len(s) < n:
>> -        raise error.Abort(_("stream ended unexpectedly"
>> -                           " (got %d bytes, expected %d)")
>> -                          % (len(s), n))
>> -    return s
>> +readexactly = util.readexactly
>>
>>  def getchunk(stream):
>>      """return the next chunk from stream as a string"""
>> diff --git a/mercurial/util.py b/mercurial/util.py
>> --- a/mercurial/util.py
>> +++ b/mercurial/util.py
>> @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx, others=None):
>>          fn = '%s~%s~%s' % (f, tag, n)
>>          if fn not in ctx and fn not in others:
>>              return fn
>> +
>> +def readexactly(stream, n):
>> +    '''read n bytes from stream.read and abort if less was available'''
>> +    s = stream.read(n)
>> +    if len(s) < n:
>> +        raise error.Abort(_("stream ended unexpectedly"
>> +                           " (got %d bytes, expected %d)")
>> +                          % (len(s), n))
>> +    return s
>>
>
>
Augie Fackler - Jan. 20, 2018, 11:48 p.m.
> On Jan 20, 2018, at 4:01 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> 
> On Sat, Jan 20, 2018 at 12:22 PM, Gregory Szorc <gregory.szorc@gmail.com <mailto:gregory.szorc@gmail.com>> wrote:
> On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld@octobus.net <mailto:boris.feld@octobus.net>> wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net <mailto:boris.feld@octobus.net>>
> # Date 1516391495 -3600
> #      Fri Jan 19 20:51:35 2018 +0100
> # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
> # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
> # EXP-Topic b2-stream
> # Available At https://bitbucket.org/octobus/mercurial-devel/ <https://bitbucket.org/octobus/mercurial-devel/>
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ <https://bitbucket.org/octobus/mercurial-devel/> -r 15f7795f96a5
> util: move 'readexactly' in the util module
> 
> I was swamped the last few days and didn't have time to look at this series even though I really wanted to. I reviewed it this morning and am generally happy with it. However, I'd like to make some BC changes to the wire protocol so we don't ship things in the wire protocol that we'll need to support indefinitely. A number of other changes to make this a more useful feature can be done post RC, during freeze.
> 
> *High priority issues* (should be addressed before RC)
> 
> HTTP server is sending bundle2 response with compression. That's a significant reason why this format is slower than the old format. I also think bundle2 overhead is coming into play on the server. I need to profile once compression is removed from the equation. I'm pretty sure this change can be done during freeze since it requires the server to drop compression for this bundle2 response. But I'm not 100% about that given how compression is negotiated between client and server. Would prefer this change make it in before RC.
> 
> The new "stream" bundle2 part has a "version" parameter. I don't like versioning bundle2 parts. I think the behavior of bundle2 parts should be consistent over time. If we need to introduce a new version of something or change processing semantics, we should be introducing a new bundle2 part and associate new/different behavior with the part. Attempting content negotiation within a named bundle2 part is extra complication and will lead to increased protocol chaos down the road. This should be changed before RC since it impacts the wire protocol.
> 
> First, I should add that I'm working on patches to fix at least the higher-priority issues. I should have them sent in an hour or two.

I don’t think we need to worry /too/ hard about this, as it’s still an experimental feature and subject to change. I agree with your points tho, so if you can find time to get them done that’d be great.

(Maybe I’m being too laid back about this. It wouldn’t be the first time.)

> 
> Related to the bundle2 part and capabilities exchange, the existing code isn't future compatibility because the client doesn't currently advertise its support for stream versions to the server unless a config option is set. Right now, it must be implied that if a client sends stream=1 to getbundle without sending a client capability, they support at most the "stream" part with "version=v2". The client should always advertise its part/version support so a future server knows which version to send back to the client. It isn't critical this be addressed before RC. But leaving it as is makes the protocol less explicit and more difficult to reason about. I'll attempt a patch.
>  
> 
> Requirements string in bundle2 part should ideally use the same pre-encoded normalization as bundlespecs for consistency with the rest of Mercurial. Since it touches the wire protocol, this needs to change before RC.
> 
> *Medium priority issues* (ideally addressed before RC)
> 
> I don't think there's an easy way to generate bundle2 bundles with stream parts. We need this to support clonebundles. This will likely require bundlespec changes. I would highly prefer to get this in before RC so we can do end-to-end integration testing with clonebundles.
> 
> *Lesser priority issues* (can be fixed post RC)
> 
> streamclone._emit() should be renamed to _emitv2() for consistency with other functions in file.
> 
> I think streamclone.applybundlev2() should handle writing requirements, not the bundle2 part handler. Similar logic is already in streamclone.maybeperformlegacystreamclone().
> 
> I need to look at this in more detail, but 133a678673cb ("allow bundle'2 stream clone") is a bit suspect. It's correctness may depend on your interpretation of whether server.disablefullbundle applies only to changegroup bundles. I can't recall what existing behavior is with regards to stream clone. Need to double check.
> 
> test-clone-uncompressed.t wants a comment about legacy stream clone phases being buggy. We should also decide what we want to do about issue #5648 since bundle2 stream clone solves the problem. I'm tempted to mark #5648 as resolved once bundle2 stream clone is enabled by default.
> 
> In 56c30b31afbe:
> @@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
>          kwargs['cg'] = False
>          kwargs['stream'] = True
>          pullop.stepsdone.add('changegroup')
> +        pullop.stepsdone.add('phases')
> 
> Do we know for sure that phases were processed? It depends on what the server sent, no? I think this code may assume the behavior of the official Mercurial server and may not be compatible with other implementation.
> 
> The use of temporary files for non-append-only files and the context manager and copy primitive associated with that code feels overly complex. Can we just slurp file content into memory? We already store the filenames and offsets in memory. We already scale memory use linearly with number of files in the repo. I don't see a huge problem doing the same for files that are proportional to number of changesets, branches, tags, and heads. That being said, the code is already landed. Some maybe we just keep it around. If we do keep it around, I think it could use some refactoring to improve readability. The amount of functions being called is a bit much for what the code is doing, IMO.
> 
> The code around managing multiple vfs instances feels a bit complex. I don't think we need a dict of vfs instances. We only have 2. Let's inline their usage and do away with all the context manager and dictionary lookup complications.
> 
> Related to vfs, we're not using the "expectedcount" argument with vfs.backgroundclose(). Ideally, the part header defines the number of files in each vfs so we can pass the proper value. But, I think it's fine to use the total expected file count as the hint to the store vfs and don't use background close for the cache vfs (since it will almost certainly never have enough entries to warrant background closing).
> 
> I'm pretty happy with the feature. It is currently behind an experimental config flag. I think we can make it enabled by default in 4.5 if we fix the more important issues.
> 
> *Other observations *
> 
> The new format seems to use less CPU on the client than the old format when doing HTTP exchange. 30-31s versus 25-26s for mozilla-unified. And that's with zstd in play. I haven't profiled, but this is likely due to not using readline() and transferring caches and phases data (I believe we avoid a linear changelog scan post transfer now since I believe something before needed to reconstruct the branch cache).
> 
> Here are some example numbers on my i7-6700K.
> 
> legacy http
> server:
> 8.44user 3.51system 0:51.21elapsed 23%CPU (0avgtext+0avgdata 171184maxresident)k
> 1144inputs+0outputs (1major+73642minor)pagefaults 0swaps
> 
> client:
> 30.79user 8.47system 0:42.47elapsed 92%CPU (0avgtext+0avgdata 197244maxresident)k
> 19288inputs+7039984outputs (2major+94838minor)pagefaults 0swaps
> 
> v2 http
> server:
> 37.69user 8.01system 1:56.61elapsed 39%CPU (0avgtext+0avgdata 236808maxresident)k
> 6836840inputs+21288outputs (2major+82612minor)pagefaults 0swaps
> 
> client:
> 25.82user 8.32system 1:47.83elapsed 31%CPU (0avgtext+0avgdata 209164maxresident)k
> 60848inputs+7061216outputs (2major+78863minor)pagefaults 0swaps
>  
> 
> This function is used in multiple place, having it in util would be better.
> (existing caller will be migrated in another series)
> 
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -32,14 +32,7 @@ from . import (
>  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
>  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
> 
> -def readexactly(stream, n):
> -    '''read n bytes from stream.read and abort if less was available'''
> -    s = stream.read(n)
> -    if len(s) < n:
> -        raise error.Abort(_("stream ended unexpectedly"
> -                           " (got %d bytes, expected %d)")
> -                          % (len(s), n))
> -    return s
> +readexactly = util.readexactly
> 
>  def getchunk(stream):
>      """return the next chunk from stream as a string"""
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx, others=None):
>          fn = '%s~%s~%s' % (f, tag, n)
>          if fn not in ctx and fn not in others:
>              return fn
> +
> +def readexactly(stream, n):
> +    '''read n bytes from stream.read and abort if less was available'''
> +    s = stream.read(n)
> +    if len(s) < n:
> +        raise error.Abort(_("stream ended unexpectedly"
> +                           " (got %d bytes, expected %d)")
> +                          % (len(s), n))
> +    return s
Gregory Szorc - Jan. 21, 2018, 12:37 a.m.
On Sat, Jan 20, 2018 at 3:48 PM, Augie Fackler <raf@durin42.com> wrote:

>
> On Jan 20, 2018, at 4:01 PM, Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
>
> On Sat, Jan 20, 2018 at 12:22 PM, Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
>
>> On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld@octobus.net>
>> wrote:
>>
>>> # HG changeset patch
>>> # User Boris Feld <boris.feld@octobus.net>
>>> # Date 1516391495 -3600
>>> #      Fri Jan 19 20:51:35 2018 +0100
>>> # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
>>> # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
>>> # EXP-Topic b2-stream
>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
>>> 15f7795f96a5
>>> util: move 'readexactly' in the util module
>>>
>>
>> I was swamped the last few days and didn't have time to look at this
>> series even though I really wanted to. I reviewed it this morning and am
>> generally happy with it. However, I'd like to make some BC changes to the
>> wire protocol so we don't ship things in the wire protocol that we'll need
>> to support indefinitely. A number of other changes to make this a more
>> useful feature can be done post RC, during freeze.
>>
>> *High priority issues* (should be addressed before RC)
>>
>> HTTP server is sending bundle2 response with compression. That's a
>> significant reason why this format is slower than the old format. I also
>> think bundle2 overhead is coming into play on the server. I need to profile
>> once compression is removed from the equation. I'm pretty sure this change
>> can be done during freeze since it requires the server to drop compression
>> for this bundle2 response. But I'm not 100% about that given how
>> compression is negotiated between client and server. Would prefer this
>> change make it in before RC.
>>
>> The new "stream" bundle2 part has a "version" parameter. I don't like
>> versioning bundle2 parts. I think the behavior of bundle2 parts should be
>> consistent over time. If we need to introduce a new version of something or
>> change processing semantics, we should be introducing a new bundle2 part
>> and associate new/different behavior with the part. Attempting content
>> negotiation within a named bundle2 part is extra complication and will lead
>> to increased protocol chaos down the road. This should be changed before RC
>> since it impacts the wire protocol.
>>
>
> First, I should add that I'm working on patches to fix at least the
> higher-priority issues. I should have them sent in an hour or two.
>
>
> I don’t think we need to worry /too/ hard about this, as it’s still an
> experimental feature and subject to change. I agree with your points tho,
> so if you can find time to get them done that’d be great.
>
> (Maybe I’m being too laid back about this. It wouldn’t be the first time.)
>

Once the wire protocol is out of the bag, you can't get it back. Our
choices are either to fix things now, or change the feature so the client
can't do a bundle2 stream clone without enabling the feature. And then
we'll want to make an intentional BC break that prevents old client/server
from talking to new client/server, otherwise we're in a bit of a pickle if
we make a BC change.

Right now the feature is controlled by servers. So if a server enables it,
clients will receive the data. That's arguably wrong...


>
>
> Related to the bundle2 part and capabilities exchange, the existing code
> isn't future compatibility because the client doesn't currently advertise
> its support for stream versions to the server unless a config option is
> set. Right now, it must be implied that if a client sends stream=1 to
> getbundle without sending a client capability, they support at most the
> "stream" part with "version=v2". The client should always advertise its
> part/version support so a future server knows which version to send back to
> the client. It isn't critical this be addressed before RC. But leaving it
> as is makes the protocol less explicit and more difficult to reason about.
> I'll attempt a patch.
>
>
>>
>> Requirements string in bundle2 part should ideally use the same
>> pre-encoded normalization as bundlespecs for consistency with the rest of
>> Mercurial. Since it touches the wire protocol, this needs to change before
>> RC.
>>
>> *Medium priority issues* (ideally addressed before RC)
>>
>> I don't think there's an easy way to generate bundle2 bundles with stream
>> parts. We need this to support clonebundles. This will likely require
>> bundlespec changes. I would highly prefer to get this in before RC so we
>> can do end-to-end integration testing with clonebundles.
>>
>> *Lesser priority issues* (can be fixed post RC)
>>
>> streamclone._emit() should be renamed to _emitv2() for consistency with
>> other functions in file.
>>
>> I think streamclone.applybundlev2() should handle writing requirements,
>> not the bundle2 part handler. Similar logic is already in
>> streamclone.maybeperformlegacystreamclone().
>>
>> I need to look at this in more detail, but 133a678673cb ("allow bundle'2
>> stream clone") is a bit suspect. It's correctness may depend on your
>> interpretation of whether server.disablefullbundle applies only to
>> changegroup bundles. I can't recall what existing behavior is with regards
>> to stream clone. Need to double check.
>>
>> test-clone-uncompressed.t wants a comment about legacy stream clone
>> phases being buggy. We should also decide what we want to do about issue
>> #5648 since bundle2 stream clone solves the problem. I'm tempted to mark
>> #5648 as resolved once bundle2 stream clone is enabled by default.
>>
>> In 56c30b31afbe:
>> @@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
>>          kwargs['cg'] = False
>>          kwargs['stream'] = True
>>          pullop.stepsdone.add('changegroup')
>> +        pullop.stepsdone.add('phases')
>>
>> Do we know for sure that phases were processed? It depends on what the
>> server sent, no? I think this code may assume the behavior of the official
>> Mercurial server and may not be compatible with other implementation.
>>
>> The use of temporary files for non-append-only files and the context
>> manager and copy primitive associated with that code feels overly complex.
>> Can we just slurp file content into memory? We already store the filenames
>> and offsets in memory. We already scale memory use linearly with number of
>> files in the repo. I don't see a huge problem doing the same for files that
>> are proportional to number of changesets, branches, tags, and heads. That
>> being said, the code is already landed. Some maybe we just keep it around.
>> If we do keep it around, I think it could use some refactoring to improve
>> readability. The amount of functions being called is a bit much for what
>> the code is doing, IMO.
>>
>> The code around managing multiple vfs instances feels a bit complex. I
>> don't think we need a dict of vfs instances. We only have 2. Let's inline
>> their usage and do away with all the context manager and dictionary lookup
>> complications.
>>
>> Related to vfs, we're not using the "expectedcount" argument with
>> vfs.backgroundclose(). Ideally, the part header defines the number of files
>> in each vfs so we can pass the proper value. But, I think it's fine to use
>> the total expected file count as the hint to the store vfs and don't use
>> background close for the cache vfs (since it will almost certainly never
>> have enough entries to warrant background closing).
>>
>> I'm pretty happy with the feature. It is currently behind an experimental
>> config flag. I think we can make it enabled by default in 4.5 if we fix the
>> more important issues.
>>
>> *Other observations *
>>
>> The new format seems to use less CPU on the client than the old format
>> when doing HTTP exchange. 30-31s versus 25-26s for mozilla-unified. And
>> that's with zstd in play. I haven't profiled, but this is likely due to not
>> using readline() and transferring caches and phases data (I believe we
>> avoid a linear changelog scan post transfer now since I believe something
>> before needed to reconstruct the branch cache).
>>
>> Here are some example numbers on my i7-6700K.
>>
>> legacy http
>> server:
>> 8.44user 3.51system 0:51.21elapsed 23%CPU (0avgtext+0avgdata
>> 171184maxresident)k
>> 1144inputs+0outputs (1major+73642minor)pagefaults 0swaps
>>
>> client:
>> 30.79user 8.47system 0:42.47elapsed 92%CPU (0avgtext+0avgdata
>> 197244maxresident)k
>> 19288inputs+7039984outputs (2major+94838minor)pagefaults 0swaps
>>
>> v2 http
>> server:
>> 37.69user 8.01system 1:56.61elapsed 39%CPU (0avgtext+0avgdata
>> 236808maxresident)k
>> 6836840inputs+21288outputs (2major+82612minor)pagefaults 0swaps
>>
>> client:
>> 25.82user 8.32system 1:47.83elapsed 31%CPU (0avgtext+0avgdata
>> 209164maxresident)k
>> 60848inputs+7061216outputs (2major+78863minor)pagefaults 0swaps
>>
>>
>>>
>>> This function is used in multiple place, having it in util would be
>>> better.
>>> (existing caller will be migrated in another series)
>>>
>>> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
>>> --- a/mercurial/changegroup.py
>>> +++ b/mercurial/changegroup.py
>>> @@ -32,14 +32,7 @@ from . import (
>>>  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
>>>  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
>>>
>>> -def readexactly(stream, n):
>>> -    '''read n bytes from stream.read and abort if less was available'''
>>> -    s = stream.read(n)
>>> -    if len(s) < n:
>>> -        raise error.Abort(_("stream ended unexpectedly"
>>> -                           " (got %d bytes, expected %d)")
>>> -                          % (len(s), n))
>>> -    return s
>>> +readexactly = util.readexactly
>>>
>>>  def getchunk(stream):
>>>      """return the next chunk from stream as a string"""
>>> diff --git a/mercurial/util.py b/mercurial/util.py
>>> --- a/mercurial/util.py
>>> +++ b/mercurial/util.py
>>> @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx, others=None):
>>>          fn = '%s~%s~%s' % (f, tag, n)
>>>          if fn not in ctx and fn not in others:
>>>              return fn
>>> +
>>> +def readexactly(stream, n):
>>> +    '''read n bytes from stream.read and abort if less was available'''
>>> +    s = stream.read(n)
>>> +    if len(s) < n:
>>> +        raise error.Abort(_("stream ended unexpectedly"
>>> +                           " (got %d bytes, expected %d)")
>>> +                          % (len(s), n))
>>> +    return s
>>>
>>
>
Augie Fackler - Feb. 1, 2018, 8:06 p.m.
> On Jan 31, 2018, at 11:57, Boris Feld <boris.feld@octobus.net> wrote:
> 
> On Tue, 2018-01-30 at 17:02 -0500, Augie Fackler wrote:
>> 
>> 
>>> On Jan 30, 2018, at 17:00, Boris Feld <boris.feld@octobus.net> wrote:
>>> After chatting on IRC with Gregory about this, here is a detailed list of proposals:
>>> 
>>> - The `X-v2` bundlespec should be stable in time. An old Mercurial client shouldn't choke on a bundle 2 if there is a new unknown part. Adding the phases and obsolescence parts using experimental config knobs violate this guarantee.
>>> => Proposal: Bundle announced with `X-v2` bundlespec will always be bundle 2 without phases, bookmarks or obsolescence parts. `hg bundle --type=v2` would generate such bundles.
>>> 
>>> - With the recent addition of new bundle 2 parts, we should introduce a new bundle-spec which will imply that the bundle contains these recent parts.
>>> => Proposal: Introduce a new bundlespec format `X-v3`. It will be for bundle 2 containing a changegroup part in addition to a bookmarks, phase, and obsolescence parts. It will be generated with `hg bundle --type=v3`.
>>> => Proposal: As an edge-case, people may want to generate a v3 bundle without a specific part. A variant could be introduced, like `zstd-v3;no-bookmarks`.
>>> 
>>> - The old stream clone bundle use the `none-packed1` bundlespec. Instead of using a dedicated version, we should use the same bundle2 versions; v2 and v3. Also, it uses a dedicated debug command, we should instead use the bundle command which is the best-suited to handle that task.
>>> => Proposal: Introduce a new bundlespec format `none-v2;stream=v2;requirements..` for the stream clone v2. This will guarantee a stream part but the bundle will not contain the bookmarks, phase and obsolescence parts. It will be generated with `hg bundle --type=none-v2;stream=v2`.
>>> => Proposal: Introduce a new bundlespec format `none-v3;stream=v2;requirements..` for the stream clone v2. This will guarantee a stream part in addition to a bookmarks, phase, and obsolescence parts. It will be generated with `hg bundle --type=none-v3;stream=v2`.
>>> 
>>> Carrying these guarantees in the bundlespec will likely require some changes in the `parsebundlespec` API as it would likely need to return the `contentops` based on the bundlespec instead of computing the contentops in the bundle command.
>>> 
>>> Refactoring the bundlespec parsing for the `none-v2;stream=v2;` could be done before agreeing on the level of guarantees the `X-v3` bundlespec gives, so a possible two-steps plan is:
>>> 
>>> - Refactor and add the possibility to generate stream v2 clone bundle with the `none-v2;stream=v2;` bundlespec and the bundle command.
>>> - Introduce the `X-v3` bundlespecs.
>>> 
>>> I think we should be able to target the 4.5 release for the first step, I can have a series ready for tomorrow.
>> 
>> Are we really proposing significant feature work for the release that happens in 48 hours? After the RC?
>> 
>> I can't say I'm enthused, but I could be convinced if Greg is on board.
> 
> I can understand your frustration, we tried to process each of the comments made by Gregory on the original series. His feedback has been precious to improve the original series, and we tried to address or answer each comment as soon as possible. The bundlespec refactoring surely slowed the process a bit but it is a good improvement to make the clone bundle feature less hidden and more mainstream.

I'm glad we got this sorted with minimal effort for me, but this kind of last-minute can't-miss change is exactly the kind of thing I was afraid of when we first landed this work.

Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -32,14 +32,7 @@  from . import (
 _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
 _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
 
-def readexactly(stream, n):
-    '''read n bytes from stream.read and abort if less was available'''
-    s = stream.read(n)
-    if len(s) < n:
-        raise error.Abort(_("stream ended unexpectedly"
-                           " (got %d bytes, expected %d)")
-                          % (len(s), n))
-    return s
+readexactly = util.readexactly
 
 def getchunk(stream):
     """return the next chunk from stream as a string"""
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -3865,3 +3865,12 @@  def safename(f, tag, ctx, others=None):
         fn = '%s~%s~%s' % (f, tag, n)
         if fn not in ctx and fn not in others:
             return fn
+
+def readexactly(stream, n):
+    '''read n bytes from stream.read and abort if less was available'''
+    s = stream.read(n)
+    if len(s) < n:
+        raise error.Abort(_("stream ended unexpectedly"
+                           " (got %d bytes, expected %d)")
+                          % (len(s), n))
+    return s