Patchwork [3,of,3] listkeypattern: add listkeypattern wireproto method

login
register
mail settings
Submitter Stanislau Hlebik
Date Aug. 12, 2016, 12:09 p.m.
Message ID <c2ee493e216c60ff439a.1471003750@dev1918.lla1.facebook.com>
Download mbox | patch
Permalink /patch/16261/
State Changes Requested
Headers show

Comments

Stanislau Hlebik - Aug. 12, 2016, 12:09 p.m.
# HG changeset patch
# User Stanislau Hlebik <stash@fb.com>
# Date 1470999441 25200
#      Fri Aug 12 03:57:21 2016 -0700
# Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
# Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
listkeypattern: add listkeypattern wireproto method

wireproto method to list remote keys by pattern
Gregory Szorc - Aug. 14, 2016, 5:17 p.m.
On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com> wrote:

> # HG changeset patch
> # User Stanislau Hlebik <stash@fb.com>
> # Date 1470999441 25200
> #      Fri Aug 12 03:57:21 2016 -0700
> # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
> # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
> listkeypattern: add listkeypattern wireproto method
>
> wireproto method to list remote keys by pattern
>
> diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
> --- a/mercurial/wireproto.py
> +++ b/mercurial/wireproto.py
> @@ -353,6 +353,22 @@
>                        % (namespace, len(d)))
>          yield pushkeymod.decodekeys(d)
>
> +    @batchable
> +    def listkeypattern(self, namespace, patterns):
> +        if not self.capable('pushkey'):
> +            yield {}, None
> +        f = future()
> +        self.ui.debug('preparing listkeys for "%s" with pattern "%s"\n' %
> +                      (namespace, patterns))
> +        yield {
> +            'namespace': encoding.fromlocal(namespace),
> +            'patterns': encodelist(patterns)
> +        }, f
> +        d = f.value
> +        self.ui.debug('received listkey for "%s": %i bytes\n'
> +                      % (namespace, len(d)))
> +        yield pushkeymod.decodekeys(d)
> +
>      def stream_out(self):
>          return self._callstream('stream_out')
>
> @@ -676,7 +692,8 @@
>      return repo.opener.tryread('clonebundles.manifest')
>
>  wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap', 'pushkey',
> -                 'known', 'getbundle', 'unbundlehash', 'batch']
> +                 'known', 'getbundle', 'unbundlehash', 'batch',
> +                 'listkeypattern']
>
>  def _capabilities(repo, proto):
>      """return a list of capabilities for a repo
> @@ -791,6 +808,12 @@
>      d = repo.listkeys(encoding.tolocal(namespace)).items()
>      return pushkeymod.encodekeys(d)
>
> +@wireprotocommand('listkeypattern', 'namespace patterns *')
>

Why the "*" here? "others" is not used in the function implementation.


> +def listkeypattern(repo, proto, namespace, patterns, others):
> +    patterns = decodelist(patterns)
> +    d = repo.listkeys(encoding.tolocal(namespace),
> patterns=patterns).items()
> +    return pushkeymod.encodekeys(d)
> +
>  @wireprotocommand('lookup', 'key')
>  def lookup(repo, proto, key):
>      try:
>

I think introducing a new wire protocol command is the correct way to solve
this problem (as opposed to introducing a new argument on the existing
command).

However, if we're introducing a new wire protocol command for obtaining
pushkey values, I think we should improve deficiencies in the response
encoding rather than propagate its problems.

The "listkeys" response encoding can't transmit the full range of binary
values. This can lead to larger (and slower) responses sizes. For example,
a number of pushkey namespaces exchange lists of nodes. These have to be
represented as hex instead of binary. For pushkey namespaces like phases or
obsolescence that can exchange hundreds or thousands of nodes, the overhead
can add up.

I think the response from a new listkeys command should be using framing to
encode the key names and values so the full range of binary values can be
efficiently transferred. We may also want a special mechanism to represent
a list of nodes, as avoiding the overhead of framing on fixed width values
would be desirable.

Of course, at the point you introduce a new response encoding, we may want
to call the command "listkeys2." If others agree, I can code up an
implementation and you can add the patterns functionality on top of it.
Pierre-Yves David - Aug. 17, 2016, 12:22 a.m.
On 08/14/2016 07:17 PM, Gregory Szorc wrote:
> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com
> <mailto:stash@fb.com>> wrote:
>
>     # HG changeset patch
>     # User Stanislau Hlebik <stash@fb.com <mailto:stash@fb.com>>
>     # Date 1470999441 25200
>     #      Fri Aug 12 03:57:21 2016 -0700
>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>     listkeypattern: add listkeypattern wireproto method
>
>     wireproto method to list remote keys by pattern
>
>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>     --- a/mercurial/wireproto.py
>     +++ b/mercurial/wireproto.py
>     @@ -353,6 +353,22 @@
>                            % (namespace, len(d)))
>              yield pushkeymod.decodekeys(d)
>
>     +    @batchable
>     +    def listkeypattern(self, namespace, patterns):
>     +        if not self.capable('pushkey'):
>     +            yield {}, None
>     +        f = future()
>     +        self.ui.debug('preparing listkeys for "%s" with pattern
>     "%s"\n' %
>     +                      (namespace, patterns))
>     +        yield {
>     +            'namespace': encoding.fromlocal(namespace),
>     +            'patterns': encodelist(patterns)
>     +        }, f
>     +        d = f.value
>     +        self.ui.debug('received listkey for "%s": %i bytes\n'
>     +                      % (namespace, len(d)))
>     +        yield pushkeymod.decodekeys(d)
>     +
>          def stream_out(self):
>              return self._callstream('stream_out')
>
>     @@ -676,7 +692,8 @@
>          return repo.opener.tryread('clonebundles.manifest')
>
>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap', 'pushkey',
>     -                 'known', 'getbundle', 'unbundlehash', 'batch']
>     +                 'known', 'getbundle', 'unbundlehash', 'batch',
>     +                 'listkeypattern']
>
>      def _capabilities(repo, proto):
>          """return a list of capabilities for a repo
>     @@ -791,6 +808,12 @@
>          d = repo.listkeys(encoding.tolocal(namespace)).items()
>          return pushkeymod.encodekeys(d)
>
>     +@wireprotocommand('listkeypattern', 'namespace patterns *')
>
>
> Why the "*" here? "others" is not used in the function implementation.
>
>
>     +def listkeypattern(repo, proto, namespace, patterns, others):
>     +    patterns = decodelist(patterns)
>     +    d = repo.listkeys(encoding.tolocal(namespace),
>     patterns=patterns).items()
>     +    return pushkeymod.encodekeys(d)
>     +
>      @wireprotocommand('lookup', 'key')
>      def lookup(repo, proto, key):
>          try:
>
>
> I think introducing a new wire protocol command is the correct way to
> solve this problem (as opposed to introducing a new argument on the
> existing command).
>
> However, if we're introducing a new wire protocol command for obtaining
> pushkey values, I think we should improve deficiencies in the response
> encoding rather than propagate its problems.
>
> The "listkeys" response encoding can't transmit the full range of binary
> values. This can lead to larger (and slower) responses sizes. For
> example, a number of pushkey namespaces exchange lists of nodes. These
> have to be represented as hex instead of binary. For pushkey namespaces
> like phases or obsolescence that can exchange hundreds or thousands of
> nodes, the overhead can add up.
>
> I think the response from a new listkeys command should be using framing
> to encode the key names and values so the full range of binary values
> can be efficiently transferred. We may also want a special mechanism to
> represent a list of nodes, as avoiding the overhead of framing on fixed
> width values would be desirable.
>
> Of course, at the point you introduce a new response encoding, we may
> want to call the command "listkeys2." If others agree, I can code up an
> implementation and you can add the patterns functionality on top of it.

Sorry to be a bit late to the discussion.

I don't think we should introduce a new wire-protocol command for this.

Individual listkey call have been a large source of race condition and 
related issue. Bundle2 is able to carry listkey.pushkey call just fine 
and I think we should prioritize its usage. As bundle2 already have 
framing, we could just use your better encoding with bundle2 directly.
However, we should probably push things further and use dedicated part 
for commonly used request. Having dedicated type will help more semantic 
reply and request. The series we are commenting on is a good example of 
that need for "pattern" here pretty much only apply to bookmark, having 
a dedicated "channel" (within bundle2) for this would make is painless 
to add any new arguments we could need.

TL;DR; I don't think we should touch listkey.pushkey. add a bundle2 part 
dedicated to bookmark instead.

Cheers
Gregory Szorc - Aug. 17, 2016, 1:39 a.m.
On Tue, Aug 16, 2016 at 5:22 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>
>> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com
>> <mailto:stash@fb.com>> wrote:
>>
>>     # HG changeset patch
>>     # User Stanislau Hlebik <stash@fb.com <mailto:stash@fb.com>>
>>
>>     # Date 1470999441 25200
>>     #      Fri Aug 12 03:57:21 2016 -0700
>>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>>     listkeypattern: add listkeypattern wireproto method
>>
>>     wireproto method to list remote keys by pattern
>>
>>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>>     --- a/mercurial/wireproto.py
>>     +++ b/mercurial/wireproto.py
>>     @@ -353,6 +353,22 @@
>>                            % (namespace, len(d)))
>>              yield pushkeymod.decodekeys(d)
>>
>>     +    @batchable
>>     +    def listkeypattern(self, namespace, patterns):
>>     +        if not self.capable('pushkey'):
>>     +            yield {}, None
>>     +        f = future()
>>     +        self.ui.debug('preparing listkeys for "%s" with pattern
>>     "%s"\n' %
>>     +                      (namespace, patterns))
>>     +        yield {
>>     +            'namespace': encoding.fromlocal(namespace),
>>     +            'patterns': encodelist(patterns)
>>     +        }, f
>>     +        d = f.value
>>     +        self.ui.debug('received listkey for "%s": %i bytes\n'
>>     +                      % (namespace, len(d)))
>>     +        yield pushkeymod.decodekeys(d)
>>     +
>>          def stream_out(self):
>>              return self._callstream('stream_out')
>>
>>     @@ -676,7 +692,8 @@
>>          return repo.opener.tryread('clonebundles.manifest')
>>
>>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap',
>> 'pushkey',
>>     -                 'known', 'getbundle', 'unbundlehash', 'batch']
>>     +                 'known', 'getbundle', 'unbundlehash', 'batch',
>>     +                 'listkeypattern']
>>
>>      def _capabilities(repo, proto):
>>          """return a list of capabilities for a repo
>>     @@ -791,6 +808,12 @@
>>          d = repo.listkeys(encoding.tolocal(namespace)).items()
>>          return pushkeymod.encodekeys(d)
>>
>>     +@wireprotocommand('listkeypattern', 'namespace patterns *')
>>
>>
>> Why the "*" here? "others" is not used in the function implementation.
>>
>>
>>     +def listkeypattern(repo, proto, namespace, patterns, others):
>>     +    patterns = decodelist(patterns)
>>     +    d = repo.listkeys(encoding.tolocal(namespace),
>>     patterns=patterns).items()
>>     +    return pushkeymod.encodekeys(d)
>>     +
>>      @wireprotocommand('lookup', 'key')
>>      def lookup(repo, proto, key):
>>          try:
>>
>>
>> I think introducing a new wire protocol command is the correct way to
>> solve this problem (as opposed to introducing a new argument on the
>> existing command).
>>
>> However, if we're introducing a new wire protocol command for obtaining
>> pushkey values, I think we should improve deficiencies in the response
>> encoding rather than propagate its problems.
>>
>> The "listkeys" response encoding can't transmit the full range of binary
>> values. This can lead to larger (and slower) responses sizes. For
>> example, a number of pushkey namespaces exchange lists of nodes. These
>> have to be represented as hex instead of binary. For pushkey namespaces
>> like phases or obsolescence that can exchange hundreds or thousands of
>> nodes, the overhead can add up.
>>
>> I think the response from a new listkeys command should be using framing
>> to encode the key names and values so the full range of binary values
>> can be efficiently transferred. We may also want a special mechanism to
>> represent a list of nodes, as avoiding the overhead of framing on fixed
>> width values would be desirable.
>>
>> Of course, at the point you introduce a new response encoding, we may
>> want to call the command "listkeys2." If others agree, I can code up an
>> implementation and you can add the patterns functionality on top of it.
>>
>
> Sorry to be a bit late to the discussion.
>
> I don't think we should introduce a new wire-protocol command for this.
>
> Individual listkey call have been a large source of race condition and
> related issue. Bundle2 is able to carry listkey.pushkey call just fine and
> I think we should prioritize its usage. As bundle2 already have framing, we
> could just use your better encoding with bundle2 directly.
> However, we should probably push things further and use dedicated part for
> commonly used request. Having dedicated type will help more semantic reply
> and request. The series we are commenting on is a good example of that need
> for "pattern" here pretty much only apply to bookmark, having a dedicated
> "channel" (within bundle2) for this would make is painless to add any new
> arguments we could need.
>
> TL;DR; I don't think we should touch listkey.pushkey. add a bundle2 part
> dedicated to bookmark instead.
>

I have serious objections to stuffing yet more functionality into the
"getbundle" wire protocol command. That command is quickly becoming a "god
object" [1]. I'd rather we freeze that wire protocol command and offer
equivalent functionality under new, better designed commands.

In general, I'm opposed to adding arguments to wire protocol commands.
Introducing an argument requires advertising a server capability otherwise
clients may send an unknown argument to an incompatible server, which the
server will reject. At the point you introduce a new argument/capability,
you've effectively introduced a new variation of the wire protocol command.
You might as well call it something else so the semantics of each wire
protocol are well-defined and constant over time. If nothing else, this
makes the client/server code easier to understand. I only need to point out
wireproto.getbundle() and wireproto.unbundle() for examples of overly
complicated code as a result of introducing wire protocol command arguments
(notably support for bundle2).

I'm not strongly opposed to the idea of making bundle2 a generic framing
protocol. I think we could do better. (I raised a number of concerns with
bundle2 during its implementation phase. In fact, one of them was that
bundle2 as a generic format for data in transport and at rest is not ideal:
there are various aspects you want to optimize for and one format cannot
exceed at all of them. Unfortunately, not much was done because bundle2 was
already kinda/sorta deployed in a few environments at that time. My fault
for taking too long to audit the protocol.) I think bundle2 is an OK'ish
format for data at rest. For a transport protocol, not so much.

If we do use bundle2 as a container for the wire protocol, that doesn't
mean we can't have multiple wire protocol commands for exchanging bundle2
payloads.

[1] https://en.wikipedia.org/wiki/God_object
Pierre-Yves David - Aug. 17, 2016, 12:05 p.m.
On 08/17/2016 03:39 AM, Gregory Szorc wrote:
> On Tue, Aug 16, 2016 at 5:22 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>
>         On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com
>         <mailto:stash@fb.com>
>         <mailto:stash@fb.com <mailto:stash@fb.com>>> wrote:
>
>             # HG changeset patch
>             # User Stanislau Hlebik <stash@fb.com <mailto:stash@fb.com>
>         <mailto:stash@fb.com <mailto:stash@fb.com>>>
>
>             # Date 1470999441 25200
>             #      Fri Aug 12 03:57:21 2016 -0700
>             # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>             # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>             listkeypattern: add listkeypattern wireproto method
>
>             wireproto method to list remote keys by pattern
>
>             diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>             --- a/mercurial/wireproto.py
>             +++ b/mercurial/wireproto.py
>             @@ -353,6 +353,22 @@
>                                    % (namespace, len(d)))
>                      yield pushkeymod.decodekeys(d)
>
>             +    @batchable
>             +    def listkeypattern(self, namespace, patterns):
>             +        if not self.capable('pushkey'):
>             +            yield {}, None
>             +        f = future()
>             +        self.ui.debug('preparing listkeys for "%s" with pattern
>             "%s"\n' %
>             +                      (namespace, patterns))
>             +        yield {
>             +            'namespace': encoding.fromlocal(namespace),
>             +            'patterns': encodelist(patterns)
>             +        }, f
>             +        d = f.value
>             +        self.ui.debug('received listkey for "%s": %i bytes\n'
>             +                      % (namespace, len(d)))
>             +        yield pushkeymod.decodekeys(d)
>             +
>                  def stream_out(self):
>                      return self._callstream('stream_out')
>
>             @@ -676,7 +692,8 @@
>                  return repo.opener.tryread('clonebundles.manifest')
>
>              wireprotocaps = ['lookup', 'changegroupsubset',
>         'branchmap', 'pushkey',
>             -                 'known', 'getbundle', 'unbundlehash', 'batch']
>             +                 'known', 'getbundle', 'unbundlehash', 'batch',
>             +                 'listkeypattern']
>
>              def _capabilities(repo, proto):
>                  """return a list of capabilities for a repo
>             @@ -791,6 +808,12 @@
>                  d = repo.listkeys(encoding.tolocal(namespace)).items()
>                  return pushkeymod.encodekeys(d)
>
>             +@wireprotocommand('listkeypattern', 'namespace patterns *')
>
>
>         Why the "*" here? "others" is not used in the function
>         implementation.
>
>
>             +def listkeypattern(repo, proto, namespace, patterns, others):
>             +    patterns = decodelist(patterns)
>             +    d = repo.listkeys(encoding.tolocal(namespace),
>             patterns=patterns).items()
>             +    return pushkeymod.encodekeys(d)
>             +
>              @wireprotocommand('lookup', 'key')
>              def lookup(repo, proto, key):
>                  try:
>
>
>         I think introducing a new wire protocol command is the correct
>         way to
>         solve this problem (as opposed to introducing a new argument on the
>         existing command).
>
>         However, if we're introducing a new wire protocol command for
>         obtaining
>         pushkey values, I think we should improve deficiencies in the
>         response
>         encoding rather than propagate its problems.
>
>         The "listkeys" response encoding can't transmit the full range
>         of binary
>         values. This can lead to larger (and slower) responses sizes. For
>         example, a number of pushkey namespaces exchange lists of nodes.
>         These
>         have to be represented as hex instead of binary. For pushkey
>         namespaces
>         like phases or obsolescence that can exchange hundreds or
>         thousands of
>         nodes, the overhead can add up.
>
>         I think the response from a new listkeys command should be using
>         framing
>         to encode the key names and values so the full range of binary
>         values
>         can be efficiently transferred. We may also want a special
>         mechanism to
>         represent a list of nodes, as avoiding the overhead of framing
>         on fixed
>         width values would be desirable.
>
>         Of course, at the point you introduce a new response encoding,
>         we may
>         want to call the command "listkeys2." If others agree, I can
>         code up an
>         implementation and you can add the patterns functionality on top
>         of it.
>
>
>     Sorry to be a bit late to the discussion.
>
>     I don't think we should introduce a new wire-protocol command for this.
>
>     Individual listkey call have been a large source of race condition
>     and related issue. Bundle2 is able to carry listkey.pushkey call
>     just fine and I think we should prioritize its usage. As bundle2
>     already have framing, we could just use your better encoding with
>     bundle2 directly.
>     However, we should probably push things further and use dedicated
>     part for commonly used request. Having dedicated type will help more
>     semantic reply and request. The series we are commenting on is a
>     good example of that need for "pattern" here pretty much only apply
>     to bookmark, having a dedicated "channel" (within bundle2) for this
>     would make is painless to add any new arguments we could need.
>
>     TL;DR; I don't think we should touch listkey.pushkey. add a bundle2
>     part dedicated to bookmark instead.
>
>
> I have serious objections to stuffing yet more functionality into the
> "getbundle" wire protocol command. That command is quickly becoming a
> "god object" [1]. I'd rather we freeze that wire protocol command and
> offer equivalent functionality under new, better designed commands.

As our wireprotocol is stateless, it is very important that we have 
commands allowing to fetch a consistent set of information in a single 
call. I'm not saying getbundle is the best interface for that and  I'm 
not against introducing a new command, but that command still need to 
allow to fetch many different information in one call. That might even 
requires a rework of the wireprotocol (I've not put too much thinking 
into it) even.

Introducing a new command is a significant step. Again, If someone want 
to do the work to design and implement such things I'm all in favor. In 
the mean time people who want to extend the current feature set should 
go through the existing. bundle2 for data transfer and have at least one 
way to access the data through getbundle.

> In general, I'm opposed to adding arguments to wire protocol commands.
> Introducing an argument requires advertising a server capability
> otherwise clients may send an unknown argument to an incompatible
> server, which the server will reject. At the point you introduce a new
> argument/capability, you've effectively introduced a new variation of
> the wire protocol command. You might as well call it something else so
> the semantics of each wire protocol are well-defined and constant over
> time. If nothing else, this makes the client/server code easier to
> understand. I only need to point out wireproto.getbundle() and
> wireproto.unbundle() for examples of overly complicated code as a result
> of introducing wire protocol command arguments (notably support for
> bundle2).

Because we need a single command to consistently fetch all data in an 
extensible way, we'll need that entry point taking arbitrary argument. 
This is actually part of the large gains of bundle2: (1) fetching 
multiple data race-free (2) easy to extend and alter from extension.

I'm not sure I really grasp you point about new arguments and 
capabilities. It is the client responsability to properly reads server 
capabilities and communicate with the server accordingly. This have been 
working reasonably well so far.

However, I see your point than mixing bundle1 and bundle2 into the same 
getbundle command might have been a mistake. It wasn't too apparent when 
we designed it.

> I'm not strongly opposed to the idea of making bundle2 a generic framing
> protocol. I think we could do better. (I raised a number of concerns
> with bundle2 during its implementation phase. In fact, one of them was
> that bundle2 as a generic format for data in transport and at rest is
> not ideal: there are various aspects you want to optimize for and one
> format cannot exceed at all of them. Unfortunately, not much was done
> because bundle2 was already kinda/sorta deployed in a few environments
> at that time. My fault for taking too long to audit the protocol.) I
> think bundle2 is an OK'ish format for data at rest. For a transport
> protocol, not so much.

I entirely agree with you here. Your feedback on how bundle2 could have 
been better are valid (but late) and I'm certainly open to the idea of 
the bundle3 replacing bundle2. However, until someone actually 
implements that bundle3, bundle2 is the best thing we have on the market 
and we should reuse it as much as possible.

An important aspect of that is that we want a bundle2+ version of any 
data we could fetch in order to ensure we can transport it alongside 
associated data (during pull/push) and we can store it on disk.
Since we must a bundle2+ representation of that data, it does not seems 
very useful to introduce another representation/encoding of it.
(The above paragraph disregard the

> If we do use bundle2 as a container for the wire protocol, that doesn't
> mean we can't have multiple wire protocol commands for exchanging
> bundle2 payloads.

Yep, using bundle2 for transporting command data seems reasonable.

However, there is a argument similar to the one in the previous block to 
be made. The data we want to exchange most certainly needs to be carried 
alongside other in a consistent way. So we at minimum needs a support 
for fetching it through getbundle (or a better command serving the same 
purpose). And once we have a way to obtain the data through gebundle+ 
building a second way to fetch just the very same data does not seems 
very useful.

> [1] https://en.wikipedia.org/wiki/God_object

I think I have been battling god object in Mercurial for long enough to 
know what they are ;-)

Cheers,
Augie Fackler - Aug. 17, 2016, 1:12 p.m.
On Tue, Aug 16, 2016 at 06:39:38PM -0700, Gregory Szorc wrote:
> On Tue, Aug 16, 2016 at 5:22 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> > On 08/14/2016 07:17 PM, Gregory Szorc wrote:
[snip]
> >> I think introducing a new wire protocol command is the correct way to
> >> solve this problem (as opposed to introducing a new argument on the
> >> existing command).
> >>
> >> However, if we're introducing a new wire protocol command for obtaining
> >> pushkey values, I think we should improve deficiencies in the response
> >> encoding rather than propagate its problems.
> >>
> >> The "listkeys" response encoding can't transmit the full range of binary
> >> values. This can lead to larger (and slower) responses sizes. For
> >> example, a number of pushkey namespaces exchange lists of nodes. These
> >> have to be represented as hex instead of binary. For pushkey namespaces
> >> like phases or obsolescence that can exchange hundreds or thousands of
> >> nodes, the overhead can add up.
> >>
> >> I think the response from a new listkeys command should be using framing
> >> to encode the key names and values so the full range of binary values
> >> can be efficiently transferred. We may also want a special mechanism to
> >> represent a list of nodes, as avoiding the overhead of framing on fixed
> >> width values would be desirable.
> >>
> >> Of course, at the point you introduce a new response encoding, we may
> >> want to call the command "listkeys2." If others agree, I can code up an
> >> implementation and you can add the patterns functionality on top of it.
> >>
> >
> > Sorry to be a bit late to the discussion.
> >
> > I don't think we should introduce a new wire-protocol command for this.
> >
> > Individual listkey call have been a large source of race condition and
> > related issue. Bundle2 is able to carry listkey.pushkey call just fine and
> > I think we should prioritize its usage. As bundle2 already have framing, we
> > could just use your better encoding with bundle2 directly.
> > However, we should probably push things further and use dedicated part for
> > commonly used request. Having dedicated type will help more semantic reply
> > and request. The series we are commenting on is a good example of that need
> > for "pattern" here pretty much only apply to bookmark, having a dedicated
> > "channel" (within bundle2) for this would make is painless to add any new
> > arguments we could need.
> >
> > TL;DR; I don't think we should touch listkey.pushkey. add a bundle2 part
> > dedicated to bookmark instead.
> >
>
> I have serious objections to stuffing yet more functionality into the
> "getbundle" wire protocol command. That command is quickly becoming a "god
> object" [1]. I'd rather we freeze that wire protocol command and offer
> equivalent functionality under new, better designed commands.

In general, I agree with your feeling on the bundle kind of being a
god object, but for bookmarks in particular Pierre-Yves is absolutely
correct. I think pushkey makes a ton of sense for things like marking
code review state, which can sensibly be done outside a transaction
(perhaps only sensibly done outside a repo history transaction? I'm
not sure), but in general most of the things we historically used
pushkey for should be in the same transaction as the rest of a push.

As a bonus, a dedicated bundle2 part for bookmarks would let us
properly resolve https://bz.mercurial-scm.org/show_bug.cgi?id=5165, so
we should probably just do that. I hadn't thought of that as a
solution, but it seems like the clear winner for that case now that
someone's said it.

What are use cases for pushkey beyond things that should be part of a
repo transaction? Can we come up with enough of those that would like
binary data payloads to justify the new command?

>
> In general, I'm opposed to adding arguments to wire protocol commands.
> Introducing an argument requires advertising a server capability otherwise
> clients may send an unknown argument to an incompatible server, which the
> server will reject. At the point you introduce a new argument/capability,
> you've effectively introduced a new variation of the wire protocol command.
> You might as well call it something else so the semantics of each wire
> protocol are well-defined and constant over time. If nothing else, this
> makes the client/server code easier to understand. I only need to point out
> wireproto.getbundle() and wireproto.unbundle() for examples of overly
> complicated code as a result of introducing wire protocol command arguments
> (notably support for bundle2).
>
> I'm not strongly opposed to the idea of making bundle2 a generic framing
> protocol. I think we could do better. (I raised a number of concerns with
> bundle2 during its implementation phase. In fact, one of them was that
> bundle2 as a generic format for data in transport and at rest is not ideal:
> there are various aspects you want to optimize for and one format cannot
> exceed at all of them. Unfortunately, not much was done because bundle2 was
> already kinda/sorta deployed in a few environments at that time. My fault
> for taking too long to audit the protocol.) I think bundle2 is an OK'ish
> format for data at rest. For a transport protocol, not so much.
>
> If we do use bundle2 as a container for the wire protocol, that doesn't
> mean we can't have multiple wire protocol commands for exchanging bundle2
> payloads.
>
> [1] https://en.wikipedia.org/wiki/God_object

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Durham Goode - Aug. 17, 2016, 8:08 p.m.
On 8/16/16 5:22 PM, Pierre-Yves David wrote:
>
>
> On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com
>> <mailto:stash@fb.com>> wrote:
>>
>>     # HG changeset patch
>>     # User Stanislau Hlebik <stash@fb.com <mailto:stash@fb.com>>
>>     # Date 1470999441 25200
>>     #      Fri Aug 12 03:57:21 2016 -0700
>>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>>     listkeypattern: add listkeypattern wireproto method
>>
>>     wireproto method to list remote keys by pattern
>>
>>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>>     --- a/mercurial/wireproto.py
>>     +++ b/mercurial/wireproto.py
>>     @@ -353,6 +353,22 @@
>>                            % (namespace, len(d)))
>>              yield pushkeymod.decodekeys(d)
>>
>>     +    @batchable
>>     +    def listkeypattern(self, namespace, patterns):
>>     +        if not self.capable('pushkey'):
>>     +            yield {}, None
>>     +        f = future()
>>     +        self.ui.debug('preparing listkeys for "%s" with pattern
>>     "%s"\n' %
>>     +                      (namespace, patterns))
>>     +        yield {
>>     +            'namespace': encoding.fromlocal(namespace),
>>     +            'patterns': encodelist(patterns)
>>     +        }, f
>>     +        d = f.value
>>     +        self.ui.debug('received listkey for "%s": %i bytes\n'
>>     +                      % (namespace, len(d)))
>>     +        yield pushkeymod.decodekeys(d)
>>     +
>>          def stream_out(self):
>>              return self._callstream('stream_out')
>>
>>     @@ -676,7 +692,8 @@
>>          return repo.opener.tryread('clonebundles.manifest')
>>
>>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap', 
>> 'pushkey',
>>     -                 'known', 'getbundle', 'unbundlehash', 'batch']
>>     +                 'known', 'getbundle', 'unbundlehash', 'batch',
>>     +                 'listkeypattern']
>>
>>      def _capabilities(repo, proto):
>>          """return a list of capabilities for a repo
>>     @@ -791,6 +808,12 @@
>>          d = repo.listkeys(encoding.tolocal(namespace)).items()
>>          return pushkeymod.encodekeys(d)
>>
>>     +@wireprotocommand('listkeypattern', 'namespace patterns *')
>>
>>
>> Why the "*" here? "others" is not used in the function implementation.
>>
>>
>>     +def listkeypattern(repo, proto, namespace, patterns, others):
>>     +    patterns = decodelist(patterns)
>>     +    d = repo.listkeys(encoding.tolocal(namespace),
>>     patterns=patterns).items()
>>     +    return pushkeymod.encodekeys(d)
>>     +
>>      @wireprotocommand('lookup', 'key')
>>      def lookup(repo, proto, key):
>>          try:
>>
>>
>> I think introducing a new wire protocol command is the correct way to
>> solve this problem (as opposed to introducing a new argument on the
>> existing command).
>>
>> However, if we're introducing a new wire protocol command for obtaining
>> pushkey values, I think we should improve deficiencies in the response
>> encoding rather than propagate its problems.
>>
>> The "listkeys" response encoding can't transmit the full range of binary
>> values. This can lead to larger (and slower) responses sizes. For
>> example, a number of pushkey namespaces exchange lists of nodes. These
>> have to be represented as hex instead of binary. For pushkey namespaces
>> like phases or obsolescence that can exchange hundreds or thousands of
>> nodes, the overhead can add up.
>>
>> I think the response from a new listkeys command should be using framing
>> to encode the key names and values so the full range of binary values
>> can be efficiently transferred. We may also want a special mechanism to
>> represent a list of nodes, as avoiding the overhead of framing on fixed
>> width values would be desirable.
>>
>> Of course, at the point you introduce a new response encoding, we may
>> want to call the command "listkeys2." If others agree, I can code up an
>> implementation and you can add the patterns functionality on top of it.
>
> Sorry to be a bit late to the discussion.
>
> I don't think we should introduce a new wire-protocol command for this.
>
> Individual listkey call have been a large source of race condition and 
> related issue. Bundle2 is able to carry listkey.pushkey call just fine 
> and I think we should prioritize its usage. As bundle2 already have 
> framing, we could just use your better encoding with bundle2 directly.
> However, we should probably push things further and use dedicated part 
> for commonly used request. Having dedicated type will help more 
> semantic reply and request. The series we are commenting on is a good 
> example of that need for "pattern" here pretty much only apply to 
> bookmark, having a dedicated "channel" (within bundle2) for this would 
> make is painless to add any new arguments we could need.
If we create a bundle2 part for transmitting bookmark information, what 
does the request flow look like?  getbundle seems to take kwargs, so 
would all requests piggy back their query parameters into the getbundle 
kwargs?  Like passing "cg=False, bookmarks=True, bookmark_pattern=foo/*" 
as arguments that the receiver would know to produce a bookmarkpart that 
matches that parameter (and know not to send a changegroup)?  Seems like 
a recipe for problems if every requestable part type overloads the same 
kwargs.

Is the alternative to use the unbundle protocol to send empty parts with 
parameters as the way of requesting data?  Then the response is sent as 
a reply part?  Would that then eventually deprecate getbundle entirely?

I'm just trying to understand how we might implement the listkeypattern 
functionality using bundle2 without having to refactor bundle2 much.
Stanislau Hlebik - Aug. 17, 2016, 8:26 p.m.
Durham, that’s exactly the implementation I wrote locally:
part generator on the server that reads ‘patterns’ (patterns or empty list) and ‘bookmarks’ (set to True) parameters from kwargs (‘cg’ is set to False);
part handler on client.
Agree, it really looks error-prone and unbundle can probably be better.

On 8/17/16, 9:08 PM, "Durham Goode" <durham@fb.com> wrote:

    On 8/16/16 5:22 PM, Pierre-Yves David wrote:
    >

    >

    > On 08/14/2016 07:17 PM, Gregory Szorc wrote:

    >> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com

    >> <mailto:stash@fb.com>> wrote:

    >>

    >>     # HG changeset patch

    >>     # User Stanislau Hlebik <stash@fb.com <mailto:stash@fb.com>>

    >>     # Date 1470999441 25200

    >>     #      Fri Aug 12 03:57:21 2016 -0700

    >>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb

    >>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb

    >>     listkeypattern: add listkeypattern wireproto method

    >>

    >>     wireproto method to list remote keys by pattern

    >>

    >>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py

    >>     --- a/mercurial/wireproto.py

    >>     +++ b/mercurial/wireproto.py

    >>     @@ -353,6 +353,22 @@

    >>                            % (namespace, len(d)))

    >>              yield pushkeymod.decodekeys(d)

    >>

    >>     +    @batchable

    >>     +    def listkeypattern(self, namespace, patterns):

    >>     +        if not self.capable('pushkey'):

    >>     +            yield {}, None

    >>     +        f = future()

    >>     +        self.ui.debug('preparing listkeys for "%s" with pattern

    >>     "%s"\n' %

    >>     +                      (namespace, patterns))

    >>     +        yield {

    >>     +            'namespace': encoding.fromlocal(namespace),

    >>     +            'patterns': encodelist(patterns)

    >>     +        }, f

    >>     +        d = f.value

    >>     +        self.ui.debug('received listkey for "%s": %i bytes\n'

    >>     +                      % (namespace, len(d)))

    >>     +        yield pushkeymod.decodekeys(d)

    >>     +

    >>          def stream_out(self):

    >>              return self._callstream('stream_out')

    >>

    >>     @@ -676,7 +692,8 @@

    >>          return repo.opener.tryread('clonebundles.manifest')

    >>

    >>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap', 

    >> 'pushkey',

    >>     -                 'known', 'getbundle', 'unbundlehash', 'batch']

    >>     +                 'known', 'getbundle', 'unbundlehash', 'batch',

    >>     +                 'listkeypattern']

    >>

    >>      def _capabilities(repo, proto):

    >>          """return a list of capabilities for a repo

    >>     @@ -791,6 +808,12 @@

    >>          d = repo.listkeys(encoding.tolocal(namespace)).items()

    >>          return pushkeymod.encodekeys(d)

    >>

    >>     +@wireprotocommand('listkeypattern', 'namespace patterns *')

    >>

    >>

    >> Why the "*" here? "others" is not used in the function implementation.

    >>

    >>

    >>     +def listkeypattern(repo, proto, namespace, patterns, others):

    >>     +    patterns = decodelist(patterns)

    >>     +    d = repo.listkeys(encoding.tolocal(namespace),

    >>     patterns=patterns).items()

    >>     +    return pushkeymod.encodekeys(d)

    >>     +

    >>      @wireprotocommand('lookup', 'key')

    >>      def lookup(repo, proto, key):

    >>          try:

    >>

    >>

    >> I think introducing a new wire protocol command is the correct way to

    >> solve this problem (as opposed to introducing a new argument on the

    >> existing command).

    >>

    >> However, if we're introducing a new wire protocol command for obtaining

    >> pushkey values, I think we should improve deficiencies in the response

    >> encoding rather than propagate its problems.

    >>

    >> The "listkeys" response encoding can't transmit the full range of binary

    >> values. This can lead to larger (and slower) responses sizes. For

    >> example, a number of pushkey namespaces exchange lists of nodes. These

    >> have to be represented as hex instead of binary. For pushkey namespaces

    >> like phases or obsolescence that can exchange hundreds or thousands of

    >> nodes, the overhead can add up.

    >>

    >> I think the response from a new listkeys command should be using framing

    >> to encode the key names and values so the full range of binary values

    >> can be efficiently transferred. We may also want a special mechanism to

    >> represent a list of nodes, as avoiding the overhead of framing on fixed

    >> width values would be desirable.

    >>

    >> Of course, at the point you introduce a new response encoding, we may

    >> want to call the command "listkeys2." If others agree, I can code up an

    >> implementation and you can add the patterns functionality on top of it.

    >

    > Sorry to be a bit late to the discussion.

    >

    > I don't think we should introduce a new wire-protocol command for this.

    >

    > Individual listkey call have been a large source of race condition and 

    > related issue. Bundle2 is able to carry listkey.pushkey call just fine 

    > and I think we should prioritize its usage. As bundle2 already have 

    > framing, we could just use your better encoding with bundle2 directly.

    > However, we should probably push things further and use dedicated part 

    > for commonly used request. Having dedicated type will help more 

    > semantic reply and request. The series we are commenting on is a good 

    > example of that need for "pattern" here pretty much only apply to 

    > bookmark, having a dedicated "channel" (within bundle2) for this would 

    > make is painless to add any new arguments we could need.

    If we create a bundle2 part for transmitting bookmark information, what 
    does the request flow look like?  getbundle seems to take kwargs, so 
    would all requests piggy back their query parameters into the getbundle 
    kwargs?  Like passing "cg=False, bookmarks=True, bookmark_pattern=foo/*" 
    as arguments that the receiver would know to produce a bookmarkpart that 
    matches that parameter (and know not to send a changegroup)?  Seems like 
    a recipe for problems if every requestable part type overloads the same 
    kwargs.
    
    Is the alternative to use the unbundle protocol to send empty parts with 
    parameters as the way of requesting data?  Then the response is sent as 
    a reply part?  Would that then eventually deprecate getbundle entirely?
    
    I'm just trying to understand how we might implement the listkeypattern 
    functionality using bundle2 without having to refactor bundle2 much.
Pierre-Yves David - Aug. 22, 2016, 12:12 p.m.
On 08/17/2016 10:08 PM, Durham Goode wrote:
> On 8/16/16 5:22 PM, Pierre-Yves David wrote:
>>
>>
>> On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>>> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com
>>> <mailto:stash@fb.com>> wrote:
>>>
>>>     # HG changeset patch
>>>     # User Stanislau Hlebik <stash@fb.com <mailto:stash@fb.com>>
>>>     # Date 1470999441 25200
>>>     #      Fri Aug 12 03:57:21 2016 -0700
>>>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>>>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>>>     listkeypattern: add listkeypattern wireproto method
>>>
>>>     wireproto method to list remote keys by pattern
>>>
>>>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>>>     --- a/mercurial/wireproto.py
>>>     +++ b/mercurial/wireproto.py
>>>     @@ -353,6 +353,22 @@
>>>                            % (namespace, len(d)))
>>>              yield pushkeymod.decodekeys(d)
>>>
>>>     +    @batchable
>>>     +    def listkeypattern(self, namespace, patterns):
>>>     +        if not self.capable('pushkey'):
>>>     +            yield {}, None
>>>     +        f = future()
>>>     +        self.ui.debug('preparing listkeys for "%s" with pattern
>>>     "%s"\n' %
>>>     +                      (namespace, patterns))
>>>     +        yield {
>>>     +            'namespace': encoding.fromlocal(namespace),
>>>     +            'patterns': encodelist(patterns)
>>>     +        }, f
>>>     +        d = f.value
>>>     +        self.ui.debug('received listkey for "%s": %i bytes\n'
>>>     +                      % (namespace, len(d)))
>>>     +        yield pushkeymod.decodekeys(d)
>>>     +
>>>          def stream_out(self):
>>>              return self._callstream('stream_out')
>>>
>>>     @@ -676,7 +692,8 @@
>>>          return repo.opener.tryread('clonebundles.manifest')
>>>
>>>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap',
>>> 'pushkey',
>>>     -                 'known', 'getbundle', 'unbundlehash', 'batch']
>>>     +                 'known', 'getbundle', 'unbundlehash', 'batch',
>>>     +                 'listkeypattern']
>>>
>>>      def _capabilities(repo, proto):
>>>          """return a list of capabilities for a repo
>>>     @@ -791,6 +808,12 @@
>>>          d = repo.listkeys(encoding.tolocal(namespace)).items()
>>>          return pushkeymod.encodekeys(d)
>>>
>>>     +@wireprotocommand('listkeypattern', 'namespace patterns *')
>>>
>>>
>>> Why the "*" here? "others" is not used in the function implementation.
>>>
>>>
>>>     +def listkeypattern(repo, proto, namespace, patterns, others):
>>>     +    patterns = decodelist(patterns)
>>>     +    d = repo.listkeys(encoding.tolocal(namespace),
>>>     patterns=patterns).items()
>>>     +    return pushkeymod.encodekeys(d)
>>>     +
>>>      @wireprotocommand('lookup', 'key')
>>>      def lookup(repo, proto, key):
>>>          try:
>>>
>>>
>>> I think introducing a new wire protocol command is the correct way to
>>> solve this problem (as opposed to introducing a new argument on the
>>> existing command).
>>>
>>> However, if we're introducing a new wire protocol command for obtaining
>>> pushkey values, I think we should improve deficiencies in the response
>>> encoding rather than propagate its problems.
>>>
>>> The "listkeys" response encoding can't transmit the full range of binary
>>> values. This can lead to larger (and slower) responses sizes. For
>>> example, a number of pushkey namespaces exchange lists of nodes. These
>>> have to be represented as hex instead of binary. For pushkey namespaces
>>> like phases or obsolescence that can exchange hundreds or thousands of
>>> nodes, the overhead can add up.
>>>
>>> I think the response from a new listkeys command should be using framing
>>> to encode the key names and values so the full range of binary values
>>> can be efficiently transferred. We may also want a special mechanism to
>>> represent a list of nodes, as avoiding the overhead of framing on fixed
>>> width values would be desirable.
>>>
>>> Of course, at the point you introduce a new response encoding, we may
>>> want to call the command "listkeys2." If others agree, I can code up an
>>> implementation and you can add the patterns functionality on top of it.
>>
>> Sorry to be a bit late to the discussion.
>>
>> I don't think we should introduce a new wire-protocol command for this.
>>
>> Individual listkey call have been a large source of race condition and
>> related issue. Bundle2 is able to carry listkey.pushkey call just fine
>> and I think we should prioritize its usage. As bundle2 already have
>> framing, we could just use your better encoding with bundle2 directly.
>> However, we should probably push things further and use dedicated part
>> for commonly used request. Having dedicated type will help more
>> semantic reply and request. The series we are commenting on is a good
>> example of that need for "pattern" here pretty much only apply to
>> bookmark, having a dedicated "channel" (within bundle2) for this would
>> make is painless to add any new arguments we could need.
> If we create a bundle2 part for transmitting bookmark information, what
> does the request flow look like?  getbundle seems to take kwargs, so
> would all requests piggy back their query parameters into the getbundle
> kwargs?  Like passing "cg=False, bookmarks=True, bookmark_pattern=foo/*"
> as arguments that the receiver would know to produce a bookmarkpart that
> matches that parameter (and know not to send a changegroup)?  Seems like
> a recipe for problems if every requestable part type overloads the same
> kwargs.

This is how it work right now (and how extension extend features), it 
requires some care to not step over each other but have been 
satisfactory so far.

> Is the alternative to use the unbundle protocol to send empty parts with
> parameters as the way of requesting data?  Then the response is sent as
> a reply part?  Would that then eventually deprecate getbundle entirely?

There is rarely a 1:1 mapping between argument and part. Some argument 
are need by multiple parts and some data are sent in multiple part. So 
I'm not sure that would help much.

> I'm just trying to understand how we might implement the listkeypattern
> functionality using bundle2 without having to refactor bundle2 much.

simpler proposal I can see is:

- add a capability to say we have a dedicated bookmark part (and 
associated part)
- add and use bookmarkpattern argument to request bookmark (instead of 
requesting for a bookmark listkey part) use '*' in the generic /not 
patter case

In practice we might a bit richer way to specify bookmark. For example I 
can imagine requesting "all bookmark on the pulled set of changeset" 
being something useful.

Cheers,
Gregory Szorc - Aug. 23, 2016, 2:34 a.m.
On Wed, Aug 17, 2016 at 1:08 PM, Durham Goode <durham@fb.com> wrote:

> On 8/16/16 5:22 PM, Pierre-Yves David wrote:
>
>>
>>
>> On 08/14/2016 07:17 PM, Gregory Szorc wrote:
>>
>>> On Fri, Aug 12, 2016 at 5:09 AM, Stanislau Hlebik <stash@fb.com
>>> <mailto:stash@fb.com>> wrote:
>>>
>>>     # HG changeset patch
>>>     # User Stanislau Hlebik <stash@fb.com <mailto:stash@fb.com>>
>>>     # Date 1470999441 25200
>>>     #      Fri Aug 12 03:57:21 2016 -0700
>>>     # Node ID c2ee493e216c60ff439ab93cc1efe6ac5922d8eb
>>>     # Parent  fd2185d7c2f7aa529b2ad0a6584832fb2b1b4ecb
>>>     listkeypattern: add listkeypattern wireproto method
>>>
>>>     wireproto method to list remote keys by pattern
>>>
>>>     diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
>>>     --- a/mercurial/wireproto.py
>>>     +++ b/mercurial/wireproto.py
>>>     @@ -353,6 +353,22 @@
>>>                            % (namespace, len(d)))
>>>              yield pushkeymod.decodekeys(d)
>>>
>>>     +    @batchable
>>>     +    def listkeypattern(self, namespace, patterns):
>>>     +        if not self.capable('pushkey'):
>>>     +            yield {}, None
>>>     +        f = future()
>>>     +        self.ui.debug('preparing listkeys for "%s" with pattern
>>>     "%s"\n' %
>>>     +                      (namespace, patterns))
>>>     +        yield {
>>>     +            'namespace': encoding.fromlocal(namespace),
>>>     +            'patterns': encodelist(patterns)
>>>     +        }, f
>>>     +        d = f.value
>>>     +        self.ui.debug('received listkey for "%s": %i bytes\n'
>>>     +                      % (namespace, len(d)))
>>>     +        yield pushkeymod.decodekeys(d)
>>>     +
>>>          def stream_out(self):
>>>              return self._callstream('stream_out')
>>>
>>>     @@ -676,7 +692,8 @@
>>>          return repo.opener.tryread('clonebundles.manifest')
>>>
>>>      wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap',
>>> 'pushkey',
>>>     -                 'known', 'getbundle', 'unbundlehash', 'batch']
>>>     +                 'known', 'getbundle', 'unbundlehash', 'batch',
>>>     +                 'listkeypattern']
>>>
>>>      def _capabilities(repo, proto):
>>>          """return a list of capabilities for a repo
>>>     @@ -791,6 +808,12 @@
>>>          d = repo.listkeys(encoding.tolocal(namespace)).items()
>>>          return pushkeymod.encodekeys(d)
>>>
>>>     +@wireprotocommand('listkeypattern', 'namespace patterns *')
>>>
>>>
>>> Why the "*" here? "others" is not used in the function implementation.
>>>
>>>
>>>     +def listkeypattern(repo, proto, namespace, patterns, others):
>>>     +    patterns = decodelist(patterns)
>>>     +    d = repo.listkeys(encoding.tolocal(namespace),
>>>     patterns=patterns).items()
>>>     +    return pushkeymod.encodekeys(d)
>>>     +
>>>      @wireprotocommand('lookup', 'key')
>>>      def lookup(repo, proto, key):
>>>          try:
>>>
>>>
>>> I think introducing a new wire protocol command is the correct way to
>>> solve this problem (as opposed to introducing a new argument on the
>>> existing command).
>>>
>>> However, if we're introducing a new wire protocol command for obtaining
>>> pushkey values, I think we should improve deficiencies in the response
>>> encoding rather than propagate its problems.
>>>
>>> The "listkeys" response encoding can't transmit the full range of binary
>>> values. This can lead to larger (and slower) responses sizes. For
>>> example, a number of pushkey namespaces exchange lists of nodes. These
>>> have to be represented as hex instead of binary. For pushkey namespaces
>>> like phases or obsolescence that can exchange hundreds or thousands of
>>> nodes, the overhead can add up.
>>>
>>> I think the response from a new listkeys command should be using framing
>>> to encode the key names and values so the full range of binary values
>>> can be efficiently transferred. We may also want a special mechanism to
>>> represent a list of nodes, as avoiding the overhead of framing on fixed
>>> width values would be desirable.
>>>
>>> Of course, at the point you introduce a new response encoding, we may
>>> want to call the command "listkeys2." If others agree, I can code up an
>>> implementation and you can add the patterns functionality on top of it.
>>>
>>
>> Sorry to be a bit late to the discussion.
>>
>> I don't think we should introduce a new wire-protocol command for this.
>>
>> Individual listkey call have been a large source of race condition and
>> related issue. Bundle2 is able to carry listkey.pushkey call just fine and
>> I think we should prioritize its usage. As bundle2 already have framing, we
>> could just use your better encoding with bundle2 directly.
>> However, we should probably push things further and use dedicated part
>> for commonly used request. Having dedicated type will help more semantic
>> reply and request. The series we are commenting on is a good example of
>> that need for "pattern" here pretty much only apply to bookmark, having a
>> dedicated "channel" (within bundle2) for this would make is painless to add
>> any new arguments we could need.
>>
> If we create a bundle2 part for transmitting bookmark information, what
> does the request flow look like?  getbundle seems to take kwargs, so would
> all requests piggy back their query parameters into the getbundle kwargs?
> Like passing "cg=False, bookmarks=True, bookmark_pattern=foo/*" as
> arguments that the receiver would know to produce a bookmarkpart that
> matches that parameter (and know not to send a changegroup)?  Seems like a
> recipe for problems if every requestable part type overloads the same
> kwargs.
>
> Is the alternative to use the unbundle protocol to send empty parts with
> parameters as the way of requesting data?  Then the response is sent as a
> reply part?  Would that then eventually deprecate getbundle entirely?
>
> I'm just trying to understand how we might implement the listkeypattern
> functionality using bundle2 without having to refactor bundle2 much.
>

I don't think overloading "unbundle" to retrieve data is a good idea. The
support for "unbundle" returning a bundle2 is as far as I remember, a
feature added by Facebook to support the pushrebase extension and there are
no consumers of the feature in core. Either way, I think preserving the
semantic difference between "getbundle" being used for pull and "unbundle"
being used for push is important. I could see us accidentally doing things
like obtaining a lock while processing a pull-originated "unbundle" and
that's not something we'd want to happen.

To create a bundle2 for transmitting bookmark information, the server will
need a new capacity that basically says "I support receiving getbundle
arguments related to bookmarks." The client will need to advertise bundle2
capabilities saying it knows how to receive cert bookmarks related bundle2
parts. Then you teach exchange._pullbundle2 to send said arguments. Then a
@getbundle2partsgenerator in exchange.py takes care of adding the part on
the server if it was requested and a @parthandler in bundle2.py takes care
of doing something with it on the client. Look at d29201352af7 and
0c2ded041d10 for part of this. The only thing that doesn't do is advertise
the server capability (because we didn't need to pass arguments to
"getbundle").

And yes, having a shared namespace of arguments for every requestable part
type could cause chaos. It means that any extension adding an argument
could potentially conflict with a future core-provided argument. This is
one reason why I think commands should be immutable in core: if core adds a
new argument, it will expose it via e.g. getbundle<N>. Extensions would
have to explicitly monkeypatch the new command and they would probably see
they were in conflict with a core-provided argument when doing so. The
downside to this is extensions adding arguments to wire protocol commands
become out of date since an upgraded client could start using a new wire
protocol command which the extension doesn't yet know about. Something like
bundle2's separate parts constituting namespaces for arguments would help
here. But even then, we don't have a good story for what part names for 3rd
party bundle2 parts should be. Protocol design is hard.
Pierre-Yves David - Aug. 24, 2016, 9:57 p.m.
On 08/23/2016 04:34 AM, Gregory Szorc wrote:
> And yes, having a shared namespace of arguments for every requestable
> part type could cause chaos. It means that any extension adding an
> argument could potentially conflict with a future core-provided
> argument.

I greatly encourage extensions adding new arguments (actually new 
anything) to prefix them with an extension specific bits to avoid future 
conflict.

Patch

diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
--- a/mercurial/wireproto.py
+++ b/mercurial/wireproto.py
@@ -353,6 +353,22 @@ 
                       % (namespace, len(d)))
         yield pushkeymod.decodekeys(d)
 
+    @batchable
+    def listkeypattern(self, namespace, patterns):
+        if not self.capable('pushkey'):
+            yield {}, None
+        f = future()
+        self.ui.debug('preparing listkeys for "%s" with pattern "%s"\n' %
+                      (namespace, patterns))
+        yield {
+            'namespace': encoding.fromlocal(namespace),
+            'patterns': encodelist(patterns)
+        }, f
+        d = f.value
+        self.ui.debug('received listkey for "%s": %i bytes\n'
+                      % (namespace, len(d)))
+        yield pushkeymod.decodekeys(d)
+
     def stream_out(self):
         return self._callstream('stream_out')
 
@@ -676,7 +692,8 @@ 
     return repo.opener.tryread('clonebundles.manifest')
 
 wireprotocaps = ['lookup', 'changegroupsubset', 'branchmap', 'pushkey',
-                 'known', 'getbundle', 'unbundlehash', 'batch']
+                 'known', 'getbundle', 'unbundlehash', 'batch',
+                 'listkeypattern']
 
 def _capabilities(repo, proto):
     """return a list of capabilities for a repo
@@ -791,6 +808,12 @@ 
     d = repo.listkeys(encoding.tolocal(namespace)).items()
     return pushkeymod.encodekeys(d)
 
+@wireprotocommand('listkeypattern', 'namespace patterns *')
+def listkeypattern(repo, proto, namespace, patterns, others):
+    patterns = decodelist(patterns)
+    d = repo.listkeys(encoding.tolocal(namespace), patterns=patterns).items()
+    return pushkeymod.encodekeys(d)
+
 @wireprotocommand('lookup', 'key')
 def lookup(repo, proto, key):
     try: