Patchwork [1,of,3,V2] bundle2: add separate handling for error part creation

login
register
mail settings
Submitter Siddharth Agarwal
Date April 5, 2017, 3:49 a.m.
Message ID <ad6196e3370572b9d0b4.1491364175@devvm028.frc2.facebook.com>
Download mbox | patch
Permalink /patch/19967/
State Changes Requested
Headers show

Comments

Siddharth Agarwal - April 5, 2017, 3:49 a.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1491348154 25200
#      Tue Apr 04 16:22:34 2017 -0700
# Node ID ad6196e3370572b9d0b436ab9d901f26884633f4
# Parent  04ec317b81280c189fcea33a05c8cbbac3c186b1
bundle2: add separate handling for error part creation

This will be used in upcoming diffs to handle error parts.

See the included docstrings for why this is necessary.
Pierre-Yves David - April 6, 2017, 12:50 p.m.
On 04/05/2017 05:49 AM, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1491348154 25200
> #      Tue Apr 04 16:22:34 2017 -0700
> # Node ID ad6196e3370572b9d0b436ab9d901f26884633f4
> # Parent  04ec317b81280c189fcea33a05c8cbbac3c186b1
> bundle2: add separate handling for error part creation
>
> This will be used in upcoming diffs to handle error parts.
>
> See the included docstrings for why this is necessary.

> […]

> @@ -1026,6 +1041,54 @@ class bundlepart(object):
>          elif len(self.data):
>              yield self.data
>
> +class bundleerrorpart(bundlepart):
> +    """
> +    This is only required for the following error parts that are recognized
> +    by Mercurial versions prior to 4.2.
> +    - error:abort
> +    - error:pushraced
> +    - error:unsupportedcontent

While I understand the need for unbounded generic error:Abort, I not 
sure we should extend this to pushraced and unsupported content one. 
They have defined enough use case where limiting the message and hint to 
255 char seems reasonable (in the exception themselves).
Limiting the scope of this more complex creation and handling to 
error:Abort.

What do you think ?

> +
> +    In Mercurial prior to 4.2, error messages for these parts were stored as
> +    part parameters. Since parameter sizes can be 255 at most, this meant that
> +    error messages longer than 255 bytes would crash the Mercurial server.
> +
> +    To avoid this issue while still retaining backwards compatibility with older
> +    versions of Mercurial, this class of parts:
> +    * truncates long params to 255 bytes
> +    * adds such params to the payload instead.

I think we could get away with a simpler way to add the long message to 
the payload.

You current approach implies packing the parameters name and length 
using our own binary encoding into the payload. This imply tedious 
binary manipulation on both end.
Instead we could leverage more of the bundle2 format. In case of too 
long message or hint, we add a corresponding 'message_long' or 
'hint_long' advisory parameter to the part. This parameter contains a 
'start:end' range of bytes in the paylog containing the message. That 
way most of the binary encoding is handled by the native bundle2 
machinery and less work is needed on the receiving side.

What do you think ?

> +
> +    Any newer error parts should instead:
> +    - use newpart
> +    - take care not to set up any params with potentially unbounded length
> +    - send the unbounded param(s) (e.g. error message) as part of the
> +      payload
> +    """
> +    def __init__(self, *args, **kwargs):
> +        super(bundleerrorpart, self).__init__(*args, **kwargs)
> +        self._errordata = []
> +        self.addparam('longparams', 'payload', mandatory=False)
> +
> +    def addlongparam(self, name, value='', mandatory=True):
> +        vparam = value
> +        if len(value) > 255:
> +            # truncate value so that it doesn't crash
> +            vparam = value[:252] + '...'
> +        self.addparam(name, vparam, mandatory=mandatory)
> +        # encode the full error message as
> +        # (len(name), name, len(value), value)
> +        # mandatory is enforced by the param
> +        data = [struct.pack(_ferrorlongparamsize, len(name)), name,
> +                struct.pack(_ferrorlongparamsize, len(value)), value]
> +        self._errordata.append(''.join(data))
> +
> +    def getchunks(self, ui):
> +        # data = number of long params + all the params
> +        numparams = len(self._errordata)
> +        errordata = ([struct.pack(_ferrorlongparamsize, numparams)] +
> +                     self._errordata)
> +        self._data = ''.join(errordata)

Small nits: _data can be an iterator. Using iter(errordata) would skip 
the join.

> +        return super(bundleerrorpart, self).getchunks(ui)
>
>  flaginterrupt = -1
Siddharth Agarwal - April 7, 2017, 7:52 p.m.
On 4/6/17 05:50, Pierre-Yves David wrote:
> While I understand the need for unbounded generic error:Abort, I not 
> sure we should extend this to pushraced and unsupported content one. 
> They have defined enough use case where limiting the message and hint 
> to 255 char seems reasonable (in the exception themselves).
> Limiting the scope of this more complex creation and handling to 
> error:Abort.
>
> What do you think ?


My view is that if we're doing this for abort anyway, we might as well 
do this for the other part types.

* unsupportedcontent is arbitrary so it can easily go beyond 255 when 
you concatenate the params together. A misbehaving client can easily 
crash the server that way.
* pushraced is less likely but there's still a non-zero chance that a 
confluence of events can cause this. I think the incremental amount of 
work required to support it is minimal, so we might as well do it in my 
opinion.


> I think we could get away with a simpler way to add the long message 
> to the payload.
>
> You current approach implies packing the parameters name and length 
> using our own binary encoding into the payload. This imply tedious 
> binary manipulation on both end.
> Instead we could leverage more of the bundle2 format. In case of too 
> long message or hint, we add a corresponding 'message_long' or 
> 'hint_long' advisory parameter to the part. This parameter contains a 
> 'start:end' range of bytes in the paylog containing the message. That 
> way most of the binary encoding is handled by the native bundle2 
> machinery and less work is needed on the receiving side.
>
> What do you think ?


It'll probably lead to simpler code overall. I like it.

I would want to store the full message in the payload though, as we're 
truncating the message in params for older clients in a friendly-ish way.
Pierre-Yves David - April 8, 2017, 8:40 p.m.
On 04/07/2017 09:52 PM, Siddharth Agarwal wrote:
> On 4/6/17 05:50, Pierre-Yves David wrote:
>> While I understand the need for unbounded generic error:Abort, I not
>> sure we should extend this to pushraced and unsupported content one.
>> They have defined enough use case where limiting the message and hint
>> to 255 char seems reasonable (in the exception themselves).
>> Limiting the scope of this more complex creation and handling to
>> error:Abort.
>>
>> What do you think ?
>
>
> My view is that if we're doing this for abort anyway, we might as well
> do this for the other part types.
>
> * unsupportedcontent is arbitrary so it can easily go beyond 255 when
> you concatenate the params together. A misbehaving client can easily
> crash the server that way.

We could decide to truncate the parameters we keep it under 255. That 
value is mostly informative.

> * pushraced is less likely but there's still a non-zero chance that a
> confluence of events can cause this.

I'm not sure how. They are a single raise of PushRaced and it is fixed size

> I think the incremental amount of work required to support it is minimal, so we might as well do it in my
> opinion.

My main reservation here is that it increase the complexity of the 
protocol. So all new implementation (server and client) side of it will 
have to deal with the extra complexity for errors that do not really 
needs it (eg: PushRaced). I usually lead toward keeping things simpler 
as long as we do not need the extra complexity.

On the other hand, I see your point and I understand some of it value, 
but it has not won me over yet, I did not want to delay my reply 
further. (as result you are provided with my raw thinking.)

I wonder what others (especially Greg) thinks.

>> I think we could get away with a simpler way to add the long message
>> to the payload.
>>
>> You current approach implies packing the parameters name and length
>> using our own binary encoding into the payload. This imply tedious
>> binary manipulation on both end.
>> Instead we could leverage more of the bundle2 format. In case of too
>> long message or hint, we add a corresponding 'message_long' or
>> 'hint_long' advisory parameter to the part. This parameter contains a
>> 'start:end' range of bytes in the paylog containing the message. That
>> way most of the binary encoding is handled by the native bundle2
>> machinery and less work is needed on the receiving side.
>>
>> What do you think ?
>
> It'll probably lead to simpler code overall. I like it.
>
> I would want to store the full message in the payload though, as we're
> truncating the message in params for older clients in a friendly-ish way.

If I understand correctly, this match my proposal. Let me clarify to 
make sure we are on the same page.

You get two parameters:
   message="something a bit too lo..."
   message_long=<0><24>
and Payload:
   something a bit too long.

Patch

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -177,6 +177,7 @@  urlreq = util.urlreq
 _fpartid = '>I'
 _fpayloadsize = '>i'
 _fpartparamcount = '>BB'
+_ferrorlongparamsize = '>I'
 
 preferedchunksize = 4096
 
@@ -557,6 +558,20 @@  class bundle20(object):
         self.addpart(part)
         return part
 
+    def newerrorpart(self, typeid, *args, **kwargs):
+        """create a new error part and add it to the containers
+
+        This is only required for the following error parts that are recognized
+        by Mercurial 4.1 or lower.
+        - error:abort
+        - error:pushraced
+        - error:unsupportedcontent
+
+        See bundleerrorpart for more."""
+        part = bundleerrorpart(typeid, *args, **kwargs)
+        self.addpart(part)
+        return part
+
     # methods used to generate the bundle2 stream
     def getchunks(self):
         if self.ui.debugflag:
@@ -1026,6 +1041,54 @@  class bundlepart(object):
         elif len(self.data):
             yield self.data
 
+class bundleerrorpart(bundlepart):
+    """
+    This is only required for the following error parts that are recognized
+    by Mercurial versions prior to 4.2.
+    - error:abort
+    - error:pushraced
+    - error:unsupportedcontent
+
+    In Mercurial prior to 4.2, error messages for these parts were stored as
+    part parameters. Since parameter sizes can be 255 at most, this meant that
+    error messages longer than 255 bytes would crash the Mercurial server.
+
+    To avoid this issue while still retaining backwards compatibility with older
+    versions of Mercurial, this class of parts:
+    * truncates long params to 255 bytes
+    * adds such params to the payload instead.
+
+    Any newer error parts should instead:
+    - use newpart
+    - take care not to set up any params with potentially unbounded length
+    - send the unbounded param(s) (e.g. error message) as part of the
+      payload
+    """
+    def __init__(self, *args, **kwargs):
+        super(bundleerrorpart, self).__init__(*args, **kwargs)
+        self._errordata = []
+        self.addparam('longparams', 'payload', mandatory=False)
+
+    def addlongparam(self, name, value='', mandatory=True):
+        vparam = value
+        if len(value) > 255:
+            # truncate value so that it doesn't crash
+            vparam = value[:252] + '...'
+        self.addparam(name, vparam, mandatory=mandatory)
+        # encode the full error message as
+        # (len(name), name, len(value), value)
+        # mandatory is enforced by the param
+        data = [struct.pack(_ferrorlongparamsize, len(name)), name,
+                struct.pack(_ferrorlongparamsize, len(value)), value]
+        self._errordata.append(''.join(data))
+
+    def getchunks(self, ui):
+        # data = number of long params + all the params
+        numparams = len(self._errordata)
+        errordata = ([struct.pack(_ferrorlongparamsize, numparams)] +
+                     self._errordata)
+        self._data = ''.join(errordata)
+        return super(bundleerrorpart, self).getchunks(ui)
 
 flaginterrupt = -1