Patchwork [06,of,15] speedy: custom wire protocol

login
register
mail settings
Submitter Tomasz Kleczek
Date Dec. 11, 2012, 6:38 p.m.
Message ID <62febe888e9b12cd5bb7.1355251101@dev408.prn1.facebook.com>
Download mbox | patch
Permalink /patch/54/
State Superseded
Headers show

Comments

Tomasz Kleczek - Dec. 11, 2012, 6:38 p.m.
# HG changeset patch
# User Tomasz Kleczek <tkleczek at fb.com>
# Date 1355204307 28800
# Branch stable
# Node ID 62febe888e9b12cd5bb711bea2d70b843b8b4251
# Parent  6013a399d9b4cad8578b587084ed0eb6a8080f5b
speedy: custom wire protocol

History queries take many kinds of python objects as parameters. We have
to (de)serialize them when transporting though network layer.

Here are some requirements that a candidate protocol should satisfy:
 * it shouldn't use pickle/marshal modules internally as they are not
   secure against maliciously constructed data
 * it should handle binary data without a significant overhead as
   most of the data sent over the network will be lists of node ids
 * compatibility with Python 2.4 is a plus

Considering this a custom protocol for object serialization is introduced.

Currently supported value types:
  * int
  * string
  * list of supported elements
  * dict with supported key/values
  * tuple of supported elements (serialized as list)

but can be easily extended to support more types.

The protocol is stream-oriented which means that a (de)serialization of a
single object may result in many read/writes to the underlying transport
layer.

To achieve good performance the transport layer should provide some
buffering mechanism.

To change the query into the request we perform a simple transformation:
  query_name(*args) -> [query_name, args]
and serialize the resulting list with the wire protocol. At the server
end the reverse operation in performed.

This change only introduce a wireprotocol class, it will be integrated into
exisiting code in the subsequent patches.
Augie Fackler - Dec. 13, 2012, 3:30 a.m.
On Dec 11, 2012, at 12:38 PM, Tomasz Kleczek <tkleczek at fb.com> wrote:

> +        else:
> +            raise TypeError("wireprotocol deserialization: unknown "
> +                    "value type descriptor: %r")

You've got a few weird indentations in this patch like this.

I'm also a /little/ weirded out by the lack of a version (or capability) handshake in the protocol, but perhaps it fits within YAGNI.
Tomasz Kleczek - Dec. 13, 2012, 7:23 a.m.
I thought that adding such mechanisms at this point would be an overkill..


On Wed, Dec 12, 2012 at 7:30 PM, Augie Fackler <raf at durin42.com> wrote:

>
> On Dec 11, 2012, at 12:38 PM, Tomasz Kleczek <tkleczek at fb.com> wrote:
>
> > +        else:
> > +            raise TypeError("wireprotocol deserialization: unknown "
> > +                    "value type descriptor: %r")
>
> You've got a few weird indentations in this patch like this.
>
> I'm also a /little/ weirded out by the lack of a version (or capability)
> handshake in the protocol, but perhaps it fits within YAGNI.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20121212/1dde807b/attachment.html>
Bryan O'Sullivan - Dec. 13, 2012, 9:34 p.m.
On Wed, Dec 12, 2012 at 11:23 PM, Tomasz K?eczek
<tomasz.kleczek at gmail.com>wrote:

> I'm also a /little/ weirded out by the lack of a version (or capability)
> handshake in the protocol, but perhaps it fits within YAGNI.
>
>>

> I thought that adding such mechanisms at this point would be an overkill..


It might be right now, but almost every time we have not had such a
capability, we've regretted it and had to retrofit it after a few months. I
don't think you should do it immediately, but we'll need to add this before
this sees any real use.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20121213/c7ecaf49/attachment.html>
Augie Fackler - Dec. 14, 2012, 3:31 p.m.
On Dec 13, 2012, at 3:34 PM, Bryan O'Sullivan <bos at serpentine.com> wrote:

> On Wed, Dec 12, 2012 at 11:23 PM, Tomasz K?eczek
> <tomasz.kleczek at gmail.com>wrote:
> 
>> I'm also a /little/ weirded out by the lack of a version (or capability)
>> handshake in the protocol, but perhaps it fits within YAGNI.
>> 
>>> 
> 
>> I thought that adding such mechanisms at this point would be an overkill..
> 
> 
> It might be right now, but almost every time we have not had such a
> capability, we've regretted it and had to retrofit it after a few months. I
> don't think you should do it immediately, but we'll need to add this before
> this sees any real use.

Fair enough, but in that case we should file a bug that the protocol is incomplete so we fix it before shipping a release with the weaker protocol.
Kevin Bullock - Dec. 14, 2012, 4:37 p.m.
(Tomasz, we prefer bottom-posting on this list to keep the conversation readable in-order. Response below, with quoting damage fixed up.)

On Dec 14, 2012, at 9:31 AM, Augie Fackler wrote:

> On Dec 13, 2012, at 3:34 PM, Bryan O'Sullivan <bos at serpentine.com> wrote:
> 
>> On Dec 13, 2012, at 1:23 AM, Tomasz K?eczek wrote:
>> 
>>> On Wed, Dec 12, 2012 at 7:30 PM, Augie Fackler <raf at durin42.com> wrote:
>>> 
>>> I'm also a /little/ weirded out by the lack of a version (or capability) handshake in the protocol, but perhaps it fits within YAGNI.
>> 
>> I thought that adding such mechanisms at this point would be an overkill..
>> 
>> It might be right now, but almost every time we have not had such a
>> capability, we've regretted it and had to retrofit it after a few months. I
>> don't think you should do it immediately, but we'll need to add this before
>> this sees any real use.
> 
> Fair enough, but in that case we should file a bug that the protocol is incomplete so we fix it before shipping a release with the weaker protocol.

Indeed. Now is the time to add a capability handshake, if we think there's any remotely possible way that one will be needed in the future (and as Bryan points out, it invariably turns out to be).

Did you pursue the idea of re-using the command server protocol any further?

My other overarching question is, who is signing up to maintain this new in-tree extension for the long haul?

pacem in terris / ??? / ?????? / ????????? / ??
Kevin R. Bullock
Bryan O'Sullivan - Dec. 14, 2012, 4:53 p.m.
On Fri, Dec 14, 2012 at 8:37 AM, Kevin Bullock <
kbullock+mercurial at ringworld.org> wrote:

> My other overarching question is, who is signing up to maintain this new
> in-tree extension for the long haul?
>

Hopefully Tomasz will feel somewhat inclined to help now and then, but
there are several other people (Durham, Siddharth, David, me) who will
between us most definitely have to maintain and improve this extension.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20121214/98b5b7fa/attachment.html>
Kevin Bullock - Dec. 14, 2012, 5 p.m.
On Dec 14, 2012, at 10:53 AM, Bryan O'Sullivan wrote:

> On Fri, Dec 14, 2012 at 8:37 AM, Kevin Bullock <kbullock+mercurial at ringworld.org> wrote:
> My other overarching question is, who is signing up to maintain this new in-tree extension for the long haul?
> 
> Hopefully Tomasz will feel somewhat inclined to help now and then, but there are several other people (Durham, Siddharth, David, me) who will between us most definitely have to maintain and improve this extension.


I figured as much, but now I have you on the record ;-) Thanks.

pacem in terris / ??? / ?????? / ????????? / ??
Kevin R. Bullock

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20121214/75f8faaa/attachment.html>
Tomasz Kleczek - Dec. 14, 2012, 6:16 p.m.
On Fri, Dec 14, 2012 at 9:00 AM, Kevin Bullock <
kbullock+mercurial at ringworld.org> wrote:

> On Dec 14, 2012, at 10:53 AM, Bryan O'Sullivan wrote:
>
> On Fri, Dec 14, 2012 at 8:37 AM, Kevin Bullock <
> kbullock+mercurial at ringworld.org> wrote:
>
>> My other overarching question is, who is signing up to maintain this new
>> in-tree extension for the long haul?
>>
>
> Hopefully Tomasz will feel somewhat inclined to help now and then, but
> there are several other people (Durham, Siddharth, David, me) who will
> between us most definitely have to maintain and improve this extension.
>
>
> I figured as much, but now I have you on the record ;-) Thanks.
>
>  pacem in terris / ??? / ?????? / ????????? / ??
> Kevin R. Bullock
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
>
I am definitely going to keep working on the extension until it makes it
into the tree. Bryan and the team will have to take the
core maintenance from there (although I will help of course, if needed).

/I'm sorry for top posting/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20121214/82ff3b37/attachment.html>

Patch

diff --git a/hgext/speedy/protocol.py b/hgext/speedy/protocol.py
new file mode 100644
--- /dev/null
+++ b/hgext/speedy/protocol.py
@@ -0,0 +1,99 @@ 
+# Copyright 2012 Facebook
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+
+"""Custom wire protocol."""
+
+import struct
+
+class wireprotocol(object):
+    """Defines a mechanism to map in-memory data structures to a wire-format.
+
+    Raw data is read from/write to underlying transport using callbacks
+    provided on initialization of wireprotocol instance.
+
+    Currently supported value types:
+      * int
+      * string
+      * list of supported elements
+      * dict with supported key/values
+      * tuple of supported elements (serialized as list)
+
+    The protocol is stream-oriented which means that a (de)serialization of a
+    single object may result in many read/writes to the underlying transport.
+
+    To achieve good performance the transport layer should provide some
+    buffering mechanism.
+
+    No versioning or message framing is provided.
+    """
+    def __init__(self, read, write):
+        self._read = read
+        self._write = write
+
+    def _writeint(self, v):
+        self._write(struct.pack('>L', v))
+
+    def _readint(self):
+        return int(struct.unpack('>L', self._read(4))[0])
+
+    def serialize(self, val):
+        """Serialize a given value.
+
+        Writes data in a series of calls to the `self._write` callback.
+
+        Raises `TypeError` if the type of `val` is not supported.
+        NOTE: Some data might have been already written to transport
+              instance when the exception is raised.
+        """
+        if isinstance(val, int):
+            self._write('i')
+            self._writeint(val)
+        elif isinstance(val, str):
+            self._write('s')
+            self._writeint(len(val))
+            self._write(val)
+        elif isinstance(val, (tuple, list)):
+            # tuples are serialized as lists
+            self._write('l')
+            self._writeint(len(val))
+            serialize = self.serialize
+            for e in val:
+                serialize(e)
+        elif isinstance(val, dict):
+            self._write('d')
+            self._writeint(len(val))
+            serialize = self.serialize
+            for k, e in val.iteritems():
+                serialize(k)
+                serialize(e)
+        else:
+            raise TypeError("wireprotocol serialization: unsupported "
+                    "value type: %s" % val.__class__.__name__)
+
+    def deserialize(self):
+        """Deserialize a single value.
+
+        Reads data in a series of calls to the `self._read` callback.
+
+        Raises `TypeError` if an unknown type description is encountered.
+        """
+        type = self._read(1)
+        if type == 'i':
+            return self._readint()
+        elif type == 's':
+            size = self._readint()
+            return self._read(size)
+        elif type == 'l':
+            size = self._readint()
+            deserialize = self.deserialize
+            return [ deserialize() for x in xrange(0, size) ]
+        elif type == 'd':
+            size = self._readint()
+            deserialize = self.deserialize
+            return dict([ (deserialize(), deserialize()) for x in xrange(0,
+                size)])
+        else:
+            raise TypeError("wireprotocol deserialization: unknown "
+                    "value type descriptor: %r")