Patchwork [06,of,21,V2] speedy: custom wire protocol

Submitter Tomasz Kleczek
Date Dec. 14, 2012, 2:52 a.m.
Message ID <f18b4628ae086a3b4474.1355453538@dev408.prn1.facebook.com>
Permalink /patch/88/
State Deferred, archived

Comments

Tomasz Kleczek - Dec. 14, 2012, 2:52 a.m.
# HG changeset patch
# User Tomasz Kleczek <tkleczek at fb.com>
# Date 1355383371 28800
# Node ID f18b4628ae086a3b4474d0c2a4fe5b3bb9151edc
# Parent  859126a36e9173e5f6370ddb7b8bc14fc82ef682
speedy: custom wire protocol

History queries take many kinds of Python objects as parameters. We have
to (de)serialize them when transporting them through the network layer.

Here are some requirements that a candidate protocol should satisfy:
 * it shouldn't use pickle/marshal modules internally as they are not
   secure against maliciously constructed data
 * it should handle binary data without a significant overhead as
   most of the data sent over the network will be lists of node ids
 * compatibility with Python 2.4 is a plus

Given these requirements, a custom protocol for object serialization is
introduced.

Currently supported value types:
  * int
  * string
  * list of supported elements
  * dict with supported key/values
  * tuple of supported elements (serialized as list)

but can be easily extended to support more types.
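
As an illustration of the resulting byte layout, here is a minimal sketch,
not the code from this patch. It assumes Python 3 with `bytes` standing in
for the Python 2 `str`, and `encode` is a hypothetical helper that returns
the bytes instead of writing them through a callback. Each value is a
one-byte type tag followed by a 4-byte big-endian unsigned integer (the
value itself for ints, the length or element count otherwise).

```python
import struct

def encode(val):
    """Return the wire bytes for val: type tag, 4-byte big-endian field, payload."""
    if isinstance(val, int):
        # '>L' is unsigned 32-bit, so only 0 <= val < 2**32 can be packed
        return b'i' + struct.pack('>L', val)
    if isinstance(val, bytes):
        return b's' + struct.pack('>L', len(val)) + val
    if isinstance(val, (tuple, list)):
        # tuples are encoded as lists, so they come back as lists
        return b'l' + struct.pack('>L', len(val)) + b''.join(encode(e) for e in val)
    if isinstance(val, dict):
        body = b''.join(encode(k) + encode(v) for k, v in val.items())
        return b'd' + struct.pack('>L', len(val)) + body
    raise TypeError("unsupported value type: %s" % type(val).__name__)

# A 20-byte node id costs 5 bytes of overhead (tag + length).
node = b'\x12' * 20
overhead = len(encode(node)) - len(node)
```

With this layout the per-string overhead is constant, which is what keeps
lists of node ids cheap to send.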

The protocol is stream-oriented, which means that (de)serializing a single
object may result in many reads/writes on the underlying transport layer.
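
To make the "many reads" point concrete, the following sketch decodes a
small list while recording every call to the read callback. It is a
Python 3 adaptation written for illustration (the patch itself targets
Python 2); `decode` and `counting_read` are hypothetical names.

```python
import io
import struct

def decode(read):
    """Decode one value, pulling bytes through the read(n) callback."""
    tag = read(1)
    if tag == b'i':
        return struct.unpack('>L', read(4))[0]
    if tag == b's':
        size = struct.unpack('>L', read(4))[0]
        return read(size)
    if tag == b'l':
        size = struct.unpack('>L', read(4))[0]
        return [decode(read) for _ in range(size)]
    if tag == b'd':
        size = struct.unpack('>L', read(4))[0]
        result = {}
        for _ in range(size):
            key = decode(read)          # key is read before its value
            result[key] = decode(read)
        return result
    raise TypeError("unknown value type descriptor: %r" % tag)

# Wire bytes for the list [b'ab', 7]
stream = io.BytesIO(b'l\x00\x00\x00\x02' b's\x00\x00\x00\x02ab' b'i\x00\x00\x00\x07')
calls = []

def counting_read(n):
    calls.append(n)                     # remember the size of each request
    return stream.read(n)

value = decode(counting_read)
```

Decoding this two-element list issues seven separate reads, most of them
for one or four bytes.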

To achieve good performance the transport layer should provide some
buffering mechanism.
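
One possible way to provide such buffering, sketched here with the
standard `io` module from Python 3, is to hand the protocol the `write`
method of an `io.BufferedWriter`. `CountingRaw` is a hypothetical stand-in
for a socket; it is not part of this patch series.

```python
import io
import struct

class CountingRaw(io.RawIOBase):
    """Stand-in for a socket: records each raw write it receives."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)

raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=4096)
write = buffered.write                  # this is the callback the protocol would get

# Emit a list of three 20-byte node ids as eleven small writes.
nodes = [b'\xaa' * 20, b'\xbb' * 20, b'\xcc' * 20]
write(b'l')
write(struct.pack('>L', len(nodes)))
for n in nodes:
    write(b's')
    write(struct.pack('>L', len(n)))
    write(n)

buffered.flush()                        # must be called once the request is complete
```

The eleven small writes reach the underlying transport as a single
80-byte chunk. The caller has to remember to flush after each message.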

To change the query into the request we perform a simple transformation:
  query_name(*args) -> [query_name, args]
and serialize the resulting list with the wire protocol. At the server
end the reverse operation is performed.
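
The transformation and its reverse could look roughly like this. The
function names `makerequest` and `dispatch`, the `handlers` table and the
`countnodes` query are all hypothetical; the integration code arrives in
later patches of the series.

```python
def makerequest(query_name, *args):
    """Client side: query_name(*args) -> [query_name, args]."""
    return [query_name, list(args)]

def dispatch(request, handlers):
    """Server side: look up the query by name and apply the arguments."""
    query_name, args = request
    return handlers[query_name](*args)

# Hypothetical query table with a single query that counts node ids.
handlers = {'countnodes': lambda nodes: len(nodes)}

request = makerequest('countnodes', [b'\x01' * 20, b'\x02' * 20])
result = dispatch(request, handlers)
```

Because tuples are serialized as lists, the server always sees `args` as a
list, which is why the sketch builds a list on the client side as well.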

This change only introduces a wireprotocol class; it will be integrated into
existing code in subsequent patches.

Patch

diff --git a/hgext/speedy/protocol.py b/hgext/speedy/protocol.py
new file mode 100644
--- /dev/null
+++ b/hgext/speedy/protocol.py
@@ -0,0 +1,99 @@ 
+# Copyright 2012 Facebook
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+
+"""Custom wire protocol."""
+
+import struct
+
+class wireprotocol(object):
+    """Defines a mechanism to map in-memory data structures to a wire-format.
+
+    Raw data is read from/written to the underlying transport using callbacks
+    provided on initialization of the wireprotocol instance.
+
+    Currently supported value types:
+      * int
+      * string
+      * list of supported elements
+      * dict with supported key/values
+      * tuple of supported elements (serialized as list)
+
+    The protocol is stream-oriented which means that a (de)serialization of a
+    single object may result in many read/writes to the underlying transport.
+
+    To achieve good performance the transport layer should provide some
+    buffering mechanism.
+
+    No versioning or message framing is provided.
+    """
+    def __init__(self, read, write):
+        self._read = read
+        self._write = write
+
+    def _writeint(self, v):
+        self._write(struct.pack('>L', v))
+
+    def _readint(self):
+        return int(struct.unpack('>L', self._read(4))[0])
+
+    def serialize(self, val):
+        """Serialize a given value.
+
+        Writes data in a series of calls to the `self._write` callback.
+
+        Raises `TypeError` if the type of `val` is not supported.
+        NOTE: Some data might have been already written to transport
+              instance when the exception is raised.
+        """
+        if isinstance(val, int):
+            self._write('i')
+            self._writeint(val)
+        elif isinstance(val, str):
+            self._write('s')
+            self._writeint(len(val))
+            self._write(val)
+        elif isinstance(val, (tuple, list)):
+            # tuples are serialized as lists
+            self._write('l')
+            self._writeint(len(val))
+            serialize = self.serialize
+            for e in val:
+                serialize(e)
+        elif isinstance(val, dict):
+            self._write('d')
+            self._writeint(len(val))
+            serialize = self.serialize
+            for k, e in val.iteritems():
+                serialize(k)
+                serialize(e)
+        else:
+            raise TypeError("wireprotocol serialization: unsupported"
+                            " value type: %s" % val.__class__.__name__)
+
+    def deserialize(self):
+        """Deserialize a single value.
+
+        Reads data in a series of calls to the `self._read` callback.
+
+        Raises `TypeError` if an unknown type description is encountered.
+        """
+        type = self._read(1)
+        if type == 'i':
+            return self._readint()
+        elif type == 's':
+            size = self._readint()
+            return self._read(size)
+        elif type == 'l':
+            size = self._readint()
+            deserialize = self.deserialize
+            return [ deserialize() for x in xrange(0, size) ]
+        elif type == 'd':
+            size = self._readint()
+            deserialize = self.deserialize
+            return dict([ (deserialize(), deserialize()) for x in xrange(0,
+                size)])
+        else:
+            raise TypeError("wireprotocol deserialization: unknown"
+                            " value type descriptor: %r" % type)