Patchwork D2948: wireproto: syntax for encoding CBOR into frames

login
register
mail settings
Submitter phabricator
Date March 26, 2018, 6:13 p.m.
Message ID <differential-rev-PHID-DREV-3deuzvyysddmdlpq3jt4-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/29883/
State Superseded
Headers show

Comments

phabricator - March 26, 2018, 6:13 p.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  We just vendored a library for encoding and decoding the CBOR
  data format. While the intent of that vendor was to support state
  files, CBOR is really a nice data format. It is extensible and
  compact.
  
  I've been feeling dirty inventing my own data formats for
  frame payloads. While custom formats can always beat out a generic
  format, there is a cost to be paid in terms of implementation,
  comprehension, etc. CBOR is compact enough that I'm not too
  worried about efficiency loss. I think the benefits of using
  a standardized format outweigh rolling our own formats. So
  I plan to make heavy use of CBOR in the wire protocol going
  forward.
  
  This commit introduces support for encoding CBOR data in frame
  payloads to our function to make a frame from a human string.
  We do need to employ some low-level Python code in order to
  evaluate a string as a Python expression. But other than that,
  this should hopefully be pretty straightforward.
  
  Unit tests for this function have been added.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2948

AFFECTED FILES
  mercurial/debugcommands.py
  mercurial/wireprotoframing.py
  tests/test-wireproto-serverreactor.py

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - April 3, 2018, 3:35 p.m.
durin42 added a comment.


  I have misgivings about this code that serves testing living in core, but I don't see a better solution offhand.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2948

To: indygreg, #hg-reviewers
Cc: durin42, mercurial-devel
phabricator - April 4, 2018, 1:21 p.m.
yuja added inline comments.

INLINE COMMENTS

> stringutil.py:526
> +                   __future__.unicode_literals.compiler_flag, True)
> +    return eval(code, globs, {})

Can't we use `ast.literal_eval()` instead of unsafe `eval()` ?

https://docs.python.org/2.7/library/ast.html#ast.literal_eval

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2948

To: indygreg, #hg-reviewers, durin42
Cc: yuja, durin42, mercurial-devel

Patch

diff --git a/tests/test-wireproto-serverreactor.py b/tests/test-wireproto-serverreactor.py
--- a/tests/test-wireproto-serverreactor.py
+++ b/tests/test-wireproto-serverreactor.py
@@ -35,6 +35,59 @@ 
                       framing.createcommandframes(stream, rid, cmd, args,
                                                   datafh))
 
+class FrameHumanStringTests(unittest.TestCase):
+    def testbasic(self):
+        self.assertEqual(ffs(b'1 1 0 1 0 '),
+                         b'\x00\x00\x00\x01\x00\x01\x00\x10')
+
+        self.assertEqual(ffs(b'2 4 0 1 0 '),
+                         b'\x00\x00\x00\x02\x00\x04\x00\x10')
+
+        self.assertEqual(ffs(b'2 4 0 1 0 foo'),
+                         b'\x03\x00\x00\x02\x00\x04\x00\x10foo')
+
+    def testcborint(self):
+        self.assertEqual(ffs(b'1 1 0 1 0 cbor:15'),
+                         b'\x01\x00\x00\x01\x00\x01\x00\x10\x0f')
+
+        self.assertEqual(ffs(b'1 1 0 1 0 cbor:42'),
+                         b'\x02\x00\x00\x01\x00\x01\x00\x10\x18*')
+
+        self.assertEqual(ffs(b'1 1 0 1 0 cbor:1048576'),
+                         b'\x05\x00\x00\x01\x00\x01\x00\x10\x1a'
+                         b'\x00\x10\x00\x00')
+
+        self.assertEqual(ffs(b'1 1 0 1 0 cbor:0'),
+                         b'\x01\x00\x00\x01\x00\x01\x00\x10\x00')
+
+        self.assertEqual(ffs(b'1 1 0 1 0 cbor:-1'),
+                         b'\x01\x00\x00\x01\x00\x01\x00\x10 ')
+
+        self.assertEqual(ffs(b'1 1 0 1 0 cbor:-342542'),
+                         b'\x05\x00\x00\x01\x00\x01\x00\x10:\x00\x05:\r')
+
+    def testcborstrings(self):
+        # String literals should be unicode.
+        self.assertEqual(ffs(b"1 1 0 1 0 cbor:'foo'"),
+                         b'\x04\x00\x00\x01\x00\x01\x00\x10cfoo')
+
+        self.assertEqual(ffs(b"1 1 0 1 0 cbor:b'foo'"),
+                         b'\x04\x00\x00\x01\x00\x01\x00\x10Cfoo')
+
+        self.assertEqual(ffs(b"1 1 0 1 0 cbor:u'foo'"),
+                         b'\x04\x00\x00\x01\x00\x01\x00\x10cfoo')
+
+    def testcborlists(self):
+        self.assertEqual(ffs(b"1 1 0 1 0 cbor:[None, True, False, 42, b'foo']"),
+                         b'\n\x00\x00\x01\x00\x01\x00\x10\x85\xf6\xf5\xf4'
+                         b'\x18*Cfoo')
+
+    def testcbordicts(self):
+        self.assertEqual(ffs(b"1 1 0 1 0 "
+                             b"cbor:{b'foo': b'val1', b'bar': b'val2'}"),
+                         b'\x13\x00\x00\x01\x00\x01\x00\x10\xa2'
+                         b'CbarDval2CfooDval1')
+
 class FrameTests(unittest.TestCase):
     def testdataexactframesize(self):
         data = util.bytesio(b'x' * framing.DEFAULT_MAX_FRAME_SIZE)
diff --git a/mercurial/wireprotoframing.py b/mercurial/wireprotoframing.py
--- a/mercurial/wireprotoframing.py
+++ b/mercurial/wireprotoframing.py
@@ -11,11 +11,13 @@ 
 
 from __future__ import absolute_import
 
+import __future__
 import struct
 
 from .i18n import _
 from .thirdparty import (
     attr,
+    cbor,
 )
 from . import (
     error,
@@ -156,6 +158,9 @@ 
 def makeframefromhumanstring(s):
     """Create a frame from a human readable string
 
+    DANGER: NOT SAFE TO USE WITH UNTRUSTED INPUT BECAUSE OF POTENTIAL
+    eval() USAGE. DO NOT USE IN CORE.
+
     Strings have the form:
 
         <request-id> <stream-id> <stream-flags> <type> <flags> <payload>
@@ -169,6 +174,11 @@ 
     named constant.
 
     Flags can be delimited by `|` to bitwise OR them together.
+
+    If the payload begins with ``cbor:``, the following string will be
+    evaluated as Python code and the resulting object will be fed into
+    a CBOR encoder. Otherwise, the payload is interpreted as a Python
+    byte string literal.
     """
     fields = s.split(b' ', 5)
     requestid, streamid, streamflags, frametype, frameflags, payload = fields
@@ -196,7 +206,31 @@ 
         else:
             finalflags |= int(flag)
 
-    payload = stringutil.unescapestr(payload)
+    if payload.startswith(b'cbor:'):
+        globs = {
+            r'__builtins__': {
+                r'None': None,
+                r'False': False,
+                r'True': True,
+                r'int': int,
+                r'set': set,
+                r'tuple': tuple,
+                # Don't need to expose dict and list because we can use
+                # literals.
+            },
+        }
+
+        # We can't use eval() directly because it inherits compiler
+        # flags from this module and we need unicode literals for Python 3
+        # compatibility.
+        code = compile(payload[5:], r'<string>', r'eval',
+                       __future__.unicode_literals.compiler_flag, True)
+        o = eval(code, globs, {})
+
+        payload = cbor.dumps(o, canonical=True)
+
+    else:
+        payload = stringutil.unescapestr(payload)
 
     return makeframe(requestid=requestid, streamid=streamid,
                      streamflags=finalstreamflags, typeid=frametype,
diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py
--- a/mercurial/debugcommands.py
+++ b/mercurial/debugcommands.py
@@ -2785,7 +2785,10 @@ 
     or a flag name for stream flags or frame flags, respectively. Values are
     resolved to integers and then bitwise OR'd together.
 
-    ``payload`` is is evaluated as a Python byte string literal.
+    ``payload`` represents the raw frame payload. If it begins with
+    ``cbor:``, the following string is evaluated as Python code and the
+    resulting object is fed into a CBOR encoder. Otherwise it is interpreted
+    as a Python byte string literal.
     """
     opts = pycompat.byteskwargs(opts)