Patchwork D3175: commands: implement `export --format=X` with support for CBOR

login
register
mail settings
Submitter phabricator
Date April 6, 2018, 10:11 p.m.
Message ID <differential-rev-PHID-DREV-vsy2mj5psrafxhznwjoc-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/30472/
State Superseded
Headers show

Comments

phabricator - April 6, 2018, 10:11 p.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  What's better than having to parse patch files? Not having to parse
  them.
  
  The current text-based patch file format used by `hg export` is good
  for humans to exchange. But for machines, it is better to use a
  data format that is more structured.
  
  We recently introduced support for CBOR. CBOR is a great, binary
  preserving data format (unlike JSON), and I think we should use it
  heavily for data interchange.
  
  This commit teaches `hg export` to write data to CBOR. It adds a
  --format argument. Hopefully this is the most controversial part of
  this patch. I thought about using --template/-T. We already have
  -Tjson. -Tcbor seems like a reasonable feature addition. That might
  be the right way forward. I'm not sure. I didn't want to scope bloat
  the patch. At least not initially. (The code for exporting a patch
  is a bit wonky and I'm not very comfortable with the templating
  layer.)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3175

AFFECTED FILES
  mercurial/cmdutil.py
  mercurial/commands.py
  mercurial/utils/stringutil.py
  tests/test-completion.t
  tests/test-export.t

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - April 6, 2018, 10:18 p.m.
indygreg added a comment.


  This is more of an RFC patch. My actual goal is support for ingesting CBOR patches via `hg import`. I figured it would be easier to test that if we had support for CBOR with `hg export`. And the reason I want CBOR support for patch ingestion is because it is safer: it reduces the surface area for injecting badness via patch parsing. See e.g. http://rachelbythebay.com/w/2018/04/05/bangpatch/. This implementation is still only partially there: I think a better approach would be to have structured data for the diffs so we don't need to parse those either. That would allow us to use binary data without escaping, not have to use inline text metadata for copy/renames, not have to worry about encoding of filenames, etc. All the problems with "parse a patch" go away and you are left with commit metadata and a series of splicing instructions, which should be pretty generic.
  
  The I/O in the export code is pretty wonky. I'm kinda sad that `cbor2` insists on binding an encoder/decoder to a file object. I *really* wish you could get straight bytes out of it without having to use `io.BytesIO()`.
  
  I suspect someone is going to tell me that `hg export` should use the templating layer. If someone does, I would appreciate help implementing that. Of course, for us to get CBOR with the templating layer, we'd have to teach the templating layer to emit CBOR. Since we can emit JSON, CBOR seems reasonable. I'm just not too familiar with that code though.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3175

To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - April 9, 2018, 10:32 p.m.
durin42 added a comment.


  I'm broadly in favor. Agree it might make sense to just add generic templating to `hg export`.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3175

To: indygreg, #hg-reviewers
Cc: durin42, mercurial-devel
phabricator - April 12, 2018, 2:52 p.m.
yuja requested changes to this revision.
yuja added a comment.
This revision now requires changes to proceed.


  OK, `hg export -Tjson` appears mostly working. Perhaps we can add
  cborfromatter for `-Tcbor`.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3175

To: indygreg, #hg-reviewers, yuja
Cc: yuja, durin42, mercurial-devel

Patch

diff --git a/tests/test-export.t b/tests/test-export.t
--- a/tests/test-export.t
+++ b/tests/test-export.t
@@ -258,6 +258,51 @@ 
   abort: export requires at least one changeset
   [255]
 
+--format=hg works and is the default
+
+  $ hg export > expected
+  $ hg export --format hg > got
+  $ diff got expected
+
+--format=cbor emits a CBOR patch
+
+  $ hg export --format cbor > cbor
+  $ f --sha256 --hexdump cbor
+  cbor: sha256=825d5c5a23bb64d1aad17e39f1e151c933d519293f3d05eb591a7479453ea21a
+  0000: 9f bf 47 76 65 72 73 69 6f 6e 01 44 75 73 65 72 |..Gversion.Duser|
+  0010: 44 74 65 73 74 44 74 69 6d 65 fb 00 00 00 00 00 |DtestDtime......|
+  0020: 00 00 00 48 74 69 6d 65 7a 6f 6e 65 00 46 62 72 |...Htimezone.Fbr|
+  0030: 61 6e 63 68 47 64 65 66 61 75 6c 74 44 6e 6f 64 |anchGdefaultDnod|
+  0040: 65 54 19 7e cd 81 a5 7f 76 0b 54 f3 4a 58 81 7a |eT.~....v.T.JX.z|
+  0050: d5 b0 49 91 fa 47 47 70 61 72 65 6e 74 73 81 54 |..I..GGparents.T|
+  0060: f3 ac ba fa c1 61 ec 68 f1 59 8a f3 8f 79 4f 28 |.....a.h.Y...yO(|
+  0070: 84 7c a5 d3 4b 64 65 73 63 72 69 70 74 69 6f 6e |.|..Kdescription|
+  0080: 58 5b 20 21 22 23 24 25 26 28 2c 2d 2e 2f 30 31 |X[ !"#$%&(,-./01|
+  0090: 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41 |23456789:;<=>?@A|
+  00a0: 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 |BCDEFGHIJKLMNOPQ|
+  00b0: 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61 |RSTUVWXYZ[\]^_`a|
+  00c0: 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 |bcdefghijklmnopq|
+  00d0: 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 4b 70 61 |rstuvwxyz{|}~Kpa|
+  00e0: 74 63 68 63 68 75 6e 6b 73 9f 58 7b 64 69 66 66 |tchchunks.X{diff|
+  00f0: 20 2d 72 20 66 33 61 63 62 61 66 61 63 31 36 31 | -r f3acbafac161|
+  0100: 20 2d 72 20 31 39 37 65 63 64 38 31 61 35 37 66 | -r 197ecd81a57f|
+  0110: 20 66 6f 6f 0a 2d 2d 2d 20 61 2f 66 6f 6f 09 54 | foo.--- a/foo.T|
+  0120: 68 75 20 4a 61 6e 20 30 31 20 30 30 3a 30 30 3a |hu Jan 01 00:00:|
+  0130: 30 30 20 31 39 37 30 20 2b 30 30 30 30 0a 2b 2b |00 1970 +0000.++|
+  0140: 2b 20 62 2f 66 6f 6f 09 54 68 75 20 4a 61 6e 20 |+ b/foo.Thu Jan |
+  0150: 30 31 20 30 30 3a 30 30 3a 30 30 20 31 39 37 30 |01 00:00:00 1970|
+  0160: 20 2b 30 30 30 30 0a 58 2f 40 40 20 2d 31 30 2c | +0000.X/@@ -10,|
+  0170: 33 20 2b 31 30 2c 34 20 40 40 0a 20 66 6f 6f 2d |3 +10,4 @@. foo-|
+  0180: 39 0a 20 66 6f 6f 2d 31 30 0a 20 66 6f 6f 2d 31 |9. foo-10. foo-1|
+  0190: 31 0a 2b 6c 69 6e 65 0a ff ff ff                |1.+line....|
+
+  >>> from __future__ import print_function
+  >>> from mercurial.thirdparty import cbor
+  >>> from mercurial.utils import stringutil
+  >>> with open('cbor', 'rb') as fh:
+  ...     print(stringutil.pprint(cbor.load(fh)))
+  [{b'branch': b'default', b'description': b' !"#$%&(,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~', b'node': b'\x19~\xcd\x81\xa5\x7fv\x0bT\xf3JX\x81z\xd5\xb0I\x91\xfaG', b'parents': [b'\xf3\xac\xba\xfa\xc1a\xech\xf1Y\x8a\xf3\x8fyO(\x84|\xa5\xd3'], b'patchchunks': [b'diff -r f3acbafac161 -r 197ecd81a57f foo\n--- a/foo\tThu Jan 01 00:00:00 1970 +0000\n+++ b/foo\tThu Jan 01 00:00:00 1970 +0000\n', b'@@ -10,3 +10,4 @@\n foo-9\n foo-10\n foo-11\n+line\n'], b'time': 0.000000, b'timezone': 0, b'user': b'test', b'version': 1}]
+
 Check for color output
   $ cat <<EOF >> $HGRCPATH
   > [color]
diff --git a/tests/test-completion.t b/tests/test-completion.t
--- a/tests/test-completion.t
+++ b/tests/test-completion.t
@@ -231,7 +231,7 @@ 
   clone: noupdate, updaterev, rev, branch, pull, uncompressed, stream, ssh, remotecmd, insecure
   commit: addremove, close-branch, amend, secret, edit, interactive, include, exclude, message, logfile, date, user, subrepos
   diff: rev, change, text, git, binary, nodates, noprefix, show-function, reverse, ignore-all-space, ignore-space-change, ignore-blank-lines, ignore-space-at-eol, unified, stat, root, include, exclude, subrepos
-  export: output, switch-parent, rev, text, git, binary, nodates
+  export: output, switch-parent, rev, format, text, git, binary, nodates
   forget: include, exclude, dry-run
   init: ssh, remotecmd, insecure
   log: follow, follow-first, date, copies, keyword, rev, line-range, removed, only-merges, user, only-branch, branch, prune, patch, git, limit, no-merges, stat, graph, style, template, include, exclude
diff --git a/mercurial/utils/stringutil.py b/mercurial/utils/stringutil.py
--- a/mercurial/utils/stringutil.py
+++ b/mercurial/utils/stringutil.py
@@ -34,6 +34,10 @@ 
             '%s: %s' % (pprint(k), pprint(v)) for k, v in sorted(o.items())))
     elif isinstance(o, bool):
         return b'True' if o else b'False'
+    elif isinstance(o, int):
+        return '%d' % o
+    elif isinstance(o, float):
+        return b'%f' % o
     else:
         raise error.ProgrammingError('do not know how to format %r' % o)
 
diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -1893,6 +1893,7 @@ 
      _('print output to file with formatted name'), _('FORMAT')),
     ('', 'switch-parent', None, _('diff against the second parent')),
     ('r', 'rev', [], _('revisions to export'), _('REV')),
+    ('', 'format', 'hg', _('format of written data (EXPERIMENTAL)')),
     ] + diffopts,
     _('[OPTION]... [-o OUTFILESPEC] [-r] [REV]...'), cmdtype=readonly)
 def export(ui, repo, *changesets, **opts):
@@ -1966,14 +1967,20 @@ 
     revs = scmutil.revrange(repo, changesets)
     if not revs:
         raise error.Abort(_("export requires at least one changeset"))
+
+    if opts['format'] not in ('hg', 'cbor'):
+        raise error.Abort(_('invalid value for --format: only "hg" and "cbor" '
+                            'are supported'))
+
     if len(revs) > 1:
         ui.note(_('exporting patches:\n'))
     else:
         ui.note(_('exporting patch:\n'))
     ui.pager('export')
     cmdutil.export(repo, revs, fntemplate=opts.get('output'),
-                 switch_parent=opts.get('switch_parent'),
-                 opts=patch.diffallopts(ui, opts))
+                   switch_parent=opts.get('switch_parent'),
+                   opts=patch.diffallopts(ui, opts),
+                   fmt=opts['format'])
 
 @command('files',
     [('r', 'rev', '', _('search the repository as it is in REV'), _('REV')),
diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -19,7 +19,9 @@ 
     nullrev,
     short,
 )
-
+from .thirdparty import (
+    cbor,
+)
 from . import (
     bookmarks,
     changelog,
@@ -50,6 +52,7 @@ 
 )
 
 from .utils import (
+    cborutil,
     dateutil,
     stringutil,
 )
@@ -1559,8 +1562,47 @@ 
     for chunk, label in patch.diffui(repo, prev, node, match, opts=diffopts):
         yield chunk, label
 
+def _exportsinglecbor(repo, ctx, match, switch_parent, seqno, diffopts,
+                      encoder):
+    """Export a single patch to CBOR."""
+    node = scmutil.binnode(ctx)
+    parents = [p.node() for p in ctx.parents() if p]
+    branch = ctx.branch()
+    if switch_parent:
+        parents.reverse()
+
+    if parents:
+        prev = parents[0]
+    else:
+        prev = nullid
+
+    with cborutil.streammap(encoder) as writeitem:
+        writeitem(b'version', 1)
+        writeitem(b'user', encoding.fromlocal(ctx.user()))
+        writeitem(b'time', ctx.date()[0])
+        writeitem(b'timezone', ctx.date()[1])
+        writeitem(b'branch', branch)
+        writeitem(b'node', node)
+        writeitem(b'parents', parents)
+
+        extras = []
+        for headerid in extraexport:
+            header = extraexportmap[headerid](seqno, ctx)
+            if header:
+                extras.append(header)
+
+        if extras:
+            writeitem(b'extras', extras)
+
+        writeitem(b'description',
+                  encoding.fromlocal(ctx.description().rstrip()))
+
+        encoder.encode(b'patchchunks')
+        chunks = patch.diff(repo, prev, node, match, opts=diffopts)
+        cborutil.streamarrayitems(encoder, chunks)
+
 def export(repo, revs, fntemplate='hg-%h.patch', fp=None, switch_parent=False,
-           opts=None, match=None):
+           opts=None, match=None, fmt='hg'):
     '''export changesets as hg patches
 
     Args:
@@ -1572,6 +1614,8 @@ 
                      Default is false, which always shows diff against p1.
       opts: diff options to use for generating the patch.
       match: If specified, only export changes to files matching this matcher.
+      fmt: Format to export the patch to. ``hg`` is the default Mercurial
+           patch format. ``cbor`` encodes to CBOR.
 
     Returns:
       Nothing.
@@ -1613,17 +1657,35 @@ 
             else:
                 fp.write(s)
 
+    if fmt == 'cbor' and allfp:
+        allcbor = cbor.CBOREncoder(allfp)
+        cborutil.beginindefinitearray(allcbor)
+    else:
+        allcbor = None
+
     for seqno, rev in enumerate(revs, 1):
         ctx = repo[rev]
 
         if not allfp:
             assert fntemplate
-
             with makefileobj(ctx, fntemplate, mode='wb', modemap=filemode,
                              total=total, seqno=seqno, revwidth=revwidth) as fp:
-                doexport(ctx, fp, fp.name)
+                if fmt == 'cbor':
+                    encoder = cbor.CBOREncoder(fp)
+                    with cborutil.streamarray(encoder):
+                        _exportsinglecbor(repo, ctx, match, switch_parent,
+                                          seqno, opts, encoder)
+                else:
+                    doexport(ctx, fp, fp.name)
         else:
-            doexport(ctx, allfp, dest)
+            if fmt == 'cbor':
+                _exportsinglecbor(repo, ctx, match, switch_parent, seqno, opts,
+                                  allcbor)
+            else:
+                doexport(ctx, allfp, dest)
+
+    if allcbor:
+        cborutil.endindefinite(allcbor)
 
 def showmarker(fm, marker, index=None):
     """utility function to display obsolescence marker in a readable way