Patchwork [4,of,9,V6] bookmarks: introduce binary encoding

login
register
mail settings
Submitter Stanislau Hlebik
Date Oct. 11, 2016, 4:25 p.m.
Message ID <718ed86a3698631077a0.1476203146@dev1918.lla1.facebook.com>
Download mbox | patch
Permalink /patch/17035/
State Changes Requested
Delegated to: Pierre-Yves David
Headers show

Comments

Stanislau Hlebik - Oct. 11, 2016, 4:25 p.m.
# HG changeset patch
# User Stanislau Hlebik <stash@fb.com>
# Date 1476195835 25200
#      Tue Oct 11 07:23:55 2016 -0700
# Node ID 718ed86a3698631077a087efaf668d70513056f5
# Parent  6f5a3300a5457c92eb09170a30c98328ebe3bcce
bookmarks: introduce binary encoding

Bookmarks binary encoding will be used for `bookmarks` bundle2 part.
The format is: <4 bytes - bookmark size, big-endian><bookmark>
               <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is
incorrect.
Pierre-Yves David - Oct. 14, 2016, 1:28 a.m.
On 10/11/2016 06:25 PM, Stanislau Hlebik wrote:
> # HG changeset patch
> # User Stanislau Hlebik <stash@fb.com>
> # Date 1476195835 25200
> #      Tue Oct 11 07:23:55 2016 -0700
> # Node ID 718ed86a3698631077a087efaf668d70513056f5
> # Parent  6f5a3300a5457c92eb09170a30c98328ebe3bcce
> bookmarks: introduce binary encoding
>
> Bookmarks binary encoding will be used for `bookmarks` bundle2 part.
> The format is: <4 bytes - bookmark size, big-endian><bookmark>
>                <1 byte - 0 if node is empty 1 otherwise><20 bytes node>

CCing greg as our binary format overlord,

I'm not sure how useful it is to have the node to be optional. using 
nullid should be fine (and most stream/bundle will be compressed anyway.

This would give us the option to have the node+size first (as fixed 
width) then the bookmark name. Though ?

I do not thing it is necessary to point that the value is big-endian, we 
use big endian for all binary encoding.

Beside that, the change seems good. I've been tempted to advise for 
moving the encoding/decoding directly into the bundle2 part to benefit 
from the bundle2 utility to pack/unpack but it eventually felt more at 
home here.

I've some small comment inline for your consideration.

> BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is
> incorrect.
>
> diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py
> --- a/mercurial/bookmarks.py
> +++ b/mercurial/bookmarks.py
> @@ -7,8 +7,10 @@
>
>  from __future__ import absolute_import
>
> +import StringIO
>  import errno
>  import os
> +import struct
>
>  from .i18n import _
>  from .node import (
> @@ -23,6 +25,77 @@
>      util,
>  )
>
> +_NONEMPTYNODE = chr(1)
> +_EMPTYNODE = chr(0)

I would use 'pack' directly here. Who know what awe-full thing around 
char and str python might be preparing for pytnon3.

> +
> +def _packbookmarksize(size):
> +    return struct.pack('>i', size)

Any reason we don't just inline this?

> +
> +def _unpackbookmarksize(stream):
> +    """Returns 0 if stream is empty.
> +    """
> +
> +    expectedsize = struct.calcsize('>i')
> +    encodedbookmarksize = stream.read(expectedsize)
> +    if len(encodedbookmarksize) == 0:
> +        return 0
> +    if len(encodedbookmarksize) != expectedsize:
> +        raise error.BookmarksDecodeError(
> +            _('cannot decode bookmark size: '
> +              'expected size: %d, actual size: %d') %
> +            (expectedsize, len(encodedbookmarksize)))
> +    return struct.unpack('>i', encodedbookmarksize)[0]

small nits, The bundle2 code have a lot of utility around reading and 
unpacking, it seems like it could be worthwhile to move this unpacking 
code directly in the part.

> +def encodebookmarks(bookmarks):
> +    """Encodes bookmark to node mapping to the byte string.
> +
> +    Format: <4 bytes - bookmark size, big-endian><bookmark>
> +            <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
> +    Node may be 20 byte binary string, 40 byte hex string or empty

The part about 'Hex' is surprising is it still up to date.

> +    parts = []
> +    for bookmark, node in bookmarks.iteritems():
> +        encodedbookmarksize = _packbookmarksize(len(bookmark))
> +        parts.append(encodedbookmarksize)
> +        bookmark = encoding.fromlocal(bookmark)
> +        parts.append(bookmark)
> +        if node:
> +            if len(node) != 20 and len(node) != 40:
> +                raise error.BookmarksEncodeError()
> +            if len(node) == 40:
> +                node = bin(node)
> +            parts.append(_NONEMPTYNODE)
> +            parts.append(node)
> +        else:
> +            parts.append(_EMPTYNODE)
> +    return ''.join(parts)

We should probably have something a bit more subtle than 1 unique byte 
blob. bundle2 part can be fed an iterator yielding the lines one by one 
would be a good way to have this generation smoother.

> +
> +def decodebookmarks(buf):
> +    """Decodes buffer into bookmark to node mapping.
> +
> +    Node is either 20 bytes or empty.
> +    """
> +
> +    stream = StringIO.StringIO(buf)
> +    bookmarks = {}
> +    while True:
> +        bookmarksize = _unpackbookmarksize(stream)
> +        if not bookmarksize:
> +            break

Could we use 'while bookmark size' as an iteration value. A bare 'while 
True:' is a bit scary.

> +        bookmark = stream.read(bookmarksize)
> +        if len(bookmark) != bookmarksize:
> +            raise error.BookmarksDecodeError(
> +                'cannot decode bookmark: expected size: %d, '
> +                'actual size: %d' % (bookmarksize, len(bookmark)))
> +        bookmark = encoding.tolocal(bookmark)
> +        emptynode = stream.read(1)
> +        node = ''
> +        if emptynode != _EMPTYNODE:
> +            node = stream.read(20)
> +        bookmarks[bookmark] = node
> +    return bookmarks
> +
>  def _getbkfile(repo):
>      """Hook so that extensions that mess with the store can hook bm storage.
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Stanislau Hlebik - Nov. 2, 2016, 10:08 a.m.
If we have node+size first what advantage does it give?
I’m using empty node to show that bookmark is deleted.
I can use nullid for the same purpose but then I won’t be able to push/pull bookmarks that point to null commit
(I know this is a rare case, but it’s possible to do it now).

    >   The part about 'Hex' is surprising is it still up to date.

Why it’s not up to date? It’s possible to pass bookmarks with hex nodes.

On 10/14/16, 2:28 AM, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org> wrote:

    
    
    On 10/11/2016 06:25 PM, Stanislau Hlebik wrote:
    > # HG changeset patch

    > # User Stanislau Hlebik <stash@fb.com>

    > # Date 1476195835 25200

    > #      Tue Oct 11 07:23:55 2016 -0700

    > # Node ID 718ed86a3698631077a087efaf668d70513056f5

    > # Parent  6f5a3300a5457c92eb09170a30c98328ebe3bcce

    > bookmarks: introduce binary encoding

    >

    > Bookmarks binary encoding will be used for `bookmarks` bundle2 part.

    > The format is: <4 bytes - bookmark size, big-endian><bookmark>

    >                <1 byte - 0 if node is empty 1 otherwise><20 bytes node>

    
    CCing greg as our binary format overlord,
    
    I'm not sure how useful it is to have the node to be optional. using 
    nullid should be fine (and most stream/bundle will be compressed anyway.
    
    This would give us the option to have the node+size first (as fixed 
    width) then the bookmark name. Though ?
    
    I do not thing it is necessary to point that the value is big-endian, we 
    use big endian for all binary encoding.
    
    Beside that, the change seems good. I've been tempted to advise for 
    moving the encoding/decoding directly into the bundle2 part to benefit 
    from the bundle2 utility to pack/unpack but it eventually felt more at 
    home here.
    
    I've some small comment inline for your consideration.
    
    > BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is

    > incorrect.

    >

    > diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py

    > --- a/mercurial/bookmarks.py

    > +++ b/mercurial/bookmarks.py

    > @@ -7,8 +7,10 @@

    >

    >  from __future__ import absolute_import

    >

    > +import StringIO

    >  import errno

    >  import os

    > +import struct

    >

    >  from .i18n import _

    >  from .node import (

    > @@ -23,6 +25,77 @@

    >      util,

    >  )

    >

    > +_NONEMPTYNODE = chr(1)

    > +_EMPTYNODE = chr(0)

    
    I would use 'pack' directly here. Who know what awe-full thing around 
    char and str python might be preparing for pytnon3.
    
    > +

    > +def _packbookmarksize(size):

    > +    return struct.pack('>i', size)

    
    Any reason we don't just inline this?
    
    > +

    > +def _unpackbookmarksize(stream):

    > +    """Returns 0 if stream is empty.

    > +    """

    > +

    > +    expectedsize = struct.calcsize('>i')

    > +    encodedbookmarksize = stream.read(expectedsize)

    > +    if len(encodedbookmarksize) == 0:

    > +        return 0

    > +    if len(encodedbookmarksize) != expectedsize:

    > +        raise error.BookmarksDecodeError(

    > +            _('cannot decode bookmark size: '

    > +              'expected size: %d, actual size: %d') %

    > +            (expectedsize, len(encodedbookmarksize)))

    > +    return struct.unpack('>i', encodedbookmarksize)[0]

    
    small nits, The bundle2 code have a lot of utility around reading and 
    unpacking, it seems like it could be worthwhile to move this unpacking 
    code directly in the part.
    
    > +def encodebookmarks(bookmarks):

    > +    """Encodes bookmark to node mapping to the byte string.

    > +

    > +    Format: <4 bytes - bookmark size, big-endian><bookmark>

    > +            <1 byte - 0 if node is empty 1 otherwise><20 bytes node>

    > +    Node may be 20 byte binary string, 40 byte hex string or empty

    
    The part about 'Hex' is surprising is it still up to date.
    
    > +    parts = []

    > +    for bookmark, node in bookmarks.iteritems():

    > +        encodedbookmarksize = _packbookmarksize(len(bookmark))

    > +        parts.append(encodedbookmarksize)

    > +        bookmark = encoding.fromlocal(bookmark)

    > +        parts.append(bookmark)

    > +        if node:

    > +            if len(node) != 20 and len(node) != 40:

    > +                raise error.BookmarksEncodeError()

    > +            if len(node) == 40:

    > +                node = bin(node)

    > +            parts.append(_NONEMPTYNODE)

    > +            parts.append(node)

    > +        else:

    > +            parts.append(_EMPTYNODE)

    > +    return ''.join(parts)

    
    We should probably have something a bit more subtle than 1 unique byte 
    blob. bundle2 part can be fed an iterator yielding the lines one by one 
    would be a good way to have this generation smoother.
    
    > +

    > +def decodebookmarks(buf):

    > +    """Decodes buffer into bookmark to node mapping.

    > +

    > +    Node is either 20 bytes or empty.

    > +    """

    > +

    > +    stream = StringIO.StringIO(buf)

    > +    bookmarks = {}

    > +    while True:

    > +        bookmarksize = _unpackbookmarksize(stream)

    > +        if not bookmarksize:

    > +            break

    
    Could we use 'while bookmark size' as an iteration value. A bare 'while 
    True:' is a bit scary.
    
    > +        bookmark = stream.read(bookmarksize)

    > +        if len(bookmark) != bookmarksize:

    > +            raise error.BookmarksDecodeError(

    > +                'cannot decode bookmark: expected size: %d, '

    > +                'actual size: %d' % (bookmarksize, len(bookmark)))

    > +        bookmark = encoding.tolocal(bookmark)

    > +        emptynode = stream.read(1)

    > +        node = ''

    > +        if emptynode != _EMPTYNODE:

    > +            node = stream.read(20)

    > +        bookmarks[bookmark] = node

    > +    return bookmarks

    > +

    >  def _getbkfile(repo):

    >      """Hook so that extensions that mess with the store can hook bm storage.

    >

    > _______________________________________________

    > Mercurial-devel mailing list

    > Mercurial-devel@mercurial-scm.org

    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DQICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=1DxDEGgOTUUOmuCXmZ95OkWoyOXz2D7ILnHuyeqI0iQ&s=-N8JIBF1kxrGeXpr19jfoLFYJFS2Oj2MSZEFpUyjhOE&e= 

    >

    
    -- 
    Pierre-Yves David

Patch

diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py
--- a/mercurial/bookmarks.py
+++ b/mercurial/bookmarks.py
@@ -7,8 +7,10 @@ 
 
 from __future__ import absolute_import
 
+import StringIO
 import errno
 import os
+import struct
 
 from .i18n import _
 from .node import (
@@ -23,6 +25,77 @@ 
     util,
 )
 
+_NONEMPTYNODE = chr(1)
+_EMPTYNODE = chr(0)
+
+def _packbookmarksize(size):
+    return struct.pack('>i', size)
+
+def _unpackbookmarksize(stream):
+    """Returns 0 if stream is empty.
+    """
+
+    expectedsize = struct.calcsize('>i')
+    encodedbookmarksize = stream.read(expectedsize)
+    if len(encodedbookmarksize) == 0:
+        return 0
+    if len(encodedbookmarksize) != expectedsize:
+        raise error.BookmarksDecodeError(
+            _('cannot decode bookmark size: '
+              'expected size: %d, actual size: %d') %
+            (expectedsize, len(encodedbookmarksize)))
+    return struct.unpack('>i', encodedbookmarksize)[0]
+
+def encodebookmarks(bookmarks):
+    """Encodes bookmark to node mapping to the byte string.
+
+    Format: <4 bytes - bookmark size, big-endian><bookmark>
+            <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
+    Node may be 20 byte binary string, 40 byte hex string or empty
+    """
+
+    parts = []
+    for bookmark, node in bookmarks.iteritems():
+        encodedbookmarksize = _packbookmarksize(len(bookmark))
+        parts.append(encodedbookmarksize)
+        bookmark = encoding.fromlocal(bookmark)
+        parts.append(bookmark)
+        if node:
+            if len(node) != 20 and len(node) != 40:
+                raise error.BookmarksEncodeError()
+            if len(node) == 40:
+                node = bin(node)
+            parts.append(_NONEMPTYNODE)
+            parts.append(node)
+        else:
+            parts.append(_EMPTYNODE)
+    return ''.join(parts)
+
+def decodebookmarks(buf):
+    """Decodes buffer into bookmark to node mapping.
+
+    Node is either 20 bytes or empty.
+    """
+
+    stream = StringIO.StringIO(buf)
+    bookmarks = {}
+    while True:
+        bookmarksize = _unpackbookmarksize(stream)
+        if not bookmarksize:
+            break
+        bookmark = stream.read(bookmarksize)
+        if len(bookmark) != bookmarksize:
+            raise error.BookmarksDecodeError(
+                'cannot decode bookmark: expected size: %d, '
+                'actual size: %d' % (bookmarksize, len(bookmark)))
+        bookmark = encoding.tolocal(bookmark)
+        emptynode = stream.read(1)
+        node = ''
+        if emptynode != _EMPTYNODE:
+            node = stream.read(20)
+        bookmarks[bookmark] = node
+    return bookmarks
+
 def _getbkfile(repo):
     """Hook so that extensions that mess with the store can hook bm storage.