Submitter | Stanislau Hlebik |
---|---|
Date | Oct. 11, 2016, 4:25 p.m. |
Message ID | <718ed86a3698631077a0.1476203146@dev1918.lla1.facebook.com> |
Download | mbox | patch |
Permalink | /patch/17035/ |
State | Changes Requested |
Delegated to: | Pierre-Yves David |
Headers | show |
Comments
On 10/11/2016 06:25 PM, Stanislau Hlebik wrote: > # HG changeset patch > # User Stanislau Hlebik <stash@fb.com> > # Date 1476195835 25200 > # Tue Oct 11 07:23:55 2016 -0700 > # Node ID 718ed86a3698631077a087efaf668d70513056f5 > # Parent 6f5a3300a5457c92eb09170a30c98328ebe3bcce > bookmarks: introduce binary encoding > > Bookmarks binary encoding will be used for `bookmarks` bundle2 part. > The format is: <4 bytes - bookmark size, big-endian><bookmark> > <1 byte - 0 if node is empty 1 otherwise><20 bytes node> CCing greg as our binary format overlord, I'm not sure how useful it is to have the node to be optional. using nullid should be fine (and most stream/bundle will be compressed anyway. This would give us the option to have the node+size first (as fixed width) then the bookmark name. Though ? I do not thing it is necessary to point that the value is big-endian, we use big endian for all binary encoding. Beside that, the change seems good. I've been tempted to advise for moving the encoding/decoding directly into the bundle2 part to benefit from the bundle2 utility to pack/unpack but it eventually felt more at home here. I've some small comment inline for your consideration. > BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is > incorrect. > > diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py > --- a/mercurial/bookmarks.py > +++ b/mercurial/bookmarks.py > @@ -7,8 +7,10 @@ > > from __future__ import absolute_import > > +import StringIO > import errno > import os > +import struct > > from .i18n import _ > from .node import ( > @@ -23,6 +25,77 @@ > util, > ) > > +_NONEMPTYNODE = chr(1) > +_EMPTYNODE = chr(0) I would use 'pack' directly here. Who know what awe-full thing around char and str python might be preparing for pytnon3. > + > +def _packbookmarksize(size): > + return struct.pack('>i', size) Any reason we don't just inline this? > + > +def _unpackbookmarksize(stream): > + """Returns 0 if stream is empty. > + """ > + > + expectedsize = struct.calcsize('>i') > + encodedbookmarksize = stream.read(expectedsize) > + if len(encodedbookmarksize) == 0: > + return 0 > + if len(encodedbookmarksize) != expectedsize: > + raise error.BookmarksDecodeError( > + _('cannot decode bookmark size: ' > + 'expected size: %d, actual size: %d') % > + (expectedsize, len(encodedbookmarksize))) > + return struct.unpack('>i', encodedbookmarksize)[0] small nits, The bundle2 code have a lot of utility around reading and unpacking, it seems like it could be worthwhile to move this unpacking code directly in the part. > +def encodebookmarks(bookmarks): > + """Encodes bookmark to node mapping to the byte string. > + > + Format: <4 bytes - bookmark size, big-endian><bookmark> > + <1 byte - 0 if node is empty 1 otherwise><20 bytes node> > + Node may be 20 byte binary string, 40 byte hex string or empty The part about 'Hex' is surprising is it still up to date. > + parts = [] > + for bookmark, node in bookmarks.iteritems(): > + encodedbookmarksize = _packbookmarksize(len(bookmark)) > + parts.append(encodedbookmarksize) > + bookmark = encoding.fromlocal(bookmark) > + parts.append(bookmark) > + if node: > + if len(node) != 20 and len(node) != 40: > + raise error.BookmarksEncodeError() > + if len(node) == 40: > + node = bin(node) > + parts.append(_NONEMPTYNODE) > + parts.append(node) > + else: > + parts.append(_EMPTYNODE) > + return ''.join(parts) We should probably have something a bit more subtle than 1 unique byte blob. bundle2 part can be fed an iterator yielding the lines one by one would be a good way to have this generation smoother. > + > +def decodebookmarks(buf): > + """Decodes buffer into bookmark to node mapping. > + > + Node is either 20 bytes or empty. > + """ > + > + stream = StringIO.StringIO(buf) > + bookmarks = {} > + while True: > + bookmarksize = _unpackbookmarksize(stream) > + if not bookmarksize: > + break Could we use 'while bookmark size' as an iteration value. A bare 'while True:' is a bit scary. > + bookmark = stream.read(bookmarksize) > + if len(bookmark) != bookmarksize: > + raise error.BookmarksDecodeError( > + 'cannot decode bookmark: expected size: %d, ' > + 'actual size: %d' % (bookmarksize, len(bookmark))) > + bookmark = encoding.tolocal(bookmark) > + emptynode = stream.read(1) > + node = '' > + if emptynode != _EMPTYNODE: > + node = stream.read(20) > + bookmarks[bookmark] = node > + return bookmarks > + > def _getbkfile(repo): > """Hook so that extensions that mess with the store can hook bm storage. > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >
If we have node+size first what advantage does it give? I’m using empty node to show that bookmark is deleted. I can use nullid for the same purpose but then I won’t be able to push/pull bookmarks that point to null commit (I know this is a rare case, but it’s possible to do it now). > The part about 'Hex' is surprising is it still up to date. Why it’s not up to date? It’s possible to pass bookmarks with hex nodes. On 10/14/16, 2:28 AM, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org> wrote: On 10/11/2016 06:25 PM, Stanislau Hlebik wrote: > # HG changeset patch > # User Stanislau Hlebik <stash@fb.com> > # Date 1476195835 25200 > # Tue Oct 11 07:23:55 2016 -0700 > # Node ID 718ed86a3698631077a087efaf668d70513056f5 > # Parent 6f5a3300a5457c92eb09170a30c98328ebe3bcce > bookmarks: introduce binary encoding > > Bookmarks binary encoding will be used for `bookmarks` bundle2 part. > The format is: <4 bytes - bookmark size, big-endian><bookmark> > <1 byte - 0 if node is empty 1 otherwise><20 bytes node> CCing greg as our binary format overlord, I'm not sure how useful it is to have the node to be optional. using nullid should be fine (and most stream/bundle will be compressed anyway. This would give us the option to have the node+size first (as fixed width) then the bookmark name. Though ? I do not thing it is necessary to point that the value is big-endian, we use big endian for all binary encoding. Beside that, the change seems good. I've been tempted to advise for moving the encoding/decoding directly into the bundle2 part to benefit from the bundle2 utility to pack/unpack but it eventually felt more at home here. I've some small comment inline for your consideration. > BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is > incorrect. > > diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py > --- a/mercurial/bookmarks.py > +++ b/mercurial/bookmarks.py > @@ -7,8 +7,10 @@ > > from __future__ import absolute_import > > +import StringIO > import errno > import os > +import struct > > from .i18n import _ > from .node import ( > @@ -23,6 +25,77 @@ > util, > ) > > +_NONEMPTYNODE = chr(1) > +_EMPTYNODE = chr(0) I would use 'pack' directly here. Who know what awe-full thing around char and str python might be preparing for pytnon3. > + > +def _packbookmarksize(size): > + return struct.pack('>i', size) Any reason we don't just inline this? > + > +def _unpackbookmarksize(stream): > + """Returns 0 if stream is empty. > + """ > + > + expectedsize = struct.calcsize('>i') > + encodedbookmarksize = stream.read(expectedsize) > + if len(encodedbookmarksize) == 0: > + return 0 > + if len(encodedbookmarksize) != expectedsize: > + raise error.BookmarksDecodeError( > + _('cannot decode bookmark size: ' > + 'expected size: %d, actual size: %d') % > + (expectedsize, len(encodedbookmarksize))) > + return struct.unpack('>i', encodedbookmarksize)[0] small nits, The bundle2 code have a lot of utility around reading and unpacking, it seems like it could be worthwhile to move this unpacking code directly in the part. > +def encodebookmarks(bookmarks): > + """Encodes bookmark to node mapping to the byte string. > + > + Format: <4 bytes - bookmark size, big-endian><bookmark> > + <1 byte - 0 if node is empty 1 otherwise><20 bytes node> > + Node may be 20 byte binary string, 40 byte hex string or empty The part about 'Hex' is surprising is it still up to date. > + parts = [] > + for bookmark, node in bookmarks.iteritems(): > + encodedbookmarksize = _packbookmarksize(len(bookmark)) > + parts.append(encodedbookmarksize) > + bookmark = encoding.fromlocal(bookmark) > + parts.append(bookmark) > + if node: > + if len(node) != 20 and len(node) != 40: > + raise error.BookmarksEncodeError() > + if len(node) == 40: > + node = bin(node) > + parts.append(_NONEMPTYNODE) > + parts.append(node) > + else: > + parts.append(_EMPTYNODE) > + return ''.join(parts) We should probably have something a bit more subtle than 1 unique byte blob. bundle2 part can be fed an iterator yielding the lines one by one would be a good way to have this generation smoother. > + > +def decodebookmarks(buf): > + """Decodes buffer into bookmark to node mapping. > + > + Node is either 20 bytes or empty. > + """ > + > + stream = StringIO.StringIO(buf) > + bookmarks = {} > + while True: > + bookmarksize = _unpackbookmarksize(stream) > + if not bookmarksize: > + break Could we use 'while bookmark size' as an iteration value. A bare 'while True:' is a bit scary. > + bookmark = stream.read(bookmarksize) > + if len(bookmark) != bookmarksize: > + raise error.BookmarksDecodeError( > + 'cannot decode bookmark: expected size: %d, ' > + 'actual size: %d' % (bookmarksize, len(bookmark))) > + bookmark = encoding.tolocal(bookmark) > + emptynode = stream.read(1) > + node = '' > + if emptynode != _EMPTYNODE: > + node = stream.read(20) > + bookmarks[bookmark] = node > + return bookmarks > + > def _getbkfile(repo): > """Hook so that extensions that mess with the store can hook bm storage. > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DQICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=1DxDEGgOTUUOmuCXmZ95OkWoyOXz2D7ILnHuyeqI0iQ&s=-N8JIBF1kxrGeXpr19jfoLFYJFS2Oj2MSZEFpUyjhOE&e= > -- Pierre-Yves David
Patch
diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py --- a/mercurial/bookmarks.py +++ b/mercurial/bookmarks.py @@ -7,8 +7,10 @@ from __future__ import absolute_import +import StringIO import errno import os +import struct from .i18n import _ from .node import ( @@ -23,6 +25,77 @@ util, ) +_NONEMPTYNODE = chr(1) +_EMPTYNODE = chr(0) + +def _packbookmarksize(size): + return struct.pack('>i', size) + +def _unpackbookmarksize(stream): + """Returns 0 if stream is empty. + """ + + expectedsize = struct.calcsize('>i') + encodedbookmarksize = stream.read(expectedsize) + if len(encodedbookmarksize) == 0: + return 0 + if len(encodedbookmarksize) != expectedsize: + raise error.BookmarksDecodeError( + _('cannot decode bookmark size: ' + 'expected size: %d, actual size: %d') % + (expectedsize, len(encodedbookmarksize))) + return struct.unpack('>i', encodedbookmarksize)[0] + +def encodebookmarks(bookmarks): + """Encodes bookmark to node mapping to the byte string. + + Format: <4 bytes - bookmark size, big-endian><bookmark> + <1 byte - 0 if node is empty 1 otherwise><20 bytes node> + Node may be 20 byte binary string, 40 byte hex string or empty + """ + + parts = [] + for bookmark, node in bookmarks.iteritems(): + encodedbookmarksize = _packbookmarksize(len(bookmark)) + parts.append(encodedbookmarksize) + bookmark = encoding.fromlocal(bookmark) + parts.append(bookmark) + if node: + if len(node) != 20 and len(node) != 40: + raise error.BookmarksEncodeError() + if len(node) == 40: + node = bin(node) + parts.append(_NONEMPTYNODE) + parts.append(node) + else: + parts.append(_EMPTYNODE) + return ''.join(parts) + +def decodebookmarks(buf): + """Decodes buffer into bookmark to node mapping. + + Node is either 20 bytes or empty. + """ + + stream = StringIO.StringIO(buf) + bookmarks = {} + while True: + bookmarksize = _unpackbookmarksize(stream) + if not bookmarksize: + break + bookmark = stream.read(bookmarksize) + if len(bookmark) != bookmarksize: + raise error.BookmarksDecodeError( + 'cannot decode bookmark: expected size: %d, ' + 'actual size: %d' % (bookmarksize, len(bookmark))) + bookmark = encoding.tolocal(bookmark) + emptynode = stream.read(1) + node = '' + if emptynode != _EMPTYNODE: + node = stream.read(20) + bookmarks[bookmark] = node + return bookmarks + def _getbkfile(repo): """Hook so that extensions that mess with the store can hook bm storage.