Patchwork [13,of,13] obsolete: explicitly track folds inside the markers

login
register
mail settings
Submitter Boris Feld
Date Sept. 27, 2018, 5:08 p.m.
Message ID <1abacb9c2d03520a12af.1538068125@localhost.localdomain>
Download mbox | patch
Permalink /patch/35152/
State Accepted
Headers show

Comments

Boris Feld - Sept. 27, 2018, 5:08 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1537998614 -7200
#      Wed Sep 26 23:50:14 2018 +0200
# Node ID 1abacb9c2d03520a12af25ae0d9ae5978f6aa3db
# Parent  bd762ec24f1858bc577b05de0388688a7756f6ba
# EXP-Topic trackfold
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 1abacb9c2d03
obsolete: explicitly track folds inside the markers

We know records information to be able to recognize "fold" event from
obsolescence markers. To do so, we track the following piece of information:

a) a fold ID. Unique to that fold (per successors),
b) the number of predecessors,
c) the index of the predecessor in that fold.

We will now be able to create algorithm able to find "predecessorssets".

We now store this data in the generic "metadata" field of the markers.
Updating the format to have a more compact storage for this would be useful.

This way of tracking a fold through multiple markers could be applied to split
too. This would have two advantages:

1) We get a simpler format, since successors numbers are limited to [0-1].
2) We can better deal with situations where only some of the split successors
   are pushed to a remote repository.

We should look into the relevance of such change before updating the on disk
format.

note: unlike splits, folds do not have to deal with cases where only some of
the markers have been synchronized. As they all share the same successor
changesets, they are all relevant to the same nodes.
Yuya Nishihara - Sept. 30, 2018, 1:13 p.m.
On Thu, 27 Sep 2018 19:08:45 +0200, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1537998614 -7200
> #      Wed Sep 26 23:50:14 2018 +0200
> # Node ID 1abacb9c2d03520a12af25ae0d9ae5978f6aa3db
> # Parent  bd762ec24f1858bc577b05de0388688a7756f6ba
> # EXP-Topic trackfold
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 1abacb9c2d03
> obsolete: explicitly track folds inside the markers

Queued up to the previous patch, dropping the first two which appear to be
already in. Thanks.

I don't have a strong concern about this patch, but this touches the storage
so I think this needs more eyes.

> +def makefoldid(relation):
> +    folddata = ''.join(p.node() for p in relation[0] + relation[1])
> +    # Since fold only has to compete against fold for the same successors, it

Does it mean there's no reason to include relation[1] in the hash?

> +    # seems find to use a small ID. Smaller ID save space.

s/find/fine/

> +    return hashlib.sha1(folddata).hexdigest()[:8]

Nit: has to be hex(...digest()) to be py3 compat.

> -            for prec in predecessors:
> +            foldid = None
> +            foldsize = len(predecessors)
> +            if 1 < foldsize:
> +                foldid = makefoldid(rel)
> +            for foldidx, prec in enumerate(predecessors, 1):

Nit: any reason to make it 1-origin?

>                  sucs = rel[1]
>                  localmetadata = metadata.copy()
>                  if 2 < len(rel):
>                      localmetadata.update(rel[2])
> +                if foldid is not None:
> +                    localmetadata['fold-id'] = foldid
> +                    localmetadata['fold-idx'] = str(foldidx)
> +                    localmetadata['fold-size'] = str(foldsize)

'%d' % for py3 compat.

Nit: I'm not sure if "idx" and "size" are good names, but I'm fine with that.
Boris Feld - Oct. 4, 2018, 6:09 a.m.
On 30/09/2018 15:13, Yuya Nishihara wrote:
> On Thu, 27 Sep 2018 19:08:45 +0200, Boris Feld wrote:
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1537998614 -7200
>> #      Wed Sep 26 23:50:14 2018 +0200
>> # Node ID 1abacb9c2d03520a12af25ae0d9ae5978f6aa3db
>> # Parent  bd762ec24f1858bc577b05de0388688a7756f6ba
>> # EXP-Topic trackfold
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 1abacb9c2d03
>> obsolete: explicitly track folds inside the markers
> Queued up to the previous patch, dropping the first two which appear to be
> already in. Thanks.
>
> I don't have a strong concern about this patch, but this touches the storage
> so I think this needs more eyes.
>
>> +def makefoldid(relation):
>> +    folddata = ''.join(p.node() for p in relation[0] + relation[1])
>> +    # Since fold only has to compete against fold for the same successors, it
> Does it mean there's no reason to include relation[1] in the hash?
It is probably still useful to have an id as unique as possible. The
goal here is more about producing a unique identifier than a
reproducible one. In fact, we should probably also include local
revisions numbers and users to help distinguish between two similar but
independent fold.
>
>> +    # seems find to use a small ID. Smaller ID save space.
> s/find/fine/
>
>> +    return hashlib.sha1(folddata).hexdigest()[:8]
> Nit: has to be hex(...digest()) to be py3 compat.
>
>> -            for prec in predecessors:
>> +            foldid = None
>> +            foldsize = len(predecessors)
>> +            if 1 < foldsize:
>> +                foldid = makefoldid(rel)
>> +            for foldidx, prec in enumerate(predecessors, 1):
> Nit: any reason to make it 1-origin?
Not a strong reason, but working on ancestors iteration in Rust reminded
us how annoying it was to not have the option to use '0' as a special
value when talking about arbitrary indexes. So we would prefer to be on
the safe side here.
>
>>                  sucs = rel[1]
>>                  localmetadata = metadata.copy()
>>                  if 2 < len(rel):
>>                      localmetadata.update(rel[2])
>> +                if foldid is not None:
>> +                    localmetadata['fold-id'] = foldid
>> +                    localmetadata['fold-idx'] = str(foldidx)
>> +                    localmetadata['fold-size'] = str(foldsize)
> '%d' % for py3 compat.
>
> Nit: I'm not sure if "idx" and "size" are good names, but I'm fine with that.

Ideally, we'll make them official binary field the next iteration of the
obsstore format. Removing the naming concerns.

We will send a V2 for the rest of the Nits.

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Pulkit Goyal - Oct. 4, 2018, 12:44 p.m.
On Thu, Oct 4, 2018 at 9:10 AM Boris FELD <boris.feld@octobus.net> wrote:

> Ideally, we'll make them official binary field the next iteration of the
> obsstore format. Removing the naming concerns.
>

Next iteration of obsstore format! Can I read more about it somewhere, like
what's the idea and what's the new format. Also it will be nice to discuss
the format before you invest some time implementing it. :)
Yuya Nishihara - Oct. 4, 2018, 12:57 p.m.
On Thu, 4 Oct 2018 08:09:08 +0200, Boris FELD wrote:
> On 30/09/2018 15:13, Yuya Nishihara wrote:
> > On Thu, 27 Sep 2018 19:08:45 +0200, Boris Feld wrote:
> >> # HG changeset patch
> >> # User Boris Feld <boris.feld@octobus.net>
> >> # Date 1537998614 -7200
> >> #      Wed Sep 26 23:50:14 2018 +0200
> >> # Node ID 1abacb9c2d03520a12af25ae0d9ae5978f6aa3db
> >> # Parent  bd762ec24f1858bc577b05de0388688a7756f6ba
> >> # EXP-Topic trackfold
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 1abacb9c2d03
> >> obsolete: explicitly track folds inside the markers
> > Queued up to the previous patch, dropping the first two which appear to be
> > already in. Thanks.
> >
> > I don't have a strong concern about this patch, but this touches the storage
> > so I think this needs more eyes.
> >
> >> +def makefoldid(relation):
> >> +    folddata = ''.join(p.node() for p in relation[0] + relation[1])
> >> +    # Since fold only has to compete against fold for the same successors, it
> > Does it mean there's no reason to include relation[1] in the hash?
> It is probably still useful to have an id as unique as possible. The
> goal here is more about producing a unique identifier than a
> reproducible one. In fact, we should probably also include local
> revisions numbers and users to help distinguish between two similar but
> independent fold.

Can you add that to the inline comment? I was confused about that because the
code didn't follow what the comment says.

Patch

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -70,6 +70,7 @@  comment associated with each format for 
 from __future__ import absolute_import
 
 import errno
+import hashlib
 import struct
 
 from .i18n import _
@@ -954,6 +955,11 @@  def _computecontentdivergentset(repo):
             toprocess.update(obsstore.predecessors.get(prec, ()))
     return divergent
 
+def makefoldid(relation):
+    folddata = ''.join(p.node() for p in relation[0] + relation[1])
+    # Since fold only has to compete against fold for the same successors, it
+    # seems find to use a small ID. Smaller ID save space.
+    return hashlib.sha1(folddata).hexdigest()[:8]
 
 def createmarkers(repo, relations, flag=0, date=None, metadata=None,
                   operation=None):
@@ -1000,11 +1006,19 @@  def createmarkers(repo, relations, flag=
             if 1 < len(predecessors) and len(rel[1]) != 1:
                 msg = 'Fold markers can only have 1 successors, not %d'
                 raise error.ProgrammingError(msg % len(rel[1]))
-            for prec in predecessors:
+            foldid = None
+            foldsize = len(predecessors)
+            if 1 < foldsize:
+                foldid = makefoldid(rel)
+            for foldidx, prec in enumerate(predecessors, 1):
                 sucs = rel[1]
                 localmetadata = metadata.copy()
                 if 2 < len(rel):
                     localmetadata.update(rel[2])
+                if foldid is not None:
+                    localmetadata['fold-id'] = foldid
+                    localmetadata['fold-idx'] = str(foldidx)
+                    localmetadata['fold-size'] = str(foldsize)
 
                 if not prec.mutable():
                     raise error.Abort(_("cannot obsolete public changeset: %s")
diff --git a/tests/test-rebase-obsolete.t b/tests/test-rebase-obsolete.t
--- a/tests/test-rebase-obsolete.t
+++ b/tests/test-rebase-obsolete.t
@@ -356,9 +356,9 @@  collapse rebase
   $ hg id --debug -r tip
   4dc2197e807bae9817f09905b50ab288be2dbbcf tip
   $ hg debugobsolete
-  42ccdea3bb16d28e1848c95fe2e44c000f3f21b1 4dc2197e807bae9817f09905b50ab288be2dbbcf 0 (Thu Jan 01 00:00:00 1970 +0000) {'ef1': '13', 'operation': 'rebase', 'user': 'test'}
-  5fddd98957c8a54a4d436dfe1da9d87f21a1b97b 4dc2197e807bae9817f09905b50ab288be2dbbcf 0 (Thu Jan 01 00:00:00 1970 +0000) {'ef1': '13', 'operation': 'rebase', 'user': 'test'}
-  32af7686d403cf45b5d95f2d70cebea587ac806a 4dc2197e807bae9817f09905b50ab288be2dbbcf 0 (Thu Jan 01 00:00:00 1970 +0000) {'ef1': '13', 'operation': 'rebase', 'user': 'test'}
+  42ccdea3bb16d28e1848c95fe2e44c000f3f21b1 4dc2197e807bae9817f09905b50ab288be2dbbcf 0 (Thu Jan 01 00:00:00 1970 +0000) {'ef1': '13', 'fold-id': '350cf01c', 'fold-idx': '1', 'fold-size': '3', 'operation': 'rebase', 'user': 'test'}
+  5fddd98957c8a54a4d436dfe1da9d87f21a1b97b 4dc2197e807bae9817f09905b50ab288be2dbbcf 0 (Thu Jan 01 00:00:00 1970 +0000) {'ef1': '13', 'fold-id': '350cf01c', 'fold-idx': '2', 'fold-size': '3', 'operation': 'rebase', 'user': 'test'}
+  32af7686d403cf45b5d95f2d70cebea587ac806a 4dc2197e807bae9817f09905b50ab288be2dbbcf 0 (Thu Jan 01 00:00:00 1970 +0000) {'ef1': '13', 'fold-id': '350cf01c', 'fold-idx': '3', 'fold-size': '3', 'operation': 'rebase', 'user': 'test'}
 
   $ cd ..