Patchwork [1,of,4,V2] obsolete: track node versions

login
register
mail settings
Submitter Jun Wu
Date March 13, 2017, 9:48 a.m.
Message ID <dec2b2328ef19c166f0e.1489398493@localhost.localdomain>
Download mbox | patch
Permalink /patch/19280/
State Changes Requested
Delegated to: Jun Wu
Headers show

Comments

Jun Wu - March 13, 2017, 9:48 a.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1489385975 25200
#      Sun Mar 12 23:19:35 2017 -0700
# Node ID dec2b2328ef19c166f0ed1cb711b6c99dc9c590a
# Parent  8a17c541177f32348e248608b6a9dfd7fefdf517
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r dec2b2328ef1
obsolete: track node versions

This is the first step to support obs cycle nicely.

The core idea is straightforward,

  A new marker "X -> Y" has the intention to make Y visible.

So if we know "X -> Y" is not older than another marker "Y -> ...", the
latter gets ignored. It's "not older than", not "newer", because then a
single changeset can be revived by "X -> X" (plus the "cancel out" property,
see below).

Then we just need to figure out how to sort the markers - i.e. defining
their "version"s. My very first idea is to use the offsets of markers stored
in obsstore, which is unambiguous. While marmoute suggested that the "date"
field makes more sense if markers are exchanged - the offsets are local
while the dates are global. And that's what this patch uses. If two markers
form a cycle and have a same date, they "cancel out" automatically, like
what darcs deals with conflicts - a reasonable behavior.

Note that sorting all markers naively by date works but is very slow. My
hg-committed has 11k markers. It takes nearly 1 second to sort them. Not to
mention sorting all markers will have difficulty with "mergemarkers" or
"addmarker", since they may require sorting all the markers again, which is
an unacceptable time complexity.

Therefore I came up with the dynamic filter idea. That's to make "node
versions" a separate state to track. And use it to filter "successors",
"precursors", etc. dynamically. Building nodeversions on my hg-committed
repo takes about 0.1 seconds, which looks pretty good. In theory it should
have similar time complexity to building "successors", which is acceptable.
Jun Wu - March 27, 2017, 8:49 a.m.
Thanks for the detailed reply. I have replies on multiple subtopics. But I
deleted topics that I think are less important, to make the most important
topic clear, and make the discussion more efficient. If you think I need to
respond to more topics, feel free to point me at them.

Excerpts from Pierre-Yves David's message of 2017-03-27 09:14:53 +0200:

[snip]

> Simple practical example with date:
> -----------------------------------
> 
> [time-1] repo-B: changeset C-A is pulled from repo-A
> [time-2] repo-A: changeset C-A is rewritten as C-B (time-2)
> [time-3] repo-B: changeset C-C is marked as superceeded by C-A (time-3)
> [time-4] repo-B: Changeset C-B (with marker) is pulled from repo-A
> 
> The markers pointing from C-B to C-A have a version "time-2" but "C-A" 
> now have a version of "time-3". "C-A" become wrongly un-obsoleted (and 

I think we basically disagree here. You said that "C-A" being visible does
not match "the meaning of the user action", could you define "the meaning of
the user action" formally ?

In the "node version" approach, the "meaning of a user action", where action
is "creating a marker", is defined as below:

  When the user replace changeset X with Y, the marker X -> Y gets created,
  they want Y to be visible, regardless of what previous (local or remote)
  attempts hiding it.

So the un-obsolete behavior is expected.

> C-B is in strange state). The fact C-A is un-obsoleted here is a bug and 

Strictly speaking, C-B still has an obsoleted precursor of C-A. If we draw a
2-D graph, where Y-axis is revision, X-axis is time, and markers are edges,
the above case could be represented like:

   ^ rev
   |      C-C
   |        \
   |  C-A   C-A
   |     \
   |     C-B
   |
   +---------------> time

Note that there are 2 "C-A"s, the right one is visible, the left one is not.
We show users the right one. Internally, the code could distinguish the
different of 2 "C-A"s, if it wants.

> do not match the meaning of the user action. This is a regression from 
> what evolution is currently able to achieve.

[snip]
Jun Wu - March 30, 2017, 4:33 p.m.
Per discussion on IRC. I'll drop this series from patchwork and send a new
version with better documentation and some planned fixes.

Excerpts from Jun Wu's message of 2017-03-27 01:49:03 -0700:
> Thanks for the detailed reply. I have replies on multiple subtopics. But I
> deleted topics that I think are less important, to make the most important
> topic clear, and make the discussion more efficient. If you think I need to
> respond to more topics, feel free to point me at them.
> 
> Excerpts from Pierre-Yves David's message of 2017-03-27 09:14:53 +0200:
> 
> [snip]
> 
> > Simple practical example with date:
> > -----------------------------------
> > 
> > [time-1] repo-B: changeset C-A is pulled from repo-A
> > [time-2] repo-A: changeset C-A is rewritten as C-B (time-2)
> > [time-3] repo-B: changeset C-C is marked as superceeded by C-A (time-3)
> > [time-4] repo-B: Changeset C-B (with marker) is pulled from repo-A
> > 
> > The markers pointing from C-B to C-A have a version "time-2" but "C-A" 
> > now have a version of "time-3". "C-A" become wrongly un-obsoleted (and 
> 
> I think we basically disagree here. You said that "C-A" being visible does
> not match "the meaning of the user action", could you define "the meaning of
> the user action" formally ?
> 
> In the "node version" approach, the "meaning of a user action", where action
> is "creating a marker", is defined as below:
> 
>   When the user replace changeset X with Y, the marker X -> Y gets created,
>   they want Y to be visible, regardless of what previous (local or remote)
>   attempts hiding it.
> 
> So the un-obsolete behavior is expected.
> 
> > C-B is in strange state). The fact C-A is un-obsoleted here is a bug and 
> 
> Strictly speaking, C-B still has an obsoleted precursor of C-A. If we draw a
> 2-D graph, where Y-axis is revision, X-axis is time, and markers are edges,
> the above case could be represented like:
> 
>    ^ rev
>    |      C-C
>    |        \
>    |  C-A   C-A
>    |     \
>    |     C-B
>    |
>    +---------------> time
> 
> Note that there are 2 "C-A"s, the right one is visible, the left one is not.
> We show users the right one. Internally, the code could distinguish the
> different of 2 "C-A"s, if it wants.
> 
> > do not match the meaning of the user action. This is a regression from 
> > what evolution is currently able to achieve.
> 
> [snip]
Durham Goode - March 30, 2017, 6:28 p.m.
Let's step back a moment and think about what obsmarkers are used for. 
They are used to hide commits, and to automatically perform rebases. The 
concerns around obscycles is that allowing cycles (without a perfect 
version scheme) could affect those two uses.  For hiding, it could 
result in hiding commits from the user that they didn't mean to be 
hidden. For rebasing, it means we could rebase onto the wrong successor.

I think the underlying issue is that we're trying to do too much magic 
for the user, and we're unable to find a perfect design to make that 
magic safe and understandable. I think finding the right answer here may 
be impossible.

Proposal
===

I propose a series of changes to reduce the magic and remove the need 
for obs versioning, while maintaining the power and increasing the 
understandability of these workflows. It has two parts:


1. Never hide a commit during hg pull. Only hide commits when the user 
does an action (strip/rebase/amend/histedit/evolve)
---

One of the recurring issues with obsmarkers is that commits may 
magically disappear when you pull from another repo. This has two 
issues: A) we have no good way of explaining it to the user, and B) we 
can't even be sure the user wants those commits to disappear.

I propose we *never* hide a commit when doing hg pull. When the user 
pulls, we download the obsmarkers like normal, but the commits don't 
disappear. Instead, we show the "[amended|rebased] to HASH" text in a 
log/smartlog output so the user can know the old commits are dead, then 
they can strip them at their leisure. This has the added benefit of 
making it user visible what obsmarker changes the pull brought in.

This proposal has two optional side features:

1a. *Do* hide commits during pull if the pulled version is public.
1b. Add an option to strip/prune to auto hide any visible commits that 
have visible successors and don't have visible descendants. So "hg pull 
&& hg strip/prune --obsolete-commits" would be roughly equivalent to hg 
pull today.

This proposal requires a notable change to core:

- Hidden must be separated from obsmarkers storage and mutable outside 
obsmarkers. This is possible with Jun's proposed phaseroot-esque hidden 
storage.


2. Auto rebase uses "visible successors" instead of "latest successor"
---

Today auto rebase (i.e. evolve) uses the latest successor as the 
destination of the rebases. I propose we change that to "visible 
successor" instead, where visible successor is defined as "any successor 
commit that is not hidden". This means, regardless of the current 
obsmarkers setup, the destination of the auto rebase is whatever 
successor is currently visible. Which is probably what the user wanted 
anyway.

If there are multiple visible successors, auto rebase fails and shows a 
list of the potential visible successors. Each item in the list can have 
the "[amended|rebased] to HASH" string next to it so users can 
understand at a glance the ordering and make a decision. They can either 
use -d on rebase to specify a solution to the conflict, or they  can use 
the normal visibility commands (hg strip/prune) to remove any 
undesirable commits.

The presence of cycles does not affect this at all, and there is no need 
for marker versioning. Auto rebase still uses whatever successors are 
visible, even if the successor is "older" than it, because of a cycle. 
The user can use the same rebase -d or strip/prune resolutions to 
resolve the conflict.

Summary of what these two changes achieve
===

 From a single, non-exchanging user's perspective the above proposal 
does not affect current UX around when things are hidden (like during 
rebase/amend/etc), but does allows all the benefits of the obscycle 
discussion (allowing unhiding) without the need for obsmarker versioning.

 From an exchange user's perspective, this makes exchange much more 
deterministic for the user (nothing is magically removed from the repo, 
and what new obs information from the pull is explained via smartlog), 
while still enabling auto rebase workflows. It also makes obsmarker 
conflict (divergence/etc) easier to understand and resolve by allowing 
users to resolve obsmarker conflicts using tools they're already 
familiar with (rebase -d, strip/prune).
Durham Goode - March 30, 2017, 6:34 p.m.
On 3/30/17 11:28 AM, Durham Goode wrote:
>
> 1. Never hide a commit during hg pull. Only hide commits when the user
> does an action (strip/rebase/amend/histedit/evolve)
>
> 2. Auto rebase uses "visible successors" instead of "latest successor"

To elaborate on how I see this obs cycle series affecting this proposal, 
I think the end result of this proposal is that obs versioning won't 
matter anymore.  Or at the very least versioning only becomes a rough 
heuristic to suggest to the user which visible successor is likely to be 
their desired auto rebase destination.

But, this series would be a fast way to introduce the concept of 
unhiding into core, which would let us start developing user experiences 
that benefit from unhiding, until we have truly separate hidden storage. 
So if we think my proposal is a good idea and where we want to be in the 
long run, I think we take this obscycle series, and don't worry about 
date being an imperfect heuristic since it won't matter in the long run. 
And we don't worry about the edge cases where it's unclear if X should 
be visible or Y, because in the future visibility will only ever be 
changed by explicit user action, and never by deducing it from obsmarker.
Jun Wu - March 30, 2017, 8:48 p.m.
I think this is a very nice approach to move forward. There are some
behavior changes. But I've discussed with Durham and I'm happy about the new
behaviors.

The "node version" approach achieves "unhide" in a fast and more
conservative way. The root-based hidden seems to require some non-trivial
changes so I'll send a new version of "node version" patches solely to solve
the "visibility" issue. Without a more general purposed API targeting future
possibilities like exchange etc.

Excerpts from Durham Goode's message of 2017-03-30 11:28:32 -0700:
> Let's step back a moment and think about what obsmarkers are used for. 
> They are used to hide commits, and to automatically perform rebases. The 
> concerns around obscycles is that allowing cycles (without a perfect 
> version scheme) could affect those two uses.  For hiding, it could 
> result in hiding commits from the user that they didn't mean to be 
> hidden. For rebasing, it means we could rebase onto the wrong successor.
> 
> I think the underlying issue is that we're trying to do too much magic 
> for the user, and we're unable to find a perfect design to make that 
> magic safe and understandable. I think finding the right answer here may 
> be impossible.
> 
> Proposal
> ===
> 
> I propose a series of changes to reduce the magic and remove the need 
> for obs versioning, while maintaining the power and increasing the 
> understandability of these workflows. It has two parts:
> 
> 
> 1. Never hide a commit during hg pull. Only hide commits when the user 
> does an action (strip/rebase/amend/histedit/evolve)
> ---
> 
> One of the recurring issues with obsmarkers is that commits may 
> magically disappear when you pull from another repo. This has two 
> issues: A) we have no good way of explaining it to the user, and B) we 
> can't even be sure the user wants those commits to disappear.
> 
> I propose we *never* hide a commit when doing hg pull. When the user 
> pulls, we download the obsmarkers like normal, but the commits don't 
> disappear. Instead, we show the "[amended|rebased] to HASH" text in a 
> log/smartlog output so the user can know the old commits are dead, then 
> they can strip them at their leisure. This has the added benefit of 
> making it user visible what obsmarker changes the pull brought in.
> 
> This proposal has two optional side features:
> 
> 1a. *Do* hide commits during pull if the pulled version is public.
> 1b. Add an option to strip/prune to auto hide any visible commits that 
> have visible successors and don't have visible descendants. So "hg pull 
> && hg strip/prune --obsolete-commits" would be roughly equivalent to hg 
> pull today.
> 
> This proposal requires a notable change to core:
> 
> - Hidden must be separated from obsmarkers storage and mutable outside 
> obsmarkers. This is possible with Jun's proposed phaseroot-esque hidden 
> storage.
> 
> 
> 2. Auto rebase uses "visible successors" instead of "latest successor"
> ---
> 
> Today auto rebase (i.e. evolve) uses the latest successor as the 
> destination of the rebases. I propose we change that to "visible 
> successor" instead, where visible successor is defined as "any successor 
> commit that is not hidden". This means, regardless of the current 
> obsmarkers setup, the destination of the auto rebase is whatever 
> successor is currently visible. Which is probably what the user wanted 
> anyway.
> 
> If there are multiple visible successors, auto rebase fails and shows a 
> list of the potential visible successors. Each item in the list can have 
> the "[amended|rebased] to HASH" string next to it so users can 
> understand at a glance the ordering and make a decision. They can either 
> use -d on rebase to specify a solution to the conflict, or they  can use 
> the normal visibility commands (hg strip/prune) to remove any 
> undesirable commits.
> 
> The presence of cycles does not affect this at all, and there is no need 
> for marker versioning. Auto rebase still uses whatever successors are 
> visible, even if the successor is "older" than it, because of a cycle. 
> The user can use the same rebase -d or strip/prune resolutions to 
> resolve the conflict.
> 
> Summary of what these two changes achieve
> ===
> 
>  From a single, non-exchanging user's perspective the above proposal 
> does not affect current UX around when things are hidden (like during 
> rebase/amend/etc), but does allows all the benefits of the obscycle 
> discussion (allowing unhiding) without the need for obsmarker versioning.
> 
>  From an exchange user's perspective, this makes exchange much more 
> deterministic for the user (nothing is magically removed from the repo, 
> and what new obs information from the pull is explained via smartlog), 
> while still enabling auto rebase workflows. It also makes obsmarker 
> conflict (divergence/etc) easier to understand and resolve by allowing 
> users to resolve obsmarker conflicts using tools they're already 
> familiar with (rebase -d, strip/prune).
Pierre-Yves David - April 5, 2017, 11:04 p.m.
On 04/05/2017 03:30 PM, Ryan McElroy wrote:
> […]
>
> I strongly believe that going back into a long discussion or battle over
> how evolve should behave is a good use of anyone's time. Can't we just
> build out core hiding mechanisms and build experiences on top of that?
> We can each use this in the way we feel best.

So, there seems to be a misunderstanding here. We already have a generic 
and extensible hiding API. For the past 4 years. In order to cut some 
heads of this threads hydra, please see and comment on my extended reply 
here.

https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-April/096294.html
Pierre-Yves David - April 7, 2017, 11:30 a.m.
On 04/05/2017 11:37 PM, Durham Goode wrote:
> I respond inline, but I'm also happy to drop discussion about
> evolve-like user experience changes and instead focus on the proposal
> about just making hidden commits a separate system. So we can discuss
> the various topics incrementally.

I think most of the important point raised in this email are already 
covered (or open) in the more fresh thread. So I'm going to drop this 
with the hope to save everyones time.

Cheers,

Patch

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -499,4 +499,11 @@  def _addchildren(children, markers):
                 children.setdefault(p, set()).add(mark)
 
+@util.nogc
+def _bumpversions(nodeversions, markers):
+    for mark in markers:
+        date = mark[4][0]
+        for suc in mark[1]:
+            nodeversions[suc] = max(nodeversions.get(suc, -1), date)
+
 def _checkinvalidmarkers(markers):
     """search for marker with invalid data and raise error if needed
@@ -535,4 +542,11 @@  class obsstore(object):
         self._readonly = readonly
 
+        # the latest versions of possibly "revived" nodes. currently we use the
+        # creation date of markers as versions. so this is {node: date}
+        # A marker with successor=REVS will bump REVS to the version of the
+        # date of that marker.
+        # A marker with date <= nodeversions.get(precursor, -1) is invisible.
+        self._nodeversions = {}
+
     def __iter__(self):
         return iter(self._all)
@@ -643,4 +657,5 @@  class obsstore(object):
         self._version, markers = _readmarkers(data)
         markers = list(markers)
+        _bumpversions(self._nodeversions, markers)
         _checkinvalidmarkers(markers)
         return markers
@@ -676,4 +691,5 @@  class obsstore(object):
         if self._cached('children'):
             _addchildren(self.children, markers)
+        _bumpversions(self._nodeversions, markers)
         _checkinvalidmarkers(markers)