Submitter | Jun Wu |
---|---|
Date | March 13, 2017, 9:48 a.m. |
Message ID | <6ae6d1069ba1d4089afa.1489398496@localhost.localdomain> |
Download | mbox | patch |
Permalink | /patch/19282/ |
State | Changes Requested |
Delegated to: | Jun Wu |
Headers | show |
Comments
On 3/13/17 2:48 AM, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1489395002 25200 > # Mon Mar 13 01:50:02 2017 -0700 > # Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 > # Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 > # Available At https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=teHrSOWg352cIHpPJ_QkfGT4tGnEsAx8upZBTGYdh94&s=aOoZRXz4F1btSijuLHmlEK-JsH1Sp_YSgBvT6DoaL-E&e= > # hg pull https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=teHrSOWg352cIHpPJ_QkfGT4tGnEsAx8upZBTGYdh94&s=aOoZRXz4F1btSijuLHmlEK-JsH1Sp_YSgBvT6DoaL-E&e= -r 6ae6d1069ba1 > obsolete: allow cycles > > Now we can handle cycles nicely, allow them to be created. Some practical > examples: > > - To revive X, just create a marker X -> X, with a newer date. > - To prune X again, just create a marker X -> (), with a newer date. > - The above two could be repeated. > > - To unamend A -> B, just create a marker B -> A, with a newer date. > > It's now possible for "touch" and "unamend" to reuse hashes (therefore more > user-friendly). And it's no longer necessary to write "*_source" in commit > metadata to workarounds obs cycles. The hacky inhibit extension also becomes > unnecessary. > > Finally. I have been wanting all these for a long time. Seems pretty elegant, though I haven't fully understood it yet. Maybe you guys talked about this in person, but what effect (if any) does this have on exchange?
Excerpts from Durham Goode's message of 2017-03-13 09:23:49 -0700: > On 3/13/17 2:48 AM, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1489395002 25200 > > # Mon Mar 13 01:50:02 2017 -0700 > > # Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 > > # Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 6ae6d1069ba1 > > obsolete: allow cycles > > > > Now we can handle cycles nicely, allow them to be created. Some practical > > examples: > > > > - To revive X, just create a marker X -> X, with a newer date. > > - To prune X again, just create a marker X -> (), with a newer date. > > - The above two could be repeated. > > > > - To unamend A -> B, just create a marker B -> A, with a newer date. > > > > It's now possible for "touch" and "unamend" to reuse hashes (therefore more > > user-friendly). And it's no longer necessary to write "*_source" in commit > > metadata to workarounds obs cycles. The hacky inhibit extension also becomes > > unnecessary. > > > > Finally. I have been wanting all these for a long time. > > Seems pretty elegant, It is. I didn't believe I solved it using just 40+ lines with reasonable performance :p > though I haven't fully understood it yet. Maybe > you guys talked about this in person, That's what happened. And we decided to use "date" as the version number. > but what effect (if any) does this > have on exchange? As long as exchange sends dates, markers are globally ordered, and they behave no different then the local case. Say Alice has A -> B created on date1, Bob has B -> A with date2. If date2 > date1, Bob wins. The time of pulling or the pull direction does not matter. Only the global version (date) matters. If date2 == date1, two markers "cancel out" and both A and B become visible. Previously no matter what the dates are, A and B are both permanently invisible.
On 03/13/2017 09:23 AM, Durham Goode wrote: > On 3/13/17 2:48 AM, Jun Wu wrote: >> # HG changeset patch >> # User Jun Wu <quark@fb.com> >> # Date 1489395002 25200 >> # Mon Mar 13 01:50:02 2017 -0700 >> # Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 >> # Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 >> # Available At >> https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=teHrSOWg352cIHpPJ_QkfGT4tGnEsAx8upZBTGYdh94&s=aOoZRXz4F1btSijuLHmlEK-JsH1Sp_YSgBvT6DoaL-E&e= >> >> # hg pull >> https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=teHrSOWg352cIHpPJ_QkfGT4tGnEsAx8upZBTGYdh94&s=aOoZRXz4F1btSijuLHmlEK-JsH1Sp_YSgBvT6DoaL-E&e= >> -r 6ae6d1069ba1 >> obsolete: allow cycles >> >> Now we can handle cycles nicely, allow them to be created. Some practical >> examples: >> >> - To revive X, just create a marker X -> X, with a newer date. >> - To prune X again, just create a marker X -> (), with a newer date. >> - The above two could be repeated. >> >> - To unamend A -> B, just create a marker B -> A, with a newer date. >> >> It's now possible for "touch" and "unamend" to reuse hashes (therefore >> more >> user-friendly). And it's no longer necessary to write "*_source" in >> commit >> metadata to workarounds obs cycles. The hacky inhibit extension also >> becomes >> unnecessary. >> >> Finally. I have been wanting all these for a long time. > > Seems pretty elegant, though I haven't fully understood it yet. Maybe > you guys talked about this in person, but what effect (if any) does this > have on exchange? We did not really discuss this at the sprint :-/ This probably has effect on two aspects of pull and push: * Computation of the relevant markers for a None * Computation of "head replacement" when pushing branches obsoleting another one. The proposal have some very interesting aspect, I'll try to do a deep review of its impact on the general concept soon™ (probably not this week) Cheers,
Excerpts from Pierre-Yves David's message of 2017-03-13 11:57:08 -0700: > > On 03/13/2017 09:23 AM, Durham Goode wrote: > > On 3/13/17 2:48 AM, Jun Wu wrote: > >> # HG changeset patch > >> # User Jun Wu <quark@fb.com> > >> # Date 1489395002 25200 > >> # Mon Mar 13 01:50:02 2017 -0700 > >> # Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 > >> # Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 > >> # Available At > >> https://bitbucket.org/quark-zju/hg-draft > >> > >> # hg pull > >> https://bitbucket.org/quark-zju/hg-draft > >> -r 6ae6d1069ba1 > >> obsolete: allow cycles > >> > >> Now we can handle cycles nicely, allow them to be created. Some practical > >> examples: > >> > >> - To revive X, just create a marker X -> X, with a newer date. > >> - To prune X again, just create a marker X -> (), with a newer date. > >> - The above two could be repeated. > >> > >> - To unamend A -> B, just create a marker B -> A, with a newer date. > >> > >> It's now possible for "touch" and "unamend" to reuse hashes (therefore > >> more > >> user-friendly). And it's no longer necessary to write "*_source" in > >> commit > >> metadata to workarounds obs cycles. The hacky inhibit extension also > >> becomes > >> unnecessary. > >> > >> Finally. I have been wanting all these for a long time. > > > > Seems pretty elegant, though I haven't fully understood it yet. Maybe > > you guys talked about this in person, but what effect (if any) does this > > have on exchange? > > We did not really discuss this at the sprint :-/ > > This probably has effect on two aspects of pull and push: > > * Computation of the relevant markers for a None Note sure what "None" means here. Just want to mention, in terms of exchange, we have 2 choices: 1. Exchange the filtered markers. 2. Exchange the unfiltered markers. I think either will work. 1 is simpler because we don't need to introduce a separate "unfiltered" layer. 2 may be nicer if people want the full history of evolution. > * Computation of "head replacement" when pushing branches obsoleting > another one. > > The proposal have some very interesting aspect, I'll try to do a deep > review of its impact on the general concept soon™ (probably not this week) > > Cheers, >
On Mon, Mar 13, 2017 at 12:25:25PM -0700, Jun Wu wrote: > Excerpts from Pierre-Yves David's message of 2017-03-13 11:57:08 -0700: > > > > On 03/13/2017 09:23 AM, Durham Goode wrote: > > > On 3/13/17 2:48 AM, Jun Wu wrote: > > >> # HG changeset patch > > >> # User Jun Wu <quark@fb.com> > > >> # Date 1489395002 25200 > > >> # Mon Mar 13 01:50:02 2017 -0700 > > >> # Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 > > >> # Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 > > >> # Available At > > >> https://bitbucket.org/quark-zju/hg-draft > > >> > > >> # hg pull > > >> https://bitbucket.org/quark-zju/hg-draft > > >> -r 6ae6d1069ba1 > > >> obsolete: allow cycles > > >> > > >> Now we can handle cycles nicely, allow them to be created. Some practical > > >> examples: > > >> > > >> - To revive X, just create a marker X -> X, with a newer date. > > >> - To prune X again, just create a marker X -> (), with a newer date. > > >> - The above two could be repeated. > > >> > > >> - To unamend A -> B, just create a marker B -> A, with a newer date. > > >> > > >> It's now possible for "touch" and "unamend" to reuse hashes (therefore > > >> more > > >> user-friendly). And it's no longer necessary to write "*_source" in > > >> commit > > >> metadata to workarounds obs cycles. The hacky inhibit extension also > > >> becomes > > >> unnecessary. > > >> > > >> Finally. I have been wanting all these for a long time. > > > > > > Seems pretty elegant, though I haven't fully understood it yet. Maybe > > > you guys talked about this in person, but what effect (if any) does this > > > have on exchange? > > > > We did not really discuss this at the sprint :-/ > > > > This probably has effect on two aspects of pull and push: > > > > * Computation of the relevant markers for a None > > Note sure what "None" means here. I think he meant "node". > > Just want to mention, in terms of exchange, we have 2 choices: > > 1. Exchange the filtered markers. > 2. Exchange the unfiltered markers. > > I think either will work. 1 is simpler because we don't need to introduce > a separate "unfiltered" layer. 2 may be nicer if people want the full > history of evolution. > > > * Computation of "head replacement" when pushing branches obsoleting > > another one. > > > > The proposal have some very interesting aspect, I'll try to do a deep > > review of its impact on the general concept soon™ (probably not this week) > > > > Cheers, > > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On Mon, Mar 13, 2017 at 10:11:34AM -0700, Jun Wu wrote: > Excerpts from Durham Goode's message of 2017-03-13 09:23:49 -0700: > > On 3/13/17 2:48 AM, Jun Wu wrote: > > > # HG changeset patch > > > # User Jun Wu <quark@fb.com> > > > # Date 1489395002 25200 > > > # Mon Mar 13 01:50:02 2017 -0700 > > > # Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 > > > # Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 > > > # Available At https://bitbucket.org/quark-zju/hg-draft > > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 6ae6d1069ba1 > > > obsolete: allow cycles > > > > > > Now we can handle cycles nicely, allow them to be created. Some practical > > > examples: > > > > > > - To revive X, just create a marker X -> X, with a newer date. > > > - To prune X again, just create a marker X -> (), with a newer date. > > > - The above two could be repeated. > > > > > > - To unamend A -> B, just create a marker B -> A, with a newer date. > > > > > > It's now possible for "touch" and "unamend" to reuse hashes (therefore more > > > user-friendly). And it's no longer necessary to write "*_source" in commit > > > metadata to workarounds obs cycles. The hacky inhibit extension also becomes > > > unnecessary. > > > > > > Finally. I have been wanting all these for a long time. > > > > Seems pretty elegant, > > It is. I didn't believe I solved it using just 40+ lines with reasonable > performance :p > > > though I haven't fully understood it yet. Maybe > > you guys talked about this in person, > > That's what happened. And we decided to use "date" as the version number. > > > but what effect (if any) does this > have on exchange? > > As long as exchange sends dates, markers are globally ordered, and they > behave no different then the local case. > > Say Alice has A -> B created on date1, Bob has B -> A with date2. If date2 > > date1, Bob wins. The time of pulling or the pull direction does not matter. > Only the global version (date) matters. If date2 == date1, two markers > "cancel out" and both A and B become visible. Previously no matter what the > dates are, A and B are both permanently invisible. I do worry a little about someone with a bogus clock years in the future pumping out malicious nodes that can't be overridden, but I suppose in that case your "way out" is that you perturb the hash of the revision(s) you want to keep. Seem good enough? > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Excerpts from Augie Fackler's message of 2017-03-21 18:03:33 -0400: > > As long as exchange sends dates, markers are globally ordered, and they > > behave no different then the local case. > > > > Say Alice has A -> B created on date1, Bob has B -> A with date2. If date2 > > > date1, Bob wins. The time of pulling or the pull direction does not matter. > > Only the global version (date) matters. If date2 == date1, two markers > > "cancel out" and both A and B become visible. Previously no matter what the > > dates are, A and B are both permanently invisible. > > I do worry a little about someone with a bogus clock years in the > future pumping out malicious nodes that can't be overridden, but I > suppose in that case your "way out" is that you perturb the hash of > the revision(s) you want to keep. Seem good enough? With a bogus clock, power users could also write arbitrary markers with a newer date to solve the issue. The whole point of the series is to make it possible to reuse hashes. That's required for a nature user-experience of "hg unamend", "hg unstrip", "hg unabsorb" and maybe a general purposed "hg undo", etc. Yes, the undo reusing hash behavior may be solved in some other way, like striping the obsstore, or "inhibit". But I believe the obscycle is the most elegant, bug-free way. And it deals with exchange just well.
On Tue, Mar 21, 2017 at 3:15 PM, Jun Wu <quark@fb.com> wrote: > Excerpts from Augie Fackler's message of 2017-03-21 18:03:33 -0400: > > > As long as exchange sends dates, markers are globally ordered, and they > > > behave no different then the local case. > > > > > > Say Alice has A -> B created on date1, Bob has B -> A with date2. If > date2 > > > > date1, Bob wins. The time of pulling or the pull direction does not > matter. > > > Only the global version (date) matters. If date2 == date1, two markers > > > "cancel out" and both A and B become visible. Previously no matter > what the > > > dates are, A and B are both permanently invisible. > > > > I do worry a little about someone with a bogus clock years in the > > future pumping out malicious nodes that can't be overridden, but I > > suppose in that case your "way out" is that you perturb the hash of > > the revision(s) you want to keep. Seem good enough? > > With a bogus clock, power users could also write arbitrary markers with a > newer date to solve the issue. > Hold up. Wall clock times cannot be trusted. This is like a cardinal rule of writing software in the real world. Yet this patch appears to not only rely on local wall clock times being ordered but also relies on wall clocks from different machines being ordered. That's a double no-no. Sorting by wall clocks will lead to unexpected behavior which in the eyes of an end-user is a bug. Sorting by wall clocks is hardly ever a good idea. I have serious reservations about using wall times to determine obsolescence marker precedence. > > The whole point of the series is to make it possible to reuse hashes. > That's > required for a nature user-experience of "hg unamend", "hg unstrip", > "hg unabsorb" and maybe a general purposed "hg undo", etc. > > Yes, the undo reusing hash behavior may be solved in some other way, like > striping the obsstore, or "inhibit". But I believe the obscycle is the most > elegant, bug-free way. And it deals with exchange just well. > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >
Excerpts from Gregory Szorc's message of 2017-03-21 19:32:06 -0700: > On Tue, Mar 21, 2017 at 3:15 PM, Jun Wu <quark@fb.com> wrote: > > > Excerpts from Augie Fackler's message of 2017-03-21 18:03:33 -0400: > > > > As long as exchange sends dates, markers are globally ordered, and they > > > > behave no different then the local case. > > > > > > > > Say Alice has A -> B created on date1, Bob has B -> A with date2. If > > date2 > > > > > date1, Bob wins. The time of pulling or the pull direction does not > > matter. > > > > Only the global version (date) matters. If date2 == date1, two markers > > > > "cancel out" and both A and B become visible. Previously no matter > > what the > > > > dates are, A and B are both permanently invisible. > > > > > > I do worry a little about someone with a bogus clock years in the > > > future pumping out malicious nodes that can't be overridden, but I > > > suppose in that case your "way out" is that you perturb the hash of > > > the revision(s) you want to keep. Seem good enough? > > > > With a bogus clock, power users could also write arbitrary markers with a > > newer date to solve the issue. > > > > Hold up. > > Wall clock times cannot be trusted. This is like a cardinal rule of writing > software in the real world. Yet this patch appears to not only rely on > local wall clock times being ordered but also relies on wall clocks from > different machines being ordered. That's a double no-no. Sorting by wall > clocks will lead to unexpected behavior which in the eyes of an end-user is > a bug. > > Sorting by wall clocks is hardly ever a good idea. I have serious > reservations about using wall times to determine obsolescence marker > precedence. The date idea was suggested by marmoute at the sprint. I was thinking about the physical offsets of markers in "obsstore" before. Using "date" for sorting is better than using physical offsets, because date is stored in markers and is a global thing, while offsets are local, which means you need to have some special non-trivial logic to deal with exchange. This reason alone is enough to convince me to switch from offsets. I'd note that for some systems like Zookeeper, clocks must be treated *very* seriously. But for the obsstore, clocks are not that important. Because the worst case is some visibility issue of commits. No data loss. No repo corruption. No consistency issue. And could be fixed by power commands. Not to mention most systems have sane clocks, and most markers are date unrelated - only those cycles matter - cycles are uncommon. I think dsop summaried this up well: Mar 13 10:57:26 <dsop> junw: so basically it boils down to: using date is not perefct, it makes the solution easy and elegant and if clocks on computers are wrong, the user might have a non-optimal user experience, but we never loose data I've been thinking about the cycle problem for a long time and don't think there is a better solution practically. The current approach (tens of lines) is probably the most elegant thing I've ever contributed to the list. You're encouraged to suggest new ideas. But if the new idea is like some fancy format change plus some fancy conflict resolution during exchange, which sounds like thousands of lines, I think it's reasonable to say no-no to it. > > > > The whole point of the series is to make it possible to reuse hashes. > > That's > > required for a nature user-experience of "hg unamend", "hg unstrip", > > "hg unabsorb" and maybe a general purposed "hg undo", etc. > > > > Yes, the undo reusing hash behavior may be solved in some other way, like > > striping the obsstore, or "inhibit". But I believe the obscycle is the most > > elegant, bug-free way. And it deals with exchange just well. > > _______________________________________________ > > Mercurial-devel mailing list > > Mercurial-devel@mercurial-scm.org > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel > >
On Tue, Mar 21, 2017 at 8:44 PM, Jun Wu <quark@fb.com> wrote: > Excerpts from Gregory Szorc's message of 2017-03-21 19:32:06 -0700: > > On Tue, Mar 21, 2017 at 3:15 PM, Jun Wu <quark@fb.com> wrote: > > > > > Excerpts from Augie Fackler's message of 2017-03-21 18:03:33 -0400: > > > > > As long as exchange sends dates, markers are globally ordered, and > they > > > > > behave no different then the local case. > > > > > > > > > > Say Alice has A -> B created on date1, Bob has B -> A with date2. > If > > > date2 > > > > > > date1, Bob wins. The time of pulling or the pull direction does not > > > matter. > > > > > Only the global version (date) matters. If date2 == date1, two > markers > > > > > "cancel out" and both A and B become visible. Previously no matter > > > what the > > > > > dates are, A and B are both permanently invisible. > > > > > > > > I do worry a little about someone with a bogus clock years in the > > > > future pumping out malicious nodes that can't be overridden, but I > > > > suppose in that case your "way out" is that you perturb the hash of > > > > the revision(s) you want to keep. Seem good enough? > > > > > > With a bogus clock, power users could also write arbitrary markers > with a > > > newer date to solve the issue. > > > > > > > Hold up. > > > > Wall clock times cannot be trusted. This is like a cardinal rule of > writing > > software in the real world. Yet this patch appears to not only rely on > > local wall clock times being ordered but also relies on wall clocks from > > different machines being ordered. That's a double no-no. Sorting by wall > > clocks will lead to unexpected behavior which in the eyes of an end-user > is > > a bug. > > > > Sorting by wall clocks is hardly ever a good idea. I have serious > > reservations about using wall times to determine obsolescence marker > > precedence. > > The date idea was suggested by marmoute at the sprint. I was thinking about > the physical offsets of markers in "obsstore" before. > > Using "date" for sorting is better than using physical offsets, because > date > is stored in markers and is a global thing, while offsets are local, which > means you need to have some special non-trivial logic to deal with > exchange. > This reason alone is enough to convince me to switch from offsets. > > I'd note that for some systems like Zookeeper, clocks must be treated > *very* > seriously. But for the obsstore, clocks are not that important. Because the > worst case is some visibility issue of commits. No data loss. No repo > corruption. No consistency issue. And could be fixed by power commands. > Correct me if I'm wrong, but can't incorrect clocks result in the "wrong" version of a changeset being visible? For example, I'm working on different machines and pushing changes to a central server. I create divergence then correct it later via various obsolescence markers. But, because of clocks, the "wrong" changeset is unhidden and I unknowingly start basing new work on it instead of the "correct" changeset. I accidentally land a predecessor because I assumed obsolescence "just worked" and introduce a subtle bug in the process. What are the processes in place to prevent this from happening? How will a power user even know to use a power command to fix the situation? > > Not to mention most systems have sane clocks, This is dangerous to believe. Anecdotally, I've experienced the following problems on just the machines I use day-to-day: * An old battery on a motherboard caused the system clock to go haywire, drifing so fast that the default ntp configuration failed to readjust the clock fast enough, leading to clock skew on the order of several minutes. Other times, the system would boot up without any persisted time, thinking it was at some epoch. This resulted in mtimes decades in the past. * Hyper-V attempts to adjust the system clock inside running Linux VMs by default. Windows was on California time but the Linux VM was on UTC. The timezones conflicts and every few seconds Hyper-V and something in systemd would race to fix the system time. The system time bounced around 7 hours at a time. * Virtual machines (in Virtualbox IIRC) resumed after hibernate with the system time from when they were powered down. It took several seconds or minutes for NTP or some other process to kick in to adjust the VM's clock to reality. And these problems all occurred on machines with internet connectivity. Could you imagine what would happen if they weren't able to communicate with an NTP server? and most markers are date > unrelated - only those cycles matter - cycles are uncommon. > OK. Maybe that makes things better. In what scenarios can we get cycles? > > I think dsop summaried this up well: > > Mar 13 10:57:26 <dsop> junw: so basically it boils down to: using date is > not perefct, it makes the solution easy and elegant and if clocks on > computers are wrong, the user might have a non-optimal user experience, > but we never loose data > Strictly speaking, yes, we never lose data. But I'd argue that using the wrong data (changeset) inadvertently would be a massive bug and would create distrust in version control. > > I've been thinking about the cycle problem for a long time and don't think > there is a better solution practically. The current approach (tens of > lines) > is probably the most elegant thing I've ever contributed to the list. > You're > encouraged to suggest new ideas. But if the new idea is like some fancy > format change plus some fancy conflict resolution during exchange, which > sounds like thousands of lines, I think it's reasonable to say no-no to it. > I agree the solution is simple, elegant, and probably works most of the time. But you've built this castle on an unstable foundation. Convince me it doesn't matter. As for alternatives, a correct solution needs to refer to the marker it is replacing. Since cycles should be rare, can we record the full "replaced marker" data inline in the new marker? In local storage, that could be an integer offset to the marker. (Honestly, I don't fully understand the problem space. I just saw "sort by clocks" and all kinds of alarm bells went off.) > > > > > > > The whole point of the series is to make it possible to reuse hashes. > > > That's > > > required for a nature user-experience of "hg unamend", "hg unstrip", > > > "hg unabsorb" and maybe a general purposed "hg undo", etc. > > > > > > Yes, the undo reusing hash behavior may be solved in some other way, > like > > > striping the obsstore, or "inhibit". But I believe the obscycle is the > most > > > elegant, bug-free way. And it deals with exchange just well. > > > _______________________________________________ > > > Mercurial-devel mailing list > > > Mercurial-devel@mercurial-scm.org > > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel > > > >
Excerpts from Gregory Szorc's message of 2017-03-21 21:25:03 -0700: > Correct me if I'm wrong, but can't incorrect clocks result in the "wrong" > version of a changeset being visible? For example, I'm working on different > machines and pushing changes to a central server. I create divergence then > correct it later via various obsolescence markers. But, because of clocks, Let's look at an example of divergence: o C (amended from A) | | o B (amended from A) |/ x A If you see the above divergence in one machine, fix it by "hg prune B -s C", and verify the fix on that machine, then the fix could be applied globally. That means, if the related markers are sent to elsewhere, the fix will be observed elsewhere as expected. If the clock was wrong, what you may notice is "hg prune" looks like a no-op. But it does not unhide things. In this case only "A" is hidden, since you are not creating markers involving "A" to resolve the divergence, nothing gets "unhidden". I'm trying to construct a case that you may be concerned about, to explain the difference caused by date: o C (amended from B) | (marker 2.2) o B (amended from A) | x B (amended from A) | (marker 1.1) |/ (marker 2.1) x A x A Machine 1 Machine 2 Then if Machine 1 gets markers from Machine 2, if marker 1.1 is newer than marker 2.1, B stays visible and is a divergence. You may think "B" should be hidden, but that's also explainable. Think about the following order: 1. hg amend # A -> B 2. hg amend # B -> C 3. hg update --hidden -r A; ...; hg amend # A -> B If "3" is the latest operation, it has the intention to make whatever it creates visible - that's B. And that creates a divergence. The key here is the order of operation 2 and 3. If 2 happens after 3, then B gets hidden as expected. Currently, this cycle will get both B and C hidden, which I don't think is a sane behavior. > the "wrong" changeset is unhidden and I unknowingly start basing new work > on it instead of the "correct" changeset. I accidentally land a predecessor > because I assumed obsolescence "just worked" and introduce a subtle bug in > the process. > > What are the processes in place to prevent this from happening? How will a > power user even know to use a power command to fix the situation? They will notice an hg command looks like a no-op. And we may be able to print some friendly help about how to fix it, or just write a faked date automatically with a warning. > > > > Not to mention most systems have sane clocks, > > This is dangerous to believe. > > Anecdotally, I've experienced the following problems on just the machines I > use day-to-day: > > * An old battery on a motherboard caused the system clock to go haywire, > drifing so fast that the default ntp configuration failed to readjust the > clock fast enough, leading to clock skew on the order of several minutes. > Other times, the system would boot up without any persisted time, thinking > it was at some epoch. This resulted in mtimes decades in the past. > > * Hyper-V attempts to adjust the system clock inside running Linux VMs by > default. Windows was on California time but the Linux VM was on UTC. The > timezones conflicts and every few seconds Hyper-V and something in systemd > would race to fix the system time. The system time bounced around 7 hours > at a time. > > * Virtual machines (in Virtualbox IIRC) resumed after hibernate with the > system time from when they were powered down. It took several seconds or > minutes for NTP or some other process to kick in to adjust the VM's clock > to reality. > > And these problems all occurred on machines with internet connectivity. > Could you imagine what would happen if they weren't able to communicate > with an NTP server? As I said, the worst case is only about visibility - no repo corruption, no data loss, no broken consistency (ex. two synchronized repos have different understanding about things). And visibility is possible to fix by simply adding markers. > > unrelated - only those cycles matter - cycles are uncommon. > > > > OK. Maybe that makes things better. In what scenarios can we get cycles? Sorry, I was wrong. Precisely speaking, the date will also affect non-cycles where a successor is another marker's precursor. So in theory it could unhide commits in non-cycle situations. You can think it's approximately "sorting markers within a single marker chain". It's definitely not sorting all markers in obsstore, which is also unacceptably slow. We avoid creating cycles currently. But at Facebook "unamend" needs to restore the old hash, which will be done by just creating a cycle. > > > > I think dsop summaried this up well: > > > > Mar 13 10:57:26 <dsop> junw: so basically it boils down to: using date is > > not perefct, it makes the solution easy and elegant and if clocks on > > computers are wrong, the user might have a non-optimal user experience, > > but we never loose data > > > > Strictly speaking, yes, we never lose data. But I'd argue that using the > wrong data (changeset) inadvertently would be a massive bug and would > create distrust in version control. In theory we can build a DAG for the obsstore, and have some fancy conflict resolution. I feel it's making the problem unnecessarily much more complex and the code complexity isn't worth the problem it solves (or creates). > > > > I've been thinking about the cycle problem for a long time and don't think > > there is a better solution practically. The current approach (tens of > > lines) > > is probably the most elegant thing I've ever contributed to the list. > > You're > > encouraged to suggest new ideas. But if the new idea is like some fancy > > format change plus some fancy conflict resolution during exchange, which > > sounds like thousands of lines, I think it's reasonable to say no-no to it. > > > > I agree the solution is simple, elegant, and probably works most of the > time. But you've built this castle on an unstable foundation. Convince me > it doesn't matter. > > As for alternatives, a correct solution needs to refer to the marker it is > replacing. Since cycles should be rare, can we record the full "replaced > marker" data inline in the new marker? In local storage, that could be an If "markers being replaced" are explicitly recorded, you will miss remote markers that can be possibly replaced because you don't know them at the time appending a new local marker. So you end up with some "conflict resolution" logic during exchange. That is not very different from just using the offsets - since obsstore is append-only, new markers just "replace" old ones (I don't think there is an exception that the newly added marker is intended to be replaced by a previous one when working locally). It's simpler but has the same exchange headache. > integer offset to the marker. (Honestly, I don't fully understand the > problem space. I just saw "sort by clocks" and all kinds of alarm bells > went off.)
Excerpts from Jun Wu's message of 2017-03-21 22:42:11 -0700: > The key here is the order of operation 2 and 3. If 2 happens after 3, then B > gets hidden as expected. Currently, this cycle will get both B and C hidden, > which I don't think is a sane behavior. Sorry. It should be "Currently, 3 is always treated as happened before 2". There are no cycles in this simple case.
Jun Wu <quark@fb.com> writes: > Excerpts from Gregory Szorc's message of 2017-03-21 21:25:03 -0700: >> Correct me if I'm wrong, but can't incorrect clocks result in the "wrong" >> version of a changeset being visible? For example, I'm working on different >> machines and pushing changes to a central server. I create divergence then >> correct it later via various obsolescence markers. But, because of clocks, > > Let's look at an example of divergence: > > o C (amended from A) > | > | o B (amended from A) > |/ > x A > > If you see the above divergence in one machine, fix it by "hg prune B -s C", > and verify the fix on that machine, then the fix could be applied globally. > That means, if the related markers are sent to elsewhere, the fix will be > observed elsewhere as expected. > > If the clock was wrong, what you may notice is "hg prune" looks like a > no-op. But it does not unhide things. > > In this case only "A" is hidden, since you are not creating markers > involving "A" to resolve the divergence, nothing gets "unhidden". > > > I'm trying to construct a case that you may be concerned about, to explain > the difference caused by date: > > o C (amended from B) > | (marker 2.2) > o B (amended from A) | x B (amended from A) > | (marker 1.1) |/ (marker 2.1) > x A x A > > Machine 1 Machine 2 > > Then if Machine 1 gets markers from Machine 2, if marker 1.1 is newer than > marker 2.1, B stays visible and is a divergence. You may think "B" should be > hidden, but that's also explainable. Think about the following order: > > 1. hg amend # A -> B > 2. hg amend # B -> C > 3. hg update --hidden -r A; ...; hg amend # A -> B > > If "3" is the latest operation, it has the intention to make whatever it > creates visible - that's B. And that creates a divergence. > > The key here is the order of operation 2 and 3. If 2 happens after 3, then B > gets hidden as expected. Currently, this cycle will get both B and C hidden, > which I don't think is a sane behavior. > >> the "wrong" changeset is unhidden and I unknowingly start basing new work >> on it instead of the "correct" changeset. I accidentally land a predecessor >> because I assumed obsolescence "just worked" and introduce a subtle bug in >> the process. >> >> What are the processes in place to prevent this from happening? How will a >> power user even know to use a power command to fix the situation? > > They will notice an hg command looks like a no-op. And we may be able to > print some friendly help about how to fix it, or just write a faked date > automatically with a warning. > >> > >> > Not to mention most systems have sane clocks, >> >> This is dangerous to believe. >> >> Anecdotally, I've experienced the following problems on just the machines I >> use day-to-day: >> >> * An old battery on a motherboard caused the system clock to go haywire, >> drifing so fast that the default ntp configuration failed to readjust the >> clock fast enough, leading to clock skew on the order of several minutes. >> Other times, the system would boot up without any persisted time, thinking >> it was at some epoch. This resulted in mtimes decades in the past. >> >> * Hyper-V attempts to adjust the system clock inside running Linux VMs by >> default. Windows was on California time but the Linux VM was on UTC. The >> timezones conflicts and every few seconds Hyper-V and something in systemd >> would race to fix the system time. The system time bounced around 7 hours >> at a time. >> >> * Virtual machines (in Virtualbox IIRC) resumed after hibernate with the >> system time from when they were powered down. It took several seconds or >> minutes for NTP or some other process to kick in to adjust the VM's clock >> to reality. >> >> And these problems all occurred on machines with internet connectivity. >> Could you imagine what would happen if they weren't able to communicate >> with an NTP server? > > As I said, the worst case is only about visibility - no repo corruption, no > data loss, no broken consistency (ex. two synchronized repos have different > understanding about things). And visibility is possible to fix by simply > adding markers. Let's take a step back for a minute. Technical discussions aside (and they are, indeed, very technical), is this a feature that we should be concentrating on *right now*? I worry that evolve is already too complicated and we haven't done the first tasks of bringing bits of it into core. I've overheard from some Bitbucket co-workers (that prefer to not chime in)[1] that it's getting a bit out of hand with everyone implementing every single niche feature they want and no one seems to be around to herd these cats. If this feature is really, really needed, then fine. My only purpose here is to ask the current leaders: in all honesty, how will we, as a community, avoid over-engineering Mercurial into JIRA? [1] https://www.youtube.com/watch?v=DG0Quq36c5s
Excerpts from Sean Farley's message of 2017-03-22 11:58:34 -0700: > Let's take a step back for a minute. Technical discussions aside (and > they are, indeed, very technical), is this a feature that we should be > concentrating on *right now*? Yes. Right now. For example, At Facebook we use "inhibit" to implement "umamend" and has tons of code to support all kinds of corner cases. And it still is not prefect. This tens of lines will allow us to get rid of those hacky code (and "inhibit" all together). > I worry that evolve is already too complicated and we haven't done the > first tasks of bringing bits of it into core. I've overheard from some > Bitbucket co-workers (that prefer to not chime in)[1] that it's getting > a bit out of hand with everyone implementing every single niche feature > they want and no one seems to be around to herd these cats. The fact that the evolve version of "unamend" or "unstrip (touch)" changing commit hashes is "complicated" to explian to end-users. So although the code is slightly more complicated, the end-user experience is much better. I think it's worthy. Besides, if you take "inhibit" into consideration, this approach not only simplifies things *greatly*, it also handles the case much cleanly and with much more confidence. If we count the removal lines in inhibit and all kinds of code supporting it, I'm sure there will be much more deleted lines than added. That is a decent clean-up. How could you define it as "over-engineering" ? If you had good points indicating that "inhibit" is a reasonable permanent solution and provides a good user experience, then I'd be happy to drop the series. If you do so, be prepared with questions about all kinds of corner cases. By the way, I'm not sure if you have noticed that "inhibit" was recently moved to a directory called "hack/" in mutable-history. > If this feature is really, really needed, then fine. My only purpose > here is to ask the current leaders: in all honesty, how will we, as a > community, avoid over-engineering Mercurial into JIRA?
Jun Wu <quark@fb.com> writes: > Excerpts from Sean Farley's message of 2017-03-22 11:58:34 -0700: >> Let's take a step back for a minute. Technical discussions aside (and >> they are, indeed, very technical), is this a feature that we should be >> concentrating on *right now*? > > Yes. Right now. For example, At Facebook we use "inhibit" to implement > "umamend" and has tons of code to support all kinds of corner cases. And it > still is not prefect. Heh, yes, that made me chuckle. I understand that *you* need this. > This tens of lines will allow us to get rid of those hacky code (and > "inhibit" all together). > >> I worry that evolve is already too complicated and we haven't done the >> first tasks of bringing bits of it into core. I've overheard from some >> Bitbucket co-workers (that prefer to not chime in)[1] that it's getting >> a bit out of hand with everyone implementing every single niche feature >> they want and no one seems to be around to herd these cats. > > The fact that the evolve version of "unamend" or "unstrip (touch)" changing > commit hashes is "complicated" to explian to end-users. So although > the code is slightly more complicated, the end-user experience is much > better. I think it's worthy. > > Besides, if you take "inhibit" into consideration, But that's the thing. I'm not taking "inhibit" into consideration because inhibit isn't a common workflow. It's completely a facebook thing. > this approach not only > simplifies things *greatly*, it also handles the case much cleanly and with > much more confidence. If we count the removal lines in inhibit and all kinds > of code supporting it, I'm sure there will be much more deleted lines than > added. That is a decent clean-up. How could you define it as > "over-engineering" ? I think that's a discussion for another time. > If you had good points indicating that "inhibit" is a reasonable permanent > solution and provides a good user experience, then I'd be happy to drop the > series. If you do so, be prepared with questions about all kinds of corner > cases. > > By the way, I'm not sure if you have noticed that "inhibit" was recently > moved to a directory called "hack/" in mutable-history. Yes, things in evolve are still baking. I don't see the need to rush obs cycles into core based on that, though.
Excerpts from Sean Farley's message of 2017-03-22 13:18:29 -0700: > Heh, yes, that made me chuckle. I understand that *you* need this. I'd argue *most people* wants this. See below. > But that's the thing. I'm not taking "inhibit" into consideration > because inhibit isn't a common workflow. It's completely a facebook > thing. If you could look a bit ahead, commends like "unamend", "unstrip", "unabsorb" and "undo" in general will basically need the stuff. I guess Bitbucket don't have the data about how much demand users want those commands, since they are power-user only and need complex manual setup. But we ship commands like "unamend" and friends we have data and feedback that those commands are great and in high demand. You may argue it's still Facebook-specific. But I don't see why "unamend" has any fb-specific bit. The demand of those commands is universal. > > this approach not only > > simplifies things *greatly*, it also handles the case much cleanly and with > > much more confidence. If we count the removal lines in inhibit and all kinds > > of code supporting it, I'm sure there will be much more deleted lines than > > added. That is a decent clean-up. How could you define it as > > "over-engineering" ? > > I think that's a discussion for another time. > > > If you had good points indicating that "inhibit" is a reasonable permanent > > solution and provides a good user experience, then I'd be happy to drop the > > series. If you do so, be prepared with questions about all kinds of corner > > cases. > > > > By the way, I'm not sure if you have noticed that "inhibit" was recently > > moved to a directory called "hack/" in mutable-history. > > Yes, things in evolve are still baking. I don't see the need to rush obs > cycles into core based on that, though. Shall we have "unamend", "unrebase", "unhistedit" and "undo" ready, please do not use them as you don't need them.
Jun Wu <quark@fb.com> writes: > Excerpts from Sean Farley's message of 2017-03-22 13:18:29 -0700: >> Heh, yes, that made me chuckle. I understand that *you* need this. > > I'd argue *most people* wants this. See below. > >> But that's the thing. I'm not taking "inhibit" into consideration >> because inhibit isn't a common workflow. It's completely a facebook >> thing. > > If you could look a bit ahead, commends like "unamend", "unstrip", > "unabsorb" and "undo" in general will basically need the stuff. > > I guess Bitbucket don't have the data about how much demand users want those > commands, since they are power-user only and need complex manual setup. But > we ship commands like "unamend" and friends we have data and feedback that > those commands are great and in high demand. Evolve doesn't even have the terminology finished. What you are basing this on is, in my opinion, a very advanced and Facebook specific workflow. > You may argue it's still Facebook-specific. But I don't see why "unamend" > has any fb-specific bit. The demand of those commands is universal. > >> > this approach not only >> > simplifies things *greatly*, it also handles the case much cleanly and with >> > much more confidence. If we count the removal lines in inhibit and all kinds >> > of code supporting it, I'm sure there will be much more deleted lines than >> > added. That is a decent clean-up. How could you define it as >> > "over-engineering" ? >> >> I think that's a discussion for another time. >> >> > If you had good points indicating that "inhibit" is a reasonable permanent >> > solution and provides a good user experience, then I'd be happy to drop the >> > series. If you do so, be prepared with questions about all kinds of corner >> > cases. >> > >> > By the way, I'm not sure if you have noticed that "inhibit" was recently >> > moved to a directory called "hack/" in mutable-history. >> >> Yes, things in evolve are still baking. I don't see the need to rush obs >> cycles into core based on that, though. > > Shall we have "unamend", "unrebase", "unhistedit" and "undo" ready, please > do not use them as you don't need them. But that's part of the problem: all the edge cases mentioned here will affect normal use of obs markers. Why not put this in an extension for now? I don't see how this is more important than fixing terminology / error messages / UI / UX / etc etc etc. That being said, my questions weren't really for you. I am not a leader here. My questions are for the steering committee and the rest of the community.
On Wed, Mar 22, 2017 at 1:42 AM, Jun Wu <quark@fb.com> wrote: >> As for alternatives, a correct solution needs to refer to the marker it is >> replacing. Since cycles should be rare, can we record the full "replaced >> marker" data inline in the new marker? In local storage, that could be an > > If "markers being replaced" are explicitly recorded, you will miss remote > markers that can be possibly replaced because you don't know them at the > time appending a new local marker. So you end up with some "conflict > resolution" logic during exchange. > > That is not very different from just using the offsets - since obsstore is > append-only, new markers just "replace" old ones (I don't think there is an > exception that the newly added marker is intended to be replaced by a > previous one when working locally). It's simpler but has the same exchange > headache. I wonder: could we get away from using dates by putting some kind of generation number in the marker? So if a marker would create a cycle, we increment its generation number relative to the previous highest value in the cycle. It still means a bad actor can produce a hard-to-defeat cycle (sort of), but it solves the clock skew issue.
On Wed, Mar 22, 2017 at 4:59 PM, Sean Farley <sean@farley.io> wrote: >>> > By the way, I'm not sure if you have noticed that "inhibit" was recently >>> > moved to a directory called "hack/" in mutable-history. >>> >>> Yes, things in evolve are still baking. I don't see the need to rush obs >>> cycles into core based on that, though. >> >> Shall we have "unamend", "unrebase", "unhistedit" and "undo" ready, please >> do not use them as you don't need them. > > But that's part of the problem: all the edge cases mentioned here will > affect normal use of obs markers. Why not put this in an extension for > now? I don't see how this is more important than fixing terminology / > error messages / UI / UX / etc etc etc. I'm deeply sympathetic to the desire to "just roll back to an old version of this" and not have the hash change, and I'm also extremely sympathetic to the idea that cycles should have an obvious, easy way to recover. Something like what Jun has proposed in this series strikes me as the least-awful option to date, though I haven't considered it as much as I'm sure Jun and Pierre-Yves have.
On Mon, Mar 13, 2017 at 11:57:08AM -0700, Pierre-Yves David wrote: > > > On 03/13/2017 09:23 AM, Durham Goode wrote: > >On 3/13/17 2:48 AM, Jun Wu wrote: > >># HG changeset patch > >># User Jun Wu <quark@fb.com> > >># Date 1489395002 25200 > >># Mon Mar 13 01:50:02 2017 -0700 > >># Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 > >># Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 > >># Available At > >>https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=teHrSOWg352cIHpPJ_QkfGT4tGnEsAx8upZBTGYdh94&s=aOoZRXz4F1btSijuLHmlEK-JsH1Sp_YSgBvT6DoaL-E&e= > >> > >># hg pull > >>https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=teHrSOWg352cIHpPJ_QkfGT4tGnEsAx8upZBTGYdh94&s=aOoZRXz4F1btSijuLHmlEK-JsH1Sp_YSgBvT6DoaL-E&e= > >>-r 6ae6d1069ba1 > >>obsolete: allow cycles > >> > >>Now we can handle cycles nicely, allow them to be created. Some practical > >>examples: > >> > >> - To revive X, just create a marker X -> X, with a newer date. > >> - To prune X again, just create a marker X -> (), with a newer date. > >> - The above two could be repeated. > >> > >> - To unamend A -> B, just create a marker B -> A, with a newer date. > >> > >>It's now possible for "touch" and "unamend" to reuse hashes (therefore > >>more > >>user-friendly). And it's no longer necessary to write "*_source" in > >>commit > >>metadata to workarounds obs cycles. The hacky inhibit extension also > >>becomes > >>unnecessary. > >> > >>Finally. I have been wanting all these for a long time. > > > >Seems pretty elegant, though I haven't fully understood it yet. Maybe > >you guys talked about this in person, but what effect (if any) does this > >have on exchange? > > We did not really discuss this at the sprint :-/ > > This probably has effect on two aspects of pull and push: > > * Computation of the relevant markers for a None > > * Computation of "head replacement" when pushing branches obsoleting another > one. > > The proposal have some very interesting aspect, I'll try to do a deep review > of its impact on the general concept soon™ (probably not this week) I've had a look over the code, and it's surprisingly simple (though it could probably use more comments for future code archaeologists.) I'm wary (as is indygreg) of using the date field for this sorting, just because clocks are known sources of evil and consternation. (I acknowledge we might not have a significantly better choice though, although I'd like to give it some more thought.) > > Cheers, > > -- > Pierre-Yves David > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Excerpts from Augie Fackler's message of 2017-03-23 17:53:39 -0400: > On Wed, Mar 22, 2017 at 1:42 AM, Jun Wu <quark@fb.com> wrote: > >> As for alternatives, a correct solution needs to refer to the marker it is > >> replacing. Since cycles should be rare, can we record the full "replaced > >> marker" data inline in the new marker? In local storage, that could be an > > > > If "markers being replaced" are explicitly recorded, you will miss remote > > markers that can be possibly replaced because you don't know them at the > > time appending a new local marker. So you end up with some "conflict > > resolution" logic during exchange. > > > > That is not very different from just using the offsets - since obsstore is > > append-only, new markers just "replace" old ones (I don't think there is an > > exception that the newly added marker is intended to be replaced by a > > previous one when working locally). It's simpler but has the same exchange > > headache. > > I wonder: could we get away from using dates by putting some kind of > generation number in the marker? So if a marker would create a cycle, > we increment its generation number relative to the previous highest > value in the cycle. The problem is you don't know if a marker will create a cycle (or should invalidate another marker), because remote markers are unknown. If you do that during exchange, it makes exchange more complex. I think that is not very different from just using the offsets. Basically, if the new filed which is meant to be spread globally, date is probably the best option. If it is not meant to be spread globally, offset is already a decent choice, without format change. But offsets brings up the exchange headache. marmoute preferred date to offsets. And I agree because I don't have good idea to handle the exchange headache (or I think it's better to be fully automatic). > It still means a bad actor can produce a hard-to-defeat cycle (sort > of), but it solves the clock skew issue. For clocks going backwards, we can detect that, say we get maxdate = max(m.date for m in markers), then if we are going to write a marker with date < maxdate, we could warn people, and/or write (date+rand(0 to 1)) as a fake date. That would at least make non-exchange experience always expected.
> On Mar 23, 2017, at 6:15 PM, Jun Wu <quark@fb.com> wrote: > > Excerpts from Augie Fackler's message of 2017-03-23 17:53:39 -0400: >> On Wed, Mar 22, 2017 at 1:42 AM, Jun Wu <quark@fb.com> wrote: >>>> As for alternatives, a correct solution needs to refer to the marker it is >>>> replacing. Since cycles should be rare, can we record the full "replaced >>>> marker" data inline in the new marker? In local storage, that could be an >>> >>> If "markers being replaced" are explicitly recorded, you will miss remote >>> markers that can be possibly replaced because you don't know them at the >>> time appending a new local marker. So you end up with some "conflict >>> resolution" logic during exchange. >>> >>> That is not very different from just using the offsets - since obsstore is >>> append-only, new markers just "replace" old ones (I don't think there is an >>> exception that the newly added marker is intended to be replaced by a >>> previous one when working locally). It's simpler but has the same exchange >>> headache. >> >> I wonder: could we get away from using dates by putting some kind of >> generation number in the marker? So if a marker would create a cycle, >> we increment its generation number relative to the previous highest >> value in the cycle. > > The problem is you don't know if a marker will create a cycle (or should > invalidate another marker), because remote markers are unknown. If you do > that during exchange, it makes exchange more complex. > > I think that is not very different from just using the offsets. > > Basically, if the new filed which is meant to be spread globally, date is > probably the best option. If it is not meant to be spread globally, offset > is already a decent choice, without format change. But offsets brings up > the exchange headache. > > marmoute preferred date to offsets. And I agree because I don't have good > idea to handle the exchange headache (or I think it's better to be fully > automatic). > >> It still means a bad actor can produce a hard-to-defeat cycle (sort >> of), but it solves the clock skew issue. > > For clocks going backwards, we can detect that, say we get > maxdate = max(m.date for m in markers), then if we are going to write a > marker with date < maxdate, we could warn people, and/or write > (date+rand(0 to 1)) as a fake date. That would at least make non-exchange > experience always expected. Could we default to using the current timestamp, but if there’s a member of the obsmarker chain we’re about to add to with a time ahead of our clock, we do (that time + rand(0,1)) as the time? Greg, does that alleviate most of your concern around using times? (This has the negative that in the presence of marker cycles time is somewhat a lie, but I think the overall ergonomics seem to be about right at least as a thought experiment...)
Excerpts from Augie Fackler's message of 2017-03-23 20:59:03 -0400: > > For clocks going backwards, we can detect that, say we get > > maxdate = max(m.date for m in markers), then if we are going to write a > > marker with date < maxdate, we could warn people, and/or write > > (date+rand(0 to 1)) as a fake date. That would at least make non-exchange > > experience always expected. > > Could we default to using the current timestamp, but if there’s a member > of the obsmarker chain we’re about to add to with a time ahead of our > clock, we do (that time + rand(0,1)) as the time? Greg, does that > alleviate most of your concern around using times? That was what I suggested. A warning is probably worthy if that happens. More strictly, when loading the obsstore, if any marker has a time greater than the local system time, it may be worth a warning so we can detect problems early. > (This has the negative that in the presence of marker cycles time is > somewhat a lie, but I think the overall ergonomics seem to be about right > at least as a thought experiment...)
On 23/03/2017 22:15, Jun Wu wrote: > Excerpts from Augie Fackler's message of 2017-03-23 17:53:39 -0400: >> On Wed, Mar 22, 2017 at 1:42 AM, Jun Wu <quark@fb.com> wrote: >>>> As for alternatives, a correct solution needs to refer to the marker it is >>>> replacing. Since cycles should be rare, can we record the full "replaced >>>> marker" data inline in the new marker? In local storage, that could be an >>> >>> If "markers being replaced" are explicitly recorded, you will miss remote >>> markers that can be possibly replaced because you don't know them at the >>> time appending a new local marker. So you end up with some "conflict >>> resolution" logic during exchange. >>> >>> That is not very different from just using the offsets - since obsstore is >>> append-only, new markers just "replace" old ones (I don't think there is an >>> exception that the newly added marker is intended to be replaced by a >>> previous one when working locally). It's simpler but has the same exchange >>> headache. >> >> I wonder: could we get away from using dates by putting some kind of >> generation number in the marker? So if a marker would create a cycle, >> we increment its generation number relative to the previous highest >> value in the cycle. > > The problem is you don't know if a marker will create a cycle (or should > invalidate another marker), because remote markers are unknown. If you do > that during exchange, it makes exchange more complex. > > I think that is not very different from just using the offsets. > > Basically, if the new filed which is meant to be spread globally, date is > probably the best option. If it is not meant to be spread globally, offset > is already a decent choice, without format change. But offsets brings up > the exchange headache. > A thought experiment, to tease out concerns here; if we replaced the use of date with hashfunc(marker) (where hashfunc is a randomly chosen crypto hash), would people still have concerns? In other words, if we take the semantics of "date" away, and treat "date" as a random number that happens to always exist, do people's concerns still hold water?
On 25/03/2017 08:20, Simon Farnsworth wrote: > On 23/03/2017 22:15, Jun Wu wrote: >> Excerpts from Augie Fackler's message of 2017-03-23 17:53:39 -0400: >>> On Wed, Mar 22, 2017 at 1:42 AM, Jun Wu <quark@fb.com> wrote: >>>>> As for alternatives, a correct solution needs to refer to the marker it is >>>>> replacing. Since cycles should be rare, can we record the full "replaced >>>>> marker" data inline in the new marker? In local storage, that could be an >>>> If "markers being replaced" are explicitly recorded, you will miss remote >>>> markers that can be possibly replaced because you don't know them at the >>>> time appending a new local marker. So you end up with some "conflict >>>> resolution" logic during exchange. >>>> That is not very different from just using the offsets - since obsstore is >>>> append-only, new markers just "replace" old ones (I don't think there is an >>>> exception that the newly added marker is intended to be replaced by a >>>> previous one when working locally). It's simpler but has the same exchange >>>> headache. >>> I wonder: could we get away from using dates by putting some kind of >>> generation number in the marker? So if a marker would create a cycle, >>> we increment its generation number relative to the previous highest >>> value in the cycle. >> The problem is you don't know if a marker will create a cycle (or should >> invalidate another marker), because remote markers are unknown. If you do >> that during exchange, it makes exchange more complex. >> I think that is not very different from just using the offsets. >> Basically, if the new filed which is meant to be spread globally, date is >> probably the best option. If it is not meant to be spread globally, offset >> is already a decent choice, without format change. But offsets brings up >> the exchange headache. > A thought experiment, to tease out concerns here; if we replaced the use > of date with hashfunc(marker) (where hashfunc is a randomly chosen > crypto hash), would people still have concerns? > In other words, if we take the semantics of "date" away, and treat > "date" as a random number that happens to always exist, do people's > concerns still hold water? How do we decide which of the different hashes represents the cycle stop? E.g. I created a A<-B marker with "extra" H1, someone else cleated B->A marker with "extra" H2, we exchanged, how do we decide which commit to use? My understanding was that as long as H1 and H2 are universally oredered (meaning dates), we know what to do, but without ordering (like hashes) we do not. Am I missing something? >
On Thu, Mar 23, 2017 at 6:29 PM, Jun Wu <quark@fb.com> wrote: > Excerpts from Augie Fackler's message of 2017-03-23 20:59:03 -0400: > > > For clocks going backwards, we can detect that, say we get > > > maxdate = max(m.date for m in markers), then if we are going to write a > > > marker with date < maxdate, we could warn people, and/or write > > > (date+rand(0 to 1)) as a fake date. That would at least make > non-exchange > > > experience always expected. > > > > Could we default to using the current timestamp, but if there’s a member > > of the obsmarker chain we’re about to add to with a time ahead of our > > clock, we do (that time + rand(0,1)) as the time? Greg, does that > > alleviate most of your concern around using times? > > That was what I suggested. A warning is probably worthy if that happens. > > More strictly, when loading the obsstore, if any marker has a time greater > than the local system time, it may be worth a warning so we can detect > problems early. > That does mitigate the impact of bad future times somewhat. I think there are still issues with bad past times losing out precedence. But I could be wrong since I don't know the full semantics of obsolete store processing. > > > (This has the negative that in the presence of marker cycles time is > > somewhat a lie, but I think the overall ergonomics seem to be about right > > at least as a thought experiment...) > I have a thought experiment to add. The obsolete store is strictly ordered. So for a repo detached from the world and not doing any exchange, "date" is strictly metadata and the offset of a marker in the store is all that matters for precedence. This assumption goes completely out the window once exchange is involved because when you do `hg pull` you find all markers from the remote not in the local store and append them. So now your local store is unordered and we have this problem of needing to reorder markers to establish precedence. It seems like we're shooting ourselves in the foot by consolidating all markers into a single local store. Have we ever considered having a separate store per remote or at least preserving the illusion of a separate store (e.g. by recording metadata such as "markers N to M pulled from X")? This would preserve properties of strict ordering within each peer. There are still problems of figuring out the relative ordering between markers produced on different peers (with different clocks). But without online coordination at the time of marker generation, this is *impossible* to achieve. The best you can do is preserve strict ordering within each source and exchange/record generation numbers (e.g. marker number/offset) to attempt to resolve relative ordering and to drive "merging." Of course, the implementation of this is complex and requires the concept of a persistent and identifiable peer. But it is more robust than relying on clocks alone. Without an online distributed system coordinating strict ordering, it is impossible to automatically resolve marker ordering with 100% accuracy. I worry that attempting to use wall clock times to resolve ordering will result in silent or non-obvious data loss. That breaks a fundamental property of version control: don't lose (or corrupt) my data. Whatever we do here, there needs to be an option to manually resolve ambiguous ordering without relying on dates. I concede this is a very difficult problem on several fronts.
> On Mar 26, 2017, at 2:25 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote: > > It seems like we're shooting ourselves in the foot by consolidating all markers into a single local store. Have we ever considered having a separate store per remote or at least preserving the illusion of a separate store (e.g. by recording metadata such as "markers N to M pulled from X")? This would preserve properties of strict ordering within each peer. There are still problems of figuring out the relative ordering between markers produced on different peers (with different clocks). But without online coordination at the time of marker generation, this is *impossible* to achieve. The best you can do is preserve strict ordering within each source and exchange/record generation numbers (e.g. marker number/offset) to attempt to resolve relative ordering and to drive "merging." I’ve considered a similar line of thinking for prunes (basically having “local” markers and “exchanged” markers, and by default not propagating prunes). Maybe it’s worth revisiting.
Excerpts from Gregory Szorc's message of 2017-03-26 11:25:28 -0700: > I have a thought experiment to add. > > The obsolete store is strictly ordered. So for a repo detached from the > world and not doing any exchange, "date" is strictly metadata and the > offset of a marker in the store is all that matters for precedence. > > This assumption goes completely out the window once exchange is involved > because when you do `hg pull` you find all markers from the remote not in > the local store and append them. So now your local store is unordered and > we have this problem of needing to reorder markers to establish precedence. > > It seems like we're shooting ourselves in the foot by consolidating all > markers into a single local store. Have we ever considered having a > separate store per remote or at least preserving the illusion of a separate > store (e.g. by recording metadata such as "markers N to M pulled from X")? > This would preserve properties of strict ordering within each peer. There > are still problems of figuring out the relative ordering between markers > produced on different peers (with different clocks). But without online > coordination at the time of marker generation, this is *impossible* to > achieve. The best you can do is preserve strict ordering within each source > and exchange/record generation numbers (e.g. marker number/offset) to > attempt to resolve relative ordering and to drive "merging." One property that marmoute wants to maintain (and I agree) is the information is global. That is to say, if we have information about "how to resolve between X and Y", the "how to" information needs to be shared to other peers. So every peer will have a consistent view. > Of course, the implementation of this is complex and requires the concept > of a persistent and identifiable peer. But it is more robust than relying > on clocks alone. > > Without an online distributed system coordinating strict ordering, it is > impossible to automatically resolve marker ordering with 100% accuracy. I True. > worry that attempting to use wall clock times to resolve ordering will > result in silent or non-obvious data loss. That breaks a fundamental > property of version control: don't lose (or corrupt) my data. Whatever we > do here, there needs to be an option to manually resolve ambiguous ordering > without relying on dates. I concede this is a very difficult problem on > several fronts. The above idea introduces "local", "remote" concepts and more interestingly, a new state where conflicted markers co-exist but remain unresolved. Where the "date" approach does not allow that state. I'd discuss what do we do if we encounter that unresolved conflict. For example, there are a local marker "A is replaced by B", and a foreign marker "B is replaced by A". Options are: 1. Hide both A and B. It's the current behavior. 2. Hide none. Inspired by "darcs" - things get canceled out if conflict. 3. Resolve automatically by something, ex. date. No doubt 1 is not the right thing to do. Bboth 2 and 3 are reasonable to me. 2 is perfect in theory, but practically it's less user-friendly: it require users to learn new concepts and force manual actions where using date may be just 95+% correct. So I'd prefer 3, not only because it's much simpler to implement, but also from a user-friendliness perspective. I agree that "manual resolution" is needed to fix cases when auto resolution is wrong. Given the current situation, I think that could be implemented as a non-blocking "post-resolution", instead of a "pre-resolution" that may block some other workflow. That is, the obsstore always auto resolves everything without interaction. But we provide a command to examine markers potentially out-of-order, and allows users to "manual resolve" them, by writing new markers with newer dates. Does that sound good?
Per discussion on IRC with marmoute, this series lacks of documentation. I'll drop it from patchwork and send a new version with better documentation and some planned fixes. The next version has a same core idea, but the interface is more formalized and is made future proof in mind. The interface changes are by myself and haven't been discussed here. Excerpts from Augie Fackler's message of 2017-03-23 18:05:45 -0400: > On Mon, Mar 13, 2017 at 11:57:08AM -0700, Pierre-Yves David wrote: > > > > > > On 03/13/2017 09:23 AM, Durham Goode wrote: > > >On 3/13/17 2:48 AM, Jun Wu wrote: > > >># HG changeset patch > > >># User Jun Wu <quark@fb.com> > > >># Date 1489395002 25200 > > >># Mon Mar 13 01:50:02 2017 -0700 > > >># Node ID 6ae6d1069ba1d4089afaeb0bb8ef2411983a1292 > > >># Parent 0280ee091bd0ae33aa0a67b0c8a55ccffd2e0718 > > >># Available At > > >>https://bitbucket.org/quark-zju/hg-draft > > >> > > >># hg pull > > >>https://bitbucket.org/quark-zju/hg-draft > > >>-r 6ae6d1069ba1 > > >>obsolete: allow cycles > > >> > > >>Now we can handle cycles nicely, allow them to be created. Some practical > > >>examples: > > >> > > >> - To revive X, just create a marker X -> X, with a newer date. > > >> - To prune X again, just create a marker X -> (), with a newer date. > > >> - The above two could be repeated. > > >> > > >> - To unamend A -> B, just create a marker B -> A, with a newer date. > > >> > > >>It's now possible for "touch" and "unamend" to reuse hashes (therefore > > >>more > > >>user-friendly). And it's no longer necessary to write "*_source" in > > >>commit > > >>metadata to workarounds obs cycles. The hacky inhibit extension also > > >>becomes > > >>unnecessary. > > >> > > >>Finally. I have been wanting all these for a long time. > > > > > >Seems pretty elegant, though I haven't fully understood it yet. Maybe > > >you guys talked about this in person, but what effect (if any) does this > > >have on exchange? > > > > We did not really discuss this at the sprint :-/ > > > > This probably has effect on two aspects of pull and push: > > > > * Computation of the relevant markers for a None > > > > * Computation of "head replacement" when pushing branches obsoleting another > > one. > > > > The proposal have some very interesting aspect, I'll try to do a deep review > > of its impact on the general concept soon™ (probably not this week) > > I've had a look over the code, and it's surprisingly simple (though it > could probably use more comments for future code archaeologists.) I'm > wary (as is indygreg) of using the date field for this sorting, just > because clocks are known sources of evil and consternation. > > (I acknowledge we might not have a significantly better choice though, > although I'd like to give it some more thought.) > > > > > Cheers, > > > > -- > > Pierre-Yves David > > _______________________________________________ > > Mercurial-devel mailing list > > Mercurial-devel@mercurial-scm.org > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On Thu, Mar 30, 2017 at 12:40 PM, Jun Wu <quark@fb.com> wrote: > I'll drop it from patchwork and send a new version with better documentation > and some planned fixes. The next version has a same core idea, but the > interface is more formalized and is made future proof in mind. The interface > changes are by myself and haven't been discussed here. Actually, based on a quick discussion I had with Durham, can you check with him before sending another volley? It'll help the reviewers if we can keep the patch traffic on this topic at a lower rate until we actually get some better understanding of where we want/need to go architecturally. Thanks!
Excerpts from Augie Fackler's message of 2017-03-30 12:43:31 -0400: > On Thu, Mar 30, 2017 at 12:40 PM, Jun Wu <quark@fb.com> wrote: > > I'll drop it from patchwork and send a new version with better documentation > > and some planned fixes. The next version has a same core idea, but the > > interface is more formalized and is made future proof in mind. The interface > > changes are by myself and haven't been discussed here. > > > Actually, based on a quick discussion I had with Durham, can you check > with him before sending another volley? It'll help the reviewers if we > can keep the patch traffic on this topic at a lower rate until we > actually get some better understanding of where we want/need to go > architecturally. > > Thanks! I'll chat with him today. Thanks! By the way, for series about histedit - I agree that marmoute was right, and rolling back is the right choice for now. So everything about histedit should be considered as "settled for now". A better solution about histedit *depends on* the "node versions" approach. So I may restart the histedit change if we have "node versions". The most interesting discussion to me is so-called "obscycles" (aka this series), but it's more precisely "node versions" as the first patch title suggests. "cycles" are just a subset of problems it solves <- this paragraph is for marmoute who might think I don't even understand my own patches.
Patch
diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py --- a/mercurial/obsolete.py +++ b/mercurial/obsolete.py @@ -638,6 +638,4 @@ class obsstore(object): if len(succ) != 20: raise ValueError(succ) - if prec in succs: - raise ValueError(_('in-marker cycle with %s') % node.hex(prec)) metadata = tuple(sorted(metadata.iteritems())) @@ -1296,7 +1294,4 @@ def createmarkers(repo, relations, flag= if not nsucs: npare = tuple(p.node() for p in prec.parents()) - if nprec in nsucs: - raise error.Abort(_("changeset %s cannot obsolete itself") - % prec) # Creating the marker causes the hidden cache to become invalid, diff --git a/tests/test-obsolete-cycle.t b/tests/test-obsolete-cycle.t new file mode 100644 --- /dev/null +++ b/tests/test-obsolete-cycle.t @@ -0,0 +1,153 @@ + $ cat >> $HGRCPATH << EOF + > [ui] + > logtemplate="{rev} {desc}\n" + > [phases] + > publish=false + > [experimental] + > evolution=createmarkers + > [extensions] + > drawdag=$TESTDIR/drawdag.py + > EOF + + $ getid() { + > hg log -T "{node}\n" --hidden -r "desc('$1')" + > } + $ assignnames() { + > for i in "$@"; do + > eval $i=`getid $i` + > # remove local tags created by drawdag + > hg tag --remove -l $i + > done + > } + +Cycle of 2 changesets + + $ hg init repo1 + $ cd repo1 + + $ hg debugdrawdag << EOF + > B C + > |/ + > A + > EOF + + $ assignnames A B C + + $ hg debugobsolete -d '0 0' $B $C + $ hg log -G + o 2 C + | + o 0 A + + + $ hg debugobsolete -d '1 0' $C $B + $ hg log -G + o 1 B + | + o 0 A + + + $ hg debugobsolete -d '2 0' $B $C + $ hg log -G + o 2 C + | + o 0 A + + + $ hg debugobsolete -d '3 0' $B $B + $ hg log -G + o 2 C + | + | o 1 B + |/ + o 0 A + + + $ hg debugobsolete -d '4 0' $B + $ hg debugobsolete -d '6 0' $C + $ hg log -G + o 0 A + + + $ hg debugobsolete -d '7 0' $B $B + $ hg debugobsolete -d '5 0' $C $C + $ hg log -G + o 1 B + | + o 0 A + + + $ hg debugobsolete -d '8 0' $C $C + $ hg log -G + o 2 C + | + | o 1 B + |/ + o 0 A + + + $ cd .. + +Cycle of 3 changesets + $ hg init repo2 + $ cd repo2 + + $ hg debugdrawdag << EOF + > D B C + > \|/ + > A + > EOF + + $ assignnames A B C D + + $ hg debugobsolete -d '2 0' $C $D + $ hg debugobsolete -d '3 0' $D $C + $ hg debugobsolete -d '1 0' $B $C + + $ hg log -G + o 2 C + | + o 0 A + + + $ hg debugobsolete -d '4 0' $D $D + $ hg log -G + o 3 D + | + | o 2 C + |/ + o 0 A + + + $ hg debugobsolete -d '5 0' $B $B + $ hg log -G + o 3 D + | + | o 2 C + |/ + | o 1 B + |/ + o 0 A + + + $ hg debugobsolete -d '6 0' $D $B + $ hg log -G + o 2 C + | + | o 1 B + |/ + o 0 A + + $ hg debugobsolete -d '9 0' $C + $ hg log -G + o 1 B + | + o 0 A + + + $ hg debugobsolete -d '8 0' $B $C + $ hg log -G + o 0 A + + + $ cd .. diff --git a/tests/test-obsolete.t b/tests/test-obsolete.t --- a/tests/test-obsolete.t +++ b/tests/test-obsolete.t @@ -62,11 +62,4 @@ Killing a single changeset without repla $ hg up --hidden tip --quiet -Killing a single changeset with itself should fail -(simple local safeguard) - - $ hg debugobsolete `getid kill_me` `getid kill_me` - abort: bad obsmarker input: in-marker cycle with 97b7c2d76b1845ed3eb988cd612611e72406cef0 - [255] - $ cd ..