Patchwork rebase: move actual rebase into a single transaction

login
register
mail settings
Submitter Durham Goode
Date March 4, 2017, 10:08 p.m.
Message ID <9c3ea2112952f398aaa7.1488665306@dev111.prn1.facebook.com>
Download mbox | patch
Permalink /patch/18909/
State Superseded
Delegated to: Augie Fackler
Headers show

Comments

Durham Goode - March 4, 2017, 10:08 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1488665211 28800
#      Sat Mar 04 14:06:51 2017 -0800
# Node ID 9c3ea2112952f398aaa7625f43dcfa36cfd34379
# Parent  b4cd912d7704cd976e1bee3a3c927e0e578ec88f
rebase: move actual rebase into a single transaction

Previously, rebasing would open several transaction over the course of rebasing
several commits. Opening a transaction can have notable overhead (like copying
the dirstate) which can add up when rebasing many commits.

This patch adds a single large transaction around the actual commit rebase
operation, with a catch for intervention which serializes the current state if
we need to drop back to the terminal for user intervention. Amazingly, almost
all the tests seem to pass.

On large repos with large working copies, this can speed up rebasing 7 commits
by 25%. I'd expect the percentage to be a bit larger for rebasing even more
commits.
Durham Goode - March 4, 2017, 10:11 p.m.
On 3/4/17 2:08 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1488665211 28800
> #      Sat Mar 04 14:06:51 2017 -0800
> # Node ID 9c3ea2112952f398aaa7625f43dcfa36cfd34379
> # Parent  b4cd912d7704cd976e1bee3a3c927e0e578ec88f
> rebase: move actual rebase into a single transaction
>
> Previously, rebasing would open several transaction over the course of rebasing
> several commits. Opening a transaction can have notable overhead (like copying
> the dirstate) which can add up when rebasing many commits.
>
> This patch adds a single large transaction around the actual commit rebase
> operation, with a catch for intervention which serializes the current state if
> we need to drop back to the terminal for user intervention. Amazingly, almost
> all the tests seem to pass.
>
> On large repos with large working copies, this can speed up rebasing 7 commits
> by 25%. I'd expect the percentage to be a bit larger for rebasing even more
> commits.

This will also help prep us for inmemory rebases, since the concept of 
serializing mid-rebase for user interaction could apply in that case too.
Durham Goode - March 4, 2017, 11 p.m.
On 3/4/17 2:11 PM, Durham Goode wrote:
>
>
> On 3/4/17 2:08 PM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1488665211 28800
>> #      Sat Mar 04 14:06:51 2017 -0800
>> # Node ID 9c3ea2112952f398aaa7625f43dcfa36cfd34379
>> # Parent  b4cd912d7704cd976e1bee3a3c927e0e578ec88f
>> rebase: move actual rebase into a single transaction
>>
>> Previously, rebasing would open several transaction over the course of
>> rebasing
>> several commits. Opening a transaction can have notable overhead (like
>> copying
>> the dirstate) which can add up when rebasing many commits.
>>
>> This patch adds a single large transaction around the actual commit
>> rebase
>> operation, with a catch for intervention which serializes the current
>> state if
>> we need to drop back to the terminal for user intervention. Amazingly,
>> almost
>> all the tests seem to pass.
>>
>> On large repos with large working copies, this can speed up rebasing 7
>> commits
>> by 25%. I'd expect the percentage to be a bit larger for rebasing even
>> more
>> commits.
>
> This will also help prep us for inmemory rebases, since the concept of
> serializing mid-rebase for user interaction could apply in that case too.

A similar patch for histedit speeds up a 7 commit histedit in a large 
repo by over 35%. I'll wait for this patch to receive comments before 
sending out the histedit one, since I assume the comments will be very 
similar.
Augie Fackler - March 7, 2017, 6:08 p.m.
On Sat, Mar 04, 2017 at 02:08:26PM -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1488665211 28800
> #      Sat Mar 04 14:06:51 2017 -0800
> # Node ID 9c3ea2112952f398aaa7625f43dcfa36cfd34379
> # Parent  b4cd912d7704cd976e1bee3a3c927e0e578ec88f
> rebase: move actual rebase into a single transaction
>
> Previously, rebasing would open several transaction over the course of rebasing
> several commits. Opening a transaction can have notable overhead (like copying
> the dirstate) which can add up when rebasing many commits.
>
> This patch adds a single large transaction around the actual commit rebase
> operation, with a catch for intervention which serializes the current state if
> we need to drop back to the terminal for user intervention. Amazingly, almost
> all the tests seem to pass.
>
> On large repos with large working copies, this can speed up rebasing 7 commits
> by 25%. I'd expect the percentage to be a bit larger for rebasing even more
> commits.

I like it, seems like the correct thing to do in any case. However...

[...]

> diff --git a/tests/test-rebase-abort.t b/tests/test-rebase-abort.t
> --- a/tests/test-rebase-abort.t
> +++ b/tests/test-rebase-abort.t
>
[...]

> @@ -398,7 +399,7 @@ test aborting an interrupted series (iss
>    parent: 0:df4f53cec30a
>     base
>    branch: default
> -  commit: (clean)
> +  commit: 1 unknown (interrupted update)
>    update: 6 new changesets (update)
>    phases: 7 draft

What's up with this? Shouldn't rebase --abort put us back in a clean state?

>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Durham Goode - March 7, 2017, 9:16 p.m.
On 3/7/17 10:08 AM, Augie Fackler wrote:
> On Sat, Mar 04, 2017 at 02:08:26PM -0800, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1488665211 28800
>> #      Sat Mar 04 14:06:51 2017 -0800
>> # Node ID 9c3ea2112952f398aaa7625f43dcfa36cfd34379
>> # Parent  b4cd912d7704cd976e1bee3a3c927e0e578ec88f
>> rebase: move actual rebase into a single transaction
>>
>> Previously, rebasing would open several transaction over the course of rebasing
>> several commits. Opening a transaction can have notable overhead (like copying
>> the dirstate) which can add up when rebasing many commits.
>>
>> This patch adds a single large transaction around the actual commit rebase
>> operation, with a catch for intervention which serializes the current state if
>> we need to drop back to the terminal for user intervention. Amazingly, almost
>> all the tests seem to pass.
>>
>> On large repos with large working copies, this can speed up rebasing 7 commits
>> by 25%. I'd expect the percentage to be a bit larger for rebasing even more
>> commits.
>
> I like it, seems like the correct thing to do in any case. However...
>
> [...]
>
>> diff --git a/tests/test-rebase-abort.t b/tests/test-rebase-abort.t
>> --- a/tests/test-rebase-abort.t
>> +++ b/tests/test-rebase-abort.t
>>
> [...]
>
>> @@ -398,7 +399,7 @@ test aborting an interrupted series (iss
>>    parent: 0:df4f53cec30a
>>     base
>>    branch: default
>> -  commit: (clean)
>> +  commit: 1 unknown (interrupted update)
>>    update: 6 new changesets (update)
>>    phases: 7 draft
>
> What's up with this? Shouldn't rebase --abort put us back in a clean state?

This is actually a bug in normal rebase.  If the rebase aborts while the 
working copy is on the commit that it started on, then it doesn't need 
to perform any update during rebase --abort, which means it doesn't 
clean up the .hg/updatestate file. The single transaction makes this 
more obvious since when the whole transaction fails, it dumps everything 
back to the way things were (i.e. you're back on the initial commit), 
but leaves the working copy dirty.  I guess we could fix this by also 
checking if status is clean when deciding whether to perform an update 
during rebase --abort.  Or maybe we just run the update every time, 
since it's extremely likely to be needed.

This raises another issue though, that the rebase state file is not 
managed by the transaction, so it is being written mid transaction and 
doesn't get rolled back appropriately.  I'll look into fixing both of 
these and resending.
Jun Wu - March 7, 2017, 9:51 p.m.
Excerpts from Durham Goode's message of 2017-03-07 13:16:29 -0800:
> 
> On 3/7/17 10:08 AM, Augie Fackler wrote:
> > On Sat, Mar 04, 2017 at 02:08:26PM -0800, Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1488665211 28800
> >> #      Sat Mar 04 14:06:51 2017 -0800
> >> # Node ID 9c3ea2112952f398aaa7625f43dcfa36cfd34379
> >> # Parent  b4cd912d7704cd976e1bee3a3c927e0e578ec88f
> >> rebase: move actual rebase into a single transaction
> >>
> >> Previously, rebasing would open several transaction over the course of rebasing
> >> several commits. Opening a transaction can have notable overhead (like copying
> >> the dirstate) which can add up when rebasing many commits.
> >>
> >> This patch adds a single large transaction around the actual commit rebase
> >> operation, with a catch for intervention which serializes the current state if
> >> we need to drop back to the terminal for user intervention. Amazingly, almost
> >> all the tests seem to pass.
> >>
> >> On large repos with large working copies, this can speed up rebasing 7 commits
> >> by 25%. I'd expect the percentage to be a bit larger for rebasing even more
> >> commits.
> >
> > I like it, seems like the correct thing to do in any case. However...
> >
> > [...]
> >
> >> diff --git a/tests/test-rebase-abort.t b/tests/test-rebase-abort.t
> >> --- a/tests/test-rebase-abort.t
> >> +++ b/tests/test-rebase-abort.t
> >>
> > [...]
> >
> >> @@ -398,7 +399,7 @@ test aborting an interrupted series (iss
> >>    parent: 0:df4f53cec30a
> >>     base
> >>    branch: default
> >> -  commit: (clean)
> >> +  commit: 1 unknown (interrupted update)
> >>    update: 6 new changesets (update)
> >>    phases: 7 draft
> >
> > What's up with this? Shouldn't rebase --abort put us back in a clean state?
> 
> This is actually a bug in normal rebase.  If the rebase aborts while the 
> working copy is on the commit that it started on, then it doesn't need 
> to perform any update during rebase --abort, which means it doesn't 
> clean up the .hg/updatestate file. The single transaction makes this 
> more obvious since when the whole transaction fails, it dumps everything 
> back to the way things were (i.e. you're back on the initial commit), 
> but leaves the working copy dirty.  I guess we could fix this by also 
> checking if status is clean when deciding whether to perform an update 
> during rebase --abort.  Or maybe we just run the update every time, 
> since it's extremely likely to be needed.
> 
> This raises another issue though, that the rebase state file is not 
> managed by the transaction, so it is being written mid transaction and 
> doesn't get rolled back appropriately.  I'll look into fixing both of 
> these and resending.

I think we either just update so the behavior will "look like" what it was
before, or mark this as a "BC".
Jun Wu - March 7, 2017, 9:52 p.m.
Excerpts from Jun Wu's message of 2017-03-07 13:51:17 -0800:
> I think we either just update so the behavior will "look like" what it was
> before, or mark this as a "BC".

After a second thought, it does not seem that we could keep the old
behavior. So it's probably a BC either way.

Patch

diff --git a/hgext/rebase.py b/hgext/rebase.py
--- a/hgext/rebase.py
+++ b/hgext/rebase.py
@@ -679,7 +679,12 @@  def rebase(ui, repo, **opts):
             if retcode is not None:
                 return retcode
 
-        rbsrt._performrebase()
+        with repo.transaction('rebase') as tr:
+            try:
+                rbsrt._performrebase()
+            except error.InterventionRequired:
+                tr.close()
+                raise
         rbsrt._finishrebase()
     finally:
         release(lock, wlock)
diff --git a/tests/test-rebase-abort.t b/tests/test-rebase-abort.t
--- a/tests/test-rebase-abort.t
+++ b/tests/test-rebase-abort.t
@@ -374,11 +374,12 @@  test aborting an interrupted series (iss
   $ hg --config extensions.n=$TESTDIR/failfilemerge.py rebase -s 3 -d tip
   rebasing 3:3a71550954f1 "b"
   rebasing 4:e80b69427d80 "c"
+  transaction abort!
+  rollback completed
   abort: ^C
   [255]
   $ hg rebase --abort
-  saved backup bundle to $TESTTMP/interrupted/.hg/strip-backup/3d8812cf300d-93041a90-backup.hg (glob)
-  rebase aborted
+  rebase aborted (no revision is removed, only broken state is cleared)
   $ hg log -G --template "{rev} {desc} {bookmarks}"
   o  6 no-a
   |
@@ -398,7 +399,7 @@  test aborting an interrupted series (iss
   parent: 0:df4f53cec30a 
    base
   branch: default
-  commit: (clean)
+  commit: 1 unknown (interrupted update)
   update: 6 new changesets (update)
   phases: 7 draft