Patchwork [1,of,2] rebase: add config to move rebase into a single transaction

login
register
mail settings
Submitter Durham Goode
Date July 11, 2017, 8:54 p.m.
Message ID <46b1d80be3c6741ba69f.1499806443@dev111.prn1.facebook.com>
Download mbox | patch
Permalink /patch/22239/
State Changes Requested
Headers show

Comments

Durham Goode - July 11, 2017, 8:54 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1499805985 25200
#      Tue Jul 11 13:46:25 2017 -0700
# Node ID 46b1d80be3c6741ba69f5029a8a4315151552293
# Parent  062c1bde178118b7c4d5abb1ead16ac8ad494280
rebase: add config to move rebase into a single transaction

This was previously landed as cf8ad0e6c0e4 but backed out in a5abaa81fa because
it broke hook mid rebase and caused conflict resolution data loss in the event
of unexpected exceptions. This new version adds the behavior back but behind a
config flag, since the performance improvement is notable in large repositories.

The next patch adds a test covering this config.

The old commit message was:

Previously, rebasing would open several transaction over the course of rebasing
several commits. Opening a transaction can have notable overhead (like copying
the dirstate) which can add up when rebasing many commits.

This patch adds a single large transaction around the actual commit rebase
operation, with a catch for intervention which serializes the current state if
we need to drop back to the terminal for user intervention. Amazingly, almost
all the tests seem to pass.

On large repos with large working copies, this can speed up rebasing 7 commits
by 25%. I'd expect the percentage to be a bit larger for rebasing even more
commits.

There are minor test changes because we're rolling back the entire transaction
during unexpected exceptions instead of just stopping mid-rebase, so there's no
more backup bundle. It also leave an unknown file in the working copy, since our
clean up 'hg update' doesn't delete unknown files.
via Mercurial-devel - July 11, 2017, 9:20 p.m.
On Tue, Jul 11, 2017 at 1:54 PM, Durham Goode <durham@fb.com> wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1499805985 25200
> #      Tue Jul 11 13:46:25 2017 -0700
> # Node ID 46b1d80be3c6741ba69f5029a8a4315151552293
> # Parent  062c1bde178118b7c4d5abb1ead16ac8ad494280
> rebase: add config to move rebase into a single transaction
>
> This was previously landed as cf8ad0e6c0e4 but backed out in a5abaa81fa because
> it broke hook mid rebase and caused conflict resolution data loss in the event
> of unexpected exceptions. This new version adds the behavior back but behind a
> config flag, since the performance improvement is notable in large repositories.
>
> The next patch adds a test covering this config.
>
> The old commit message was:
>
> Previously, rebasing would open several transaction over the course of rebasing
> several commits. Opening a transaction can have notable overhead (like copying
> the dirstate) which can add up when rebasing many commits.
>
> This patch adds a single large transaction around the actual commit rebase
> operation, with a catch for intervention which serializes the current state if
> we need to drop back to the terminal for user intervention. Amazingly, almost
> all the tests seem to pass.
>
> On large repos with large working copies, this can speed up rebasing 7 commits
> by 25%. I'd expect the percentage to be a bit larger for rebasing even more
> commits.
>
> There are minor test changes because we're rolling back the entire transaction
> during unexpected exceptions instead of just stopping mid-rebase, so there's no
> more backup bundle. It also leave an unknown file in the working copy, since our
> clean up 'hg update' doesn't delete unknown files.
>
> diff --git a/hgext/rebase.py b/hgext/rebase.py
> --- a/hgext/rebase.py
> +++ b/hgext/rebase.py
> @@ -343,7 +343,7 @@ class rebaseruntime(object):
>          if dest.closesbranch() and not self.keepbranchesf:
>              self.ui.status(_('reopening closed branch head %s\n') % dest)
>
> -    def _performrebase(self):
> +    def _performrebase(self, tr):
>          repo, ui, opts = self.repo, self.ui, self.opts
>          if self.keepbranchesf:
>              # insert _savebranch at the start of extrafns so if
> @@ -394,7 +394,7 @@ class rebaseruntime(object):
>                                               self.state,
>                                               self.destancestors,
>                                               self.obsoletenotrebased)
> -                self.storestatus()
> +                self.storestatus(tr=tr)
>                  storecollapsemsg(repo, self.collapsemsg)
>                  if len(repo[None].parents()) == 2:
>                      repo.ui.debug('resuming interrupted rebase\n')
> @@ -641,6 +641,15 @@ def rebase(ui, repo, **opts):
>        [commands]
>        rebase.requiredest = True
>
> +    By default, rebase will close the transaction after each commit. For
> +    performance purposes, you can configure rebase to use a single transaction
> +    across the entire rebase. WARNING: This setting introduces a significant
> +    risk of losing the work you've done in a rebase if the rebase aborts
> +    unexpectedly::
> +
> +      [rebase]
> +      singletransaction = True
> +
>      Return Values:
>
>      Returns 0 on success, 1 if nothing to rebase or there are
> @@ -700,7 +709,21 @@ def rebase(ui, repo, **opts):
>              if retcode is not None:
>                  return retcode
>
> -        rbsrt._performrebase()
> +        tr = None
> +        try:
> +            if ui.configbool('rebase', 'singletransaction'):
> +                tr = repo.transaction('rebase')
> +            rbsrt._performrebase(tr)
> +            if tr:
> +                tr.close()
> +        except error.InterventionRequired:
> +            if tr:
> +                tr.close()
> +            raise
> +        finally:
> +            if tr:
> +                tr.release()
> +

Looks like you can move "if tr: tr.close()" to the finally-block?
Might be even clearer as (untested):

# This could be in util.py
@contextlib.contextmanager
def nullcontextmanager():
    yield

tr = nullcontextmanager()
if ui.configbool('rebase', 'singletransaction'):
    tr = repo.transaction('rebase')
with tr:
    try:
        rbsrt._performrebase(tr)
    except error.InterventionRequired:
           raise



>          rbsrt._finishrebase()
>
>  def _definesets(ui, repo, destf=None, srcf=None, basef=None, revf=None,
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Durham Goode - July 11, 2017, 9:41 p.m.
On 7/11/17 2:20 PM, Martin von Zweigbergk wrote:
> On Tue, Jul 11, 2017 at 1:54 PM, Durham Goode <durham@fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1499805985 25200
>> #      Tue Jul 11 13:46:25 2017 -0700
>> # Node ID 46b1d80be3c6741ba69f5029a8a4315151552293
>> # Parent  062c1bde178118b7c4d5abb1ead16ac8ad494280
>> rebase: add config to move rebase into a single transaction
>>
>> This was previously landed as cf8ad0e6c0e4 but backed out in a5abaa81fa because
>> it broke hook mid rebase and caused conflict resolution data loss in the event
>> of unexpected exceptions. This new version adds the behavior back but behind a
>> config flag, since the performance improvement is notable in large repositories.
>>
>> The next patch adds a test covering this config.
>>
>> The old commit message was:
>>
>> Previously, rebasing would open several transaction over the course of rebasing
>> several commits. Opening a transaction can have notable overhead (like copying
>> the dirstate) which can add up when rebasing many commits.
>>
>> This patch adds a single large transaction around the actual commit rebase
>> operation, with a catch for intervention which serializes the current state if
>> we need to drop back to the terminal for user intervention. Amazingly, almost
>> all the tests seem to pass.
>>
>> On large repos with large working copies, this can speed up rebasing 7 commits
>> by 25%. I'd expect the percentage to be a bit larger for rebasing even more
>> commits.
>>
>> There are minor test changes because we're rolling back the entire transaction
>> during unexpected exceptions instead of just stopping mid-rebase, so there's no
>> more backup bundle. It also leave an unknown file in the working copy, since our
>> clean up 'hg update' doesn't delete unknown files.
>>
>> diff --git a/hgext/rebase.py b/hgext/rebase.py
>> --- a/hgext/rebase.py
>> +++ b/hgext/rebase.py
>> @@ -343,7 +343,7 @@ class rebaseruntime(object):
>>          if dest.closesbranch() and not self.keepbranchesf:
>>              self.ui.status(_('reopening closed branch head %s\n') % dest)
>>
>> -    def _performrebase(self):
>> +    def _performrebase(self, tr):
>>          repo, ui, opts = self.repo, self.ui, self.opts
>>          if self.keepbranchesf:
>>              # insert _savebranch at the start of extrafns so if
>> @@ -394,7 +394,7 @@ class rebaseruntime(object):
>>                                               self.state,
>>                                               self.destancestors,
>>                                               self.obsoletenotrebased)
>> -                self.storestatus()
>> +                self.storestatus(tr=tr)
>>                  storecollapsemsg(repo, self.collapsemsg)
>>                  if len(repo[None].parents()) == 2:
>>                      repo.ui.debug('resuming interrupted rebase\n')
>> @@ -641,6 +641,15 @@ def rebase(ui, repo, **opts):
>>        [commands]
>>        rebase.requiredest = True
>>
>> +    By default, rebase will close the transaction after each commit. For
>> +    performance purposes, you can configure rebase to use a single transaction
>> +    across the entire rebase. WARNING: This setting introduces a significant
>> +    risk of losing the work you've done in a rebase if the rebase aborts
>> +    unexpectedly::
>> +
>> +      [rebase]
>> +      singletransaction = True
>> +
>>      Return Values:
>>
>>      Returns 0 on success, 1 if nothing to rebase or there are
>> @@ -700,7 +709,21 @@ def rebase(ui, repo, **opts):
>>              if retcode is not None:
>>                  return retcode
>>
>> -        rbsrt._performrebase()
>> +        tr = None
>> +        try:
>> +            if ui.configbool('rebase', 'singletransaction'):
>> +                tr = repo.transaction('rebase')
>> +            rbsrt._performrebase(tr)
>> +            if tr:
>> +                tr.close()
>> +        except error.InterventionRequired:
>> +            if tr:
>> +                tr.close()
>> +            raise
>> +        finally:
>> +            if tr:
>> +                tr.release()
>> +
>
> Looks like you can move "if tr: tr.close()" to the finally-block?

We can't run close in the finally because if we're in the middle of an 
unhandled exception we don't know if it's safe to commit the transaction.

> Might be even clearer as (untested):
>
> # This could be in util.py
> @contextlib.contextmanager
> def nullcontextmanager():
>     yield

Hmm, that would be a convenient pattern.  I'll make that change.

> tr = nullcontextmanager()
> if ui.configbool('rebase', 'singletransaction'):
>     tr = repo.transaction('rebase')
> with tr:
>     try:
>         rbsrt._performrebase(tr)
>     except error.InterventionRequired:
>            raise
>
>
>
>>          rbsrt._finishrebase()
>>
>>  def _definesets(ui, repo, destf=None, srcf=None, basef=None, revf=None,
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=QZIPEbDD6weNDVQQGJeE-0IGuMoAwgrzj9nAJY5kjZI&s=-zqkTyghrBhFsmCYd75gqTg5TpxJVoOBzuJoi1CfPkE&e=

Patch

diff --git a/hgext/rebase.py b/hgext/rebase.py
--- a/hgext/rebase.py
+++ b/hgext/rebase.py
@@ -343,7 +343,7 @@  class rebaseruntime(object):
         if dest.closesbranch() and not self.keepbranchesf:
             self.ui.status(_('reopening closed branch head %s\n') % dest)
 
-    def _performrebase(self):
+    def _performrebase(self, tr):
         repo, ui, opts = self.repo, self.ui, self.opts
         if self.keepbranchesf:
             # insert _savebranch at the start of extrafns so if
@@ -394,7 +394,7 @@  class rebaseruntime(object):
                                              self.state,
                                              self.destancestors,
                                              self.obsoletenotrebased)
-                self.storestatus()
+                self.storestatus(tr=tr)
                 storecollapsemsg(repo, self.collapsemsg)
                 if len(repo[None].parents()) == 2:
                     repo.ui.debug('resuming interrupted rebase\n')
@@ -641,6 +641,15 @@  def rebase(ui, repo, **opts):
       [commands]
       rebase.requiredest = True
 
+    By default, rebase will close the transaction after each commit. For
+    performance purposes, you can configure rebase to use a single transaction
+    across the entire rebase. WARNING: This setting introduces a significant
+    risk of losing the work you've done in a rebase if the rebase aborts
+    unexpectedly::
+
+      [rebase]
+      singletransaction = True
+
     Return Values:
 
     Returns 0 on success, 1 if nothing to rebase or there are
@@ -700,7 +709,21 @@  def rebase(ui, repo, **opts):
             if retcode is not None:
                 return retcode
 
-        rbsrt._performrebase()
+        tr = None
+        try:
+            if ui.configbool('rebase', 'singletransaction'):
+                tr = repo.transaction('rebase')
+            rbsrt._performrebase(tr)
+            if tr:
+                tr.close()
+        except error.InterventionRequired:
+            if tr:
+                tr.close()
+            raise
+        finally:
+            if tr:
+                tr.release()
+
         rbsrt._finishrebase()
 
 def _definesets(ui, repo, destf=None, srcf=None, basef=None, revf=None,