Patchwork D1621: transaction: split changes['phases'] into sets for src and target phase

login
register
mail settings
Submitter phabricator
Date Dec. 9, 2017, 12:06 a.m.
Message ID <differential-rev-PHID-DREV-xx3ob6jsfeid6hxzy3c7-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/26131/
State New
Headers show

Comments

phabricator - Dec. 9, 2017, 12:06 a.m.
joerg.sonnenberger created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  changes['phases'] used to be a dictionary mapping revisions to
  (old, new) tuples. The encoding is highly redundant and eats ~40MB
  for the test case in issue5691. Recognize that for new, only the three
  phases are valid and record the state using three sets. For old, keep
  three sets for the phases as well and encode unknown as lack of
  membership in the three sets. Provide a helper function to enumerate the
  changes.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

AFFECTED FILES
  mercurial/localrepo.py
  mercurial/phases.py
  tests/testlib/ext-phase-report.py

CHANGE DETAILS




To: joerg.sonnenberger, #hg-reviewers
Cc: mercurial-devel
phabricator - Dec. 9, 2017, 12:10 a.m.
joerg.sonnenberger added a comment.


  This is a Proof-of-Concept and with a follow-up in https://phab.mercurial-scm.org/D1622 to use a C implementation for the bitset.
  
  Original code: 485s / 584MB
  Set version: 501s / 582MB
  intbitset: 478s / 563MB

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers
Cc: mercurial-devel
phabricator - Dec. 9, 2017, 12:23 a.m.
quark added inline comments.

INLINE COMMENTS

> localrepo.py:1243
>                                **pycompat.strkwargs(args))
>              if hook.hashook(repo.ui, 'pretxnclose-phase'):
>                  cl = repo.unfiltered().changelog

Since this hook is the only user of `tr.changes['phases']`. I'd suggest disabling calculating `tr.changes['phases']` if the hook does not exist.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers
Cc: quark, mercurial-devel
phabricator - Dec. 9, 2017, 12:25 a.m.
quark accepted this revision.
quark added a comment.


  That said, I think the usage of list of sets is smart and this change looks good to me.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: quark, mercurial-devel
phabricator - Dec. 9, 2017, 1:11 p.m.
yuja added a comment.


  > Original code: 485s / 584MB
  >  Set version: 501s / 582MB
  >  intbitset: 478s / 563MB
  
  I'm not sure how to read this. Do we only get -2/584MB better space
  consumption against the original simplest implementation?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: yuja, quark, mercurial-devel
phabricator - Dec. 10, 2017, 9:13 p.m.
joerg.sonnenberger added a comment.


  This version should be committable. It introduces the necessary API for isolating further changes and gives a small improvement already. I'll be looking at provider a memory and time efficient sparce vector next, that's where the real benefits will be.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: yuja, quark, mercurial-devel
phabricator - Dec. 11, 2017, 2:25 p.m.
yuja added a comment.


  In https://phab.mercurial-scm.org/D1621#28285, @joerg.sonnenberger wrote:
  
  > This version should be committable. It introduces the necessary API for isolating further changes and gives a small improvement already. I'll be looking at provider a memory and time efficient sparce vector next, that's where the real benefits will be.
  
  
  I think it's better to hold this change off until we can get a real performance or
  memory-space win. The current number, "peak RSS by 3MB", won't buy much
  compared to the API complexity this patch will introduce.
  
  Regarding the API, I think `tr.changes[key]` is the source of the problem. It allows
  free read/write access to arbitrary data, which is hard to gradually improve the
  underlying data structure or algorithm.
  
  Noob quesiton. Can't we completely switch of `tr.changes['phases']` when it's
  unnecessary?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: yuja, quark, mercurial-devel
phabricator - Dec. 11, 2017, 6:05 p.m.
joerg.sonnenberger added a comment.


  The bitset version has shown already that optimizing this is worthwhile and can eliminate up to 10% of the total size of the transaction object.
  
  The change as is primarily gets the interfaces in place, so that outside code can be adjusted.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: yuja, quark, mercurial-devel
phabricator - Dec. 12, 2017, 1:21 p.m.
yuja added a comment.


  In https://phab.mercurial-scm.org/D1621#28387, @joerg.sonnenberger wrote:
  
  > The bitset version has shown already that optimizing this is worthwhile and can eliminate up to 10% of the total size of the transaction object.
  
  
  IIRC, Jun said a plain intbitset wouldn't be optimal for all scenarios, and I agree
  with that. Also, it isn't a standard Python library.
  
  I think the current way of recording phase change is too ad-hoc if it does iterate
  all incoming nodes to build `(old, new)` phases pairs. Instead, can't we reconstruct
  them from DAG + old/new phase roots, for example?
  
  If the current `tr.changes['phases']` is only used by hooks and third-party
  extensions, it can be opt-in feature.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: yuja, quark, mercurial-devel
phabricator - Dec. 16, 2017, 6:01 a.m.
quark resigned from this revision.
quark added a comment.


  When thinking about it again, I believe providing a way to disable `tr.chanages['phases']` is a better way to solve this issue. If we cannot disable that automatically from what hooks are used, we can still provide a config option to disable that.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1621

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: yuja, quark, mercurial-devel

Patch

diff --git a/tests/testlib/ext-phase-report.py b/tests/testlib/ext-phase-report.py
--- a/tests/testlib/ext-phase-report.py
+++ b/tests/testlib/ext-phase-report.py
@@ -1,11 +1,12 @@ 
 # tiny extension to report phase changes during transaction
 
 from __future__ import absolute_import
+from mercurial import phases
 
 def reposetup(ui, repo):
 
     def reportphasemove(tr):
-        for rev, move in sorted(tr.changes['phases'].iteritems()):
+        for rev, move in sorted(phases.phasechanges(tr.changes['phases'])):
             if move[0] is None:
                 ui.write(('test-debug-phase: new rev %d:  x -> %d\n'
                           % (rev, move[1])))
diff --git a/mercurial/phases.py b/mercurial/phases.py
--- a/mercurial/phases.py
+++ b/mercurial/phases.py
@@ -186,17 +186,38 @@ 
         headsbyphase[phase].append(node)
     return headsbyphase
 
+def phasechanges(data):
+    offset = len(allphases)
+    for tgtphase in allphases:
+        for rev in data[tgtphase + offset]:
+            srcphase = None
+            for phase in allphases:
+                if rev in data[phase]:
+                    srcphase = phase
+                    break
+            yield (rev, (srcphase, tgtphase))
+
 def _trackphasechange(data, rev, old, new):
-    """add a phase move the <data> dictionnary
+    """add a phase move the <data> sets
 
     If data is None, nothing happens.
     """
     if data is None:
         return
-    existing = data.get(rev)
-    if existing is not None:
-        old = existing[0]
-    data[rev] = (old, new)
+    assert old in allphases or old is None
+    assert new in allphases
+    offset = len(allphases)
+
+    if rev not in data[new + offset]:
+        found = False
+        for phase in allphases:
+            if phase == new:
+                data[phase + offset].add(rev)
+            elif rev in data[phase + offset]:
+                data[phase + offset].discard(rev)
+                found = True
+        if not found and old is not None:
+            data[old].add(rev)
 
 class phasecache(object):
     def __init__(self, repo, phasedefaults, _load=True):
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1242,7 +1242,8 @@ 
                               **pycompat.strkwargs(args))
             if hook.hashook(repo.ui, 'pretxnclose-phase'):
                 cl = repo.unfiltered().changelog
-                for rev, (old, new) in tr.changes['phases'].items():
+                phasechanges = phases.phasechanges(tr.changes['phases'])
+                for rev, (old, new) in phasechanges:
                     args = tr.hookargs.copy()
                     node = hex(cl.node(rev))
                     args.update(phases.preparehookargs(node, old, new))
@@ -1277,7 +1278,7 @@ 
                                      checkambigfiles=_cachedfiles)
         tr.changes['revs'] = xrange(0, 0)
         tr.changes['obsmarkers'] = set()
-        tr.changes['phases'] = {}
+        tr.changes['phases'] = [set() for i in range(7)]
         tr.changes['bookmarks'] = {}
 
         tr.hookargs['txnid'] = txnid
@@ -1306,7 +1307,7 @@ 
 
                 if hook.hashook(repo.ui, 'txnclose-phase'):
                     cl = repo.unfiltered().changelog
-                    phasemv = sorted(tr.changes['phases'].items())
+                    phasemv = sorted(phases.phasechanges(tr.changes['phases']))
                     for rev, (old, new) in phasemv:
                         args = tr.hookargs.copy()
                         node = hex(cl.node(rev))