Patchwork D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests

login
register
mail settings
Submitter phabricator
Date Nov. 9, 2017, 3:18 a.m.
Message ID <differential-rev-PHID-DREV-dh4u7lpmei4t6ykcb67x-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/25435/
State Superseded
Headers show

Comments

phabricator - Nov. 9, 2017, 3:18 a.m.
spectral created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This is fixing quadratic behavior, which is probably not noticeable in the
  common case, but if a very large directory gets added here, it can get pretty
  bad. This was noticed because we had some pushes that spent >25s in changegroup
  generation calling min() here, according to profiling.
  
  The original reasoning for min() being used in https://phab.mercurial-scm.org/rHG829d369fc5a89f4c290013271c6e5dff2aea63de was that, at that
  point in the series, we were adding almost everything to tmfnodes during the
  first iteration through the loop , so we needed to avoid sending child
  directories before parents. Later changes made it so that the child directories
  were added only when we visited the parent directory (not all of them on the
  first iteration), so this is no longer necessary - there won't be any child
  directories in tmfnodes before the parents have been sent.
  
  This does mean that the manifests are now exchanged unordered, whereas
  previously we would essentially do [a, b, b/c, b/c/d, e], we now can send a, b,
  and e in any order; b/c must still follow b, and b/c/d must still follow b/c.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1351

AFFECTED FILES
  mercurial/changegroup.py

CHANGE DETAILS




To: spectral, #hg-reviewers
Cc: mercurial-devel
phabricator - Nov. 9, 2017, 5:18 a.m.
martinvonz added a comment.


  As the author of that line of code, this looks good to me. I'll let someone else queue it since I was involved in the internal discussion and I don't want to feel like I'm queuing my own code (even though I wasn't involved in the actual coding).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1351

To: spectral, #hg-reviewers
Cc: martinvonz, mercurial-devel
phabricator - Nov. 10, 2017, 5:30 a.m.
indygreg added a comment.


  Should I be concerned about the lack of test fallout? This new behavior is non-deterministic. Do we not have testing for this code or is existing testing not low-level enough to uncover behavior changes resulting from this change?

INLINE COMMENTS

> changegroup.py:738
> +            # element.
> +            dir = next(iter(tmfnodes))
>              nodes = tmfnodes[dir]

Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. I just don't know if the key needs to remain in the dict until later in the function...

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1351

To: spectral, #hg-reviewers
Cc: indygreg, martinvonz, mercurial-devel
phabricator - Nov. 10, 2017, 5:41 a.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D1351#22473, @indygreg wrote:
  
  > Should I be concerned about the lack of test fallout? This new behavior is non-deterministic. Do we not have testing for this code or is existing testing not low-level enough to uncover behavior changes resulting from this change?
  
  
  I'm not worried about it. It only makes a difference for treemanifests, and the change is just about the order in changegroup and (therefore) the order in which we write revlogs. We don't have much testing of treemanifests and we rarely check the exact format in a changegroup. We do print out a debug message on line 482 that could be used to see a difference before and after this patch, but I just checked that test-treemanifest.t doesn't pass --verbose. Still, I wouldn't mind if some Facebooker tried to run their treemanifest tests (in hgexperimental) against a version with this patch applied.

INLINE COMMENTS

> indygreg wrote in changegroup.py:738
> Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. I just don't know if the key needs to remain in the dict until later in the function...

Good idea. I'm pretty sure that should be safe (but perhaps tests will prove me wrong).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1351

To: spectral, #hg-reviewers
Cc: indygreg, martinvonz, mercurial-devel
phabricator - Nov. 10, 2017, 5:47 a.m.
spectral added inline comments.

INLINE COMMENTS

> indygreg wrote in changegroup.py:738
> Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. I just don't know if the key needs to remain in the dict until later in the function...

I think that it needs to remain, makelookupmflinknode(dir) relies on it (L716).  I haven't attempted to popitem and pass that to makelookupmflinknode instead, let me try that out now...

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1351

To: spectral, #hg-reviewers
Cc: indygreg, martinvonz, mercurial-devel
phabricator - Nov. 10, 2017, 5:51 a.m.
martinvonz added inline comments.

INLINE COMMENTS

> spectral wrote in changegroup.py:738
> I think that it needs to remain, makelookupmflinknode(dir) relies on it (L716).  I haven't attempted to popitem and pass that to makelookupmflinknode instead, let me try that out now...

Oh, yuck. Can perhaps pass "nodes" into makelookupmflinknode() along with "dir"? It will still update it on line 721, but at least one of those unclear sideeffects go away.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1351

To: spectral, #hg-reviewers
Cc: indygreg, martinvonz, mercurial-devel
phabricator - Nov. 10, 2017, 6:41 a.m.
spectral marked 4 inline comments as done.
spectral added a comment.


  run-tests.py found no issues with this version, my manual testing (using PYTHONHASHSEED to adjust the ordering) also encountered no issues

INLINE COMMENTS

> indygreg wrote in changegroup.py:738
> Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. I just don't know if the key needs to remain in the dict until later in the function...

profiling seems to indicate this is a little faster as well as being a bit cleaner, so thanks for making me check :)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1351

To: spectral, #hg-reviewers
Cc: indygreg, martinvonz, mercurial-devel

Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -733,7 +733,9 @@ 
 
         size = 0
         while tmfnodes:
-            dir = min(tmfnodes)
+            # Pick some element from tmfnodes, this is not necessarily the 'min'
+            # element.
+            dir = next(iter(tmfnodes))
             nodes = tmfnodes[dir]
             prunednodes = self.prune(dirlog(dir), nodes, commonrevs)
             if not dir or prunednodes: