Patchwork D6422: copies: avoid calling matcher if matcher.always()

login
register
mail settings
Submitter phabricator
Date May 22, 2019, 12:33 a.m.
Message ID <differential-rev-PHID-DREV-zzxlyl6ig4zltslpe5mw-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/40177/
State Superseded
Headers show

Comments

phabricator - May 22, 2019, 12:33 a.m.
martinvonz created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  When storing copy information in the changesets, this patch speeds up
  
    hg debugpathcopies FENNEC_58_0_2_BUILD1 FIREFOX_59_0b8_BUILD2
  
  from 5.9s to 4.7s.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

AFFECTED FILES
  mercurial/copies.py

CHANGE DETAILS




To: martinvonz, #hg-reviewers
Cc: mercurial-devel
phabricator - May 22, 2019, 8:19 a.m.
marmoute added a comment.


  We could maybe make it a function of both the number of heads and roots. That is not strictly the number of connected set, but that would provide a more conservative approach. That could over-sample for hour-glass shape, but they are probably less common.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel
phabricator - May 22, 2019, 10:59 a.m.
marmoute added a comment.


  I have no idea how this comment ended up on this diff…

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel
phabricator - May 22, 2019, 6:34 p.m.
marmoute added a comment.


  Can you indicate a summary of the total speedup of the series ? (from base to last changesets?). Also I am not sure for which case these number apply ? Is this the compatibility mode or after repository conversion ? Can we have number for both ?
  
  To have a more diverse picture of the performacne of thes change, can you provide timing data for the following case?
  
    mozilla-central: hg perfpathcopies 76caed42cf7cb7098aa0eb58242dd36054d06865 1daa622bbe42f8a85e0b4880c5c25df8ea60e95f
    pypy:            hg perfpathcopies 3c8ac35c653afe108127ca75688e2f8278192512 d7746d32bf9d785bbc0c6afc9aa6015410a38c8f
    mercurial:       hg perfpathcopies 7adb1274a4f930e13b35545ef23914ccae7d5534 0c6c600c03fddabcc45f1046e869f84b276fb467
    netbeans:        hg perfpathcopies 588c2d1ced709885eb0bc6b88137efbadbb35b76 1aad62e59ddde2ce37882af12fbb202d3b7961dc

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel
phabricator - May 22, 2019, 7:17 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6422#93466, @marmoute wrote:
  
  > Can you indicate a summary of the total speedup of the series ? (from base to last changesets?).
  
  
  Sure, done.
  
  > Also I am not sure for which case these number apply ? Is this the compatibility mode or after repository conversion ?
  
  After repo conversion.
  
  > Can we have number for both ?
  
  The compatibility number is going to be similar to before this series, since it won't benefit from having the removed set of files available cheaply. It would make sense with a follow-up for speeding up compatibility mode by not filtering out removed files. I'm not sure if that should be a separate option or not.
  
  > To have a more diverse picture of the performacne of thes change, can you provide timing data for the following case?
  > 
  >   mozilla-central: hg perfpathcopies 76caed42cf7cb7098aa0eb58242dd36054d06865 1daa622bbe42f8a85e0b4880c5c25df8ea60e95f
  >   pypy:            hg perfpathcopies 3c8ac35c653afe108127ca75688e2f8278192512 d7746d32bf9d785bbc0c6afc9aa6015410a38c8f
  >   mercurial:       hg perfpathcopies 7adb1274a4f930e13b35545ef23914ccae7d5534 0c6c600c03fddabcc45f1046e869f84b276fb467
  >   netbeans:        hg perfpathcopies 588c2d1ced709885eb0bc6b88137efbadbb35b76 1aad62e59ddde2ce37882af12fbb202d3b7961dc
  
  Can you provide some tags you're curious about instead so it's easier to run the same command in both repos (the hashes are different)? I have the mozilla-unified repo and the hg repo converted.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel
phabricator - May 22, 2019, 10:29 p.m.
marmoute added a comment.


  The nodes in the above example have been selected by a script because they had interresting property. They are not based on a tag so I can't give you one. How did you converted the repo ? I think `hg convert` keeps a map somewhere, otherwise, using the commit message could work.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel
phabricator - May 22, 2019, 11:09 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6422#93543, @marmoute wrote:
  
  > The nodes in the above example have been selected by a script because they had interresting property. They are not based on a tag so I can't give you one. How did you converted the repo ? I think `hg convert` keeps a map somewhere, otherwise, using the commit message could work.
  
  
  Fair enough. For the mozilla repo, it takes 25s with copies in filelogs and 1m40s with copies in changesets (after this patch). For the mercurial repo, it takes 180ms with either format.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel
phabricator - May 24, 2019, 11:17 a.m.
marmoute added a comment.


  (I did some experiment, here seems a good spot to report them)
  
  I build a crude cache (cbor based storage) for the data that needs caching after this series and tested it against my pypy test case.
  
    filelog-based: 15s
    compatibility mode; without cache: 75s
    compatibility mode; caching copies without this series: 60s
    compatibility mode; caching copies with this series: 40s
    compatibility mode; caching all data with this series: 7s (65% spend parsing cbor cache data)
  
  This is much promissing, even if need to check on more diverse cases (various factor can influence performance: number of considered file, number of changeset traversed, number of intermediate version, etc).
  
  The timing above is enough motivation for me to look seriously into a caching/alternative storage plan.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel
phabricator - May 24, 2019, 4:17 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6422#93611, @marmoute wrote:
  
  > (I did some experiment, here seems a good spot to report them)
  >
  > I build a crude cache (cbor based storage) for the data that needs caching after this series and tested it against my pypy test case.
  >
  >   filelog-based: 15s
  >   compatibility mode; without cache: 75s
  >   compatibility mode; caching copies without this series: 60s
  >   compatibility mode; caching copies with this series: 40s
  >   compatibility mode; caching all data with this series: 7s (65% spend parsing cbor cache data)
  >
  >
  > This is much promissing, even if need to check on more diverse cases (various factor can influence performance: number of considered file, number of changeset traversed, number of intermediate version, etc).
  >
  > The timing above is enough motivation for me to look seriously into a caching/alternative storage plan.
  
  
  Nice :) Thanks for working on a way to get this stuff out to existing repos (which has not been a priority for me, since that is not Google's use case).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6422

To: martinvonz, #hg-reviewers
Cc: marmoute, mercurial-devel

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -270,6 +270,7 @@ 
     # came from.
     work = [(r, 1, {}) for r in roots]
     heapq.heapify(work)
+    alwaysmatch = match.always()
     while work:
         r, i1, copies1 = heapq.heappop(work)
         if work and work[0][0] == r:
@@ -280,7 +281,7 @@ 
             # TODO: perhaps this filtering should be done as long as ctx
             # is merge, whether or not we're tracing from both parent.
             for dst in allcopies:
-                if not match(dst):
+                if not alwaysmatch and not match(dst):
                     continue
                 # Unlike when copies are stored in the filelog, we consider
                 # it a copy even if the destination already existed on the
@@ -306,7 +307,7 @@ 
                 assert r == childctx.p2().rev()
                 parent = 2
                 childcopies = childctx.p2copies()
-            if not match.always():
+            if not alwaysmatch:
                 childcopies = {dst: src for dst, src in childcopies.items()
                                if match(dst)}
             # Copy the dict only if later iterations will also need it