Submitter | phabricator |
---|---|
Date | Dec. 13, 2019, 11:58 p.m. |
Message ID | <differential-rev-PHID-DREV-etc5myaxrltruk5lsb67-req@mercurial-scm.org> |
Download | mbox | patch |
Permalink | /patch/43830/ |
State | Superseded |
Headers | show |
Comments
martinvonz added inline comments. INLINE COMMENTS > dirstate.py:611-620 > + elif len(changedfiles) < 10: > + # Avoid turning allfiles into a set, which can be expensive if it's > + # large. > + to_lookup = [] > + to_drop = [] > + for f in changedfiles: > + if f in allfiles: How slow? Specifically, how much slower (in percent, or dB, or whatever) is it compared to not converting to a set? I wonder if it's worth the code. `hg debugrebuilddirstate` should be a very rare operation. REPOSITORY rHG Mercurial CHANGES SINCE LAST ACTION https://phab.mercurial-scm.org/D7665/new/ REVISION DETAIL https://phab.mercurial-scm.org/D7665 To: spectral, #hg-reviewers Cc: martinvonz, mercurial-devel
spectral added inline comments. INLINE COMMENTS > martinvonz wrote in dirstate.py:611-620 > How slow? Specifically, how much slower (in percent, or dB, or whatever) is it compared to not converting to a set? I wonder if it's worth the code. `hg debugrebuilddirstate` should be a very rare operation. With 1 file in changedfiles and 200k files in allfiles, approximately 15x slower. That said, it's a little over 1.5ms in the old version, so in absolute terms this isn't a big difference until you get really huge repos. I haven't really investigated when changedfiles is not None, if it's just for `hg debugrebuilddirstate --minimal`, I agree this isn't worth the complexity, but it looks like mq, strip, and absorb all call rebuild in some cases, and strip and absorb will pass in a list of files. Unsure if that changes things materially, though. REPOSITORY rHG Mercurial CHANGES SINCE LAST ACTION https://phab.mercurial-scm.org/D7665/new/ REVISION DETAIL https://phab.mercurial-scm.org/D7665 To: spectral, #hg-reviewers Cc: martinvonz, mercurial-devel
martinvonz added inline comments. INLINE COMMENTS > spectral wrote in dirstate.py:611-620 > With 1 file in changedfiles and 200k files in allfiles, approximately 15x slower. That said, it's a little over 1.5ms in the old version, so in absolute terms this isn't a big difference until you get really huge repos. I haven't really investigated when changedfiles is not None, if it's just for `hg debugrebuilddirstate --minimal`, I agree this isn't worth the complexity, but it looks like mq, strip, and absorb all call rebuild in some cases, and strip and absorb will pass in a list of files. Unsure if that changes things materially, though. Okay, that's at least a measurable difference, so I'm fine with the extra complexity. > dirstate.py:623 > + changedfilesset = set(changedfiles) > + to_lookup = changedfilesset.intersection(set(allfiles)) > + to_drop = changedfilesset - to_lookup We typically use the `&` operator instead of `.intersection()`. I'll change it in flight. REPOSITORY rHG Mercurial CHANGES SINCE LAST ACTION https://phab.mercurial-scm.org/D7665/new/ REVISION DETAIL https://phab.mercurial-scm.org/D7665 To: spectral, #hg-reviewers Cc: martinvonz, mercurial-devel
Patch
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py --- a/mercurial/dirstate.py +++ b/mercurial/dirstate.py @@ -603,19 +603,34 @@ def rebuild(self, parent, allfiles, changedfiles=None): if changedfiles is None: # Rebuild entire dirstate - changedfiles = allfiles + to_lookup = allfiles + to_drop = [] lastnormaltime = self._lastnormaltime self.clear() self._lastnormaltime = lastnormaltime + elif len(changedfiles) < 10: + # Avoid turning allfiles into a set, which can be expensive if it's + # large. + to_lookup = [] + to_drop = [] + for f in changedfiles: + if f in allfiles: + to_lookup.append(f) + else: + to_drop.append(f) + else: + changedfilesset = set(changedfiles) + to_lookup = changedfilesset.intersection(set(allfiles)) + to_drop = changedfilesset - to_lookup if self._origpl is None: self._origpl = self._pl self._map.setparents(parent, nullid) - for f in changedfiles: - if f in allfiles: - self.normallookup(f) - else: - self.drop(f) + + for f in to_lookup: + self.normallookup(f) + for f in to_drop: + self.drop(f) self._dirty = True