Submitter | phabricator |
---|---|
Date | Dec. 6, 2017, 2:48 p.m. |
Message ID | <differential-rev-PHID-DREV-tibfbh364arde4dcflju-req@phab.mercurial-scm.org> |
Download | mbox | patch |
Permalink | /patch/25969/ |
State | Superseded |
Headers | show |
Comments
quark requested changes to this revision. quark added a comment. This revision now requires changes to proceed. I have been also wanting to remove this O(changelog) space usage for long. I noticed phasecache was the biggest memory consumer among all `repo.` components (dirstate, changelog, etc) when prototyping stateful chg early this year. Requesting changes mainly because the `if public in phases` branch in `getrevset` has issues. Otherwise it looks good to me. INLINE COMMENTS > revlog.c:627 > > static PyObject *compute_phases_map_sets(indexObject *self, PyObject *args) > { I guess the `map` here means a `rev` -> `phase` map, which was the list. Since we now return len(size). Maybe rename the function to `compute_phases_len_sets`. > revlog.c:1965 > "get an index entry"}, > {"computephasesmapsets", (PyCFunction)compute_phases_map_sets, > METH_VARARGS, "compute phases"}, Also here, `computephaseslensets` > phases.py:226 > else: > - # slow path - enumerate all revisions > - phase = self.phase > - revs = (r for r in repo if phase(repo, r) in phases) > - return smartset.generatorset(revs, iterasc=True) > + assert not repo.changelog.filteredrevs > + phases = set(allphases).difference(phases) The `assert` is incorrect - repo could be "filtered". The old code support filtered repo implicitly by using `for r in repo`, the iterator of `repo` will remove filtered revisions. > phases.py:226-237 > + assert not repo.changelog.filteredrevs > + phases = set(allphases).difference(phases) > + if not phases: > + return smartset.fullreposet() > + if len(phases) == 1: > + [p] = phases > + revs = self._phasesets[p] Sorry - I made a mistake using `repo.revs('public()')` to test the logic here. Actually that exercises a different code path <https://www.mercurial-scm.org/repo/hg/file/aa905f9cdcda/mercurial/revset.py#l1616>. There are practically no users exercising this `public in phases` code path. So issues like `fullreposet` requires a `repo` argument is not exposed. I think we can either: - Pass an optional `subset` here then replace the revset implementation to use it: def getrevset(self, repo, phases, subset=None): if subset is None: subset = smartset.fullreposet(repo) if ... # fast path ... return subset & smartset.baseset(revs) else: ... - Remove support for calculating `public()` here by just `raise error.ProgrammingError('public() is unsupported')` I guess the former is preferred eventually. But if you want to land this faster, the latter is easier. I can help writing a follow-up. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1606 To: joerg.sonnenberger, #hg-reviewers, quark Cc: quark, mercurial-devel
joerg.sonnenberger added inline comments. INLINE COMMENTS > quark wrote in phases.py:226-237 > Sorry - I made a mistake using `repo.revs('public()')` to test the logic here. Actually that exercises a different code path <https://www.mercurial-scm.org/repo/hg/file/aa905f9cdcda/mercurial/revset.py#l1616>. > > There are practically no users exercising this `public in phases` code path. So issues like `fullreposet` requires a `repo` argument is not exposed. > > I think we can either: > > - Pass an optional `subset` here then replace the revset implementation to use it: > > def getrevset(self, repo, phases, subset=None): > if subset is None: > subset = smartset.fullreposet(repo) > if ... > # fast path > ... > return subset & smartset.baseset(revs) > else: > ... > > > > - Remove support for calculating `public()` here by just `raise error.ProgrammingError('public() is unsupported')` > > I guess the former is preferred eventually. But if you want to land this faster, the latter is easier. I can help writing a follow-up. I think this can just follow the logic of the other branch, i.e. short cut if `len(phases) == 1 and repo.changelog.filteredrevs`, otherwise, add `repo.changelog.filteredrevs` to the union as well. I added the assert to make sure that the check wasn't necessary, missed the implicit dependency on repo here. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1606 To: joerg.sonnenberger, #hg-reviewers, quark Cc: durin42, quark, mercurial-devel
quark added inline comments. INLINE COMMENTS > joerg.sonnenberger wrote in phases.py:226-237 > I think this can just follow the logic of the other branch, i.e. short cut if `len(phases) == 1 and repo.changelog.filteredrevs`, otherwise, add `repo.changelog.filteredrevs` to the union as well. > I added the assert to make sure that the check wasn't necessary, missed the implicit dependency on repo here. Agree. The only real difference with `revset.py:public` is the `subset` argument. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1606 To: joerg.sonnenberger, #hg-reviewers, quark Cc: durin42, quark, mercurial-devel
quark accepted this revision. quark added a comment. I think this is good enough. We can migrate `revset.py` `public()` implementation later. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1606 To: joerg.sonnenberger, #hg-reviewers, quark Cc: durin42, quark, mercurial-devel
yuja accepted this revision. yuja added a comment. This revision is now accepted and ready to land. Queued, thanks. Can you send follow-up patches to address trivial issues? INLINE COMMENTS > revlog.c:692 > for (i = 0; i < len; i++) { > PyObject *phaseval; > Removed unused variable in flight. > phases.py:205 > self.phaseroots, self.dirty = _readroots(repo, phasedefaults) > - self._phaserevs = None > + self._phasemaxrev = nullrev > self._phasesets = None Nit: this isn't the "max" revision, but the size. Perhaps the initial value should be 0. > phases.py:236 > + return smartset.fullreposet(repo) > + return smartset.fullreposet(repo).filter(lambda r: r not in revs) > Nit: could be written as `fullrepoest(repo) - baseset(revs)`. > phases.py:264 > repo = repo.unfiltered() > - revs = [public] * len(repo.changelog) > - self._phaserevs = revs > - self._populatephaseroots(repo) > - for phase in trackedphases: > - roots = list(map(repo.changelog.rev, self.phaseroots[phase])) > - if roots: > - for rev in roots: > - revs[rev] = phase > - for rev in repo.changelog.descendants(roots): > - revs[rev] = phase > + cl = repo.changelog > + self._phasesets = [set() for phase in allphases] Nice, no dependency on repo. As a follow-up, maybe we can extract a pure function as truly drop-in replacement for the native function. > phases.py:266 > + self._phasesets = [set() for phase in allphases] > + roots = map(cl.rev, self.phaseroots[secret]) > + if roots: s/map/pycompat.maplist/ for Python 3 compatibility. Fixed in flight. > phases.py:295 > def phase(self, repo, rev): > # We need a repo argument here to be able to build _phaserevs > # if necessary. The repository instance is not stored in s/_phaserevs/_phasesets/. Fixed in flight. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1606 To: joerg.sonnenberger, #hg-reviewers, quark, yuja Cc: yuja, durin42, quark, mercurial-devel
quark added inline comments. INLINE COMMENTS > yuja wrote in phases.py:236 > Nit: could be written as `fullrepoest(repo) - baseset(revs)`. `baseset.__contains__` is slower than `set.__contains__` so the current code is faster. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1606 To: joerg.sonnenberger, #hg-reviewers, quark, yuja Cc: yuja, durin42, quark, mercurial-devel
yuja added inline comments. INLINE COMMENTS > quark wrote in phases.py:236 > `baseset.__contains__` is slower than `set.__contains__` so the current code is faster. Really? It's aliased to `self._set.__contains__`. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1606 To: joerg.sonnenberger, #hg-reviewers, quark, yuja Cc: yuja, durin42, quark, mercurial-devel
Patch
diff --git a/mercurial/policy.py b/mercurial/policy.py --- a/mercurial/policy.py +++ b/mercurial/policy.py @@ -75,7 +75,7 @@ (r'cext', r'diffhelpers'): 1, (r'cext', r'mpatch'): 1, (r'cext', r'osutil'): 1, - (r'cext', r'parsers'): 3, + (r'cext', r'parsers'): 4, } # map import request to other package or module diff --git a/mercurial/phases.py b/mercurial/phases.py --- a/mercurial/phases.py +++ b/mercurial/phases.py @@ -202,46 +202,55 @@ if _load: # Cheap trick to allow shallow-copy without copy module self.phaseroots, self.dirty = _readroots(repo, phasedefaults) - self._phaserevs = None + self._phasemaxrev = nullrev self._phasesets = None self.filterunknown(repo) self.opener = repo.svfs def getrevset(self, repo, phases): """return a smartset for the given phases""" self.loadphaserevs(repo) # ensure phase's sets are loaded - - if self._phasesets and all(self._phasesets[p] is not None - for p in phases): - # fast path - use _phasesets - revs = self._phasesets[phases[0]] - if len(phases) > 1: - revs = revs.copy() # only copy when needed - for p in phases[1:]: - revs.update(self._phasesets[p]) + phases = set(phases) + if public not in phases: + # fast path: _phasesets contains the interesting sets, + # might only need a union and post-filtering. + if len(phases) == 1: + [p] = phases + revs = self._phasesets[p] + else: + revs = set.union(*[self._phasesets[p] for p in phases]) if repo.changelog.filteredrevs: revs = revs - repo.changelog.filteredrevs return smartset.baseset(revs) else: - # slow path - enumerate all revisions - phase = self.phase - revs = (r for r in repo if phase(repo, r) in phases) - return smartset.generatorset(revs, iterasc=True) + assert not repo.changelog.filteredrevs + phases = set(allphases).difference(phases) + if not phases: + return smartset.fullreposet() + if len(phases) == 1: + [p] = phases + revs = self._phasesets[p] + else: + revs = set.union(*[self._phasesets[p] for p in phases]) + if not revs: + return smartset.fullreposet() + return smartset.fullreposet().filter(lambda r: r not in revs) def copy(self): # Shallow copy meant to ensure isolation in # advance/retractboundary(), nothing more. ph = self.__class__(None, None, _load=False) ph.phaseroots = self.phaseroots[:] ph.dirty = self.dirty ph.opener = self.opener - ph._phaserevs = self._phaserevs + ph._phasemaxrev = self._phasemaxrev ph._phasesets = self._phasesets return ph def replace(self, phcache): """replace all values in 'self' with content of phcache""" - for a in ('phaseroots', 'dirty', 'opener', '_phaserevs', '_phasesets'): + for a in ('phaseroots', 'dirty', 'opener', '_phasemaxrev', + '_phasesets'): setattr(self, a, getattr(phcache, a)) def _getphaserevsnative(self, repo): @@ -253,40 +262,36 @@ def _computephaserevspure(self, repo): repo = repo.unfiltered() - revs = [public] * len(repo.changelog) - self._phaserevs = revs - self._populatephaseroots(repo) - for phase in trackedphases: - roots = list(map(repo.changelog.rev, self.phaseroots[phase])) - if roots: - for rev in roots: - revs[rev] = phase - for rev in repo.changelog.descendants(roots): - revs[rev] = phase + cl = repo.changelog + self._phasesets = [set() for phase in allphases] + roots = map(cl.rev, self.phaseroots[secret]) + if roots: + ps = set(cl.descendants(roots)) + for root in roots: + ps.add(root) + self._phasesets[secret] = ps + roots = map(cl.rev, self.phaseroots[draft]) + if roots: + ps = set(cl.descendants(roots)) + for root in roots: + ps.add(root) + ps.difference_update(self._phasesets[secret]) + self._phasesets[draft] = ps + self._phasemaxrev = len(cl) def loadphaserevs(self, repo): """ensure phase information is loaded in the object""" - if self._phaserevs is None: + if self._phasesets is None: try: res = self._getphaserevsnative(repo) - self._phaserevs, self._phasesets = res + self._phasemaxrev, self._phasesets = res except AttributeError: self._computephaserevspure(repo) def invalidate(self): - self._phaserevs = None + self._phasemaxrev = nullrev self._phasesets = None - def _populatephaseroots(self, repo): - """Fills the _phaserevs cache with phases for the roots. - """ - cl = repo.changelog - phaserevs = self._phaserevs - for phase in trackedphases: - roots = map(cl.rev, self.phaseroots[phase]) - for root in roots: - phaserevs[root] = phase - def phase(self, repo, rev): # We need a repo argument here to be able to build _phaserevs # if necessary. The repository instance is not stored in @@ -297,10 +302,13 @@ return public if rev < nullrev: raise ValueError(_('cannot lookup negative revision')) - if self._phaserevs is None or rev >= len(self._phaserevs): + if rev >= self._phasemaxrev: self.invalidate() self.loadphaserevs(repo) - return self._phaserevs[rev] + for phase in trackedphases: + if rev in self._phasesets[phase]: + return phase + return public def write(self): if not self.dirty: @@ -455,10 +463,10 @@ if filtered: self.dirty = True # filterunknown is called by repo.destroyed, we may have no changes in - # root but phaserevs contents is certainly invalid (or at least we + # root but _phasesets contents is certainly invalid (or at least we # have not proper way to check that). related to issue 3858. # - # The other caller is __init__ that have no _phaserevs initialized + # The other caller is __init__ that have no _phasesets initialized # anyway. If this change we should consider adding a dedicated # "destroyed" function to phasecache or a proper cache key mechanism # (see branchmap one) diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py --- a/mercurial/localrepo.py +++ b/mercurial/localrepo.py @@ -704,8 +704,8 @@ def _activebookmark(self): return self._bookmarks.active - # _phaserevs and _phasesets depend on changelog. what we need is to - # call _phasecache.invalidate() if '00changelog.i' was changed, but it + # _phasesets depend on changelog. what we need is to call + # _phasecache.invalidate() if '00changelog.i' was changed, but it # can't be easily expressed in filecache mechanism. @storecache('phaseroots', '00changelog.i') def _phasecache(self): diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c --- a/mercurial/cext/revlog.c +++ b/mercurial/cext/revlog.c @@ -628,7 +628,7 @@ { PyObject *roots = Py_None; PyObject *ret = NULL; - PyObject *phaseslist = NULL; + PyObject *phasessize = NULL; PyObject *phaseroots = NULL; PyObject *phaseset = NULL; PyObject *phasessetlist = NULL; @@ -685,8 +685,8 @@ } } /* Transform phase list to a python list */ - phaseslist = PyList_New(len); - if (phaseslist == NULL) + phasessize = PyInt_FromLong(len); + if (phasessize == NULL) goto release; for (i = 0; i < len; i++) { PyObject *phaseval; @@ -702,15 +702,11 @@ PySet_Add(phaseset, rev); Py_XDECREF(rev); } - phaseval = PyInt_FromLong(phase); - if (phaseval == NULL) - goto release; - PyList_SET_ITEM(phaseslist, i, phaseval); } - ret = PyTuple_Pack(2, phaseslist, phasessetlist); + ret = PyTuple_Pack(2, phasessize, phasessetlist); release: - Py_XDECREF(phaseslist); + Py_XDECREF(phasessize); Py_XDECREF(phasessetlist); done: free(phases); diff --git a/mercurial/cext/parsers.c b/mercurial/cext/parsers.c --- a/mercurial/cext/parsers.c +++ b/mercurial/cext/parsers.c @@ -710,7 +710,7 @@ void manifest_module_init(PyObject *mod); void revlog_module_init(PyObject *mod); -static const int version = 3; +static const int version = 4; static void module_init(PyObject *mod) {