Patchwork [2,of,3] revset: use phasecache.getrevs

login
register
mail settings
Submitter Jun Wu
Date Feb. 10, 2017, 2:34 p.m.
Message ID <44f2c707475a79758ecb.1486737258@localhost.localdomain>
Download mbox | patch
Permalink /patch/18394/
State Changes Requested
Delegated to: Yuya Nishihara
Headers show

Comments

Jun Wu - Feb. 10, 2017, 2:34 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1486735766 28800
#      Fri Feb 10 06:09:26 2017 -0800
# Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
# Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 44f2c707475a
revset: use phasecache.getrevs

This is part of a refactoring that moves some phase query optimization from
revset.py to phases.py. See the previous patch for motivation.

This patch changes revset code to use phasecache.getrevs so it no longer
accesses private field: _phasecache.getrevs directly.

The change may make "not public()" slower in the pure version where there
are many non-public changesets, because of an unnecessary sort. But
performance should be unchanged for the default CPython version. Since
non-public changesets are not expected to be many, and most users use the
CPython version, the code clean-up seems worthy.

In the future, if we have a set-like C structured implemented by a bitmap,
we could get perf right - no longer need to use unordered Python set.
Jun Wu - Feb. 10, 2017, 2:47 p.m.
Excerpts from Jun Wu's message of 2017-02-10 06:34:18 -0800:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1486735766 28800
> #      Fri Feb 10 06:09:26 2017 -0800
> # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> # Available At https://bitbucket.org/quark-zju/hg-draft 
> #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 44f2c707475a
> revset: use phasecache.getrevs
> 
> This is part of a refactoring that moves some phase query optimization from
> revset.py to phases.py. See the previous patch for motivation.
> 
> This patch changes revset code to use phasecache.getrevs so it no longer
> accesses private field: _phasecache.getrevs directly.
                          ^^^^^^^^^^^^^^^^^^^
                          This should be "_phasesets".

I probably want to add some hacky hooks to "hg email" to audit the commit
messages. Maybe just send them to some Text-To-Speech engine to double
confirm.

> 
> The change may make "not public()" slower in the pure version where there
> are many non-public changesets, because of an unnecessary sort. But
> performance should be unchanged for the default CPython version. Since
> non-public changesets are not expected to be many, and most users use the
> CPython version, the code clean-up seems worthy.
> 
> In the future, if we have a set-like C structured implemented by a bitmap,
> we could get perf right - no longer need to use unordered Python set.
> 
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1645,17 +1645,10 @@ def parents(repo, subset, x):
>      return subset & ps
>  
> -def _phase(repo, subset, target):
> -    """helper to select all rev in phase <target>"""
> -    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
> -    if repo._phasecache._phasesets:
> -        s = repo._phasecache._phasesets[target] - repo.changelog.filteredrevs
> -        s = baseset(s)
> -        s.sort() # set are non ordered, so we enforce ascending
> -        return subset & s
> -    else:
> -        phase = repo._phasecache.phase
> -        condition = lambda r: phase(repo, r) == target
> -        return subset.filter(condition, condrepr=('<phase %r>', target),
> -                             cache=False)
> +def _phase(repo, subset, *targets):
> +    """helper to select all rev in <targets> phases"""
> +    revs = repo._phasecache.getrevs(repo, targets)
> +    s = baseset(revs)
> +    s.sort()
> +    return subset & s
>  
>  @predicate('draft()', safe=True)
> @@ -1718,18 +1711,5 @@ def present(repo, subset, x):
>  def _notpublic(repo, subset, x):
>      getargs(x, 0, 0, "_notpublic takes no arguments")
> -    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
> -    if repo._phasecache._phasesets:
> -        s = set()
> -        for u in repo._phasecache._phasesets[1:]:
> -            s.update(u)
> -        s = baseset(s - repo.changelog.filteredrevs)
> -        s.sort()
> -        return subset & s
> -    else:
> -        phase = repo._phasecache.phase
> -        target = phases.public
> -        condition = lambda r: phase(repo, r) != target
> -        return subset.filter(condition, condrepr=('<phase %r>', target),
> -                             cache=False)
> +    return _phase(repo, subset, *phases.allphases[1:])
>  
>  @predicate('public()', safe=True)
Yuya Nishihara - Feb. 13, 2017, 2:56 p.m.
On Fri, 10 Feb 2017 06:34:18 -0800, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1486735766 28800
> #      Fri Feb 10 06:09:26 2017 -0800
> # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 44f2c707475a
> revset: use phasecache.getrevs

> The change may make "not public()" slower in the pure version where there
> are many non-public changesets, because of an unnecessary sort. But
> performance should be unchanged for the default CPython version. Since
> non-public changesets are not expected to be many, and most users use the
> CPython version, the code clean-up seems worthy.

Perhaps you can make getrevs() return a lazy smartset object so the pure
version should have the same performance characteristic as before. I mean
getrevs(repo, target) could be identical to _phase(repo, subset, target) and
_notpublic(repo, subset, x) given subset=fullreposet.

The direction of this series looks nice.
Jun Wu - Feb. 13, 2017, 5:35 p.m.
Excerpts from Yuya Nishihara's message of 2017-02-13 23:56:20 +0900:
> On Fri, 10 Feb 2017 06:34:18 -0800, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1486735766 28800
> > #      Fri Feb 10 06:09:26 2017 -0800
> > # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> > # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 44f2c707475a
> > revset: use phasecache.getrevs
> 
> > The change may make "not public()" slower in the pure version where there
> > are many non-public changesets, because of an unnecessary sort. But
> > performance should be unchanged for the default CPython version. Since
> > non-public changesets are not expected to be many, and most users use the
> > CPython version, the code clean-up seems worthy.
> 
> Perhaps you can make getrevs() return a lazy smartset object so the pure
> version should have the same performance characteristic as before. I mean
> getrevs(repo, target) could be identical to _phase(repo, subset, target) and
> _notpublic(repo, subset, x) given subset=fullreposet.
> 
> The direction of this series looks nice.

Good advice! I didn't realize that smartset is decoupled from the repo
object.
Jun Wu - Feb. 16, 2017, 9:31 p.m.
Excerpts from Yuya Nishihara's message of 2017-02-13 23:56:20 +0900:
> On Fri, 10 Feb 2017 06:34:18 -0800, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1486735766 28800
> > #      Fri Feb 10 06:09:26 2017 -0800
> > # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> > # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 44f2c707475a
> > revset: use phasecache.getrevs
> 
> > The change may make "not public()" slower in the pure version where there
> > are many non-public changesets, because of an unnecessary sort. But
> > performance should be unchanged for the default CPython version. Since
> > non-public changesets are not expected to be many, and most users use the
> > CPython version, the code clean-up seems worthy.
> 
> Perhaps you can make getrevs() return a lazy smartset object so the pure
> version should have the same performance characteristic as before. I mean
> getrevs(repo, target) could be identical to _phase(repo, subset, target) and
> _notpublic(repo, subset, x) given subset=fullreposet.

Actually, smartset.baseset will do "data = list(data)" if data is a set.
If we use baseset, the "set" to "list" conversion is an unwanted overhead.

Maybe we can be even smarter: lazily initializing baseset._list.

> 
> The direction of this series looks nice.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1645,17 +1645,10 @@  def parents(repo, subset, x):
     return subset & ps
 
-def _phase(repo, subset, target):
-    """helper to select all rev in phase <target>"""
-    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
-    if repo._phasecache._phasesets:
-        s = repo._phasecache._phasesets[target] - repo.changelog.filteredrevs
-        s = baseset(s)
-        s.sort() # set are non ordered, so we enforce ascending
-        return subset & s
-    else:
-        phase = repo._phasecache.phase
-        condition = lambda r: phase(repo, r) == target
-        return subset.filter(condition, condrepr=('<phase %r>', target),
-                             cache=False)
+def _phase(repo, subset, *targets):
+    """helper to select all rev in <targets> phases"""
+    revs = repo._phasecache.getrevs(repo, targets)
+    s = baseset(revs)
+    s.sort()
+    return subset & s
 
 @predicate('draft()', safe=True)
@@ -1718,18 +1711,5 @@  def present(repo, subset, x):
 def _notpublic(repo, subset, x):
     getargs(x, 0, 0, "_notpublic takes no arguments")
-    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
-    if repo._phasecache._phasesets:
-        s = set()
-        for u in repo._phasecache._phasesets[1:]:
-            s.update(u)
-        s = baseset(s - repo.changelog.filteredrevs)
-        s.sort()
-        return subset & s
-    else:
-        phase = repo._phasecache.phase
-        target = phases.public
-        condition = lambda r: phase(repo, r) != target
-        return subset.filter(condition, condrepr=('<phase %r>', target),
-                             cache=False)
+    return _phase(repo, subset, *phases.allphases[1:])
 
 @predicate('public()', safe=True)