Patchwork revset: improve head revset performance

Submitter Durham Goode
Date March 13, 2014, 9:32 p.m.
Message ID <4a6fb092c3172b1c7f0d.1394746362@dev2000.prn2.facebook.com>
Permalink /patch/3943/
State Superseded
Commit 6a1a4c212d50b761c9abffabbd988d68aa95f607

Comments

Durham Goode - March 13, 2014, 9:32 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1394743641 25200
#      Thu Mar 13 13:47:21 2014 -0700
# Node ID 4a6fb092c3172b1c7f0df59f49f307a19005326e
# Parent  1cd5bff45db28150d7c140be493fe851e6560f27
revset: improve head revset performance

Previously the head() revset would iterate over every item in the subset and
check if it was a head.  Since the subset is often the entire repo, this was
slow on large repos. Now we iterate over each item in the head list and check if
it's in the subset, which results in much less work.

hg log -r 'head()' on a large repo:
Before: 0.95s
After: 0.28s
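
A minimal sketch of the two strategies described above, using plain Python sets and lists as stand-ins for the revset classes. The function names and toy data are hypothetical and are not part of mercurial/revset.py; the sketch only illustrates why walking the short head list is cheaper than walking the whole subset.

```python
def heads_by_filtering_subset(subset, head_revs):
    """Old strategy: visit every revision in the subset, keep the heads."""
    return [r for r in subset if r in head_revs]


def heads_by_filtering_heads(subset, head_revs):
    """New strategy: visit only the heads, keep those in the subset."""
    # sorted() is used here only to make the toy output deterministic.
    return [r for r in sorted(head_revs) if r in subset]


# Toy data (assumption): a "repo" of 100,000 revisions with three heads.
subset = set(range(100000))
head_revs = {10, 5000, 99999}

old = heads_by_filtering_subset(sorted(subset), head_revs)  # 100,000 checks
new = heads_by_filtering_heads(subset, head_revs)           # 3 checks
```

Both calls select the same revisions. The second one only pays off when membership tests on the subset are cheap, which is the case for the set used in this toy example.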
Pierre-Yves David - March 13, 2014, 9:38 p.m.
On 03/13/2014 02:32 PM, Durham Goode wrote:
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -941,7 +941,7 @@
>       hs = set()
>       for b, ls in repo.branchmap().iteritems():
>           hs.update(repo[h].rev() for h in ls)
> -    return subset.filter(lambda r: r in hs)
> +    return lazyset(list(hs), lambda r: r in subset)

Please use operator.contains instead of a slowish lambda.

http://docs.python.org/2/library/operator.html#operator.contains
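
For illustration, here is one way a membership lambda could be replaced. This is a sketch on a plain Python set, not on the revset classes, and the thread does not say which spelling the reviewer had in mind. Note that operator.contains(a, b) tests `b in a`, so the container has to be bound as the first argument.

```python
import operator
from functools import partial

# Toy stand-ins (assumption): a plain set for the subset, a list of heads.
subset = {1, 2, 3, 5, 8}
heads = [2, 4, 8]

# The form used in the patch: a Python-level lambda called once per item.
in_subset_lambda = lambda r: r in subset

# operator.contains(subset, r) is equivalent to "r in subset".
in_subset_partial = partial(operator.contains, subset)

# The bound method avoids the extra Python-level function call as well.
in_subset_bound = subset.__contains__

results = [
    [r for r in heads if check(r)]
    for check in (in_subset_lambda, in_subset_partial, in_subset_bound)
]
```

All three predicates select the same heads; the last two simply avoid executing a Python function body for every element.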
Lucas Moscovicz - March 13, 2014, 9:41 p.m.
On 3/13/14, 2:32 PM, "Durham Goode" <durham@fb.com> wrote:

>diff --git a/mercurial/revset.py b/mercurial/revset.py
>--- a/mercurial/revset.py
>+++ b/mercurial/revset.py
>@@ -941,7 +941,7 @@
>     hs = set()
>     for b, ls in repo.branchmap().iteritems():
>         hs.update(repo[h].rev() for h in ls)
>-    return subset.filter(lambda r: r in hs)
>+    return lazyset(list(hs), lambda r: r in subset)

Here you should use baseset(hs) instead of list(hs): any method called on
that lazyset may have to be forwarded to the inner structure, so we want the
inner structure to be one of our own classes.
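
A toy sketch of the concern, under loud assumptions: ToyBaseSet and ToyLazySet below are hypothetical stand-ins, not the real baseset and lazyset from mercurial/revset.py, and the forwarded method name set() is chosen only for illustration. The point is that a wrapper which forwards calls to its inner container breaks when that container is a plain list lacking the project's extra methods.

```python
class ToyBaseSet(list):
    """Hypothetical project-owned list type with an extra set() method."""

    def set(self):
        # Cache a set view of the contents for fast membership tests.
        if not hasattr(self, "_set"):
            self._set = set(self)
        return self._set


class ToyLazySet(object):
    """Hypothetical lazy filter that forwards some calls to its inner container."""

    def __init__(self, inner, condition):
        self._inner = inner
        self._condition = condition

    def __iter__(self):
        return (r for r in self._inner if self._condition(r))

    def set(self):
        # Forwarded call: works only if the inner container defines set().
        return set(r for r in self._inner.set() if self._condition(r))


heads = [3, 1, 2]
ok = ToyLazySet(ToyBaseSet(heads), lambda r: r > 1).set()

try:
    ToyLazySet(heads, lambda r: r > 1).set()  # plain list has no set()
    plain_list_failed = False
except AttributeError:
    plain_list_failed = True
```

With the project-owned container the forwarded call succeeds; with a plain list it raises AttributeError.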

> 
> def heads(repo, subset, x):
>     """``heads(set)``

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -941,7 +941,7 @@ 
     hs = set()
     for b, ls in repo.branchmap().iteritems():
         hs.update(repo[h].rev() for h in ls)
-    return subset.filter(lambda r: r in hs)
+    return lazyset(list(hs), lambda r: r in subset)
 
 def heads(repo, subset, x):
     """``heads(set)``