Patchwork [7,of,8] revset: detect integer list on parsing

login
register
mail settings
Submitter Boris Feld
Date Jan. 11, 2019, 11:29 a.m.
Message ID <4a56fbdacff33c3985bb.1547206149@Laptop-Boris.lan>
Download mbox | patch
Permalink /patch/37663/
State Accepted
Headers show

Comments

Boris Feld - Jan. 11, 2019, 11:29 a.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1546575973 -3600
#      Fri Jan 04 05:26:13 2019 +0100
# Node ID 4a56fbdacff33c3985bbb84f2e19ddfbd48ed4fa
# Parent  438ea9b8a44c181d62741338c1d16b2031fdda41
# EXP-Topic revs-efficiency
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 4a56fbdacff3
revset: detect integer list on parsing

Right now, using "%ld" with `repo.revs("…%ld…", somerevs)` is very
inefficient, all items in `somerevs` will be serialized to ascii and then
reparsed as integers. If `somerevs` contains just an handful of entry this is
fine, however, when you get to thousands or hundreds of thousands of revisions
this becomes very slow.

To avoid this serialization we need to first detect this situation. The code
involved in the whole process is quite complex so we start simple and focus on
some "simple" but widespread cases.

So far we only detect the situation and don't do anything special about it.
The singled out will be serialized in `formatspec` in the same way as before.
Yuya Nishihara - Jan. 12, 2019, 4:38 a.m.
On Fri, 11 Jan 2019 12:29:09 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1546575973 -3600
> #      Fri Jan 04 05:26:13 2019 +0100
> # Node ID 4a56fbdacff33c3985bbb84f2e19ddfbd48ed4fa
> # Parent  438ea9b8a44c181d62741338c1d16b2031fdda41
> # EXP-Topic revs-efficiency
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 4a56fbdacff3
> revset: detect integer list on parsing

> +            if islist and d == 'd' and arg:
> +                # special case, we might be able to speedup the list of int case
> +                safeinputtype = (list, tuple, set, smartset.abstractsmartset)
> +                if isinstance(arg, safeinputtype):

Just curious. What's the unsafe type for example? I think 'arg' may be an
iterator/generator of ints, but which can be safely converted to list/baseset.

> +                    # we don't create a baseset yet, because it come with an
> +                    # extra cost. If we are going to serialize it we better
> +                    # skip it.
> +                    ret.append(('baseset', arg))
> +                    pos += 1
> +                    continue
Boris Feld - Jan. 13, 2019, 7:38 a.m.
On 12/01/2019 05:38, Yuya Nishihara wrote:
> On Fri, 11 Jan 2019 12:29:09 +0100, Boris Feld wrote:
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1546575973 -3600
>> #      Fri Jan 04 05:26:13 2019 +0100
>> # Node ID 4a56fbdacff33c3985bbb84f2e19ddfbd48ed4fa
>> # Parent  438ea9b8a44c181d62741338c1d16b2031fdda41
>> # EXP-Topic revs-efficiency
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 4a56fbdacff3
>> revset: detect integer list on parsing
>> +            if islist and d == 'd' and arg:
>> +                # special case, we might be able to speedup the list of int case
>> +                safeinputtype = (list, tuple, set, smartset.abstractsmartset)
>> +                if isinstance(arg, safeinputtype):
> Just curious. What's the unsafe type for example? I think 'arg' may be an
> iterator/generator of ints, but which can be safely converted to list/baseset.

Honestly, I'm not sure.

The code around revset is quite complex and I want to focus on having a
first thing working. The overall speedup from this change is so massive
that I rather have a very narrow version of it quickly and expand it later.

>
>> +                    # we don't create a baseset yet, because it come with an
>> +                    # extra cost. If we are going to serialize it we better
>> +                    # skip it.
>> +                    ret.append(('baseset', arg))
>> +                    pos += 1
>> +                    continue
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Jan. 13, 2019, 9:17 a.m.
On Sun, 13 Jan 2019 08:38:47 +0100, Boris FELD wrote:
> On 12/01/2019 05:38, Yuya Nishihara wrote:
> > On Fri, 11 Jan 2019 12:29:09 +0100, Boris Feld wrote:
> >> # HG changeset patch
> >> # User Boris Feld <boris.feld@octobus.net>
> >> # Date 1546575973 -3600
> >> #      Fri Jan 04 05:26:13 2019 +0100
> >> # Node ID 4a56fbdacff33c3985bbb84f2e19ddfbd48ed4fa
> >> # Parent  438ea9b8a44c181d62741338c1d16b2031fdda41
> >> # EXP-Topic revs-efficiency
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 4a56fbdacff3
> >> revset: detect integer list on parsing
> >> +            if islist and d == 'd' and arg:
> >> +                # special case, we might be able to speedup the list of int case
> >> +                safeinputtype = (list, tuple, set, smartset.abstractsmartset)
> >> +                if isinstance(arg, safeinputtype):
> > Just curious. What's the unsafe type for example? I think 'arg' may be an
> > iterator/generator of ints, but which can be safely converted to list/baseset.
> 
> Honestly, I'm not sure.
> 
> The code around revset is quite complex and I want to focus on having a
> first thing working. The overall speedup from this change is so massive
> that I rather have a very narrow version of it quickly and expand it later.

Can you add inline comment then?

Testing these types would give false notion that there are other types this
function has to support.

Patch

diff --git a/mercurial/revsetlang.py b/mercurial/revsetlang.py
--- a/mercurial/revsetlang.py
+++ b/mercurial/revsetlang.py
@@ -15,6 +15,7 @@  from . import (
     node,
     parser,
     pycompat,
+    smartset,
     util,
 )
 from .utils import (
@@ -682,6 +683,13 @@  def formatspec(expr, *args):
     for t, arg in parsed:
         if t is None:
             ret.append(arg)
+        if t == 'baseset':
+            if isinstance(arg, set):
+                arg = sorted(arg)
+            try:
+                ret.append(_formatintlist(list(arg)))
+            except (TypeError, ValueError):
+                raise error.ParseError(_('invalid argument for revspec'))
     return b''.join(ret)
 
 def _parseargs(expr, args):
@@ -690,7 +698,8 @@  def _parseargs(expr, args):
     return a list of tuple [(arg-type, arg-value)]
 
     Arg-type can be:
-    * None: a string ready to be concatenated into a final spec
+    * None:      a string ready to be concatenated into a final spec
+    * 'baseset': an iterable of revisions
     """
     expr = pycompat.bytestr(expr)
     argiter = iter(args)
@@ -720,10 +729,21 @@  def _parseargs(expr, args):
         if f:
             # a list of some type, might be expensive, do not replace
             pos += 1
+            islist = (d == 'l')
             try:
                 d = expr[pos]
             except IndexError:
                 raise error.ParseError(_('incomplete revspec format character'))
+            if islist and d == 'd' and arg:
+                # special case, we might be able to speedup the list of int case
+                safeinputtype = (list, tuple, set, smartset.abstractsmartset)
+                if isinstance(arg, safeinputtype):
+                    # we don't create a baseset yet, because it come with an
+                    # extra cost. If we are going to serialize it we better
+                    # skip it.
+                    ret.append(('baseset', arg))
+                    pos += 1
+                    continue
             try:
                 ret.append((None, f(list(arg), d)))
             except (TypeError, ValueError):