Patchwork [07,of,10] phabricator: try to fetch differential revisions in batch

login
register
mail settings
Submitter Jun Wu
Date July 5, 2017, 1:58 a.m.
Message ID <6c71bc31de7ef4d70e7b.1499219912@x1c>
Download mbox | patch
Permalink /patch/22003/
State Accepted
Headers show

Comments

Jun Wu - July 5, 2017, 1:58 a.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1499211408 25200
#      Tue Jul 04 16:36:48 2017 -0700
# Node ID 6c71bc31de7ef4d70e7b4322d433d90035621156
# Parent  992fc5ed028d5312eb96acd360b6cc6611cc76bd
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 6c71bc31de7e
phabricator: try to fetch differential revisions in batch

Previously, we read Differential Revisions one by one by calling
`differential.query`.

Fetching them one by one is suboptimal. Unfortunately, there is no Conduit
API that allows us to get a stack of diffids using a single API call.

This patch tries to be smarter using a simple heuristic: when fetching D59
as a stack, previous IDs like D51, D52, D53, ..., D58 are likely belonging
to a same stack so just fetch them as well. Since `differential.query` only
returns cheap metadata without expensive diff content, it shouldn't be a big
problem for the server.

Using a test Phabricator instance, this patch reduces `phabread` reading a
10 patch stack from about 13 to 30 seconds to 8 seconds.
Phillip Cohen - July 5, 2017, 5:45 a.m.
> when fetching D59
as a stack, previous IDs like D51, D52, D53, ..., D58 are likely belonging
to a same stack so just fetch them as well

I look forward to the day when we have enough contributors that this
is unlikely to be true :)

On Tue, Jul 4, 2017 at 6:58 PM, Jun Wu <quark@fb.com> wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1499211408 25200
> #      Tue Jul 04 16:36:48 2017 -0700
> # Node ID 6c71bc31de7ef4d70e7b4322d433d90035621156
> # Parent  992fc5ed028d5312eb96acd360b6cc6611cc76bd
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 6c71bc31de7e
> phabricator: try to fetch differential revisions in batch
>
> Previously, we read Differential Revisions one by one by calling
> `differential.query`.
>
> Fetching them one by one is suboptimal. Unfortunately, there is no Conduit
> API that allows us to get a stack of diffids using a single API call.
>
> This patch tries to be smarter using a simple heuristic: when fetching D59
> as a stack, previous IDs like D51, D52, D53, ..., D58 are likely belonging
> to a same stack so just fetch them as well. Since `differential.query` only
> returns cheap metadata without expensive diff content, it shouldn't be a big
> problem for the server.
>
> Using a test Phabricator instance, this patch reduces `phabread` reading a
> 10 patch stack from about 13 to 30 seconds to 8 seconds.
>
> diff --git a/contrib/phabricator.py b/contrib/phabricator.py
> --- a/contrib/phabricator.py
> +++ b/contrib/phabricator.py
> @@ -359,12 +359,32 @@ def querydrev(repo, params, stack=False)
>      list of a unique "Differential Revision dict".
>      """
> +    prefetched = {} # {id or phid: drev}
> +    def fetch(params):
> +        """params -> single drev or None"""
> +        key = (params.get(r'ids') or params.get(r'phids') or [None])[0]
> +        if key in prefetched:
> +            return prefetched[key]
> +        # Otherwise, send the request. If we're fetching a stack, be smarter
> +        # and fetch more ids in one batch, even if it could be unnecessary.
> +        batchparams = params
> +        if stack and len(params.get(r'ids', [])) == 1:
> +            i = int(params[r'ids'][0])
> +            # developer config: phabricator.batchsize
> +            batchsize = repo.ui.configint('phabricator', 'batchsize', 12)
> +            batchparams = {'ids': range(max(1, i - batchsize), i + 1)}
> +        drevs = callconduit(repo, 'differential.query', batchparams)
> +        # Fill prefetched with the result
> +        for drev in drevs:
> +            prefetched[drev[r'phid']] = drev
> +            prefetched[int(drev[r'id'])] = drev
> +        if key not in prefetched:
> +            raise error.Abort(_('cannot get Differential Revision %r') % params)
> +        return prefetched[key]
> +
>      result = []
>      queue = [params]
>      while queue:
>          params = queue.pop()
> -        drevs = callconduit(repo, 'differential.query', params)
> -        if len(drevs) != 1:
> -            raise error.Abort(_('cannot get Differential Revision %r') % params)
> -        drev = drevs[0]
> +        drev = fetch(params)
>          result.append(drev)
>          if stack:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Augie Fackler - July 5, 2017, 5:27 p.m.
On Tue, Jul 04, 2017 at 06:58:32PM -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1499211408 25200
> #      Tue Jul 04 16:36:48 2017 -0700
> # Node ID 6c71bc31de7ef4d70e7b4322d433d90035621156
> # Parent  992fc5ed028d5312eb96acd360b6cc6611cc76bd
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 6c71bc31de7e
> phabricator: try to fetch differential revisions in batch

Taken patches 1-7, will meditate more on 8 in a little bit here.

>
> Previously, we read Differential Revisions one by one by calling
> `differential.query`.
>
> Fetching them one by one is suboptimal. Unfortunately, there is no Conduit
> API that allows us to get a stack of diffids using a single API call.
>
> This patch tries to be smarter using a simple heuristic: when fetching D59
> as a stack, previous IDs like D51, D52, D53, ..., D58 are likely belonging
> to a same stack so just fetch them as well. Since `differential.query` only
> returns cheap metadata without expensive diff content, it shouldn't be a big
> problem for the server.
>
> Using a test Phabricator instance, this patch reduces `phabread` reading a
> 10 patch stack from about 13 to 30 seconds to 8 seconds.
>
> diff --git a/contrib/phabricator.py b/contrib/phabricator.py
> --- a/contrib/phabricator.py
> +++ b/contrib/phabricator.py
> @@ -359,12 +359,32 @@ def querydrev(repo, params, stack=False)
>      list of a unique "Differential Revision dict".
>      """
> +    prefetched = {} # {id or phid: drev}
> +    def fetch(params):
> +        """params -> single drev or None"""
> +        key = (params.get(r'ids') or params.get(r'phids') or [None])[0]
> +        if key in prefetched:
> +            return prefetched[key]
> +        # Otherwise, send the request. If we're fetching a stack, be smarter
> +        # and fetch more ids in one batch, even if it could be unnecessary.
> +        batchparams = params
> +        if stack and len(params.get(r'ids', [])) == 1:
> +            i = int(params[r'ids'][0])
> +            # developer config: phabricator.batchsize
> +            batchsize = repo.ui.configint('phabricator', 'batchsize', 12)
> +            batchparams = {'ids': range(max(1, i - batchsize), i + 1)}
> +        drevs = callconduit(repo, 'differential.query', batchparams)
> +        # Fill prefetched with the result
> +        for drev in drevs:
> +            prefetched[drev[r'phid']] = drev
> +            prefetched[int(drev[r'id'])] = drev
> +        if key not in prefetched:
> +            raise error.Abort(_('cannot get Differential Revision %r') % params)
> +        return prefetched[key]
> +
>      result = []
>      queue = [params]
>      while queue:
>          params = queue.pop()
> -        drevs = callconduit(repo, 'differential.query', params)
> -        if len(drevs) != 1:
> -            raise error.Abort(_('cannot get Differential Revision %r') % params)
> -        drev = drevs[0]
> +        drev = fetch(params)
>          result.append(drev)
>          if stack:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/contrib/phabricator.py b/contrib/phabricator.py
--- a/contrib/phabricator.py
+++ b/contrib/phabricator.py
@@ -359,12 +359,32 @@  def querydrev(repo, params, stack=False)
     list of a unique "Differential Revision dict".
     """
+    prefetched = {} # {id or phid: drev}
+    def fetch(params):
+        """params -> single drev or None"""
+        key = (params.get(r'ids') or params.get(r'phids') or [None])[0]
+        if key in prefetched:
+            return prefetched[key]
+        # Otherwise, send the request. If we're fetching a stack, be smarter
+        # and fetch more ids in one batch, even if it could be unnecessary.
+        batchparams = params
+        if stack and len(params.get(r'ids', [])) == 1:
+            i = int(params[r'ids'][0])
+            # developer config: phabricator.batchsize
+            batchsize = repo.ui.configint('phabricator', 'batchsize', 12)
+            batchparams = {'ids': range(max(1, i - batchsize), i + 1)}
+        drevs = callconduit(repo, 'differential.query', batchparams)
+        # Fill prefetched with the result
+        for drev in drevs:
+            prefetched[drev[r'phid']] = drev
+            prefetched[int(drev[r'id'])] = drev
+        if key not in prefetched:
+            raise error.Abort(_('cannot get Differential Revision %r') % params)
+        return prefetched[key]
+
     result = []
     queue = [params]
     while queue:
         params = queue.pop()
-        drevs = callconduit(repo, 'differential.query', params)
-        if len(drevs) != 1:
-            raise error.Abort(_('cannot get Differential Revision %r') % params)
-        drev = drevs[0]
+        drev = fetch(params)
         result.append(drev)
         if stack: