Patchwork [6,of,8] changegroup: refactor getsubsetraw into getchangegroupchunks (API)

login
register
mail settings
Submitter Gregory Szorc
Date Aug. 5, 2016, 3:17 a.m.
Message ID <0270c4c4c1aef296d040.1470367022@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/16110/
State Accepted
Headers show

Comments

Gregory Szorc - Aug. 5, 2016, 3:17 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1470364484 25200
#      Thu Aug 04 19:34:44 2016 -0700
# Node ID 0270c4c4c1aef296d0405950a34bdef3aaa01f30
# Parent  d9a7b7fc58081021ec6f7401300ac11998994a37
changegroup: refactor getsubsetraw into getchangegroupchunks (API)

getsubsetraw() was /almost/ a generic and generally useful API.
However, it wasn't ideal because it required passing in a bundler
(which can be constructed from a repo, version, and bundle caps).
And, there was no mechanism to pass extra metadata back to the
caller.

This patch effectively refactors getsubsetraw() into
getchangegroupchunks() and replaces callers of getsubsetraw() with
the new function.

The patch is marked as API because getsubsetraw() has been removed.
(The only in-repo consumer was changegroup.py.)
Pierre-Yves David - Aug. 6, 2016, 12:47 a.m.
On 08/05/2016 05:17 AM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1470364484 25200
> #      Thu Aug 04 19:34:44 2016 -0700
> # Node ID 0270c4c4c1aef296d0405950a34bdef3aaa01f30
> # Parent  d9a7b7fc58081021ec6f7401300ac11998994a37
> changegroup: refactor getsubsetraw into getchangegroupchunks (API)
>
> getsubsetraw() was /almost/ a generic and generally useful API.
> However, it wasn't ideal because it required passing in a bundler
> (which can be constructed from a repo, version, and bundle caps).
> And, there was no mechanism to pass extra metadata back to the
> caller.

I'm not sure what you intend to use these meta data for. Can you elaborate?

(In all case, I've feedback about the implementation of these metadata, 
cf inline comment).


> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -907,36 +907,66 @@ def getunbundler(version, fh, alg, extra
>  def _changegroupinfo(repo, nodes, source):
>      if repo.ui.verbose or source == 'bundle':
>          repo.ui.status(_("%d changesets found\n") % len(nodes))
>      if repo.ui.debugflag:
>          repo.ui.debug("list of changesets:\n")
>          for node in nodes:
>              repo.ui.debug("%s\n" % hex(node))
>
> -def getsubsetraw(repo, outgoing, bundler, source, fastpath=False):
> +def getchangegroupchunks(repo, outgoing, version, source,
> +                         fastpathlinkrev=False, bundlecaps=None):
> +    """Obtain changegroup data as a generator of chunks.
> +
> +    This is the preferred mechanism for obtaining changegroup data.
> +
> +    Receives a localrepo, a ``discovery.outgoing``, a changegroup version
> +    string (e.g. ``01``), and a ``source`` string indicating where the data
> +    is coming from (will be used in hooks). The caller can also specify
> +    whether to use fast path linkrev lookup and optional bundle capabilities
> +    supported by the consumer.
> +
> +    ``preoutgoing`` and ``outgoing`` hooks will be fired.
> +
> +    Returns a dict with metadata describing the returned changegroup and
> +    a generator of raw changegroup chunks. The 2nd value may be ``None``
> +    if the changegroup would be empty.
> +
> +    The metadata dict contains the following keys:
> +
> +    csetcount
> +       Integer number of changesets in the changegroup.
> +    """
> +    metadata = {
> +        'csetcount': len(outgoing.missing),
> +    }

1) caller pass an outgoing object. I'm confused about why they would 
need to receive a similar information in return
2) the metadata use 'csetcount', you pass it to getunbundler as 
'clcount' and bundle2 part uses 'nbchanges'. We should probably unify 
all this.
3) last, but not least. I'm really not enthousiastic about this pair 
returns, if we are to return any meta data could we make them attributes 
of the returned object? As pointed in a previous series, using 
'whatever.outgoing' to access data about the changegroup content seems 
like the way to go to me.

Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -907,36 +907,66 @@  def getunbundler(version, fh, alg, extra
 def _changegroupinfo(repo, nodes, source):
     if repo.ui.verbose or source == 'bundle':
         repo.ui.status(_("%d changesets found\n") % len(nodes))
     if repo.ui.debugflag:
         repo.ui.debug("list of changesets:\n")
         for node in nodes:
             repo.ui.debug("%s\n" % hex(node))
 
-def getsubsetraw(repo, outgoing, bundler, source, fastpath=False):
+def getchangegroupchunks(repo, outgoing, version, source,
+                         fastpathlinkrev=False, bundlecaps=None):
+    """Obtain changegroup data as a generator of chunks.
+
+    This is the preferred mechanism for obtaining changegroup data.
+
+    Receives a localrepo, a ``discovery.outgoing``, a changegroup version
+    string (e.g. ``01``), and a ``source`` string indicating where the data
+    is coming from (will be used in hooks). The caller can also specify
+    whether to use fast path linkrev lookup and optional bundle capabilities
+    supported by the consumer.
+
+    ``preoutgoing`` and ``outgoing`` hooks will be fired.
+
+    Returns a dict with metadata describing the returned changegroup and
+    a generator of raw changegroup chunks. The 2nd value may be ``None``
+    if the changegroup would be empty.
+
+    The metadata dict contains the following keys:
+
+    csetcount
+       Integer number of changesets in the changegroup.
+    """
+    metadata = {
+        'csetcount': len(outgoing.missing),
+    }
+
+    if not outgoing.missing:
+        return metadata, None
+
     repo = repo.unfiltered()
-    commonrevs = outgoing.common
-    csets = outgoing.missing
-    heads = outgoing.missingheads
+
     # We go through the fast path if we get told to, or if all (unfiltered
     # heads have been requested (since we then know there all linkrevs will
     # be pulled by the client).
-    heads.sort()
-    fastpathlinkrev = fastpath or (
-            repo.filtername is None and heads == sorted(repo.heads()))
+    fastpathlinkrev = fastpathlinkrev or (
+        set(repo.heads()) == set(outgoing.missingheads))
 
     repo.hook('preoutgoing', throw=True, source=source)
-    _changegroupinfo(repo, csets, source)
-    return bundler.generate(commonrevs, csets, fastpathlinkrev, source)
+    _changegroupinfo(repo, outgoing.missing, source)
+    bundler = getbundler(version, repo, bundlecaps=bundlecaps)
+    chunks = bundler.generate(outgoing.common, outgoing.missing,
+                              fastpathlinkrev, source)
+    return metadata, chunks
 
 def getsubset(repo, outgoing, bundler, source, fastpath=False):
-    gengroup = getsubsetraw(repo, outgoing, bundler, source, fastpath)
-    return getunbundler(bundler.version, util.chunkbuffer(gengroup), None,
-                        {'clcount': len(outgoing.missing)})
+    m, gen = getchangegroupchunks(repo, outgoing, bundler.version, source,
+                                  fastpathlinkrev=fastpath)
+    return getunbundler(bundler.version, util.chunkbuffer(gen), None,
+                        {'clcount': m['csetcount']})
 
 def changegroupsubset(repo, roots, heads, source, version='01'):
     """Compute a changegroup consisting of all the nodes that are
     descendants of any of the roots and ancestors of any of the heads.
     Return a chunkbuffer object whose read() method will return
     successive changegroup chunks.
 
     It is fairly complex as determining which filenodes and which
@@ -951,20 +981,18 @@  def changegroupsubset(repo, roots, heads
     return getsubset(repo, outgoing, bundler, source)
 
 def getlocalchangegroupraw(repo, source, outgoing, bundlecaps=None,
                            version='01'):
     """Like getbundle, but taking a discovery.outgoing as an argument.
 
     This is only implemented for local repos and reuses potentially
     precomputed sets in outgoing. Returns a raw changegroup generator."""
-    if not outgoing.missing:
-        return None
-    bundler = getbundler(version, repo, bundlecaps)
-    return getsubsetraw(repo, outgoing, bundler, source)
+    return getchangegroupchunks(repo, outgoing, version, source,
+                                bundlecaps=bundlecaps)[1]
 
 def getlocalchangegroup(repo, source, outgoing, bundlecaps=None,
                         version='01'):
     """Like getbundle, but taking a discovery.outgoing as an argument.
 
     This is only implemented for local repos and reuses potentially
     precomputed sets in outgoing."""
     if not outgoing.missing: