[4 of 8] bundle: refactor changegroup prune to be its own function

Submitter Durham Goode
Date May 31, 2013, 5:19 p.m.
Message ID <036972b09c16295c0008.1370020786@dev350.prn1.facebook.com>
Permalink /patch/1687/
State Accepted
Commit 6ea1f858efd9e58a8518630b948c2810e5b844ce

Comments

Durham Goode - May 31, 2013, 5:19 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1369961473 25200
#      Thu May 30 17:51:13 2013 -0700
# Node ID 036972b09c16295c000847ba359193858e7b3a4d
# Parent  66c552d6910908070552d1a1c41d729932b8b111
bundle: refactor changegroup prune to be its own function

Moving the prune function to be a non-nested function allows extensions to
control which revisions are allowed in the changegroup. For example, in my
shallow repo extension I want to prevent filelogs from being added to the
bundle.

This also allows an extension to use a filelog implementation that doesn't
have revlog.linkrev implemented.
Matt Mackall - June 3, 2013, 8:22 p.m.
On Fri, 2013-05-31 at 10:19 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1369961473 25200
> #      Thu May 30 17:51:13 2013 -0700
> # Node ID 036972b09c16295c000847ba359193858e7b3a4d
> # Parent  66c552d6910908070552d1a1c41d729932b8b111
> bundle: refactor changegroup prune to be its own function

Queued for default, thanks.
Peter Arrenbrecht - June 18, 2013, 5:19 p.m.
Hi Durham, are any details on your shallow repo extension available
somewhere?

Thanks,
--peter


Durham Goode - June 18, 2013, 5:37 p.m.
Unfortunately no. The extension is still in development and isn't publicly
available yet. I'm hopeful that I can open source it for feedback sometime
in Q3 (i.e. between now and September).

The basic gist of the extension is to keep all the filelog history
remotely and fetch it on demand, while keeping the entire changelog and
manifest history locally so most commands don't need to hit the server.
This has several performance benefits on large repositories (clone time,
pull time, rebase/amend time) and I'm treating file revisions as key/value
pairs so it makes caching and storage easier.

Durham
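
The fetch-on-demand idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the extension is not public, so RemoteFileStore, OnDemandFileStore, and the (path, node) key layout are hypothetical names invented here, not the extension's actual design.

```python
class RemoteFileStore:
    """Hypothetical server side: file revisions stored as (path, node) -> data."""

    def __init__(self, blobs):
        self._blobs = dict(blobs)
        self.requests = 0  # counts simulated server round trips

    def get(self, path, node):
        self.requests += 1
        return self._blobs[(path, node)]


class OnDemandFileStore:
    """Hypothetical client side: fetch a file revision only when it is first read."""

    def __init__(self, remote):
        self._remote = remote
        self._cache = {}  # local key/value cache of revisions already fetched

    def read(self, path, node):
        key = (path, node)
        if key not in self._cache:
            # Cache miss: this is the only place that contacts the server.
            self._cache[key] = self._remote.get(path, node)
        return self._cache[key]


remote = RemoteFileStore({("a.txt", "n1"): b"hello"})
store = OnDemandFileStore(remote)
first = store.read("a.txt", "n1")   # goes to the server
second = store.read("a.txt", "n1")  # served from the local cache
```

Because each file revision is addressed by a plain key, the cache can be an ordinary dictionary, which is one way to read the "key/value pairs" remark.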

Peter Arrenbrecht - June 21, 2013, 12:21 p.m.
Interesting. Are you considering a mode where folks can explicitly say they
want everything locally that descends from a specific set of revisions?
Then it would not only help performance but also help folks who want to
work offline on a large repo.
--peter


Durham Goode - June 21, 2013, 4:51 p.m.
I hadn't considered it, but it would be trivial to add a command like 'hg
prefetch -r <revset>' to allow people to fetch certain chunks of the repo.
Good idea.

Right now I keep a local cache of the information you've touched recently.
So you can work offline amongst your existing local commits and any
server commit that you've updated to recently. I'll have to play around
with the prefetch and eviction policies to get the best user experience,
but being able to continue working while the server is down is definitely
a priority.

Durham
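
One possible shape for the prefetch and eviction policies mentioned above is a least-recently-used cache with an explicit prefetch call. This is a minimal sketch, not the extension's behaviour: LRUFileCache, its capacity parameter, and the fake_fetch callback are all hypothetical, and LRU is just one eviction policy that could be tried.

```python
from collections import OrderedDict


class LRUFileCache:
    """Hypothetical local cache that evicts the least recently used entry."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch        # callable: key -> data (stands in for a server request)
        self._capacity = capacity  # maximum number of cached entries
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = self._fetch(key)
        self._store(key, data)
        return data

    def prefetch(self, keys):
        # Pull entries in ahead of time so later reads can work offline.
        for key in keys:
            if key not in self._entries:
                self._store(key, self._fetch(key))

    def _store(self, key, data):
        self._entries[key] = data
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def cached_keys(self):
        return list(self._entries)


fetched = []


def fake_fetch(key):
    fetched.append(key)
    return "data:" + key


cache = LRUFileCache(fake_fetch, capacity=2)
cache.prefetch(["a", "b"])  # both fetched up front
cache.get("a")              # cache hit, "a" becomes most recently used
cache.get("c")              # miss: fetches "c" and evicts "b"
```

A command such as the proposed 'hg prefetch -r <revset>' would presumably resolve the revset to a list of keys and hand them to something like prefetch.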


Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -296,6 +296,11 @@ 
 
         yield self.close()
 
+    # filter any nodes that claim to be part of the known set
+    def prune(self, revlog, missing, commonrevs, source):
+        rr, rl = revlog.rev, revlog.linkrev
+        return [n for n in missing if rl(rr(n)) not in commonrevs]
+
     def generate(self, commonrevs, clnodes, fastpathlinkrev, source):
         '''yield a sequence of changegroup chunks (strings)'''
         repo = self._repo
@@ -311,11 +316,6 @@ 
         fnodes = {} # needed file nodes
         changedfiles = set()
 
-        # filter any nodes that claim to be part of the known set
-        def prune(revlog, missing):
-            rr, rl = revlog.rev, revlog.linkrev
-            return [n for n in missing if rl(rr(n)) not in commonrevs]
-
         # Callback for the changelog, used to collect changed files and manifest
         # nodes.
         # Returns the linkrev node (identity in the changelog case).
@@ -347,7 +347,7 @@ 
 
         for f in changedfiles:
             fnodes[f] = {}
-        mfnodes = prune(mf, mfs)
+        mfnodes = self.prune(mf, mfs, commonrevs, source)
         for chunk in self.group(mfnodes, mf, lookupmf, units=_('manifests'),
                                 reorder=reorder):
             yield chunk
@@ -377,7 +377,7 @@ 
             def lookupfilelog(x):
                 return linkrevnodes[x]
 
-            filenodes = prune(filerevlog, linkrevnodes)
+            filenodes = self.prune(filerevlog, linkrevnodes, commonrevs, source)
             if filenodes:
                 progress(msgbundling, i + 1, item=fname, unit=msgfiles,
                          total=total)
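
With prune exposed as a method, an extension could in principle override it to keep file revisions out of the changegroup, as the commit message describes. The sketch below is self-contained and does not import Mercurial: ToyRevlog, ToyFilelog, Bundler, and ShallowBundler are hypothetical stand-ins, and only the body of Bundler.prune is taken from the patch.

```python
class ToyRevlog:
    """Hypothetical stand-in for a revlog: node -> rev -> linkrev."""

    def __init__(self, entries):
        # entries is a list of (node, linkrev); the list index is the rev number
        self._nodes = [node for node, _ in entries]
        self._linkrevs = [linkrev for _, linkrev in entries]

    def rev(self, node):
        return self._nodes.index(node)

    def linkrev(self, rev):
        return self._linkrevs[rev]


class ToyFilelog(ToyRevlog):
    """Marker subclass so the override can tell filelogs from manifests."""


class Bundler:
    # Same filtering logic as the prune method added by the patch.
    def prune(self, revlog, missing, commonrevs, source):
        rr, rl = revlog.rev, revlog.linkrev
        return [n for n in missing if rl(rr(n)) not in commonrevs]


class ShallowBundler(Bundler):
    # Hypothetical override: send no file revisions at all, and never call
    # linkrev on a filelog, so a filelog without linkrev would still work.
    def prune(self, revlog, missing, commonrevs, source):
        if isinstance(revlog, ToyFilelog):
            return []
        return super().prune(revlog, missing, commonrevs, source)


manifest = ToyRevlog([("m0", 0), ("m1", 1), ("m2", 2)])
filelog = ToyFilelog([("f0", 0), ("f1", 2)])
commonrevs = {0, 1}  # changelog revisions the receiver already has

base_files = Bundler().prune(filelog, ["f0", "f1"], commonrevs, "bundle")
shallow_manifests = ShallowBundler().prune(
    manifest, ["m0", "m1", "m2"], commonrevs, "bundle")
shallow_files = ShallowBundler().prune(filelog, ["f0", "f1"], commonrevs, "bundle")
```

In this toy setup the base class still sends the file node linked to revision 2, while the override sends manifests as before and drops all file nodes.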