Patchwork [01,of,10] revlog: add clone method

login
register
mail settings
Submitter Gregory Szorc
Date Nov. 6, 2016, 4:40 a.m.
Message ID <ebbd8d975e4bf59b2bdd.1478407217@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/17363/
State Changes Requested
Headers show

Comments

Gregory Szorc - Nov. 6, 2016, 4:40 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1478405835 25200
#      Sat Nov 05 21:17:15 2016 -0700
# Node ID ebbd8d975e4bf59b2bdd44736fdf13222988d1a4
# Parent  f01367faa792635ad2f7a6b175ae3252292b5121
revlog: add clone method

Upcoming patches will introduce functionality for in-place
repository/store "upgrades." Copying the contents of a revlog
feels sufficiently low-level to warrant being in the revlog
class.
Augie Fackler - Nov. 21, 2016, 8:37 p.m.
On Sat, Nov 05, 2016 at 09:40:17PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1478405835 25200
> #      Sat Nov 05 21:17:15 2016 -0700
> # Node ID ebbd8d975e4bf59b2bdd44736fdf13222988d1a4
> # Parent  f01367faa792635ad2f7a6b175ae3252292b5121
> revlog: add clone method
>
> Upcoming patches will introduce functionality for in-place
> repository/store "upgrades." Copying the contents of a revlog
> feels sufficiently low-level to warrant being in the revlog
> class.
>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1818,3 +1818,32 @@ class revlog(object):
>          if not self._inline:
>              res.append(self.datafile)
>          return res
> +
> +    def clone(self, tr, destrevlog, addrevisioncb=None):
> +        """Copy the contents of this revlog to another revlog.
> +
> +        The destination revlog will contain the same revisions and nodes.

It might be worth calling out that this copying is done via repeated
calls to addrevision, so delta recomputation will take place. Which is
to say this is not a cheap clone, but rather a very expensive clone,
as it's doing a full decompress-undelta-delta-compress cycle.

> +        However, it may not be bit-for-bit identical due to e.g. delta encoding
> +        differences.
> +        """
> +        if len(destrevlog):
> +            raise ValueError(_('destination revlog is not empty'))
> +
> +        index = self.index
> +        for rev in self:
> +            entry = index[rev]
> +
> +            # Some classes override linkrev to take filtered revs into
> +            # account. Use raw entry from index.
> +            linkrev = entry[4]
> +            p1 = index[entry[5]][7]
> +            p2 = index[entry[6]][7]
> +            node = entry[7]
> +            # FUTURE we could optionally allow reusing the delta to avoid
> +            # expensive recomputation.
> +            text = self.revision(rev)
> +
> +            destrevlog.addrevision(text, tr, linkrev, p1, p2, node=node)
> +
> +            if addrevisioncb:
> +                addrevisioncb(self, rev, node)
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Gregory Szorc - Nov. 21, 2016, 9:14 p.m.
On Mon, Nov 21, 2016 at 12:37 PM, Augie Fackler <raf@durin42.com> wrote:

> On Sat, Nov 05, 2016 at 09:40:17PM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1478405835 25200
> > #      Sat Nov 05 21:17:15 2016 -0700
> > # Node ID ebbd8d975e4bf59b2bdd44736fdf13222988d1a4
> > # Parent  f01367faa792635ad2f7a6b175ae3252292b5121
> > revlog: add clone method
> >
> > Upcoming patches will introduce functionality for in-place
> > repository/store "upgrades." Copying the contents of a revlog
> > feels sufficiently low-level to warrant being in the revlog
> > class.
> >
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -1818,3 +1818,32 @@ class revlog(object):
> >          if not self._inline:
> >              res.append(self.datafile)
> >          return res
> > +
> > +    def clone(self, tr, destrevlog, addrevisioncb=None):
> > +        """Copy the contents of this revlog to another revlog.
> > +
> > +        The destination revlog will contain the same revisions and
> nodes.
>
> It might be worth calling out that this copying is done via repeated
> calls to addrevision, so delta recomputation will take place. Which is
> to say this is not a cheap clone, but rather a very expensive clone,
> as it's doing a full decompress-undelta-delta-compress cycle.
>

In my unpublished v2 of this series, I've added an argument to clone() that
controls delta reuse. I believe I also changed the default to feed the
cached delta into addrevision(), which will prevent the full delta cycle if
the delta has the revisions that would be chosen by addrevision(). I
believe this is a reasonable default, as the only benefit to a full delta
cycle is running bdiff again and hoping to get a better result. And we
don't need that unless we significantly change bdiff's behavior. Even then,
it is quite cost prohibitive, so I'm fine hiding that behind a --slow
argument or some such.


>
> > +        However, it may not be bit-for-bit identical due to e.g. delta
> encoding
> > +        differences.
> > +        """
> > +        if len(destrevlog):
> > +            raise ValueError(_('destination revlog is not empty'))
> > +
> > +        index = self.index
> > +        for rev in self:
> > +            entry = index[rev]
> > +
> > +            # Some classes override linkrev to take filtered revs into
> > +            # account. Use raw entry from index.
> > +            linkrev = entry[4]
> > +            p1 = index[entry[5]][7]
> > +            p2 = index[entry[6]][7]
> > +            node = entry[7]
> > +            # FUTURE we could optionally allow reusing the delta to
> avoid
> > +            # expensive recomputation.
> > +            text = self.revision(rev)
> > +
> > +            destrevlog.addrevision(text, tr, linkrev, p1, p2, node=node)
> > +
> > +            if addrevisioncb:
> > +                addrevisioncb(self, rev, node)
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Augie Fackler - Nov. 22, 2016, 12:53 a.m.
> On Nov 21, 2016, at 16:14, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> 
> > +        The destination revlog will contain the same revisions and nodes.
> 
> It might be worth calling out that this copying is done via repeated
> calls to addrevision, so delta recomputation will take place. Which is
> to say this is not a cheap clone, but rather a very expensive clone,
> as it's doing a full decompress-undelta-delta-compress cycle.
> 
> In my unpublished v2 of this series, I've added an argument to clone() that controls delta reuse. I believe I also changed the default to feed the cached delta into addrevision(), which will prevent the full delta cycle if the delta has the revisions that would be chosen by addrevision(). I believe this is a reasonable default, as the only benefit to a full delta cycle is running bdiff again and hoping to get a better result. And we don't need that unless we significantly change bdiff's behavior. Even then, it is quite cost prohibitive, so I'm fine hiding that behind a --slow argument or some such.

Even so, it strikes me that on a casual glance I'd probably assume that this was doing a file copy, not iteratively copying revisions.
Pierre-Yves David - Nov. 22, 2016, 2:01 a.m.
On 11/21/2016 10:14 PM, Gregory Szorc wrote:
> On Mon, Nov 21, 2016 at 12:37 PM, Augie Fackler <raf@durin42.com
> <mailto:raf@durin42.com>> wrote:
>
>     On Sat, Nov 05, 2016 at 09:40:17PM -0700, Gregory Szorc wrote:
>     > # HG changeset patch
>     > # User Gregory Szorc <gregory.szorc@gmail.com <mailto:gregory.szorc@gmail.com>>
>     > # Date 1478405835 25200
>     > #      Sat Nov 05 21:17:15 2016 -0700
>     > # Node ID ebbd8d975e4bf59b2bdd44736fdf13222988d1a4
>     > # Parent  f01367faa792635ad2f7a6b175ae3252292b5121
>     > revlog: add clone method
>     >
>     > Upcoming patches will introduce functionality for in-place
>     > repository/store "upgrades." Copying the contents of a revlog
>     > feels sufficiently low-level to warrant being in the revlog
>     > class.
>     >
>     > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
>     > --- a/mercurial/revlog.py
>     > +++ b/mercurial/revlog.py
>     > @@ -1818,3 +1818,32 @@ class revlog(object):
>     >          if not self._inline:
>     >              res.append(self.datafile)
>     >          return res
>     > +
>     > +    def clone(self, tr, destrevlog, addrevisioncb=None):
>     > +        """Copy the contents of this revlog to another revlog.
>     > +
>     > +        The destination revlog will contain the same revisions and nodes.
>
>     It might be worth calling out that this copying is done via repeated
>     calls to addrevision, so delta recomputation will take place. Which is
>     to say this is not a cheap clone, but rather a very expensive clone,
>     as it's doing a full decompress-undelta-delta-compress cycle.
>
>
> In my unpublished v2 of this series, I've added an argument to clone()
> that controls delta reuse. I believe I also changed the default to feed
> the cached delta into addrevision(), which will prevent the full delta
> cycle if the delta has the revisions that would be chosen by
> addrevision(). I believe this is a reasonable default, as the only
> benefit to a full delta cycle is running bdiff again and hoping to get a
> better result. And we don't need that unless we significantly change
> bdiff's behavior. Even then, it is quite cost prohibitive, so I'm fine
> hiding that behind a --slow argument or some such.

+1 for this unpublished change. Being able to "cheaply" upgrade a 
repository to general delta and benefit from it moving forward would 
already be a massive plus. Having a way to do the slow upgrade is useful 
but can come later.

>     > +        However, it may not be bit-for-bit identical due to e.g. delta encoding
>     > +        differences.
>     > +        """
>     > +        if len(destrevlog):
>     > +            raise ValueError(_('destination revlog is not empty'))
>     > +
>     > +        index = self.index
>     > +        for rev in self:
>     > +            entry = index[rev]
>     > +
>     > +            # Some classes override linkrev to take filtered revs into
>     > +            # account. Use raw entry from index.
>     > +            linkrev = entry[4]
>     > +            p1 = index[entry[5]][7]
>     > +            p2 = index[entry[6]][7]
>     > +            node = entry[7]
>     > +            # FUTURE we could optionally allow reusing the delta to avoid
>     > +            # expensive recomputation.
>     > +            text = self.revision(rev)
>     > +
>     > +            destrevlog.addrevision(text, tr, linkrev, p1, p2, node=node)
>     > +
>     > +            if addrevisioncb:
>     > +                addrevisioncb(self, rev, node)

nits: I prefer explicit test for None. This othewise always eventually 
come back to bites you at some point.

I assume the callback is here to allow progress report. It is probably 
worth mentioning it in the patch description.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1818,3 +1818,32 @@  class revlog(object):
         if not self._inline:
             res.append(self.datafile)
         return res
+
+    def clone(self, tr, destrevlog, addrevisioncb=None):
+        """Copy the contents of this revlog to another revlog.
+
+        The destination revlog will contain the same revisions and nodes.
+        However, it may not be bit-for-bit identical due to e.g. delta encoding
+        differences.
+        """
+        if len(destrevlog):
+            raise ValueError(_('destination revlog is not empty'))
+
+        index = self.index
+        for rev in self:
+            entry = index[rev]
+
+            # Some classes override linkrev to take filtered revs into
+            # account. Use raw entry from index.
+            linkrev = entry[4]
+            p1 = index[entry[5]][7]
+            p2 = index[entry[6]][7]
+            node = entry[7]
+            # FUTURE we could optionally allow reusing the delta to avoid
+            # expensive recomputation.
+            text = self.revision(rev)
+
+            destrevlog.addrevision(text, tr, linkrev, p1, p2, node=node)
+
+            if addrevisioncb:
+                addrevisioncb(self, rev, node)