Patchwork [3,of,3] addrevision: use general delta when the incoming base delta is bad

login
register
mail settings
Submitter Pierre-Yves David
Date Dec. 2, 2015, 5:32 p.m.
Message ID <64e48bdff2461e793c96.1449077570@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/11763/
State Accepted
Headers show

Comments

Pierre-Yves David - Dec. 2, 2015, 5:32 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1449015359 28800
#      Tue Dec 01 16:15:59 2015 -0800
# Node ID 64e48bdff2461e793c963d677407d35fc32c30a9
# Parent  49f5bfa7043ae0c797536dab93260aee92fcbaa8
# EXP-Topic generaldelta
# Available At http://hg.netv6.net/marmoute-wip/mercurial/
#              hg pull http://hg.netv6.net/marmoute-wip/mercurial/ -r 64e48bdff246
addrevision: use general delta when the incoming base delta is bad

We unify the delta selection process to be a simple three options process:

- try to use the incoming delta (if lazydeltabase is on)
- try to find a suitable parents to delta against (if gd is on)
- try to delta against the tipmost revision

The first of this option that yield a valid delta will be used.

The test change in 'test-generaldelta.t' show this behavior as we use a delta
against the parent instead of a full delta when the incoming delta is not
suitable.

This as some impact on 'test-bundle.t' because a delta somewhere changes. It
does not seems to change the test semantic and have been ignored.
Martin von Zweigbergk - Dec. 3, 2015, 7:15 a.m.
On Wed, Dec 2, 2015 at 9:56 AM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1449015359 28800
> #      Tue Dec 01 16:15:59 2015 -0800
> # Node ID 64e48bdff2461e793c963d677407d35fc32c30a9
> # Parent  49f5bfa7043ae0c797536dab93260aee92fcbaa8
> # EXP-Topic generaldelta
> # Available At http://hg.netv6.net/marmoute-wip/mercurial/
> #              hg pull http://hg.netv6.net/marmoute-wip/mercurial/ -r
> 64e48bdff246
> addrevision: use general delta when the incoming base delta is bad
>
> We unify the delta selection process to be a simple three options process:
>
> - try to use the incoming delta (if lazydeltabase is on)
> - try to find a suitable parents to delta against (if gd is on)
> - try to delta against the tipmost revision
>
> The first of this option that yield a valid delta will be used.
>
> The test change in 'test-generaldelta.t' show this behavior as we use a
> delta
> against the parent instead of a full delta when the incoming delta is not
> suitable.
>
> This as some impact on 'test-bundle.t' because a delta somewhere changes.
> It
> does not seems to change the test semantic and have been ignored.
>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1422,38 +1422,37 @@ class revlog(object):
>          else:
>              textlen = len(text)
>
>          # should we try to build a delta?
>          if prev != nullrev:
> +            tested = set()
>              if cachedelta and self._generaldelta and self._lazydeltabase:
>                  # Assume what we received from the server is a good choice
>                  # build delta will reuse the cache
>                  candidatedelta = builddelta(cachedelta[0])
> +                tested.add(candidatedelta[3])
>                  if self._isgooddelta(candidatedelta, textlen):
>                      delta = candidatedelta
> -                elif prev != candidatedelta[3]:
> -                    # Try against prev to hopefully save us a fulltext.
> -                    delta = builddelta(prev)
> -            elif self._generaldelta:
> +            if delta is None and self._generaldelta:
>                  parents = [p1r, p2r]
> -                if not self._aggressivemergedeltas:
> +                # exclude already lazy tested base if any
> +                parents = [p for p in parents if p not in tested]
> +                if parents and not self._aggressivemergedeltas:
>                      # Pick whichever parent is closer to us (to minimize
> the
> -                    # chance of having to build a fulltext). Since
> -                    # nullrev == -1, any non-merge commit will always
> pick p1r.
> +                    # chance of having to build a fulltext).
>                      parents = [max(parents)]
>

Before this patch, there were always 2 revisions in parents before this
line, so the nullrev would never be tested only for root commits. After
this patch, nullrev will also be tested for non-merge commits where the
first parent was the incoming delta. I suppose it's unlikely to make a
difference, since those will almost always probably not be good deltas, but
it also seems unnecessary to even check. Is the testing of nullrev even
intentional?



> +                tested.update(parents)
>                  pdeltas = []
>                  for p in parents:
>                      pd = builddelta(p)
>                      if self._isgooddelta(pd, textlen):
>                          pdeltas.append(pd)
>                  if pdeltas:
>                      delta = min(pdeltas, key=lambda x: x[1])
> -                elif prev not in parents:
> -                    # Neither is good, try against prev to hopefully save
> us
> -                    # a fulltext.
> -                    delta = builddelta(prev)
> -            else:
> +            if delta is None and prev not in tested:
> +                # other approach failed try against prev to hopefully
> save us a
> +                # fulltext.
>                  delta = builddelta(prev)
>          if delta is not None:
>              dist, l, data, base, chainbase, chainlen, compresseddeltalen
> = delta
>
>          if not self._isgooddelta(delta, textlen):
> diff --git a/tests/test-bundle.t b/tests/test-bundle.t
> --- a/tests/test-bundle.t
> +++ b/tests/test-bundle.t
> @@ -264,17 +264,17 @@ Cannot produce streaming clone bundles w
>    [255]
>
>  packed1 is produced properly
>
>    $ hg -R test debugcreatestreamclonebundle packed.hg
> -  writing 2663 bytes for 6 files
> +  writing 2667 bytes for 6 files
>    bundle requirements: generaldelta, revlogv1
>
>    $ f -B 64 --size --sha1 --hexdump packed.hg
> -  packed.hg: size=2826, sha1=e139f97692a142b19cdcff64a69697d5307ce6d4
> +  packed.hg: size=2830, sha1=c28255110a88ffa52ddc44985cad295b1ab349bc
>    0000: 48 47 53 31 55 4e 00 00 00 00 00 00 00 06 00 00 |HGS1UN..........|
> -  0010: 00 00 00 00 0a 67 00 16 67 65 6e 65 72 61 6c 64 |.....g..generald|
> +  0010: 00 00 00 00 0a 6b 00 16 67 65 6e 65 72 61 6c 64 |.....k..generald|
>    0020: 65 6c 74 61 2c 72 65 76 6c 6f 67 76 31 00 64 61 |elta,revlogv1.da|
>    0030: 74 61 2f 61 64 69 66 66 65 72 65 6e 74 66 69 6c |ta/adifferentfil|
>
>  generaldelta requirement is listed in stream clone bundles
>
> diff --git a/tests/test-generaldelta.t b/tests/test-generaldelta.t
> --- a/tests/test-generaldelta.t
> +++ b/tests/test-generaldelta.t
> @@ -103,11 +103,11 @@ delta coming from the server base delta
>    $ hg -R usegd debugindex -m
>       rev    offset  length  delta linkrev nodeid       p1           p2
>         0         0     104     -1       0 cef96823c800 000000000000
> 000000000000
>         1       104      57      0       1 58ab9a8d541d cef96823c800
> 000000000000
>         2       161      57      1       2 134fdc6fd680 cef96823c800
> 000000000000
> -       3       218     104     -1       3 723508934dad cef96823c800
> 000000000000
> +       3       218      57      0       3 723508934dad cef96823c800
> 000000000000
>    $ hg -R full debugindex -m
>       rev    offset  length  delta linkrev nodeid       p1           p2
>         0         0     104     -1       0 cef96823c800 000000000000
> 000000000000
>         1       104      57      0       1 58ab9a8d541d cef96823c800
> 000000000000
>         2       161      57      0       2 134fdc6fd680 cef96823c800
> 000000000000
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>
Pierre-Yves David - Dec. 3, 2015, 7:19 a.m.
On 12/02/2015 11:15 PM, Martin von Zweigbergk wrote:
>
>
> On Wed, Dec 2, 2015 at 9:56 AM Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david@fb.com
>     <mailto:pierre-yves.david@fb.com>>
>     # Date 1449015359 28800
>     #      Tue Dec 01 16:15:59 2015 -0800
>     # Node ID 64e48bdff2461e793c963d677407d35fc32c30a9
>     # Parent  49f5bfa7043ae0c797536dab93260aee92fcbaa8
>     # EXP-Topic generaldelta
>     # Available At http://hg.netv6.net/marmoute-wip/mercurial/
>     #              hg pull http://hg.netv6.net/marmoute-wip/mercurial/
>     -r 64e48bdff246
>     addrevision: use general delta when the incoming base delta is bad
>
>     We unify the delta selection process to be a simple three options
>     process:
>
>     - try to use the incoming delta (if lazydeltabase is on)
>     - try to find a suitable parents to delta against (if gd is on)
>     - try to delta against the tipmost revision
>
>     The first of this option that yield a valid delta will be used.
>
>     The test change in 'test-generaldelta.t' show this behavior as we
>     use a delta
>     against the parent instead of a full delta when the incoming delta
>     is not
>     suitable.
>
>     This as some impact on 'test-bundle.t' because a delta somewhere
>     changes. It
>     does not seems to change the test semantic and have been ignored.
>
>     diff --git a/mercurial/revlog.py b/mercurial/revlog.py
>     --- a/mercurial/revlog.py
>     +++ b/mercurial/revlog.py
>     @@ -1422,38 +1422,37 @@ class revlog(object):
>               else:
>                   textlen = len(text)
>
>               # should we try to build a delta?
>               if prev != nullrev:
>     +            tested = set()
>                   if cachedelta and self._generaldelta and
>     self._lazydeltabase:
>                       # Assume what we received from the server is a
>     good choice
>                       # build delta will reuse the cache
>                       candidatedelta = builddelta(cachedelta[0])
>     +                tested.add(candidatedelta[3])
>                       if self._isgooddelta(candidatedelta, textlen):
>                           delta = candidatedelta
>     -                elif prev != candidatedelta[3]:
>     -                    # Try against prev to hopefully save us a fulltext.
>     -                    delta = builddelta(prev)
>     -            elif self._generaldelta:
>     +            if delta is None and self._generaldelta:
>                       parents = [p1r, p2r]
>     -                if not self._aggressivemergedeltas:
>     +                # exclude already lazy tested base if any
>     +                parents = [p for p in parents if p not in tested]
>     +                if parents and not self._aggressivemergedeltas:
>                           # Pick whichever parent is closer to us (to
>     minimize the
>     -                    # chance of having to build a fulltext). Since
>     -                    # nullrev == -1, any non-merge commit will
>     always pick p1r.
>     +                    # chance of having to build a fulltext).
>                           parents = [max(parents)]
>
>
> Before this patch, there were always 2 revisions in parents before this
> line, so the nullrev would never be tested only for root commits. After
> this patch, nullrev will also be tested for non-merge commits where the
> first parent was the incoming delta. I suppose it's unlikely to make a
> difference, since those will almost always probably not be good deltas,
> but it also seems unnecessary to even check. Is the testing of nullrev
> even intentional?

earlier version of this patch was testing for nullrev explicitly, but I 
figured out that builddelta could also we feed nullrev (root commit) and 
therefore was able to handle it.
Martin von Zweigbergk - Dec. 3, 2015, 7:23 a.m.
On Wed, Dec 2, 2015 at 11:19 PM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 12/02/2015 11:15 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Wed, Dec 2, 2015 at 9:56 AM Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> > wrote:
> >
> >     # HG changeset patch
> >     # User Pierre-Yves David <pierre-yves.david@fb.com
> >     <mailto:pierre-yves.david@fb.com>>
> >     # Date 1449015359 28800
> >     #      Tue Dec 01 16:15:59 2015 -0800
> >     # Node ID 64e48bdff2461e793c963d677407d35fc32c30a9
> >     # Parent  49f5bfa7043ae0c797536dab93260aee92fcbaa8
> >     # EXP-Topic generaldelta
> >     # Available At http://hg.netv6.net/marmoute-wip/mercurial/
> >     #              hg pull http://hg.netv6.net/marmoute-wip/mercurial/
> >     -r 64e48bdff246
> >     addrevision: use general delta when the incoming base delta is bad
> >
> >     We unify the delta selection process to be a simple three options
> >     process:
> >
> >     - try to use the incoming delta (if lazydeltabase is on)
> >     - try to find a suitable parents to delta against (if gd is on)
> >     - try to delta against the tipmost revision
> >
> >     The first of this option that yield a valid delta will be used.
> >
> >     The test change in 'test-generaldelta.t' show this behavior as we
> >     use a delta
> >     against the parent instead of a full delta when the incoming delta
> >     is not
> >     suitable.
> >
> >     This as some impact on 'test-bundle.t' because a delta somewhere
> >     changes. It
> >     does not seems to change the test semantic and have been ignored.
> >
> >     diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> >     --- a/mercurial/revlog.py
> >     +++ b/mercurial/revlog.py
> >     @@ -1422,38 +1422,37 @@ class revlog(object):
> >               else:
> >                   textlen = len(text)
> >
> >               # should we try to build a delta?
> >               if prev != nullrev:
> >     +            tested = set()
> >                   if cachedelta and self._generaldelta and
> >     self._lazydeltabase:
> >                       # Assume what we received from the server is a
> >     good choice
> >                       # build delta will reuse the cache
> >                       candidatedelta = builddelta(cachedelta[0])
> >     +                tested.add(candidatedelta[3])
> >                       if self._isgooddelta(candidatedelta, textlen):
> >                           delta = candidatedelta
> >     -                elif prev != candidatedelta[3]:
> >     -                    # Try against prev to hopefully save us a
> fulltext.
> >     -                    delta = builddelta(prev)
> >     -            elif self._generaldelta:
> >     +            if delta is None and self._generaldelta:
> >                       parents = [p1r, p2r]
> >     -                if not self._aggressivemergedeltas:
> >     +                # exclude already lazy tested base if any
> >     +                parents = [p for p in parents if p not in tested]
> >     +                if parents and not self._aggressivemergedeltas:
> >                           # Pick whichever parent is closer to us (to
> >     minimize the
> >     -                    # chance of having to build a fulltext). Since
> >     -                    # nullrev == -1, any non-merge commit will
> >     always pick p1r.
> >     +                    # chance of having to build a fulltext).
> >                           parents = [max(parents)]
> >
> >
> > Before this patch, there were always 2 revisions in parents before this
> > line, so the nullrev would never be tested only for root commits. After
> > this patch, nullrev will also be tested for non-merge commits where the
> > first parent was the incoming delta. I suppose it's unlikely to make a
> > difference, since those will almost always probably not be good deltas,
> > but it also seems unnecessary to even check. Is the testing of nullrev
> > even intentional?
>
> earlier version of this patch was testing for nullrev explicitly, but I
> figured out that builddelta could also we feed nullrev (root commit) and
> therefore was able to handle it.
>

I guess you're saying it's not wasteful either. Good, pushed to the
clowncopter!


>
> --
> Pierre-Yves David
>

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1422,38 +1422,37 @@  class revlog(object):
         else:
             textlen = len(text)
 
         # should we try to build a delta?
         if prev != nullrev:
+            tested = set()
             if cachedelta and self._generaldelta and self._lazydeltabase:
                 # Assume what we received from the server is a good choice
                 # build delta will reuse the cache
                 candidatedelta = builddelta(cachedelta[0])
+                tested.add(candidatedelta[3])
                 if self._isgooddelta(candidatedelta, textlen):
                     delta = candidatedelta
-                elif prev != candidatedelta[3]:
-                    # Try against prev to hopefully save us a fulltext.
-                    delta = builddelta(prev)
-            elif self._generaldelta:
+            if delta is None and self._generaldelta:
                 parents = [p1r, p2r]
-                if not self._aggressivemergedeltas:
+                # exclude already lazy tested base if any
+                parents = [p for p in parents if p not in tested]
+                if parents and not self._aggressivemergedeltas:
                     # Pick whichever parent is closer to us (to minimize the
-                    # chance of having to build a fulltext). Since
-                    # nullrev == -1, any non-merge commit will always pick p1r.
+                    # chance of having to build a fulltext).
                     parents = [max(parents)]
+                tested.update(parents)
                 pdeltas = []
                 for p in parents:
                     pd = builddelta(p)
                     if self._isgooddelta(pd, textlen):
                         pdeltas.append(pd)
                 if pdeltas:
                     delta = min(pdeltas, key=lambda x: x[1])
-                elif prev not in parents:
-                    # Neither is good, try against prev to hopefully save us
-                    # a fulltext.
-                    delta = builddelta(prev)
-            else:
+            if delta is None and prev not in tested:
+                # other approach failed try against prev to hopefully save us a
+                # fulltext.
                 delta = builddelta(prev)
         if delta is not None:
             dist, l, data, base, chainbase, chainlen, compresseddeltalen = delta
 
         if not self._isgooddelta(delta, textlen):
diff --git a/tests/test-bundle.t b/tests/test-bundle.t
--- a/tests/test-bundle.t
+++ b/tests/test-bundle.t
@@ -264,17 +264,17 @@  Cannot produce streaming clone bundles w
   [255]
 
 packed1 is produced properly
 
   $ hg -R test debugcreatestreamclonebundle packed.hg
-  writing 2663 bytes for 6 files
+  writing 2667 bytes for 6 files
   bundle requirements: generaldelta, revlogv1
 
   $ f -B 64 --size --sha1 --hexdump packed.hg
-  packed.hg: size=2826, sha1=e139f97692a142b19cdcff64a69697d5307ce6d4
+  packed.hg: size=2830, sha1=c28255110a88ffa52ddc44985cad295b1ab349bc
   0000: 48 47 53 31 55 4e 00 00 00 00 00 00 00 06 00 00 |HGS1UN..........|
-  0010: 00 00 00 00 0a 67 00 16 67 65 6e 65 72 61 6c 64 |.....g..generald|
+  0010: 00 00 00 00 0a 6b 00 16 67 65 6e 65 72 61 6c 64 |.....k..generald|
   0020: 65 6c 74 61 2c 72 65 76 6c 6f 67 76 31 00 64 61 |elta,revlogv1.da|
   0030: 74 61 2f 61 64 69 66 66 65 72 65 6e 74 66 69 6c |ta/adifferentfil|
 
 generaldelta requirement is listed in stream clone bundles
 
diff --git a/tests/test-generaldelta.t b/tests/test-generaldelta.t
--- a/tests/test-generaldelta.t
+++ b/tests/test-generaldelta.t
@@ -103,11 +103,11 @@  delta coming from the server base delta 
   $ hg -R usegd debugindex -m
      rev    offset  length  delta linkrev nodeid       p1           p2
        0         0     104     -1       0 cef96823c800 000000000000 000000000000
        1       104      57      0       1 58ab9a8d541d cef96823c800 000000000000
        2       161      57      1       2 134fdc6fd680 cef96823c800 000000000000
-       3       218     104     -1       3 723508934dad cef96823c800 000000000000
+       3       218      57      0       3 723508934dad cef96823c800 000000000000
   $ hg -R full debugindex -m
      rev    offset  length  delta linkrev nodeid       p1           p2
        0         0     104     -1       0 cef96823c800 000000000000 000000000000
        1       104      57      0       1 58ab9a8d541d cef96823c800 000000000000
        2       161      57      0       2 134fdc6fd680 cef96823c800 000000000000