Patchwork pull: make a single call to obsstore.add (issue5006)

login
register
mail settings
Submitter Matt Mackall
Date Dec. 18, 2015, 7:59 p.m.
Message ID <d386b2d49c5842c0d22e.1450468775@ruin.waste.org>
Download mbox | patch
Permalink /patch/12168/
State Accepted
Commit b5b54825de6ba111090c91c45963c75346a7a797
Delegated to: Pierre-Yves David
Headers show

Comments

Matt Mackall - Dec. 18, 2015, 7:59 p.m.
# HG changeset patch
# User Matt Mackall <mpm@selenic.com>
# Date 1450468430 21600
#      Fri Dec 18 13:53:50 2015 -0600
# Node ID d386b2d49c5842c0d22e7dadd90dacf120ee2de7
# Parent  5155fa08609e2530cd4d2e66bdf21e934a35351e
pull: make a single call to obsstore.add (issue5006)

Prior to this, a pull of 90k markers (already known locally!) was
making about 2000 calls to obsstore.add, which was repeatedly building
a full set of known markers (in addition to other transaction
overhead). This quadratic behavior accounted for about 50 seconds of a
70 second no-op pull. After this change, we're down to 20 seconds.

While it would seem simplest to just cache the known set for
obsstore.add, this would also introduce issues of correct cache invalidation.
The extra pointless transaction overhead would also remain.
Yuya Nishihara - Dec. 25, 2015, 12:39 p.m.
On Fri, 18 Dec 2015 13:59:35 -0600, Matt Mackall wrote:
> # HG changeset patch
> # User Matt Mackall <mpm@selenic.com>
> # Date 1450468430 21600
> #      Fri Dec 18 13:53:50 2015 -0600
> # Node ID d386b2d49c5842c0d22e7dadd90dacf120ee2de7
> # Parent  5155fa08609e2530cd4d2e66bdf21e934a35351e
> pull: make a single call to obsstore.add (issue5006)
> 
> Prior to this, a pull of 90k markers (already known locally!) was
> making about 2000 calls to obsstore.add, which was repeatedly building
> a full set of known markers (in addition to other transaction
> overhead). This quadratic behavior accounted for about 50 seconds of a
> 70 second no-op pull. After this change, we're down to 20 seconds.
> 
> While it would seem simplest to just cache the known set for
> obsstore.add, this would also introduce issues of correct cache invalidation.
> The extra pointless transaction overhead would also remain.
> 
> diff -r 5155fa08609e -r d386b2d49c58 mercurial/exchange.py
> --- a/mercurial/exchange.py	Mon Dec 14 20:57:21 2015 -0500
> +++ b/mercurial/exchange.py	Fri Dec 18 13:53:50 2015 -0600
> @@ -1380,10 +1380,14 @@
>          remoteobs = pullop.remote.listkeys('obsolete')
>          if 'dump0' in remoteobs:
>              tr = pullop.gettransaction()
> +            markers = []
>              for key in sorted(remoteobs, reverse=True):
>                  if key.startswith('dump'):
>                      data = base85.b85decode(remoteobs[key])
> -                    pullop.repo.obsstore.mergemarkers(tr, data)
> +                    version, newmarks = obsolete._readmarkers(data)
> +                    markers += newmarks
> +            if markers:
> +                pullop.repo.obsstore.add(tr, markers)

Looks good to me. Pierre-Yves, can you take a look at?

FWIW, we could avoid calling private function if mergemarkers() accepts a list
of data.
Augie Fackler - Dec. 29, 2015, 10:28 p.m.
On Fri, Dec 25, 2015 at 09:39:22PM +0900, Yuya Nishihara wrote:
> On Fri, 18 Dec 2015 13:59:35 -0600, Matt Mackall wrote:
> > # HG changeset patch
> > # User Matt Mackall <mpm@selenic.com>
> > # Date 1450468430 21600
> > #      Fri Dec 18 13:53:50 2015 -0600
> > # Node ID d386b2d49c5842c0d22e7dadd90dacf120ee2de7
> > # Parent  5155fa08609e2530cd4d2e66bdf21e934a35351e
> > pull: make a single call to obsstore.add (issue5006)
> >
> > Prior to this, a pull of 90k markers (already known locally!) was
> > making about 2000 calls to obsstore.add, which was repeatedly building
> > a full set of known markers (in addition to other transaction
> > overhead). This quadratic behavior accounted for about 50 seconds of a
> > 70 second no-op pull. After this change, we're down to 20 seconds.
> >
> > While it would seem simplest to just cache the known set for
> > obsstore.add, this would also introduce issues of correct cache invalidation.
> > The extra pointless transaction overhead would also remain.
> >
> > diff -r 5155fa08609e -r d386b2d49c58 mercurial/exchange.py
> > --- a/mercurial/exchange.py	Mon Dec 14 20:57:21 2015 -0500
> > +++ b/mercurial/exchange.py	Fri Dec 18 13:53:50 2015 -0600
> > @@ -1380,10 +1380,14 @@
> >          remoteobs = pullop.remote.listkeys('obsolete')
> >          if 'dump0' in remoteobs:
> >              tr = pullop.gettransaction()
> > +            markers = []
> >              for key in sorted(remoteobs, reverse=True):
> >                  if key.startswith('dump'):
> >                      data = base85.b85decode(remoteobs[key])
> > -                    pullop.repo.obsstore.mergemarkers(tr, data)
> > +                    version, newmarks = obsolete._readmarkers(data)
> > +                    markers += newmarks
> > +            if markers:
> > +                pullop.repo.obsstore.add(tr, markers)
>
> Looks good to me. Pierre-Yves, can you take a look at?

I agree, this looks right, and it's enough of a perf win I'm going to
just push it now. Queued.

>
> FWIW, we could avoid calling private function if mergemarkers() accepts a list
> of data.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff -r 5155fa08609e -r d386b2d49c58 mercurial/exchange.py
--- a/mercurial/exchange.py	Mon Dec 14 20:57:21 2015 -0500
+++ b/mercurial/exchange.py	Fri Dec 18 13:53:50 2015 -0600
@@ -1380,10 +1380,14 @@ 
         remoteobs = pullop.remote.listkeys('obsolete')
         if 'dump0' in remoteobs:
             tr = pullop.gettransaction()
+            markers = []
             for key in sorted(remoteobs, reverse=True):
                 if key.startswith('dump'):
                     data = base85.b85decode(remoteobs[key])
-                    pullop.repo.obsstore.mergemarkers(tr, data)
+                    version, newmarks = obsolete._readmarkers(data)
+                    markers += newmarks
+            if markers:
+                pullop.repo.obsstore.add(tr, markers)
             pullop.repo.invalidatevolatilesets()
     return tr