Patchwork [5,of,7] revlog: add a fast path for non-raw revision

login
register
mail settings
Submitter Jun Wu
Date March 28, 2017, 7:49 a.m.
Message ID <aecce2adbd64b25325e5.1490687346@localhost.localdomain>
Download mbox | patch
Permalink /patch/19775/
State Changes Requested
Headers show

Comments

Jun Wu - March 28, 2017, 7:49 a.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1490683355 25200
#      Mon Mar 27 23:42:35 2017 -0700
# Node ID aecce2adbd64b25325e559798baa617e7311e85f
# Parent  a9d87712bec99abe109c155948ee4b7f1f5ec208
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r aecce2adbd64
revlog: add a fast path for non-raw revision

If flags are empty, it's equal to "raw=True". Since "raw=True" is less
common but "flags are empty" is very common, add the flags check. So on
cache hit, revlog.revision could go through the fast path in the most common
cases.

This also makes some hash check unnecessary, so the test is changed a bit.
Ryan McElroy - March 29, 2017, 12:41 p.m.
On 3/28/17 8:49 AM, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1490683355 25200
> #      Mon Mar 27 23:42:35 2017 -0700
> # Node ID aecce2adbd64b25325e559798baa617e7311e85f
> # Parent  a9d87712bec99abe109c155948ee4b7f1f5ec208
> # Available At https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=HZJ95FJtSQKfeTym596Dsrw1nj2_uFqbuHYgwPBJDS0&s=0utxjEpjZ6EWkHbcRaaNJmSr6QF_yI6rObSckqL3BII&e=
> #              hg pull https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=HZJ95FJtSQKfeTym596Dsrw1nj2_uFqbuHYgwPBJDS0&s=0utxjEpjZ6EWkHbcRaaNJmSr6QF_yI6rObSckqL3BII&e=  -r aecce2adbd64
> revlog: add a fast path for non-raw revision
>
> If flags are empty, it's equal to "raw=True". Since "raw=True" is less
> common but "flags are empty" is very common, add the flags check. So on
> cache hit, revlog.revision could go through the fast path in the most common
> cases.
>
> This also makes some hash check unnecessary, so the test is changed a bit.

Thanks for explaining this. It's a non-obvious change :-)

>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1258,4 +1258,5 @@ class revlog(object):
>   
>           cachedrev = None
> +        flags = None

Why is this initialized to None here? Why not initialize to 
self.flags(rev) and avoid the two None checks later? Add a comment if 
you keep it this way. The self.flags method looks quite cheap but maybe 
I'm missing something.

>           if node == nullid:
>               return ""
> @@ -1264,4 +1265,9 @@ class revlog(object):
>                   if raw:
>                       return self._cache[2]
> +                if rev is None:
> +                    rev = self.rev(node)
> +                flags = self.flags(rev)
> +                if not flags:
> +                    return self._cache[2]

I think this needs some comments and perhaps some refactoring.

I'd suggest something like this:

# useful comment
if raw or not self.flags:

This would be made possible if you eagerly initialize flags above.

Upon closer inspection, maybe its self.rev() that you're trying to avoid 
here, but is the expense there the radix tree? In this case, it should 
always be a full hash so we shouldn't need a radixtree right? Can we 
fastpath this somehow instead?

That may be out of the scope of this series but a comment to explain 
your choice is important.

>               cachedrev = self._cache[1]
>   
> @@ -1285,6 +1291,7 @@ class revlog(object):
>           text = mdiff.patches(text, bins)
>   
> -        newtext, validatehash = self._processflags(text, self.flags(rev),
> -                                                   'read', raw=raw)
> +        if flags is None:
> +            flags = self.flags(rev)
> +        newtext, validatehash = self._processflags(text, flags, 'read', raw=raw)
>           if validatehash:
>               self.checkhash(newtext, node, rev=rev)
> diff --git a/tests/test-revlog-raw.py.out b/tests/test-revlog-raw.py.out
> --- a/tests/test-revlog-raw.py.out
> +++ b/tests/test-revlog-raw.py.out
> @@ -23,5 +23,4 @@ REV RAW DATA    CACHE BEFORE -> AFTER
>     # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2 y   02T35   1 +012T34       2 02T35
> -  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2     02T35   2 02T35         2 02T35
>   
> @@ -44,5 +43,4 @@ REV RAW DATA    CACHE BEFORE -> AFTER
>     # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2 y   02T35   None            2 02T35
> -  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
>     2     02T35   2 02T35         2 02T35
>     1 y   +012T34 2 02T35         1 +012T34
>
Jun Wu - March 29, 2017, 3:43 p.m.
Excerpts from Ryan McElroy's message of 2017-03-29 13:41:21 +0100:
> On 3/28/17 8:49 AM, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1490683355 25200
> > #      Mon Mar 27 23:42:35 2017 -0700
> > # Node ID aecce2adbd64b25325e559798baa617e7311e85f
> > # Parent  a9d87712bec99abe109c155948ee4b7f1f5ec208
> > # Available At https://bitbucket.org/quark-zju/hg-draft
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r aecce2adbd64
> > revlog: add a fast path for non-raw revision
> >
> > If flags are empty, it's equal to "raw=True". Since "raw=True" is less
> > common but "flags are empty" is very common, add the flags check. So on
> > cache hit, revlog.revision could go through the fast path in the most common
> > cases.
> >
> > This also makes some hash check unnecessary, so the test is changed a bit.
> 
> Thanks for explaining this. It's a non-obvious change :-)
> 
> >
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -1258,4 +1258,5 @@ class revlog(object):
> >   
> >           cachedrev = None
> > +        flags = None
> 
> Why is this initialized to None here? Why not initialize to 
> self.flags(rev) and avoid the two None checks later? Add a comment if 

Because in the cache hit + raw=True case, flags is not needed.

> you keep it this way. The self.flags method looks quite cheap but maybe 
> I'm missing something.
> 
> >           if node == nullid:
> >               return ""
> > @@ -1264,4 +1265,9 @@ class revlog(object):
> >                   if raw:
> >                       return self._cache[2]
> > +                if rev is None:
> > +                    rev = self.rev(node)
> > +                flags = self.flags(rev)
> > +                if not flags:
> > +                    return self._cache[2]
> 
> I think this needs some comments and perhaps some refactoring.
> 
> I'd suggest something like this:
> 
> # useful comment
> if raw or not self.flags:
> 
> This would be made possible if you eagerly initialize flags above.
> 
> Upon closer inspection, maybe its self.rev() that you're trying to avoid 
> here, but is the expense there the radix tree? In this case, it should 
> always be a full hash so we shouldn't need a radixtree right? Can we 
> fastpath this somehow instead?

Radixtree is needed for that. I think the None checks are the only way to
avoid unnecessary side effects.

> 
> That may be out of the scope of this series but a comment to explain 
> your choice is important.

It's the style in this area. I adjust my choice according to surrounding
code. If you look at "cachedrev", or "rev", they remain "None" unless
absolutely necessary. I did the same thing for all the code here.

I think in revlog.revision, code length is a second citizen, performance is
the most important.

> 
> >               cachedrev = self._cache[1]
> >   
> > @@ -1285,6 +1291,7 @@ class revlog(object):
> >           text = mdiff.patches(text, bins)
> >   
> > -        newtext, validatehash = self._processflags(text, self.flags(rev),
> > -                                                   'read', raw=raw)
> > +        if flags is None:
> > +            flags = self.flags(rev)
> > +        newtext, validatehash = self._processflags(text, flags, 'read', raw=raw)
> >           if validatehash:
> >               self.checkhash(newtext, node, rev=rev)
> > diff --git a/tests/test-revlog-raw.py.out b/tests/test-revlog-raw.py.out
> > --- a/tests/test-revlog-raw.py.out
> > +++ b/tests/test-revlog-raw.py.out
> > @@ -23,5 +23,4 @@ REV RAW DATA    CACHE BEFORE -> AFTER
> >     # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
> >     2 y   02T35   1 +012T34       2 02T35
> > -  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
> >     2     02T35   2 02T35         2 02T35
> >   
> > @@ -44,5 +43,4 @@ REV RAW DATA    CACHE BEFORE -> AFTER
> >     # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
> >     2 y   02T35   None            2 02T35
> > -  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
> >     2     02T35   2 02T35         2 02T35
> >     1 y   +012T34 2 02T35         1 +012T34
> >

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1258,4 +1258,5 @@  class revlog(object):
 
         cachedrev = None
+        flags = None
         if node == nullid:
             return ""
@@ -1264,4 +1265,9 @@  class revlog(object):
                 if raw:
                     return self._cache[2]
+                if rev is None:
+                    rev = self.rev(node)
+                flags = self.flags(rev)
+                if not flags:
+                    return self._cache[2]
             cachedrev = self._cache[1]
 
@@ -1285,6 +1291,7 @@  class revlog(object):
         text = mdiff.patches(text, bins)
 
-        newtext, validatehash = self._processflags(text, self.flags(rev),
-                                                   'read', raw=raw)
+        if flags is None:
+            flags = self.flags(rev)
+        newtext, validatehash = self._processflags(text, flags, 'read', raw=raw)
         if validatehash:
             self.checkhash(newtext, node, rev=rev)
diff --git a/tests/test-revlog-raw.py.out b/tests/test-revlog-raw.py.out
--- a/tests/test-revlog-raw.py.out
+++ b/tests/test-revlog-raw.py.out
@@ -23,5 +23,4 @@  REV RAW DATA    CACHE BEFORE -> AFTER
   # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
   2 y   02T35   1 +012T34       2 02T35
-  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
   2     02T35   2 02T35         2 02T35
 
@@ -44,5 +43,4 @@  REV RAW DATA    CACHE BEFORE -> AFTER
   # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
   2 y   02T35   None            2 02T35
-  # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35)
   2     02T35   2 02T35         2 02T35
   1 y   +012T34 2 02T35         1 +012T34