Submitter | Jun Wu |
---|---|
Date | March 28, 2017, 7:49 a.m. |
Message ID | <aecce2adbd64b25325e5.1490687346@localhost.localdomain> |
Download | mbox | patch |
Permalink | /patch/19775/ |
State | Changes Requested |
Headers | show |
Comments
On 3/28/17 8:49 AM, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1490683355 25200 > # Mon Mar 27 23:42:35 2017 -0700 > # Node ID aecce2adbd64b25325e559798baa617e7311e85f > # Parent a9d87712bec99abe109c155948ee4b7f1f5ec208 > # Available At https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=HZJ95FJtSQKfeTym596Dsrw1nj2_uFqbuHYgwPBJDS0&s=0utxjEpjZ6EWkHbcRaaNJmSr6QF_yI6rObSckqL3BII&e= > # hg pull https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=HZJ95FJtSQKfeTym596Dsrw1nj2_uFqbuHYgwPBJDS0&s=0utxjEpjZ6EWkHbcRaaNJmSr6QF_yI6rObSckqL3BII&e= -r aecce2adbd64 > revlog: add a fast path for non-raw revision > > If flags are empty, it's equal to "raw=True". Since "raw=True" is less > common but "flags are empty" is very common, add the flags check. So on > cache hit, revlog.revision could go through the fast path in the most common > cases. > > This also makes some hash check unnecessary, so the test is changed a bit. Thanks for explaining this. It's a non-obvious change :-) > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py > --- a/mercurial/revlog.py > +++ b/mercurial/revlog.py > @@ -1258,4 +1258,5 @@ class revlog(object): > > cachedrev = None > + flags = None Why is this initialized to None here? Why not initialize to self.flags(rev) and avoid the two None checks later? Add a comment if you keep it this way. The self.flags method looks quite cheap but maybe I'm missing something. > if node == nullid: > return "" > @@ -1264,4 +1265,9 @@ class revlog(object): > if raw: > return self._cache[2] > + if rev is None: > + rev = self.rev(node) > + flags = self.flags(rev) > + if not flags: > + return self._cache[2] I think this needs some comments and perhaps some refactoring. I'd suggest something like this: # useful comment if raw or not self.flags: This would be made possible if you eagerly initialize flags above. Upon closer inspection, maybe its self.rev() that you're trying to avoid here, but is the expense there the radix tree? In this case, it should always be a full hash so we shouldn't need a radixtree right? Can we fastpath this somehow instead? That may be out of the scope of this series but a comment to explain your choice is important. > cachedrev = self._cache[1] > > @@ -1285,6 +1291,7 @@ class revlog(object): > text = mdiff.patches(text, bins) > > - newtext, validatehash = self._processflags(text, self.flags(rev), > - 'read', raw=raw) > + if flags is None: > + flags = self.flags(rev) > + newtext, validatehash = self._processflags(text, flags, 'read', raw=raw) > if validatehash: > self.checkhash(newtext, node, rev=rev) > diff --git a/tests/test-revlog-raw.py.out b/tests/test-revlog-raw.py.out > --- a/tests/test-revlog-raw.py.out > +++ b/tests/test-revlog-raw.py.out > @@ -23,5 +23,4 @@ REV RAW DATA CACHE BEFORE -> AFTER > # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > 2 y 02T35 1 +012T34 2 02T35 > - # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > 2 02T35 2 02T35 2 02T35 > > @@ -44,5 +43,4 @@ REV RAW DATA CACHE BEFORE -> AFTER > # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > 2 y 02T35 None 2 02T35 > - # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > 2 02T35 2 02T35 2 02T35 > 1 y +012T34 2 02T35 1 +012T34 >
Excerpts from Ryan McElroy's message of 2017-03-29 13:41:21 +0100: > On 3/28/17 8:49 AM, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1490683355 25200 > > # Mon Mar 27 23:42:35 2017 -0700 > > # Node ID aecce2adbd64b25325e559798baa617e7311e85f > > # Parent a9d87712bec99abe109c155948ee4b7f1f5ec208 > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r aecce2adbd64 > > revlog: add a fast path for non-raw revision > > > > If flags are empty, it's equal to "raw=True". Since "raw=True" is less > > common but "flags are empty" is very common, add the flags check. So on > > cache hit, revlog.revision could go through the fast path in the most common > > cases. > > > > This also makes some hash check unnecessary, so the test is changed a bit. > > Thanks for explaining this. It's a non-obvious change :-) > > > > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py > > --- a/mercurial/revlog.py > > +++ b/mercurial/revlog.py > > @@ -1258,4 +1258,5 @@ class revlog(object): > > > > cachedrev = None > > + flags = None > > Why is this initialized to None here? Why not initialize to > self.flags(rev) and avoid the two None checks later? Add a comment if Because in the cache hit + raw=True case, flags is not needed. > you keep it this way. The self.flags method looks quite cheap but maybe > I'm missing something. > > > if node == nullid: > > return "" > > @@ -1264,4 +1265,9 @@ class revlog(object): > > if raw: > > return self._cache[2] > > + if rev is None: > > + rev = self.rev(node) > > + flags = self.flags(rev) > > + if not flags: > > + return self._cache[2] > > I think this needs some comments and perhaps some refactoring. > > I'd suggest something like this: > > # useful comment > if raw or not self.flags: > > This would be made possible if you eagerly initialize flags above. > > Upon closer inspection, maybe its self.rev() that you're trying to avoid > here, but is the expense there the radix tree? In this case, it should > always be a full hash so we shouldn't need a radixtree right? Can we > fastpath this somehow instead? Radixtree is needed for that. I think the None checks are the only way to avoid unnecessary side effects. > > That may be out of the scope of this series but a comment to explain > your choice is important. It's the style in this area. I adjust my choice according to surrounding code. If you look at "cachedrev", or "rev", they remain "None" unless absolutely necessary. I did the same thing for all the code here. I think in revlog.revision, code length is a second citizen, performance is the most important. > > > cachedrev = self._cache[1] > > > > @@ -1285,6 +1291,7 @@ class revlog(object): > > text = mdiff.patches(text, bins) > > > > - newtext, validatehash = self._processflags(text, self.flags(rev), > > - 'read', raw=raw) > > + if flags is None: > > + flags = self.flags(rev) > > + newtext, validatehash = self._processflags(text, flags, 'read', raw=raw) > > if validatehash: > > self.checkhash(newtext, node, rev=rev) > > diff --git a/tests/test-revlog-raw.py.out b/tests/test-revlog-raw.py.out > > --- a/tests/test-revlog-raw.py.out > > +++ b/tests/test-revlog-raw.py.out > > @@ -23,5 +23,4 @@ REV RAW DATA CACHE BEFORE -> AFTER > > # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > > 2 y 02T35 1 +012T34 2 02T35 > > - # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > > 2 02T35 2 02T35 2 02T35 > > > > @@ -44,5 +43,4 @@ REV RAW DATA CACHE BEFORE -> AFTER > > # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > > 2 y 02T35 None 2 02T35 > > - # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) > > 2 02T35 2 02T35 2 02T35 > > 1 y +012T34 2 02T35 1 +012T34 > >
Patch
diff --git a/mercurial/revlog.py b/mercurial/revlog.py --- a/mercurial/revlog.py +++ b/mercurial/revlog.py @@ -1258,4 +1258,5 @@ class revlog(object): cachedrev = None + flags = None if node == nullid: return "" @@ -1264,4 +1265,9 @@ class revlog(object): if raw: return self._cache[2] + if rev is None: + rev = self.rev(node) + flags = self.flags(rev) + if not flags: + return self._cache[2] cachedrev = self._cache[1] @@ -1285,6 +1291,7 @@ class revlog(object): text = mdiff.patches(text, bins) - newtext, validatehash = self._processflags(text, self.flags(rev), - 'read', raw=raw) + if flags is None: + flags = self.flags(rev) + newtext, validatehash = self._processflags(text, flags, 'read', raw=raw) if validatehash: self.checkhash(newtext, node, rev=rev) diff --git a/tests/test-revlog-raw.py.out b/tests/test-revlog-raw.py.out --- a/tests/test-revlog-raw.py.out +++ b/tests/test-revlog-raw.py.out @@ -23,5 +23,4 @@ REV RAW DATA CACHE BEFORE -> AFTER # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) 2 y 02T35 1 +012T34 2 02T35 - # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) 2 02T35 2 02T35 2 02T35 @@ -44,5 +43,4 @@ REV RAW DATA CACHE BEFORE -> AFTER # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) 2 y 02T35 None 2 02T35 - # 3338e2e7da35 = hash(7bc003d2bcd3, 02T35) 2 02T35 2 02T35 2 02T35 1 y +012T34 2 02T35 1 +012T34