Patchwork [evolve-ext] directaccess: make sure gethashsymbols does not return rev numbers

login
register
mail settings
Submitter Jun Wu
Date June 22, 2016, 6:01 p.m.
Message ID <a697745ebcc8afe66361.1466618513@x1c>
Download mbox | patch
Permalink /patch/15572/
State Superseded
Headers show

Comments

Jun Wu - June 22, 2016, 6:01 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1466616137 -3600
#      Wed Jun 22 18:22:17 2016 +0100
# Node ID a697745ebcc8afe66361511ffd12a1414e3d3d3d
# Parent  00612a019547c0de2b6a88de74c4e097b99d36d8
directaccess: make sure gethashsymbols does not return rev numbers

With "hg log -r 1 -r 2 -r 3", gethashsymbols will get the following tree:

  ('func', ('symbol', '_list'), ('string', '1\x002\x003'))

Before this patch, it will return ['1', '2', '3'], which are revision
numbers and causes unnecessary (expensive) partialmatch lookups.

This patch refacts gethashsymbols a bit, making sure the returning values
are always filtered.
Laurent Charignon - June 22, 2016, 8:55 p.m.
Thanks for doing that Jun, besides making sure that gethashsymbols does
not return rev number, it also simplifies the code and make it easier to
read.
I have one comment inline but it should be good to go regardless of a
possible fix.

On 6/22/16, 11:01 AM, "Jun Wu" <quark@fb.com> wrote:

># HG changeset patch
># User Jun Wu <quark@fb.com>
># Date 1466616137 -3600
>#      Wed Jun 22 18:22:17 2016 +0100
># Node ID a697745ebcc8afe66361511ffd12a1414e3d3d3d
># Parent  00612a019547c0de2b6a88de74c4e097b99d36d8
>directaccess: make sure gethashsymbols does not return rev numbers
>
>With "hg log -r 1 -r 2 -r 3", gethashsymbols will get the following tree:
>
>  ('func', ('symbol', '_list'), ('string', '1\x002\x003'))
>
>Before this patch, it will return ['1', '2', '3'], which are revision
>numbers and causes unnecessary (expensive) partialmatch lookups.
>
>This patch refacts gethashsymbols a bit, making sure the returning values
>are always filtered.
>
>diff --git a/hgext/directaccess.py b/hgext/directaccess.py
>--- a/hgext/directaccess.py
>+++ b/hgext/directaccess.py
>@@ -132,38 +132,33 @@ hashre = util.re.compile('[0-9a-fA-F]{1,
> 
> _listtuple = ('symbol', '_list')
> 
>+def _ishashsymbol(symbol, maxrev):
>+    # Returns true if symbol looks like a hash
>+    try:
>+        n = int(symbol)
>+        if n <= maxrev:
>+            # It's a rev number
>+            return False
>+    except ValueError:
>+        pass
>+    return hashre.match(symbol)
>+
> def gethashsymbols(tree, maxrev):
>     # Returns the list of symbols of the tree that look like hashes
>     # for example for the revset 3::abe3ff it will return ('abe3ff')
>     if not tree:
>         return []
> 
>+    results = []
>     if len(tree) == 2 and tree[0] == "symbol":
>-        try:
>-            n = int(tree[1])
>-            # This isn't necessarily a rev number, could be a hash prefix
>-            if n > maxrev:
>-                return [tree[1]]
>-            else:
>-                return []
>-        except ValueError as e:
>-            if hashre.match(tree[1]):
>-                return [tree[1]]
>-            return []
>+        results.append(tree[1])
>     elif tree[0] == "func" and tree[1] == _listtuple:
>         # the optimiser will group sequence of hash request
>-        result = []
>-        for entry in tree[2][1].split('\0'):
>-            if hashre.match(entry):
>-                result.append(entry)
>-        return result
>+        results += tree[2][1].split('\0')
>     elif len(tree) >= 3:
>-        results = []
>         for subtree in tree[1:]:
>             results += gethashsymbols(subtree, maxrev)
>-        return results
>-    else:
>-        return []
>+    return [s for s in results if _ishashsymbol(s, maxrev)]

We are doing the _ishashsymbol check multiple time per symbol (if
len(tree>=3) because we know that gethashsymbols returned hashes symbols
already.
It is just duplicated work but it probably won't have any performance
implication. What do you think?

> 
> def _posttreebuilthook(orig, tree, repo):
>     # This is use to enabled direct hash access
>diff --git a/tests/test-inhibit.t b/tests/test-inhibit.t
>--- a/tests/test-inhibit.t
>+++ b/tests/test-inhibit.t
>@@ -375,6 +375,21 @@ Test directaccess in a larger revset
>   cf5c4f4554ce
>   2db36d8066ff
> 
>+Test directaccess only takes hashes
>+
>+  $ HOOKPATH=$TESTTMP/printexplicitaccess.py
>+  $ cat >> $HOOKPATH <<EOF
>+  > def hook(ui, repo, **kwds):
>+  >     for i in sorted(repo._explicitaccess):
>+  >         ui.write('directaccess: %s\n' % i)
>+  > EOF
>+
>+  $ hg log -r 1 -r 2 -r 2db36d8066f -T '{rev}\n' --config
>hooks.post-log=python:$HOOKPATH:hook
>+  1
>+  2
>+  3
>+  directaccess: 3
>+
> With severals hidden sha, rebase of one hidden stack onto another one:
>   $ hg update -C 0
>   0 files updated, 0 files merged, 4 files removed, 0 files unresolved
Jun Wu - June 22, 2016, 9:08 p.m.
Excerpts from Laurent Charignon's message of 2016-06-22 21:55:13 +0100:
> We are doing the _ishashsymbol check multiple time per symbol (if
> len(tree>=3) because we know that gethashsymbols returned hashes symbols
> already.
> It is just duplicated work but it probably won't have any performance
> implication. What do you think?

Nice catch! V2 is coming.

Patch

diff --git a/hgext/directaccess.py b/hgext/directaccess.py
--- a/hgext/directaccess.py
+++ b/hgext/directaccess.py
@@ -132,38 +132,33 @@  hashre = util.re.compile('[0-9a-fA-F]{1,
 
 _listtuple = ('symbol', '_list')
 
+def _ishashsymbol(symbol, maxrev):
+    # Returns true if symbol looks like a hash
+    try:
+        n = int(symbol)
+        if n <= maxrev:
+            # It's a rev number
+            return False
+    except ValueError:
+        pass
+    return hashre.match(symbol)
+
 def gethashsymbols(tree, maxrev):
     # Returns the list of symbols of the tree that look like hashes
     # for example for the revset 3::abe3ff it will return ('abe3ff')
     if not tree:
         return []
 
+    results = []
     if len(tree) == 2 and tree[0] == "symbol":
-        try:
-            n = int(tree[1])
-            # This isn't necessarily a rev number, could be a hash prefix
-            if n > maxrev:
-                return [tree[1]]
-            else:
-                return []
-        except ValueError as e:
-            if hashre.match(tree[1]):
-                return [tree[1]]
-            return []
+        results.append(tree[1])
     elif tree[0] == "func" and tree[1] == _listtuple:
         # the optimiser will group sequence of hash request
-        result = []
-        for entry in tree[2][1].split('\0'):
-            if hashre.match(entry):
-                result.append(entry)
-        return result
+        results += tree[2][1].split('\0')
     elif len(tree) >= 3:
-        results = []
         for subtree in tree[1:]:
             results += gethashsymbols(subtree, maxrev)
-        return results
-    else:
-        return []
+    return [s for s in results if _ishashsymbol(s, maxrev)]
 
 def _posttreebuilthook(orig, tree, repo):
     # This is use to enabled direct hash access
diff --git a/tests/test-inhibit.t b/tests/test-inhibit.t
--- a/tests/test-inhibit.t
+++ b/tests/test-inhibit.t
@@ -375,6 +375,21 @@  Test directaccess in a larger revset
   cf5c4f4554ce
   2db36d8066ff
 
+Test directaccess only takes hashes
+
+  $ HOOKPATH=$TESTTMP/printexplicitaccess.py
+  $ cat >> $HOOKPATH <<EOF
+  > def hook(ui, repo, **kwds):
+  >     for i in sorted(repo._explicitaccess):
+  >         ui.write('directaccess: %s\n' % i)
+  > EOF
+
+  $ hg log -r 1 -r 2 -r 2db36d8066f -T '{rev}\n' --config hooks.post-log=python:$HOOKPATH:hook
+  1
+  2
+  3
+  directaccess: 3
+
 With severals hidden sha, rebase of one hidden stack onto another one:
   $ hg update -C 0
   0 files updated, 0 files merged, 4 files removed, 0 files unresolved