Patchwork D5295: branchmap: define a hasbranch() to find whether a branch exists or not

login
register
mail settings
Submitter phabricator
Date Nov. 22, 2018, 11:50 a.m.
Message ID <differential-rev-PHID-DREV-bjynwklkumktjroca5sg-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/36707/
State New
Headers show

Comments

phabricator - Nov. 22, 2018, 11:50 a.m.
pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This patch introduces a branchmap.hasbranch() function to find whether a branch
  exists or not. I was going through profile of one of most used command
  internally on server side and found that we are spending ~50% of time finding
  whether a branch exists or not. We are reading the branchmap cache, building the
  related object and then finding whether a branch exists or not which is quite
  slow on repositories which have more than 10k named branches.
  
  In the new function, we just iterate over the whole branchmap cache and find
  that whether a branch name is present or not. This is essentially
  branchmap.read() but without any hash validation, creation of the branchcache()
  object.
  
  Following were the improvements observed:
  
  Before:
  
  `hg log -b . -l 1`: 0.53 sec
  `hg log -b <last_branch_in_the_cache> -l 1`: 1.12 sec
  
  After:
  
  `hg log -b . -l 1`: 0.32 sec
  `hg log -b <last_branch_in_the_cache> -l 1`: 0.95 sec
  
  There we no improvements observed in `hg log -r <branch_name>`, if the log limit
  is greater than 1, and while using revsets.
  
  I have added a TODO here that we can use binary search instead of plain
  iteration for finding whether a branch name exists or not. I got to know that
  Martijn Pieters is working on some related stuff to load branchmap in memory as
  raw data and decided to implement binary search on top of that.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5295

AFFECTED FILES
  mercurial/branchmap.py
  mercurial/localrepo.py
  mercurial/revset.py

CHANGE DETAILS




To: pulkit, #hg-reviewers
Cc: mercurial-devel
Yuya Nishihara - Nov. 25, 2018, 4:03 a.m.
I don't think it's good idea to re-scan the cache file per hasbranch() call.
Instead, we'll probably need a lazy parser backed by a in-memory cache data.
The current cache file format is text-based, which wouldn't be easily bisected
without loading (or memmap) the whole content.

> +def hasbranch(repo, branchname):
> +    """check whether a branchname exists in the repo or not by reading the
> +    branchmap cache"""
> +
> +    if not repo.cachevfs.exists(_filename(repo)):
> +        # branchmap file is not present, let's go repo.branchmap() route which
> +        # will create that file
> +        return branchname in repo.branchmap()
> +
> +    # TODO: implement binary-search here for faster search
> +    with repo.cachevfs(_filename(repo)) as f:
> +        f = repo.cachevfs(_filename(repo))
> +        lineiter = iter(f)
> +        next(lineiter).rstrip('\n').split(" ", 2)

Need to check if the cache file is valid.

> +        for l in lineiter:
> +            l = l.rstrip('\n')
> +            if not l:
> +                continue
> +            label = l.split(" ", 2)[2]
> +            label = encoding.tolocal(label.strip())
> +            if label == branchname:

And maybe need to check if the node exists.

> +                return True
> +    return False
phabricator - Nov. 25, 2018, 4:04 a.m.
yuja added a comment.


  I don't think it's good idea to re-scan the cache file per hasbranch() call.
  Instead, we'll probably need a lazy parser backed by a in-memory cache data.
  The current cache file format is text-based, which wouldn't be easily bisected
  without loading (or memmap) the whole content.
  
  > +def hasbranch(repo, branchname):
  >  +    """check whether a branchname exists in the repo or not by reading the
  >  +    branchmap cache"""
  >  +
  >  +    if not repo.cachevfs.exists(_filename(repo)):
  >  +        # branchmap file is not present, let's go repo.branchmap() route which
  >  +        # will create that file
  >  +        return branchname in repo.branchmap()
  >  +
  >  +    # TODO: implement binary-search here for faster search
  >  +    with repo.cachevfs(_filename(repo)) as f:
  >  +        f = repo.cachevfs(_filename(repo))
  >  +        lineiter = iter(f)
  >  +        next(lineiter).rstrip('\n').split(" ", 2)
  
  Need to check if the cache file is valid.
  
  > +        for l in lineiter:
  >  +            l = l.rstrip('\n')
  >  +            if not l:
  >  +                continue
  >  +            label = l.split(" ", 2)[2]
  >  +            label = encoding.tolocal(label.strip())
  >  +            if label == branchname:
  
  And maybe need to check if the node exists.
  
  > +                return True
  >  +    return False

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5295

To: pulkit, #hg-reviewers
Cc: yuja, mercurial-devel

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -11,6 +11,7 @@ 
 
 from .i18n import _
 from . import (
+    branchmap,
     dagop,
     destutil,
     diffutil,
@@ -499,7 +500,7 @@ 
         if kind == 'literal':
             # note: falls through to the revspec case if no branch with
             # this name exists and pattern kind is not specified explicitly
-            if pattern in repo.branchmap():
+            if branchmap.hasbranch(repo, pattern):
                 return subset.filter(lambda r: matcher(getbranch(r)),
                                      condrepr=('<branch %r>', b))
             if b.startswith('literal:'):
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1528,7 +1528,7 @@ 
         return scmutil.revsymbol(self, key).node()
 
     def lookupbranch(self, key):
-        if key in self.branchmap():
+        if branchmap.hasbranch(self, key):
             return key
 
         return scmutil.revsymbol(self, key).branch()
diff --git a/mercurial/branchmap.py b/mercurial/branchmap.py
--- a/mercurial/branchmap.py
+++ b/mercurial/branchmap.py
@@ -37,6 +37,30 @@ 
         filename = '%s-%s' % (filename, repo.filtername)
     return filename
 
+def hasbranch(repo, branchname):
+    """check whether a branchname exists in the repo or not by reading the
+    branchmap cache"""
+
+    if not repo.cachevfs.exists(_filename(repo)):
+        # branchmap file is not present, let's go repo.branchmap() route which
+        # will create that file
+        return branchname in repo.branchmap()
+
+    # TODO: implement binary-search here for faster search
+    with repo.cachevfs(_filename(repo)) as f:
+        f = repo.cachevfs(_filename(repo))
+        lineiter = iter(f)
+        next(lineiter).rstrip('\n').split(" ", 2)
+        for l in lineiter:
+            l = l.rstrip('\n')
+            if not l:
+                continue
+            label = l.split(" ", 2)[2]
+            label = encoding.tolocal(label.strip())
+            if label == branchname:
+                return True
+    return False
+
 def read(repo):
     f = None
     try: