Patchwork [1,of,3] dirstate: add a method to filter by not included

login
register
mail settings
Submitter Siddharth Agarwal
Date Feb. 1, 2013, 10:41 p.m.
Message ID <9139388e66b4c6295c85.1359758484@sid0x220>
Download mbox | patch
Permalink /patch/792/
State Changes Requested
Headers show

Comments

Siddharth Agarwal - Feb. 1, 2013, 10:41 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1359672835 28800
# Branch stable
# Node ID 9139388e66b4c6295c85989f27357c280a40025d
# Parent  d4f93b62e7af1b8fe71de99e0483d0a75b2b6746
dirstate: add a method to filter by not included

This is the bulk what merge._checkunknown does, except much faster. That
function will be made to use this in an upcoming patch.
Matt Mackall - Feb. 1, 2013, 11:26 p.m.
On Fri, 2013-02-01 at 14:41 -0800, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1359672835 28800
> # Branch stable
> # Node ID 9139388e66b4c6295c85989f27357c280a40025d
> # Parent  d4f93b62e7af1b8fe71de99e0483d0a75b2b6746
> dirstate: add a method to filter by not included
> 
> This is the bulk what merge._checkunknown does, except much faster. That
> function will be made to use this in an upcoming patch.
> 
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -551,6 +551,17 @@ class dirstate(object):
>                  return True
>          return False
>  
> +    def notincluded(self, files):
> +        """Return all the entries in files not included in the dirstate.

I guess you're putting this in dirstate because it wants direct access
to _map for speed. This is a bit unfortunate: it's unlikely to ever have
other users and has a not-terribly-precise name.

> +        "Included" here means n, m or a. This is here for perf reasons."""
> +        dmap = self._map
> +        l = []
> +        for f in files:
> +            if f not in dmap or dmap[f] == 'r':
> +                l.append(f)
> +        return l

I think you'll find this is noticeably faster as a list comprehension.

$ python -m timeit -s 'l = []' 'for f in xrange(10000): l.append(f)'
1000 loops, best of 3: 624 usec per loop
$ python -m timeit -s 'l = []; la = l.append' 'for f in xrange(10000):
la(f)'
1000 loops, best of 3: 374 usec per loop
$ python -m timeit -s 'l = []' '[x for x in xrange(10000)]'
1000 loops, best of 3: 250 usec per loop
Siddharth Agarwal - Feb. 1, 2013, 11:49 p.m.
On 02/01/2013 03:26 PM, Matt Mackall wrote:
> On Fri, 2013-02-01 at 14:41 -0800, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0@fb.com>
>> # Date 1359672835 28800
>> # Branch stable
>> # Node ID 9139388e66b4c6295c85989f27357c280a40025d
>> # Parent  d4f93b62e7af1b8fe71de99e0483d0a75b2b6746
>> dirstate: add a method to filter by not included
>>
>> This is the bulk what merge._checkunknown does, except much faster. That
>> function will be made to use this in an upcoming patch.
>>
>> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
>> --- a/mercurial/dirstate.py
>> +++ b/mercurial/dirstate.py
>> @@ -551,6 +551,17 @@ class dirstate(object):
>>                   return True
>>           return False
>>   
>> +    def notincluded(self, files):
>> +        """Return all the entries in files not included in the dirstate.
> I guess you're putting this in dirstate because it wants direct access
> to _map for speed. This is a bit unfortunate: it's unlikely to ever have
> other users and has a not-terribly-precise name.

Yep. Banging on the map directly is significantly (around 0.2 seconds) 
faster. Such is the cost of abstraction. :(

I'm open to other suggestions for the name though.

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -551,6 +551,17 @@  class dirstate(object):
                 return True
         return False
 
+    def notincluded(self, files):
+        """Return all the entries in files not included in the dirstate.
+
+        "Included" here means n, m or a. This is here for perf reasons."""
+        dmap = self._map
+        l = []
+        for f in files:
+            if f not in dmap or dmap[f] == 'r':
+                l.append(f)
+        return l
+
     def walk(self, match, subrepos, unknown, ignored):
         '''
         Walk recursively through the directory tree, finding all files