[1 of 5] util: add an LRU cache dict

Submitter Siddharth Agarwal
Date Feb. 8, 2013, 9:13 p.m.
Message ID <642bf4f92100a41182de.1360358003@sid0x220>
Permalink /patch/834/
State Superseded, archived

Comments

Siddharth Agarwal - Feb. 8, 2013, 9:13 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1360351263 0
# Node ID 642bf4f92100a41182de15646e37db70b391bfeb
# Parent  08e00496e7b3bda8db3fbe7084a013d77e4932d2
util: add an LRU cache dict

This is the bare minimum dictionary implementation needed for an upcoming
patch.
Idan Kamara - Feb. 8, 2013, 9:38 p.m.
On Fri, Feb 8, 2013 at 11:13 PM, Siddharth Agarwal <sid0@fb.com> wrote:
>
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1360351263 0
> # Node ID 642bf4f92100a41182de15646e37db70b391bfeb
> # Parent  08e00496e7b3bda8db3fbe7084a013d77e4932d2
> util: add an LRU cache dict
>
> This is the bare minimum dictionary implementation needed for an upcoming
> patch.
>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -211,6 +211,30 @@ except AttributeError:
>                      del self[i]
>                      break
>
> +class lrucachedict(object):
> +    '''cache most recent get or sets to this dictionary'''
> +    def __init__(self, size):
> +        self._cache = {}
> +        self._size = {}
> +        self._order = deque()

Looks like you forgot to assign to _size here, so your
LRU isn't really bounded :) In Python 2 an int always compares
as less than a dict, so the eviction check silently never fires:

>>> 0 > {}
False
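
With the class as posted, that means nothing is ever evicted -- a
quick doctest-style check (peeking at _cache for illustration):

>>> d = lrucachedict(2)
>>> for i in range(10):
...     d[i] = i
>>> len(d._cache)
10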
Siddharth Agarwal - Feb. 8, 2013, 9:40 p.m.
On 02/08/2013 09:38 PM, Idan Kamara wrote:
> Looks like you forgot to assign to _size here, so your
> LRU isn't really bounded :)

Ouch!
Greg Ward - Feb. 8, 2013, 10:38 p.m.
On 08 February 2013, Siddharth Agarwal said:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1360351263 0
> # Node ID 642bf4f92100a41182de15646e37db70b391bfeb
> # Parent  08e00496e7b3bda8db3fbe7084a013d77e4932d2
> util: add an LRU cache dict
> 
> This is the bare minimum dictionary implementation needed for an upcoming
> patch.

Tests? This should be doc-test-able.

> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -211,6 +211,30 @@ except AttributeError:
>                      del self[i]
>                      break
>  
> +class lrucachedict(object):
> +    '''cache most recent get or sets to this dictionary'''
> +    def __init__(self, size):
> +        self._cache = {}
> +        self._size = {}

It's a dict...

> +        self._order = deque()
> +
> +    def __getitem__(self, key):
> +        self._order.remove(key)
> +        self._order.append(key)
> +        return self._cache[key]
> +
> +    def __setitem__(self, key, value):
> +        if key not in self._cache:
> +            if len(self._cache) > self._size:

...but you're comparing it to an int???

Also, wouldn't _maxsize be a better name?

Is deque the right data structure here? I recently implemented an
LRU cache using heapq to record the "fetch time" of each entry.
__getitem__() incremented the age of the whole cache, counted in
fetches, and set the fetch time of the fetched key to 0. So with every
fetch, every other key becomes a bit older. I *think* that adds O(1)
overhead to __getitem__().  

Then __setitem__() can look up the oldest key in O(lg N) time when the
cache is full.

How does that compare to deque? What about memory use?

       Greg
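
For concreteness, a minimal sketch of the heap-based scheme Greg
describes -- illustrative only; the class name and fields are made
up, and the lazy invalidation of stale heap entries makes pushes
O(lg N) rather than O(1):

import heapq

class heaplru(object):
    '''sketch: LRU cache bounded by a heap of last-access timestamps'''
    def __init__(self, maxsize):
        self._maxsize = maxsize
        self._cache = {}   # key -> (timestamp, value)
        self._heap = []    # (timestamp, key); may contain stale entries
        self._clock = 0    # global fetch counter

    def __getitem__(self, key):
        stamp, value = self._cache[key]
        self._clock += 1
        self._cache[key] = (self._clock, value)
        # the old heap entry for this key simply goes stale
        heapq.heappush(self._heap, (self._clock, key))
        return value

    def __setitem__(self, key, value):
        if key not in self._cache and len(self._cache) >= self._maxsize:
            # pop until we hit an entry whose timestamp is still current
            while True:
                stamp, old = heapq.heappop(self._heap)
                if old in self._cache and self._cache[old][0] == stamp:
                    del self._cache[old]
                    break
        self._clock += 1
        self._cache[key] = (self._clock, value)
        heapq.heappush(self._heap, (self._clock, key))

The stale entries mean the heap can grow past maxsize between
evictions, which is the usual memory trade-off of this approach.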
Siddharth Agarwal - Feb. 8, 2013, 10:49 p.m.
On 02/08/2013 10:38 PM, Greg Ward wrote:
> It's a dict...

Yeah -- I fixed that in V2.

> Also, wouldn't _maxsize be a better name?

Maybe.

> Is deque the right data structure here? I recently implemented an
> LRU cache using heapq to record the "fetch time" of each entry.
> __getitem__() incremented the age of the whole cache, counted in
> fetches, and set the fetch time of the fetched key to 0. So with every
> fetch, every other key becomes a bit older. I *think* that adds O(1)
> overhead to __getitem__().
>
> Then __setitem__() can look up the oldest key in O(lg N) time when the
> cache is full.
>
> How does that compare to deque? What about memory use?

I don't think it matters -- it's meant for small caches just like the 
lrucachefunc right below, and is in fact a straight no-brainer copy of 
that algorithm.

I guess I could add tests to it, but lrucachefunc isn't unit tested 
either...

Patch

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -211,6 +211,30 @@  except AttributeError:
                     del self[i]
                     break
 
+class lrucachedict(object):
+    '''cache most recent get or sets to this dictionary'''
+    def __init__(self, size):
+        self._cache = {}
+        self._size = {}
+        self._order = deque()
+
+    def __getitem__(self, key):
+        self._order.remove(key)
+        self._order.append(key)
+        return self._cache[key]
+
+    def __setitem__(self, key, value):
+        if key not in self._cache:
+            if len(self._cache) > self._size:
+                del self._cache[self._order.popleft()]
+            self._cache[key] = value
+        else:
+            self._order.remove(key)
+        self._order.append(key)
+
+    def __contains__(self, key):
+        return key in self._cache
+
 def lrucachefunc(func):
     '''cache most recent results of function calls'''
     cache = {}
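
For reference, a corrected sketch folding in the review feedback:
the missing size assignment (renamed to _maxsize per Greg's
suggestion), a >= bound check so the cache holds at most maxsize
entries, and a value update when an existing key is reassigned,
which the posted version also misses. An illustration, not the
actual V2 patch:

from collections import deque

class lrucachedict(object):
    '''cache most recent gets or sets to this dictionary'''
    def __init__(self, maxsize):
        self._cache = {}
        self._maxsize = maxsize   # the assignment the review caught
        self._order = deque()

    def __getitem__(self, key):
        # move the key to the most-recently-used end
        self._order.remove(key)
        self._order.append(key)
        return self._cache[key]

    def __setitem__(self, key, value):
        if key not in self._cache:
            if len(self._cache) >= self._maxsize:
                del self._cache[self._order.popleft()]
        else:
            self._order.remove(key)
        self._cache[key] = value   # also updates existing keys
        self._order.append(key)

    def __contains__(self, key):
        return key in self._cache

And a usage check in the doctest style Greg asked about:

>>> d = lrucachedict(2)
>>> d['a'] = 1; d['b'] = 2; d['c'] = 3
>>> 'a' in d
False
>>> d['b']
2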