Patchwork [2,of,5,filtering,part,2,V2] clfilter: add actual repo filtering mechanism

login
register
mail settings
Submitter Pierre-Yves David
Date Dec. 10, 2012, 5:30 p.m.
Message ID <5a3d1c83343b403cf1e1.1355160622@crater1.logilab.fr>
Download mbox | patch
Permalink /patch/42/
State Superseded, archived
Commit 3a6ddacb7198c99be9f2c62d2bea09a8eda36758
Headers show

Comments

Pierre-Yves David - Dec. 10, 2012, 5:30 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
# Date 1355157839 -3600
# Node ID 5a3d1c83343b403cf1e1393cdf2dde4f8512309b
# Parent  9280d7beadd6f38c8623ea3137f2028c57cc349c
clfilter: add actual repo filtering mechanism

We add a `filtered` method on repo. This method return instance of `repoproxy`
that behave exactly as the original repository but with a filtered changelog
attribute. Filter are identified by a "name". Planed filter are `unserved`,
`hidden` and `mutable`.  See the `repoproxier` docstring for details.

Mechanism to compute filtered revision are also installed. Some cache will be
installed in later commit.
Kevin Bullock - Dec. 10, 2012, 10:45 p.m.
On 10 Dec 2012, at 11:30 AM, pierre-yves.david at logilab.fr wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
> # Date 1355157839 -3600
> # Node ID 5a3d1c83343b403cf1e1393cdf2dde4f8512309b
> # Parent  9280d7beadd6f38c8623ea3137f2028c57cc349c
> clfilter: add actual repo filtering mechanism
> 
> We add a `filtered` method on repo. This method return instance of `repoproxy`
                                                       an instance
> that behave exactly as the original repository but with a filtered changelog
       behaves
> attribute. Filter are identified by a "name". Planed filter are `unserved`,
             Filters                            Planned filters
                                       [^^ " instead of ` suddenly?]
> `hidden` and `mutable`.  See the `repoproxier` docstring for details.
                                   `repoproxy`
> Mechanism to compute filtered revision are also installed. Some cache will be
  A mechanism                   revisions is
> installed in later commit.
> 
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -13,10 +13,11 @@ import scmutil, util, extensions, hook, 
> import match as matchmod
> import merge as mergemod
> import tags as tagsmod
> from lock import release
> import weakref, errno, os, time, inspect
> +import copy
> propertycache = util.propertycache
> filecache = scmutil.filecache
> 
> class repofilecache(filecache):
>     """All filecache usage on repo are done for logic that should be unfiltered
> @@ -55,10 +56,73 @@ def unfilteredmethod(orig):
>     """decorate method that always need to be run on unfiltered version"""
>     def wrapper(repo, *args, **kwargs):
>         return orig(repo.unfiltered(), *args, **kwargs)
>     return wrapper
> 
> +# function to compute filtered set
> +computefiltered = {}
> +
> +def _filteredrevs(repo, filtername):
> +    """returns set of filtered revision for this filter name"""
> +    return computefiltered[filtername](repo.unfiltered())
> +
> +class repoproxy(object):
> +    """Changelog filterered localrepo proxy object
> +
> +    This object act is a proxy for attribute operation. setattr is done on the
> +    initial repo, getattr is get from the parent (unless defined on proxy) and
> +    delattr operate on proxied repo.
> +
> +    The changelog attribute is overridden to return a copy of the original
> +    changelog but with some revision filtered.
> +
> +    You have to mix this class with the actual localrepo subclass to be able to
> +    use the very same logic than the proxied repo. See `filtered` method of
> +    local repo for details."""

This method of decorating the repo seems pretty awkward, particularly using a global dict to hold functions. If there are really going to only be a handful of filters, can't we have a handful of actual named mixin classes (derived from repoproxy)?

pacem in terris / ??? / ?????? / ????????? / ??
Kevin R. Bullock
Pierre-Yves David - Dec. 11, 2012, 11:48 p.m.
On 10 d?c. 2012, at 23:45, Kevin Bullock wrote:

> On 10 Dec 2012, at 11:30 AM, pierre-yves.david at logilab.fr wrote:
> 
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
>> # Date 1355157839 -3600
>> # Node ID 5a3d1c83343b403cf1e1393cdf2dde4f8512309b
>> # Parent  9280d7beadd6f38c8623ea3137f2028c57cc349c
>> clfilter: add actual repo filtering mechanism
>> 
>> We add a `filtered` method on repo. This method return instance of `repoproxy`
>                                                       an instance
>> that behave exactly as the original repository but with a filtered changelog
>       behaves
>> attribute. Filter are identified by a "name". Planed filter are `unserved`,
>             Filters                            Planned filters
>                                       [^^ " instead of ` suddenly?]
>> `hidden` and `mutable`.  See the `repoproxier` docstring for details.
>                                   `repoproxy`
>> Mechanism to compute filtered revision are also installed. Some cache will be
>  A mechanism                   revisions is
>> installed in later commit.
>> 
>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>> --- a/mercurial/localrepo.py
>> +++ b/mercurial/localrepo.py
>> @@ -13,10 +13,11 @@ import scmutil, util, extensions, hook, 
>> import match as matchmod
>> import merge as mergemod
>> import tags as tagsmod
>> from lock import release
>> import weakref, errno, os, time, inspect
>> +import copy
>> propertycache = util.propertycache
>> filecache = scmutil.filecache
>> 
>> class repofilecache(filecache):
>>    """All filecache usage on repo are done for logic that should be unfiltered
>> @@ -55,10 +56,73 @@ def unfilteredmethod(orig):
>>    """decorate method that always need to be run on unfiltered version"""
>>    def wrapper(repo, *args, **kwargs):
>>        return orig(repo.unfiltered(), *args, **kwargs)
>>    return wrapper
>> 
>> +# function to compute filtered set
>> +computefiltered = {}
>> +
>> +def _filteredrevs(repo, filtername):
>> +    """returns set of filtered revision for this filter name"""
>> +    return computefiltered[filtername](repo.unfiltered())
>> +
>> +class repoproxy(object):
>> +    """Changelog filterered localrepo proxy object
>> +
>> +    This object act is a proxy for attribute operation. setattr is done on the
>> +    initial repo, getattr is get from the parent (unless defined on proxy) and
>> +    delattr operate on proxied repo.
>> +
>> +    The changelog attribute is overridden to return a copy of the original
>> +    changelog but with some revision filtered.
>> +
>> +    You have to mix this class with the actual localrepo subclass to be able to
>> +    use the very same logic than the proxied repo. See `filtered` method of
>> +    local repo for details."""
> 
> This method of decorating the repo seems pretty awkward, particularly using a global dict to hold functions. If there are really going to only be a handful of filters, can't we have a handful of actual named mixin classes (derived from repoproxy)?

After discussion on IRC with Kevin the current proposition:

- The `computefiltered` dictionary will be moved in the repoproxy class
- The filtering method will be added to the dictionnary by decoration as in mercurial/obsolete.py

I'll think about it and send a followup soon.
Pierre-Yves David - Dec. 14, 2012, 2:22 a.m.
On 12 d?c. 2012, at 00:48, Pierre-Yves David wrote:

> 
> On 10 d?c. 2012, at 23:45, Kevin Bullock wrote:
> 
>> On 10 Dec 2012, at 11:30 AM, pierre-yves.david at logilab.fr wrote:
>> 
>>> # HG changeset patch
>>> # User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
>>> # Date 1355157839 -3600
>>> # Node ID 5a3d1c83343b403cf1e1393cdf2dde4f8512309b
>>> # Parent  9280d7beadd6f38c8623ea3137f2028c57cc349c
>>> clfilter: add actual repo filtering mechanism
>>> 
>>> We add a `filtered` method on repo. This method return instance of `repoproxy`
>>                                                      an instance
>>> that behave exactly as the original repository but with a filtered changelog
>>      behaves
>>> attribute. Filter are identified by a "name". Planed filter are `unserved`,
>>            Filters                            Planned filters
>>                                      [^^ " instead of ` suddenly?]
>>> `hidden` and `mutable`.  See the `repoproxier` docstring for details.
>>                                  `repoproxy`
>>> Mechanism to compute filtered revision are also installed. Some cache will be
>> A mechanism                   revisions is
>>> installed in later commit.
>>> 
>>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>>> --- a/mercurial/localrepo.py
>>> +++ b/mercurial/localrepo.py
>>> @@ -13,10 +13,11 @@ import scmutil, util, extensions, hook, 
>>> import match as matchmod
>>> import merge as mergemod
>>> import tags as tagsmod
>>> from lock import release
>>> import weakref, errno, os, time, inspect
>>> +import copy
>>> propertycache = util.propertycache
>>> filecache = scmutil.filecache
>>> 
>>> class repofilecache(filecache):
>>>   """All filecache usage on repo are done for logic that should be unfiltered
>>> @@ -55,10 +56,73 @@ def unfilteredmethod(orig):
>>>   """decorate method that always need to be run on unfiltered version"""
>>>   def wrapper(repo, *args, **kwargs):
>>>       return orig(repo.unfiltered(), *args, **kwargs)
>>>   return wrapper
>>> 
>>> +# function to compute filtered set
>>> +computefiltered = {}
>>> +
>>> +def _filteredrevs(repo, filtername):
>>> +    """returns set of filtered revision for this filter name"""
>>> +    return computefiltered[filtername](repo.unfiltered())
>>> +
>>> +class repoproxy(object):
>>> +    """Changelog filterered localrepo proxy object
>>> +
>>> +    This object act is a proxy for attribute operation. setattr is done on the
>>> +    initial repo, getattr is get from the parent (unless defined on proxy) and
>>> +    delattr operate on proxied repo.
>>> +
>>> +    The changelog attribute is overridden to return a copy of the original
>>> +    changelog but with some revision filtered.
>>> +
>>> +    You have to mix this class with the actual localrepo subclass to be able to
>>> +    use the very same logic than the proxied repo. See `filtered` method of
>>> +    local repo for details."""
>> 
>> This method of decorating the repo seems pretty awkward, particularly using a global dict to hold functions. If there are really going to only be a handful of filters, can't we have a handful of actual named mixin classes (derived from repoproxy)?
> 
> After discussion on IRC with Kevin the current proposition:
> 
> - The `computefiltered` dictionary will be moved in the repoproxy class
> - The filtering method will be added to the dictionnary by decoration as in mercurial/obsolete.py
> 
> I'll think about it and send a followup soon.

After long discussion with Kevin Bullock:

- The code is structure is unchanged,
- The class is now called `repoview` to better highlight the fact it manipulate the underlying repo almost directly.
- The new documentation is:

class repoview(object):
    """Provide a read/write view of a repo through a filtered changelog

    This object is used to access a filtered version of a repository without
    altering the original repository object itself. We can not alter the
    original object for two main reasons:
    - It prevents the use of a repo with multiple filters at the same time. In
      particular when multiple threads are involved.
    - It makes scope of the filtering harder to control.

    This object behaves very closely to the original repository. All attribute
    operations are done on the original repository:
    - An access to `repoview.someattr` actually returns `repo.someattr`,
    - A write to `repoview.someattr` actually sets value of `repo.someattr`,
    - A deletion of `repoview.someattr` actually drops `someattr`
      from `repo.__dict__`.

    The only exception is the `changelog` property. It is overridden to return
    a (surface) copy of `repo.changelog` with some revisions filtered. The
    `filtername` attribute of the view control the revisions that need to be
    filtered.  (the fact the changelog is copied is an implementation detail).

    Unlike attributes, this object intercepts all method calls. This means that
    all methods are run on the `repoview` object with the filtered `changelog`
    property. For this purpose the simple `repoview` class must be mixed with
    the actual class of the repository. This ensures that the resulting
    `repoview` object have the very same methods than the repo object. This
    leads to the property below.

        repoview.method() --> repo.__class__.method(repoview)

    The inheritance has to be done dynamically because `repo` can be of any
    subclasses of `localrepo`. Eg: `bundlerepo` or `httprepo`.
    """
Kevin Bullock - Dec. 14, 2012, 5:02 a.m.
On 13 Dec 2012, at 8:22 PM, Pierre-Yves David wrote:

> After long discussion with Kevin Bullock:
> 
> - The code is structure is unchanged,
> - The class is now called `repoview` to better highlight the fact it manipulate the underlying repo almost directly.
> - The new documentation is:
> 
> class repoview(object):
>    """Provide a read/write view of a repo through a filtered changelog
> 
>    This object is used to access a filtered version of a repository without
>    altering the original repository object itself. We can not alter the
>    original object for two main reasons:
>    - It prevents the use of a repo with multiple filters at the same time. In
>      particular when multiple threads are involved.
>    - It makes scope of the filtering harder to control.
> 
>    This object behaves very closely to the original repository. All attribute
>    operations are done on the original repository:
>    - An access to `repoview.someattr` actually returns `repo.someattr`,
>    - A write to `repoview.someattr` actually sets value of `repo.someattr`,
>    - A deletion of `repoview.someattr` actually drops `someattr`
>      from `repo.__dict__`.
> 
>    The only exception is the `changelog` property. It is overridden to return
>    a (surface) copy of `repo.changelog` with some revisions filtered. The
>    `filtername` attribute of the view control the revisions that need to be
>    filtered.  (the fact the changelog is copied is an implementation detail).
> 
>    Unlike attributes, this object intercepts all method calls. This means that
>    all methods are run on the `repoview` object with the filtered `changelog`
>    property. For this purpose the simple `repoview` class must be mixed with
>    the actual class of the repository. This ensures that the resulting
>    `repoview` object have the very same methods than the repo object. This
>    leads to the property below.
> 
>        repoview.method() --> repo.__class__.method(repoview)
> 
>    The inheritance has to be done dynamically because `repo` can be of any
>    subclasses of `localrepo`. Eg: `bundlerepo` or `httprepo`.
>    """

The fact that such machinations are necessary, though, leads me to believe we need to take a hard look at localrepo and its interactions with the changelog.

I also recommended that the repoview class and all the new filtering functions to be added be put into a new module, particularly since we're also introducing a new module-level dispatch table.

pacem in terris / ??? / ?????? / ????????? / ??
Kevin R. Bullock
Idan Kamara - Dec. 14, 2012, 6:43 p.m.
On Mon, Dec 10, 2012 at 7:30 PM,  <pierre-yves.david at logilab.fr> wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
> # Date 1355157839 -3600
> # Node ID 5a3d1c83343b403cf1e1393cdf2dde4f8512309b
> # Parent  9280d7beadd6f38c8623ea3137f2028c57cc349c
> clfilter: add actual repo filtering mechanism

How do you see other parts of the code hooking into this? For
example how would another feature such as stash add its commits
to the 'unserved' filter?

What about extensions?
Pierre-Yves David - Dec. 16, 2012, 12:19 a.m.
On 14 d?c. 2012, at 19:43, Idan Kamara wrote:

> On Mon, Dec 10, 2012 at 7:30 PM,  <pierre-yves.david at logilab.fr> wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
>> # Date 1355157839 -3600
>> # Node ID 5a3d1c83343b403cf1e1393cdf2dde4f8512309b
>> # Parent  9280d7beadd6f38c8623ea3137f2028c57cc349c
>> clfilter: add actual repo filtering mechanism
> 
> How do you see other parts of the code hooking into this? For
> example how would another feature such as stash add its commits
> to the 'unserved' filter?

Stash needs to extends the definition of "hidden" changeset. This means making a distinction between "hide-able" and "obsolete".

Then stashed changed will be properly hidden to the user eyes. Stash code will internally use unfiltered repo to operates its logic.

A distinction between "hidden" and "disposable" will also be necessary.

Regarding filtering mode themself, extension should be able to add they own filtering mode in by adding function to the `computefilter` mapping.
Idan Kamara - Dec. 16, 2012, 8:45 a.m.
On Sun, Dec 16, 2012 at 2:19 AM, Pierre-Yves David
<pierre-yves.david at ens-lyon.org> wrote:
>
>
> On 14 d?c. 2012, at 19:43, Idan Kamara wrote:
>
> > On Mon, Dec 10, 2012 at 7:30 PM,  <pierre-yves.david at logilab.fr> wrote:
> >> # HG changeset patch
> >> # User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
> >> # Date 1355157839 -3600
> >> # Node ID 5a3d1c83343b403cf1e1393cdf2dde4f8512309b
> >> # Parent  9280d7beadd6f38c8623ea3137f2028c57cc349c
> >> clfilter: add actual repo filtering mechanism
> >
> > How do you see other parts of the code hooking into this? For
> > example how would another feature such as stash add its commits
> > to the 'unserved' filter?
>
> Stash needs to extends the definition of "hidden" changeset. This means
> making a distinction between "hide-able" and "obsolete".

Ok, so basically every feature in core that wants to hide something
from the changelog (in a non-obsolete way), needs to add some logic to
localrepo.hiddenrevs?

>
> Then stashed changed will be properly hidden to the user eyes. Stash code
> will internally use unfiltered repo to operates its logic.
>
> A distinction between "hidden" and "disposable" will also be necessary.
>
> Regarding filtering mode themself, extension should be able to add they
> own filtering mode in by adding function to the `computefilter` mapping.

So they have to override the 'unserved' filter with their own
(probably calling the base one before adding).

Since other parts of the code refer to filters by name, adding completely
new ones seems to be useless.

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -13,10 +13,11 @@  import scmutil, util, extensions, hook, 
 import match as matchmod
 import merge as mergemod
 import tags as tagsmod
 from lock import release
 import weakref, errno, os, time, inspect
+import copy
 propertycache = util.propertycache
 filecache = scmutil.filecache
 
 class repofilecache(filecache):
     """All filecache usage on repo are done for logic that should be unfiltered
@@ -55,10 +56,73 @@  def unfilteredmethod(orig):
     """decorate method that always need to be run on unfiltered version"""
     def wrapper(repo, *args, **kwargs):
         return orig(repo.unfiltered(), *args, **kwargs)
     return wrapper
 
+# function to compute filtered set
+computefiltered = {}
+
+def _filteredrevs(repo, filtername):
+    """returns set of filtered revision for this filter name"""
+    return computefiltered[filtername](repo.unfiltered())
+
+class repoproxy(object):
+    """Changelog filterered localrepo proxy object
+
+    This object act is a proxy for attribute operation. setattr is done on the
+    initial repo, getattr is get from the parent (unless defined on proxy) and
+    delattr operate on proxied repo.
+
+    The changelog attribute is overridden to return a copy of the original
+    changelog but with some revision filtered.
+
+    You have to mix this class with the actual localrepo subclass to be able to
+    use the very same logic than the proxied repo. See `filtered` method of
+    local repo for details."""
+
+    def __init__(self, repo, filtername):
+        object.__setattr__(self, '_unfilteredrepo', repo)
+        object.__setattr__(self, 'filtername', filtername)
+
+    # not a cacheproperty on purpose we shall implement a proper cache later
+    @property
+    def changelog(self):
+        """return a filtered version of the changeset
+
+        this changelog must not be used for writing"""
+        # some cache may be implemented later
+        cl = copy.copy(self._unfilteredrepo.changelog)
+        cl.filteredrevs = _filteredrevs(self._unfilteredrepo, self.filtername)
+        return cl
+
+    def unfiltered(self):
+        """Return an unfiltered version of a repo"""
+        return self._unfilteredrepo
+
+    def filtered(self, name):
+        """Return a filtered version of a repository"""
+        if name == self.filtername:
+            return self
+        return self.unfiltered().filtered(name)
+
+    # everything access are forwarded to the proxied repo
+    def __getattr__(self, attr):
+        return getattr(self._unfilteredrepo, attr)
+
+    def __setattr__(self, attr, value):
+        return setattr(self._unfilteredrepo, attr, value)
+
+    def __delattr__(self, attr):
+        return delattr(self._unfilteredrepo, attr)
+
+    # The `requirement` attribut is initialiazed during __init__. But
+    # __getattr__ won't be called as it also exists on the class. We need
+    # explicit forwarding to main repo here
+    @property
+    def requirements(self):
+        return self._unfilteredrepo.requirements
+
 MODERNCAPS = set(('lookup', 'branchmap', 'pushkey', 'known', 'getbundle'))
 LEGACYCAPS = MODERNCAPS.union(set(['changegroupsubset']))
 
 class localpeer(peer.peerrepository):
     '''peer for a local repo; reflects only the most recent API'''
@@ -301,10 +365,18 @@  class localrepository(object):
         """Return unfiltered version of the repository
 
         Intended to be ovewritten by filtered repo."""
         return self
 
+    def filtered(self, name):
+        """Return a filtered version of a repository"""
+        # build a new class with the mixin and the current class
+        # (possibily subclass of the repo)
+        class proxycls(repoproxy, self.unfiltered().__class__):
+            pass
+        return proxycls(self, name)
+
     @repofilecache('bookmarks')
     def _bookmarks(self):
         return bookmarks.bmstore(self)
 
     @repofilecache('bookmarks.current')