Submitter | Gregory Szorc |
---|---|
Date | Nov. 22, 2015, 1:14 a.m. |
Message ID | <888c2171adffa8340406.1448154841@ubuntu-main> |
Download | mbox | patch |
Permalink | /patch/11568/ |
State | Deferred |
Delegated to: | Pierre-Yves David |
Headers | show |
Comments
On 11/21/2015 05:14 PM, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1448154001 28800 > # Sat Nov 21 17:00:01 2015 -0800 > # Node ID 888c2171adffa8340406b50aae02375f7bef50f4 > # Parent f559e16232491e44fc4dac6c12872cb2e9164d27 > localrepo: ability to obtain a read-only snapshot of a repo > > As part of performing some performance profiling, I noticed that a lot > of time can be spent resolving repoview.changelog (the changelog must > be resolved when performing repo[x] for example). The reason is that > the changelog lookup function needs to validate that the cached set > of revisions exposed to the view is still accurate. While there are > comments in repoview.py hinting at future optimizations to this > code, I think the ideal solution is to avoid having to validate a > cache altogether. While I like the idea of having frozen repository, it introduce an additional level of complexity (somewhere already not simple) and will probably take some time to right and properly performing. Can we look at way to speed up the cache validation logic (or get a better approach for invalidating it, eg: nothing really change until we have a transaction). Speeding this would have a massive impact all across the board without changing much of the current architecture. I'll look at the implementation of the frozen repo more in details later.
On Sun, Nov 22, 2015 at 9:23 PM, Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote: > > > On 11/21/2015 05:14 PM, Gregory Szorc wrote: > >> # HG changeset patch >> # User Gregory Szorc <gregory.szorc@gmail.com> >> # Date 1448154001 28800 >> # Sat Nov 21 17:00:01 2015 -0800 >> # Node ID 888c2171adffa8340406b50aae02375f7bef50f4 >> # Parent f559e16232491e44fc4dac6c12872cb2e9164d27 >> localrepo: ability to obtain a read-only snapshot of a repo >> >> As part of performing some performance profiling, I noticed that a lot >> of time can be spent resolving repoview.changelog (the changelog must >> be resolved when performing repo[x] for example). The reason is that >> the changelog lookup function needs to validate that the cached set >> of revisions exposed to the view is still accurate. While there are >> comments in repoview.py hinting at future optimizations to this >> code, I think the ideal solution is to avoid having to validate a >> cache altogether. >> > > While I like the idea of having frozen repository, it introduce an > additional level of complexity (somewhere already not simple) and will > probably take some time to right and properly performing. > > Can we look at way to speed up the cache validation logic (or get a better > approach for invalidating it, eg: nothing really change until we have a > transaction). > > Speeding this would have a massive impact all across the board without > changing much of the current architecture. > > I'll look at the implementation of the frozen repo more in details later. Pierre-Yves just sent a refactor to repoview to remove a lot of the overhead for .changelog lookup. His series seems to have the same performance impact as my series. And his series doesn't introduce a complicated new concept of "frozen repos" which would need to be utilized all over the code base to become effective. While I like the idea of a separate class for a read-only snapshot of the state of a repository, I think that is an idea left for another day since the main benefit (performance) is achieved through simpler means.
Patch
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py --- a/mercurial/localrepo.py +++ b/mercurial/localrepo.py @@ -413,16 +413,24 @@ class localrepository(object): def filtered(self, name): """Return a filtered version of a repository""" # build a new class with the mixin and the current class # (possibly subclass of the repo) class proxycls(repoview.repoview, self.unfiltered().__class__): pass return proxycls(self, name) + def frozen(self): + """Obtain a read-only snapshot of the repository.""" + # Don't create a new object if we're already frozen. + if getattr(self, '_frozencl', None): + return self + + return freezerepo(self) + @repofilecache('bookmarks') def _bookmarks(self): return bookmarks.bmstore(self) @repofilecache('bookmarks.current') def _activebookmark(self): return bookmarks.readactive(self) @@ -1931,8 +1939,81 @@ def undoname(fn): assert name.startswith('journal') return os.path.join(base, name.replace('journal', 'undo', 1)) def instance(ui, path, create): return localrepository(ui, util.urllocalpath(path), create) def islocal(path): return True + +def freezerepo(repo): + """Obtain a read-only snapshot of a repository. + + Returns a new repository instance based on the existing one that + is read-only and frozen in time. Lookups that would normally need + o be validated against change are not performed. This property makes + frozen repository instances suitable for read-only queries where + performance is important. + + Any active repo filter, if any, will be preserved in the returned + instance. + """ + msg = _('%s should not be called on frozen repos') + + class frozenrepo(repo.__class__): + def __init__(self, orig): + # We need to call the constructor for repoview if we inherit + # from one. + if isinstance(orig, repoview.repoview): + repoview.repoview.__init__(self, orig, orig.filtername) + + # Stash away a reference to the original repo so we can proxy. + object.__setattr__(self, '_unfrozenrepo', orig) + + # changelog object lookup is typically dynamic. This is not + # needed on frozen repos. So resolve the changelog once and + # cache it. + object.__setattr__(self, '_frozencl', orig.changelog) + + @property + def changelog(self): + return self._frozencl + + # We need to override functions that return repos to returned frozen + # repos as well. + def unfiltered(self): + # TODO this should return a frozen repo. However, a few tests are + # failing due to unknown parent of working directory state. + return self._unfrozenrepo.unfiltered() + + def filtered(self, name): + current = getattr(self, 'filtername', None) + if current == name: + return self + + return freezerepo(self._unfrozenrepo.unfiltered().filtered(name)) + + # Proxy all operations to original repo. + def __getattr__(self, attr): + return getattr(self._unfrozenrepo, attr) + + def __setattr__(self, attr, value): + return setattr(self._unfrozenrepo, attr, value) + + def __delattr__(self, attr): + return delattr(self._unfrozenrepo, attr) + + # Catch calls to functions that should never be called on frozen + # repo instances. + def lock(self, *args, **kwargs): + raise error.Abort(msg % 'lock()') + + def transaction(self, *args, **kwargs): + raise error.Abort(msg % 'transaction()') + + def commit(self, *args, **kwargs): + raise error.Abort(msg % 'commit()') + + def destroying(self): + raise error.Abort(msg % 'destroying()') + + return frozenrepo(repo)