Patchwork [1,of,5,V2] treemanifest: create treemanifest class

login
register
mail settings
Submitter Martin von Zweigbergk
Date March 16, 2015, 11:27 p.m.
Message ID <0d969d6efeef08935c93.1426548427@martinvonz.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/8103/
State Superseded
Headers show

Comments

Martin von Zweigbergk - March 16, 2015, 11:27 p.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1426435855 25200
#      Sun Mar 15 09:10:55 2015 -0700
# Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
# Parent  567ae53657544744155897ada91f16f8af61ad8a
treemanifest: create treemanifest class

There are a number of problems with large and flat manifests. Copying
from http://mercurial.selenic.com/wiki/ManifestShardingPlan:

 * manifest too large for RAM

 * manifest resolution too much CPU (long delta chains)

 * committing is slow because entire manifest has to be hashed

 * impossible for narrow clone to leave out part of manifest as all is
   needed to calculate new hash

 * diffing two revisions involves traversing entire subdirectories
   even if identical

This is a first step in a series introducing a manifest revlog per
directory.

This change adds boolean configuration option
experimental.treemanifest. When the option is enabled, manifests are
parsed into a new tree data structure with one tree node per
directory. At this point, it is just a different data structure in
memory; there is still just a single manifest revlog on disk.

The code is not yet optimized. Enabling the option makes most or all
operations slower. Once we start storing manifest revlogs for every
directory, it should be possible to make many of these operations much
faster. The fastdelta() optimization has been explicitly disabled (and
not implemented) for the treemanifests.

Tests can be run using the new data structure by switching the config
option default in localrepo._applyrequirements(). All tests pass.
Matt Mackall - March 18, 2015, 10:28 p.m.
On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1426435855 25200
> #      Sun Mar 15 09:10:55 2015 -0700
> # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
> # Parent  567ae53657544744155897ada91f16f8af61ad8a
> treemanifest: create treemanifest class
> 
> There are a number of problems with large and flat manifests. Copying
> from http://mercurial.selenic.com/wiki/ManifestShardingPlan:
> 
>  * manifest too large for RAM
> 
>  * manifest resolution too much CPU (long delta chains)
> 
>  * committing is slow because entire manifest has to be hashed
> 
>  * impossible for narrow clone to leave out part of manifest as all is
>    needed to calculate new hash
> 
>  * diffing two revisions involves traversing entire subdirectories
>    even if identical
> 
> This is a first step in a series introducing a manifest revlog per
> directory.
> 
> This change adds boolean configuration option
> experimental.treemanifest. When the option is enabled, manifests are
> parsed into a new tree data structure with one tree node per
> directory. At this point, it is just a different data structure in
> memory; there is still just a single manifest revlog on disk.

I think the right way to do this is:

- at repo create time, check the flag
   - add treemanifest to requires file
- at startup, check requires file
   - unconditionally use treemanifest if set

..and that wants to be staged as a couple patches later in the series,
not here.
Martin von Zweigbergk - March 18, 2015, 10:47 p.m.
On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote:

> On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote:
> > # HG changeset patch
> > # User Martin von Zweigbergk <martinvonz@google.com>
> > # Date 1426435855 25200
> > #      Sun Mar 15 09:10:55 2015 -0700
> > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
> > # Parent  567ae53657544744155897ada91f16f8af61ad8a
> > treemanifest: create treemanifest class
> >
> > There are a number of problems with large and flat manifests. Copying
> > from http://mercurial.selenic.com/wiki/ManifestShardingPlan:
> >
> >  * manifest too large for RAM
> >
> >  * manifest resolution too much CPU (long delta chains)
> >
> >  * committing is slow because entire manifest has to be hashed
> >
> >  * impossible for narrow clone to leave out part of manifest as all is
> >    needed to calculate new hash
> >
> >  * diffing two revisions involves traversing entire subdirectories
> >    even if identical
> >
> > This is a first step in a series introducing a manifest revlog per
> > directory.
> >
> > This change adds boolean configuration option
> > experimental.treemanifest. When the option is enabled, manifests are
> > parsed into a new tree data structure with one tree node per
> > directory. At this point, it is just a different data structure in
> > memory; there is still just a single manifest revlog on disk.
>
> I think the right way to do this is:
>
> - at repo create time, check the flag
>    - add treemanifest to requires file
> - at startup, check requires file
>    - unconditionally use treemanifest if set
>

I agree with the second step, but I'm not sure about the first. After the 5
patches in this series, using treemanifest is still not a final decision;
the manifest is still stored in the regular flat format. In other words,
--config experimental.treemanifest=True can be passed to any command
without affecting anything besides performance. It seems like the above
would unnecessarily make it a final decision. I was rather planning on
writing the requires flag when a treemanifest is first written to disk
using the future submanifest revlog structure. I would even be possible to
unconditionally read into the treemanifest structure (if we imagine it got
fast enough). Then the config would clearly only be about writing. What do
you think?
Augie Fackler - March 19, 2015, 1:27 p.m.
On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote:
> On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote:
>
> > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote:
> > > # HG changeset patch
> > > # User Martin von Zweigbergk <martinvonz@google.com>
> > > # Date 1426435855 25200
> > > #      Sun Mar 15 09:10:55 2015 -0700
> > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
> > > # Parent  567ae53657544744155897ada91f16f8af61ad8a
> > > treemanifest: create treemanifest class
> > >
> > > There are a number of problems with large and flat manifests. Copying
> > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan:
> > >
> > >  * manifest too large for RAM
> > >
> > >  * manifest resolution too much CPU (long delta chains)
> > >
> > >  * committing is slow because entire manifest has to be hashed
> > >
> > >  * impossible for narrow clone to leave out part of manifest as all is
> > >    needed to calculate new hash
> > >
> > >  * diffing two revisions involves traversing entire subdirectories
> > >    even if identical
> > >
> > > This is a first step in a series introducing a manifest revlog per
> > > directory.
> > >
> > > This change adds boolean configuration option
> > > experimental.treemanifest. When the option is enabled, manifests are
> > > parsed into a new tree data structure with one tree node per
> > > directory. At this point, it is just a different data structure in
> > > memory; there is still just a single manifest revlog on disk.
> >
> > I think the right way to do this is:
> >
> > - at repo create time, check the flag
> >    - add treemanifest to requires file
> > - at startup, check requires file
> >    - unconditionally use treemanifest if set
> >
>
> I agree with the second step, but I'm not sure about the first.

He's 100% right, I should have thought of this before you mailed
it. Basically, if you don't put *something* in the requires, older hg
versions will try and operate on the repository and (at best) claim
it's corrupt. You could probably do something like
"experimental-treemanifest" or something, knowing that we'd either
want to disable the feature or lock down the format before it went
into a real release.

(I'm not sure if mpm will endorse an experimental- flag in requires,
but it seems reasonable to me.)

> After the 5
> patches in this series, using treemanifest is still not a final decision;
> the manifest is still stored in the regular flat format. In other words,
> --config experimental.treemanifest=True can be passed to any command
> without affecting anything besides performance. It seems like the above
> would unnecessarily make it a final decision. I was rather planning on
> writing the requires flag when a treemanifest is first written to disk
> using the future submanifest revlog structure. I would even be possible to
> unconditionally read into the treemanifest structure (if we imagine it got
> fast enough). Then the config would clearly only be about writing. What do
> you think?

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Martin von Zweigbergk - March 19, 2015, 1:34 p.m.
On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote:

> On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote:
> > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote:
> >
> > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote:
> > > > # HG changeset patch
> > > > # User Martin von Zweigbergk <martinvonz@google.com>
> > > > # Date 1426435855 25200
> > > > #      Sun Mar 15 09:10:55 2015 -0700
> > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
> > > > # Parent  567ae53657544744155897ada91f16f8af61ad8a
> > > > treemanifest: create treemanifest class
> > > >
> > > > There are a number of problems with large and flat manifests. Copying
> > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan:
> > > >
> > > >  * manifest too large for RAM
> > > >
> > > >  * manifest resolution too much CPU (long delta chains)
> > > >
> > > >  * committing is slow because entire manifest has to be hashed
> > > >
> > > >  * impossible for narrow clone to leave out part of manifest as all
> is
> > > >    needed to calculate new hash
> > > >
> > > >  * diffing two revisions involves traversing entire subdirectories
> > > >    even if identical
> > > >
> > > > This is a first step in a series introducing a manifest revlog per
> > > > directory.
> > > >
> > > > This change adds boolean configuration option
> > > > experimental.treemanifest. When the option is enabled, manifests are
> > > > parsed into a new tree data structure with one tree node per
> > > > directory. At this point, it is just a different data structure in
> > > > memory; there is still just a single manifest revlog on disk.
> > >
> > > I think the right way to do this is:
> > >
> > > - at repo create time, check the flag
> > >    - add treemanifest to requires file
> > > - at startup, check requires file
> > >    - unconditionally use treemanifest if set
> > >
> >
> > I agree with the second step, but I'm not sure about the first.
>
> He's 100% right, I should have thought of this before you mailed
> it. Basically, if you don't put *something* in the requires, older hg
> versions will try and operate on the repository and (at best) claim
> it's corrupt.
>

No worries, I did know about the requires thing. I was instead arguing that
the repository is *not* corrupt (to an old client either) after this
series, so we can add the requires later.
Martin von Zweigbergk - March 19, 2015, 3:18 p.m.
On Thu, Mar 19, 2015 at 6:34 AM Martin von Zweigbergk <martinvonz@google.com>
wrote:

> On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote:
>
>> On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote:
>> > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote:
>> >
>> > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote:
>> > > > # HG changeset patch
>> > > > # User Martin von Zweigbergk <martinvonz@google.com>
>> > > > # Date 1426435855 25200
>> > > > #      Sun Mar 15 09:10:55 2015 -0700
>> > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
>> > > > # Parent  567ae53657544744155897ada91f16f8af61ad8a
>> > > > treemanifest: create treemanifest class
>> > > >
>> > > > There are a number of problems with large and flat manifests.
>> Copying
>> > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan:
>> > > >
>> > > >  * manifest too large for RAM
>> > > >
>> > > >  * manifest resolution too much CPU (long delta chains)
>> > > >
>> > > >  * committing is slow because entire manifest has to be hashed
>> > > >
>> > > >  * impossible for narrow clone to leave out part of manifest as all
>> is
>> > > >    needed to calculate new hash
>> > > >
>> > > >  * diffing two revisions involves traversing entire subdirectories
>> > > >    even if identical
>> > > >
>> > > > This is a first step in a series introducing a manifest revlog per
>> > > > directory.
>> > > >
>> > > > This change adds boolean configuration option
>> > > > experimental.treemanifest. When the option is enabled, manifests are
>> > > > parsed into a new tree data structure with one tree node per
>> > > > directory. At this point, it is just a different data structure in
>> > > > memory; there is still just a single manifest revlog on disk.
>> > >
>> > > I think the right way to do this is:
>> > >
>> > > - at repo create time, check the flag
>> > >    - add treemanifest to requires file
>> > > - at startup, check requires file
>> > >    - unconditionally use treemanifest if set
>> > >
>> >
>> > I agree with the second step, but I'm not sure about the first.
>>
>> He's 100% right, I should have thought of this before you mailed
>> it. Basically, if you don't put *something* in the requires, older hg
>> versions will try and operate on the repository and (at best) claim
>> it's corrupt.
>>
>
> No worries, I did know about the requires thing. I was instead arguing
> that the repository is *not* corrupt (to an old client either) after this
> series, so we can add the requires later.
>

I hadn't thought of this before, but to show that it's likely to be safe, I
just ran the test suite while randomly (50/50) using the two types to make
sure one version can be written with treemanifest and the next be read with
manifestdict, and vice versa. I checked that it did in fact choose randomly
(well, not the same every time anyway).
Matt Mackall - March 19, 2015, 5:45 p.m.
On Thu, 2015-03-19 at 15:18 +0000, Martin von Zweigbergk wrote:
> On Thu, Mar 19, 2015 at 6:34 AM Martin von Zweigbergk <martinvonz@google.com>
> wrote:
> 
> > On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote:
> >
> >> On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote:
> >> > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote:
> >> >
> >> > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote:
> >> > > > # HG changeset patch
> >> > > > # User Martin von Zweigbergk <martinvonz@google.com>
> >> > > > # Date 1426435855 25200
> >> > > > #      Sun Mar 15 09:10:55 2015 -0700
> >> > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
> >> > > > # Parent  567ae53657544744155897ada91f16f8af61ad8a
> >> > > > treemanifest: create treemanifest class
> >> > > >
> >> > > > There are a number of problems with large and flat manifests.
> >> Copying
> >> > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan:
> >> > > >
> >> > > >  * manifest too large for RAM
> >> > > >
> >> > > >  * manifest resolution too much CPU (long delta chains)
> >> > > >
> >> > > >  * committing is slow because entire manifest has to be hashed
> >> > > >
> >> > > >  * impossible for narrow clone to leave out part of manifest as all
> >> is
> >> > > >    needed to calculate new hash
> >> > > >
> >> > > >  * diffing two revisions involves traversing entire subdirectories
> >> > > >    even if identical
> >> > > >
> >> > > > This is a first step in a series introducing a manifest revlog per
> >> > > > directory.
> >> > > >
> >> > > > This change adds boolean configuration option
> >> > > > experimental.treemanifest. When the option is enabled, manifests are
> >> > > > parsed into a new tree data structure with one tree node per
> >> > > > directory. At this point, it is just a different data structure in
> >> > > > memory; there is still just a single manifest revlog on disk.
> >> > >
> >> > > I think the right way to do this is:
> >> > >
> >> > > - at repo create time, check the flag
> >> > >    - add treemanifest to requires file
> >> > > - at startup, check requires file
> >> > >    - unconditionally use treemanifest if set
> >> > >
> >> >
> >> > I agree with the second step, but I'm not sure about the first.
> >>
> >> He's 100% right, I should have thought of this before you mailed
> >> it. Basically, if you don't put *something* in the requires, older hg
> >> versions will try and operate on the repository and (at best) claim
> >> it's corrupt.
> >>
> >
> > No worries, I did know about the requires thing. I was instead arguing
> > that the repository is *not* corrupt (to an old client either) after this
> > series, so we can add the requires later.
> >
> 
> I hadn't thought of this before, but to show that it's likely to be safe, I
> just ran the test suite while randomly (50/50) using the two types to make
> sure one version can be written with treemanifest and the next be read with
> manifestdict, and vice versa. I checked that it did in fact choose randomly
> (well, not the same every time anyway).

Yeah, Martin's right: there's no on-disk change here (yet) so it's
premature to touch requires. I'd still like to see the experimental flag
bits in a separate patch.
Martin von Zweigbergk - March 19, 2015, 5:54 p.m.
On Thu, Mar 19, 2015 at 10:45 AM Matt Mackall <mpm@selenic.com> wrote:

> On Thu, 2015-03-19 at 15:18 +0000, Martin von Zweigbergk wrote:
> > On Thu, Mar 19, 2015 at 6:34 AM Martin von Zweigbergk <
> martinvonz@google.com>
> > wrote:
> >
> > > On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote:
> > >
> > >> On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote:
> > >> > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com>
> wrote:
> > >> >
> > >> > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote:
> > >> > > > # HG changeset patch
> > >> > > > # User Martin von Zweigbergk <martinvonz@google.com>
> > >> > > > # Date 1426435855 25200
> > >> > > > #      Sun Mar 15 09:10:55 2015 -0700
> > >> > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59
> > >> > > > # Parent  567ae53657544744155897ada91f16f8af61ad8a
> > >> > > > treemanifest: create treemanifest class
> > >> > > >
> > >> > > > There are a number of problems with large and flat manifests.
> > >> Copying
> > >> > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan:
> > >> > > >
> > >> > > >  * manifest too large for RAM
> > >> > > >
> > >> > > >  * manifest resolution too much CPU (long delta chains)
> > >> > > >
> > >> > > >  * committing is slow because entire manifest has to be hashed
> > >> > > >
> > >> > > >  * impossible for narrow clone to leave out part of manifest as
> all
> > >> is
> > >> > > >    needed to calculate new hash
> > >> > > >
> > >> > > >  * diffing two revisions involves traversing entire
> subdirectories
> > >> > > >    even if identical
> > >> > > >
> > >> > > > This is a first step in a series introducing a manifest revlog
> per
> > >> > > > directory.
> > >> > > >
> > >> > > > This change adds boolean configuration option
> > >> > > > experimental.treemanifest. When the option is enabled,
> manifests are
> > >> > > > parsed into a new tree data structure with one tree node per
> > >> > > > directory. At this point, it is just a different data structure
> in
> > >> > > > memory; there is still just a single manifest revlog on disk.
> > >> > >
> > >> > > I think the right way to do this is:
> > >> > >
> > >> > > - at repo create time, check the flag
> > >> > >    - add treemanifest to requires file
> > >> > > - at startup, check requires file
> > >> > >    - unconditionally use treemanifest if set
> > >> > >
> > >> >
> > >> > I agree with the second step, but I'm not sure about the first.
> > >>
> > >> He's 100% right, I should have thought of this before you mailed
> > >> it. Basically, if you don't put *something* in the requires, older hg
> > >> versions will try and operate on the repository and (at best) claim
> > >> it's corrupt.
> > >>
> > >
> > > No worries, I did know about the requires thing. I was instead arguing
> > > that the repository is *not* corrupt (to an old client either) after
> this
> > > series, so we can add the requires later.
> > >
> >
> > I hadn't thought of this before, but to show that it's likely to be
> safe, I
> > just ran the test suite while randomly (50/50) using the two types to
> make
> > sure one version can be written with treemanifest and the next be read
> with
> > manifestdict, and vice versa. I checked that it did in fact choose
> randomly
> > (well, not the same every time anyway).
>
> Yeah, Martin's right: there's no on-disk change here (yet) so it's
> premature to touch requires. I'd still like to see the experimental flag
> bits in a separate patch.
>

Just to confirm, the current patch 1 will be split into:
patch 1:  adds the treemanifest class
patch 2:  adds the config option (to make it like the current patch 1)

Correct? If so, that should be trivial, of course, and I'll send a V3
shortly.
Matt Mackall - March 19, 2015, 6:03 p.m.
On Thu, 2015-03-19 at 17:54 +0000, Martin von Zweigbergk wrote:
> Just to confirm, the current patch 1 will be split into:
> patch 1:  adds the treemanifest class
> patch 2:  adds the config option (to make it like the current patch 1)
> 
> Correct? If so, that should be trivial, of course, and I'll send a V3
> shortly.

Yep.

Patch

diff -r 567ae5365754 -r 0d969d6efeef mercurial/localrepo.py
--- a/mercurial/localrepo.py	Thu Mar 12 23:15:06 2015 -0400
+++ b/mercurial/localrepo.py	Sun Mar 15 09:10:55 2015 -0700
@@ -326,6 +326,9 @@ 
         manifestcachesize = self.ui.configint('format', 'manifestcachesize')
         if manifestcachesize is not None:
             self.svfs.options['manifestcachesize'] = manifestcachesize
+        usetreemanifest = self.ui.configbool('experimental', 'treemanifest')
+        if usetreemanifest is not None:
+            self.svfs.options['usetreemanifest'] = usetreemanifest
 
     def _writerequirements(self):
         reqfile = self.vfs("requires", "w")
diff -r 567ae5365754 -r 0d969d6efeef mercurial/manifest.py
--- a/mercurial/manifest.py	Thu Mar 12 23:15:06 2015 -0400
+++ b/mercurial/manifest.py	Sun Mar 15 09:10:55 2015 -0700
@@ -328,22 +328,252 @@ 
                    + content for start, end, content in x)
     return deltatext, newaddlist
 
+def _splittopdir(f):
+    if '/' in f:
+        dir, subpath = f.split('/', 1)
+        return dir + '/', subpath
+    else:
+        return '', f
+
+class treemanifest(object):
+    def __init__(self, text=''):
+        self._dirs = {}
+        # Using _lazymanifest here is a little slower than plain old dicts
+        self._files = {}
+        self._flags = {}
+        lm = _lazymanifest(text)
+        for f, n, fl in lm.iterentries():
+            self[f] = n
+            if fl:
+                self.setflag(f, fl)
+
+    def __len__(self):
+        size = len(self._files)
+        for m in self._dirs.values():
+            size += m.__len__()
+        return size
+
+    def iteritems(self):
+        for p, n in sorted(self._dirs.items() + self._files.items()):
+            if p in self._files:
+                yield p, n
+            else:
+                for sf, sn in n.iteritems():
+                    yield p + sf, sn
+
+    def iterkeys(self):
+        for p in sorted(self._dirs.keys() + self._files.keys()):
+            if p in self._files:
+                yield p
+            else:
+                for f in self._dirs[p].iterkeys():
+                    yield p + f
+
+    def keys(self):
+        return list(self.iterkeys())
+
+    def __iter__(self):
+        return self.iterkeys()
+
+    def __contains__(self, f):
+        if f is None:
+            return False
+        dir, subpath = _splittopdir(f)
+        if dir:
+            if dir not in self._dirs:
+                return False
+            return self._dirs[dir].__contains__(subpath)
+        else:
+            return f in self._files
+
+    def get(self, f, default=None):
+        dir, subpath = _splittopdir(f)
+        if dir:
+            if dir not in self._dirs:
+                return default
+            return self._dirs[dir].get(subpath, default)
+        else:
+            return self._files.get(f, default)
+
+    def __getitem__(self, f):
+        dir, subpath = _splittopdir(f)
+        if dir:
+            return self._dirs[dir].__getitem__(subpath)
+        else:
+            return self._files[f]
+
+    def flags(self, f):
+        dir, subpath = _splittopdir(f)
+        if dir:
+            if dir not in self._dirs:
+                return ''
+            return self._dirs[dir].flags(subpath)
+        else:
+            if f in self._dirs:
+                return ''
+            return self._flags.get(f, '')
+
+    def find(self, f):
+        dir, subpath = _splittopdir(f)
+        if dir:
+            return self._dirs[dir].find(subpath)
+        else:
+            return self._files[f], self._flags.get(f, '')
+
+    def __delitem__(self, f):
+        dir, subpath = _splittopdir(f)
+        if dir:
+            self._dirs[dir].__delitem__(subpath)
+            # If the directory is now empty, remove it
+            if not self._dirs[dir]._dirs and not self._dirs[dir]._files:
+                del self._dirs[dir]
+        else:
+            del self._files[f]
+            if f in self._flags:
+                del self._flags[f]
+
+    def __setitem__(self, f, n):
+        assert n is not None
+        dir, subpath = _splittopdir(f)
+        if dir:
+            if dir not in self._dirs:
+                self._dirs[dir] = treemanifest()
+            self._dirs[dir].__setitem__(subpath, n)
+        else:
+            self._files[f] = n
+
+    def setflag(self, f, flags):
+        """Set the flags (symlink, executable) for path f."""
+        dir, subpath = _splittopdir(f)
+        if dir:
+            if dir not in self._dirs:
+                self._dirs[dir] = treemanifest()
+            self._dirs[dir].setflag(subpath, flags)
+        else:
+            self._flags[f] = flags
+
+    def copy(self):
+        copy = treemanifest()
+        for d in self._dirs:
+            copy._dirs[d] = self._dirs[d].copy()
+        copy._files = dict.copy(self._files)
+        copy._flags = dict.copy(self._flags)
+        return copy
+
+    def intersectfiles(self, files):
+        '''make a new treemanifest with the intersection of self with files
+
+        The algorithm assumes that files is much smaller than self.'''
+        ret = treemanifest()
+        for fn in files:
+            if fn in self:
+                ret[fn] = self[fn]
+                flags = self.flags(fn)
+                if flags:
+                    ret.setflag(fn, flags)
+        return ret
+
+    def filesnotin(self, m2):
+        '''Set of files in this manifest that are not in the other'''
+        files = set(self.iterkeys())
+        files.difference_update(m2.iterkeys())
+        return files
+
+    @propertycache
+    def _alldirs(self):
+        return scmutil.dirs(self)
+
+    def dirs(self):
+        return self._alldirs
+
+    def hasdir(self, dir):
+        return dir in self._alldirs
+
+    def matches(self, match):
+        '''generate a new manifest filtered by the match argument'''
+        if match.always():
+            return self.copy()
+
+        files = match.files()
+        if (match.matchfn == match.exact or
+            (not match.anypats() and util.all(fn in self for fn in files))):
+            return self.intersectfiles(files)
+
+        m = self.copy()
+        for fn in m.keys():
+            if not match(fn):
+                del m[fn]
+        return m
+
+    def diff(self, m2, clean=False):
+        '''Finds changes between the current manifest and m2.
+
+        Args:
+          m2: the manifest to which this manifest should be compared.
+          clean: if true, include files unchanged between these manifests
+                 with a None value in the returned dictionary.
+
+        The result is returned as a dict with filename as key and
+        values of the form ((n1,fl1),(n2,fl2)), where n1/n2 is the
+        nodeid in the current/other manifest and fl1/fl2 is the flag
+        in the current/other manifest. Where the file does not exist,
+        the nodeid will be None and the flags will be the empty
+        string.
+        '''
+        diff = {}
+
+        for fn, n1 in self.iteritems():
+            fl1 = self.flags(fn)
+            n2 = m2.get(fn, None)
+            fl2 = m2.flags(fn)
+            if n2 is None:
+                fl2 = ''
+            if n1 != n2 or fl1 != fl2:
+                diff[fn] = ((n1, fl1), (n2, fl2))
+            elif clean:
+                diff[fn] = None
+
+        for fn, n2 in m2.iteritems():
+            if fn not in self:
+                fl2 = m2.flags(fn)
+                diff[fn] = ((None, ''), (n2, fl2))
+
+        return diff
+
+    def text(self):
+        """Get the full data of this manifest as a bytestring."""
+        fl = self.keys()
+        _checkforbidden(fl)
+
+        hex, flags = revlog.hex, self.flags
+        # if this is changed to support newlines in filenames,
+        # be sure to check the templates/ dir again (especially *-raw.tmpl)
+        return ''.join("%s\0%s%s\n" % (f, hex(self[f]), flags(f)) for f in fl)
+
 class manifest(revlog.revlog):
     def __init__(self, opener):
         # During normal operations, we expect to deal with not more than four
         # revs at a time (such as during commit --amend). When rebasing large
         # stacks of commits, the number can go up, hence the config knob below.
         cachesize = 4
+        usetreemanifest = False
         opts = getattr(opener, 'options', None)
         if opts is not None:
             cachesize = opts.get('manifestcachesize', cachesize)
+            usetreemanifest = opts.get('usetreemanifest', usetreemanifest)
         self._mancache = util.lrucachedict(cachesize)
         revlog.revlog.__init__(self, opener, "00manifest.i")
+        self._usetreemanifest = usetreemanifest
+
+    def _newmanifest(self, data=''):
+        if self._usetreemanifest:
+            return treemanifest(data)
+        return manifestdict(data)
 
     def readdelta(self, node):
         r = self.rev(node)
         d = mdiff.patchtext(self.revdiff(self.deltaparent(r), r))
-        return manifestdict(d)
+        return self._newmanifest(d)
 
     def readfast(self, node):
         '''use the faster of readdelta or read'''
@@ -355,12 +585,12 @@ 
 
     def read(self, node):
         if node == revlog.nullid:
-            return manifestdict() # don't upset local cache
+            return self._newmanifest() # don't upset local cache
         if node in self._mancache:
             return self._mancache[node][0]
         text = self.revision(node)
         arraytext = array.array('c', text)
-        m = manifestdict(text)
+        m = self._newmanifest(text)
         self._mancache[node] = (m, arraytext)
         return m
 
@@ -374,7 +604,7 @@ 
             return None, None
 
     def add(self, m, transaction, link, p1, p2, added, removed):
-        if p1 in self._mancache:
+        if p1 in self._mancache and not self._usetreemanifest:
             # If our first parent is in the manifest cache, we can
             # compute a delta here using properties we know about the
             # manifest up-front, which may save time later for the