Submitter | Martin von Zweigbergk |
---|---|
Date | March 16, 2015, 11:27 p.m. |
Message ID | <0d969d6efeef08935c93.1426548427@martinvonz.mtv.corp.google.com> |
Download | mbox | patch |
Permalink | /patch/8103/ |
State | Superseded |
Headers | show |
Comments
On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote: > # HG changeset patch > # User Martin von Zweigbergk <martinvonz@google.com> > # Date 1426435855 25200 > # Sun Mar 15 09:10:55 2015 -0700 > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59 > # Parent 567ae53657544744155897ada91f16f8af61ad8a > treemanifest: create treemanifest class > > There are a number of problems with large and flat manifests. Copying > from http://mercurial.selenic.com/wiki/ManifestShardingPlan: > > * manifest too large for RAM > > * manifest resolution too much CPU (long delta chains) > > * committing is slow because entire manifest has to be hashed > > * impossible for narrow clone to leave out part of manifest as all is > needed to calculate new hash > > * diffing two revisions involves traversing entire subdirectories > even if identical > > This is a first step in a series introducing a manifest revlog per > directory. > > This change adds boolean configuration option > experimental.treemanifest. When the option is enabled, manifests are > parsed into a new tree data structure with one tree node per > directory. At this point, it is just a different data structure in > memory; there is still just a single manifest revlog on disk. I think the right way to do this is: - at repo create time, check the flag - add treemanifest to requires file - at startup, check requires file - unconditionally use treemanifest if set ..and that wants to be staged as a couple patches later in the series, not here.
On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote: > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote: > > # HG changeset patch > > # User Martin von Zweigbergk <martinvonz@google.com> > > # Date 1426435855 25200 > > # Sun Mar 15 09:10:55 2015 -0700 > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59 > > # Parent 567ae53657544744155897ada91f16f8af61ad8a > > treemanifest: create treemanifest class > > > > There are a number of problems with large and flat manifests. Copying > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan: > > > > * manifest too large for RAM > > > > * manifest resolution too much CPU (long delta chains) > > > > * committing is slow because entire manifest has to be hashed > > > > * impossible for narrow clone to leave out part of manifest as all is > > needed to calculate new hash > > > > * diffing two revisions involves traversing entire subdirectories > > even if identical > > > > This is a first step in a series introducing a manifest revlog per > > directory. > > > > This change adds boolean configuration option > > experimental.treemanifest. When the option is enabled, manifests are > > parsed into a new tree data structure with one tree node per > > directory. At this point, it is just a different data structure in > > memory; there is still just a single manifest revlog on disk. > > I think the right way to do this is: > > - at repo create time, check the flag > - add treemanifest to requires file > - at startup, check requires file > - unconditionally use treemanifest if set > I agree with the second step, but I'm not sure about the first. After the 5 patches in this series, using treemanifest is still not a final decision; the manifest is still stored in the regular flat format. In other words, --config experimental.treemanifest=True can be passed to any command without affecting anything besides performance. It seems like the above would unnecessarily make it a final decision. I was rather planning on writing the requires flag when a treemanifest is first written to disk using the future submanifest revlog structure. I would even be possible to unconditionally read into the treemanifest structure (if we imagine it got fast enough). Then the config would clearly only be about writing. What do you think?
On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote: > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote: > > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote: > > > # HG changeset patch > > > # User Martin von Zweigbergk <martinvonz@google.com> > > > # Date 1426435855 25200 > > > # Sun Mar 15 09:10:55 2015 -0700 > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59 > > > # Parent 567ae53657544744155897ada91f16f8af61ad8a > > > treemanifest: create treemanifest class > > > > > > There are a number of problems with large and flat manifests. Copying > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan: > > > > > > * manifest too large for RAM > > > > > > * manifest resolution too much CPU (long delta chains) > > > > > > * committing is slow because entire manifest has to be hashed > > > > > > * impossible for narrow clone to leave out part of manifest as all is > > > needed to calculate new hash > > > > > > * diffing two revisions involves traversing entire subdirectories > > > even if identical > > > > > > This is a first step in a series introducing a manifest revlog per > > > directory. > > > > > > This change adds boolean configuration option > > > experimental.treemanifest. When the option is enabled, manifests are > > > parsed into a new tree data structure with one tree node per > > > directory. At this point, it is just a different data structure in > > > memory; there is still just a single manifest revlog on disk. > > > > I think the right way to do this is: > > > > - at repo create time, check the flag > > - add treemanifest to requires file > > - at startup, check requires file > > - unconditionally use treemanifest if set > > > > I agree with the second step, but I'm not sure about the first. He's 100% right, I should have thought of this before you mailed it. Basically, if you don't put *something* in the requires, older hg versions will try and operate on the repository and (at best) claim it's corrupt. You could probably do something like "experimental-treemanifest" or something, knowing that we'd either want to disable the feature or lock down the format before it went into a real release. (I'm not sure if mpm will endorse an experimental- flag in requires, but it seems reasonable to me.) > After the 5 > patches in this series, using treemanifest is still not a final decision; > the manifest is still stored in the regular flat format. In other words, > --config experimental.treemanifest=True can be passed to any command > without affecting anything besides performance. It seems like the above > would unnecessarily make it a final decision. I was rather planning on > writing the requires flag when a treemanifest is first written to disk > using the future submanifest revlog structure. I would even be possible to > unconditionally read into the treemanifest structure (if we imagine it got > fast enough). Then the config would clearly only be about writing. What do > you think? > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel
On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote: > On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote: > > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote: > > > > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote: > > > > # HG changeset patch > > > > # User Martin von Zweigbergk <martinvonz@google.com> > > > > # Date 1426435855 25200 > > > > # Sun Mar 15 09:10:55 2015 -0700 > > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59 > > > > # Parent 567ae53657544744155897ada91f16f8af61ad8a > > > > treemanifest: create treemanifest class > > > > > > > > There are a number of problems with large and flat manifests. Copying > > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan: > > > > > > > > * manifest too large for RAM > > > > > > > > * manifest resolution too much CPU (long delta chains) > > > > > > > > * committing is slow because entire manifest has to be hashed > > > > > > > > * impossible for narrow clone to leave out part of manifest as all > is > > > > needed to calculate new hash > > > > > > > > * diffing two revisions involves traversing entire subdirectories > > > > even if identical > > > > > > > > This is a first step in a series introducing a manifest revlog per > > > > directory. > > > > > > > > This change adds boolean configuration option > > > > experimental.treemanifest. When the option is enabled, manifests are > > > > parsed into a new tree data structure with one tree node per > > > > directory. At this point, it is just a different data structure in > > > > memory; there is still just a single manifest revlog on disk. > > > > > > I think the right way to do this is: > > > > > > - at repo create time, check the flag > > > - add treemanifest to requires file > > > - at startup, check requires file > > > - unconditionally use treemanifest if set > > > > > > > I agree with the second step, but I'm not sure about the first. > > He's 100% right, I should have thought of this before you mailed > it. Basically, if you don't put *something* in the requires, older hg > versions will try and operate on the repository and (at best) claim > it's corrupt. > No worries, I did know about the requires thing. I was instead arguing that the repository is *not* corrupt (to an old client either) after this series, so we can add the requires later.
On Thu, Mar 19, 2015 at 6:34 AM Martin von Zweigbergk <martinvonz@google.com> wrote: > On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote: > >> On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote: >> > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote: >> > >> > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote: >> > > > # HG changeset patch >> > > > # User Martin von Zweigbergk <martinvonz@google.com> >> > > > # Date 1426435855 25200 >> > > > # Sun Mar 15 09:10:55 2015 -0700 >> > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59 >> > > > # Parent 567ae53657544744155897ada91f16f8af61ad8a >> > > > treemanifest: create treemanifest class >> > > > >> > > > There are a number of problems with large and flat manifests. >> Copying >> > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan: >> > > > >> > > > * manifest too large for RAM >> > > > >> > > > * manifest resolution too much CPU (long delta chains) >> > > > >> > > > * committing is slow because entire manifest has to be hashed >> > > > >> > > > * impossible for narrow clone to leave out part of manifest as all >> is >> > > > needed to calculate new hash >> > > > >> > > > * diffing two revisions involves traversing entire subdirectories >> > > > even if identical >> > > > >> > > > This is a first step in a series introducing a manifest revlog per >> > > > directory. >> > > > >> > > > This change adds boolean configuration option >> > > > experimental.treemanifest. When the option is enabled, manifests are >> > > > parsed into a new tree data structure with one tree node per >> > > > directory. At this point, it is just a different data structure in >> > > > memory; there is still just a single manifest revlog on disk. >> > > >> > > I think the right way to do this is: >> > > >> > > - at repo create time, check the flag >> > > - add treemanifest to requires file >> > > - at startup, check requires file >> > > - unconditionally use treemanifest if set >> > > >> > >> > I agree with the second step, but I'm not sure about the first. >> >> He's 100% right, I should have thought of this before you mailed >> it. Basically, if you don't put *something* in the requires, older hg >> versions will try and operate on the repository and (at best) claim >> it's corrupt. >> > > No worries, I did know about the requires thing. I was instead arguing > that the repository is *not* corrupt (to an old client either) after this > series, so we can add the requires later. > I hadn't thought of this before, but to show that it's likely to be safe, I just ran the test suite while randomly (50/50) using the two types to make sure one version can be written with treemanifest and the next be read with manifestdict, and vice versa. I checked that it did in fact choose randomly (well, not the same every time anyway).
On Thu, 2015-03-19 at 15:18 +0000, Martin von Zweigbergk wrote: > On Thu, Mar 19, 2015 at 6:34 AM Martin von Zweigbergk <martinvonz@google.com> > wrote: > > > On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote: > > > >> On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote: > >> > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> wrote: > >> > > >> > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote: > >> > > > # HG changeset patch > >> > > > # User Martin von Zweigbergk <martinvonz@google.com> > >> > > > # Date 1426435855 25200 > >> > > > # Sun Mar 15 09:10:55 2015 -0700 > >> > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59 > >> > > > # Parent 567ae53657544744155897ada91f16f8af61ad8a > >> > > > treemanifest: create treemanifest class > >> > > > > >> > > > There are a number of problems with large and flat manifests. > >> Copying > >> > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan: > >> > > > > >> > > > * manifest too large for RAM > >> > > > > >> > > > * manifest resolution too much CPU (long delta chains) > >> > > > > >> > > > * committing is slow because entire manifest has to be hashed > >> > > > > >> > > > * impossible for narrow clone to leave out part of manifest as all > >> is > >> > > > needed to calculate new hash > >> > > > > >> > > > * diffing two revisions involves traversing entire subdirectories > >> > > > even if identical > >> > > > > >> > > > This is a first step in a series introducing a manifest revlog per > >> > > > directory. > >> > > > > >> > > > This change adds boolean configuration option > >> > > > experimental.treemanifest. When the option is enabled, manifests are > >> > > > parsed into a new tree data structure with one tree node per > >> > > > directory. At this point, it is just a different data structure in > >> > > > memory; there is still just a single manifest revlog on disk. > >> > > > >> > > I think the right way to do this is: > >> > > > >> > > - at repo create time, check the flag > >> > > - add treemanifest to requires file > >> > > - at startup, check requires file > >> > > - unconditionally use treemanifest if set > >> > > > >> > > >> > I agree with the second step, but I'm not sure about the first. > >> > >> He's 100% right, I should have thought of this before you mailed > >> it. Basically, if you don't put *something* in the requires, older hg > >> versions will try and operate on the repository and (at best) claim > >> it's corrupt. > >> > > > > No worries, I did know about the requires thing. I was instead arguing > > that the repository is *not* corrupt (to an old client either) after this > > series, so we can add the requires later. > > > > I hadn't thought of this before, but to show that it's likely to be safe, I > just ran the test suite while randomly (50/50) using the two types to make > sure one version can be written with treemanifest and the next be read with > manifestdict, and vice versa. I checked that it did in fact choose randomly > (well, not the same every time anyway). Yeah, Martin's right: there's no on-disk change here (yet) so it's premature to touch requires. I'd still like to see the experimental flag bits in a separate patch.
On Thu, Mar 19, 2015 at 10:45 AM Matt Mackall <mpm@selenic.com> wrote: > On Thu, 2015-03-19 at 15:18 +0000, Martin von Zweigbergk wrote: > > On Thu, Mar 19, 2015 at 6:34 AM Martin von Zweigbergk < > martinvonz@google.com> > > wrote: > > > > > On Thu, Mar 19, 2015 at 6:27 AM Augie Fackler <raf@durin42.com> wrote: > > > > > >> On Wed, Mar 18, 2015 at 10:47:16PM +0000, Martin von Zweigbergk wrote: > > >> > On Wed, Mar 18, 2015 at 3:28 PM Matt Mackall <mpm@selenic.com> > wrote: > > >> > > > >> > > On Mon, 2015-03-16 at 16:27 -0700, Martin von Zweigbergk wrote: > > >> > > > # HG changeset patch > > >> > > > # User Martin von Zweigbergk <martinvonz@google.com> > > >> > > > # Date 1426435855 25200 > > >> > > > # Sun Mar 15 09:10:55 2015 -0700 > > >> > > > # Node ID 0d969d6efeef08935c93afc416bf406def6b8e59 > > >> > > > # Parent 567ae53657544744155897ada91f16f8af61ad8a > > >> > > > treemanifest: create treemanifest class > > >> > > > > > >> > > > There are a number of problems with large and flat manifests. > > >> Copying > > >> > > > from http://mercurial.selenic.com/wiki/ManifestShardingPlan: > > >> > > > > > >> > > > * manifest too large for RAM > > >> > > > > > >> > > > * manifest resolution too much CPU (long delta chains) > > >> > > > > > >> > > > * committing is slow because entire manifest has to be hashed > > >> > > > > > >> > > > * impossible for narrow clone to leave out part of manifest as > all > > >> is > > >> > > > needed to calculate new hash > > >> > > > > > >> > > > * diffing two revisions involves traversing entire > subdirectories > > >> > > > even if identical > > >> > > > > > >> > > > This is a first step in a series introducing a manifest revlog > per > > >> > > > directory. > > >> > > > > > >> > > > This change adds boolean configuration option > > >> > > > experimental.treemanifest. When the option is enabled, > manifests are > > >> > > > parsed into a new tree data structure with one tree node per > > >> > > > directory. At this point, it is just a different data structure > in > > >> > > > memory; there is still just a single manifest revlog on disk. > > >> > > > > >> > > I think the right way to do this is: > > >> > > > > >> > > - at repo create time, check the flag > > >> > > - add treemanifest to requires file > > >> > > - at startup, check requires file > > >> > > - unconditionally use treemanifest if set > > >> > > > > >> > > > >> > I agree with the second step, but I'm not sure about the first. > > >> > > >> He's 100% right, I should have thought of this before you mailed > > >> it. Basically, if you don't put *something* in the requires, older hg > > >> versions will try and operate on the repository and (at best) claim > > >> it's corrupt. > > >> > > > > > > No worries, I did know about the requires thing. I was instead arguing > > > that the repository is *not* corrupt (to an old client either) after > this > > > series, so we can add the requires later. > > > > > > > I hadn't thought of this before, but to show that it's likely to be > safe, I > > just ran the test suite while randomly (50/50) using the two types to > make > > sure one version can be written with treemanifest and the next be read > with > > manifestdict, and vice versa. I checked that it did in fact choose > randomly > > (well, not the same every time anyway). > > Yeah, Martin's right: there's no on-disk change here (yet) so it's > premature to touch requires. I'd still like to see the experimental flag > bits in a separate patch. > Just to confirm, the current patch 1 will be split into: patch 1: adds the treemanifest class patch 2: adds the config option (to make it like the current patch 1) Correct? If so, that should be trivial, of course, and I'll send a V3 shortly.
On Thu, 2015-03-19 at 17:54 +0000, Martin von Zweigbergk wrote: > Just to confirm, the current patch 1 will be split into: > patch 1: adds the treemanifest class > patch 2: adds the config option (to make it like the current patch 1) > > Correct? If so, that should be trivial, of course, and I'll send a V3 > shortly. Yep.
Patch
diff -r 567ae5365754 -r 0d969d6efeef mercurial/localrepo.py --- a/mercurial/localrepo.py Thu Mar 12 23:15:06 2015 -0400 +++ b/mercurial/localrepo.py Sun Mar 15 09:10:55 2015 -0700 @@ -326,6 +326,9 @@ manifestcachesize = self.ui.configint('format', 'manifestcachesize') if manifestcachesize is not None: self.svfs.options['manifestcachesize'] = manifestcachesize + usetreemanifest = self.ui.configbool('experimental', 'treemanifest') + if usetreemanifest is not None: + self.svfs.options['usetreemanifest'] = usetreemanifest def _writerequirements(self): reqfile = self.vfs("requires", "w") diff -r 567ae5365754 -r 0d969d6efeef mercurial/manifest.py --- a/mercurial/manifest.py Thu Mar 12 23:15:06 2015 -0400 +++ b/mercurial/manifest.py Sun Mar 15 09:10:55 2015 -0700 @@ -328,22 +328,252 @@ + content for start, end, content in x) return deltatext, newaddlist +def _splittopdir(f): + if '/' in f: + dir, subpath = f.split('/', 1) + return dir + '/', subpath + else: + return '', f + +class treemanifest(object): + def __init__(self, text=''): + self._dirs = {} + # Using _lazymanifest here is a little slower than plain old dicts + self._files = {} + self._flags = {} + lm = _lazymanifest(text) + for f, n, fl in lm.iterentries(): + self[f] = n + if fl: + self.setflag(f, fl) + + def __len__(self): + size = len(self._files) + for m in self._dirs.values(): + size += m.__len__() + return size + + def iteritems(self): + for p, n in sorted(self._dirs.items() + self._files.items()): + if p in self._files: + yield p, n + else: + for sf, sn in n.iteritems(): + yield p + sf, sn + + def iterkeys(self): + for p in sorted(self._dirs.keys() + self._files.keys()): + if p in self._files: + yield p + else: + for f in self._dirs[p].iterkeys(): + yield p + f + + def keys(self): + return list(self.iterkeys()) + + def __iter__(self): + return self.iterkeys() + + def __contains__(self, f): + if f is None: + return False + dir, subpath = _splittopdir(f) + if dir: + if dir not in self._dirs: + return False + return self._dirs[dir].__contains__(subpath) + else: + return f in self._files + + def get(self, f, default=None): + dir, subpath = _splittopdir(f) + if dir: + if dir not in self._dirs: + return default + return self._dirs[dir].get(subpath, default) + else: + return self._files.get(f, default) + + def __getitem__(self, f): + dir, subpath = _splittopdir(f) + if dir: + return self._dirs[dir].__getitem__(subpath) + else: + return self._files[f] + + def flags(self, f): + dir, subpath = _splittopdir(f) + if dir: + if dir not in self._dirs: + return '' + return self._dirs[dir].flags(subpath) + else: + if f in self._dirs: + return '' + return self._flags.get(f, '') + + def find(self, f): + dir, subpath = _splittopdir(f) + if dir: + return self._dirs[dir].find(subpath) + else: + return self._files[f], self._flags.get(f, '') + + def __delitem__(self, f): + dir, subpath = _splittopdir(f) + if dir: + self._dirs[dir].__delitem__(subpath) + # If the directory is now empty, remove it + if not self._dirs[dir]._dirs and not self._dirs[dir]._files: + del self._dirs[dir] + else: + del self._files[f] + if f in self._flags: + del self._flags[f] + + def __setitem__(self, f, n): + assert n is not None + dir, subpath = _splittopdir(f) + if dir: + if dir not in self._dirs: + self._dirs[dir] = treemanifest() + self._dirs[dir].__setitem__(subpath, n) + else: + self._files[f] = n + + def setflag(self, f, flags): + """Set the flags (symlink, executable) for path f.""" + dir, subpath = _splittopdir(f) + if dir: + if dir not in self._dirs: + self._dirs[dir] = treemanifest() + self._dirs[dir].setflag(subpath, flags) + else: + self._flags[f] = flags + + def copy(self): + copy = treemanifest() + for d in self._dirs: + copy._dirs[d] = self._dirs[d].copy() + copy._files = dict.copy(self._files) + copy._flags = dict.copy(self._flags) + return copy + + def intersectfiles(self, files): + '''make a new treemanifest with the intersection of self with files + + The algorithm assumes that files is much smaller than self.''' + ret = treemanifest() + for fn in files: + if fn in self: + ret[fn] = self[fn] + flags = self.flags(fn) + if flags: + ret.setflag(fn, flags) + return ret + + def filesnotin(self, m2): + '''Set of files in this manifest that are not in the other''' + files = set(self.iterkeys()) + files.difference_update(m2.iterkeys()) + return files + + @propertycache + def _alldirs(self): + return scmutil.dirs(self) + + def dirs(self): + return self._alldirs + + def hasdir(self, dir): + return dir in self._alldirs + + def matches(self, match): + '''generate a new manifest filtered by the match argument''' + if match.always(): + return self.copy() + + files = match.files() + if (match.matchfn == match.exact or + (not match.anypats() and util.all(fn in self for fn in files))): + return self.intersectfiles(files) + + m = self.copy() + for fn in m.keys(): + if not match(fn): + del m[fn] + return m + + def diff(self, m2, clean=False): + '''Finds changes between the current manifest and m2. + + Args: + m2: the manifest to which this manifest should be compared. + clean: if true, include files unchanged between these manifests + with a None value in the returned dictionary. + + The result is returned as a dict with filename as key and + values of the form ((n1,fl1),(n2,fl2)), where n1/n2 is the + nodeid in the current/other manifest and fl1/fl2 is the flag + in the current/other manifest. Where the file does not exist, + the nodeid will be None and the flags will be the empty + string. + ''' + diff = {} + + for fn, n1 in self.iteritems(): + fl1 = self.flags(fn) + n2 = m2.get(fn, None) + fl2 = m2.flags(fn) + if n2 is None: + fl2 = '' + if n1 != n2 or fl1 != fl2: + diff[fn] = ((n1, fl1), (n2, fl2)) + elif clean: + diff[fn] = None + + for fn, n2 in m2.iteritems(): + if fn not in self: + fl2 = m2.flags(fn) + diff[fn] = ((None, ''), (n2, fl2)) + + return diff + + def text(self): + """Get the full data of this manifest as a bytestring.""" + fl = self.keys() + _checkforbidden(fl) + + hex, flags = revlog.hex, self.flags + # if this is changed to support newlines in filenames, + # be sure to check the templates/ dir again (especially *-raw.tmpl) + return ''.join("%s\0%s%s\n" % (f, hex(self[f]), flags(f)) for f in fl) + class manifest(revlog.revlog): def __init__(self, opener): # During normal operations, we expect to deal with not more than four # revs at a time (such as during commit --amend). When rebasing large # stacks of commits, the number can go up, hence the config knob below. cachesize = 4 + usetreemanifest = False opts = getattr(opener, 'options', None) if opts is not None: cachesize = opts.get('manifestcachesize', cachesize) + usetreemanifest = opts.get('usetreemanifest', usetreemanifest) self._mancache = util.lrucachedict(cachesize) revlog.revlog.__init__(self, opener, "00manifest.i") + self._usetreemanifest = usetreemanifest + + def _newmanifest(self, data=''): + if self._usetreemanifest: + return treemanifest(data) + return manifestdict(data) def readdelta(self, node): r = self.rev(node) d = mdiff.patchtext(self.revdiff(self.deltaparent(r), r)) - return manifestdict(d) + return self._newmanifest(d) def readfast(self, node): '''use the faster of readdelta or read''' @@ -355,12 +585,12 @@ def read(self, node): if node == revlog.nullid: - return manifestdict() # don't upset local cache + return self._newmanifest() # don't upset local cache if node in self._mancache: return self._mancache[node][0] text = self.revision(node) arraytext = array.array('c', text) - m = manifestdict(text) + m = self._newmanifest(text) self._mancache[node] = (m, arraytext) return m @@ -374,7 +604,7 @@ return None, None def add(self, m, transaction, link, p1, p2, added, removed): - if p1 in self._mancache: + if p1 in self._mancache and not self._usetreemanifest: # If our first parent is in the manifest cache, we can # compute a delta here using properties we know about the # manifest up-front, which may save time later for the