Submitter | phabricator |
---|---|
Date | May 22, 2019, 12:32 a.m. |
Message ID | <differential-rev-PHID-DREV-nx3rwq36dvtavtlkyust-req@phab.mercurial-scm.org> |
Download | mbox | patch |
Permalink | /patch/40174/ |
State | Superseded |
Headers | show |
Comments
martinvonz added a subscriber: marmoute. martinvonz added a comment. In https://phab.mercurial-scm.org/D6417#93707, @yuja wrote: > I can't really comment on the storage format. I'm not keen on using extras > for this kind of stuff (including copies), but that seems be okay for > experiment. Do we have a better place for it? What's your concern with using extras? Is it that we're storing information that could instead be calculated? I agree, but the same is true about the list of files, of course (and linkrevs, although they're not stored in the changeset). Or that a user could set the values? I agree about that too, but I don't know what to do about that. We could create a cache for this information, but we can't really create a cache for the copy information for Google's use case (serving copy information together with changesets). At least it wouldn't be a cache in the usual sense. It could be a separate storage still, of course. @marmoute has been working on that a bit. We'd need that storage to be exchanged before we could use it. We would also need it to be considered the source of truth for copy information (which probably means that it should live in `.hg/store/` rather than `.hg/cache/`). I don't know exactly how that aligns with @marmoute's plans. I also haven't thought about how a migration would work for us if we eventually decided to switch over from storage in extras to a separate storage, but that will probably not be a huge problem. > @indygreg Any comments? > >> +def decodefileindices(files, data): >> + try: >> + subset = [] >> + for str in data.split('\0'): >> + i = int(str) > > Better to not shadow `str()` function. Good point. Done. > > >> + if i < 0 or i > len(files): > > Off by one? Oops, that's embarrassing. Done. > > >> + return None >> + subset.append(files[i]) >> + return subset >> + except (ValueError, IndexError): >> + # Perhaps someone had chosen the same key name (e.g. "added") and >> + # used different syntax for the value. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D6417 To: martinvonz, #hg-reviewers Cc: marmoute, yuja, indygreg, mercurial-devel
> > I can't really comment on the storage format. I'm not keen on using extras > > for this kind of stuff (including copies), but that seems be okay for > > experiment. > > Do we have a better place for it? I don't think so. > What's your concern with using extras? > Is it that we're storing information that could instead be calculated? No, I don't care much about that. > Or that a user could set the values? Somewhat yes. I just have a feeling that these copies/added/removed data are first class, the repo can be somewhat corrupted if these data are wrong, which I don't think are data meant to be stored in the extras. Ideally, we can add some repo requirement and bump the revlog format to store these data properly, but that's a big change. So I said storing in extras seems okay for the time being.
yuja added a comment. > > I can't really comment on the storage format. I'm not keen on using extras > > for this kind of stuff (including copies), but that seems be okay for > > experiment. > > Do we have a better place for it? I don't think so. > What's your concern with using extras? > Is it that we're storing information that could instead be calculated? No, I don't care much about that. > Or that a user could set the values? Somewhat yes. I just have a feeling that these copies/added/removed data are first class, the repo can be somewhat corrupted if these data are wrong, which I don't think are data meant to be stored in the extras. Ideally, we can add some repo requirement and bump the revlog format to store these data properly, but that's a big change. So I said storing in extras seems okay for the time being. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D6417 To: martinvonz, #hg-reviewers Cc: marmoute, yuja, indygreg, mercurial-devel
indygreg added a comment. I'm also not super crazy about abusing extras for this. But it is the best compromise considering the "better" solutions require a lot more effort and thought. At some point, I would like Mercurial's storage and wire protocol to grow official APIs for storing and exchanging arbitrary data outside the current storage primitives. Maybe we can shoehorn storage into revlogs in `.hg/store/meta`. I dunno. Feels like sprint material to me. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D6417 To: martinvonz, #hg-reviewers Cc: marmoute, yuja, indygreg, mercurial-devel
Patch
diff --git a/mercurial/context.py b/mercurial/context.py --- a/mercurial/context.py +++ b/mercurial/context.py @@ -469,12 +469,24 @@ modified.difference_update(self.filesremoved()) return sorted(modified) def filesadded(self): + source = self._repo.ui.config('experimental', 'copies.read-from') + if (source == 'changeset-only' or + (source == 'compatibility' and + self._changeset.filesadded is not None)): + return self._changeset.filesadded or [] + added = [] for f in self.files(): if not any(f in p for p in self.parents()): added.append(f) return added def filesremoved(self): + source = self._repo.ui.config('experimental', 'copies.read-from') + if (source == 'changeset-only' or + (source == 'compatibility' and + self._changeset.filesremoved is not None)): + return self._changeset.filesremoved or [] + removed = [] for f in self.files(): if f not in self: diff --git a/mercurial/changelog.py b/mercurial/changelog.py --- a/mercurial/changelog.py +++ b/mercurial/changelog.py @@ -107,6 +107,20 @@ indices.append(b'%d' % i) return '\0'.join(indices) +def decodefileindices(files, data): + try: + subset = [] + for str in data.split('\0'): + i = int(str) + if i < 0 or i > len(files): + return None + subset.append(files[i]) + return subset + except ValueError: + # Perhaps someone had chosen the same key name (e.g. "added") and + # used different syntax for the value. + return None + def stripdesc(desc): """strip trailing whitespace and leading and trailing empty lines""" return '\n'.join([l.rstrip() for l in desc.splitlines()]).strip('\n') @@ -202,6 +216,8 @@ user = attr.ib(default='') date = attr.ib(default=(0, 0)) files = attr.ib(default=attr.Factory(list)) + filesadded = attr.ib(default=None) + filesremoved = attr.ib(default=None) p1copies = attr.ib(default=None) p2copies = attr.ib(default=None) description = attr.ib(default='') @@ -308,6 +324,16 @@ return self._text[off[2] + 1:off[3]].split('\n') @property + def filesadded(self): + rawindices = self.extra.get('filesadded') + return rawindices and decodefileindices(self.files, rawindices) + + @property + def filesremoved(self): + rawindices = self.extra.get('filesremoved') + return rawindices and decodefileindices(self.files, rawindices) + + @property def p1copies(self): rawcopies = self.extra.get('p1copies') return rawcopies and decodecopies(rawcopies)