Patchwork D6417: context: get filesadded() and filesremoved() from changeset if configured

login
register
mail settings
Submitter phabricator
Date May 22, 2019, 12:32 a.m.
Message ID <differential-rev-PHID-DREV-nx3rwq36dvtavtlkyust-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/40174/
State Superseded
Headers show

Comments

phabricator - May 22, 2019, 12:32 a.m.
martinvonz created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This adds the read side for getting the sets of added and removed
  files from the changeset extras. I timed this command on the hg repo:
  
    hg log -T '{rev}\n {files}\n %:{file_mods}\n +{file_adds}\n -{file_dels}\n'
  
  It took 1m21s before and 6.4s after. I also used that command to check
  that the result didn't change compared to calculating the values from
  the manifests on the fly (it didn't change).
  
  In the mozilla-unified repo, the same command run on
  FIREFOX_BETA_58_END::FIREFOX_BETA_59_END went from 29s to 0.67s.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6417

AFFECTED FILES
  mercurial/changelog.py
  mercurial/context.py

CHANGE DETAILS




To: martinvonz, #hg-reviewers
Cc: mercurial-devel
phabricator - May 28, 2019, 5:03 p.m.
martinvonz added a subscriber: marmoute.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6417#93707, @yuja wrote:
  
  > I can't really comment on the storage format. I'm not keen on using extras
  >  for this kind of stuff (including copies), but that seems be okay for
  >  experiment.
  
  
  Do we have a better place for it? What's your concern with using extras? Is it that we're storing information that could instead be calculated? I agree, but the same is true about the list of files, of course (and linkrevs, although they're not stored in the changeset). Or that a user could set the values? I agree about that too, but I don't know what to do about that. We could create a cache for this information, but we can't really create a cache for the copy information for Google's use case (serving copy information together with changesets). At least it wouldn't be a cache in the usual sense. It could be a separate storage still, of course. @marmoute has been working on that a bit. We'd need that storage to be exchanged before we could use it. We would also need it to be considered the source of truth for copy information (which probably means that it should live in `.hg/store/` rather than `.hg/cache/`). I don't know exactly how that aligns with @marmoute's plans. I also haven't thought about how a migration would work for us if we eventually decided to switch over from storage in extras to a separate storage, but that will probably not be a huge problem.
  
  > @indygreg Any comments?
  > 
  >> +def decodefileindices(files, data):
  >>  +    try:
  >>  +        subset = []
  >>  +        for str in data.split('\0'):
  >>  +            i = int(str)
  > 
  > Better to not shadow `str()` function.
  
  Good point. Done.
  
  > 
  > 
  >> +            if i < 0 or i > len(files):
  > 
  > Off by one?
  
  Oops, that's embarrassing. Done.
  
  > 
  > 
  >> +                return None
  >>  +            subset.append(files[i])
  >>  +        return subset
  >>  +    except (ValueError, IndexError):
  >>  +        # Perhaps someone had chosen the same key name (e.g. "added") and
  >>  +        # used different syntax for the value.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6417

To: martinvonz, #hg-reviewers
Cc: marmoute, yuja, indygreg, mercurial-devel
Yuya Nishihara - May 28, 2019, 11:43 p.m.
>   > I can't really comment on the storage format. I'm not keen on using extras
>   >  for this kind of stuff (including copies), but that seems be okay for
>   >  experiment.
>   
>   Do we have a better place for it?

I don't think so.

> What's your concern with using extras?
> Is it that we're storing information that could instead be calculated?

No, I don't care much about that.

> Or that a user could set the values?

Somewhat yes.

I just have a feeling that these copies/added/removed data are first class,
the repo can be somewhat corrupted if these data are wrong, which I don't think
are data meant to be stored in the extras.

Ideally, we can add some repo requirement and bump the revlog format to
store these data properly, but that's a big change. So I said storing in
extras seems okay for the time being.
phabricator - May 28, 2019, 11:44 p.m.
yuja added a comment.


  >   > I can't really comment on the storage format. I'm not keen on using extras
  >   >  for this kind of stuff (including copies), but that seems be okay for
  >   >  experiment.
  >   
  >   Do we have a better place for it?
  
  I don't think so.
  
  > What's your concern with using extras?
  >  Is it that we're storing information that could instead be calculated?
  
  No, I don't care much about that.
  
  > Or that a user could set the values?
  
  Somewhat yes.
  
  I just have a feeling that these copies/added/removed data are first class,
  the repo can be somewhat corrupted if these data are wrong, which I don't think
  are data meant to be stored in the extras.
  
  Ideally, we can add some repo requirement and bump the revlog format to
  store these data properly, but that's a big change. So I said storing in
  extras seems okay for the time being.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6417

To: martinvonz, #hg-reviewers
Cc: marmoute, yuja, indygreg, mercurial-devel
phabricator - May 29, 2019, 1:59 a.m.
indygreg added a comment.


  I'm also not super crazy about abusing extras for this. But it is the best compromise considering the "better" solutions require a lot more effort and thought. At some point, I would like Mercurial's storage and wire protocol to grow official APIs for storing and exchanging arbitrary data outside the current storage primitives. Maybe we can shoehorn storage into revlogs in `.hg/store/meta`. I dunno. Feels like sprint material to me.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6417

To: martinvonz, #hg-reviewers
Cc: marmoute, yuja, indygreg, mercurial-devel

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -469,12 +469,24 @@ 
         modified.difference_update(self.filesremoved())
         return sorted(modified)
     def filesadded(self):
+        source = self._repo.ui.config('experimental', 'copies.read-from')
+        if (source == 'changeset-only' or
+            (source == 'compatibility' and
+             self._changeset.filesadded is not None)):
+            return self._changeset.filesadded or []
+
         added = []
         for f in self.files():
             if not any(f in p for p in self.parents()):
                 added.append(f)
         return added
     def filesremoved(self):
+        source = self._repo.ui.config('experimental', 'copies.read-from')
+        if (source == 'changeset-only' or
+            (source == 'compatibility' and
+             self._changeset.filesremoved is not None)):
+            return self._changeset.filesremoved or []
+
         removed = []
         for f in self.files():
             if f not in self:
diff --git a/mercurial/changelog.py b/mercurial/changelog.py
--- a/mercurial/changelog.py
+++ b/mercurial/changelog.py
@@ -107,6 +107,20 @@ 
             indices.append(b'%d' % i)
     return '\0'.join(indices)
 
+def decodefileindices(files, data):
+    try:
+        subset = []
+        for str in data.split('\0'):
+            i = int(str)
+            if i < 0 or i > len(files):
+                return None
+            subset.append(files[i])
+        return subset
+    except ValueError:
+        # Perhaps someone had chosen the same key name (e.g. "added") and
+        # used different syntax for the value.
+        return None
+
 def stripdesc(desc):
     """strip trailing whitespace and leading and trailing empty lines"""
     return '\n'.join([l.rstrip() for l in desc.splitlines()]).strip('\n')
@@ -202,6 +216,8 @@ 
     user = attr.ib(default='')
     date = attr.ib(default=(0, 0))
     files = attr.ib(default=attr.Factory(list))
+    filesadded = attr.ib(default=None)
+    filesremoved = attr.ib(default=None)
     p1copies = attr.ib(default=None)
     p2copies = attr.ib(default=None)
     description = attr.ib(default='')
@@ -308,6 +324,16 @@ 
         return self._text[off[2] + 1:off[3]].split('\n')
 
     @property
+    def filesadded(self):
+        rawindices = self.extra.get('filesadded')
+        return rawindices and decodefileindices(self.files, rawindices)
+
+    @property
+    def filesremoved(self):
+        rawindices = self.extra.get('filesremoved')
+        return rawindices and decodefileindices(self.files, rawindices)
+
+    @property
     def p1copies(self):
         rawcopies = self.extra.get('p1copies')
         return rawcopies and decodecopies(rawcopies)