Patchwork D6808: revlog: introduce a `sidedata` method

login
register
mail settings
Submitter phabricator
Date Sept. 7, 2019, 9:28 a.m.
Message ID <differential-rev-PHID-DREV-krfhc7mgt54ivxpmqec5-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/41513/
State New
Headers show

Comments

phabricator - Sept. 7, 2019, 9:28 a.m.
marmoute created this revision.
marmoute added reviewers: yuja, durin42.
Herald added a reviewer: indygreg.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The method give access to extra information related to the revision. Such data
  will not be part of the hash be strongly related to the revision. Having them
  stored at the revlog level helps the storage consistency story and simplify
  various things.
  
  Example of data we could store there:
  
  - copy tracing related informations
  - graph structure related information (useful for discovery)
  - unresolved conflict data
  
  The full implementation will be introduced gradually in the coming changesets.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6808

AFFECTED FILES
  mercurial/revlog.py

CHANGE DETAILS




To: marmoute, yuja, durin42, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Sept. 7, 2019, 5:20 p.m.
indygreg added a comment.


  I think I see where you are going with this and it seems potentially very useful.
  
  I do wish the flag processors code could have been refactored without involving `sidedata`, as I would have taken the patches later in this series to remove the temporary `mixin` in a heartbeat. But since `sidedata` is involved, I want to have others weigh in.
  
  It would be extremely useful to see an actual use case for `sidedata`. As it stands, I have questions like whether we'll need to further evolve the storage APIs to accommodate things like cherry-picking which pieces of `sidedata` are desired. Also, incorporating `sidedata` in the revlog write APIs (in later patches) seems a bit odd, as the revlog format is rather stable and I'm not sure how additional data will be incorporated without introducing a new flag processor to accommodate extra data next to the revision. Again, I'd really like to see a real world use case for `sidedata` to help give me confidence this is the correct storage abstraction.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6808/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6808

To: marmoute, yuja, durin42, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Sept. 8, 2019, 8:46 a.m.
marmoute added a comment.


  In D6808#100077 <https://phab.mercurial-scm.org/D6808#100077>, @indygreg wrote:
  
  > I think I see where you are going with this and it seems potentially very useful.
  > I do wish the flag processors code could have been refactored without involving `sidedata`, as I would have taken the patches later in this series to remove the temporary `mixin` in a heartbeat. But since `sidedata` is involved, I want to have others weigh in.
  
  The point of the refactoring is to introduce `sidedata` ;-). I could rework the series to get rid of the mixin sooner, but after discussing with Augie on IRC it seemed fine to just do a final cleanup (the mixin discussion arrived after I wrote the other patches).
  
  > It would be extremely useful to see an actual use case for `sidedata`.
  
  My first target is to use it for copy tracing. After the current series I have:
  
  1. a series to add actual storage for sidedata (using a flag processors)
  2. a series to refactor access to copy tracing information
  3. a series to use sidedata to store copy tracing information
  
  (1) is almost in a visible state (2) and (3) need a bit more work. Which one are you the most earger to see ?
  
  > As it stands, I have questions like whether we'll need to further evolve the storage APIs to accommodate things like cherry-picking which pieces of `sidedata` are desired.
  
  I feel like the API in this patch offer enough room for that already. By all mean, this is the early stage of implementation, it will be easy to adjust the API once we are further down the road.
  
  > Also, incorporating `sidedata` in the revlog write APIs (in later patches) seems a bit odd, as the revlog format is rather stable and I'm not sure how additional data will be incorporated without introducing a new flag processor to accommodate extra data next to the revision.
  
  It will be done by introduce a flag processors ;-)
  Having the sidedata specified at `addrevision` time seems okay to me. It highlight the fact these data needs to be immutable property of the revision.
  
  > Again, I'd really like to see a real world use case for `sidedata` to help give me confidence this is the correct storage abstraction.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6808/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6808

To: marmoute, yuja, durin42, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Sept. 9, 2019, 6:01 p.m.
durin42 added a comment.


  FWIW I'm +0.5 on this because it seems like a reasonable abstraction and could serve a variety of purposes.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6808/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6808

To: marmoute, yuja, durin42, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Sept. 9, 2019, 11:46 p.m.
marmoute added a comment.


  The next series (adding actual storage and reading is available here: https://bitbucket.org/octobus/mercurial-devel/commits/?search=777a564525e8%3A%3Ae9075930b1b7)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6808/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6808

To: marmoute, yuja, durin42, indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1612,6 +1612,16 @@ 
         """
         return self._revisiondata(nodeorrev, _df, raw=raw)
 
+    def sidedata(self, nodeorrev, _df=None):
+        """a map of extra data related to the changeset but not part of the hash
+
+        This function currently return a dictionary. However, more advanced
+        mapping object will likely be used in the future for a more
+        efficient/lazy code.
+        """
+        # XXX will actualy return data once storage is implemented.
+        return {}
+
     def _revisiondata(self, nodeorrev, _df=None, raw=False):
         # deal with <nodeorrev> argument type
         if isinstance(nodeorrev, int):