Patchwork D2593: state: add logic to parse the state file in old way if cbor fails

login
register
mail settings
Submitter phabricator
Date March 3, 2018, 7:19 p.m.
Message ID <differential-rev-PHID-DREV-4tzoyivghkyqo4n3wd4e-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/28801/
State New
Headers show

Comments

phabricator - March 3, 2018, 7:19 p.m.
pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  With updating the format of statefiles to use cbor, there can be cases when a
  user updates hg in between a unresolved graft, rebase, histedit, cbor load
  function in new hg won't be able to parse that properly. So let's fallback to
  reading the file old way if cbor fails or does not returns a dictionary as cbor
  may read that data without any error.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

AFFECTED FILES
  mercurial/state.py

CHANGE DETAILS




To: pulkit, #hg-reviewers
Cc: mercurial-devel
phabricator - March 3, 2018, 7:21 p.m.
pulkit added inline comments.

INLINE COMMENTS

> state.py:84
> +                    if not isinstance(ret, dict):
> +                        raise Exception("cbor parsed it wrong")
> +                    return ret

I am not confident about this part. Like to have some suggestions here.

> state.py:86
> +                    return ret
> +        except:     # cbor raises an Exception
> +            with self._repo.vfs(self.fname, 'rb') as fp:

Should I change thirdparty/cbor/ to raise specific errors?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: mercurial-devel
phabricator - March 4, 2018, 4:12 p.m.
durin42 added inline comments.

INLINE COMMENTS

> pulkit wrote in state.py:84
> I am not confident about this part. Like to have some suggestions here.

Probably treat not-a-dict as corrupt and fall back to the other format?

> pulkit wrote in state.py:86
> Should I change thirdparty/cbor/ to raise specific errors?

Ugh. Yeah, maybe see if they'd take a patch upstream to raise a more explicit exception type.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: durin42, mercurial-devel
phabricator - March 4, 2018, 9:31 p.m.
pulkit added inline comments.

INLINE COMMENTS

> durin42 wrote in state.py:86
> Ugh. Yeah, maybe see if they'd take a patch upstream to raise a more explicit exception type.

@durin42 I feel sorry about adding a TODO but I did that because I won't be able to work for a week or so dur to laptop issue. If you are not okay with that, I am fine with these lying here.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: durin42, mercurial-devel
phabricator - March 9, 2018, 4:22 p.m.
indygreg added a comment.


  Not sure where to record this comment in this series. So I'll pick this commit.
  
  I think we want an explicit version header in the state files so clients know when they may be reading a file in an old format. For example, if I start a merge in one terminal window, I sometimes move to another terminal window to resolve parts of it. The multiple windows may be running different Mercurial versions. For example, sometimes one of the shells has a virtualenv activated and that virtualenv is running an older Mercurial. We don't want the older Mercurial trampling on state needed by the new Mercurial.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: indygreg, durin42, mercurial-devel
phabricator - March 25, 2018, 6:31 a.m.
martinvonz added a comment.




INLINE COMMENTS

> durin42 wrote in state.py:84
> Probably treat not-a-dict as corrupt and fall back to the other format?

How about not requiring it to be a dict? I imagine practically all callers will want to pass a dict, but why does this class have to enforce it? I think the  API would be simpler if it was an opaque object. In the simple case of graft, the state is simply a list of nodes. However, for more complex states, we could have nested structures. I don't see why cmdstate should be involved in lookups into the top-level structure. The difference is subtle, but here's an example:

  # With dict-aware cmdstate
  cmdstate = ...
  cmdstate.load()
  version = cmdstate['version']
  for car in cmdstate['cars']:
     for wheel in car['wheels']:
         # whatever
  
  # With agnostic cmdstate
  cmdstate = ...
  parking = cmdstate.load()
  version = parking['version']
  for car in parking['cars']:
     for wheel in car['wheels']:
         # whatever

> state.py:82
> +            with self._repo.vfs(self.fname, 'rb') as fp:
> +                    ret = cbor.load(fp)
> +                    if not isinstance(ret, dict):

bad ident

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 25, 2018, 6:49 a.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D2593#44291, @indygreg wrote:
  
  > Not sure where to record this comment in this series. So I'll pick this commit.
  >
  > I think we want an explicit version header in the state files so clients know when they may be reading a file in an old format. For example, if I start a merge in one terminal window, I sometimes move to another terminal window to resolve parts of it. The multiple windows may be running different Mercurial versions. For example, sometimes one of the shells has a virtualenv activated and that virtualenv is running an older Mercurial. We don't want the older Mercurial trampling on state needed by the new Mercurial.
  
  
  Also, it doesn't seem like CBOR defines any magic bytes to start the top-level object with, so it's not obvious to me that an old state file (e.g. on containing just a nodeid) could not be parsed as a valid CBOR file. Perhaps cmdstate should help with that? We could make it always add a first item that's just "CBOR" or something (it seem very unlikely that we'd have that in an old state file), and we could fail when reading a state file that doesn't have that. Or would could require any new state files to have a new name than the old ones?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 26, 2018, 3:08 p.m.
pulkit added inline comments.

INLINE COMMENTS

> martinvonz wrote in state.py:84
> How about not requiring it to be a dict? I imagine practically all callers will want to pass a dict, but why does this class have to enforce it? I think the  API would be simpler if it was an opaque object. In the simple case of graft, the state is simply a list of nodes. However, for more complex states, we could have nested structures. I don't see why cmdstate should be involved in lookups into the top-level structure. The difference is subtle, but here's an example:
> 
>   # With dict-aware cmdstate
>   cmdstate = ...
>   cmdstate.load()
>   version = cmdstate['version']
>   for car in cmdstate['cars']:
>      for wheel in car['wheels']:
>          # whatever
>   
>   # With agnostic cmdstate
>   cmdstate = ...
>   parking = cmdstate.load()
>   version = parking['version']
>   for car in parking['cars']:
>      for wheel in car['wheels']:
>          # whatever

I don't have a strong reason. I just found this an easy way. I was going to add __setitem__() too so that we can store more new values or change existing values.

For graft case, it won't be just a list of nodes, we are going to store info about `--no-commit` flag in graftstate. Moreover we need `graft --abort` too which needs to store more things.

The one benefit is that we can pass the state object everywhere, lookup for values and update values on fly.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 26, 2018, 3:16 p.m.
pulkit added a comment.


  In https://phab.mercurial-scm.org/D2593#47511, @martinvonz wrote:
  
  > In https://phab.mercurial-scm.org/D2593#44291, @indygreg wrote:
  >
  > > Not sure where to record this comment in this series. So I'll pick this commit.
  > >
  > > I think we want an explicit version header in the state files so clients know when they may be reading a file in an old format. For example, if I start a merge in one terminal window, I sometimes move to another terminal window to resolve parts of it. The multiple windows may be running different Mercurial versions. For example, sometimes one of the shells has a virtualenv activated and that virtualenv is running an older Mercurial. We don't want the older Mercurial trampling on state needed by the new Mercurial.
  >
  >
  > Also, it doesn't seem like CBOR defines any magic bytes to start the top-level object with, so it's not obvious to me that an old state file (e.g. on containing just a nodeid) could not be parsed as a valid CBOR file. Perhaps cmdstate should help with that? We could make it always add a first item that's just "CBOR" or something (it seem very unlikely that we'd have that in an old state file), and we could fail when reading a state file that doesn't have that. Or would could require any new state files to have a new name than the old ones?
  
  
  I can add another key value pair 'magicstring: CBOR' which should be present in the cbor format state files, or we can start storing statefiles in .hg/state/ folder.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 26, 2018, 5:29 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> pulkit wrote in state.py:84
> I don't have a strong reason. I just found this an easy way. I was going to add __setitem__() too so that we can store more new values or change existing values.
> 
> For graft case, it won't be just a list of nodes, we are going to store info about `--no-commit` flag in graftstate. Moreover we need `graft --abort` too which needs to store more things.
> 
> The one benefit is that we can pass the state object everywhere, lookup for values and update values on fly.

> I don't have a strong reason. I just found this an easy way.

It seems the other way is still slightly easier :) You'd replace this:

  def __getitem__(self, key):
      return self.opts[key]
  
  def __setitem__(self, key, value):
      updates = {key: value}
      self.opts.update(updates)

by this:

  def get(self):
      return self.state
  
  def set(self, state):
      self.state = state

I'm not even sure we'd need those two methods. It might be enough to make load() return the state and make save() accept (and require) a new state.

The constructor would change from:

  if not opts:
      self.opts = {}
  else:
      self.opts = opts

to:

  self.state = state

The graft code for reading would change from:

  cmdstate.load()
  if cmdstate['version'] < 2:
      nodes = cmdstate['nodes']
      revs = [repo[node].rev() for node in nodes]
  else:
      # state-file is written by a newer mercurial than what we are
      # using
      raise error.Abort(_("unable to read to read the state file"))

to:

  state = cmdstate.load()
  if state['version'] < 2:
      nodes = state['nodes']
      revs = [repo[node].rev() for node in nodes]
  else:
      # state-file is written by a newer mercurial than what we are
      # using
      raise error.Abort(_("unable to read to read the state file"))

The graft code for writing would change from:

  cmdstate.addopts({'version': 1, 'nodes': nodelines})
  cmdstate.save()

to:

  cmdstate.set({'version': 1, 'nodes': nodelines})
  cmdstate.save()



> The one benefit is that we can pass the state object everywhere, lookup for values and update values on fly.

We can still pass the "cmdstate" instance around if we wanted to do that. I think it's a small plus that we can pass *just* the state around, though. That means we can pass the state into a function and we can look at the call site and know that it won't be calling .save() on it.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 27, 2018, 8:11 a.m.
pulkit added inline comments.

INLINE COMMENTS

> martinvonz wrote in state.py:84
> > I don't have a strong reason. I just found this an easy way.
> 
> It seems the other way is still slightly easier :) You'd replace this:
> 
>   def __getitem__(self, key):
>       return self.opts[key]
>   
>   def __setitem__(self, key, value):
>       updates = {key: value}
>       self.opts.update(updates)
> 
> by this:
> 
>   def get(self):
>       return self.state
>   
>   def set(self, state):
>       self.state = state
> 
> I'm not even sure we'd need those two methods. It might be enough to make load() return the state and make save() accept (and require) a new state.
> 
> The constructor would change from:
> 
>   if not opts:
>       self.opts = {}
>   else:
>       self.opts = opts
> 
> to:
> 
>   self.state = state
> 
> The graft code for reading would change from:
> 
>   cmdstate.load()
>   if cmdstate['version'] < 2:
>       nodes = cmdstate['nodes']
>       revs = [repo[node].rev() for node in nodes]
>   else:
>       # state-file is written by a newer mercurial than what we are
>       # using
>       raise error.Abort(_("unable to read to read the state file"))
> 
> to:
> 
>   state = cmdstate.load()
>   if state['version'] < 2:
>       nodes = state['nodes']
>       revs = [repo[node].rev() for node in nodes]
>   else:
>       # state-file is written by a newer mercurial than what we are
>       # using
>       raise error.Abort(_("unable to read to read the state file"))
> 
> The graft code for writing would change from:
> 
>   cmdstate.addopts({'version': 1, 'nodes': nodelines})
>   cmdstate.save()
> 
> to:
> 
>   cmdstate.set({'version': 1, 'nodes': nodelines})
>   cmdstate.save()
> 
> 
> 
> > The one benefit is that we can pass the state object everywhere, lookup for values and update values on fly.
> 
> We can still pass the "cmdstate" instance around if we wanted to do that. I think it's a small plus that we can pass *just* the state around, though. That means we can pass the state into a function and we can look at the call site and know that it won't be calling .save() on it.

I don't feel convinced on what can be the benefit of it and my opinion can be influenced by how I used it in evolve extension. But I can change it the other way around too, let me know if you want me to change.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 27, 2018, 2:58 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> pulkit wrote in state.py:84
> I don't feel convinced on what can be the benefit of it and my opinion can be influenced by how I used it in evolve extension. But I can change it the other way around too, let me know if you want me to change.

If it's up to me, I'd definitely say that you should change it. I don't see any benefits of the current proposal. I'd be happy to hear what other reviewers think.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 27, 2018, 3:03 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> martinvonz wrote in state.py:84
> If it's up to me, I'd definitely say that you should change it. I don't see any benefits of the current proposal. I'd be happy to hear what other reviewers think.

Oh, and I should clarify that I think the root of the problem is that you're mixing the class for reading with state itself (but only the top-level items of it). Let's say someone realizes they want to iterate over the items in the state, would we also add a .items() then? (I don't think that's a likely scenario, but it shows what I mean.)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: martinvonz, indygreg, durin42, mercurial-devel
phabricator - March 27, 2018, 3:43 p.m.
yuja added inline comments.

INLINE COMMENTS

> martinvonz wrote in state.py:84
> Oh, and I should clarify that I think the root of the problem is that you're mixing the class for reading with state itself (but only the top-level items of it). Let's say someone realizes they want to iterate over the items in the state, would we also add a .items() then? (I don't think that's a likely scenario, but it shows what I mean.)

Sorry if it's already discussed, but what's the benefit of `cmdstate`
not being a plain dict?

I think plain load/save functions are much simpler than a dict-like object
backed by file, i.e.

  state = statemod.load(repo, fname)
  state = {'version': 1, 'nodes': nodelines}
  statemod.save(repo, fname, state)

(statemod.load/save could be a class for serialization/deserialization)

The state dict would be wrapped anyway depending on application logic.
If we can find out a common behavior between "state"s, we can extract it to a state class,
but that will still be separated from the storage format.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2593

To: pulkit, #hg-reviewers
Cc: yuja, martinvonz, indygreg, durin42, mercurial-devel

Patch

diff --git a/mercurial/state.py b/mercurial/state.py
--- a/mercurial/state.py
+++ b/mercurial/state.py
@@ -77,8 +77,16 @@ 
     def _read(self):
         """reads the evolvestate file and returns a dictionary which contain
         data in the same format as it was before storing"""
-        with self._repo.vfs(self.fname, 'rb') as fp:
-            return cbor.load(fp)
+        try:
+            with self._repo.vfs(self.fname, 'rb') as fp:
+                    ret = cbor.load(fp)
+                    if not isinstance(ret, dict):
+                        raise Exception("cbor parsed it wrong")
+                    return ret
+        except:     # cbor raises an Exception
+            with self._repo.vfs(self.fname, 'rb') as fp:
+                oldfn = oldstatefilefns[self.fname]
+                return oldfn(fp)
 
     def delete(self):
         """drop the evolvestate file if exists"""