Patchwork D2945: state: add a magicheader to each state file

login
register
mail settings
Submitter phabricator
Date March 26, 2018, 5:09 p.m.
Message ID <differential-rev-PHID-DREV-rpwwhkz4oke5q7u4kdvs-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/29872/
State New
Headers show

Comments

phabricator - March 26, 2018, 5:09 p.m.
pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This patch adds a magic header to each state file written using CBOR format so
  that we can distinguish between old state files and new one even if old state
  files get successfully parsed by cbor.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2945

AFFECTED FILES
  mercurial/state.py

CHANGE DETAILS




To: pulkit, #hg-reviewers
Cc: mercurial-devel
phabricator - March 27, 2018, 12:58 p.m.
yuja added a comment.


  I think the magic has to vary depending on the current state file format.
  The state file must be backward/forward compatible with the older/newer formats.
  
  If the current magic is `2\n` (and is parsed as `int(f.readline())`) for example, the
  CBOR preamble would have to be `3\n`. Otherwise, old hg client cant determine
  whether the state file is corrupted or written by newer client.
  
  FWIW, last time I reviewed `simplekeyvaluefile`, I insisted that a parser should
  take a file stream instead of a filename, so the caller can handle compatibility thingy.
  
    with open("statefile", "rb") as fp:
        try:
            version = int(fp.readline())
        except ValueError:
            raise some nicer exception
        if version == 2:
            return readold(fp)
        elif version == 3:
            return readcbor(fp)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2945

To: pulkit, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - March 28, 2018, 11:05 a.m.
pulkit added a comment.


  In https://phab.mercurial-scm.org/D2945#47831, @yuja wrote:
  
  > I think the magic has to vary depending on the current state file format.
  >  The state file must be backward/forward compatible with the older/newer formats.
  >
  > If the current magic is `2\n` (and is parsed as `int(f.readline())`) for example, the
  >  CBOR preamble would have to be `3\n`. Otherwise, old hg client cant determine
  >  whether the state file is corrupted or written by newer client.
  
  
  There are some old state files, which don't have a version header on top of them like graftstate. I need suggestion on how to handle them.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2945

To: pulkit, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - March 28, 2018, 1:39 p.m.
yuja added a comment.


  > There are some old state files, which don't have a version header on top
  >  of them like graftstate. I need suggestion on how to handle them.
  
  In that case, we would have to either do some heuristic (like histedit), or maintain
  old/new state files (like mergestate.) Perhaps graft can do the latter?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2945

To: pulkit, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - June 14, 2018, 8:10 p.m.
pulkit added a subscriber: jeffpc.
pulkit added a comment.


  There is a suggestion from @jeffpc to have a plain text header in statefiles like `HGStateFile` so that user can grep for them or look into files and know that they are state files. How does that sound?
  (@jeffpc I am not sure I explained this correctly, it will be great if you can chime in.)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2945

To: pulkit, #hg-reviewers
Cc: jeffpc, yuja, mercurial-devel
phabricator - June 15, 2018, 12:59 a.m.
jeffpc added a comment.


  In https://phab.mercurial-scm.org/D2945#58586, @pulkit wrote:
  
  > There is a suggestion from @jeffpc to have a plain text header in statefiles like `HGStateFile` so that user can grep for them or look into files and know that they are state files. How does that sound?
  >  (@jeffpc I am not sure I explained this correctly, it will be great if you can chime in.)
  
  
  I don't really care about the actual string value, I just think it is a good storage design practice to have every on-disk structure contain a magic value of some sort.  (Beware, I have a storage background and therefore opinions about these things... ;) )  This makes it easier to do all sort of data recovery should the need arise.  E.g.,
  
  - if you know that a random file (e.g., from lost+found) is a hg state file, you can ignore/delete it and just reclone your repos
  - hg can detect some corruption better, and depending on the state file regenerate it or error out
  - hg developers can more easily sort out what's what when looking at buffers/files/etc.
  - as an added bonus, the magic can be used for format versioning
  
  FWIW, the three major things that I consider essential to good on-disk structures:
  
  1. magic numbers - see above
  2. checksums - possibly two checksums: one for the header and one for the payload
  3. back-pointers - see below
  
  Back-pointers allow you to tie the file/buffer into the bigger context - be it which state file it is, or which repository it belongs to.  In this case a state file's back-pointer could be something like the repo-id + state-file-name stored in the state file's header.  (I realize that there is no repo-id, but in general some approximation of it tends to be better than nothing.)  This should make it painfully obvious what data one is dealing with and how to parse/interpret it just by looking at the byte-stream without any other context.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2945

To: pulkit, #hg-reviewers
Cc: jeffpc, yuja, mercurial-devel

Patch

diff --git a/mercurial/state.py b/mercurial/state.py
--- a/mercurial/state.py
+++ b/mercurial/state.py
@@ -51,6 +51,7 @@ 
             self.opts = {}
         else:
             self.opts = opts
+        self.opts['magicheader'] = 'HGStateFile'
 
     def __nonzero__(self):
         return self.exists()