Patchwork [1,of,2,V2] verify: document corner cases

login
register
mail settings
Submitter Jun Wu
Date March 30, 2017, 9:51 p.m.
Message ID <17b41390f4912a4c1853.1490910674@x1c>
Download mbox | patch
Permalink /patch/19851/
State Accepted
Delegated to: Yuya Nishihara
Headers show

Comments

Jun Wu - March 30, 2017, 9:51 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1490823901 25200
#      Wed Mar 29 14:45:01 2017 -0700
# Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92
# Parent  dea2a17cbfd00bf08ee87b3e44b1c71499189f89
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 17b41390f491
verify: document corner cases

It seems a good idea to list all kinds of "surprises" and expected behavior
to make the upcoming changes easier to understand.

Note: the comment added does not reflect the actual behavior of the current
code.
Ryan McElroy - March 31, 2017, 10:14 a.m.
On 3/30/17 10:51 PM, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1490823901 25200
> #      Wed Mar 29 14:45:01 2017 -0700
> # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92
> # Parent  dea2a17cbfd00bf08ee87b3e44b1c71499189f89
> # Available At https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=kuj9GJXY5gbugZhlyrbxjthYrYkZ6M3HZHYFrfQg_ZY&s=yj22qlE05uO47PYA0xtQ2kgE_-D5kRIk7mybLrSUZdg&e=
> #              hg pull https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=kuj9GJXY5gbugZhlyrbxjthYrYkZ6M3HZHYFrfQg_ZY&s=yj22qlE05uO47PYA0xtQ2kgE_-D5kRIk7mybLrSUZdg&e=  -r 17b41390f491
> verify: document corner cases
>
> It seems a good idea to list all kinds of "surprises" and expected behavior
> to make the upcoming changes easier to understand.
>
> Note: the comment added does not reflect the actual behavior of the current
> code.
>
> diff --git a/mercurial/verify.py b/mercurial/verify.py
> --- a/mercurial/verify.py
> +++ b/mercurial/verify.py
> @@ -380,5 +380,51 @@ class verifier(object):
>                           del filenodes[f][n]
>   
> -                # verify contents
> +                # Verify contents. 4 cases to care about:
> +                #
> +                #   common: the most common case
> +                #   rename: with a rename
> +                #   meta: file content starts with b'\1\n', the metadata
> +                #         header defined in filelog.py, but without a rename
> +                #   ext: largefiles. content stored elsewhere
> +                #
> +                # More formally, their differences are shown below:
> +                #
> +                #                       | common | rename | meta  | ext
> +                #  -------------------------------------------------------
> +                #   flags()             | 0      | 0      | 0     | not 0
> +                #   renamed()           | False  | True   | False | ?
> +                #   rawtext[0:2]=='\1\n'| False  | True   | True  | ?
> +                #
> +                # "rawtext" means the raw text stored in revlog data, which
> +                # could be retrieved by "revision(rev, raw=True)". "text"
> +                # mentioned below is "revision(rev, raw=False)".
> +                #
> +                # There are 3 different lengths stored physically:
> +                #  1. L1: rawsize, stored in revlog index
> +                #  2. L2: len(rawtext), stored in revlog data
> +                #  3. L3: len(text), stored in revlog data if flags==0, or
> +                #     possibly somewhere else if flags!=0
> +                #
> +                # L1 should be equal to L2. L3 could be different from them.
> +                # "text" may or may not affect commit hash depending on flag
> +                # processors (see revlog.addflagprocessor).
> +                #
> +                #              | common  | rename | meta  | ext
> +                # -------------------------------------------------
> +                #    rawsize() | L1      | L1     | L1    | L1
> +                #       size() | L1      | L2-LM  | L1(*) | L1 (?)
> +                # len(rawtext) | L2      | L2     | L2    | L2
> +                #    len(text) | L2      | L2     | L2    | L3
> +                #  len(read()) | L2      | L2-LM  | L2-LM | L3 (?)
> +                #
> +                # LM:  length of metadata, depending on rawtext
> +                # (*): not ideal, see comment in filelog.size
> +                # (?): could be "- len(meta)" if the resolved content has
> +                #      rename metadata
> +                #
> +                # Checks needed to be done:
> +                #  1. length check: L1 == L2, in all cases.
> +                #  2. hash check: depending on flag processor, we may need to
> +                #     use either "text" (external), or "rawtext" (in revlog).
>                   try:
>                       l = len(fl.read(n))
>
This comment is very clear to me now. Well done and thanks!
Yuya Nishihara - April 2, 2017, 7:21 a.m.
On Thu, 30 Mar 2017 14:51:14 -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1490823901 25200
> #      Wed Mar 29 14:45:01 2017 -0700
> # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92
> # Parent  dea2a17cbfd00bf08ee87b3e44b1c71499189f89
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 17b41390f491
> verify: document corner cases
> 
> It seems a good idea to list all kinds of "surprises" and expected behavior
> to make the upcoming changes easier to understand.
> 
> Note: the comment added does not reflect the actual behavior of the current
> code.
> 
> diff --git a/mercurial/verify.py b/mercurial/verify.py
> --- a/mercurial/verify.py
> +++ b/mercurial/verify.py
> @@ -380,5 +380,51 @@ class verifier(object):
>                          del filenodes[f][n]
>  
> -                # verify contents
> +                # Verify contents. 4 cases to care about:
> +                #
> +                #   common: the most common case
> +                #   rename: with a rename
> +                #   meta: file content starts with b'\1\n', the metadata
> +                #         header defined in filelog.py, but without a rename
> +                #   ext: largefiles. content stored elsewhere

Maybe s/largefiles/lfs/ ? Could be fixed in flight if that's correct.
Jun Wu - April 3, 2017, 12:15 a.m.
Excerpts from Yuya Nishihara's message of 2017-04-02 16:21:24 +0900:
> On Thu, 30 Mar 2017 14:51:14 -0700, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1490823901 25200
> > #      Wed Mar 29 14:45:01 2017 -0700
> > # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92
> > # Parent  dea2a17cbfd00bf08ee87b3e44b1c71499189f89
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 17b41390f491
> > verify: document corner cases
> > 
> > It seems a good idea to list all kinds of "surprises" and expected behavior
> > to make the upcoming changes easier to understand.
> > 
> > Note: the comment added does not reflect the actual behavior of the current
> > code.
> > 
> > diff --git a/mercurial/verify.py b/mercurial/verify.py
> > --- a/mercurial/verify.py
> > +++ b/mercurial/verify.py
> > @@ -380,5 +380,51 @@ class verifier(object):
> >                          del filenodes[f][n]
> >  
> > -                # verify contents
> > +                # Verify contents. 4 cases to care about:
> > +                #
> > +                #   common: the most common case
> > +                #   rename: with a rename
> > +                #   meta: file content starts with b'\1\n', the metadata
> > +                #         header defined in filelog.py, but without a rename
> > +                #   ext: largefiles. content stored elsewhere
> 
> Maybe s/largefiles/lfs/ ? Could be fixed in flight if that's correct.

Yes. But "lfs" is outside core. Maybe just say "content stored externally".
Yuya Nishihara - April 3, 2017, 2:22 p.m.
On Sun, 2 Apr 2017 17:15:48 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-04-02 16:21:24 +0900:
> > On Thu, 30 Mar 2017 14:51:14 -0700, Jun Wu wrote:
> > > # HG changeset patch
> > > # User Jun Wu <quark@fb.com>
> > > # Date 1490823901 25200
> > > #      Wed Mar 29 14:45:01 2017 -0700
> > > # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92
> > > # Parent  dea2a17cbfd00bf08ee87b3e44b1c71499189f89
> > > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 17b41390f491
> > > verify: document corner cases
> > > 
> > > It seems a good idea to list all kinds of "surprises" and expected behavior
> > > to make the upcoming changes easier to understand.
> > > 
> > > Note: the comment added does not reflect the actual behavior of the current
> > > code.
> > > 
> > > diff --git a/mercurial/verify.py b/mercurial/verify.py
> > > --- a/mercurial/verify.py
> > > +++ b/mercurial/verify.py
> > > @@ -380,5 +380,51 @@ class verifier(object):
> > >                          del filenodes[f][n]
> > >  
> > > -                # verify contents
> > > +                # Verify contents. 4 cases to care about:
> > > +                #
> > > +                #   common: the most common case
> > > +                #   rename: with a rename
> > > +                #   meta: file content starts with b'\1\n', the metadata
> > > +                #         header defined in filelog.py, but without a rename
> > > +                #   ext: largefiles. content stored elsewhere
> > 
> > Maybe s/largefiles/lfs/ ? Could be fixed in flight if that's correct.
> 
> Yes. But "lfs" is outside core. Maybe just say "content stored externally".

Okay, updated as such and queued, thanks.

Patch

diff --git a/mercurial/verify.py b/mercurial/verify.py
--- a/mercurial/verify.py
+++ b/mercurial/verify.py
@@ -380,5 +380,51 @@  class verifier(object):
                         del filenodes[f][n]
 
-                # verify contents
+                # Verify contents. 4 cases to care about:
+                #
+                #   common: the most common case
+                #   rename: with a rename
+                #   meta: file content starts with b'\1\n', the metadata
+                #         header defined in filelog.py, but without a rename
+                #   ext: largefiles. content stored elsewhere
+                #
+                # More formally, their differences are shown below:
+                #
+                #                       | common | rename | meta  | ext
+                #  -------------------------------------------------------
+                #   flags()             | 0      | 0      | 0     | not 0
+                #   renamed()           | False  | True   | False | ?
+                #   rawtext[0:2]=='\1\n'| False  | True   | True  | ?
+                #
+                # "rawtext" means the raw text stored in revlog data, which
+                # could be retrieved by "revision(rev, raw=True)". "text"
+                # mentioned below is "revision(rev, raw=False)".
+                #
+                # There are 3 different lengths stored physically:
+                #  1. L1: rawsize, stored in revlog index
+                #  2. L2: len(rawtext), stored in revlog data
+                #  3. L3: len(text), stored in revlog data if flags==0, or
+                #     possibly somewhere else if flags!=0
+                #
+                # L1 should be equal to L2. L3 could be different from them.
+                # "text" may or may not affect commit hash depending on flag
+                # processors (see revlog.addflagprocessor).
+                #
+                #              | common  | rename | meta  | ext
+                # -------------------------------------------------
+                #    rawsize() | L1      | L1     | L1    | L1
+                #       size() | L1      | L2-LM  | L1(*) | L1 (?)
+                # len(rawtext) | L2      | L2     | L2    | L2
+                #    len(text) | L2      | L2     | L2    | L3
+                #  len(read()) | L2      | L2-LM  | L2-LM | L3 (?)
+                #
+                # LM:  length of metadata, depending on rawtext
+                # (*): not ideal, see comment in filelog.size
+                # (?): could be "- len(meta)" if the resolved content has
+                #      rename metadata
+                #
+                # Checks needed to be done:
+                #  1. length check: L1 == L2, in all cases.
+                #  2. hash check: depending on flag processor, we may need to
+                #     use either "text" (external), or "rawtext" (in revlog).
                 try:
                     l = len(fl.read(n))