Patchwork [1,of,2] verify: document corner cases

login
register
mail settings
Submitter Jun Wu
Date March 29, 2017, 9:51 p.m.
Message ID <1f7890370b5437466534.1490824319@x1c>
Download mbox | patch
Permalink /patch/19831/
State Changes Requested
Headers show

Comments

Jun Wu - March 29, 2017, 9:51 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1490823901 25200
#      Wed Mar 29 14:45:01 2017 -0700
# Node ID 1f7890370b5437466534cbd0a313d21671dade03
# Parent  e9fda3b8614a8b701bd48041afa8b709e1227f27
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 1f7890370b54
verify: document corner cases

It seems a good idea to list all kinds of "surprises" and expected behavior
to make the upcoming changes easier to understand.

Note: the comment added do not reflect the actual behavior of the current
code.
Jun Wu - March 29, 2017, 9:56 p.m.
Excerpts from Jun Wu's message of 2017-03-29 14:51:59 -0700:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1490823901 25200
> #      Wed Mar 29 14:45:01 2017 -0700
> # Node ID 1f7890370b5437466534cbd0a313d21671dade03
> # Parent  e9fda3b8614a8b701bd48041afa8b709e1227f27
> # Available At https://bitbucket.org/quark-zju/hg-draft 
> #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 1f7890370b54
> verify: document corner cases
> 
> It seems a good idea to list all kinds of "surprises" and expected behavior
> to make the upcoming changes easier to understand.
> 
> Note: the comment added do not reflect the actual behavior of the current
> code.
> 
> diff --git a/mercurial/verify.py b/mercurial/verify.py
> --- a/mercurial/verify.py
> +++ b/mercurial/verify.py
> @@ -381,4 +381,44 @@ class verifier(object):
>  
>                  # verify contents
> +                #
> +                # Define 4 cases to help understand corner cases:
> +                #
> +                #                       | common | rename | meta  | ext
> +                #  -------------------------------------------------------
> +                #   flags()             | 0      | 0      | 0     | not 0
> +                #   renamed()           | False  | True   | False | ?
> +                #   rawtext[0:2]=='\1\n'| False  | True   | True  | ?
> +                #
> +                # "rawtext" means the raw text stored in revlog data, which
> +                # could be retrieved by "revision(rev, raw=True)". "text"
> +                # mentioned below is "revision(rev, raw=False)".
> +                #
> +                # There are 3 different lengths stored physically:
> +                #  1. L1: rawsize, stored in revlog index
> +                #  2. L2: len(rawtext), stored in revlog data
> +                #  3. L3: len(text), stored in revlog data if flags=0, or
> +                #     possibly somewhere else if flags!=0
> +                #
> +                # L1 should be equal to L2. L3 is unrelated to L1 or L2. But
> +                # text may or may not affect commit hash depending on flag
> +                # processors (see revlog.addflagprocessor).
> +                #
> +                #              | common  | rename | meta  | ext
> +                # -------------------------------------------------
> +                #    rawsize() | L1      | L1     | L1    | L1
> +                #       size() | L1      | L2-LM  | L1(*) | L1 (?)
> +                # len(rawtext) | L2      | L2     | L2    | L2
> +                #    len(text) | L2      | L2     | L2    | L3
> +                #  len(read()) | L2      | L2-LM  | L2-LM | L3
                                                               ^^^^
                                                 This should be "L3 (?)"
                If that does not get fixed in flight. I'll send a follow up.

> +                #
> +                # LM:  length of metadata, depending on rawtext
> +                # (*): not ideal, see comment in filelog.size
> +                # (?): could be "- len(meta)" if the resolved content has
> +                #      rename metadata
> +                #
> +                # Checks needed to be done:
> +                #  1. length check: L1 == L2, in all cases.
> +                #  2. hash check: depending on flag processor, we may need to
> +                #     use either "text" (external), or "rawtext" (in revlog).
>                  try:
>                      l = len(fl.read(n))
Ryan McElroy - March 30, 2017, 8:55 a.m.
On 3/29/17 10:51 PM, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1490823901 25200
> #      Wed Mar 29 14:45:01 2017 -0700
> # Node ID 1f7890370b5437466534cbd0a313d21671dade03
> # Parent  e9fda3b8614a8b701bd48041afa8b709e1227f27
> verify: document corner cases
>
> It seems a good idea to list all kinds of "surprises" and expected behavior
> to make the upcoming changes easier to understand.
>
> Note: the comment added do not reflect the actual behavior of the current
grammar-nit: s/do/does

> code.
>
> diff --git a/mercurial/verify.py b/mercurial/verify.py
> --- a/mercurial/verify.py
> +++ b/mercurial/verify.py
> @@ -381,4 +381,44 @@ class verifier(object):
>   
>                   # verify contents
> +                #
> +                # Define 4 cases to help understand corner cases:
> +                #
> +                #                       | common | rename | meta  | ext

Can you define these four words?

common = common case?
rename = file was just copied from somewhere else?
meta = ???
ext = extension that adds a flags processor?
> +                #  -------------------------------------------------------
> +                #   flags()             | 0      | 0      | 0     | not 0
> +                #   renamed()           | False  | True   | False | ?
> +                #   rawtext[0:2]=='\1\n'| False  | True   | True  | ?
> +                #
> +                # "rawtext" means the raw text stored in revlog data, which
> +                # could be retrieved by "revision(rev, raw=True)". "text"
> +                # mentioned below is "revision(rev, raw=False)".
> +                #
> +                # There are 3 different lengths stored physically:
> +                #  1. L1: rawsize, stored in revlog index
> +                #  2. L2: len(rawtext), stored in revlog data
> +                #  3. L3: len(text), stored in revlog data if flags=0, or
> +                #     possibly somewhere else if flags!=0
> +                #
> +                # L1 should be equal to L2. L3 is unrelated to L1 or L2. But

How about "L3 can be different than L1 and L2" -- it would be the same 
if there are no flags, right?

> +                # text may or may not affect commit hash depending on flag
> +                # processors (see revlog.addflagprocessor).
> +                #
> +                #              | common  | rename | meta  | ext
> +                # -------------------------------------------------
> +                #    rawsize() | L1      | L1     | L1    | L1
> +                #       size() | L1      | L2-LM  | L1(*) | L1 (?)
> +                # len(rawtext) | L2      | L2     | L2    | L2
> +                #    len(text) | L2      | L2     | L2    | L3
> +                #  len(read()) | L2      | L2-LM  | L2-LM | L3
> +                #
> +                # LM:  length of metadata, depending on rawtext
> +                # (*): not ideal, see comment in filelog.size
> +                # (?): could be "- len(meta)" if the resolved content has
> +                #      rename metadata
> +                #
> +                # Checks needed to be done:
> +                #  1. length check: L1 == L2, in all cases.
> +                #  2. hash check: depending on flag processor, we may need to
> +                #     use either "text" (external), or "rawtext" (in revlog).
>                   try:
>                       l = len(fl.read(n))

Overall this is pretty easy to follow and adds a lot of clarity. Just a 
couple of questions inline.
Jun Wu - March 30, 2017, 4:01 p.m.
Excerpts from Ryan McElroy's message of 2017-03-30 09:55:50 +0100:
> On 3/29/17 10:51 PM, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1490823901 25200
> > #      Wed Mar 29 14:45:01 2017 -0700
> > # Node ID 1f7890370b5437466534cbd0a313d21671dade03
> > # Parent  e9fda3b8614a8b701bd48041afa8b709e1227f27
> > verify: document corner cases
> >
> > It seems a good idea to list all kinds of "surprises" and expected behavior
> > to make the upcoming changes easier to understand.
> >
> > Note: the comment added do not reflect the actual behavior of the current
> grammar-nit: s/do/does
> 
> > code.
> >
> > diff --git a/mercurial/verify.py b/mercurial/verify.py
> > --- a/mercurial/verify.py
> > +++ b/mercurial/verify.py
> > @@ -381,4 +381,44 @@ class verifier(object):
> >   
> >                   # verify contents
> > +                #
> > +                # Define 4 cases to help understand corner cases:
> > +                #
> > +                #                       | common | rename | meta  | ext
> 
> Can you define these four words?

The first table is supposed to define them precisely.

> common = common case?
> rename = file was just copied from somewhere else?
> meta = ???

"rename" is called "metadata" in filelog.py, so it means metadata without
rename.

> ext = extension that adds a flags processor?
> > +                #  -------------------------------------------------------
> > +                #   flags()             | 0      | 0      | 0     | not 0
> > +                #   renamed()           | False  | True   | False | ?
> > +                #   rawtext[0:2]=='\1\n'| False  | True   | True  | ?
> > +                #
> > +                # "rawtext" means the raw text stored in revlog data, which
> > +                # could be retrieved by "revision(rev, raw=True)". "text"
> > +                # mentioned below is "revision(rev, raw=False)".
> > +                #
> > +                # There are 3 different lengths stored physically:
> > +                #  1. L1: rawsize, stored in revlog index
> > +                #  2. L2: len(rawtext), stored in revlog data
> > +                #  3. L3: len(text), stored in revlog data if flags=0, or
> > +                #     possibly somewhere else if flags!=0
> > +                #
> > +                # L1 should be equal to L2. L3 is unrelated to L1 or L2. But
> 
> How about "L3 can be different than L1 and L2" -- it would be the same 
> if there are no flags, right?

Yes. I notice the minor difference between the language. Thanks for the
suggestion!

> > +                # text may or may not affect commit hash depending on flag
> > +                # processors (see revlog.addflagprocessor).
> > +                #
> > +                #              | common  | rename | meta  | ext
> > +                # -------------------------------------------------
> > +                #    rawsize() | L1      | L1     | L1    | L1
> > +                #       size() | L1      | L2-LM  | L1(*) | L1 (?)
> > +                # len(rawtext) | L2      | L2     | L2    | L2
> > +                #    len(text) | L2      | L2     | L2    | L3
> > +                #  len(read()) | L2      | L2-LM  | L2-LM | L3
> > +                #
> > +                # LM:  length of metadata, depending on rawtext
> > +                # (*): not ideal, see comment in filelog.size
> > +                # (?): could be "- len(meta)" if the resolved content has
> > +                #      rename metadata
> > +                #
> > +                # Checks needed to be done:
> > +                #  1. length check: L1 == L2, in all cases.
> > +                #  2. hash check: depending on flag processor, we may need to
> > +                #     use either "text" (external), or "rawtext" (in revlog).
> >                   try:
> >                       l = len(fl.read(n))
> 
> Overall this is pretty easy to follow and adds a lot of clarity. Just a 
> couple of questions inline.
Ryan McElroy - March 30, 2017, 5:15 p.m.
On 3/30/17 5:01 PM, Jun Wu wrote:
> Excerpts from Ryan McElroy's message of 2017-03-30 09:55:50 +0100:
>> On 3/29/17 10:51 PM, Jun Wu wrote:
>>> # HG changeset patch
>>> # User Jun Wu <quark@fb.com>
>>> # Date 1490823901 25200
>>> #      Wed Mar 29 14:45:01 2017 -0700
>>> # Node ID 1f7890370b5437466534cbd0a313d21671dade03
>>> # Parent  e9fda3b8614a8b701bd48041afa8b709e1227f27
>>> verify: document corner cases
>>>
>>> It seems a good idea to list all kinds of "surprises" and expected behavior
>>> to make the upcoming changes easier to understand.
>>>
>>> Note: the comment added do not reflect the actual behavior of the current
>> grammar-nit: s/do/does
>>
>>> code.
>>>
>>> diff --git a/mercurial/verify.py b/mercurial/verify.py
>>> --- a/mercurial/verify.py
>>> +++ b/mercurial/verify.py
>>> @@ -381,4 +381,44 @@ class verifier(object):
>>>    
>>>                    # verify contents
>>> +                #
>>> +                # Define 4 cases to help understand corner cases:
>>> +                #
>>> +                #                       | common | rename | meta  | ext
>> Can you define these four words?
> The first table is supposed to define them precisely.

I guess it didn't for me, which is why I asked and thought "english 
descriptions" would be useful.

>
>> common = common case?
>> rename = file was just copied from somewhere else?
>> meta = ???
> "rename" is called "metadata" in filelog.py, so it means metadata without
> rename.

When would this happen? It's still not clear to me.

>
>> ext = extension that adds a flags processor?
>>> +                #  -------------------------------------------------------
>>> +                #   flags()             | 0      | 0      | 0     | not 0
>>> +                #   renamed()           | False  | True   | False | ?
>>> +                #   rawtext[0:2]=='\1\n'| False  | True   | True  | ?
>>> +                #
>>> +                # "rawtext" means the raw text stored in revlog data, which
>>> +                # could be retrieved by "revision(rev, raw=True)". "text"
>>> +                # mentioned below is "revision(rev, raw=False)".
>>> +                #
>>> +                # There are 3 different lengths stored physically:
>>> +                #  1. L1: rawsize, stored in revlog index
>>> +                #  2. L2: len(rawtext), stored in revlog data
>>> +                #  3. L3: len(text), stored in revlog data if flags=0, or
>>> +                #     possibly somewhere else if flags!=0
>>> +                #
>>> +                # L1 should be equal to L2. L3 is unrelated to L1 or L2. But
>> How about "L3 can be different than L1 and L2" -- it would be the same
>> if there are no flags, right?
> Yes. I notice the minor difference between the language. Thanks for the
> suggestion!
>
>>> +                # text may or may not affect commit hash depending on flag
>>> +                # processors (see revlog.addflagprocessor).
>>> +                #
>>> +                #              | common  | rename | meta  | ext
>>> +                # -------------------------------------------------
>>> +                #    rawsize() | L1      | L1     | L1    | L1
>>> +                #       size() | L1      | L2-LM  | L1(*) | L1 (?)
>>> +                # len(rawtext) | L2      | L2     | L2    | L2
>>> +                #    len(text) | L2      | L2     | L2    | L3
>>> +                #  len(read()) | L2      | L2-LM  | L2-LM | L3
>>> +                #
>>> +                # LM:  length of metadata, depending on rawtext
>>> +                # (*): not ideal, see comment in filelog.size
>>> +                # (?): could be "- len(meta)" if the resolved content has
>>> +                #      rename metadata
>>> +                #
>>> +                # Checks needed to be done:
>>> +                #  1. length check: L1 == L2, in all cases.
>>> +                #  2. hash check: depending on flag processor, we may need to
>>> +                #     use either "text" (external), or "rawtext" (in revlog).
>>>                    try:
>>>                        l = len(fl.read(n))
>> Overall this is pretty easy to follow and adds a lot of clarity. Just a
>> couple of questions inline.

Patch

diff --git a/mercurial/verify.py b/mercurial/verify.py
--- a/mercurial/verify.py
+++ b/mercurial/verify.py
@@ -381,4 +381,44 @@  class verifier(object):
 
                 # verify contents
+                #
+                # Define 4 cases to help understand corner cases:
+                #
+                #                       | common | rename | meta  | ext
+                #  -------------------------------------------------------
+                #   flags()             | 0      | 0      | 0     | not 0
+                #   renamed()           | False  | True   | False | ?
+                #   rawtext[0:2]=='\1\n'| False  | True   | True  | ?
+                #
+                # "rawtext" means the raw text stored in revlog data, which
+                # could be retrieved by "revision(rev, raw=True)". "text"
+                # mentioned below is "revision(rev, raw=False)".
+                #
+                # There are 3 different lengths stored physically:
+                #  1. L1: rawsize, stored in revlog index
+                #  2. L2: len(rawtext), stored in revlog data
+                #  3. L3: len(text), stored in revlog data if flags=0, or
+                #     possibly somewhere else if flags!=0
+                #
+                # L1 should be equal to L2. L3 is unrelated to L1 or L2. But
+                # text may or may not affect commit hash depending on flag
+                # processors (see revlog.addflagprocessor).
+                #
+                #              | common  | rename | meta  | ext
+                # -------------------------------------------------
+                #    rawsize() | L1      | L1     | L1    | L1
+                #       size() | L1      | L2-LM  | L1(*) | L1 (?)
+                # len(rawtext) | L2      | L2     | L2    | L2
+                #    len(text) | L2      | L2     | L2    | L3
+                #  len(read()) | L2      | L2-LM  | L2-LM | L3
+                #
+                # LM:  length of metadata, depending on rawtext
+                # (*): not ideal, see comment in filelog.size
+                # (?): could be "- len(meta)" if the resolved content has
+                #      rename metadata
+                #
+                # Checks needed to be done:
+                #  1. length check: L1 == L2, in all cases.
+                #  2. hash check: depending on flag processor, we may need to
+                #     use either "text" (external), or "rawtext" (in revlog).
                 try:
                     l = len(fl.read(n))