Submitter | Jun Wu |
---|---|
Date | March 30, 2017, 9:51 p.m. |
Message ID | <17b41390f4912a4c1853.1490910674@x1c> |
Download | mbox | patch |
Permalink | /patch/19851/ |
State | Accepted |
Delegated to: | Yuya Nishihara |
Headers | show |
Comments
On 3/30/17 10:51 PM, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1490823901 25200 > # Wed Mar 29 14:45:01 2017 -0700 > # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92 > # Parent dea2a17cbfd00bf08ee87b3e44b1c71499189f89 > # Available At https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=kuj9GJXY5gbugZhlyrbxjthYrYkZ6M3HZHYFrfQg_ZY&s=yj22qlE05uO47PYA0xtQ2kgE_-D5kRIk7mybLrSUZdg&e= > # hg pull https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_quark-2Dzju_hg-2Ddraft&d=DwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Jw8rundaE7TbmqBYd1txIQ&m=kuj9GJXY5gbugZhlyrbxjthYrYkZ6M3HZHYFrfQg_ZY&s=yj22qlE05uO47PYA0xtQ2kgE_-D5kRIk7mybLrSUZdg&e= -r 17b41390f491 > verify: document corner cases > > It seems a good idea to list all kinds of "surprises" and expected behavior > to make the upcoming changes easier to understand. > > Note: the comment added does not reflect the actual behavior of the current > code. > > diff --git a/mercurial/verify.py b/mercurial/verify.py > --- a/mercurial/verify.py > +++ b/mercurial/verify.py > @@ -380,5 +380,51 @@ class verifier(object): > del filenodes[f][n] > > - # verify contents > + # Verify contents. 4 cases to care about: > + # > + # common: the most common case > + # rename: with a rename > + # meta: file content starts with b'\1\n', the metadata > + # header defined in filelog.py, but without a rename > + # ext: largefiles. content stored elsewhere > + # > + # More formally, their differences are shown below: > + # > + # | common | rename | meta | ext > + # ------------------------------------------------------- > + # flags() | 0 | 0 | 0 | not 0 > + # renamed() | False | True | False | ? > + # rawtext[0:2]=='\1\n'| False | True | True | ? > + # > + # "rawtext" means the raw text stored in revlog data, which > + # could be retrieved by "revision(rev, raw=True)". "text" > + # mentioned below is "revision(rev, raw=False)". > + # > + # There are 3 different lengths stored physically: > + # 1. L1: rawsize, stored in revlog index > + # 2. L2: len(rawtext), stored in revlog data > + # 3. L3: len(text), stored in revlog data if flags==0, or > + # possibly somewhere else if flags!=0 > + # > + # L1 should be equal to L2. L3 could be different from them. > + # "text" may or may not affect commit hash depending on flag > + # processors (see revlog.addflagprocessor). > + # > + # | common | rename | meta | ext > + # ------------------------------------------------- > + # rawsize() | L1 | L1 | L1 | L1 > + # size() | L1 | L2-LM | L1(*) | L1 (?) > + # len(rawtext) | L2 | L2 | L2 | L2 > + # len(text) | L2 | L2 | L2 | L3 > + # len(read()) | L2 | L2-LM | L2-LM | L3 (?) > + # > + # LM: length of metadata, depending on rawtext > + # (*): not ideal, see comment in filelog.size > + # (?): could be "- len(meta)" if the resolved content has > + # rename metadata > + # > + # Checks needed to be done: > + # 1. length check: L1 == L2, in all cases. > + # 2. hash check: depending on flag processor, we may need to > + # use either "text" (external), or "rawtext" (in revlog). > try: > l = len(fl.read(n)) > This comment is very clear to me now. Well done and thanks!
On Thu, 30 Mar 2017 14:51:14 -0700, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1490823901 25200 > # Wed Mar 29 14:45:01 2017 -0700 > # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92 > # Parent dea2a17cbfd00bf08ee87b3e44b1c71499189f89 > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 17b41390f491 > verify: document corner cases > > It seems a good idea to list all kinds of "surprises" and expected behavior > to make the upcoming changes easier to understand. > > Note: the comment added does not reflect the actual behavior of the current > code. > > diff --git a/mercurial/verify.py b/mercurial/verify.py > --- a/mercurial/verify.py > +++ b/mercurial/verify.py > @@ -380,5 +380,51 @@ class verifier(object): > del filenodes[f][n] > > - # verify contents > + # Verify contents. 4 cases to care about: > + # > + # common: the most common case > + # rename: with a rename > + # meta: file content starts with b'\1\n', the metadata > + # header defined in filelog.py, but without a rename > + # ext: largefiles. content stored elsewhere Maybe s/largefiles/lfs/ ? Could be fixed in flight if that's correct.
Excerpts from Yuya Nishihara's message of 2017-04-02 16:21:24 +0900: > On Thu, 30 Mar 2017 14:51:14 -0700, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1490823901 25200 > > # Wed Mar 29 14:45:01 2017 -0700 > > # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92 > > # Parent dea2a17cbfd00bf08ee87b3e44b1c71499189f89 > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 17b41390f491 > > verify: document corner cases > > > > It seems a good idea to list all kinds of "surprises" and expected behavior > > to make the upcoming changes easier to understand. > > > > Note: the comment added does not reflect the actual behavior of the current > > code. > > > > diff --git a/mercurial/verify.py b/mercurial/verify.py > > --- a/mercurial/verify.py > > +++ b/mercurial/verify.py > > @@ -380,5 +380,51 @@ class verifier(object): > > del filenodes[f][n] > > > > - # verify contents > > + # Verify contents. 4 cases to care about: > > + # > > + # common: the most common case > > + # rename: with a rename > > + # meta: file content starts with b'\1\n', the metadata > > + # header defined in filelog.py, but without a rename > > + # ext: largefiles. content stored elsewhere > > Maybe s/largefiles/lfs/ ? Could be fixed in flight if that's correct. Yes. But "lfs" is outside core. Maybe just say "content stored externally".
On Sun, 2 Apr 2017 17:15:48 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-04-02 16:21:24 +0900: > > On Thu, 30 Mar 2017 14:51:14 -0700, Jun Wu wrote: > > > # HG changeset patch > > > # User Jun Wu <quark@fb.com> > > > # Date 1490823901 25200 > > > # Wed Mar 29 14:45:01 2017 -0700 > > > # Node ID 17b41390f4912a4c18538d778837bc2cf4a1be92 > > > # Parent dea2a17cbfd00bf08ee87b3e44b1c71499189f89 > > > # Available At https://bitbucket.org/quark-zju/hg-draft > > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 17b41390f491 > > > verify: document corner cases > > > > > > It seems a good idea to list all kinds of "surprises" and expected behavior > > > to make the upcoming changes easier to understand. > > > > > > Note: the comment added does not reflect the actual behavior of the current > > > code. > > > > > > diff --git a/mercurial/verify.py b/mercurial/verify.py > > > --- a/mercurial/verify.py > > > +++ b/mercurial/verify.py > > > @@ -380,5 +380,51 @@ class verifier(object): > > > del filenodes[f][n] > > > > > > - # verify contents > > > + # Verify contents. 4 cases to care about: > > > + # > > > + # common: the most common case > > > + # rename: with a rename > > > + # meta: file content starts with b'\1\n', the metadata > > > + # header defined in filelog.py, but without a rename > > > + # ext: largefiles. content stored elsewhere > > > > Maybe s/largefiles/lfs/ ? Could be fixed in flight if that's correct. > > Yes. But "lfs" is outside core. Maybe just say "content stored externally". Okay, updated as such and queued, thanks.
Patch
diff --git a/mercurial/verify.py b/mercurial/verify.py --- a/mercurial/verify.py +++ b/mercurial/verify.py @@ -380,5 +380,51 @@ class verifier(object): del filenodes[f][n] - # verify contents + # Verify contents. 4 cases to care about: + # + # common: the most common case + # rename: with a rename + # meta: file content starts with b'\1\n', the metadata + # header defined in filelog.py, but without a rename + # ext: largefiles. content stored elsewhere + # + # More formally, their differences are shown below: + # + # | common | rename | meta | ext + # ------------------------------------------------------- + # flags() | 0 | 0 | 0 | not 0 + # renamed() | False | True | False | ? + # rawtext[0:2]=='\1\n'| False | True | True | ? + # + # "rawtext" means the raw text stored in revlog data, which + # could be retrieved by "revision(rev, raw=True)". "text" + # mentioned below is "revision(rev, raw=False)". + # + # There are 3 different lengths stored physically: + # 1. L1: rawsize, stored in revlog index + # 2. L2: len(rawtext), stored in revlog data + # 3. L3: len(text), stored in revlog data if flags==0, or + # possibly somewhere else if flags!=0 + # + # L1 should be equal to L2. L3 could be different from them. + # "text" may or may not affect commit hash depending on flag + # processors (see revlog.addflagprocessor). + # + # | common | rename | meta | ext + # ------------------------------------------------- + # rawsize() | L1 | L1 | L1 | L1 + # size() | L1 | L2-LM | L1(*) | L1 (?) + # len(rawtext) | L2 | L2 | L2 | L2 + # len(text) | L2 | L2 | L2 | L3 + # len(read()) | L2 | L2-LM | L2-LM | L3 (?) + # + # LM: length of metadata, depending on rawtext + # (*): not ideal, see comment in filelog.size + # (?): could be "- len(meta)" if the resolved content has + # rename metadata + # + # Checks needed to be done: + # 1. length check: L1 == L2, in all cases. + # 2. hash check: depending on flag processor, we may need to + # use either "text" (external), or "rawtext" (in revlog). try: l = len(fl.read(n))