Patchwork sparse-revlog: protect C code against delta chain including nullrev

login
register
mail settings
Submitter Boris Feld
Date Nov. 29, 2018, 5:55 p.m.
Message ID <5a6a715f4ed0fa4d47bd.1543514101@pc62.home>
Download mbox | patch
Permalink /patch/36869/
State New
Headers show

Comments

Boris Feld - Nov. 29, 2018, 5:55 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1543508919 -3600
#      Thu Nov 29 17:28:39 2018 +0100
# Node ID 5a6a715f4ed0fa4d47bdd6b6413797fe6cb17f88
# Parent  2f14d1bbc9a7a142b421285c0708320b46be7a56
# EXP-Topic sparse-followup
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 5a6a715f4ed0
sparse-revlog: protect C code against delta chain including nullrev

For unclear reasons, some repositories include nullrev (-1). Re-computing
delta for such repo remove nullrev from all chain, so some older versions have
been creating them.

This currently raise an IndexError with the new C code doing chain slicing as
it expect all item to be positive.

Both python and C code for reading delta chain preserve nullrev, and the
Python code for chain slicing does not raise an exception in this case. So we
take the safe route and make the new C code works fine in that case.
Yuya Nishihara - Nov. 30, 2018, 11:45 a.m.
On Thu, 29 Nov 2018 18:55:01 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1543508919 -3600
> #      Thu Nov 29 17:28:39 2018 +0100
> # Node ID 5a6a715f4ed0fa4d47bdd6b6413797fe6cb17f88
> # Parent  2f14d1bbc9a7a142b421285c0708320b46be7a56
> # EXP-Topic sparse-followup
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 5a6a715f4ed0
> sparse-revlog: protect C code against delta chain including nullrev
> 
> For unclear reasons, some repositories include nullrev (-1). Re-computing
> delta for such repo remove nullrev from all chain, so some older versions have
> been creating them.
> 
> This currently raise an IndexError with the new C code doing chain slicing as
> it expect all item to be positive.
> 
> Both python and C code for reading delta chain preserve nullrev, and the
> Python code for chain slicing does not raise an exception in this case. So we
> take the safe route and make the new C code works fine in that case.

Can't we instead get around the nullrev in Python code?
I'm not a fan of including a hack in C for unclear reasons.

> --- a/mercurial/cext/revlog.c
> +++ b/mercurial/cext/revlog.c
> @@ -1064,6 +1064,13 @@ index_segment_span(indexObject *self, Py
>  	int64_t start_offset;
>  	int64_t end_offset;
>  	int end_size;
> +	if (start_rev == -1) {
> +        /* revlog have "harmless" corruption, keep going */
> +        if (start_rev == end_rev) {
> +            return 0;
> +        }
> +        start_rev += 1;
> +    }
>  	start_offset = index_get_start(self, start_rev);
>  	if (start_offset < 0) {
>  		return -1;
> @@ -1188,7 +1195,7 @@ static PyObject *index_slicechunktodensi
>  		if (revnum == -1 && PyErr_Occurred()) {
>  			goto bail;
>  		}
> -		if (revnum < 0 || revnum >= idxlen) {
> +		if (revnum < -1 || revnum >= idxlen) {

index_get_length() doesn't support nullrev, for example.
Boris Feld - Dec. 2, 2018, 3:42 p.m.
On 30/11/2018 12:45, Yuya Nishihara wrote:
> On Thu, 29 Nov 2018 18:55:01 +0100, Boris Feld wrote:
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1543508919 -3600
>> #      Thu Nov 29 17:28:39 2018 +0100
>> # Node ID 5a6a715f4ed0fa4d47bdd6b6413797fe6cb17f88
>> # Parent  2f14d1bbc9a7a142b421285c0708320b46be7a56
>> # EXP-Topic sparse-followup
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 5a6a715f4ed0
>> sparse-revlog: protect C code against delta chain including nullrev
>>
>> For unclear reasons, some repositories include nullrev (-1). Re-computing
>> delta for such repo remove nullrev from all chain, so some older versions have
>> been creating them.
>>
>> This currently raise an IndexError with the new C code doing chain slicing as
>> it expect all item to be positive.
>>
>> Both python and C code for reading delta chain preserve nullrev, and the
>> Python code for chain slicing does not raise an exception in this case. So we
>> take the safe route and make the new C code works fine in that case.
> Can't we instead get around the nullrev in Python code?
> I'm not a fan of including a hack in C for unclear reasons.

My preferred way to do this would be to filter out the -1 in deltachain.
(both C and Python).

Would that be okay with you?
>
>> --- a/mercurial/cext/revlog.c
>> +++ b/mercurial/cext/revlog.c
>> @@ -1064,6 +1064,13 @@ index_segment_span(indexObject *self, Py
>>  	int64_t start_offset;
>>  	int64_t end_offset;
>>  	int end_size;
>> +	if (start_rev == -1) {
>> +        /* revlog have "harmless" corruption, keep going */
>> +        if (start_rev == end_rev) {
>> +            return 0;
>> +        }
>> +        start_rev += 1;
>> +    }
>>  	start_offset = index_get_start(self, start_rev);
>>  	if (start_offset < 0) {
>>  		return -1;
>> @@ -1188,7 +1195,7 @@ static PyObject *index_slicechunktodensi
>>  		if (revnum == -1 && PyErr_Occurred()) {
>>  			goto bail;
>>  		}
>> -		if (revnum < 0 || revnum >= idxlen) {
>> +		if (revnum < -1 || revnum >= idxlen) {
> index_get_length() doesn't support nullrev, for example.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Dec. 3, 2018, 11:10 a.m.
On Sun, 2 Dec 2018 16:42:18 +0100, Boris FELD wrote:
> 
> On 30/11/2018 12:45, Yuya Nishihara wrote:
> > On Thu, 29 Nov 2018 18:55:01 +0100, Boris Feld wrote:
> >> # HG changeset patch
> >> # User Boris Feld <boris.feld@octobus.net>
> >> # Date 1543508919 -3600
> >> #      Thu Nov 29 17:28:39 2018 +0100
> >> # Node ID 5a6a715f4ed0fa4d47bdd6b6413797fe6cb17f88
> >> # Parent  2f14d1bbc9a7a142b421285c0708320b46be7a56
> >> # EXP-Topic sparse-followup
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 5a6a715f4ed0
> >> sparse-revlog: protect C code against delta chain including nullrev
> >>
> >> For unclear reasons, some repositories include nullrev (-1). Re-computing
> >> delta for such repo remove nullrev from all chain, so some older versions have
> >> been creating them.
> >>
> >> This currently raise an IndexError with the new C code doing chain slicing as
> >> it expect all item to be positive.
> >>
> >> Both python and C code for reading delta chain preserve nullrev, and the
> >> Python code for chain slicing does not raise an exception in this case. So we
> >> take the safe route and make the new C code works fine in that case.
> > Can't we instead get around the nullrev in Python code?
> > I'm not a fan of including a hack in C for unclear reasons.
> 
> My preferred way to do this would be to filter out the -1 in deltachain.
> (both C and Python).

I don't know details, but yeah, it sounds like the best workaround if we
can just pre-filter null revision.
Boris Feld - Dec. 14, 2018, 9:05 p.m.
On 03/12/2018 11:10, Yuya Nishihara wrote:
> On Sun, 2 Dec 2018 16:42:18 +0100, Boris FELD wrote:
>> On 30/11/2018 12:45, Yuya Nishihara wrote:
>>> On Thu, 29 Nov 2018 18:55:01 +0100, Boris Feld wrote:
>>>> # HG changeset patch
>>>> # User Boris Feld <boris.feld@octobus.net>
>>>> # Date 1543508919 -3600
>>>> #      Thu Nov 29 17:28:39 2018 +0100
>>>> # Node ID 5a6a715f4ed0fa4d47bdd6b6413797fe6cb17f88
>>>> # Parent  2f14d1bbc9a7a142b421285c0708320b46be7a56
>>>> # EXP-Topic sparse-followup
>>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 5a6a715f4ed0
>>>> sparse-revlog: protect C code against delta chain including nullrev
>>>>
>>>> For unclear reasons, some repositories include nullrev (-1). Re-computing
>>>> delta for such repo remove nullrev from all chain, so some older versions have
>>>> been creating them.
>>>>
>>>> This currently raise an IndexError with the new C code doing chain slicing as
>>>> it expect all item to be positive.
>>>>
>>>> Both python and C code for reading delta chain preserve nullrev, and the
>>>> Python code for chain slicing does not raise an exception in this case. So we
>>>> take the safe route and make the new C code works fine in that case.
>>> Can't we instead get around the nullrev in Python code?
>>> I'm not a fan of including a hack in C for unclear reasons.
>> My preferred way to do this would be to filter out the -1 in deltachain.
>> (both C and Python).
> I don't know details, but yeah, it sounds like the best workaround if we
> can just pre-filter null revision.

So looking deeper into that, we cannot just filter null revision, the
delta chain expect to be able to apply delta on a null base :-/

Instead we are looking into fixing internal's (index_get_start & co) to
provide a return consistent with `index_get` (0 start, 0 lenght). V2, coming

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c
+++ b/mercurial/cext/revlog.c
@@ -1064,6 +1064,13 @@  index_segment_span(indexObject *self, Py
 	int64_t start_offset;
 	int64_t end_offset;
 	int end_size;
+	if (start_rev == -1) {
+        /* revlog have "harmless" corruption, keep going */
+        if (start_rev == end_rev) {
+            return 0;
+        }
+        start_rev += 1;
+    }
 	start_offset = index_get_start(self, start_rev);
 	if (start_offset < 0) {
 		return -1;
@@ -1188,7 +1195,7 @@  static PyObject *index_slicechunktodensi
 		if (revnum == -1 && PyErr_Occurred()) {
 			goto bail;
 		}
-		if (revnum < 0 || revnum >= idxlen) {
+		if (revnum < -1 || revnum >= idxlen) {
 			PyErr_Format(PyExc_IndexError,
 			             "index out of range: %zd", revnum);
 			goto bail;