Patchwork [2,of,2] bdiff: early pruning of common suffix before doing expensive computations

login
register
mail settings
Submitter Pierre-Yves David
Date Dec. 15, 2016, 8:01 a.m.
Message ID <f606b7c0-34d2-c1ed-5156-8aadf63e60d5@ens-lyon.org>
Download mbox | patch
Permalink /patch/17920/
State Not Applicable
Headers show

Comments

Pierre-Yves David - Dec. 15, 2016, 8:01 a.m.
On 11/17/2016 06:16 PM, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1479322051 -3600
> #      Wed Nov 16 19:47:31 2016 +0100
> # Node ID 851a14d5b80639efe61bd01ad215479c99106b40
> # Parent  7f39bccc1c96bffc83f3c6e774da57ecd38982a7
> bdiff: early pruning of common suffix before doing expensive computations
>
> Similar to the previous change, but trimming trailing lines.

The change itself seems fine and the perf win is very tasty. However, 
this changesets breaks tests with --pure.

I'm tempted to drop it unless we can provide a test update to fold into 
this.

Test failure with --pure:


> On mozilla-unified:
> perfbdiff -m 3041e4d59df2
> ! wall 0.024618 comb 0.020000 user 0.020000 sys 0.000000 (best of 116) to
> ! wall 0.008259 comb 0.010000 user 0.010000 sys 0.000000 (best of 356) to
> perfbdiff 0e9928989e9c --alldata --count 10
> ! wall 0.579235 comb 0.580000 user 0.580000 sys 0.000000 (best of 18)
> ! wall 0.469869 comb 0.460000 user 0.460000 sys 0.000000 (best of 22)
>
> diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
> --- a/mercurial/bdiff_module.c
> +++ b/mercurial/bdiff_module.c
> @@ -66,7 +66,7 @@ static PyObject *bdiff(PyObject *self, P
>  	struct bdiff_line *al, *bl;
>  	struct bdiff_hunk l, *h;
>  	int an, bn, count;
> -	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lmax;
> +	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lcommonend = 0, lmax;
>  	PyThreadState *_save;
>
>  	l.next = NULL;
> @@ -88,9 +88,16 @@ static PyObject *bdiff(PyObject *self, P
>  		if (*ia == '\n')
>  			lcommon = li + 1;
>  	/* we can almost add: if (li == lmax) lcommon = li; */
> +	lmax -= lcommon;
> +	for (li = 0, ia = sa + la - 1, ib = sb + lb - 1;
> +	     li <= lmax && *ia == *ib;
> +	     ++li, --ia, --ib)
> +		if (*ia == '\n')
> +			lcommonend = li;
> +	/* printf("common %i %i\n", lcommon, lcommonend); */
>
> -	an = bdiff_splitlines(sa + lcommon, la - lcommon, &al);
> -	bn = bdiff_splitlines(sb + lcommon, lb - lcommon, &bl);
> +	an = bdiff_splitlines(sa + lcommon, la - lcommon - lcommonend, &al);
> +	bn = bdiff_splitlines(sb + lcommon, lb - lcommon - lcommonend, &bl);
>  	if (!al || !bl)
>  		goto nomem;
>
> diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out
> --- a/tests/test-bdiff.py.out
> +++ b/tests/test-bdiff.py.out
> @@ -28,9 +28,9 @@ showdiff(
>    'x\n\nx\n\ny\n\nx\n\ny\n\nx\n\nz\n'):
>   'x\n\nx\n\n'
>   6 6 '' -> 'y\n\n'
> - 'x\n\n'
> - 9 9 '' -> 'y\n\n'
> - 'x\n\nz\n'
> + 'x\n'
> + 8 8 '' -> '\ny\n'
> + '\nx\n\nz\n'
>  showdiff(
>    'a\nb\nb\nb\nc\n.\nd\ne\n.\nf\n',
>    'a\nb\nb\na\nb\nb\nb\nc\n.\nb\nc\n.\nd\ne\nf\n'):
> diff --git a/tests/test-check-code.t b/tests/test-check-code.t
> --- a/tests/test-check-code.t
> +++ b/tests/test-check-code.t
> @@ -15,10 +15,6 @@ New errors are not allowed. Warnings are
>    Skipping hgext/fsmonitor/pywatchman/msc_stdint.h it has no-che?k-code (glob)
>    Skipping hgext/fsmonitor/pywatchman/pybser.py it has no-che?k-code (glob)
>    Skipping i18n/polib.py it has no-che?k-code (glob)
> -  mercurial/bdiff_module.c:89:
> -   > 	//printf("common prefix: %i next line: %i\n", li, llf);
> -   don't use //-style comments
>    Skipping mercurial/httpclient/__init__.py it has no-che?k-code (glob)
>    Skipping mercurial/httpclient/_readers.py it has no-che?k-code (glob)
>    Skipping mercurial/statprof.py it has no-che?k-code (glob)
> -  [1]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - Dec. 15, 2016, 1:02 p.m.
On Thu, 15 Dec 2016 09:01:11 +0100, Pierre-Yves David wrote:
> On 11/17/2016 06:16 PM, Mads Kiilerich wrote:
> > # HG changeset patch
> > # User Mads Kiilerich <madski@unity3d.com>
> > # Date 1479322051 -3600
> > #      Wed Nov 16 19:47:31 2016 +0100
> > # Node ID 851a14d5b80639efe61bd01ad215479c99106b40
> > # Parent  7f39bccc1c96bffc83f3c6e774da57ecd38982a7
> > bdiff: early pruning of common suffix before doing expensive computations
> >
> > Similar to the previous change, but trimming trailing lines.
> 
> The change itself seems fine and the perf win is very tasty. However, 
> this changesets breaks tests with --pure.
> 
> I'm tempted to drop it unless we can provide a test update to fold into 
> this.

[...]

> > --- a/mercurial/bdiff_module.c
> > +++ b/mercurial/bdiff_module.c
> > @@ -66,7 +66,7 @@ static PyObject *bdiff(PyObject *self, P
> >  	struct bdiff_line *al, *bl;
> >  	struct bdiff_hunk l, *h;
> >  	int an, bn, count;
> > -	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lmax;
> > +	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lcommonend = 0, lmax;
> >  	PyThreadState *_save;
> >
> >  	l.next = NULL;
> > @@ -88,9 +88,16 @@ static PyObject *bdiff(PyObject *self, P
> >  		if (*ia == '\n')
> >  			lcommon = li + 1;
> >  	/* we can almost add: if (li == lmax) lcommon = li; */
> > +	lmax -= lcommon;
> > +	for (li = 0, ia = sa + la - 1, ib = sb + lb - 1;
> > +	     li <= lmax && *ia == *ib;
> > +	     ++li, --ia, --ib)
> > +		if (*ia == '\n')
> > +			lcommonend = li;

And there appears to be an off-by-one error. 'li' is a zero-based index and
'lmax' is a length.

+1 for dropping this for now.
Augie Fackler - Dec. 15, 2016, 1:09 p.m.
> On Dec 15, 2016, at 8:02 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> 
> On Thu, 15 Dec 2016 09:01:11 +0100, Pierre-Yves David wrote:
>> On 11/17/2016 06:16 PM, Mads Kiilerich wrote:
>>> # HG changeset patch
>>> # User Mads Kiilerich <madski@unity3d.com>
>>> # Date 1479322051 -3600
>>> #      Wed Nov 16 19:47:31 2016 +0100
>>> # Node ID 851a14d5b80639efe61bd01ad215479c99106b40
>>> # Parent  7f39bccc1c96bffc83f3c6e774da57ecd38982a7
>>> bdiff: early pruning of common suffix before doing expensive computations
>>> 
>>> Similar to the previous change, but trimming trailing lines.
>> 
>> The change itself seems fine and the perf win is very tasty. However, 
>> this changesets breaks tests with --pure.
>> 
>> I'm tempted to drop it unless we can provide a test update to fold into 
>> this.
> 
> […]

I’d be fine to just #if pure the variations between pure and C, given the immense win this provides.

> 
>>> --- a/mercurial/bdiff_module.c
>>> +++ b/mercurial/bdiff_module.c
>>> @@ -66,7 +66,7 @@ static PyObject *bdiff(PyObject *self, P
>>> 	struct bdiff_line *al, *bl;
>>> 	struct bdiff_hunk l, *h;
>>> 	int an, bn, count;
>>> -	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lmax;
>>> +	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lcommonend = 0, lmax;
>>> 	PyThreadState *_save;
>>> 
>>> 	l.next = NULL;
>>> @@ -88,9 +88,16 @@ static PyObject *bdiff(PyObject *self, P
>>> 		if (*ia == '\n')
>>> 			lcommon = li + 1;
>>> 	/* we can almost add: if (li == lmax) lcommon = li; */
>>> +	lmax -= lcommon;
>>> +	for (li = 0, ia = sa + la - 1, ib = sb + lb - 1;
>>> +	     li <= lmax && *ia == *ib;
>>> +	     ++li, --ia, --ib)
>>> +		if (*ia == '\n')
>>> +			lcommonend = li;
> 
> And there appears to be an off-by-one error. 'li' is a zero-based index and
> 'lmax' is a length.
> 
> +1 for dropping this for now.

Given the off-by-one, yeah, drop it. I’m a little surprised the bdiff automated stress tester didn’t trip over that.

Sigh.
Pierre-Yves David - Dec. 15, 2016, 1:55 p.m.
On 12/15/2016 02:09 PM, Augie Fackler wrote:
>
>> On Dec 15, 2016, at 8:02 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>>
>> On Thu, 15 Dec 2016 09:01:11 +0100, Pierre-Yves David wrote:
>>> On 11/17/2016 06:16 PM, Mads Kiilerich wrote:
>>>> # HG changeset patch
>>>> # User Mads Kiilerich <madski@unity3d.com>
>>>> # Date 1479322051 -3600
>>>> #      Wed Nov 16 19:47:31 2016 +0100
>>>> # Node ID 851a14d5b80639efe61bd01ad215479c99106b40
>>>> # Parent  7f39bccc1c96bffc83f3c6e774da57ecd38982a7
>>>> bdiff: early pruning of common suffix before doing expensive computations
>>>>
>>>> Similar to the previous change, but trimming trailing lines.
>>>
>>> The change itself seems fine and the perf win is very tasty. However,
>>> this changesets breaks tests with --pure.
>>>
>>> I'm tempted to drop it unless we can provide a test update to fold into
>>> this.
>>
>> […]
>
> I’d be fine to just #if pure the variations between pure and C, given the immense win this provides.

The previous changesets (already public) is also breaking pure. Can one 
of you submit the necessary test update?

>>>> --- a/mercurial/bdiff_module.c
>>>> +++ b/mercurial/bdiff_module.c
>>>> @@ -66,7 +66,7 @@ static PyObject *bdiff(PyObject *self, P
>>>> 	struct bdiff_line *al, *bl;
>>>> 	struct bdiff_hunk l, *h;
>>>> 	int an, bn, count;
>>>> -	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lmax;
>>>> +	Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lcommonend = 0, lmax;
>>>> 	PyThreadState *_save;
>>>>
>>>> 	l.next = NULL;
>>>> @@ -88,9 +88,16 @@ static PyObject *bdiff(PyObject *self, P
>>>> 		if (*ia == '\n')
>>>> 			lcommon = li + 1;
>>>> 	/* we can almost add: if (li == lmax) lcommon = li; */
>>>> +	lmax -= lcommon;
>>>> +	for (li = 0, ia = sa + la - 1, ib = sb + lb - 1;
>>>> +	     li <= lmax && *ia == *ib;
>>>> +	     ++li, --ia, --ib)
>>>> +		if (*ia == '\n')
>>>> +			lcommonend = li;
>>
>> And there appears to be an off-by-one error. 'li' is a zero-based index and
>> 'lmax' is a length.
>>
>> +1 for dropping this for now.
>
> Given the off-by-one, yeah, drop it. I’m a little surprised the bdiff automated stress tester didn’t trip over that.
Augie Fackler - Dec. 15, 2016, 1:59 p.m.
On Thu, Dec 15, 2016 at 8:55 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
> The previous changesets (already public) is also breaking pure. Can one of
> you submit the necessary test update?

Yeah, I'll draft it and push it within the next hour.
Pierre-Yves David - Dec. 15, 2016, 2 p.m.
On 12/15/2016 02:59 PM, Augie Fackler wrote:
> On Thu, Dec 15, 2016 at 8:55 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>> The previous changesets (already public) is also breaking pure. Can one of
>> you submit the necessary test update?
>
> Yeah, I'll draft it and push it within the next hour.

Thanks!

Patch

--- /home/marmoute/src/mercurial-dev/tests/test-bdiff.py.out
+++ /home/marmoute/src/mercurial-dev/tests/test-bdiff.py.err
@@ -28,15 +28,16 @@ 
    'x\n\nx\n\ny\n\nx\n\ny\n\nx\n\nz\n'):
   'x\n\nx\n\n'
   6 6 '' -> 'y\n\n'
- 'x\n'
- 8 8 '' -> '\ny\n'
- '\nx\n\nz\n'
+ 'x\n\n'
+ 9 9 '' -> 'y\n\n'
+ 'x\n\nz\n'
  showdiff(
    'a\nb\nb\nb\nc\n.\nd\ne\n.\nf\n',
    'a\nb\nb\na\nb\nb\nb\nc\n.\nb\nc\n.\nd\ne\nf\n'):
- 'a\nb\nb\n'
- 6 6 '' -> 'a\nb\nb\nb\nc\n.\n'
- 'b\nc\n.\nd\ne\n'
+ 0 0 '' -> 'a\nb\nb\n'
+ 'a\nb\nb\nb\nc\n.\n'
+ 12 12 '' -> 'b\nc\n.\n'
+ 'd\ne\n'
   16 18 '.\n' -> ''
   'f\n'
  done