Patchwork [STABLE] parsers: fix width of datalen variable in fm1readmarkers

login
register
mail settings
Submitter Yuya Nishihara
Date Nov. 7, 2015, 10:59 a.m.
Message ID <70776f5ecbcd42f94cbe.1446893994@mimosa>
Download mbox | patch
Permalink /patch/11318/
State Accepted
Commit ce03e72837c6a69ae7379f97e33c584f9beb123d
Delegated to: Pierre-Yves David
Headers show

Comments

Yuya Nishihara - Nov. 7, 2015, 10:59 a.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1446885800 -32400
#      Sat Nov 07 17:43:20 2015 +0900
# Branch stable
# Node ID 70776f5ecbcd42f94cbe2d3c81f858b5bff8e466
# Parent  10a1a4b3e775d6024a82937df6bcc4188a315124
parsers: fix width of datalen variable in fm1readmarkers

Because parsers.c does not define PY_SSIZE_T_CLEAN, "s#" format requires
(const char*, int), not (const char*, Py_ssize_t).

https://docs.python.org/2/c-api/arg.html

This error had no problem before 042344313939, where datalen wasn't used.
But now fm1readmarkers() fails with "overflow in obsstore" on Python 2.6.9
(amd64) because upper bits of datalen seem to be filled with 1, making it
a negative integer.

This problem seems not visible on our Python 2.7 environment because upper
bits happen to be filled with 0.
Pierre-Yves David - Nov. 7, 2015, 9:27 p.m.
On 11/07/2015 05:59 AM, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1446885800 -32400
> #      Sat Nov 07 17:43:20 2015 +0900
> # Branch stable
> # Node ID 70776f5ecbcd42f94cbe2d3c81f858b5bff8e466
> # Parent  10a1a4b3e775d6024a82937df6bcc4188a315124
> parsers: fix width of datalen variable in fm1readmarkers


>
> Because parsers.c does not define PY_SSIZE_T_CLEAN, "s#" format requires
> (const char*, int), not (const char*, Py_ssize_t).

very nice catch

My only fear here is that "int" is not actually defined as 32bits in the 
standard. Did the C compiler world came to its senses about that or do 
we have to go for some stdint stuff?

Given that mercurial/util.h is using uint32_t, I assume we should 
probably use it here too.


>
> https://docs.python.org/2/c-api/arg.html
>
> This error had no problem before 042344313939, where datalen wasn't used.
> But now fm1readmarkers() fails with "overflow in obsstore" on Python 2.6.9
> (amd64) because upper bits of datalen seem to be filled with 1, making it
> a negative integer.
>
> This problem seems not visible on our Python 2.7 environment because upper
> bits happen to be filled with 0.
>
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -2692,7 +2692,7 @@ bail:
>
>   static PyObject *fm1readmarkers(PyObject *self, PyObject *args) {
>   	const char *data, *dataend;
> -	Py_ssize_t datalen;
> +	int datalen;
>   	Py_ssize_t offset, stop;
>   	PyObject *markers = NULL;
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - Nov. 8, 2015, 5:48 a.m.
On Sat, 07 Nov 2015 16:27:43 -0500, Pierre-Yves David wrote:
> On 11/07/2015 05:59 AM, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya@tcha.org>
> > # Date 1446885800 -32400
> > #      Sat Nov 07 17:43:20 2015 +0900
> > # Branch stable
> > # Node ID 70776f5ecbcd42f94cbe2d3c81f858b5bff8e466
> > # Parent  10a1a4b3e775d6024a82937df6bcc4188a315124
> > parsers: fix width of datalen variable in fm1readmarkers
> 
> 
> >
> > Because parsers.c does not define PY_SSIZE_T_CLEAN, "s#" format requires
> > (const char*, int), not (const char*, Py_ssize_t).
> 
> very nice catch
> 
> My only fear here is that "int" is not actually defined as 32bits in the 
> standard. Did the C compiler world came to its senses about that or do 
> we have to go for some stdint stuff?
> 
> Given that mercurial/util.h is using uint32_t, I assume we should 
> probably use it here too.

No, "int" is correct here because that is the requirement of PyArg_ParseTuple().
If PY_SSIZE_T_CLEAN is not defined, the argument is processed as an int.
Therefore, upper bits of "Py_ssize_t datalen" was uninitialized on little-endian
machine.

https://hg.python.org/cpython/file/v2.7.10/Python/getargs.c#l585

Perhaps we should use PY_SSIZE_T_CLEAN consistently to avoid this kind of bugs.
Pierre-Yves David - Nov. 8, 2015, 4:54 p.m.
On 11/08/2015 12:48 AM, Yuya Nishihara wrote:
> On Sat, 07 Nov 2015 16:27:43 -0500, Pierre-Yves David wrote:
>> On 11/07/2015 05:59 AM, Yuya Nishihara wrote:
>>> # HG changeset patch
>>> # User Yuya Nishihara <yuya@tcha.org>
>>> # Date 1446885800 -32400
>>> #      Sat Nov 07 17:43:20 2015 +0900
>>> # Branch stable
>>> # Node ID 70776f5ecbcd42f94cbe2d3c81f858b5bff8e466
>>> # Parent  10a1a4b3e775d6024a82937df6bcc4188a315124
>>> parsers: fix width of datalen variable in fm1readmarkers
>>
>>
>>>
>>> Because parsers.c does not define PY_SSIZE_T_CLEAN, "s#" format requires
>>> (const char*, int), not (const char*, Py_ssize_t).
>>
>> very nice catch
>>
>> My only fear here is that "int" is not actually defined as 32bits in the
>> standard. Did the C compiler world came to its senses about that or do
>> we have to go for some stdint stuff?
>>
>> Given that mercurial/util.h is using uint32_t, I assume we should
>> probably use it here too.
>
> No, "int" is correct here because that is the requirement of PyArg_ParseTuple().
> If PY_SSIZE_T_CLEAN is not defined, the argument is processed as an int.
> Therefore, upper bits of "Py_ssize_t datalen" was uninitialized on little-endian
> machine.
>
> https://hg.python.org/cpython/file/v2.7.10/Python/getargs.c#l585
>
> Perhaps we should use PY_SSIZE_T_CLEAN consistently to avoid this kind of bugs.

Okay, I've it pushed to the clowncopter. Thanks!

Patch

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -2692,7 +2692,7 @@  bail:
 
 static PyObject *fm1readmarkers(PyObject *self, PyObject *args) {
 	const char *data, *dataend;
-	Py_ssize_t datalen;
+	int datalen;
 	Py_ssize_t offset, stop;
 	PyObject *markers = NULL;