Patchwork [6,of,7] pathencode: use Py_SIZE directly

login
register
mail settings
Submitter Gregory Szorc
Date Oct. 8, 2016, 8:48 p.m.
Message ID <40775aad0c78f6c1fd07.1475959692@gps-mbp.local>
Download mbox | patch
Permalink /patch/16965/
State Accepted
Headers show

Comments

Gregory Szorc - Oct. 8, 2016, 8:48 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1475958082 -7200
#      Sat Oct 08 22:21:22 2016 +0200
# Node ID 40775aad0c78f6c1fd07e7160d50426efbe032ed
# Parent  a33e93c20bc1290af106a0f077fece735ea05245
pathencode: use Py_SIZE directly

On Python 2, PyBytes_GET_SIZE is the same as PyString_GET_SIZE which
is the same as Py_SIZE which resolves to a struct member.

On Python 3, PyBytes_GET_SIZE is
"(assert(PyBytes_Check(op)),Py_SIZE(op))". The compiler barfs when
assigning to this version.

This patch simply changes PyBytes_GET_SIZE to Py_SIZE. On Python 2,
there is no effective change in behavior. On Python 3, we drop the
PyBytes_Check(). However, in all cases we have explicitly created
a PyBytesObject in the same function, so the PyBytes_Check() is
guaranteed to be true. Despite this, code changes over time, so
I've added added assert() in all callers so we can catch this in
debug builds.

With this patch, all mercurial.* C extensions now compile on Python 3
on my OS X machine. There are several compiler warnings and I'm sure
there are incompatibilities with Python 3, including possibly
segfaults. But it is a milestone.
Yuya Nishihara - Oct. 9, 2016, 12:38 p.m.
On Sat, 08 Oct 2016 22:48:12 +0200, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1475958082 -7200
> #      Sat Oct 08 22:21:22 2016 +0200
> # Node ID 40775aad0c78f6c1fd07e7160d50426efbe032ed
> # Parent  a33e93c20bc1290af106a0f077fece735ea05245
> pathencode: use Py_SIZE directly

> @@ -637,9 +638,10 @@ static PyObject *hashmangle(const char *
>  	if (lastdot >= 0)
>  		memcopy(dest, &destlen, destsize, &src[lastdot],
>  			len - lastdot - 1);
>  
> -	PyBytes_GET_SIZE(ret) = destlen;
> +	PyBytes_Check(ret);
> +	Py_SIZE(ret) = destlen;
>  
>  	return ret;
>  }
>  
> @@ -749,9 +751,10 @@ PyObject *pathencode(PyObject *self, PyO
>  
>  		newobj = PyBytes_FromStringAndSize(NULL, newlen);
>  
>  		if (newobj) {
> -			PyBytes_GET_SIZE(newobj)--;
> +			PyBytes_Check(newobj);
> +			Py_SIZE(newobj)--;

Maybe we need to wrap these PyBytes_Check()s by assert() ?
Gregory Szorc - Oct. 9, 2016, 1:15 p.m.
On Sun, Oct 9, 2016 at 2:38 PM, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sat, 08 Oct 2016 22:48:12 +0200, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1475958082 -7200
> > #      Sat Oct 08 22:21:22 2016 +0200
> > # Node ID 40775aad0c78f6c1fd07e7160d50426efbe032ed
> > # Parent  a33e93c20bc1290af106a0f077fece735ea05245
> > pathencode: use Py_SIZE directly
>
> > @@ -637,9 +638,10 @@ static PyObject *hashmangle(const char *
> >       if (lastdot >= 0)
> >               memcopy(dest, &destlen, destsize, &src[lastdot],
> >                       len - lastdot - 1);
> >
> > -     PyBytes_GET_SIZE(ret) = destlen;
> > +     PyBytes_Check(ret);
> > +     Py_SIZE(ret) = destlen;
> >
> >       return ret;
> >  }
> >
> > @@ -749,9 +751,10 @@ PyObject *pathencode(PyObject *self, PyO
> >
> >               newobj = PyBytes_FromStringAndSize(NULL, newlen);
> >
> >               if (newobj) {
> > -                     PyBytes_GET_SIZE(newobj)--;
> > +                     PyBytes_Check(newobj);
> > +                     Py_SIZE(newobj)--;
>
> Maybe we need to wrap these PyBytes_Check()s by assert() ?
>

I was going for a literal 1:1 port with this patch. I can submit a
follow-up to convert to assert(). Is that acceptable?

FWIW, as part of the Python 3 port, I'll likely have to do a line-by-line
audit of the C extensions. I imagine I'll be submitting a lot of
miscellaneous fixups like converting certain things like this to assert().
Yuya Nishihara - Oct. 9, 2016, 1:28 p.m.
On Sun, 9 Oct 2016 15:15:34 +0200, Gregory Szorc wrote:
> On Sun, Oct 9, 2016 at 2:38 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Sat, 08 Oct 2016 22:48:12 +0200, Gregory Szorc wrote:
> > > # HG changeset patch
> > > # User Gregory Szorc <gregory.szorc@gmail.com>
> > > # Date 1475958082 -7200
> > > #      Sat Oct 08 22:21:22 2016 +0200
> > > # Node ID 40775aad0c78f6c1fd07e7160d50426efbe032ed
> > > # Parent  a33e93c20bc1290af106a0f077fece735ea05245
> > > pathencode: use Py_SIZE directly
> >
> > > @@ -637,9 +638,10 @@ static PyObject *hashmangle(const char *
> > >       if (lastdot >= 0)
> > >               memcopy(dest, &destlen, destsize, &src[lastdot],
> > >                       len - lastdot - 1);
> > >
> > > -     PyBytes_GET_SIZE(ret) = destlen;
> > > +     PyBytes_Check(ret);
> > > +     Py_SIZE(ret) = destlen;
> > >
> > >       return ret;
> > >  }
> > >
> > > @@ -749,9 +751,10 @@ PyObject *pathencode(PyObject *self, PyO
> > >
> > >               newobj = PyBytes_FromStringAndSize(NULL, newlen);
> > >
> > >               if (newobj) {
> > > -                     PyBytes_GET_SIZE(newobj)--;
> > > +                     PyBytes_Check(newobj);
> > > +                     Py_SIZE(newobj)--;
> >
> > Maybe we need to wrap these PyBytes_Check()s by assert() ?
> >
> 
> I was going for a literal 1:1 port with this patch. I can submit a
> follow-up to convert to assert(). Is that acceptable?

Seems fine. I just wondered why only the first PyBytes_Check() had assert().
Gregory Szorc - Oct. 9, 2016, 1:32 p.m.
On Sun, Oct 9, 2016 at 3:28 PM, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 9 Oct 2016 15:15:34 +0200, Gregory Szorc wrote:
> > On Sun, Oct 9, 2016 at 2:38 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> > > On Sat, 08 Oct 2016 22:48:12 +0200, Gregory Szorc wrote:
> > > > # HG changeset patch
> > > > # User Gregory Szorc <gregory.szorc@gmail.com>
> > > > # Date 1475958082 -7200
> > > > #      Sat Oct 08 22:21:22 2016 +0200
> > > > # Node ID 40775aad0c78f6c1fd07e7160d50426efbe032ed
> > > > # Parent  a33e93c20bc1290af106a0f077fece735ea05245
> > > > pathencode: use Py_SIZE directly
> > >
> > > > @@ -637,9 +638,10 @@ static PyObject *hashmangle(const char *
> > > >       if (lastdot >= 0)
> > > >               memcopy(dest, &destlen, destsize, &src[lastdot],
> > > >                       len - lastdot - 1);
> > > >
> > > > -     PyBytes_GET_SIZE(ret) = destlen;
> > > > +     PyBytes_Check(ret);
> > > > +     Py_SIZE(ret) = destlen;
> > > >
> > > >       return ret;
> > > >  }
> > > >
> > > > @@ -749,9 +751,10 @@ PyObject *pathencode(PyObject *self, PyO
> > > >
> > > >               newobj = PyBytes_FromStringAndSize(NULL, newlen);
> > > >
> > > >               if (newobj) {
> > > > -                     PyBytes_GET_SIZE(newobj)--;
> > > > +                     PyBytes_Check(newobj);
> > > > +                     Py_SIZE(newobj)--;
> > >
> > > Maybe we need to wrap these PyBytes_Check()s by assert() ?
> > >
> >
> > I was going for a literal 1:1 port with this patch. I can submit a
> > follow-up to convert to assert(). Is that acceptable?
>
> Seems fine. I just wondered why only the first PyBytes_Check() had
> assert().
>

Oh, I may not have amended correctly. Be on the lookout for errant
assert()s in my other patches!

Patch

diff --git a/mercurial/pathencode.c b/mercurial/pathencode.c
--- a/mercurial/pathencode.c
+++ b/mercurial/pathencode.c
@@ -170,9 +170,10 @@  PyObject *encodedir(PyObject *self, PyOb
 
 	newobj = PyBytes_FromStringAndSize(NULL, newlen);
 
 	if (newobj) {
-		PyBytes_GET_SIZE(newobj)--;
+		assert(PyBytes_Check(newobj));
+		Py_SIZE(newobj)--;
 		_encodedir(PyBytes_AS_STRING(newobj), newlen, path,
 			   len + 1);
 	}
 
@@ -637,9 +638,10 @@  static PyObject *hashmangle(const char *
 	if (lastdot >= 0)
 		memcopy(dest, &destlen, destsize, &src[lastdot],
 			len - lastdot - 1);
 
-	PyBytes_GET_SIZE(ret) = destlen;
+	PyBytes_Check(ret);
+	Py_SIZE(ret) = destlen;
 
 	return ret;
 }
 
@@ -749,9 +751,10 @@  PyObject *pathencode(PyObject *self, PyO
 
 		newobj = PyBytes_FromStringAndSize(NULL, newlen);
 
 		if (newobj) {
-			PyBytes_GET_SIZE(newobj)--;
+			PyBytes_Check(newobj);
+			Py_SIZE(newobj)--;
 			basicencode(PyBytes_AS_STRING(newobj), newlen, path,
 				    len + 1);
 		}
 	}