Patchwork [4,of,6] revlog: add public CPython function to get parent revisions

login
register
mail settings
Submitter Yuya Nishihara
Date Dec. 5, 2018, 1:43 p.m.
Message ID <716a73bab79be30c20c7.1544017415@mimosa>
Download mbox | patch
Permalink /patch/36977/
State Accepted
Headers show

Comments

Yuya Nishihara - Dec. 5, 2018, 1:43 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1543756237 -32400
#      Sun Dec 02 22:10:37 2018 +0900
# Node ID 716a73bab79be30c20c75e13324c44205d5e2120
# Parent  3842abba948cd7f4bb3fad6805265a35fb94a083
revlog: add public CPython function to get parent revisions

Since this is a public function, it validates the input revision, and supports
nullrev. index_get_parents_checked() will be replaced by this function.
Yuya Nishihara - Dec. 9, 2018, 5:02 a.m.
On Wed, 05 Dec 2018 22:43:35 +0900, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1543756237 -32400
> #      Sun Dec 02 22:10:37 2018 +0900
> # Node ID 716a73bab79be30c20c75e13324c44205d5e2120
> # Parent  3842abba948cd7f4bb3fad6805265a35fb94a083
> revlog: add public CPython function to get parent revisions

> +/*
> + * Get parents of the given rev.
> + *
> + * If the specified rev is out of range, IndexError will be raised. If the
> + * revlog entry is corrupted, ValueError may be raised.
> + *
> + * Returns 0 on success or -1 on failure.
> + */
> +int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps)

This is based on the idea that the Rust module will be statically linked with
the cext objects. I thought that would be easier for our use case, package-local
visibility, but I'm not certain. If that makes things complicated, maybe we
should choose dynamic linking and wrap the function with PyCapsule, as you did
in the PoC code.
Georges Racinet - Dec. 10, 2018, 5:54 p.m.
On 12/9/18 6:02 AM, Yuya Nishihara wrote:
> On Wed, 05 Dec 2018 22:43:35 +0900, Yuya Nishihara wrote:
>> # HG changeset patch
>> # User Yuya Nishihara <yuya@tcha.org>
>> # Date 1543756237 -32400
>> #      Sun Dec 02 22:10:37 2018 +0900
>> # Node ID 716a73bab79be30c20c75e13324c44205d5e2120
>> # Parent  3842abba948cd7f4bb3fad6805265a35fb94a083
>> revlog: add public CPython function to get parent revisions
>> +/*
>> + * Get parents of the given rev.
>> + *
>> + * If the specified rev is out of range, IndexError will be raised. If the
>> + * revlog entry is corrupted, ValueError may be raised.
>> + *
>> + * Returns 0 on success or -1 on failure.
>> + */
>> +int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps)
> This is based on the idea that the Rust module will be statically linked with
> the cext objects. I thought that would be easier for our use case, package-local
> visibility, but I'm not certain. If that makes things complicated, maybe we
> should choose dynamic linking and wrap the function with PyCapsule, as you did
> in the PoC code.

Yes, it's true in the direct-ffi code, I passed the function pointer
around because I was wary of a loop in dependencies (that's a bit silly
since we can't avoid linking the Rust extension within the parsers
module anyway), but also I didn't want to touch the existing C code too
much for my first patch set. This version you're proposing feels simpler
to me.

Using a capsule in that context wouldn't be much complicated either, all
we'd need in hg-direct-ffi would be to declare and call PyCapsule_Import
(it's obviously designed to use very few CPython API concepts, only
needs a char * for the name). The advantage it'd have then would be that
the same capsule could be used for all types of FFI, and we could do the
same for other functions we could need later on (maybe in other
packages). Adding a new capsule seems less risky for people like me, who
aren't as familiar with mercurial's C code as you are or, if you prefer
to see it that way, will require less review.

To be more explicit for other mailing-list subscribers, here's the
change in revlog.c I'm doing in my PoC CPython code (this is inside the
module init function) :

@@ -2846,6 +2846,12 @@
        if (nullentry)
                PyObject_GC_UnTrack(nullentry);
 
+       void *caps = PyCapsule_New(
+           index_get_parents,
"mercurial.cext.parsers.index_get_parents_CAPI",
+           NULL);
+       if (caps != NULL)
+               PyModule_AddObject(mod, "index_get_parents_CAPI", caps);
+
 #ifdef WITH_RUST
        rustlazyancestorsType.tp_new = PyType_GenericNew;
        if (PyType_Ready(&rustlazyancestorsType) < 0)

So, to summarize, I think we should maybe promote capsules as the
preferred way to interface Rust code with inner C code. It wouldn't be
hard to document either. What do you think ?

And yes, I should submit formally those rust-cpython bindings sooner
than later.

Regards,
Yuya Nishihara - Dec. 11, 2018, 1:10 p.m.
On Mon, 10 Dec 2018 18:54:56 +0100, Georges Racinet wrote:
> On 12/9/18 6:02 AM, Yuya Nishihara wrote:
> > On Wed, 05 Dec 2018 22:43:35 +0900, Yuya Nishihara wrote:
> >> # HG changeset patch
> >> # User Yuya Nishihara <yuya@tcha.org>
> >> # Date 1543756237 -32400
> >> #      Sun Dec 02 22:10:37 2018 +0900
> >> # Node ID 716a73bab79be30c20c75e13324c44205d5e2120
> >> # Parent  3842abba948cd7f4bb3fad6805265a35fb94a083
> >> revlog: add public CPython function to get parent revisions
> >> +/*
> >> + * Get parents of the given rev.
> >> + *
> >> + * If the specified rev is out of range, IndexError will be raised. If the
> >> + * revlog entry is corrupted, ValueError may be raised.
> >> + *
> >> + * Returns 0 on success or -1 on failure.
> >> + */
> >> +int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps)
> > This is based on the idea that the Rust module will be statically linked with
> > the cext objects. I thought that would be easier for our use case, package-local
> > visibility, but I'm not certain. If that makes things complicated, maybe we
> > should choose dynamic linking and wrap the function with PyCapsule, as you did
> > in the PoC code.
> 
> Yes, it's true in the direct-ffi code, I passed the function pointer
> around because I was wary of a loop in dependencies (that's a bit silly
> since we can't avoid linking the Rust extension within the parsers
> module anyway), but also I didn't want to touch the existing C code too
> much for my first patch set. This version you're proposing feels simpler
> to me.
> 
> Using a capsule in that context wouldn't be much complicated either, all
> we'd need in hg-direct-ffi would be to declare and call PyCapsule_Import
> (it's obviously designed to use very few CPython API concepts, only
> needs a char * for the name). The advantage it'd have then would be that
> the same capsule could be used for all types of FFI, and we could do the
> same for other functions we could need later on (maybe in other
> packages). Adding a new capsule seems less risky for people like me, who
> aren't as familiar with mercurial's C code as you are or, if you prefer
> to see it that way, will require less review.
> 
> To be more explicit for other mailing-list subscribers, here's the
> change in revlog.c I'm doing in my PoC CPython code (this is inside the
> module init function) :
> 
> @@ -2846,6 +2846,12 @@
>         if (nullentry)
>                 PyObject_GC_UnTrack(nullentry);
>  
> +       void *caps = PyCapsule_New(
> +           index_get_parents,
> "mercurial.cext.parsers.index_get_parents_CAPI",
> +           NULL);
> +       if (caps != NULL)
> +               PyModule_AddObject(mod, "index_get_parents_CAPI", caps);
> +
>  #ifdef WITH_RUST
>         rustlazyancestorsType.tp_new = PyType_GenericNew;
>         if (PyType_Ready(&rustlazyancestorsType) < 0)
> 
> So, to summarize, I think we should maybe promote capsules as the
> preferred way to interface Rust code with inner C code. It wouldn't be
> hard to document either. What do you think ?

My only concern about using PyCapsule is, we would have to get around some
"static mut" variables in Rust if the cost of the name resolution matters.
That's what I noticed while reading your rust/hg-cpython code.

To be clear, I'm not against using PyCapsule. It's documented as the "right"
way to interface C-level APIs across modules. I wrote this patch before
reading your upcoming series. That's the only reason I made the GetParents()
function public.
Georges Racinet - Dec. 12, 2018, 11:17 a.m.
On 12/11/18 2:10 PM, Yuya Nishihara wrote:
> On Mon, 10 Dec 2018 18:54:56 +0100, Georges Racinet wrote:
>> On 12/9/18 6:02 AM, Yuya Nishihara wrote:
>>> On Wed, 05 Dec 2018 22:43:35 +0900, Yuya Nishihara wrote:
>>>> # HG changeset patch
>>>> # User Yuya Nishihara <yuya@tcha.org>
>>>> # Date 1543756237 -32400
>>>> #      Sun Dec 02 22:10:37 2018 +0900
>>>> # Node ID 716a73bab79be30c20c75e13324c44205d5e2120
>>>> # Parent  3842abba948cd7f4bb3fad6805265a35fb94a083
>>>> revlog: add public CPython function to get parent revisions
>>>> +/*
>>>> + * Get parents of the given rev.
>>>> + *
>>>> + * If the specified rev is out of range, IndexError will be raised. If the
>>>> + * revlog entry is corrupted, ValueError may be raised.
>>>> + *
>>>> + * Returns 0 on success or -1 on failure.
>>>> + */
>>>> +int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps)
>>> This is based on the idea that the Rust module will be statically linked with
>>> the cext objects. I thought that would be easier for our use case, package-local
>>> visibility, but I'm not certain. If that makes things complicated, maybe we
>>> should choose dynamic linking and wrap the function with PyCapsule, as you did
>>> in the PoC code.
>> Yes, it's true in the direct-ffi code, I passed the function pointer
>> around because I was wary of a loop in dependencies (that's a bit silly
>> since we can't avoid linking the Rust extension within the parsers
>> module anyway), but also I didn't want to touch the existing C code too
>> much for my first patch set. This version you're proposing feels simpler
>> to me.
>>
>> Using a capsule in that context wouldn't be much complicated either, all
>> we'd need in hg-direct-ffi would be to declare and call PyCapsule_Import
>> (it's obviously designed to use very few CPython API concepts, only
>> needs a char * for the name). The advantage it'd have then would be that
>> the same capsule could be used for all types of FFI, and we could do the
>> same for other functions we could need later on (maybe in other
>> packages). Adding a new capsule seems less risky for people like me, who
>> aren't as familiar with mercurial's C code as you are or, if you prefer
>> to see it that way, will require less review.
>>
>> To be more explicit for other mailing-list subscribers, here's the
>> change in revlog.c I'm doing in my PoC CPython code (this is inside the
>> module init function) :
>>
>> @@ -2846,6 +2846,12 @@
>>         if (nullentry)
>>                 PyObject_GC_UnTrack(nullentry);
>>  
>> +       void *caps = PyCapsule_New(
>> +           index_get_parents,
>> "mercurial.cext.parsers.index_get_parents_CAPI",
>> +           NULL);
>> +       if (caps != NULL)
>> +               PyModule_AddObject(mod, "index_get_parents_CAPI", caps);
>> +
>>  #ifdef WITH_RUST
>>         rustlazyancestorsType.tp_new = PyType_GenericNew;
>>         if (PyType_Ready(&rustlazyancestorsType) < 0)
>>
>> So, to summarize, I think we should maybe promote capsules as the
>> preferred way to interface Rust code with inner C code. It wouldn't be
>> hard to document either. What do you think ?
> My only concern about using PyCapsule is, we would have to get around some
> "static mut" variables in Rust if the cost of the name resolution matters.
> That's what I noticed while reading your rust/hg-cpython code.
I understand that concern. Would a protection with std::sync::Once be
enough to address it? The official Rust doc
(https://doc.rust-lang.org/std/sync/struct.Once.html) claims that the
static mut is then actually safe.
>
> To be clear, I'm not against using PyCapsule. It's documented as the "right"
> way to interface C-level APIs across modules. I wrote this patch before
> reading your upcoming series. That's the only reason I made the GetParents()
> function public.

About that upcoming series, I should be able to submit it real soon now:
I don't consider https://github.com/dgrunwald/rust-cpython/issues/164 to
be a blocker anymore.

Regards,
Yuya Nishihara - Dec. 12, 2018, 12:15 p.m.
On Wed, 12 Dec 2018 12:17:15 +0100, Georges Racinet wrote:
> On 12/11/18 2:10 PM, Yuya Nishihara wrote:
> > On Mon, 10 Dec 2018 18:54:56 +0100, Georges Racinet wrote:
> >> On 12/9/18 6:02 AM, Yuya Nishihara wrote:
> >>> On Wed, 05 Dec 2018 22:43:35 +0900, Yuya Nishihara wrote:
> >>>> # HG changeset patch
> >>>> # User Yuya Nishihara <yuya@tcha.org>
> >>>> # Date 1543756237 -32400
> >>>> #      Sun Dec 02 22:10:37 2018 +0900
> >>>> # Node ID 716a73bab79be30c20c75e13324c44205d5e2120
> >>>> # Parent  3842abba948cd7f4bb3fad6805265a35fb94a083
> >>>> revlog: add public CPython function to get parent revisions
> >>>> +/*
> >>>> + * Get parents of the given rev.
> >>>> + *
> >>>> + * If the specified rev is out of range, IndexError will be raised. If the
> >>>> + * revlog entry is corrupted, ValueError may be raised.
> >>>> + *
> >>>> + * Returns 0 on success or -1 on failure.
> >>>> + */
> >>>> +int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps)
> >>> This is based on the idea that the Rust module will be statically linked with
> >>> the cext objects. I thought that would be easier for our use case, package-local
> >>> visibility, but I'm not certain. If that makes things complicated, maybe we
> >>> should choose dynamic linking and wrap the function with PyCapsule, as you did
> >>> in the PoC code.
> >> Yes, it's true in the direct-ffi code, I passed the function pointer
> >> around because I was wary of a loop in dependencies (that's a bit silly
> >> since we can't avoid linking the Rust extension within the parsers
> >> module anyway), but also I didn't want to touch the existing C code too
> >> much for my first patch set. This version you're proposing feels simpler
> >> to me.
> >>
> >> Using a capsule in that context wouldn't be much complicated either, all
> >> we'd need in hg-direct-ffi would be to declare and call PyCapsule_Import
> >> (it's obviously designed to use very few CPython API concepts, only
> >> needs a char * for the name). The advantage it'd have then would be that
> >> the same capsule could be used for all types of FFI, and we could do the
> >> same for other functions we could need later on (maybe in other
> >> packages). Adding a new capsule seems less risky for people like me, who
> >> aren't as familiar with mercurial's C code as you are or, if you prefer
> >> to see it that way, will require less review.
> >>
> >> To be more explicit for other mailing-list subscribers, here's the
> >> change in revlog.c I'm doing in my PoC CPython code (this is inside the
> >> module init function) :
> >>
> >> @@ -2846,6 +2846,12 @@
> >>         if (nullentry)
> >>                 PyObject_GC_UnTrack(nullentry);
> >>  
> >> +       void *caps = PyCapsule_New(
> >> +           index_get_parents,
> >> "mercurial.cext.parsers.index_get_parents_CAPI",
> >> +           NULL);
> >> +       if (caps != NULL)
> >> +               PyModule_AddObject(mod, "index_get_parents_CAPI", caps);
> >> +
> >>  #ifdef WITH_RUST
> >>         rustlazyancestorsType.tp_new = PyType_GenericNew;
> >>         if (PyType_Ready(&rustlazyancestorsType) < 0)
> >>
> >> So, to summarize, I think we should maybe promote capsules as the
> >> preferred way to interface Rust code with inner C code. It wouldn't be
> >> hard to document either. What do you think ?
> > My only concern about using PyCapsule is, we would have to get around some
> > "static mut" variables in Rust if the cost of the name resolution matters.
> > That's what I noticed while reading your rust/hg-cpython code.
> I understand that concern. Would a protection with std::sync::Once be
> enough to address it? The official Rust doc
> (https://doc.rust-lang.org/std/sync/struct.Once.html) claims that the
> static mut is then actually safe.

Or simply remove the static mut? Since the function pointer is held by the
Index struct, we only need to resolve it twice per rustlazyancestors creation.
I guess the cost of PyCapsule_Import() would be negligible.

FWIW, I think CString allocation in decapsule_parents_fn() can be replaced
with CStr::from_bytes_with_nul_unchecked().

Patch

diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c
+++ b/mercurial/cext/revlog.c
@@ -194,6 +194,33 @@  static inline int index_get_parents(inde
 	return 0;
 }
 
+/*
+ * Get parents of the given rev.
+ *
+ * If the specified rev is out of range, IndexError will be raised. If the
+ * revlog entry is corrupted, ValueError may be raised.
+ *
+ * Returns 0 on success or -1 on failure.
+ */
+int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps)
+{
+	int tiprev;
+	if (!op || !HgRevlogIndex_Check(op) || !ps) {
+		PyErr_BadInternalCall();
+		return -1;
+	}
+	tiprev = (int)index_length((indexObject *)op) - 1;
+	if (rev < -1 || rev > tiprev) {
+		PyErr_Format(PyExc_IndexError, "rev out of range: %d", rev);
+		return -1;
+	} else if (rev == -1) {
+		ps[0] = ps[1] = -1;
+		return 0;
+	} else {
+		return index_get_parents((indexObject *)op, rev, ps, tiprev);
+	}
+}
+
 static inline int64_t index_get_start(indexObject *self, Py_ssize_t rev)
 {
 	uint64_t offset;
diff --git a/mercurial/cext/revlog.h b/mercurial/cext/revlog.h
--- a/mercurial/cext/revlog.h
+++ b/mercurial/cext/revlog.h
@@ -12,4 +12,8 @@ 
 
 extern PyTypeObject HgRevlogIndex_Type;
 
+#define HgRevlogIndex_Check(op) PyObject_TypeCheck(op, &HgRevlogIndex_Type)
+
+int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps);
+
 #endif /* _HG_REVLOG_H_ */