Patchwork [10,of,10] rust-cpython: add safe way to map PyLeakedRef<&T> to PyLeakedRef<U>

login
register
mail settings
Submitter Yuya Nishihara
Date Sept. 22, 2019, 6:41 a.m.
Message ID <0d0d926ccd7f34a21250.1569134507@mimosa>
Download mbox | patch
Permalink /patch/41729/
State Accepted
Headers show

Comments

Yuya Nishihara - Sept. 22, 2019, 6:41 a.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1568553550 -32400
#      Sun Sep 15 22:19:10 2019 +0900
# Node ID 0d0d926ccd7f34a2125071d6e7e0f86e04026ba6
# Parent  77d76b966a64f83fb7be893f7fc4090f24d7e83c
rust-cpython: add safe way to map PyLeakedRef<&T> to PyLeakedRef<U>

And leverage it. Now leak_immutable() can be safely implemented as the
PyLeakedRef owns its inner data.
Yuya Nishihara - Oct. 8, 2019, 5:16 p.m.
On Sun, 22 Sep 2019 15:41:47 +0900, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1568553550 -32400
> #      Sun Sep 15 22:19:10 2019 +0900
> # Node ID 0d0d926ccd7f34a2125071d6e7e0f86e04026ba6
> # Parent  77d76b966a64f83fb7be893f7fc4090f24d7e83c
> rust-cpython: add safe way to map PyLeakedRef<&T> to PyLeakedRef<U>

> +    /// Converts the inner value by the given function.
> +    ///
> +    /// Typically `T` is a static reference to a container, and `U` is an
> +    /// iterator of that container.
> +    pub fn map<U>(
> +        mut self,
> +        py: Python,
> +        f: impl FnOnce(T) -> U,
> +    ) -> PyLeakedRef<U> {
> +        PyLeakedRef {
> +            inner: self.inner.clone_ref(py),
> +            data: Some(f(self.data.take().unwrap())),
> +            py_shared_state: self.py_shared_state,
> +        }
> +    }

Actually this can be unsafe since the lambda function 'f' could move the
argument "&'static X" out of the PyLeakedRef scope.

  leaked.map(py, |o| {
     out_of_scope_variable = o;
  });

Maybe I need to find a safe way to apply arbitrary function on static ref
without loosing the static-ness.

I think the patches 01-08 aren't suffered from this issue.
Raphaël Gomès - Oct. 10, 2019, 11:53 a.m.
Is it not possible to just take a `fn` type? It forces a non-capturing fn.

On 10/8/19 7:16 PM, Yuya Nishihara wrote:
> +    pub fn map<U>(
> +        mut self,
> +        py: Python,
> +        f: impl FnOnce(T) -> U,
> +    ) -> PyLeakedRef<U> {
> +        PyLeakedRef {
> +            inner: self.inner.clone_ref(py),
> +            data: Some(f(self.data.take().unwrap())),
> +            py_shared_state: self.py_shared_state,
> +        }
> +    }
Yuya Nishihara - Oct. 10, 2019, 2:34 p.m.
On Thu, 10 Oct 2019 13:53:54 +0200, Raphaël Gomès wrote:
> Is it not possible to just take a `fn` type? It forces a non-capturing fn.

Interesting idea, but still fn() would allow mutating a static variable.
IIRC, mutation of static variable can be safely written by using lazy_static
and Mutex.

> On 10/8/19 7:16 PM, Yuya Nishihara wrote:
> > +    pub fn map<U>(
> > +        mut self,
> > +        py: Python,
> > +        f: impl FnOnce(T) -> U,
> > +    ) -> PyLeakedRef<U> {
> > +        PyLeakedRef {
> > +            inner: self.inner.clone_ref(py),
> > +            data: Some(f(self.data.take().unwrap())),
> > +            py_shared_state: self.py_shared_state,
> > +        }
> > +    }

I'm feeling that it's good to leave leaked_ref.map() unsafe. The new API
should be less error-prone, and its safety requirement, "don't leak the
static reference", can be easily guaranteed.
Raphaël Gomès - Oct. 10, 2019, 7:21 p.m.
Indeed, I don't think there is a way to prevent this. We might get 
additional feedback when upstreaming to rust-cpython though.

Rest of the series looks good to me. This is all a needed improvement.

On 10/10/19 4:34 PM, Yuya Nishihara wrote:
> On Thu, 10 Oct 2019 13:53:54 +0200, Raphaël Gomès wrote:
>> Is it not possible to just take a `fn` type? It forces a non-capturing fn.
> Interesting idea, but still fn() would allow mutating a static variable.
> IIRC, mutation of static variable can be safely written by using lazy_static
> and Mutex.
>
>> On 10/8/19 7:16 PM, Yuya Nishihara wrote:
>>> +    pub fn map<U>(
>>> +        mut self,
>>> +        py: Python,
>>> +        f: impl FnOnce(T) -> U,
>>> +    ) -> PyLeakedRef<U> {
>>> +        PyLeakedRef {
>>> +            inner: self.inner.clone_ref(py),
>>> +            data: Some(f(self.data.take().unwrap())),
>>> +            py_shared_state: self.py_shared_state,
>>> +        }
>>> +    }
> I'm feeling that it's good to leave leaked_ref.map() unsafe. The new API
> should be less error-prone, and its safety requirement, "don't leak the
> static reference", can be easily guaranteed.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Raphaël Gomès - Oct. 11, 2019, 8:27 a.m.
Also, rust-cpython will need tests to upstream. I also think that it's a 
good idea to have tests when dealing with unsafe boundaries.

On 10/10/19 9:21 PM, Raphaël Gomès wrote:
> Indeed, I don't think there is a way to prevent this. We might get 
> additional feedback when upstreaming to rust-cpython though.
>
> Rest of the series looks good to me. This is all a needed improvement.
>
> On 10/10/19 4:34 PM, Yuya Nishihara wrote:
>> On Thu, 10 Oct 2019 13:53:54 +0200, Raphaël Gomès wrote:
>>> Is it not possible to just take a `fn` type? It forces a 
>>> non-capturing fn.
>> Interesting idea, but still fn() would allow mutating a static variable.
>> IIRC, mutation of static variable can be safely written by using 
>> lazy_static
>> and Mutex.
>>
>>> On 10/8/19 7:16 PM, Yuya Nishihara wrote:
>>>> +    pub fn map<U>(
>>>> +        mut self,
>>>> +        py: Python,
>>>> +        f: impl FnOnce(T) -> U,
>>>> +    ) -> PyLeakedRef<U> {
>>>> +        PyLeakedRef {
>>>> +            inner: self.inner.clone_ref(py),
>>>> +            data: Some(f(self.data.take().unwrap())),
>>>> +            py_shared_state: self.py_shared_state,
>>>> +        }
>>>> +    }
>> I'm feeling that it's good to leave leaked_ref.map() unsafe. The new API
>> should be less error-prone, and its safety requirement, "don't leak the
>> static reference", can be easily guaranteed.
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/rust/hg-cpython/src/dirstate/copymap.rs b/rust/hg-cpython/src/dirstate/copymap.rs
--- a/rust/hg-cpython/src/dirstate/copymap.rs
+++ b/rust/hg-cpython/src/dirstate/copymap.rs
@@ -13,7 +13,6 @@  use std::cell::RefCell;
 
 use crate::dirstate::dirstate_map::DirstateMap;
 use crate::ref_sharing::PyLeakedRef;
-use hg::DirstateMap as RustDirstateMap;
 use hg::{utils::hg_path::HgPathBuf, CopyMapIter};
 
 py_class!(pub class CopyMap |py| {
@@ -105,16 +104,14 @@  impl CopyMap {
 
 py_shared_iterator!(
     CopyMapKeysIterator,
-    PyLeakedRef<&'static RustDirstateMap>,
-    CopyMapIter<'static>,
+    PyLeakedRef<CopyMapIter<'static>>,
     CopyMap::translate_key,
     Option<PyBytes>
 );
 
 py_shared_iterator!(
     CopyMapItemsIterator,
-    PyLeakedRef<&'static RustDirstateMap>,
-    CopyMapIter<'static>,
+    PyLeakedRef<CopyMapIter<'static>>,
     CopyMap::translate_key_value,
     Option<(PyBytes, PyBytes)>
 );
diff --git a/rust/hg-cpython/src/dirstate/dirs_multiset.rs b/rust/hg-cpython/src/dirstate/dirs_multiset.rs
--- a/rust/hg-cpython/src/dirstate/dirs_multiset.rs
+++ b/rust/hg-cpython/src/dirstate/dirs_multiset.rs
@@ -92,13 +92,10 @@  py_class!(pub class Dirs |py| {
             })
     }
     def __iter__(&self) -> PyResult<DirsMultisetKeysIterator> {
-        let mut leak_handle =
-            unsafe { self.inner_shared(py).leak_immutable()? };
-        let leaked_ref = leak_handle.data.take().unwrap();
+        let leaked_ref = self.inner_shared(py).leak_immutable()?;
         DirsMultisetKeysIterator::from_inner(
             py,
-            leak_handle,
-            leaked_ref.iter(),
+            leaked_ref.map(py, |o| o.iter()),
         )
     }
 
@@ -126,8 +123,7 @@  impl Dirs {
 
 py_shared_iterator!(
     DirsMultisetKeysIterator,
-    PyLeakedRef<&'static DirsMultiset>,
-    DirsMultisetIter<'static>,
+    PyLeakedRef<DirsMultisetIter<'static>>,
     Dirs::translate_key,
     Option<PyBytes>
 );
diff --git a/rust/hg-cpython/src/dirstate/dirstate_map.rs b/rust/hg-cpython/src/dirstate/dirstate_map.rs
--- a/rust/hg-cpython/src/dirstate/dirstate_map.rs
+++ b/rust/hg-cpython/src/dirstate/dirstate_map.rs
@@ -321,35 +321,26 @@  py_class!(pub class DirstateMap |py| {
     }
 
     def keys(&self) -> PyResult<DirstateMapKeysIterator> {
-        let mut leak_handle =
-            unsafe { self.inner_shared(py).leak_immutable()? };
-        let leaked_ref = leak_handle.data.take().unwrap();
+        let leaked_ref = self.inner_shared(py).leak_immutable()?;
         DirstateMapKeysIterator::from_inner(
             py,
-            leak_handle,
-            leaked_ref.iter(),
+            leaked_ref.map(py, |o| o.iter()),
         )
     }
 
     def items(&self) -> PyResult<DirstateMapItemsIterator> {
-        let mut leak_handle =
-            unsafe { self.inner_shared(py).leak_immutable()? };
-        let leaked_ref = leak_handle.data.take().unwrap();
+        let leaked_ref = self.inner_shared(py).leak_immutable()?;
         DirstateMapItemsIterator::from_inner(
             py,
-            leak_handle,
-            leaked_ref.iter(),
+            leaked_ref.map(py, |o| o.iter()),
         )
     }
 
     def __iter__(&self) -> PyResult<DirstateMapKeysIterator> {
-        let mut leak_handle =
-            unsafe { self.inner_shared(py).leak_immutable()? };
-        let leaked_ref = leak_handle.data.take().unwrap();
+        let leaked_ref = self.inner_shared(py).leak_immutable()?;
         DirstateMapKeysIterator::from_inner(
             py,
-            leak_handle,
-            leaked_ref.iter(),
+            leaked_ref.map(py, |o| o.iter()),
         )
     }
 
@@ -463,24 +454,18 @@  py_class!(pub class DirstateMap |py| {
     }
 
     def copymapiter(&self) -> PyResult<CopyMapKeysIterator> {
-        let mut leak_handle =
-            unsafe { self.inner_shared(py).leak_immutable()? };
-        let leaked_ref = leak_handle.data.take().unwrap();
+        let leaked_ref = self.inner_shared(py).leak_immutable()?;
         CopyMapKeysIterator::from_inner(
             py,
-            leak_handle,
-            leaked_ref.copy_map.iter(),
+            leaked_ref.map(py, |o| o.copy_map.iter()),
         )
     }
 
     def copymapitemsiter(&self) -> PyResult<CopyMapItemsIterator> {
-        let mut leak_handle =
-            unsafe { self.inner_shared(py).leak_immutable()? };
-        let leaked_ref = leak_handle.data.take().unwrap();
+        let leaked_ref = self.inner_shared(py).leak_immutable()?;
         CopyMapItemsIterator::from_inner(
             py,
-            leak_handle,
-            leaked_ref.copy_map.iter(),
+            leaked_ref.map(py, |o| o.copy_map.iter()),
         )
     }
 
@@ -518,16 +503,14 @@  py_shared_ref!(DirstateMap, RustDirstate
 
 py_shared_iterator!(
     DirstateMapKeysIterator,
-    PyLeakedRef<&'static RustDirstateMap>,
-    StateMapIter<'static>,
+    PyLeakedRef<StateMapIter<'static>>,
     DirstateMap::translate_key,
     Option<PyBytes>
 );
 
 py_shared_iterator!(
     DirstateMapItemsIterator,
-    PyLeakedRef<&'static RustDirstateMap>,
-    StateMapIter<'static>,
+    PyLeakedRef<StateMapIter<'static>>,
     DirstateMap::translate_key_value,
     Option<(PyBytes, PyObject)>
 );
diff --git a/rust/hg-cpython/src/ref_sharing.rs b/rust/hg-cpython/src/ref_sharing.rs
--- a/rust/hg-cpython/src/ref_sharing.rs
+++ b/rust/hg-cpython/src/ref_sharing.rs
@@ -173,23 +173,18 @@  impl<'a, T> PySharedRef<'a, T> {
     }
 
     /// Returns a leaked reference.
-    ///
-    /// # Safety
-    ///
-    /// It's up to you to make sure that the management object lives
-    /// longer than the leaked reference. Otherwise, you'll get a
-    /// dangling reference.
-    pub unsafe fn leak_immutable(&self) -> PyResult<PyLeakedRef<&'static T>> {
-        let (static_ref, static_state_ref) = self
-            .data
-            .py_shared_state
-            .leak_immutable(self.py, self.data)?;
-        Ok(PyLeakedRef::new(
-            self.py,
-            self.owner,
-            static_ref,
-            static_state_ref,
-        ))
+    pub fn leak_immutable(&self) -> PyResult<PyLeakedRef<&'static T>> {
+        let state = &self.data.py_shared_state;
+        unsafe {
+            let (static_ref, static_state_ref) =
+                state.leak_immutable(self.py, self.data)?;
+            Ok(PyLeakedRef::new(
+                self.py,
+                self.owner,
+                static_ref,
+                static_state_ref,
+            ))
+        }
     }
 }
 
@@ -301,15 +296,15 @@  macro_rules! py_shared_ref {
 }
 
 /// Manage immutable references to `PyObject` leaked into Python iterators.
-///
-/// In truth, this does not represent leaked references themselves;
-/// it is instead useful alongside them to manage them.
 pub struct PyLeakedRef<T> {
-    _inner: PyObject,
-    pub data: Option<T>, // TODO: remove pub
+    inner: PyObject,
+    data: Option<T>,
     py_shared_state: &'static PySharedState,
 }
 
+// DO NOT implement Deref for PyLeakedRef<T>! Dereferencing PyLeakedRef
+// without taking Python GIL wouldn't be safe.
+
 impl<T> PyLeakedRef<T> {
     /// # Safety
     ///
@@ -323,11 +318,40 @@  impl<T> PyLeakedRef<T> {
         py_shared_state: &'static PySharedState,
     ) -> Self {
         Self {
-            _inner: inner.clone_ref(py),
+            inner: inner.clone_ref(py),
             data: Some(data),
             py_shared_state,
         }
     }
+
+    /// Returns an immutable reference to the inner value.
+    pub fn get_ref<'a>(&'a self, _py: Python<'a>) -> &'a T {
+        self.data.as_ref().unwrap()
+    }
+
+    /// Returns a mutable reference to the inner value.
+    ///
+    /// Typically `T` is an iterator. If `T` is an immutable reference,
+    /// `get_mut()` is useless since the inner value can't be mutated.
+    pub fn get_mut<'a>(&'a mut self, _py: Python<'a>) -> &'a mut T {
+        self.data.as_mut().unwrap()
+    }
+
+    /// Converts the inner value by the given function.
+    ///
+    /// Typically `T` is a static reference to a container, and `U` is an
+    /// iterator of that container.
+    pub fn map<U>(
+        mut self,
+        py: Python,
+        f: impl FnOnce(T) -> U,
+    ) -> PyLeakedRef<U> {
+        PyLeakedRef {
+            inner: self.inner.clone_ref(py),
+            data: Some(f(self.data.take().unwrap())),
+            py_shared_state: self.py_shared_state,
+        }
+    }
 }
 
 impl<T> Drop for PyLeakedRef<T> {
@@ -337,6 +361,9 @@  impl<T> Drop for PyLeakedRef<T> {
         // sure that the state is only accessed by this thread.
         let gil = Python::acquire_gil();
         let py = gil.python();
+        if self.data.is_none() {
+            return; // moved to another PyLeakedRef
+        }
         unsafe {
             self.py_shared_state.decrease_leak_count(py, false);
         }
@@ -368,13 +395,10 @@  impl<T> Drop for PyLeakedRef<T> {
 ///     data inner: PySharedRefCell<MyStruct>;
 ///
 ///     def __iter__(&self) -> PyResult<MyTypeItemsIterator> {
-///         let mut leak_handle =
-///             unsafe { self.inner_shared(py).leak_immutable()? };
-///         let leaked_ref = leak_handle.data.take().unwrap();
+///         let leaked_ref = self.inner_shared(py).leak_immutable()?;
 ///         MyTypeItemsIterator::from_inner(
 ///             py,
-///             leak_handle,
-///             leaked_ref.iter(),
+///             leaked_ref.map(py, |o| o.iter()),
 ///         )
 ///     }
 /// });
@@ -396,8 +420,7 @@  impl<T> Drop for PyLeakedRef<T> {
 ///
 /// py_shared_iterator!(
 ///     MyTypeItemsIterator,
-///     PyLeakedRef<&'static MyStruct>,
-///     HashMap<'static, Vec<u8>, Vec<u8>>,
+///     PyLeakedRef<HashMap<'static, Vec<u8>, Vec<u8>>>,
 ///     MyType::translate_key_value,
 ///     Option<(PyBytes, PyBytes)>
 /// );
@@ -406,18 +429,16 @@  macro_rules! py_shared_iterator {
     (
         $name: ident,
         $leaked: ty,
-        $iterator_type: ty,
         $success_func: expr,
         $success_type: ty
     ) => {
         py_class!(pub class $name |py| {
             data inner: RefCell<Option<$leaked>>;
-            data it: RefCell<$iterator_type>;
 
             def __next__(&self) -> PyResult<$success_type> {
                 let mut inner_opt = self.inner(py).borrow_mut();
-                if inner_opt.is_some() {
-                    match self.it(py).borrow_mut().next() {
+                if let Some(leaked) = inner_opt.as_mut() {
+                    match leaked.get_mut(py).next() {
                         None => {
                             // replace Some(inner) by None, drop $leaked
                             inner_opt.take();
@@ -441,12 +462,10 @@  macro_rules! py_shared_iterator {
             pub fn from_inner(
                 py: Python,
                 leaked: $leaked,
-                it: $iterator_type
             ) -> PyResult<Self> {
                 Self::create_instance(
                     py,
                     RefCell::new(Some(leaked)),
-                    RefCell::new(it)
                 )
             }
         }