Patchwork [1,of,8] bdiff: add _version to help detect breaking binary changes

login
register
mail settings
Submitter Jun Wu
Date May 3, 2017, 12:19 a.m.
Message ID <9073b1dfb58eb71c1005.1493770798@x1c>
Download mbox | patch
Permalink /patch/20374/
State Deferred
Headers show

Comments

Jun Wu - May 3, 2017, 12:19 a.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1493166881 25200
#      Tue Apr 25 17:34:41 2017 -0700
# Node ID 9073b1dfb58eb71c1005dad5267b6ee4f09790da
# Parent  6cacc271ee0a9385be483314dc73be176a13c891
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 9073b1dfb58e
bdiff: add _version to help detect breaking binary changes

Previously, we have no way to detect if a compiled .so file could be used or
not, and blindly load it if it exists. Usually we carefully maintain
compatibility of .so and fallback to pure code gracefully. But if we stick
to the rules, certain nice changes will be impossible to make in a clean
way.

This patch adds a "_version" variable to the module so we can detect
inconsistency and take appropriate actions (warn, abort, fallback to pure,
run make automatically) in the module loader.
Gregory Szorc - May 3, 2017, 12:41 a.m.
On Tue, May 2, 2017 at 5:19 PM, Jun Wu <quark@fb.com> wrote:

> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1493166881 25200
> #      Tue Apr 25 17:34:41 2017 -0700
> # Node ID 9073b1dfb58eb71c1005dad5267b6ee4f09790da
> # Parent  6cacc271ee0a9385be483314dc73be176a13c891
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> 9073b1dfb58e
> bdiff: add _version to help detect breaking binary changes
>
> Previously, we have no way to detect if a compiled .so file could be used
> or
> not, and blindly load it if it exists. Usually we carefully maintain
> compatibility of .so and fallback to pure code gracefully. But if we stick
> to the rules, certain nice changes will be impossible to make in a clean
> way.
>
> This patch adds a "_version" variable to the module so we can detect
> inconsistency and take appropriate actions (warn, abort, fallback to pure,
> run make automatically) in the module loader.
>

I like the spirit of this series.

Have you given any consideration to:

* Sharing a version number across all modules? That will make the code
easier at the expense of recompiling all C modules when not strictly
necessary. Since the base rate of changing C modules is low and everything
but zstd compiles almost instantaneously, one can make an argument that
simplicity is sufficient.

* A shared function or macro for calling PyModule_AddIntConstant(). It
feels like a bit of redundant code across all the modules.


>
> diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
> --- a/mercurial/bdiff_module.c
> +++ b/mercurial/bdiff_module.c
> @@ -193,4 +193,8 @@ static PyMethodDef methods[] = {
>  };
>
> +/* Binary version. When breaking changes are made, bump this number and
> change
> + * __init__.py, so the module loader can notice and fallback to pure. */
> +static const int _version = 1;
> +
>  #ifdef IS_PY3K
>  static struct PyModuleDef bdiff_module = {
> @@ -204,10 +208,15 @@ static struct PyModuleDef bdiff_module =
>  PyMODINIT_FUNC PyInit_bdiff(void)
>  {
> -       return PyModule_Create(&bdiff_module);
> +       PyObject *m;
> +       m = PyModule_Create(&bdiff_module);
> +       PyModule_AddIntConstant(m, "_version", _version);
> +       return m;
>  }
>  #else
>  PyMODINIT_FUNC initbdiff(void)
>  {
> -       Py_InitModule3("bdiff", methods, mdiff_doc);
> +       PyObject *m;
> +       m = Py_InitModule3("bdiff", methods, mdiff_doc);
> +       PyModule_AddIntConstant(m, "_version", _version);
>  }
>  #endif
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Jun Wu - May 3, 2017, 12:56 a.m.
Excerpts from Gregory Szorc's message of 2017-05-02 17:41:21 -0700:
> I like the spirit of this series.
> 
> Have you given any consideration to:
> 
> * Sharing a version number across all modules? That will make the code
> easier at the expense of recompiling all C modules when not strictly
> necessary. Since the base rate of changing C modules is low and everything
> but zstd compiles almost instantaneously, one can make an argument that
> simplicity is sufficient.

While I think that also works, I think the benefit (less repetitive code) is
not that significant given the fact that I think CPython C code is already
cumbersome (look at those PyInits), so I won't optimize code length for them
as much as I do for non-Python C code.

I also slightly prefer the explicit way of saying which modules we are
checking and their expected versions in Patch 8.

> * A shared function or macro for calling PyModule_AddIntConstant(). It
> feels like a bit of redundant code across all the modules.

Suppose we have a .h file for that feature, then .c file still need roughly
same number of lines of change:

  +#include "...."
  +PyObject *m;
  +m = PyModule_Create(&bdiff_module);  // Python 2
  +define_version(m)                    // Python 2
  +define_version(m)                    // Python 3

Maybe we can unified the Py2 vs Py3 module definition. But that seems to be
another series which should not block this one.
Yuya Nishihara - May 3, 2017, 1:18 a.m.
On Tue, 2 May 2017 17:19:58 -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1493166881 25200
> #      Tue Apr 25 17:34:41 2017 -0700
> # Node ID 9073b1dfb58eb71c1005dad5267b6ee4f09790da
> # Parent  6cacc271ee0a9385be483314dc73be176a13c891
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 9073b1dfb58e
> bdiff: add _version to help detect breaking binary changes

For reference, my series:
https://bitbucket.org/yuja/hg-work/commits/cbfd47adf9d6bbdbab7ed61338b2dcf5877b90f2
https://bitbucket.org/yuja/hg-work/commits/c84185741bf4063e296c82b9c3571a87c9f61eed
https://bitbucket.org/yuja/hg-work/commits/54531fe0b1f9950db5b496b21cf06badb25b56eb

I have no strong opinion that we should stick to the current per-function
versioning, but I believe some bits in this series can be reused so we won't
mess up our importer hack anymore.

> --- a/mercurial/bdiff_module.c
> +++ b/mercurial/bdiff_module.c
> @@ -193,4 +193,8 @@ static PyMethodDef methods[] = {
>  };
>  
> +/* Binary version. When breaking changes are made, bump this number and change
> + * __init__.py, so the module loader can notice and fallback to pure. */
> +static const int _version = 1;

Nit: '_' prefix is reserved for compiler authors.
Jun Wu - May 3, 2017, 1:33 a.m.
Excerpts from Yuya Nishihara's message of 2017-05-03 10:18:46 +0900:
> On Tue, 2 May 2017 17:19:58 -0700, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1493166881 25200
> > #      Tue Apr 25 17:34:41 2017 -0700
> > # Node ID 9073b1dfb58eb71c1005dad5267b6ee4f09790da
> > # Parent  6cacc271ee0a9385be483314dc73be176a13c891
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 9073b1dfb58e
> > bdiff: add _version to help detect breaking binary changes
> 
> For reference, my series:
> https://bitbucket.org/yuja/hg-work/commits/cbfd47adf9d6bbdbab7ed61338b2dcf5877b90f2 
> https://bitbucket.org/yuja/hg-work/commits/c84185741bf4063e296c82b9c3571a87c9f61eed 
> https://bitbucket.org/yuja/hg-work/commits/54531fe0b1f9950db5b496b21cf06badb25b56eb 
> 
> I have no strong opinion that we should stick to the current per-function
> versioning, but I believe some bits in this series can be reused so we won't
> mess up our importer hack anymore.

I like the idea moving C extensions to "cext". We should probably do that.

The problem we faced for issue4042 fix is to maintain compatibility in both
directions without copy-paste.

The "_duralmodule" approach is while smart, does not really solve the above
problem. i.e. copy-paste is still needed.

I believe avoiding copy-paste (or dead-code) is much more important than
avoiding re-compiling for mercurial developers. That's the motivation of
this series.

> > --- a/mercurial/bdiff_module.c
> > +++ b/mercurial/bdiff_module.c
> > @@ -193,4 +193,8 @@ static PyMethodDef methods[] = {
> >  };
> >  
> > +/* Binary version. When breaking changes are made, bump this number and change
> > + * __init__.py, so the module loader can notice and fallback to pure. */
> > +static const int _version = 1;
> 
> Nit: '_' prefix is reserved for compiler authors.
Yuya Nishihara - May 3, 2017, 2:02 a.m.
On Tue, 2 May 2017 18:33:50 -0700, Jun Wu wrote:
> The problem we faced for issue4042 fix is to maintain compatibility in both
> directions without copy-paste.
> 
> The "_duralmodule" approach is while smart, does not really solve the above
> problem. i.e. copy-paste is still needed.

No. We can just rename fix_newline() to fixnewline2(). New code has
pure.diffhelpers.fixnewline2() as a fallback, and old one does
pure.diffhelpers.fix_newline(). That's how we solved API changes historically.
Jun Wu - May 3, 2017, 2:05 a.m.
Excerpts from Yuya Nishihara's message of 2017-05-03 11:02:18 +0900:
> On Tue, 2 May 2017 18:33:50 -0700, Jun Wu wrote:
> > The problem we faced for issue4042 fix is to maintain compatibility in both
> > directions without copy-paste.
> > 
> > The "_duralmodule" approach is while smart, does not really solve the above
> > problem. i.e. copy-paste is still needed.
> 
> No. We can just rename fix_newline() to fixnewline2(). New code has
> pure.diffhelpers.fixnewline2() as a fallback, and old one does
> pure.diffhelpers.fix_newline(). That's how we solved API changes
> historically.

The problem is that only works with non-default modulepolicy.
Yuya Nishihara - May 3, 2017, 2:22 a.m.
On Tue, 2 May 2017 19:05:21 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-05-03 11:02:18 +0900:
> > On Tue, 2 May 2017 18:33:50 -0700, Jun Wu wrote:
> > > The problem we faced for issue4042 fix is to maintain compatibility in both
> > > directions without copy-paste.
> > > 
> > > The "_duralmodule" approach is while smart, does not really solve the above
> > > problem. i.e. copy-paste is still needed.
> > 
> > No. We can just rename fix_newline() to fixnewline2(). New code has
> > pure.diffhelpers.fixnewline2() as a fallback, and old one does
> > pure.diffhelpers.fix_newline(). That's how we solved API changes
> > historically.
> 
> The problem is that only works with non-default modulepolicy.

So the default of in-place build will be changed to 'allow'. The current 'c'
policy is 'allow' in practice as we have many adhoc fallback to pure functions.
Jun Wu - May 3, 2017, 2:41 a.m.
Excerpts from Yuya Nishihara's message of 2017-05-03 11:22:45 +0900:
> > The problem is that only works with non-default modulepolicy.
> 
> So the default of in-place build will be changed to 'allow'. The current
> 'c' policy is 'allow' in practice as we have many adhoc fallback to pure
> functions.

Cleaning those inconsistence does not conflict with this series.

This series solves extra problems confidently and without copy-paste or
even rename issues:

  - BC to complex CPython object, like chnagelog.index
  - modulepolicy='c'

Mixing python and C modules could lead to surprises if the API changes state
in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" ->
"get2" is only in pure, and the state "x" is not in pure moudle state.
Yuya Nishihara - May 3, 2017, 4:18 a.m.
On Tue, 2 May 2017 19:41:32 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-05-03 11:22:45 +0900:
> > > The problem is that only works with non-default modulepolicy.
> > 
> > So the default of in-place build will be changed to 'allow'. The current
> > 'c' policy is 'allow' in practice as we have many adhoc fallback to pure
> > functions.
> 
> Cleaning those inconsistence does not conflict with this series.
> 
> This series solves extra problems confidently and without copy-paste or
> even rename issues:
> 
>   - BC to complex CPython object, like chnagelog.index

Maybe rename index to index2 ?

>   - modulepolicy='c'

Should be okay to fail with ImportError or AttributeError as the user
requested.

> Mixing python and C modules could lead to surprises if the API changes state
> in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" ->
> "get2" is only in pure, and the state "x" is not in pure moudle state. 

Yeah. So that's the only difference between my version and yours. I don't
say my version is strictly better. The goal of my series is to get rid of
our importer hacks, CFFI mess and import cycle around encoding.py. I believe
your module versioning patches will get slightly simpler if we had no importer
hack.
Jun Wu - May 3, 2017, 5:15 a.m.
Excerpts from Yuya Nishihara's message of 2017-05-03 13:18:16 +0900:
> Maybe rename index to index2 ?

That's a solution. But developers must rename things recursively - APIs
using index as input or output also needs to be renamed. That does not look
pretty, and is not friendly to new developers.

> Should be okay to fail with ImportError or AttributeError as the user
> requested.

If that's okay, we should have taken Phil's patches at the first place
because it works with newer .so + old .py already. I'm doing all of these
because smf suggested that's not okay.

> > Mixing python and C modules could lead to surprises if the API changes state
> > in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" ->
> > "get2" is only in pure, and the state "x" is not in pure moudle state. 
> 
> Yeah. So that's the only difference between my version and yours. I don't
> say my version is strictly better. The goal of my series is to get rid of
> our importer hacks, CFFI mess and import cycle around encoding.py. I believe
> your module versioning patches will get slightly simpler if we had no importer
> hack.

I think they do not conflict. We can take this series for policy=c, and also
move .c to "cext/" (which I think is a very good change). I can do my change
after the "cext/" movement to resolve merge conflicts on my side.

I think strong guarantee on .so compatibility is a good thing, and
dualmodule can give us more trouble than benefit (so I'm -1 on dualmodule):

  1. require recursive renaming (pitfall for new contributors)
  2. possible to have other weird behaviors

For 2, modules do not have to change its state, exposing state could also be
troublesome, like (not an issue today though, but not that unrealistic):

  try:
      ... # call C or pure API, and may raise
  except mpatch.mpatchError: # pure mpatchError will miss C exception
      ...

In general, I don't think avoiding recompile should take precedence over any
attempt that makes the code more explicit and consistent.
Yuya Nishihara - May 3, 2017, 8:37 a.m.
On Tue, 2 May 2017 22:15:44 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-05-03 13:18:16 +0900:
> > Maybe rename index to index2 ?
> 
> That's a solution. But developers must rename things recursively - APIs
> using index as input or output also needs to be renamed. That does not look
> pretty, and is not friendly to new developers.

I'm optimistic about that. The code depending directly on C API won't that
large.

> > Should be okay to fail with ImportError or AttributeError as the user
> > requested.
> 
> If that's okay, we should have taken Phil's patches at the first place
> because it works with newer .so + old .py already. I'm doing all of these
> because smf suggested that's not okay.

No. We shouldn't break the default build (unless we agree to.) I said it
should be okay if the user explicitly select the strict 'c' policy.

> > > Mixing python and C modules could lead to surprises if the API changes state
> > > in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" ->
> > > "get2" is only in pure, and the state "x" is not in pure moudle state. 
> > 
> > Yeah. So that's the only difference between my version and yours. I don't
> > say my version is strictly better. The goal of my series is to get rid of
> > our importer hacks, CFFI mess and import cycle around encoding.py. I believe
> > your module versioning patches will get slightly simpler if we had no importer
> > hack.
> 
> I think they do not conflict. We can take this series for policy=c, and also
> move .c to "cext/" (which I think is a very good change). I can do my change
> after the "cext/" movement to resolve merge conflicts on my side.
> 
> I think strong guarantee on .so compatibility is a good thing, and
> dualmodule can give us more trouble than benefit (so I'm -1 on dualmodule):

Well, dualmod is IMHO the easiest hack to move cext modules to new layout
without loosing backward/forward compatibility. We can remove it later if we
prefer using per-module version constant.

It seems getting hard to track what's your point and what's my point. I'll
send first bits of my series, so let's discuss on them later.

>   1. require recursive renaming (pitfall for new contributors)
>   2. possible to have other weird behaviors
> 
> For 2, modules do not have to change its state, exposing state could also be
> troublesome, like (not an issue today though, but not that unrealistic):
> 
>   try:
>       ... # call C or pure API, and may raise
>   except mpatch.mpatchError: # pure mpatchError will miss C exception
>       ...
> 
> In general, I don't think avoiding recompile should take precedence over any
> attempt that makes the code more explicit and consistent.

Honestly I don't need that feature as I always do "make local". ;)

Patch

diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
--- a/mercurial/bdiff_module.c
+++ b/mercurial/bdiff_module.c
@@ -193,4 +193,8 @@  static PyMethodDef methods[] = {
 };
 
+/* Binary version. When breaking changes are made, bump this number and change
+ * __init__.py, so the module loader can notice and fallback to pure. */
+static const int _version = 1;
+
 #ifdef IS_PY3K
 static struct PyModuleDef bdiff_module = {
@@ -204,10 +208,15 @@  static struct PyModuleDef bdiff_module =
 PyMODINIT_FUNC PyInit_bdiff(void)
 {
-	return PyModule_Create(&bdiff_module);
+	PyObject *m;
+	m = PyModule_Create(&bdiff_module);
+	PyModule_AddIntConstant(m, "_version", _version);
+	return m;
 }
 #else
 PyMODINIT_FUNC initbdiff(void)
 {
-	Py_InitModule3("bdiff", methods, mdiff_doc);
+	PyObject *m;
+	m = Py_InitModule3("bdiff", methods, mdiff_doc);
+	PyModule_AddIntConstant(m, "_version", _version);
 }
 #endif