Submitter | Jun Wu |
---|---|
Date | May 3, 2017, 12:19 a.m. |
Message ID | <9073b1dfb58eb71c1005.1493770798@x1c> |
Download | mbox | patch |
Permalink | /patch/20374/ |
State | Deferred |
Headers | show |
Comments
On Tue, May 2, 2017 at 5:19 PM, Jun Wu <quark@fb.com> wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1493166881 25200 > # Tue Apr 25 17:34:41 2017 -0700 > # Node ID 9073b1dfb58eb71c1005dad5267b6ee4f09790da > # Parent 6cacc271ee0a9385be483314dc73be176a13c891 > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r > 9073b1dfb58e > bdiff: add _version to help detect breaking binary changes > > Previously, we have no way to detect if a compiled .so file could be used > or > not, and blindly load it if it exists. Usually we carefully maintain > compatibility of .so and fallback to pure code gracefully. But if we stick > to the rules, certain nice changes will be impossible to make in a clean > way. > > This patch adds a "_version" variable to the module so we can detect > inconsistency and take appropriate actions (warn, abort, fallback to pure, > run make automatically) in the module loader. > I like the spirit of this series. Have you given any consideration to: * Sharing a version number across all modules? That will make the code easier at the expense of recompiling all C modules when not strictly necessary. Since the base rate of changing C modules is low and everything but zstd compiles almost instantaneously, one can make an argument that simplicity is sufficient. * A shared function or macro for calling PyModule_AddIntConstant(). It feels like a bit of redundant code across all the modules. > > diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c > --- a/mercurial/bdiff_module.c > +++ b/mercurial/bdiff_module.c > @@ -193,4 +193,8 @@ static PyMethodDef methods[] = { > }; > > +/* Binary version. When breaking changes are made, bump this number and > change > + * __init__.py, so the module loader can notice and fallback to pure. */ > +static const int _version = 1; > + > #ifdef IS_PY3K > static struct PyModuleDef bdiff_module = { > @@ -204,10 +208,15 @@ static struct PyModuleDef bdiff_module = > PyMODINIT_FUNC PyInit_bdiff(void) > { > - return PyModule_Create(&bdiff_module); > + PyObject *m; > + m = PyModule_Create(&bdiff_module); > + PyModule_AddIntConstant(m, "_version", _version); > + return m; > } > #else > PyMODINIT_FUNC initbdiff(void) > { > - Py_InitModule3("bdiff", methods, mdiff_doc); > + PyObject *m; > + m = Py_InitModule3("bdiff", methods, mdiff_doc); > + PyModule_AddIntConstant(m, "_version", _version); > } > #endif > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >
Excerpts from Gregory Szorc's message of 2017-05-02 17:41:21 -0700: > I like the spirit of this series. > > Have you given any consideration to: > > * Sharing a version number across all modules? That will make the code > easier at the expense of recompiling all C modules when not strictly > necessary. Since the base rate of changing C modules is low and everything > but zstd compiles almost instantaneously, one can make an argument that > simplicity is sufficient. While I think that also works, I think the benefit (less repetitive code) is not that significant given the fact that I think CPython C code is already cumbersome (look at those PyInits), so I won't optimize code length for them as much as I do for non-Python C code. I also slightly prefer the explicit way of saying which modules we are checking and their expected versions in Patch 8. > * A shared function or macro for calling PyModule_AddIntConstant(). It > feels like a bit of redundant code across all the modules. Suppose we have a .h file for that feature, then .c file still need roughly same number of lines of change: +#include "...." +PyObject *m; +m = PyModule_Create(&bdiff_module); // Python 2 +define_version(m) // Python 2 +define_version(m) // Python 3 Maybe we can unified the Py2 vs Py3 module definition. But that seems to be another series which should not block this one.
On Tue, 2 May 2017 17:19:58 -0700, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1493166881 25200 > # Tue Apr 25 17:34:41 2017 -0700 > # Node ID 9073b1dfb58eb71c1005dad5267b6ee4f09790da > # Parent 6cacc271ee0a9385be483314dc73be176a13c891 > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 9073b1dfb58e > bdiff: add _version to help detect breaking binary changes For reference, my series: https://bitbucket.org/yuja/hg-work/commits/cbfd47adf9d6bbdbab7ed61338b2dcf5877b90f2 https://bitbucket.org/yuja/hg-work/commits/c84185741bf4063e296c82b9c3571a87c9f61eed https://bitbucket.org/yuja/hg-work/commits/54531fe0b1f9950db5b496b21cf06badb25b56eb I have no strong opinion that we should stick to the current per-function versioning, but I believe some bits in this series can be reused so we won't mess up our importer hack anymore. > --- a/mercurial/bdiff_module.c > +++ b/mercurial/bdiff_module.c > @@ -193,4 +193,8 @@ static PyMethodDef methods[] = { > }; > > +/* Binary version. When breaking changes are made, bump this number and change > + * __init__.py, so the module loader can notice and fallback to pure. */ > +static const int _version = 1; Nit: '_' prefix is reserved for compiler authors.
Excerpts from Yuya Nishihara's message of 2017-05-03 10:18:46 +0900: > On Tue, 2 May 2017 17:19:58 -0700, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1493166881 25200 > > # Tue Apr 25 17:34:41 2017 -0700 > > # Node ID 9073b1dfb58eb71c1005dad5267b6ee4f09790da > > # Parent 6cacc271ee0a9385be483314dc73be176a13c891 > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r 9073b1dfb58e > > bdiff: add _version to help detect breaking binary changes > > For reference, my series: > https://bitbucket.org/yuja/hg-work/commits/cbfd47adf9d6bbdbab7ed61338b2dcf5877b90f2 > https://bitbucket.org/yuja/hg-work/commits/c84185741bf4063e296c82b9c3571a87c9f61eed > https://bitbucket.org/yuja/hg-work/commits/54531fe0b1f9950db5b496b21cf06badb25b56eb > > I have no strong opinion that we should stick to the current per-function > versioning, but I believe some bits in this series can be reused so we won't > mess up our importer hack anymore. I like the idea moving C extensions to "cext". We should probably do that. The problem we faced for issue4042 fix is to maintain compatibility in both directions without copy-paste. The "_duralmodule" approach is while smart, does not really solve the above problem. i.e. copy-paste is still needed. I believe avoiding copy-paste (or dead-code) is much more important than avoiding re-compiling for mercurial developers. That's the motivation of this series. > > --- a/mercurial/bdiff_module.c > > +++ b/mercurial/bdiff_module.c > > @@ -193,4 +193,8 @@ static PyMethodDef methods[] = { > > }; > > > > +/* Binary version. When breaking changes are made, bump this number and change > > + * __init__.py, so the module loader can notice and fallback to pure. */ > > +static const int _version = 1; > > Nit: '_' prefix is reserved for compiler authors.
On Tue, 2 May 2017 18:33:50 -0700, Jun Wu wrote: > The problem we faced for issue4042 fix is to maintain compatibility in both > directions without copy-paste. > > The "_duralmodule" approach is while smart, does not really solve the above > problem. i.e. copy-paste is still needed. No. We can just rename fix_newline() to fixnewline2(). New code has pure.diffhelpers.fixnewline2() as a fallback, and old one does pure.diffhelpers.fix_newline(). That's how we solved API changes historically.
Excerpts from Yuya Nishihara's message of 2017-05-03 11:02:18 +0900: > On Tue, 2 May 2017 18:33:50 -0700, Jun Wu wrote: > > The problem we faced for issue4042 fix is to maintain compatibility in both > > directions without copy-paste. > > > > The "_duralmodule" approach is while smart, does not really solve the above > > problem. i.e. copy-paste is still needed. > > No. We can just rename fix_newline() to fixnewline2(). New code has > pure.diffhelpers.fixnewline2() as a fallback, and old one does > pure.diffhelpers.fix_newline(). That's how we solved API changes > historically. The problem is that only works with non-default modulepolicy.
On Tue, 2 May 2017 19:05:21 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-05-03 11:02:18 +0900: > > On Tue, 2 May 2017 18:33:50 -0700, Jun Wu wrote: > > > The problem we faced for issue4042 fix is to maintain compatibility in both > > > directions without copy-paste. > > > > > > The "_duralmodule" approach is while smart, does not really solve the above > > > problem. i.e. copy-paste is still needed. > > > > No. We can just rename fix_newline() to fixnewline2(). New code has > > pure.diffhelpers.fixnewline2() as a fallback, and old one does > > pure.diffhelpers.fix_newline(). That's how we solved API changes > > historically. > > The problem is that only works with non-default modulepolicy. So the default of in-place build will be changed to 'allow'. The current 'c' policy is 'allow' in practice as we have many adhoc fallback to pure functions.
Excerpts from Yuya Nishihara's message of 2017-05-03 11:22:45 +0900: > > The problem is that only works with non-default modulepolicy. > > So the default of in-place build will be changed to 'allow'. The current > 'c' policy is 'allow' in practice as we have many adhoc fallback to pure > functions. Cleaning those inconsistence does not conflict with this series. This series solves extra problems confidently and without copy-paste or even rename issues: - BC to complex CPython object, like chnagelog.index - modulepolicy='c' Mixing python and C modules could lead to surprises if the API changes state in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" -> "get2" is only in pure, and the state "x" is not in pure moudle state.
On Tue, 2 May 2017 19:41:32 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-05-03 11:22:45 +0900: > > > The problem is that only works with non-default modulepolicy. > > > > So the default of in-place build will be changed to 'allow'. The current > > 'c' policy is 'allow' in practice as we have many adhoc fallback to pure > > functions. > > Cleaning those inconsistence does not conflict with this series. > > This series solves extra problems confidently and without copy-paste or > even rename issues: > > - BC to complex CPython object, like chnagelog.index Maybe rename index to index2 ? > - modulepolicy='c' Should be okay to fail with ImportError or AttributeError as the user requested. > Mixing python and C modules could lead to surprises if the API changes state > in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" -> > "get2" is only in pure, and the state "x" is not in pure moudle state. Yeah. So that's the only difference between my version and yours. I don't say my version is strictly better. The goal of my series is to get rid of our importer hacks, CFFI mess and import cycle around encoding.py. I believe your module versioning patches will get slightly simpler if we had no importer hack.
Excerpts from Yuya Nishihara's message of 2017-05-03 13:18:16 +0900: > Maybe rename index to index2 ? That's a solution. But developers must rename things recursively - APIs using index as input or output also needs to be renamed. That does not look pretty, and is not friendly to new developers. > Should be okay to fail with ImportError or AttributeError as the user > requested. If that's okay, we should have taken Phil's patches at the first place because it works with newer .so + old .py already. I'm doing all of these because smf suggested that's not okay. > > Mixing python and C modules could lead to surprises if the API changes state > > in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" -> > > "get2" is only in pure, and the state "x" is not in pure moudle state. > > Yeah. So that's the only difference between my version and yours. I don't > say my version is strictly better. The goal of my series is to get rid of > our importer hacks, CFFI mess and import cycle around encoding.py. I believe > your module versioning patches will get slightly simpler if we had no importer > hack. I think they do not conflict. We can take this series for policy=c, and also move .c to "cext/" (which I think is a very good change). I can do my change after the "cext/" movement to resolve merge conflicts on my side. I think strong guarantee on .so compatibility is a good thing, and dualmodule can give us more trouble than benefit (so I'm -1 on dualmodule): 1. require recursive renaming (pitfall for new contributors) 2. possible to have other weird behaviors For 2, modules do not have to change its state, exposing state could also be troublesome, like (not an issue today though, but not that unrealistic): try: ... # call C or pure API, and may raise except mpatch.mpatchError: # pure mpatchError will miss C exception ... In general, I don't think avoiding recompile should take precedence over any attempt that makes the code more explicit and consistent.
On Tue, 2 May 2017 22:15:44 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-05-03 13:18:16 +0900: > > Maybe rename index to index2 ? > > That's a solution. But developers must rename things recursively - APIs > using index as input or output also needs to be renamed. That does not look > pretty, and is not friendly to new developers. I'm optimistic about that. The code depending directly on C API won't that large. > > Should be okay to fail with ImportError or AttributeError as the user > > requested. > > If that's okay, we should have taken Phil's patches at the first place > because it works with newer .so + old .py already. I'm doing all of these > because smf suggested that's not okay. No. We shouldn't break the default build (unless we agree to.) I said it should be okay if the user explicitly select the strict 'c' policy. > > > Mixing python and C modules could lead to surprises if the API changes state > > > in the moudle. Like "set(x, y)" -> affect C moudle state; "get2(x)" -> > > > "get2" is only in pure, and the state "x" is not in pure moudle state. > > > > Yeah. So that's the only difference between my version and yours. I don't > > say my version is strictly better. The goal of my series is to get rid of > > our importer hacks, CFFI mess and import cycle around encoding.py. I believe > > your module versioning patches will get slightly simpler if we had no importer > > hack. > > I think they do not conflict. We can take this series for policy=c, and also > move .c to "cext/" (which I think is a very good change). I can do my change > after the "cext/" movement to resolve merge conflicts on my side. > > I think strong guarantee on .so compatibility is a good thing, and > dualmodule can give us more trouble than benefit (so I'm -1 on dualmodule): Well, dualmod is IMHO the easiest hack to move cext modules to new layout without loosing backward/forward compatibility. We can remove it later if we prefer using per-module version constant. It seems getting hard to track what's your point and what's my point. I'll send first bits of my series, so let's discuss on them later. > 1. require recursive renaming (pitfall for new contributors) > 2. possible to have other weird behaviors > > For 2, modules do not have to change its state, exposing state could also be > troublesome, like (not an issue today though, but not that unrealistic): > > try: > ... # call C or pure API, and may raise > except mpatch.mpatchError: # pure mpatchError will miss C exception > ... > > In general, I don't think avoiding recompile should take precedence over any > attempt that makes the code more explicit and consistent. Honestly I don't need that feature as I always do "make local". ;)
Patch
diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c --- a/mercurial/bdiff_module.c +++ b/mercurial/bdiff_module.c @@ -193,4 +193,8 @@ static PyMethodDef methods[] = { }; +/* Binary version. When breaking changes are made, bump this number and change + * __init__.py, so the module loader can notice and fallback to pure. */ +static const int _version = 1; + #ifdef IS_PY3K static struct PyModuleDef bdiff_module = { @@ -204,10 +208,15 @@ static struct PyModuleDef bdiff_module = PyMODINIT_FUNC PyInit_bdiff(void) { - return PyModule_Create(&bdiff_module); + PyObject *m; + m = PyModule_Create(&bdiff_module); + PyModule_AddIntConstant(m, "_version", _version); + return m; } #else PyMODINIT_FUNC initbdiff(void) { - Py_InitModule3("bdiff", methods, mdiff_doc); + PyObject *m; + m = Py_InitModule3("bdiff", methods, mdiff_doc); + PyModule_AddIntConstant(m, "_version", _version); } #endif