Patchwork [RFC] RFC: allow optional C++ 11 extensions with pybind11 for performance code

Submitter Laurent Charignon
Date Feb. 8, 2016, 9:13 p.m.
Message ID <3c1ae9c7e93e8556fa74.1454966003@lcharignon-mbp.local>
Permalink /patch/13055/
State Not Applicable

Comments

Laurent Charignon - Feb. 8, 2016, 9:13 p.m.
# HG changeset patch
# User Laurent Charignon <lcharignon@fb.com>
# Date 1454965979 28800
#      Mon Feb 08 13:12:59 2016 -0800
# Branch stable
# Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
# Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
RFC: allow optional C++ 11 extensions with pybind11 for performance code

This is a proposal to allow us to write C++ 11 extensions (in addition to C89),
for optional performance code.

According to Augie, lazymanifest was a large undertaking and it would have been
much easier to implement it in C++.
And, as I plan to write a native version of tree manifest, I would like to
introduce C++ extensions beforehand and write the native tree manifest in C++.

Like our current C modules, C++ modules would be optional performance code
with pure-Python fallbacks.

I propose to use pybind11 to do that.
pybind11's website (https://pybind11.readthedocs.org/en/latest/index.html)
describes it as follows:
"pybind11 is a lightweight header-only library that exposes C++ types in
Python and vice versa, mainly to create Python bindings of existing C++ code"

This patch contains a change to setup.py to optionally compile C++ modules (I
just put a test module there). The only change needed to our codebase is in
util.h, as we cannot redefine the "bool" type if we are using C++.

As you can see in the example, not having to convert the arguments and return
values between python and native code would save us a lot of lines of code and
complexity, making native code easier to review.

The following example works with the patch that I sent:

    >>> from mercurial import pybindtest
    >>> print pybindtest.add(40, 2)
    42
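
The "optional module with a Python fallback" arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: `load_add` is a hypothetical helper, not Mercurial's actual module loader, and `mercurial.pybindtest` is the test module name taken from the example in this message. When the compiled module cannot be imported, a pure-Python function with the same signature is used instead.

```python
import importlib


def load_add(native_name="mercurial.pybindtest"):
    """Return (label, add_function).

    Hypothetical helper: tries the compiled module first and falls back
    to pure Python when it cannot be imported (for example, when no C++
    compiler was available at build time).
    """
    try:
        native = importlib.import_module(native_name)
        return "native", native.add
    except ImportError:
        def add(a, b):
            # Pure-Python fallback with the same signature as the
            # native function.
            return a + b
        return "pure", add


impl, add = load_add()
print(impl, add(40, 2))
```

Callers only ever see `add`, so whether the native build succeeded changes speed but not behaviour.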
Sean Farley - Feb. 8, 2016, 9:33 p.m.
Laurent Charignon <lcharignon@fb.com> writes:

> # HG changeset patch
> # User Laurent Charignon <lcharignon@fb.com>
> # Date 1454965979 28800
> #      Mon Feb 08 13:12:59 2016 -0800
> # Branch stable
> # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
> # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
> RFC: allow optional C++ 11 extensions with pybind11 for performance code

I am very much against this as I am also against writing code in rust /
go / etc. Mercurial is a *system tool*. When cloning code on IBM
BlueGene it is annoying as hell to have random dependencies sneak in
like this. For reference, IBM BlueGene has had problems with codes that
use C++.

C is much, much, much more portable. In fact, I would like to go even
further and break out the C code we have into a shared library (for cffi
sweetness).
Laurent Charignon - Feb. 8, 2016, 9:45 p.m.
On 2/8/16, 1:33 PM, "Sean Farley" <sean@farley.io> wrote:

>
>Laurent Charignon <lcharignon@fb.com> writes:
>
>> # HG changeset patch
>> # User Laurent Charignon <lcharignon@fb.com>
>> # Date 1454965979 28800
>> #      Mon Feb 08 13:12:59 2016 -0800
>> # Branch stable
>> # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
>> # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
>> RFC: allow optional C++ 11 extensions with pybind11 for performance code
>
>I am very much against this as I am also against writing code in rust /
>go / etc. Mercurial is a *system tool*. When cloning code on IBM
>BlueGene it is annoying as hell to have random dependencies sneak in
>like this. For reference, IBM BlueGene has had problem with codes that
>use C++.

I understand your concern, but it is not a random dependency.
Like our C extensions, the C++ extensions would be *optional*.
If you are not able to compile them on some system, mercurial should still
work without them.

>
>C is much, much, much more portable. In fact, I would like to go even
>further and break out the C code we have into a shared library (for cffi
>sweetness).

It is true that C is more portable.
However, it is much more difficult (at least for me) to implement large
features like lazy manifest or tree manifest in C compared to rust / go /
C++.
How do you think we should proceed?
Do you suggest finding a C implementation of whatever
data structure/feature would work for us and including it in the
project?

Thanks,

Laurent
Sean Farley - Feb. 8, 2016, 9:53 p.m.
Laurent Charignon <lcharignon@fb.com> writes:

> On 2/8/16, 1:33 PM, "Sean Farley" <sean@farley.io> wrote:
>
>>
>>Laurent Charignon <lcharignon@fb.com> writes:
>>
>>> # HG changeset patch
>>> # User Laurent Charignon <lcharignon@fb.com>
>>> # Date 1454965979 28800
>>> #      Mon Feb 08 13:12:59 2016 -0800
>>> # Branch stable
>>> # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
>>> # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
>>> RFC: allow optional C++ 11 extensions with pybind11 for performance code
>>
>>I am very much against this as I am also against writing code in rust /
>>go / etc. Mercurial is a *system tool*. When cloning code on IBM
>>BlueGene it is annoying as hell to have random dependencies sneak in
>>like this. For reference, IBM BlueGene has had problem with codes that
>>use C++.
>
> I understand your concern but that is not the case, it is not a random
> dependency.
> Like our C extensions, the C++ extensions, would be *optional*.
> If you are not able to compile them on some system, mercurial should still
> work without them.

It would still be crippled (performance-wise). In academia (and HPC)
this was very embarrassing. Git could be compiled fast on the same
machine and was not crippled in the same way.

>>C is much, much, much more portable. In fact, I would like to go even
>>further and break out the C code we have into a shared library (for cffi
>>sweetness).
>
> It is true that C is more portable.
> However, it is much more difficult (at least for me) to implement large
> features like lazy manifest or tree manifest in C compared to rust / go /
> C++.
> How do you think we should proceed?
> Do you suggest finding a C implementation of whatever
> data-structure/feature that would work for us and include them in the
> project?

Why not? I don't think C is *that* hard. My usual advice is to prototype in
your favorite language of Python / rust / go / D / whatever and then
convert it to C for portability and performance.
Kyle Lippincott - Feb. 8, 2016, 10:01 p.m.
On Mon, Feb 8, 2016 at 1:33 PM, Sean Farley <sean@farley.io> wrote:

>
> Laurent Charignon <lcharignon@fb.com> writes:
>
> > # HG changeset patch
> > # User Laurent Charignon <lcharignon@fb.com>
> > # Date 1454965979 28800
> > #      Mon Feb 08 13:12:59 2016 -0800
> > # Branch stable
> > # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
> > # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
> > RFC: allow optional C++ 11 extensions with pybind11 for performance code
>
> I am very much against this as I am also against writing code in rust /
> go / etc. Mercurial is a *system tool*. When cloning code on IBM
> BlueGene it is annoying as hell to have random dependencies sneak in
> like this. For reference, IBM BlueGene has had problem with codes that
> use C++.
>

Do you have links to some of the issues?  I feel like C++ is common enough
and mature enough that this is mostly cargo-culted fear, but would be fine
with being proven wrong.

I don't know about the portability of pybind11 or any of those, but I was
definitely under the impression that C++03 code was essentially as portable
as C, and C++11 was getting close.


> C is much, much, much more portable. In fact, I would like to go even
> further and break out the C code we have into a shared library (for cffi
> sweetness).
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Sean Farley - Feb. 8, 2016, 10:08 p.m.
Kyle Lippincott <spectral@pewpew.net> writes:

> On Mon, Feb 8, 2016 at 1:33 PM, Sean Farley <sean@farley.io> wrote:
>
>>
>> Laurent Charignon <lcharignon@fb.com> writes:
>>
>> > # HG changeset patch
>> > # User Laurent Charignon <lcharignon@fb.com>
>> > # Date 1454965979 28800
>> > #      Mon Feb 08 13:12:59 2016 -0800
>> > # Branch stable
>> > # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
>> > # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
>> > RFC: allow optional C++ 11 extensions with pybind11 for performance code
>>
>> I am very much against this as I am also against writing code in rust /
>> go / etc. Mercurial is a *system tool*. When cloning code on IBM
>> BlueGene it is annoying as hell to have random dependencies sneak in
>> like this. For reference, IBM BlueGene has had problem with codes that
>> use C++.
>>
>
> Do you have links to some of the issues?  I feel like C++ is common enough
> and mature enough that this is mostly cargo-culted fear, but would be fine
> with being proven wrong.
>
> I don't know about the portability of pybind11 or any of those, but I was
> definitely under the impression that C++03 code was essentially as portable
> as C, and C++11 was getting close.

Guys, you have no idea the state of using compilers on bullshit "super"
computing machines. The main frustration from working with companies
that ship a compiler with their esoteric hardware is that the compiler
will undoubtedly have a bug and the company will never release the
source of the compiler. It takes weeks of back-and-forth to get this
ironed out (sometimes if at all).

I do not look forward to these bug reports especially since I think they
are mostly avoidable.
Danek Duvall - Feb. 8, 2016, 10:17 p.m.
Kyle Lippincott wrote:

> On Mon, Feb 8, 2016 at 1:33 PM, Sean Farley <sean@farley.io> wrote:
> 
> >
> > Laurent Charignon <lcharignon@fb.com> writes:
> >
> > > # HG changeset patch
> > > # User Laurent Charignon <lcharignon@fb.com>
> > > # Date 1454965979 28800
> > > #      Mon Feb 08 13:12:59 2016 -0800
> > > # Branch stable
> > > # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
> > > # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
> > > RFC: allow optional C++ 11 extensions with pybind11 for performance code
> >
> > I am very much against this as I am also against writing code in rust /
> > go / etc. Mercurial is a *system tool*. When cloning code on IBM
> > BlueGene it is annoying as hell to have random dependencies sneak in
> > like this. For reference, IBM BlueGene has had problem with codes that
> > use C++.
> >
> 
> Do you have links to some of the issues?  I feel like C++ is common enough
> and mature enough that this is mostly cargo-culted fear, but would be fine
> with being proven wrong.
> 
> I don't know about the portability of pybind11 or any of those, but I was
> definitely under the impression that C++03 code was essentially as portable
> as C, and C++11 was getting close.

There's also the issue of ABI compatibility, which I believe is also
getting close to sane, but is not quite there yet.  Which means that
everyone working within a particular ecosystem has to use the same version
of the same compiler, or risk some pretty nasty blow-ups.

Solaris tries *very* hard to stay away from C++ -- there's no safe way to
use it except in dumb applications that will never load third-party code.
I know Solaris isn't mercurial's biggest market, and I'm sure I can hack
something together that'll work for us, but I anticipate this being a huge
pain in the rear in the long term.

Danek
Sean Farley - Feb. 8, 2016, 10:40 p.m.
Danek Duvall <danek.duvall@oracle.com> writes:

> Kyle Lippincott wrote:
>
>> On Mon, Feb 8, 2016 at 1:33 PM, Sean Farley <sean@farley.io> wrote:
>> 
>> >
>> > Laurent Charignon <lcharignon@fb.com> writes:
>> >
>> > > # HG changeset patch
>> > > # User Laurent Charignon <lcharignon@fb.com>
>> > > # Date 1454965979 28800
>> > > #      Mon Feb 08 13:12:59 2016 -0800
>> > > # Branch stable
>> > > # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
>> > > # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
>> > > RFC: allow optional C++ 11 extensions with pybind11 for performance code
>> >
>> > I am very much against this as I am also against writing code in rust /
>> > go / etc. Mercurial is a *system tool*. When cloning code on IBM
>> > BlueGene it is annoying as hell to have random dependencies sneak in
>> > like this. For reference, IBM BlueGene has had problem with codes that
>> > use C++.
>> >
>> 
>> Do you have links to some of the issues?  I feel like C++ is common enough
>> and mature enough that this is mostly cargo-culted fear, but would be fine
>> with being proven wrong.
>> 
>> I don't know about the portability of pybind11 or any of those, but I was
>> definitely under the impression that C++03 code was essentially as portable
>> as C, and C++11 was getting close.
>
> There's also the issue of ABI compatibility, which I believe is also
> getting close to sane, but is not quite there yet.  Which means that
> everyone working within a particular ecosystem has to use the same version
> of the same compiler, or risk some pretty nasty blow-ups.
>
> Solaris tries *very* hard to stay away from C++ -- there's no safe way to
> use it except in dumb applications that will never load third-party code.
> I know Solaris isn't mercurial's biggest market, and I'm sure I can hack
> something together that'll work for us, but I anticipate this being a huge
> pain in the rear in the long term.

Yes, this is precisely the experience I have with C++ and libc++ (yuck
and gross incompatibility with libstdc++). It's not just Solaris,
either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
and you'll see all kinds of corner cases crop up.
timeless - Feb. 8, 2016, 10:47 p.m.
Sean Farley <sean@farley.io> wrote:
> Yes, this is precisely the experience I have with C++ and libc++ (yuck
> and gross incompatibility with libstdc++). It's not just Solaris,
> either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
> and you'll see all kinds of corner cases crop up.

There are plenty of painful corner cases on Windows too.
And if you try hard enough you can get plenty of annoying corner cases on Linux.

Would a CFront style approach be good enough? Write in limited C++,
use llvm to convert it to C, commit the C++ as source and the
generated C and have the build system use the generated C?

http://llvm.org/releases/3.1/docs/FAQ.html#translatecxx
Danek Duvall - Feb. 8, 2016, 11:28 p.m.
timeless wrote:

> Sean Farley <sean@farley.io> wrote:
> > Yes, this is precisely the experience I have with C++ and libc++ (yuck
> > and gross incompatibility with libstdc++). It's not just Solaris,
> > either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
> > and you'll see all kinds of corner cases crop up.
> 
> There are plenty of painful corner cases on Windows too.
> And if you try hard enough you can get plenty of annoying corner cases on Linux.
> 
> Would a CFront style approach be good enough? Write in limited C++,
> use llvm to convert it to C, commit the C++ as source and the
> generated C and have the build system use the generated C?
> 
> http://llvm.org/releases/3.1/docs/FAQ.html#translatecxx

It would be good enough from my perspective.

Thanks,
Danek
Kyle Lippincott - Feb. 8, 2016, 11:51 p.m.
On Mon, Feb 8, 2016 at 2:47 PM, timeless <timeless@gmail.com> wrote:

> Sean Farley <sean@farley.io> wrote:
> > Yes, this is precisely the experience I have with C++ and libc++ (yuck
> > and gross incompatibility with libstdc++). It's not just Solaris,
> > either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
> > and you'll see all kinds of corner cases crop up.
>
> There are plenty of painful corner cases on Windows too.
> And if you try hard enough you can get plenty of annoying corner cases on
> Linux.
>
> Would a CFront style approach be good enough? Write in limited C++,
> use llvm to convert it to C, commit the C++ as source and the
> generated C and have the build system use the generated C?
>
> http://llvm.org/releases/3.1/docs/FAQ.html#translatecxx


Unfortunately LLVM removed their c backend :(


Sean Farley - Feb. 8, 2016, 11:55 p.m.
timeless <timeless@gmail.com> writes:

> Sean Farley <sean@farley.io> wrote:
>> Yes, this is precisely the experience I have with C++ and libc++ (yuck
>> and gross incompatibility with libstdc++). It's not just Solaris,
>> either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
>> and you'll see all kinds of corner cases crop up.
>
> There are plenty of painful corner cases on Windows too.
> And if you try hard enough you can get plenty of annoying corner cases on Linux.
>
> Would a CFront style approach be good enough? Write in limited C++,
> use llvm to convert it to C, commit the C++ as source and the
> generated C and have the build system use the generated C?

I'd rather not go down this route. It's a little too close to switching
languages completely.

By writing in C++ we are making it harder for us in the future to use
pypy. I think we can all agree that writing Python C is annoying
(as opposed to just straight C). One way forward could be:

- write standard C, creating a shared library
- use cffi for binding to that

The benefit is that we can use our C code for pypy. The downside is that
we introduce a cffi dependency.
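
As a rough illustration of the "plain C library plus a foreign-function binding" idea, the sketch below calls a C function from Python without any Python C API code. It makes two substitutions that are assumptions, not part of the proposal: it uses the standard-library `ctypes` module as a stand-in for cffi (so it runs without extra dependencies), and it binds to the system C library's `strlen` in place of a hypothetical Mercurial shared library. It assumes a POSIX-like system.

```python
import ctypes
import ctypes.util

# Locate the C runtime; when the lookup returns None, CDLL(None) opens the
# symbols already loaded into the current process (POSIX behaviour).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Call the C strlen() on a bytes object and return a Python int."""
    return libc.strlen(data)


print(c_strlen(b"mercurial"))
```

The C side stays ordinary C with no Python headers, which is what would let the same library be reused from other interpreters.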
timeless - Feb. 9, 2016, 12:03 a.m.
On Mon, Feb 8, 2016 at 6:51 PM, Kyle Lippincott <spectral@pewpew.net> wrote:
>> http://llvm.org/releases/3.1/docs/FAQ.html#translatecxx
> Unfortunately LLVM removed their c backend :(

http://llvm.org/releases/3.1/docs/ReleaseNotes.html#whatsnew

That's hilarious. Someone should tell their FAQ author to read their
release notes :-(

(Right hand, meet left hand.)
Durham Goode - Feb. 9, 2016, 1:21 a.m.
On 2/8/16 1:53 PM, Sean Farley wrote:
> Laurent Charignon <lcharignon@fb.com> writes:
>
>> On 2/8/16, 1:33 PM, "Sean Farley" <sean@farley.io> wrote:
>>
>>> Laurent Charignon <lcharignon@fb.com> writes:
>>>
>>>> # HG changeset patch
>>>> # User Laurent Charignon <lcharignon@fb.com>
>>>> # Date 1454965979 28800
>>>> #      Mon Feb 08 13:12:59 2016 -0800
>>>> # Branch stable
>>>> # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
>>>> # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
>>>> RFC: allow optional C++ 11 extensions with pybind11 for performance code
>>> I am very much against this as I am also against writing code in rust /
>>> go / etc. Mercurial is a *system tool*. When cloning code on IBM
>>> BlueGene it is annoying as hell to have random dependencies sneak in
>>> like this. For reference, IBM BlueGene has had problem with codes that
>>> use C++.
>> I understand your concern but that is not the case, it is not a random
>> dependency.
>> Like our C extensions, the C++ extensions, would be *optional*.
>> If you are not able to compile them on some system, mercurial should still
>> work without them.
> It would still be crippled (performance-wise). In academia (and HPC)
> this was very embarrassing. Git could be compiled fast on the same
> machine and was not crippled in the same way.
>
>>> C is much, much, much more portable. In fact, I would like to go even
>>> further and break out the C code we have into a shared library (for cffi
>>> sweetness).
>> It is true that C is more portable.
>> However, it is much more difficult (at least for me) to implement large
>> features like lazy manifest or tree manifest in C compared to rust / go /
>> C++.
>> How do you think we should proceed?
>> Do you suggest finding a C implementation of whatever
>> data-structure/feature that would work for us and include them in the
>> project?
> Why not? I don't think C is *that* hard. My usual advice is to prototype in
> your favorite language of Python / rust / go / D / whatever and then
> convert it to C for portability and performance.
I think you underestimate the barrier to entry that C puts in the way.
There are plenty of places that would benefit from native fast paths
(revlog parsing, revset handling, etc.), and a large reason we haven't
attempted to do it is that the developer-time vs. benefit ratio for C
is so poor.

As for supporting super computers and solaris, if the current Mercurial 
performance has been adequate for those edge cases for 10 years, I think 
they will be fine with the existing perf for a while longer.  I don't 
think we should hinder development for the 99% platforms (linux, osx, 
windows) just so we can get those same perf improvements on edge cases.  
Hell, most repos are small enough that they won't even need the perf 
improvements we're proposing using c++ for.

That said, it's totally possible that C++ isn't an appropriate cross 
platform solution for even linux+osx+windows.  I'm relying on the 
experience of members of the community to determine the sanity of this idea.
Sean Farley - Feb. 9, 2016, 1:49 a.m.
Durham Goode <durham@fb.com> writes:

> On 2/8/16 1:53 PM, Sean Farley wrote:
>> Laurent Charignon <lcharignon@fb.com> writes:
>>
>>> On 2/8/16, 1:33 PM, "Sean Farley" <sean@farley.io> wrote:
>>>
>>>> Laurent Charignon <lcharignon@fb.com> writes:
>>>>
>>>>> # HG changeset patch
>>>>> # User Laurent Charignon <lcharignon@fb.com>
>>>>> # Date 1454965979 28800
>>>>> #      Mon Feb 08 13:12:59 2016 -0800
>>>>> # Branch stable
>>>>> # Node ID 3c1ae9c7e93e8556fa7470919108a02bd989d040
>>>>> # Parent  61f4d59e9a0be4e25c1aa016db1a80a540a9d337
>>>>> RFC: allow optional C++ 11 extensions with pybind11 for performance code
>>>> I am very much against this as I am also against writing code in rust /
>>>> go / etc. Mercurial is a *system tool*. When cloning code on IBM
>>>> BlueGene it is annoying as hell to have random dependencies sneak in
>>>> like this. For reference, IBM BlueGene has had problem with codes that
>>>> use C++.
>>> I understand your concern but that is not the case, it is not a random
>>> dependency.
>>> Like our C extensions, the C++ extensions, would be *optional*.
>>> If you are not able to compile them on some system, mercurial should still
>>> work without them.
>> It would still be crippled (performance-wise). In academia (and HPC)
>> this was very embarrassing. Git could be compiled fast on the same
>> machine and was not crippled in the same way.
>>
>>>> C is much, much, much more portable. In fact, I would like to go even
>>>> further and break out the C code we have into a shared library (for cffi
>>>> sweetness).
>>> It is true that C is more portable.
>>> However, it is much more difficult (at least for me) to implement large
>>> features like lazy manifest or tree manifest in C compared to rust / go /
>>> C++.
>>> How do you think we should proceed?
>>> Do you suggest finding a C implementation of whatever
>>> data-structure/feature that would work for us and include them in the
>>> project?
>> Why not? I don't think C is *that* hard. My usual advice is to prototype in
>> your favorite language of Python / rust / go / D / whatever and then
>> convert it to C for portability and performance.
> I think you underestimate the barrier of entry that C puts in the way.  
> There's plenty of places that would benefit from native fast paths 
> (revlog parsing, revset handling, etc), and a large reason we haven't 
> attempted to do it is because the developer-time vs benefit ratio for C 
> is so poor.

I think you mean Python C. Writing standalone C is a much different
experience IMHO. While C has a higher barrier than, say, Python, C++ is
not the direction you want to go.

> As for supporting super computers and solaris, if the current Mercurial 
> performance has been adequate for those edge cases for 10 years, I think 
> they will be fine with the existing perf for a while longer.

This is a false statement. Solaris / super computers / esoteric hardware
are not currently suffering *any* penalty because the fast paths today
are in C. You are jeopardizing that with a move to C++.

>  I don't 
> think we should hinder development for the 99% platforms (linux, osx, 
> windows) just so we can get those same perf improvements on edge cases.  
> Hell, most repos are small enough that they won't even need the perf 
> improvements we're proposing using c++ for.

You and Google are too biased for me to believe "too small".

> That said, it's totally possible that C++ isn't an appropriate cross 
> platform solution for even linux+osx+windows.  I'm relying on the 
> experience of members of the community to determine the sanity of this idea.

If you have never had to support cross-platform C++, or even libc++ vs
libstdc++, then I envy you. It is a headache no one should have to deal
with.

As I mentioned before, why not try cffi with standard C? This route will
open the door for pypy as well as a portable shared library.
Ryan McElroy - Feb. 9, 2016, 10:13 a.m.
On 2/8/2016 23:55, Sean Farley wrote:
> timeless <timeless@gmail.com> writes:
>
>> Sean Farley <sean@farley.io> wrote:
>>> Yes, this is precisely the experience I have with C++ and libc++ (yuck
>>> and gross incompatibility with libstdc++). It's not just Solaris,
>>> either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
>>> and you'll see all kinds of corner cases crop up.
>> There are plenty of painful corner cases on Windows too.
>> And if you try hard enough you can get plenty of annoying corner cases on Linux.
>>
>> Would a CFront style approach be good enough? Write in limited C++,
>> use llvm to convert it to C, commit the C++ as source and the
>> generated C and have the build system use the generated C?
> I'd rather not go down this route. It's a little too close to switching
> languages completely.
>
> By writing in C++ we are making it harder for us in the future to use
> pypy. I think we can all agree that writing Python C is annoying
> (opposed to just straight C). One way forward could be:
>
> - write standard C, creating a shared library
> - use cffi for binding to that
>
> The benefit is that we can use our C code for pypy. The downside is that
> we introduce a cffi dependency.

I didn't know what cffi was, and I wanted to know more about it. I found 
this page which was useful: https://cffi.readthedocs.org/en/latest/

Specifically, the "Goals" section compares it to, e.g., CPython. It sounds
useful, especially if we want to go towards pypy in the future.
Bryan O'Sullivan - Feb. 9, 2016, 6:10 p.m.
On Mon, Feb 8, 2016 at 1:13 PM, Laurent Charignon <lcharignon@fb.com> wrote:

> This is a proposal to allow us to write C++ 11 extensions (in addition to
> C89),
> for optional performance code.
>

Can you share an example of a realistic C++-based extension that
demonstrates a significant improvement in readability and safety? It seems
to me that for such a large change, there's a significant burden of proof
that this proposal should meet.
Augie Fackler - Feb. 9, 2016, 8:01 p.m.
On Mon, Feb 08, 2016 at 03:55:59PM -0800, Sean Farley wrote:
>
> timeless <timeless@gmail.com> writes:
>
> > Sean Farley <sean@farley.io> wrote:
> >> Yes, this is precisely the experience I have with C++ and libc++ (yuck
> >> and gross incompatibility with libstdc++). It's not just Solaris,
> >> either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
> >> and you'll see all kinds of corner cases crop up.
> >
> > There are plenty of painful corner cases on Windows too.
> > And if you try hard enough you can get plenty of annoying corner cases on Linux.
> >
> > Would a CFront style approach be good enough? Write in limited C++,
> > use llvm to convert it to C, commit the C++ as source and the
> > generated C and have the build system use the generated C?
>
> I'd rather not go down this route. It's a little too close to switching
> languages completely.
>
> By writing in C++ we are making it harder for us in the future to use
> pypy. I think we can all agree that writing Python C is annoying
> (opposed to just straight C). One way forward could be:
>
> - write standard C, creating a shared library
> - use cffi for binding to that
>

Writing standard C is actually /worse/ than writing Python C. You have
even fewer nice things to work with (no map type out of the box, but
in Python C you can at least use PyDict). Part of the reason C++
appeals is precisely because it's a widely-available language with
better memory management tools (like smart pointers) and richer standard
tools available (STL collections come to mind).

I don't dispute the idea that it'll make some things harder. I *am*
getting tired of the same old "HPC people can't get their shit
together" song and dance being the reason I can't have any nice things
and have to write in a language that's fundamentally stuck in the 70s.

Put another way: I'm totally open to having some way to do cleaner
programming in C, but as it stands the tools we're able to use there
are disastrously difficult and error-prone.


>
> The benefit is that we can use our C code for pypy. The downside is that
> we introduce a cffi dependency.
>

http://doc.pypy.org/en/release-1.9/cppyy.html is a thing for pypy. I
can't speak to its overall value or usability, but C++ does not
preclude use of pypy in the future.

Augie Fackler - Feb. 9, 2016, 8:09 p.m.
On Mon, Feb 08, 2016 at 05:49:47PM -0800, Sean Farley wrote:
>
> Durham Goode <durham@fb.com> writes:
>
> > That said, it's totally possible that C++ isn't an appropriate cross
> > platform solution for even linux+osx+windows.  I'm relying on the
> > experience of members of the community to determine the sanity of this idea.
>
> If you have never had to support cross-platform C++, or even libc++ vs
> libstdc++, then I envy you. It is a headache no one should have to deal
> with.
>
> As I mentioned before, why not try cffi with standard C? This route will
> open the door for pypy as well as a portable shared library.

I mentioned this in another reply on the thread, but I'll bring it up
here: "standard C" is such a tiny collection of standard features that
if you want me to take this proposal seriously, you need to start
talking about what the data structures story is going to be for things
that I get for free pretty much anywhere else: maps, lists, sets,
etc. lazymanifest was about the upper end of what I'm willing to spend
time on in C[0], and I know there's more work we could be doing in
native code for some nice performance wins.

I make no claim that C++ is going to be easy on all platforms, but my
strong suspicion is that on the platforms where we have serious users
it'll be relatively straightforward.

AF

0: If I'd known lazymanifest was going to take as long as it did, I
probably wouldn't have done it. I consider myself pretty comfortable
with C, so I think there's a real barrier-to-contribution present from
our continued use of C89 for native speedups.

Matthew Turk - Feb. 9, 2016, 8:41 p.m.
Hi Augie,

On Tue, Feb 9, 2016 at 2:01 PM, Augie Fackler <raf@durin42.com> wrote:

> On Mon, Feb 08, 2016 at 03:55:59PM -0800, Sean Farley wrote:
> >
> > timeless <timeless@gmail.com> writes:
> >
> > > Sean Farley <sean@farley.io> wrote:
> > >> Yes, this is precisely the experience I have with C++ and libc++ (yuck
> > >> and gross incompatibility with libstdc++). It's not just Solaris,
> > >> either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
> > >> and you'll see all kinds of corner cases crop up.
> > >
> > > There are plenty of painful corner cases on Windows too.
> > > And if you try hard enough you can get plenty of annoying corner cases
> > > on Linux.
> > >
> > > Would a CFront style approach be good enough? Write in limited C++,
> > > use llvm to convert it to C, commit the C++ as source and the
> > > generated C and have the build system use the generated C?
> >
> > I'd rather not go down this route. It's a little too close to switching
> > languages completely.
> >
> > By writing in C++ we are making it harder for us in the future to use
> > pypy. I think we can all agree that writing Python C is annoying
> > (opposed to just straight C). One way forward could be:
> >
> > - write standard C, creating a shared library
> > - use cffi for binding to that
> >
>
> Writing standard C is actually /worse/ than writing Python C. You have
> even fewer nice things to work with (no map type out of the box, but
> in Python C you can at least use PyDict). Part of the reason C++
> appeals is precisely because it's a widely-available language with
> better memory management tools (like smart pointers) and richer standard
> tools available (STL collections come to mind).
>
> I don't dispute the idea that it'll make some things harder. I *am*
> getting tired of the same old "HPC people can't get their shit
> together" song and dance being the reason I can't have any nice things
> and have to write a language that's fundamentally stuck in the 70s.
>
>
On IRC I mentioned this, but I wanted to bring it up here as well.  I work
on a python project that gets deployed on HPC clusters quite a lot, and we
have long been resistant to C++ in the codebase.  The reasons for this have
been similar to what Sean mentions (although I have not personally done
much with BG machines) and in particular are often related to things like
running on reduced OS nodes that don't have shared library loaders, etc etc.

This has in the past meant jumping through any number of hoops to
statically link a bunch of Python objects into a massive executable,
mangling zipimport, and so on.  Lately though, we have (in our project,
which may *not* translate elsewhere) found that these have not been issues;
for the most part, deployments are now on machines that don't restrict
nodes from having shared libraries, things are slightly more up to date, etc
etc.  C++11 support can be a little weak in places, but I don't have the
numbers to back that up.  Static linking can be really tricky with
codebases that mix compiler toolchains (e.g., gcc with icc), which has
caused us headaches previously.

But, as I mentioned to Sean on IRC, in our project we have opened up a
little bit to more C++ in the code base.  We're finding that the machines
are now much more up to date on compiler toolchains and we don't really
need to link statically anymore.  For the use cases hg is used in,
individuals will typically be running on "head" nodes, which are usually
much more feature rich anyway.

The biggest issues I have *personally* seen with mercurial on HPC centers
in the last year or two have been completely unrelated to mercurial.  These
come down to:

 * Old, out of date OpenSSL libraries or python installations
 * Bad filesystem performance for metadata lookups

I'll hold back from outright disagreeing with Sean (especially since IIRC
he's more experienced in the DOE HPC side of things, and I'm more
experienced in the NSF HPC side) but I don't think that from *my*
perspective C++ embedded in a Python project would be a big barrier to
adoption of hg on the HPC machines I use.  Although, come to think of it, I
use matplotlib on *every* machine I work on, and it has a large C++
component embedded in a Python project, and it has only been a problem when
running on compute nodes, which hg probably doesn't need to do.

-Matt


> Put another way: I'm totally open to having some way to do cleaner
> programming in C, but as it stands the tools we're able to use there
> are disastrously difficult and error-prone.
>
>
> >
> > The benefit is that we can use our C code for pypy. The downside is that
> > we introduce a cffi dependency.
> >
>
> http://doc.pypy.org/en/release-1.9/cppyy.html is a thing for pypy. I
> can't speak to its overall value or usability, but C++ does not
> preclude use of pypy in the future.
>
Angel Ezquerra - Feb. 9, 2016, 10:32 p.m.
On Tuesday, February 9, 2016, Matthew Turk <matthewturk@gmail.com>
wrote:

> Hi Augie,
>
> On Tue, Feb 9, 2016 at 2:01 PM, Augie Fackler <raf@durin42.com> wrote:
>
>> On Mon, Feb 08, 2016 at 03:55:59PM -0800, Sean Farley wrote:
>> >
>> > timeless <timeless@gmail.com> writes:
>> >
>> > > Sean Farley <sean@farley.io> wrote:
>> > >> Yes, this is precisely the experience I have with C++ and libc++ (yuck
>> > >> and gross incompatibility with libstdc++). It's not just Solaris,
>> > >> either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
>> > >> and you'll see all kinds of corner cases crop up.
>> > >
>> > > There are plenty of painful corner cases on Windows too.
>> > > And if you try hard enough you can get plenty of annoying corner cases
>> > > on Linux.
>> > >
>> > > Would a CFront style approach be good enough? Write in limited C++,
>> > > use llvm to convert it to C, commit the C++ as source and the
>> > > generated C and have the build system use the generated C?
>> >
>> > I'd rather not go down this route. It's a little too close to switching
>> > languages completely.
>> >
>> > By writing in C++ we are making it harder for us in the future to use
>> > pypy. I think we can all agree that writing Python C is annoying
>> > (opposed to just straight C). One way forward could be:
>> >
>> > - write standard C, creating a shared library
>> > - use cffi for binding to that
>> >
>>
>> Writing standard C is actually /worse/ than writing Python C. You have
>> even fewer nice things to work with (no map type out of the box, but
>> in Python C you can at least use PyDict). Part of the reason C++
>> appeals is precisely because it's a widely-available language with
>> better memory management tools (like smart pointers) and richer standard
>> tools available (STL collections come to mind).
>>
>> I don't dispute the idea that it'll make some things harder. I *am*
>> getting tired of the same old "HPC people can't get their shit
>> together" song and dance being the reason I can't have any nice things
>> and have to write a language that's fundamentally stuck in the 70s.
>>
>>
> On IRC I mentioned this, but I wanted to bring it up here as well.  I work
> on a python project that gets deployed on HPC clusters quite a lot, and we
> have long been resistant to C++ in the codebase.  The reasons for this have
> been similar to what Sean mentions (although I have not personally done
> much with BG machines) and in particular are often related to things like
> running on reduced OS nodes that don't have shared library loaders, etc etc.
>
> This has in the past led to needing to do any number of hoop-jumping to
> statically link a bunch of Python objects into a massive executable,
> mangling zipimport, and so on.  Lately though, we have (in our project,
> which may *not* translate elsewhere) found that these have not been issues;
> for the most part, deployments are now on machines that don't restrict
> nodes having shared libraries, things are slightly more up to date, etc
> etc.  C++11 support can be a little weak in places, but I don't have the
> numbers to back it up.  Statically linking can be really tricky with mixed
> codebases that sometimes share different compiler toolchains (i.e., mixing
> gcc with icc), which has caused us headaches previously.
>
> But, as I mentioned to Sean on IRC, in our project we have opened up a
> little bit to more C++ in the code base.  The machines we're finding are
> now much more up to date on compiler toolchains and we're not really
> needing to static link anymore.  For the use cases hg is used in, typically
> individuals will be running on "head" nodes which are usually much more
> feature rich anyway.
>
> The biggest issues I have *personally* seen with mercurial on HPC centers
> in the last year or two have been completely unrelated to mercurial.  These
> come down to:
>
>  * Old, out of date OpenSSL libraries or python installations
>  * Bad filesystem performance for metadata lookups
>
> I'll hold back from outright disagreeing with Sean (especially since IIRC
> he's more experienced in the DOE HPC side of things, and I'm more
> experienced in the NSF HPC side) but I don't think that from *my*
> perspective C++ embedded in a Python project would be a big barrier to
> adoption of hg on the HPC machines I use.  Although, come to think of it, I
> use matplotlib on *every* machine I work on, and it has a large C++
> component embedded in a Python project, and it has only been a problem when
> running on compute nodes, which hg probably doesn't need to do.
>
> -Matt
>
>
>> Put another way: I'm totally open to having some way to do cleaner
>> programming in C, but as it stands the tools we're able to use there
>> are disastrously difficult and error-prone.
>>
>>
>> >
>> > The benefit is that we can use our C code for pypy. The downside is that
>> > we introduce a cffi dependency.
>> >
>>
>> http://doc.pypy.org/en/release-1.9/cppyy.html is a thing for pypy. I
>> can't speak to its overall value or usability, but C++ does not
>> preclude use of pypy in the future.
>
>
Have you guys considered using some language that transpiles to C instead
of using C++?

I've recently been playing around with nim (http://nim-lang.org), which
compiles to C but has a much nicer syntax (it feels like a statically typed
Python) and a better standard library. It was surprisingly productive and
the resulting executable was as fast as I could have expected from a
program written in raw C or C++.

I'm not saying that using a niche language such as nim would be a good idea
for the mercurial project (nim has not hit version 1.0 yet, although it
will soon), but perhaps there are other ways to generate C code that could
be used (e.g. Cython, etc)?

Cheers,

Angel
Bryan O'Sullivan - Feb. 10, 2016, 12:39 a.m.
On Mon, Feb 8, 2016 at 1:33 PM, Sean Farley <sean@farley.io> wrote:

> When cloning code on IBM
> BlueGene it is annoying as hell to have random dependencies sneak in
> like this. For reference, IBM BlueGene has had problems with codes that
> use C++.
>

As much as I think Laurent has a burden of proof for how C++ helps with
productivity, I submit that you too owe a strong case for why the Mercurial
developer community should be hamstrung by support for a tiny
sub-population if said sub-population is permanently unable or unwilling to
keep up with the times.

This isn't a game of absolutes, and I'd rather have a discussion of the
trade-offs.
Augie Fackler - Feb. 10, 2016, 1:17 a.m.
On Tue, Feb 09, 2016 at 10:10:59AM -0800, Bryan O'Sullivan wrote:
> On Mon, Feb 8, 2016 at 1:13 PM, Laurent Charignon <lcharignon@fb.com> wrote:
>
> > This is a proposal to allow us to write C++ 11 extensions (in addition to C89),
> > for optional performance code.
> >
>
> Can you share an example of a realistic C++-based extension that
> demonstrates a significant improvement in readability and safety? It seems
> to me that for such a large change, there's a significant burden of proof
> that this proposal should meet.

I'll give this some thought. I'm on leave this week, but if I think of
something small I could rearrange using C++ I might give it a
shot. Laurent, I'm going to be a little hard to reach via irc, but
please feel encouraged to coordinate with me off-list if it'll help.

(In a perfect world, we'd be talking about something more like Rust
and not C++, but this ain't that world yet.)

Gregory Szorc - Feb. 10, 2016, 2:15 a.m.
On Tue, Feb 9, 2016 at 12:01 PM, Augie Fackler <raf@durin42.com> wrote:

> On Mon, Feb 08, 2016 at 03:55:59PM -0800, Sean Farley wrote:
> >
> > timeless <timeless@gmail.com> writes:
> >
> > > Sean Farley <sean@farley.io> wrote:
> > >> Yes, this is precisely the experience I have with C++ and libc++ (yuck
> > >> and gross incompatibility with libstdc++). It's not just Solaris,
> > >> either. Try anything that is not {Linux, Windows, Mac} / {gcc, clang}
> > >> and you'll see all kinds of corner cases crop up.
> > >
> > > There are plenty of painful corner cases on Windows too.
> > > And if you try hard enough you can get plenty of annoying corner cases
> > > on Linux.
> > >
> > > Would a CFront style approach be good enough? Write in limited C++,
> > > use llvm to convert it to C, commit the C++ as source and the
> > > generated C and have the build system use the generated C?
> >
> > I'd rather not go down this route. It's a little too close to switching
> > languages completely.
> >
> > By writing in C++ we are making it harder for us in the future to use
> > pypy. I think we can all agree that writing Python C is annoying
> > (opposed to just straight C). One way forward could be:
> >
> > - write standard C, creating a shared library
> > - use cffi for binding to that
> >
>
> Writing standard C is actually /worse/ than writing Python C. You have
> even fewer nice things to work with (no map type out of the box, but
> in Python C you can at least use PyDict). Part of the reason C++
> appeals is precisely because it's a widely-available language with
> better memory management tools (like smart pointers) and richer standard
> tools available (STL collections come to mind).
>

Something else to consider is the inevitable need to support Python 3,
whose C API is different enough from Python 2's to cause at least mild pain.

When you factor in the need to support PyPy, I think it is inevitable that
we end up rewriting the non-Python code to something that conforms to the C
ABI and then calling it from ctypes or cffi. This almost certainly means
ditching Python's C layer.

What we replace the Python C with, I don't know. There are standalone C
header files implementing the common data structures that C itself lacks;
using them would at least give us a partial illusion of C++. I'm also
attracted to the idea of languages like Rust and Go that compile to the C
ABI *and* have their own standard library to leverage. C++ certainly falls
in this camp. While I don't want esoteric environments having sway over
this discussion, there is truth that libstdc++ compatibility is a hornet's
nest. There's a reason C has persisted in popularity despite the numerous
advantages of C++ :/


>
>
> >
> > The benefit is that we can use our C code for pypy. The downside is that
> > we introduce a cffi dependency.
> >
>
> http://doc.pypy.org/en/release-1.9/cppyy.html is a thing for pypy. I
> can't speak to its overall value or usability, but C++ does not
> preclude use of pypy in the future.
>
Sean Farley - Feb. 10, 2016, 11:36 p.m.
Bryan O'Sullivan <bos@serpentine.com> writes:

> On Mon, Feb 8, 2016 at 1:33 PM, Sean Farley <sean@farley.io> wrote:
>
>> When cloning code on IBM
>> BlueGene it is annoying as hell to have random dependencies sneak in
>> like this. For reference, IBM BlueGene has had problems with codes that
>> use C++.
>>
>
> As much as I think Laurent has a burden of proof for how C++ helps with
> productivity, I submit that you too owe a strong case for why the Mercurial
> developer community should be hamstrung by support for a tiny
> sub-population if said sub-population is permanently unable or unwilling to
> keep up with the times.
>
> This isn't a game of absolutes, and I'd rather have a discussion of the
> trade-offs.

That's fair. And I agree.

I didn't so much mean that we should be held up by this subset, I meant
more of "here's one example of using C++ that backfired." There are
other examples, though. There are papercuts from C++'s name mangling.
Not to mention the hornet's nest with ABI compatibility. That issue is
still a problem for the Mac (try compiling boost with gcc). Summing all
of these issues does not instill confidence that we would do any better.

My main fear is having a negative user experience. DVCS, at its core, is
a system tool. Impeding a developer from using their familiar tools is a
serious sin.

So, what else is there?

- code generation?
  - pros: outputs C
  - cons: I have yet to see a code generator provide a good debugger

- cython?
  - pros: has a debugger, somewhat native code
  - cons: incompatible with pypy

- cffi?
  - pros: cleaner C, native code, pypy support
  - cons: need to include custom data types

- something else?

- something crazy?
Laurent Charignon - Feb. 11, 2016, 3:30 p.m.
Hi,

For the time being, and for the tree manifest, I will use C and reuse code from lazymanifest.
If the performance is not acceptable (because there is no hash table), I will look into hash tables in C that we could add to our project.
Writing the code in C seems like a noncontroversial way to proceed and to have someone review the changes :)

In this discussion we all seem to agree on one thing: **we will have to ditch the Python C layer in the near future**.
We don't know yet if we should (1) use cffi and ctypes to move toward pypy or (2) use cython.
(1) Implies rewriting our C layer to decouple it from the Python API.
(2) Implies ditching our C code and adding type hints to our performance-sensitive code in Python, correct?

I didn't really follow the discussions around pypy. When are we planning to support pypy?

Matt, what do you think about this discussion?

Thanks,

Laurent





Matt Mackall - Feb. 19, 2016, 9:06 p.m.
On Thu, 2016-02-11 at 15:30 +0000, Laurent Charignon wrote:
> Hi,
> 
> For the time being and for the tree manifest, I will use C and reuse code from
> lazy manifest.
> If the perf are not acceptable(because no hash table) I will look into hash
> tables in C that we could add to our project.
> Writing the code in C seems like a non controversial way to proceed and have
> someone review the changes :)
> 
> In this discussion we all seem to agree on one thing: **we will have to ditch
> the Python C layer in the near future**. 
> We don't know yet if we should (1) use cffi and ctypes to move toward pypy or
> (2) use cython.
> (1) Implies rewriting of our C layer to decouple it from the Python API.
> (2) Implies ditching our C code and adding type hints to our performance
> sensitive code in python, correct?
> 
> I didn't really follow the discussions around pypy, when are we planning to
> support pypy?
> 
> Matt, what do you think about this discussion?

I think we're going to find that using anything but the Python C layer is going
to have performance consequences we're not happy with, especially for building
large Python-native objects. That's been our usual strategy as it generally
means less C code and thus less maintenance pain. Prime example: parsing the
manifest.

But a strategy that we've used in a few places is:

a) build a native C object
b) wrap it as a Python object that mimics a native type like a list or dict
c) drop it into a place we use a native type

This requires a LOT more boilerplate but lets us do things like deferred parsing
and construction of Python's expensive boxed types. We've primarily done this
for the revlog index. The important thing to note here is that while the old
strategy gave us bare metal performance, the new strategy is even faster _by not
doing lots of work_, a situation that's made possible by owning the abstraction
in C.
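
To make steps (a) through (c) concrete, here is a minimal pure-Python sketch
of the idea. It is not Mercurial's revlog index code, which is a C type; the
class name `LazyIndex`, the 8-byte record layout, and the `decoded` counter
are all assumptions made up for illustration. The only point is that the
object behaves like a list while decoding a record the first time someone
asks for it.

```python
import struct
from collections.abc import Sequence

# Hypothetical fixed-size record: two big-endian unsigned 32-bit integers.
RECORD = struct.Struct(">II")


class LazyIndex(Sequence):
    """List-like view over packed records; entries are decoded on demand."""

    def __init__(self, data):
        if len(data) % RECORD.size:
            raise ValueError("data length is not a multiple of the record size")
        self._data = data
        self._cache = {}  # position -> decoded tuple
        self.decoded = 0  # how many records were actually unpacked

    def __len__(self):
        return len(self._data) // RECORD.size

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self[j] for j in range(*i.indices(len(self)))]
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        if i not in self._cache:
            self._cache[i] = RECORD.unpack_from(self._data, i * RECORD.size)
            self.decoded += 1
        return self._cache[i]


# Build 1000 packed records, then touch only one of them.
raw = b"".join(RECORD.pack(n, n * 2) for n in range(1000))
index = LazyIndex(raw)
last = index[-1]
```

Callers keep writing `index[i]` and `len(index)` as they would with a list,
and the 999 untouched records are never turned into Python objects.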

Unfortunately, this strategy looks like it doesn't work with cffi/ctypes/cython
because they can't do the all-important step (b). Which means:

1) we have to rewrite LOTS of Python code to replace x[foo] with somefunction(x,
foo)
2) we have to add explicit lifetime management of x
3) the Python-level overhead of function calls is way higher
4) there's probably significantly more type boxing/unboxing overhead

It's also worth mentioning that with the Python C API, we get to directly
control some significant aspects of Python garbage collection and threading that
have cut seconds off of runtime but aren't going to be accessible in other
models.

So I'm basically not thrilled with any of our alternatives here. But if someone
wants to experiment in this space, I'd recommend converting our index-parsing
code paths to cffi to get a good feel for the pain involved.

(C++ obviously has all of the above concerns.. plus C++. It's hard enough to get
the current requisite "free" VS2008 compiler in the hands of Windows developers,
and I'm pretty sure it doesn't do C++11.)
Siddharth Agarwal - Feb. 19, 2016, 10:29 p.m.
On 2/19/16 13:06, Matt Mackall wrote:
> On Thu, 2016-02-11 at 15:30 +0000, Laurent Charignon wrote:
>> Hi,
>>
>> For the time being and for the tree manifest, I will use C and reuse code from
>> lazy manifest.
>> If the perf are not acceptable(because no hash table) I will look into hash
>> tables in C that we could add to our project.
>> Writing the code in C seems like a non controversial way to proceed and have
>> someone review the changes :)
>>
>> In this discussion we all seem to agree on one thing: **we will have to ditch
>> the Python C layer in the near future**.
>> We don't know yet if we should (1) use cffi and ctypes to move toward pypy or
>> (2) use cython.
>> (1) Implies rewriting of our C layer to decouple it from the Python API.
>> (2) Implies ditching our C code and adding type hints to our performance
>> sensitive code in python, correct?
>>
>> I didn't really follow the discussions around pypy, when are we planning to
>> support pypy?
>>
>> Matt, what do you think about this discussion?
> I think we're going to find that using anything but the Python C layer is going
> to have performance consequences we're not happy with, especially for building
> large Python-native objects. That's been our usual strategy as it generally
> means less C code and thus less maintenance pain. Prime example: parsing the
> manifest.
>
> But a strategy that we've used in a few places is:
>
> a) build a native C object
> b) wrap it as a Python object that mimics a native type like a list or dict
> c) drop it into a place we use a native type
>
> This requires a LOT more boilerplate but lets us do things like deferred parsing
> and construction of Python's expensive boxed types. We've primarily done this
> for the revlog index. The important thing to note here is that while the old
> strategy gave us bare metal performance, the new strategy is even faster _by not
> doing lots of work_, a situation that's made possible by owning the abstraction
> in C.
>
> Unfortunately, this strategy looks like it doesn't work with cffi/ctypes/cython
> because it can't do the all-important step (b). Which means:
>
> 1) we have to rewrite LOTS of Python code to replace x[foo] with somefunction(x,
> foo)
> 2) we have to add explicit lifetime management of x
> 3) the Python-level overhead of function calls is way higher
> 4) there's probably significantly more type boxing/unboxing overhead
>
> It's also worth mentioning that with the Python C API, we get to directly
> control some significant aspects of Python garbage collection and threading that
> have cut seconds off of runtime but aren't going to be accessible in other
> models.
>
> So I'm basically not thrilled with any of our alternatives here. But if someone
> wants to experiment in this space, I'd recommend converting our index-parsing
> code paths to cffi to get a good feel for the pain involved.

For a possibly less involved example of this, it might be worth looking 
at the code that constructs dirstate tuples. These behave like Python 
tuples, except the individual elements are only materialized when requested.
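
A rough pure-Python sketch of that behaviour, for illustration only. The
class name `LazyTuple`, the four-field layout, and the `materialized` counter
are assumptions; the actual dirstate tuple code is written in C and is not
reproduced here.

```python
import struct


class LazyTuple:
    """Tuple-like record whose fields are decoded only when requested.

    Supports integer indices only; slicing is left out to keep this short.
    """

    # Hypothetical layout: (struct format, byte offset) for each field.
    _FIELDS = (("c", 0), (">i", 1), (">i", 5), (">i", 9))

    def __init__(self, packed):
        self._packed = packed
        self._values = {}      # field position -> decoded value
        self.materialized = 0  # number of fields decoded so far

    def __len__(self):
        return len(self._FIELDS)

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("field index out of range")
        if i not in self._values:
            fmt, offset = self._FIELDS[i]
            (self._values[i],) = struct.unpack_from(fmt, self._packed, offset)
            self.materialized += 1
        return self._values[i]


# Pack one made-up entry: state byte, mode, size, mtime.
packed = struct.pack(">ciii", b"n", 0o644, 12, 1454965979)
entry = LazyTuple(packed)
size = entry[2]  # decodes only the third field
```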

- Siddharth

>
> (C++ obviously has all of the above concerns.. plus C++. It's hard enough to get
> the current requisite "free" VS2008 compiler in the hands of Windows developers,
> and I'm pretty sure it doesn't do C++11.)
>

Patch

diff --git a/mercurial/pybindtest.cpp b/mercurial/pybindtest.cpp
new file mode 100644
--- /dev/null
+++ b/mercurial/pybindtest.cpp
@@ -0,0 +1,15 @@ 
+#include <pybind11/pybind11.h>
+#include "util.h"
+
+int add(int i, int j) {
+    return i + j;
+}
+
+namespace py = pybind11;
+
+PYBIND11_PLUGIN(pybindtest) {
+    py::module m("pybindtest", "pybind11 example plugin");
+    m.def("add", &add, "A function which adds two numbers");
+
+    return m.ptr();
+}
diff --git a/mercurial/util.h b/mercurial/util.h
--- a/mercurial/util.h
+++ b/mercurial/util.h
@@ -157,6 +157,8 @@  enum normcase_spec {
 };
 
 #define MIN(a, b) (((a)<(b))?(a):(b))
+/* C++ defines bool, we don't want to redefine it */
+#ifndef __cplusplus
 /* VC9 doesn't include bool and lacks stdbool.h based on my searching */
 #if defined(_MSC_VER) || __STDC_VERSION__ < 199901L
 #define true 1
@@ -165,5 +167,6 @@  typedef unsigned char bool;
 #else
 #include <stdbool.h>
 #endif
+#endif /* __cplusplus */
 
 #endif /* _HG_UTIL_H_ */
diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -610,6 +610,16 @@  datafiles = []
 setupversion = version
 extra = {}
 
+pybindpath = os.environ.get("HGPYBIND11INCLUDEPATH", None)
+if pybindpath:
+    cppext = [
+              Extension('mercurial.pybindtest', ['mercurial/pybindtest.cpp',],
+              language="c++",
+              include_dirs = [pybindpath],
+              extra_compile_args=['-std=c++11'])
+    ]
+    extmodules.extend(cppext)
+
 if py2exeloaded:
     extra['console'] = [
         {'script':'hg',