Unless you are doing machine learning or using numpy, I do not recommend using Python for anything performance sensitive. The problem is not just the GIL. Because multithreading is not common in Python, it's really hard to know whether some external library is thread-safe. Python also supports async, but a lot of libraries have no asyncio compatibility, so you end up mixing threads with asyncio, which leads to a big mess.
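To be fair, the stdlib does give you a standard bridge for the threads-plus-asyncio mix: `asyncio.to_thread` (Python 3.9+) pushes a blocking library call onto a worker thread so the event loop stays responsive. A minimal sketch, with `blocking_call` standing in for any synchronous library function:

```python
import asyncio
import time

def blocking_call(x):
    # Stand-in for a library call with no asyncio support
    # (e.g. a synchronous database driver).
    time.sleep(0.1)
    return x * 2

async def main():
    # asyncio.to_thread runs each blocking call in a worker thread,
    # so the three calls overlap instead of blocking the event loop.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_call, i) for i in range(3))
    )

if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 2, 4]
```

It works, but you're still juggling two concurrency models in one program, which is the mess being complained about.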
> I do not recommend anyone use python for anything performance sensitive
My default philosophy is to use python _until_ you find something that is performance sensitive, and then make a C/C++ extension for the slow bits. Pybind works great for a hybrid Python/C++ codebase (https://pybind11.readthedocs.io/en/stable/).
Then you can develop and prototype much quicker w/ Python but re-write the slow parts in C++.
Definitely more of a judgement call when threading and function call overhead enter the equation, but I've found this hybrid "99% of the time Python, 1% C++ when needed" setup works great. And it's typically easier for me to eventually port mature code to C++/Go/etc once I've fleshed it out in Python and hit all the design snags.
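Step zero of that workflow is finding the 1% worth porting. A stdlib sketch with `cProfile` (function names here are hypothetical, just to show the shape):

```python
import cProfile
import io
import pstats

def slow_inner(n):
    # Hypothetical hot spot: the kind of tight loop you would
    # later move to a C++ extension.
    return sum(i * i for i in range(n))

def pipeline():
    # Hypothetical driver: mostly cheap control logic plus one hot call.
    total = 0
    for _ in range(50):
        total += slow_inner(10_000)
    return total

profiler = cProfile.Profile()
result = profiler.runcall(pipeline)

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())  # slow_inner should dominate cumulative time
```

If the profile shows one function eating the runtime, that's your pybind candidate; if the time is smeared evenly across everything, a rewrite of just the "slow bits" won't save you.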
If you've never used Pybind before these pybind tests[1] and this repo[2] have good examples you can crib to get started (in addition to the docs). Once you handle passing/returning/creating the main data types (list, tuple, dict, set, numpy array) the first time, then it's mostly smooth sailing.
Pybind offers a lot of functionality, but the core "good parts" I've found useful are (a) using a numpy array in Python and passing it to a C++ method to work on, (b) passing your Python data structure to pybind and doing the work on it in C++ (some copy overhead), and (c) making a class/struct in C++ and exposing it to Python (no copying overhead, and you can create nice cache-aware structs, etc.).
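A real pybind demo needs a C++ toolchain, but the same pattern (a) describes, handing a contiguous array to native code and letting it do the work, can be sketched with nothing but the stdlib's `ctypes` and libc's `qsort`. This is not pybind, just the cheapest possible illustration of the Python-to-native boundary; it assumes a POSIX platform where `CDLL(None)` exposes libc symbols:

```python
import ctypes

# Load the process's own symbol table; on Linux/macOS this exposes
# libc functions such as qsort (this trick won't work on Windows).
libc = ctypes.CDLL(None)

# Signature of the C comparator: int (*)(const void *, const void *)
CmpFunc = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)

def py_cmp(a, b):
    # Python callback invoked from C once per comparison.
    return a[0] - b[0]

# A contiguous C array of ints -- no boxing, no per-element objects.
values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), CmpFunc(py_cmp))
print(list(values))  # [1, 5, 7, 33, 99]
```

Pybind gives you the same boundary with far nicer ergonomics (automatic conversions for list/tuple/dict/set, numpy buffer protocol support), which is why it's worth the build setup for anything non-trivial.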
You're still kinda stuck with the concurrency of the Python code itself, though. It sure would be nice to be able to just throw cores at problems for a while.
Sure, and if you can't get stuff in and out of Python objects with concurrency, it doesn't help you much a lot of the time. Plus, again, computing is cheap: it'd be nice to use all my cores before I spend a lot of effort optimizing and rewriting things in native code.
If your data is fragmented across a bunch of small containers/classes, passing it around will be expensive whichever method you use (either passing it to C++, or just in terms of cache efficiency).
If you just pass an array of data back and forth it's cheap.
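The difference is visible with nothing but the stdlib: a list of boxed ints is a pile of separate heap objects, while `array.array` is one contiguous buffer that a `memoryview` can slice without copying (the same zero-copy idea that makes handing a numpy array across the C++ boundary cheap). A small sketch:

```python
import array
import sys

n = 100_000

# Fragmented: a list of boxed ints -- each element is a separate heap
# object, so iteration chases pointers and copies are expensive.
boxed = list(range(n))

# Flat: array.array("q") stores raw 8-byte machine ints contiguously.
flat = array.array("q", range(n))

list_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
flat_bytes = sys.getsizeof(flat)
print(f"boxed list: ~{list_bytes} bytes, flat array: ~{flat_bytes} bytes")

# A memoryview slices the underlying buffer without copying anything.
view = memoryview(flat)[10:20]
print(view.tolist())  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

The boxed version costs several times the memory and none of it is contiguous, which is exactly the "fragmented across a bunch of small containers" case above.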
> If you just pass an array of data back and forth it's cheap.
Yes, and numpy is great, and all. Python works great as glue to marshal things to and from native code and do inexpensive (but possibly complicated) bits of control logic.
But if I'm trying to deal with large numbers of client requests, say... the lack of concurrency in python itself really hurts. Sure, I can punt almost everything to native code, but what's the point in having Python at all, then?
Not all problems have state that can be shared well across multiprocessing or completely externalized to large lumps that travel to native code in a few calls -- I'd actually say these are more the exception than the rule.
> Note that having a solution setup where the end result is "a ton of small, individual API calls" could possibly indicate a bad system architecture.
Or just a lot of clients with a fair bit of shared state which is best kept resident, which is a pretty common use case.
It's a bummer to write Python code that works well, then max out at 130% CPU load as your usage grows, with no obvious path to scale upward despite having 32 hardware threads available. At that point you can rewrite some of the more expensive parts in native code to squeeze out a little more performance, or add indirection to store the data somewhere else so multiprocessing works.
Other languages with more fine-grained locking scale 3-4x higher with minimal thought, and much, much higher with a bit of thought about how to handle locking and the data model.
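For completeness, the standard way to throw cores at CPU-bound Python today is process-based parallelism, where each worker has its own interpreter and its own GIL. A minimal sketch with `concurrent.futures`:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Pure-Python CPU-bound work: threads would serialize on the GIL,
    # but each worker process here has its own interpreter and GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [100_000] * 8
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = list(pool.map(cpu_bound, inputs))
    print(len(results), "tasks done")
```

The catch is the one already raised above: arguments and results are pickled across the process boundary, so shared mutable state, or data fragmented across many small objects, makes this painful fast.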
> At that point you'd look to Go or another language
Well, yah... this is us complaining about Python's concurrency problems.
I think the question is how much more it costs to move the code from Python to C/C++/Rust/whatever. That's a human problem, until ChatGPT can solve it for you.
And if you are using, for example, Numpy, you aren't using Python itself for anything performance sensitive, because Numpy is almost certainly calling the system's tuned BLAS implementation, which should handle the parallelism, I guess. If anything, I'd expect parallel Python calls to Numpy to result in oversubscription…
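The usual fix for that oversubscription is to cap the BLAS library's own thread pool before parallelizing at the Python level. Which environment variable matters depends on the BLAS backend numpy was built against, and it must be set before numpy (and hence the BLAS library) is first imported; the sketch below sets the common ones and deliberately stops short of the numpy import itself:

```python
import os

# Cap native BLAS threading *before* numpy is imported; once the BLAS
# library initializes, changing these has no effect. Which one matters
# depends on the backend numpy was built against:
for var in (
    "OMP_NUM_THREADS",       # OpenMP-based BLAS builds (and much else)
    "MKL_NUM_THREADS",       # Intel MKL
    "OPENBLAS_NUM_THREADS",  # OpenBLAS
):
    os.environ[var] = "1"

# With BLAS pinned to one thread per process, you can parallelize at
# the Python level (e.g. one process per core) without oversubscribing.
# import numpy as np   # <- must come after the env vars are set
print(os.environ["MKL_NUM_THREADS"])  # 1
```

If you need to change limits at runtime instead, the third-party `threadpoolctl` package can adjust an already-loaded BLAS, but the env-var approach is the dependency-free version.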
I don't think numpy functions are necessarily multithreaded, and many are probably inherently sequential by nature, so there are definitely cases where multiprocessing can speed up the overall program.
Someone once said that Python + numpy is probably going to be faster than writing the same thing in plain C++, since numpy uses highly tuned libraries underneath.
I don't know for certain this is the case, but I'd like to see some benchmarks about it.
You would almost never use raw C++ when working with linear algebra stuff. You use a library like Eigen that interfaces with BLAS, LAPACK, etc., so you definitely get all the advantages of those highly tuned libraries, plus the speed of C++ and potential flexibility of not having to make multiple array copies and so on.
They aren’t necessarily threaded, but if you care about Numpy performance on an Intel chip, at least, you are already using MKL as Numpy’s BLAS, and MKL’s gemm is threaded.
Multiplying a large enough matrix in Python using MKL for Numpy, I can watch the cpu usage go to 400% in top. You may need to run it in a loop or make the matrices quite large, a surprisingly large amount of computation has to happen before it’ll show up in top.