3.4. Performance

3.4.1. Measuring performance

The performance of the py-pde package depends on many details and general statements are thus difficult to make. However, since the core operators are just-in-time compiled using numba, many operations of the package proceed at performances close to most compiled languages. For instance, a simple Laplace operator applied to fields defined on a Cartesian grid has performance that is similar to the operators supplied by the popular OpenCV package. The following figures illustrate this by showing the duration of evaluating the Laplacian on grids of increasing number of support points for two different boundary conditions (lower duration is better):

../_images/performance_periodic.png ../_images/performance_noflux.png

Note that the call overhead is lower in the py-pde package, so that the performance on small grids is particularly good. However, realistic use-cases probably need more complicated operations and it is thus always necessary to profile the respective code. This can be done using the function estimate_computation_speed() or the traditional timeit, profile, or even more sophisticated profilers like pyinstrument.

3.4.2. Improving performance

Factors influencing the performance of the package include the compiler used for numpy, scipy, and of course numba. Moreover, the BLAS and LAPACK libraries might make a difference. The package has some basic support for multithreading, which can be accelerated using the Threading Building Blocks library. Finally, it can help to install the intel short vector math library (SVML). However, this is not distributed with macports and might thus be more difficult to enable.

Using macports, one could for instance install the following variants of typical packages

port install py37-numpy +gcc8+openblas
port install py37-scipy +gcc8+openblas
port install py37-numba +tbb