3.4. Performance
3.4.1. Measuring performance
The performance of the py-pde package depends on many details and general
statements are thus difficult to make.
However, since the core operators are just-in-time compiled using numba,
many operations of the package run at speeds close to those of most compiled
languages.
For instance, a simple Laplace operator applied to fields defined on a Cartesian
grid has performance that is similar to the operators supplied by the popular
OpenCV package.
The following figures illustrate this by showing the duration of evaluating the
Laplacian on grids with an increasing number of support points for
two different boundary conditions (lower duration is better):
Note that the call overhead is lower in the py-pde package, so that the
performance on small grids is particularly good.
However, realistic use-cases probably need more complicated operations and it is
thus always necessary to profile the respective code.
This can be done using the function estimate_computation_speed() or the
traditional timeit, profile, or even more sophisticated profilers like
pyinstrument.
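As a minimal sketch of the timeit approach, the following harness times an operator on grids of increasing size. The pure-Python `laplace_2d` is only a stand-in so that the example runs without extra dependencies, and `time_operator` is a hypothetical helper that is not part of py-pde; in practice one would pass the py-pde operation of interest instead.

```python
import timeit


def laplace_2d(data):
    """Pure-Python 5-point Laplacian with periodic boundaries (stand-in operator)."""
    n, m = len(data), len(data[0])
    return [
        [
            data[(i - 1) % n][j] + data[(i + 1) % n][j]
            + data[i][(j - 1) % m] + data[i][(j + 1) % m]
            - 4 * data[i][j]
            for j in range(m)
        ]
        for i in range(n)
    ]


def time_operator(func, arg, repeat=3, number=5):
    """Return the best per-call duration of func(arg) in seconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(arg))
    # Taking the minimum over several repeats reduces noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for size in (8, 32, 128):
        grid = [[float(i * j) for j in range(size)] for i in range(size)]
        duration = time_operator(laplace_2d, grid)
        print(f"{size:4d} x {size:<4d}: {duration:.2e} s per call")
```

When timing numba-compiled code this way, it is worth calling the function once before measuring, since the first call includes the just-in-time compilation.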
3.4.2. Improving performance
Factors influencing the performance of the package include the compiler used for
numpy, scipy, and of course numba.
Moreover, the BLAS and LAPACK libraries might make a difference.
The package has some basic support for multithreading, which can be accelerated
using the Threading Building Blocks library.
Finally, it can help to install the Intel short vector math library (SVML).
However, this is not distributed with MacPorts and might thus be more
difficult to enable.
Using MacPorts, one could for instance install the following variants of typical packages:
port install py37-numpy +gcc8+openblas
port install py37-scipy +gcc8+openblas
port install py37-numba +tbb