
Exploring Python-based frameworks for geophysical modeling

Tuomas Karna
Intel Corporation
Talk
The majority of ocean and atmosphere modelling is still carried out with low-level model implementations, such as Fortran codes with manually implemented MPI parallelization. With the success of GPUs and other accelerators, it has become imperative to seek alternative frameworks that offer both high-level, flexible developer APIs and high computational performance across a range of target hardware.
Python-based frameworks, such as PyTorch and TensorFlow, have been very successful in the machine learning (ML) domain. Similar high-level modelling frameworks are still lacking in the traditional scientific computing space. One benefit of using ML frameworks for scientific computing is that they offer automatic differentiation (autograd), which permits inverse modelling applications with minimal code changes.
In this talk we use simple ocean modelling test cases implemented in different Python-based frameworks, such as NumPy, Numba, PyTorch, and JAX, to assess their suitability for ocean modelling. We assess both how well each API expresses the required physical operators and the computational performance on CPUs and GPUs. Implementations of the 2D wave equation and the shallow water equations form the basis of our analysis. Both structured and unstructured mesh models are considered.
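To give a flavour of what such a test case can look like, here is a minimal NumPy sketch of the linear 2D wave equation. It is not the implementation used in the talk. The staggered (C-grid) layout, the periodic boundaries, the forward-backward Euler time stepping, and all names and parameter values (`step`, `run`, `g`, `H`, the Gaussian initial bump) are assumptions made for illustration.

```python
import numpy as np

g = 9.81   # gravitational acceleration [m/s^2]
H = 10.0   # resting water depth [m], assumed constant


def step(eta, u, v, dx, dy, dt):
    """Advance the linear 2D wave equation by one forward-backward Euler step.

    eta lives at cell centres, u on the east faces, v on the north faces.
    The domain is periodic, so np.roll supplies the neighbouring values.
    """
    # Continuity: d(eta)/dt = -H * div(u, v), evaluated at cell centres.
    div = (u - np.roll(u, 1, axis=0)) / dx + (v - np.roll(v, 1, axis=1)) / dy
    eta = eta - dt * H * div
    # Momentum: d(u, v)/dt = -g * grad(eta), using the updated elevation.
    u = u - dt * g * (np.roll(eta, -1, axis=0) - eta) / dx
    v = v - dt * g * (np.roll(eta, -1, axis=1) - eta) / dy
    return eta, u, v


def run(n=64, length=1000.0, nsteps=100, cfl=0.5):
    """Propagate a Gaussian bump and return the initial and final elevation."""
    dx = dy = length / n
    # With dx == dy this scheme is stable for cfl up to 1/sqrt(2).
    dt = cfl * dx / np.sqrt(g * H)
    x = (np.arange(n) + 0.5) * dx
    xx, yy = np.meshgrid(x, x, indexing="ij")
    r2 = (xx - 0.5 * length) ** 2 + (yy - 0.5 * length) ** 2
    eta = np.exp(-r2 / (2.0 * 50.0 ** 2))
    u = np.zeros_like(eta)
    v = np.zeros_like(eta)
    eta0 = eta.copy()
    for _ in range(nsteps):
        eta, u, v = step(eta, u, v, dx, dy, dt)
    return eta0, eta
```

Calling `eta0, eta = run()` propagates the bump outward. Because the discrete divergence telescopes on a periodic grid, the sum of `eta` should stay constant up to rounding error, which makes a convenient sanity check when porting the same operators to another framework.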
Targeting large-scale simulations, we focus especially on distributed-memory capability (multi-node CPU/GPU configurations), which is currently lacking in many of the Python frameworks. We compare the Python-based implementations against existing ocean models and other state-of-the-art implementations.
To illustrate distributed computing in Python, we include the proof-of-concept Sharded Array for Python (https://github.com/IntelPython/sharded-array-for-python) package in the analysis. It implements the Array API, a subset of the NumPy API, with the aim of automating the scale-out of NumPy-like codes. It uses an MLIR-based just-in-time compiler to transparently distribute and parallelize array computation across distributed-memory systems using MPI. Because parallelization is handled automatically, the model developer only implements serial NumPy-like code.
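The "serial NumPy-like code" idea can be sketched by writing a solver against an array namespace passed in as an argument, here called `xp`. The example below is a hypothetical 1D wave equation in a closed channel that uses only array creation, elementwise arithmetic, slicing, and a sum reduction. NumPy stands in for the namespace; whether the Sharded Array package accepts this exact code has not been verified, and the names `wave_step_1d` and `run` are made up for illustration.

```python
import numpy as np


def wave_step_1d(eta, u, dx, dt, g, depth):
    """One forward-backward step of the 1D wave equation in a closed channel.

    eta holds n cell-centre values and u holds n + 1 face values. Only the
    interior faces of u are updated (in place), so the wall values u[0] and
    u[-1] stay at zero.
    """
    eta = eta - dt * depth * (u[1:] - u[:-1]) / dx
    u[1:-1] = u[1:-1] - dt * g * (eta[1:] - eta[:-1]) / dx
    return eta, u


def run(xp, n=200, length=1000.0, nsteps=100, cfl=0.5, g=9.81, depth=10.0):
    """Run the model with the array namespace xp (NumPy is the stand-in here)."""
    dx = length / n
    dt = cfl * dx / (g * depth) ** 0.5
    x = xp.linspace(0.5 * dx, length - 0.5 * dx, n)
    eta = xp.exp(-((x - 0.5 * length) ** 2) / (2.0 * 50.0 ** 2))
    u = xp.zeros(n + 1)
    mass0 = float(xp.sum(eta))
    for _ in range(nsteps):
        eta, u = wave_step_1d(eta, u, dx, dt, g, depth)
    return mass0, float(xp.sum(eta)), u


mass0, mass1, u = run(np)
```

Nothing in the solver refers to ranks, halos, or message passing. In this style, any distribution of the arrays would have to come entirely from the namespace that is passed in as `xp`.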
Our preliminary results suggest that high-level Python frameworks can be used to implement at least a subset of geophysical applications with acceptable performance and scalability.