The current trend, which actually started 20 years ago with the advent of the first modern supercomputers and of the MPI standard, is to compute larger systems in shorter times. The motivations for this are essentially threefold:
Nowadays, supercomputers are built on architectures comprising anywhere from a few hundred to a few hundred thousand cores, which puts unprecedented computing power at the disposal of researchers. And the power of supercomputers will keep growing in the future, at a pace of doubling every 18 months. This is a formidable opportunity to extend our understanding of the world around us and, for fluid dynamicists, a powerful means to better understand the intricate dynamics of fluid flows.
Though the majority of parallel Fluid Mechanics codes are implemented with the classical MPI standard, recent advances in computer hardware are worth noting:
(a) JUQUEEN, the 458,752-core IBM Blue Gene supercomputer at Jülich
(b) ENER110, the 6,048-core Bull supercomputer at IFPEN
Fig 1. Supercomputers: (a) the largest European supercomputer in Germany, and (b) our more modest local supercomputer at IFPEN-Lyon
The PeliGRIFF team would like to acknowledge GENCI (Grand Equipement National de Calcul Intensif) for granting us access, every year since 2011 through the DARI selection process, to the computing resources of OCCIGEN, the supercomputer of CINES located in Montpellier in the south of France, and of CURIE, the supercomputer of TGCC (CEA) located in Bruyères-le-Châtel in the southern suburbs of Paris.
PeliGRIFF and its granular dynamics module Grains3D are fully parallel. They both use a classical domain decomposition technique and the MPI standard to implement inter-processor communications. PeliGRIFF is built on the Pelicans platform, developed and maintained by IRSN Cadarache, France, and freely available under the CeCILL-C licence. Pelicans is essentially a C++ application framework with a set of integrated reusable components, designed to simplify the task of developing applications of numerical mathematics and scientific computing, particularly those concerning partial differential equations and initial boundary value problems (text taken from the Pelicans documentation). A plug-in mechanism makes it possible to couple PeliGRIFF with various granular dynamics solvers. For our own use at IFPEN, it is coupled with our in-house DEM solver Grains3D (see Figure 2).
Fig 2. PeliGRIFF code structure
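To make the plug-in idea more concrete, here is a minimal Python sketch of how a fluid solver can talk to interchangeable granular solvers through an abstract interface and a name-based registry. Everything in it is hypothetical: `GranularSolver`, `register`, `make_solver` and the toy `ToyDEM` plug-in are illustrative names, not the actual PeliGRIFF, Pelicans or Grains3D API (which is written in C++), and the toy solver ignores particle contacts entirely.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence, Type

# Hypothetical names throughout: this is not the PeliGRIFF/Pelicans API.


class GranularSolver(ABC):
    """What the fluid solver needs from any coupled granular (DEM) solver."""

    @abstractmethod
    def positions(self) -> List[List[float]]:
        """Current particle centres, one [x, y, z] list per particle."""

    @abstractmethod
    def velocities(self) -> List[List[float]]:
        """Current particle translational velocities."""

    @abstractmethod
    def advance(self, dt: float, hydro_forces: Sequence[Sequence[float]]) -> None:
        """Advance the particles over dt under the given hydrodynamic forces."""


_PLUGINS: Dict[str, Type[GranularSolver]] = {}


def register(name: str) -> Callable[[Type[GranularSolver]], Type[GranularSolver]]:
    """Class decorator that adds a solver implementation to the plug-in table."""
    def decorator(cls: Type[GranularSolver]) -> Type[GranularSolver]:
        _PLUGINS[name] = cls
        return cls
    return decorator


def make_solver(name: str, **kwargs) -> GranularSolver:
    """Instantiate a registered plug-in by name (e.g. a name read from an input file)."""
    if name not in _PLUGINS:
        raise ValueError(f"unknown granular solver plug-in: {name!r}")
    return _PLUGINS[name](**kwargs)


@register("toy-dem")
class ToyDEM(GranularSolver):
    """Toy plug-in: point particles under gravity, no contacts (illustration only)."""

    def __init__(self, positions, mass=1.0, gravity=(0.0, 0.0, -9.81)):
        self._x = [list(p) for p in positions]
        self._v = [[0.0, 0.0, 0.0] for _ in self._x]
        self._mass = mass
        self._gravity = gravity

    def positions(self):
        return [p[:] for p in self._x]

    def velocities(self):
        return [v[:] for v in self._v]

    def advance(self, dt, hydro_forces):
        for x, v, f in zip(self._x, self._v, hydro_forces):
            for d in range(3):
                # Explicit velocity update from gravity plus hydrodynamic force.
                v[d] += dt * (self._gravity[d] + f[d] / self._mass)
                # Position update with the new velocity (symplectic Euler).
                x[d] += dt * v[d]


# The fluid side only ever manipulates the abstract interface.
solver = make_solver("toy-dem", positions=[(0.0, 0.0, 1.0)], mass=2.0)
solver.advance(0.1, hydro_forces=[(0.0, 0.0, 0.0)])
print(solver.positions(), solver.velocities())
```

In this design the fluid side depends only on `GranularSolver`, so swapping one DEM code for another amounts to registering a new class and changing the name passed to `make_solver`.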
The governing equations in the granular dynamics code Grains3D are solved with an explicit time algorithm, which means that there are no matrices to invert. In theory, such a numerical method should scale well, provided the load is reasonably balanced. Surprisingly, in the initial stages of parallelizing the code (which was previously serial), we struggled to achieve acceptable parallel efficiency. We found out that this was not related to any MPI issue but to memory access and management on multi-core processors and nodes. Hence, we had to rethink the serial architecture, limiting dynamic memory allocation and deallocation as much as possible and promoting good data alignment wherever possible. Fortunately, these efforts resulted in a huge improvement of the parallel efficiency of Grains3D: on reasonably well load-balanced configurations, Grains3D now exhibits a weak scaling efficiency ranging from 0.5 in the worst cases to 0.95 in the most favorable ones.
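The sketch below illustrates, under simplifying assumptions, the two ideas of the previous paragraph: an explicit update that needs no matrix inversion, and particle data kept in contiguous, preallocated arrays (a structure-of-arrays layout) that are overwritten in place at every step instead of being created and destroyed. It is not Grains3D code: the names `ParticleArrays` and `explicit_step` are hypothetical, particles only interact with a floor at `z = 0` through a linear spring-dashpot contact, all parameter values are made up, and Python's `array` module can only hint at the cache and alignment effects that matter in C++.

```python
from array import array

# Illustrative sketch only (not Grains3D): vertical motion of particles that
# touch a floor at z = 0 through a linear spring-dashpot contact.


class ParticleArrays:
    """Structure-of-arrays storage allocated once; nothing is allocated per step."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.n = 0
        # Contiguous, typed buffers of doubles, zero-initialised once.
        self.z = array("d", [0.0]) * capacity
        self.vz = array("d", [0.0]) * capacity
        self.fz = array("d", [0.0]) * capacity

    def add(self, z0: float) -> int:
        if self.n == self.capacity:
            raise RuntimeError("capacity reached: pre-size the store instead of growing it")
        self.z[self.n] = z0
        self.vz[self.n] = 0.0
        self.n += 1
        return self.n - 1


def explicit_step(p, dt, mass=1.0, radius=0.05, k=1.0e4, c=100.0, g=9.81):
    """One explicit step: forces from the current state, then symplectic Euler."""
    z, vz, fz = p.z, p.vz, p.fz
    for i in range(p.n):  # 1) forces, overwriting the force buffer in place
        fz[i] = -mass * g
        overlap = radius - z[i]
        if overlap > 0.0:
            fz[i] += k * overlap - c * vz[i]
    for i in range(p.n):  # 2) update: no matrix assembly, no linear solve
        vz[i] += dt * fz[i] / mass
        z[i] += dt * vz[i]


store = ParticleArrays(capacity=4)
store.add(1.0)
store.add(0.5)
for _ in range(30_000):  # 3 s of physical time at dt = 1e-4
    explicit_step(store, dt=1.0e-4)
print(list(store.z[:store.n]))
```

Once the particles come to rest, each sits at the static overlap where the spring balances gravity, `z = radius - mass * g / k`. The time step is kept well below the contact time scale `sqrt(mass / k)`, as usual for explicit soft-sphere DEM.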
On the fluid side, we implemented both Finite Element and Finite Volume/Staggered Grid schemes, together with an operator-splitting time algorithm. The resulting discretized systems are stored in distributed matrices and vectors. The linear algebra relies on PETSc for distributed matrices, vectors and linear system solvers, and PETSc is coupled to HYPRE to give access to efficient Algebraic MultiGrid (BoomerAMG) and incomplete LU preconditioners. The most expensive part in terms of computing time is the pressure Laplacian linear system involved in the L2 projection step that enforces mass conservation: since it contains no transient term, it is poorly conditioned. For this particular system, we employ the BoomerAMG parallel preconditioner of HYPRE with the PMIS/HMIS coarsening type and the ext+i interpolation formula, and usually obtain good weak scaling on jobs of up to 512 million cells on 1,000 cores (see Figure 3). Jobs with up to a few billion cells on larger numbers of cores run with a weak scaling efficiency above 0.5, which is deemed quite good for this type of linear system.
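As an illustration, the BoomerAMG settings quoted above map onto standard PETSc runtime options. The helper `boomeramg_options` below is a hypothetical convenience function and the relative tolerance is an arbitrary placeholder; only the option names and the `hypre`, `boomeramg`, `HMIS`/`PMIS` and `ext+i` values are standard PETSc and hypre settings. The Krylov method is deliberately left out because the text does not say which one is used.

```python
from typing import List

# Hypothetical helper (not part of PETSc): builds PETSc runtime options for a
# BoomerAMG-preconditioned pressure solve. The option names are standard PETSc ones.


def boomeramg_options(coarsen_type: str = "HMIS", interp_type: str = "ext+i",
                      rtol: float = 1.0e-8, prefix: str = "") -> List[str]:
    """Return a flat ['-option', 'value', ...] list; prefix mimics a KSP options prefix."""
    if coarsen_type not in ("PMIS", "HMIS"):
        raise ValueError("this sketch only covers PMIS/HMIS coarsening")
    settings = [
        ("pc_type", "hypre"),                               # external hypre preconditioner
        ("pc_hypre_type", "boomeramg"),                     # namely BoomerAMG
        ("pc_hypre_boomeramg_coarsen_type", coarsen_type),  # parallel coarsening scheme
        ("pc_hypre_boomeramg_interp_type", interp_type),    # extended+i interpolation
        ("ksp_rtol", repr(rtol)),                           # placeholder tolerance (assumption)
    ]
    options: List[str] = []
    for name, value in settings:
        options += [f"-{prefix}{name}", value]
    return options


print(" ".join(boomeramg_options(prefix="pressure_")))
```

The resulting list can be passed on the command line of a PETSc-based executable; when the pressure solver has an options prefix (set with `KSPSetOptionsPrefix`), the `prefix` argument prepends it to every option name.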
Fig 3. Weak scaling test with PeliGRIFF on a classical 3D lid-driven cavity problem on Jade (the supercomputer of CINES, Montpellier, France): 512,000 cells per core, up to 512,000,000 cells on 1,000 cores
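The problem sizes in this weak scaling test follow from simple arithmetic: 512,000,000 cells on 1,000 cores is 512,000 cells per core, i.e. an 80^3 block per process. The sketch below assumes, for illustration only, cubic sub-domains on a cubic p x p x p process grid (the actual decomposition of the test may differ), and defines the weak scaling efficiency as the reference time per step divided by the time per step on N cores with the same load per core. The function names are hypothetical.

```python
from typing import List, Tuple


def cubic_weak_scaling_series(cells_per_core: int,
                              cores_per_edge: List[int]) -> List[Tuple[int, int, int]]:
    """(cores, global cells, cells per global edge) for p x p x p process grids,
    each process owning a cubic block of cells_per_core cells (assumption)."""
    block_edge = round(cells_per_core ** (1.0 / 3.0))
    if block_edge ** 3 != cells_per_core:
        raise ValueError("cells_per_core is not a perfect cube")
    return [(p ** 3, (p * block_edge) ** 3, p * block_edge) for p in cores_per_edge]


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """E = T_ref / T_N at fixed work per core; 1.0 means perfect weak scaling."""
    return t_ref / t_n


for cores, cells, edge in cubic_weak_scaling_series(512_000, [1, 2, 4, 8, 10]):
    print(f"{cores:5d} cores  {cells:>13,d} cells  ({edge}^3 grid)")

# Arithmetic only, not a measurement: if the time per step doubles with respect
# to the reference run, the efficiency is 0.5.
print(weak_scaling_efficiency(t_ref=1.0, t_n=2.0))
```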
In the momentum equations, diffusive terms are treated implicitly while convective terms are treated explicitly. Because the time steps are generally small, the 1/dt contribution of the time derivative dominates the diagonal of the diffusion matrix, which is therefore strongly diagonally dominant. As a result, solving the diffusion linear system, with the convective terms on the right-hand side, usually takes only a small fraction of the total computing time and scales well.
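A small 1D experiment illustrates why the implicit diffusion system is cheap compared with the pressure system. Second-order finite differences give K = nu * tridiag(-1, 2, -1) / h^2; the implicit diffusion step solves (I/dt + K) u = rhs, whose diagonal 1/dt + 2 nu/h^2 exceeds the sum of the off-diagonal magnitudes, 2 nu/h^2, by 1/dt, while a pure Laplacian has no such margin. The sketch counts plain Jacobi iterations for both systems; the grid size, `nu` and `dt` are made-up values, and Jacobi is used here only because its convergence rate directly reflects diagonal dominance, not because PeliGRIFF uses it.

```python
def _neighbour_sum(x, i):
    """x[i-1] + x[i+1], with homogeneous Dirichlet values outside the domain."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return left + right


def jacobi(diag, off, b, tol=1.0e-8, max_iter=100_000):
    """Jacobi iterations for tridiag(off, diag, off) x = b.

    Returns (x, iterations) once ||b - A x||_inf / ||b||_inf < tol."""
    x = [0.0] * len(b)
    b_norm = max(abs(v) for v in b)
    for it in range(1, max_iter + 1):
        # The right-hand side is evaluated with the old x before rebinding: Jacobi.
        x = [(b[i] - off * _neighbour_sum(x, i)) / diag for i in range(len(b))]
        res = max(abs(b[i] - diag * x[i] - off * _neighbour_sum(x, i))
                  for i in range(len(b)))
        if res / b_norm < tol:
            return x, it
    raise RuntimeError("Jacobi did not converge")


n = 30                  # interior nodes of a 1D grid on [0, 1]
h = 1.0 / (n + 1)
nu, dt = 0.01, 1.0e-3   # made-up viscosity and time step
b = [1.0] * n

# Implicit diffusion step: (I/dt + nu * tridiag(-1, 2, -1) / h^2) u = rhs.
_, it_diffusion = jacobi(1.0 / dt + 2.0 * nu / h**2, -nu / h**2, b)
# Pure Laplacian without transient term, like the pressure system.
_, it_laplacian = jacobi(2.0 / h**2, -1.0 / h**2, b)

print(it_diffusion, it_laplacian)
```

With these values the Jacobi iteration matrix of the diffusion system has a norm bounded by (2 nu/h^2) / (1/dt + 2 nu/h^2), about 0.019, so it converges in a handful of iterations, whereas the pure Laplacian needs thousands on this small grid.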
Finally, the Fictitious Domain saddle-point problem, solved with an iterative Uzawa algorithm, is implemented in a matrix-free fashion. This is particularly well suited to parallel computing: when a particle moves from one sub-domain (one process) to another sub-domain (another process), there is no distributed constraint matrix to reassemble. However, additional tests are required to assess the scalability of this implementation.
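The sketch below shows the simplest, fixed-step variant of the Uzawa algorithm for a saddle-point system A u + B^T lam = f, B u = g, written so that A^-1, B and B^T are only available as callbacks, which is what matrix-free means here. In a fictitious domain setting, B would impose the rigid-body velocity at points located inside the particles and lam would be the associated Lagrange multiplier; in the toy example, however, A is a scaled identity and B simply picks out two grid nodes, with made-up values. Fictitious domain solvers often accelerate Uzawa with a conjugate-gradient variant, so this is an illustration of the principle under stated assumptions, not PeliGRIFF's implementation.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Operator = Callable[[Vector], Vector]


def uzawa(solve_A: Operator, apply_B: Operator, apply_Bt: Operator,
          f: Vector, g: Vector, rho: float,
          tol: float = 1.0e-10, max_iter: int = 10_000) -> Tuple[Vector, Vector, int]:
    """Fixed-step Uzawa for  A u + B^T lam = f,  B u = g  (operators applied only)."""
    lam = [0.0] * len(g)
    for it in range(1, max_iter + 1):
        # 1) Primal step: u = A^{-1} (f - B^T lam).
        bt_lam = apply_Bt(lam)
        u = solve_A([fi - ci for fi, ci in zip(f, bt_lam)])
        # 2) Constraint residual r = B u - g.
        r = [bu - gi for bu, gi in zip(apply_B(u), g)]
        if max((abs(ri) for ri in r), default=0.0) < tol:
            return u, lam, it
        # 3) Multiplier step: gradient ascent on lam with step rho.
        lam = [li + rho * ri for li, ri in zip(lam, r)]
    raise RuntimeError("Uzawa did not converge")


# Toy problem (made-up values): A = a * I on 5 nodes, nodes 1 and 3 are
# "inside the particle" and must move with the rigid velocity U.
a, n, inside, U = 2.0, 5, [1, 3], 1.0


def solve_A(v):
    return [vi / a for vi in v]


def apply_B(u):
    return [u[i] for i in inside]       # restriction to the particle nodes


def apply_Bt(mult):
    out = [0.0] * n                     # transpose of the restriction: scatter back
    for j, i in enumerate(inside):
        out[i] += mult[j]
    return out


u, lam, iters = uzawa(solve_A, apply_B, apply_Bt,
                      f=[1.0] * n, g=[U] * len(inside), rho=1.0)
print([round(v, 6) for v in u], [round(v, 6) for v in lam], iters)
```

In this toy problem the multiplier error is multiplied by (1 - rho/a) at each iteration, so the method converges for 0 < rho < 2a; it ends with u = 1 at the two constrained nodes, u = 0.5 elsewhere, and lam = -1. In a distributed setting, each process would only apply these callbacks to the particle points it owns.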