System architecture
GVVM is a C++17 cloud-resolving model using Kokkos for on-device parallelism (the CUDA backend is enabled in CMake), MPI for distributed memory, and optional NCCL for collectives on NVIDIA GPUs when `ENABLE_NCCL` is on.
Repository layout (application code)
| Path | Role |
|---|---|
| `src/main.cpp` | MPI init, Kokkos init, optional NCCL, config load, split communicators for I/O servers, Grid / Parameters / State / HaloExchanger / Model / OutputManager, time loop |
| `src/driver/` | `Model`: orchestrates dynamical core, physics, and tendencies; implements `init`, `run_step`, `finalize` |
| `src/core/` | `Grid`, `State`, `Field`, `Parameters`, `HaloExchanger`, initializer, boundary helpers |
| `src/dynamics/` | Vector-vorticity dynamical core, time integration, forcings (sponge, nudging, random), idealized tests |
| `src/physics/` | P3 (`p3/`), RRTMGP (`rrtmgp/`), turbulence, surface, land (Noah); CMake aggregates these as the `eamxx_physics` interface plus static libs |
| `src/io/` | `OutputManager` (ADIOS2), `IOServer` (SST consumer writing to HDF5) |
| `src/utils/` | `ConfigurationManager` (JSON via nlohmann), timing and timers |
| `src/share/` | Shared EAMxx-derived utilities, constants, physics helpers |
| `externals/ekat/` | EKAT submodule: logging, YAML, testing utilities, Kokkos integration (used only by EAMxx-derived code) |
| `rundata/` | Sample `default_config.json`, initial profiles, initial fields, P3 lookup tables |
Fortran components (e.g. the OpenACC Noah land code) are linked through the physics/land subtree as required by CMake.
Execution model
- MPI: `MPI_Init`, then optional shared-memory communicator sizing to set the OpenMP threads per rank (`omp_set_num_threads`).
- GPU: `cudaGetDeviceCount`/`cudaGetDevice`; Kokkos is initialized with `set_device_id` from the node-local rank modulo the GPU count (see the sketch after this list).
- Configuration: `ConfigurationManager` loads JSON (default `../rundata/input_configs/default_config.json`, or a path given on the CLI).
- I/O split: if `--io-tasks N` > 0, ranks are colored into simulation vs. I/O groups; I/O ranks call `run_io_server` and exit, while simulation ranks continue.
- Simulation ranks: the NCCL communicator is created when enabled; `Grid` builds a Cartesian MPI decomposition; `State` and `HaloExchanger` are constructed; `Model::init` runs the initializer and optional physics `initialize`/`init`; `OutputManager` writes the initial step; the loop calls `model.run_step(dt)` until `simulation.total_time_s` is reached.
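A minimal sketch of this startup sequence, assuming MPI, the CUDA runtime, and a recent Kokkos (3.7+) with `InitializationSettings`; the `io_tasks` variable and the coloring rule here are illustrative stand-ins, not the actual `main.cpp` code:

```cpp
#include <mpi.h>
#include <cuda_runtime.h>
#include <Kokkos_Core.hpp>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Node-local rank via a shared-memory communicator split.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &node_comm);
    int local_rank = 0;
    MPI_Comm_rank(node_comm, &local_rank);

    // Bind each rank to a GPU: node-local rank modulo device count.
    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);
    if (num_devices < 1) num_devices = 1;
    Kokkos::initialize(Kokkos::InitializationSettings()
                           .set_device_id(local_rank % num_devices));
    {
        // Color ranks into I/O servers vs. simulation ranks. The rule below
        // (lowest ranks become I/O servers) is an assumption; io_tasks would
        // come from the --io-tasks CLI option.
        int io_tasks = 0;
        int color = (world_rank < io_tasks) ? 1 : 0;
        MPI_Comm work_comm;
        MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &work_comm);
        // I/O ranks would call run_io_server(...) and exit here; simulation
        // ranks continue into Model::init and the run_step loop.
        MPI_Comm_free(&work_comm);
        MPI_Comm_free(&node_comm);
    }
    Kokkos::finalize();
    MPI_Finalize();
    return 0;
}
```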
Major libraries (CMake targets)
- `vvm_driver`: `Model`
- `vvm_core`: grid, state, halos, parameters
- `vvm_dynamics`: dynamical core and related forcings
- `vvm_io`: ADIOS2 output and I/O server
- `vvm_utils`: configuration and timing
- `scream_share`: shared EAMxx code
- `vvm_physics`: interface aggregating P3, RRTMGP, turbulence, surface, land

The executable links `MPI::MPI_CXX`, `Kokkos::kokkos`, and the targets above.
Communication
- Halo exchange: `HaloExchanger` exchanges halos for the listed fields; the CUDA graph optimization is configurable via `optimization.cuda_graph_halo_exchange` in the JSON config.
- NCCL: used when `ENABLE_NCCL` is defined; in that build, the `HaloExchanger` and `State` constructors take an `ncclComm_t` and a CUDA stream (see the sketch after this list).
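As a hedged illustration of the NCCL build, the sketch below shows a one-dimensional halo exchange using NCCL point-to-point calls on a stream; the function name, buffer arguments, and neighbor handling are assumptions for illustration, not `HaloExchanger`'s actual interface:

```cpp
#include <nccl.h>
#include <cuda_runtime.h>

// Exchange one halo in each direction with a left and right neighbor rank
// (neighbors as produced by a Cartesian decomposition). All buffers are
// device pointers; the calls are enqueued on `stream`.
void exchange_halo(const double* send_left, double* recv_left,
                   const double* send_right, double* recv_right,
                   size_t halo_count, int left, int right,
                   ncclComm_t comm, cudaStream_t stream) {
    // Group the sends/recvs so NCCL can progress them together
    // without deadlocking on pairwise ordering.
    ncclGroupStart();
    ncclSend(send_left,  halo_count, ncclDouble, left,  comm, stream);
    ncclRecv(recv_right, halo_count, ncclDouble, right, comm, stream);
    ncclSend(send_right, halo_count, ncclDouble, right, comm, stream);
    ncclRecv(recv_left,  halo_count, ncclDouble, left,  comm, stream);
    ncclGroupEnd();
    // With optimization.cuda_graph_halo_exchange enabled, a sequence like
    // this could be captured into a CUDA graph and replayed each step.
}
```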
Configuration
JSON keys are resolved with dotted paths (e.g. `physics.p3.enable_p3`) in `ConfigurationManager::find_node`. No separate YAML runtime config is required for the main executable; EKAT may use YAML for its own tooling in subprojects.
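As a hedged approximation of that lookup (the real `find_node` signature and error handling are not documented here), dotted-path resolution over an nlohmann JSON tree can look like:

```cpp
#include <nlohmann/json.hpp>
#include <sstream>
#include <string>

// Walk the JSON tree one dot-separated key at a time; return nullptr if
// any key along the path is missing or a non-object is reached early.
const nlohmann::json* find_node(const nlohmann::json& root,
                                const std::string& dotted_path) {
    const nlohmann::json* node = &root;
    std::istringstream keys(dotted_path);
    std::string key;
    while (std::getline(keys, key, '.')) {
        if (!node->is_object() || !node->contains(key)) return nullptr;
        node = &node->at(key);
    }
    return node;
}

int main() {
    auto cfg = nlohmann::json::parse(
        R"({"physics": {"p3": {"enable_p3": true}}})");
    if (const auto* n = find_node(cfg, "physics.p3.enable_p3")) {
        bool enable_p3 = n->get<bool>();  // true
        (void)enable_p3;
    }
    return 0;
}
```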