Best C++ Libraries for Scientific Computing in 2025 (with Boilerplate Project)

by Martin D. Maas, Ph.D.

An opinionated selection of the best C++ libraries for numerical and scientific computing, with a ready-to-build CMake boilerplate using FetchContent.

If you’ve ever tried to do scientific computing in C++, you’ve probably experienced the paradox of choice. Unlike Julia, where you Pkg.add a library and it just works, or Python, where pip install numpy is all you need, the C++ ecosystem is fragmented. There are multiple linear algebra libraries, multiple build systems, multiple FFT libraries, and multiple ways to do everything.

This post is an opinionated guide to assembling a modern, expressive C++ stack for scientific computing. The guiding philosophy is simple: pick one good library for each purpose, make them all work together, and show that modern C++ can actually be pleasant to use.

To back this up, I’ve created SciBaseCPP — a boilerplate CMake project that pulls everything in via FetchContent (our Pkg equivalent, as we will discuss). Clone it, configure, build, and start computing without installing a single library manually.

The selection

Here’s what we’re using and why:

| Purpose | Library | Why this one |
| --- | --- | --- |
| Multidimensional arrays | C++23 std::mdspan (via Kokkos ref. impl.) | Standard, zero-copy, non-owning |
| Linear algebra | Eigen 3.4 | Dense + sparse, mature, fast, header-only |
| FFT | PocketFFT | BSD-licensed, same engine as NumPy |
| ODE integration | SUNDIALS/CVODE | Production-grade stiff & nonstiff solvers from LLNL |
| Optimization | NLopt | Multiple algorithms, gradient-free and gradient-based |
| Formatting | {fmt} | std::format backport, much better than iostream |
| Special functions | C++17 <cmath> | Bessel, Legendre, Gamma — already in the standard |
| VTK output | Lean-VTK | Write VTK files in ~200 lines of code |
| Testing | Catch2 | Clean, header-friendly, widespread |
| Python bindings | nanobind / pybind11 | See my comparison post |

Let’s go through each of them.

C++23 std::mdspan: Multidimensional Array Views

Let’s start with what’s new. mdspan is one of the most important additions for scientific computing in C++23. It gives us a standard way to view flat memory as a multidimensional array, without owning it and without copying. Think of it like a std::span but for matrices and tensors.

The catch: as of mid-2025, GCC 14+ has native <mdspan> but GCC 13 (still the default on Ubuntu 24.04) doesn’t. The solution is the Kokkos reference implementation, which works with GCC 11+ and provides the same API:

#include <experimental/mdspan>  // Kokkos reference implementation

// 2D matrix: dynamic extents
std::vector<double> data(12);
auto mat = std::mdspan<double, std::dextents<size_t, 2>>(data.data(), 4, 3);

for (size_t i = 0; i < mat.extent(0); i++)
    for (size_t j = 0; j < mat.extent(1); j++)
        mat[i, j] = std::sin(double(i)) * std::cos(double(j));  // C++23 syntax

// 3D tensor with mixed static/dynamic extents
std::vector<double> vol(2 * 3 * 4);
auto tensor = std::mdspan<double,
    std::extents<size_t, 2, std::dynamic_extent, std::dynamic_extent>>(
        vol.data(), 3, 4);
// tensor.rank() == 3, extents: 2×3×4

With the Kokkos reference implementation, the types live in std::, and if your compiler supports C++23 multidimensional subscript (__cpp_multidimensional_subscript), you use mat[i, j]. On older compilers, mat(i, j) is available instead.

The real power of mdspan is as a universal interface. If you write functions that accept mdspan parameters, they can work with data from Eigen, raw C arrays, std::vector, GPU memory — anything that provides a pointer. This is important because unlike Julia or NumPy, C++ still doesn’t have a standard owning multidimensional array type — but with mdspan, at least the view is now standardized.

Linear Algebra: Eigen

For actual linear algebra — decompositions, solvers, eigenvalues — you need a computational library, and Eigen is the best choice. The leading alternatives are Armadillo, Xtensor, and Blaze, but Eigen wins on breadth and maturity.

Why Eigen:

  • Header-only (no linking headaches)
  • Excellent dense and sparse support
  • Expression templates for lazy evaluation and zero-copy operations
  • Battle-tested in projects like Google Ceres, TensorFlow, and ROS
  • Seamless interop with BLAS/LAPACK backends (OpenBLAS, MKL) when you need raw speed

Here’s what solving a linear system, computing eigenvalues, and doing an SVD looks like:

#include <Eigen/Dense>

// Solve Ax = b
Eigen::Matrix3d A;
A << 2, -1,  0,
    -1,  2, -1,
     0, -1,  2;
Eigen::Vector3d b(1.0, 0.0, 1.0);
Eigen::Vector3d x = A.colPivHouseholderQr().solve(b);
// Residual |Ax - b| = 5.98e-16

// Eigenvalue decomposition
Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> solver(A);
auto eigenvalues = solver.eigenvalues();
// [0.5858, 2.0000, 3.4142]

// SVD
Eigen::MatrixXd B(4, 2);
B << 1, 2, 3, 4, 5, 6, 7, 8;
Eigen::JacobiSVD<Eigen::MatrixXd> svd(B, Eigen::ComputeThinU | Eigen::ComputeThinV);
auto singular_values = svd.singularValues();
// [14.2691, 0.6268]

Should we still use Eigen with C++23 mdspan?

Yes. Eigen and mdspan solve different problems. Eigen is a computational library — it gives you matrix algebra, decompositions, and solvers. mdspan is a view — it provides a non-owning, multidimensional window onto flat memory. They complement each other beautifully.

The practical interop story is that Eigen exposes raw pointers via .data(), and mdspan wraps raw pointers. So you get zero-copy interop:

#include <experimental/mdspan>  // Kokkos reference implementation

// Eigen owns the data
Eigen::VectorXd vec = Eigen::VectorXd::LinSpaced(12, 0, 11);

// mdspan provides a 3x4 matrix view — zero copy
auto mat_view = std::mdspan<double, std::dextents<size_t, 2>>(
    vec.data(), 3, 4);

// Access the same data through either interface
mat_view[1, 2];  // C++23 multidimensional subscript
vec(6);           // same element via Eigen

Note: Eigen doesn’t natively support mdspan yet (there’s a GitLab issue #2608 tracking it), but the raw-pointer bridge works perfectly and is the expected approach for now.

Formatting: {fmt}

If you’re still writing std::cout << "x = " << x << std::endl; for debug output, do yourself a favor and switch to {fmt}. It’s what std::format (C++20) is based on, but it actually works today and compiles faster:

#include <fmt/core.h>

fmt::println("Residual |Ax - b| = {:.2e}", residual);
fmt::println("x = [{:.4f}, {:.4f}, {:.4f}]", x(0), x(1), x(2));

It’s a small thing, but it removes a surprising amount of friction from C++ development.

Special Functions: C++17 Already Has Them

Here’s something many C++ developers don’t know: the C++17 standard library already includes a solid set of mathematical special functions. No Boost, no GSL, no extra dependencies:

#include <cmath>

std::tgamma(0.5);          // Gamma function → √π
std::cyl_bessel_j(0, 2.4); // Bessel J0 near its first zero
std::legendre(2, 0.5);     // Legendre P2(0.5) = -0.125
std::erf(0.5);             // Error function
std::lgamma(0.5);          // Log-Gamma

GCC’s libstdc++ and MSVC have shipped these since their C++17 implementations; Clang picks them up when paired with libstdc++, though libc++ has not implemented them yet. The available functions include Bessel functions (J, Y, I, K), Legendre polynomials, associated Legendre functions, spherical harmonics helpers, Laguerre and Hermite polynomials, elliptic integrals, beta functions, Riemann zeta, and more.

Nonlinear Optimization: NLopt

NLopt is Steven Johnson’s (of FFTW fame) library for nonlinear optimization. It bundles a wide range of algorithms — gradient-based (MMA, SLSQP, L-BFGS) and gradient-free (COBYLA, Nelder-Mead, BOBYQA) — behind a clean, uniform interface. You can switch algorithms with a single enum change.

Here’s the classic test — minimizing the Rosenbrock function f(x, y) = (1 − x)² + 100(y − x²)²:

#include <nlopt.hpp>

double rosenbrock(const std::vector<double>& x, std::vector<double>& grad, void*) {
    if (!grad.empty()) {
        grad[0] = -2.0 * (1 - x[0]) - 400.0 * x[0] * (x[1] - x[0]*x[0]);
        grad[1] = 200.0 * (x[1] - x[0]*x[0]);
    }
    return (1 - x[0]) * (1 - x[0]) + 100 * (x[1] - x[0]*x[0]) * (x[1] - x[0]*x[0]);
}

nlopt::opt opt(nlopt::LD_LBFGS, 2);
opt.set_min_objective(rosenbrock, nullptr);
opt.set_xtol_rel(1e-12);

std::vector<double> x = {-1.0, -1.0};
double min_val;
opt.optimize(x, min_val);
// x = [1.000000, 1.000000], f = 4.73e-21

Alternatives: Ceres Solver (Google’s library, better for large-scale nonlinear least squares), Ipopt (interior-point for large sparse NLP), or dlib’s optimization routines.

Fast Fourier Transforms: PocketFFT

The FFT is a fundamental tool for scientific computing. The obvious choice is FFTW — it’s the gold standard, and the fastest on x86. But FFTW is GPL-licensed, which means your code must also be GPL if you link to it. For many projects — commercial or otherwise — this is a dealbreaker.

Our solution: PocketFFT. This is the same FFT engine that NumPy uses internally (since NumPy 1.17, it replaced the old fftpack). It’s BSD-licensed, header-only in its C++ version, supports float/double/long double, multidimensional transforms, and even DCT/DST. Performance is very respectable — not quite FFTW-level for highly optimized sizes, but close enough for the vast majority of applications.

We include PocketFFT in SciBaseCPP via FetchContent from its cpp branch:

#include <pocketfft_hdronly.h>

// Forward FFT of a 64-sample signal
const size_t N = 64;
std::vector<std::complex<double>> signal(N), result(N);

// ... fill signal with data ...

pocketfft::shape_t shape = {N};
pocketfft::stride_t stride = {sizeof(std::complex<double>)};
pocketfft::c2c(shape, stride, stride, {0}, pocketfft::FORWARD,
               signal.data(), result.data(), 1.0);

The full landscape of FFT options:

  • FFTW (GPL): Fastest. But GPL means your code must be GPL too.
  • PocketFFT (BSD): The NumPy default. Good performance, permissively licensed. ← our choice
  • Intel MKL FFT (proprietary, free): Excellent on Intel hardware.
  • Eigen’s FFT module: Unified interface to FFTW, MKL, or its built-in kissfft backend.

ODE Integration: SUNDIALS/CVODE

For ODE integration, the previous version of this post suggested writing your own RK4. That’s fine for simple problems, but for anything serious — stiff systems, adaptive step control, DAEs — you want a production-grade solver. SUNDIALS from Lawrence Livermore National Laboratory is exactly that.

SUNDIALS provides several solvers: CVODE (stiff and nonstiff ODEs), ARKODE (multi-rate ODEs), IDA (DAEs), and KINSOL (nonlinear systems). We pull in just CVODE to keep the build lean:

#include <cvode/cvode.h>
#include <nvector/nvector_serial.h>
#include <sunlinsol/sunlinsol_dense.h>
#include <sunmatrix/sunmatrix_dense.h>
#include <sundials/sundials_core.hpp>

// Right-hand side: dy/dt = -y
static int rhs(sunrealtype t, N_Vector y, N_Vector ydot, void*) {
    NV_Ith_S(ydot, 0) = -NV_Ith_S(y, 0);
    return 0;
}

// Create context, set initial condition
sundials::Context sunctx;
N_Vector y = N_VNew_Serial(1, sunctx);
NV_Ith_S(y, 0) = 1.0;

// Create CVODE solver (Adams for nonstiff, BDF for stiff)
void* cvode_mem = CVodeCreate(CV_ADAMS, sunctx);
CVodeInit(cvode_mem, rhs, 0.0, y);
CVodeSStolerances(cvode_mem, 1e-12, 1e-14);

// Attach a dense linear solver for the Newton iteration
SUNMatrix J = SUNDenseMatrix(1, 1, sunctx);
SUNLinearSolver LS = SUNLinSol_Dense(y, J, sunctx);
CVodeSetLinearSolver(cvode_mem, LS, J);

// Integrate to t = 5
sunrealtype t_final;
CVode(cvode_mem, 5.0, y, &t_final, CV_NORMAL);
// y(5) = 0.006737946999, exact = 0.006737946999, error ≈ 1e-13

The API is C-style (SUNDIALS is written in C), but the C++ context wrapper handles resource management cleanly. The real value of CVODE shows up when you tackle stiff systems — chemical kinetics, circuit simulation, reaction-diffusion — where a naive RK4 would require absurdly small time steps or simply diverge. CVODE’s BDF method handles these automatically with adaptive step-size control and Newton iteration.

For simple nonstiff problems, a 15-line RK4 is still perfectly fine and SciBaseCPP includes one as well. But having CVODE available means you won’t hit a wall when your problems get harder.
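For reference, such a fixed-step RK4 fits comfortably in a screenful of code. A generic sketch, not necessarily the exact code in SciBaseCPP:

```cpp
#include <cmath>
#include <functional>

// Classic fixed-step RK4: integrates dy/dt = f(t, y) from t0 to t1 in n steps.
double rk4(const std::function<double(double, double)>& f,
           double y, double t0, double t1, int n) {
    const double h = (t1 - t0) / n;
    for (int i = 0; i < n; ++i) {
        const double t  = t0 + i * h;
        const double k1 = f(t, y);
        const double k2 = f(t + h / 2, y + h / 2 * k1);
        const double k3 = f(t + h / 2, y + h / 2 * k2);
        const double k4 = f(t + h, y + h * k3);
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4);
    }
    return y;
}

// double y5 = rk4([](double, double y) { return -y; }, 1.0, 0.0, 5.0, 500);
// agrees with std::exp(-5.0) to roughly 10 significant digits at this step count
```

The global error shrinks like h⁴, so halving the step buys roughly a 16× accuracy gain, until stiffness or round-off takes over and you reach for CVODE.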

Alternatives: Boost.Odeint (header-only, limited to explicit methods for the most part), or the full SUNDIALS suite if you need DAE support (IDA/IDAS) or multi-rate integration (ARKODE).

3D Visualization: Lean-VTK

Visualization in scientific computing is a topic deserving its own post (and I wrote one: Best 3D Scientific Visualization Libraries). The short version: if you’re using ParaView or VisIt for postprocessing, you just need to write VTK files from your C++ code.

Lean-VTK does exactly this in about 200 lines of code — no linking to the massive VTK library. Write your mesh coordinates and data arrays, call a function, get a .vtu file:

#include <leanvtk.hpp>

// Write unstructured mesh data to VTK format
leanvtk::VTUWriter writer;
writer.add_scalar_field("temperature", temperature_data);
writer.write_surface_mesh("output.vtu", dim, cell_size, points, cells);  // dim: 2 or 3, cell_size: nodes per cell

Python Interoperability: nanobind or pybind11

If you’re using C++ for numerics, you’ll almost certainly want to call it from Python at some point. I’ve written a detailed comparison of nanobind vs pybind11, but the summary is:

  • pybind11: More mature, larger community, more features. Great if you need advanced binding patterns.
  • nanobind: Same author (Wenzel Jakob), designed as a successor. ~2.5× smaller binaries, ~1.3× faster call overhead, more efficient.

Both integrate cleanly with CMake and FetchContent. If you’re starting a new project, nanobind is the better default.

Build System: CMake + FetchContent

This is the part that makes the whole thing work. The question “should we use Bazel or CMake?” comes up periodically, and for scientific computing the answer is clear: CMake.

Not because CMake is elegant (it isn’t), but because:

  1. Every scientific library supports CMake. Eigen, NLopt, FFTW, Catch2, fmt, nanobind — they all ship CMakeLists.txt files and export proper targets.
  2. FetchContent is a game-changer. You declare a dependency, CMake clones and builds it at configure time. No manual installation, no system packages, no package manager.
  3. IDE integration. VS Code + CMake Tools, CLion, Visual Studio — they all speak CMake natively.

Bazel is a fine build system for large monorepos (Google uses it), but the scientific C++ ecosystem is built around CMake. Fighting that ecosystem is not worth it.

FetchContent in practice

Here’s what pulling in Eigen via FetchContent looks like:

include(FetchContent)

FetchContent_Declare(
    Eigen
    GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
    GIT_TAG        3.4.0
    GIT_SHALLOW    TRUE
)
set(EIGEN_BUILD_DOC OFF CACHE BOOL "" FORCE)
set(EIGEN_BUILD_TESTING OFF CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(Eigen)

# Then just link:
target_link_libraries(my_app PRIVATE Eigen3::Eigen)

That’s it. No apt install libeigen3-dev, no find_package that breaks on a different OS, no manual -I/usr/include/eigen3. Clone the repo, run cmake, and it works.

The SciBaseCPP project pulls in 8 libraries this way: mdspan, Eigen, fmt, PocketFFT, SUNDIALS, NLopt, Catch2, and Lean-VTK. The first configure takes a minute or two (it’s cloning repositories), but subsequent builds are fast. This approach gives us something comparable to Julia’s Pkg system or Rust’s Cargo — not quite as polished, but functional and reproducible.

One gotcha: not every library is perfectly FetchContent-friendly. Lean-VTK, for example, uses CMAKE_SOURCE_DIR internally which breaks when built as a sub-project. The workaround is to use FetchContent_Populate and build a minimal target manually:

FetchContent_GetProperties(lean_vtk)
if(NOT lean_vtk_POPULATED)
    FetchContent_Populate(lean_vtk)
    add_library(leanvtk STATIC ${lean_vtk_SOURCE_DIR}/src/leanvtk.cpp)
    target_include_directories(leanvtk PUBLIC ${lean_vtk_SOURCE_DIR}/include)
endif()

The SciBaseCPP Boilerplate

Putting it all together, SciBaseCPP is a ready-to-build project that demonstrates all of the above. To build it:

git clone https://github.com/EpsilonForge/SciBaseCPP
cd SciBaseCPP
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(nproc)

# Run the demo
./scibase_demo

# Run the tests
./scibase_tests

The demo produces output like:

╔══════════════════════════════════════════════════╗
║  SciBaseCPP — Scientific Computing in C++23    ║
╚══════════════════════════════════════════════════╝

=== mdspan: Multidimensional Array Views ===
  4x3 matrix of sin(i)*cos(j):
      0.0000   0.0000  -0.0000
      0.8415   0.4546  -0.3502
      0.9093   0.4913  -0.3784
      0.1411   0.0762  -0.0587

=== Eigen: Linear Algebra ===
  Solving Ax = b where A is a tridiagonal matrix:
  x = [1.0000, 1.0000, 1.0000]
  Residual |Ax - b| = 5.98e-16

=== NLopt: Nonlinear Optimization ===
  Rosenbrock minimum: f(1.000000, 1.000000) = 4.73e-21

=== PocketFFT: Fast Fourier Transform ===
  64-sample signal: 5Hz (amp 3.0) + 12Hz (amp 1.5)
  Bin  5 magnitude: 1.5000 (expected 1.5)
  Bin 12 magnitude: 0.7500 (expected 0.75)

=== SUNDIALS/CVODE: ODE Integration ===
  dy/dt = -y, y(0) = 1, solving to t = 5
  y(5) = 0.0067379470 (exact: 0.0067379470)
  Error: 1.05e-13

=== ODE Integration (RK4) ===
  dy/dt = -y, y(0) = 1
  y(5) = 0.0067379470 (exact: 0.0067379470)
  Error: 1.76e-13

All demos completed successfully!

The test suite (14 tests, 163 assertions) covers mdspan access patterns, Eigen decompositions, mdspan+Eigen interop, NLopt convergence, PocketFFT forward/inverse roundtrips, CVODE exponential decay, and special function values.

Final thoughts

Can C++ match the developer experience of Julia or Python for scientific computing? Not quite — you’ll always have the compilation step, and template error messages will always be painful. But with the right libraries and CMake + FetchContent, modern C++ is much more approachable than it was even five years ago.

The key insight is that the library selection problem is solvable. You don’t need to evaluate 15 linear algebra libraries. Pick Eigen. You don’t need to agonize over build systems. Use CMake. You don’t need to write your own mdspan or format library. The ecosystem has mature solutions for all of these.

If you’re coming from Python or Julia and wondering whether C++ is worth the trouble for performance-critical code — the answer is yes, especially if you set up your project correctly from the start. SciBaseCPP is meant to be that starting point.