Nanobind vs Pybind11: Calling C++ from Python in 2026
A hands-on comparison of nanobind and pybind11 for calling C++ from Python, with working code examples, benchmarks, and practical advice on which to choose.
In a previous post, I showed how to call C++ from Python using pybind11 — a header-only C++ library that makes it straightforward to expose C++ classes and functions to Python, with excellent Numpy and Eigen interoperability.
Since then, the same author — Wenzel Jakob — has released nanobind, a ground-up rewrite of pybind11 that aims to be smaller, faster, and more aligned with modern C++17. In this post, I’ll compare the two libraries side by side with working code, real benchmarks, and some observations on when each one makes sense.
Same author, different philosophy
This is the first thing to know: nanobind and pybind11 come from the same person. Wenzel Jakob originally created pybind11 around 2015, and it became incredibly popular — thousands of projects rely on it today. But after years of experience with pybind11’s design, he started nanobind with a deliberately different set of trade-offs.
The key philosophical differences are:
- nanobind targets C++17, while pybind11 supports C++11 and later. This lets nanobind use more modern language features internally.
- nanobind prioritizes small binary size and fast compilation. Pybind11’s header-only design means everything gets compiled into your module. Nanobind uses a small pre-compiled core library, which dramatically reduces the size of each binding module.
- nanobind has stricter semantics. It doesn’t try to be as forgiving as pybind11 in terms of implicit conversions and type coercions. This can catch bugs earlier but means some pybind11 code won’t port over directly.
- nanobind has first-class ndarray support that works with NumPy, PyTorch, TensorFlow, and JAX without depending on any of them.
In short, nanobind is not a drop-in replacement for pybind11. It’s a leaner, more opinionated successor from the same author.
A side-by-side code comparison
Let’s look at the same binding code written for both libraries. I’ll use three examples: a simple function, a NumPy array operation, and a class.
Example 1: A simple function
pybind11:
#include <pybind11/pybind11.h>
namespace py = pybind11;
double add(double a, double b) { return a + b; }
PYBIND11_MODULE(pybind_example, m) {
m.def("add", &add, "Add two numbers",
py::arg("a"), py::arg("b"));
}
nanobind:
#include <nanobind/nanobind.h>
namespace nb = nanobind;
using namespace nb::literals;
double add(double a, double b) { return a + b; }
NB_MODULE(nanobind_example, m) {
m.def("add", &add, "a"_a, "b"_a);
}
The nanobind version is slightly more concise. The "a"_a syntax (using C++ user-defined literals) replaces the more verbose py::arg("a"). Both work the same way from the Python side.
Example 2: NumPy array operations
This is where the differences become more interesting.
pybind11:
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
namespace py = pybind11;
py::array_t<double> scale_array(py::array_t<double> input, double factor) {
auto buf = input.request();
auto *ptr = static_cast<double *>(buf.ptr);
auto result = py::array_t<double>(buf.size);
auto res_buf = result.request();
auto *res_ptr = static_cast<double *>(res_buf.ptr);
for (py::ssize_t i = 0; i < buf.size; i++)
res_ptr[i] = ptr[i] * factor;
return result;
}
Pybind11 uses the buffer protocol — you call .request() to get a buffer_info descriptor, then cast the raw pointer yourself. It works, but it’s somewhat verbose and C-style.
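The buffer protocol pybind11 builds on here is the same one Python itself exposes through memoryview. As a pure-Python illustration (not pybind11 code), inspecting a stdlib array shows the same metadata that buffer_info carries on the C++ side:

```python
from array import array  # stdlib arrays implement the buffer protocol

a = array('d', [1.0, 2.0, 3.0])
m = memoryview(a)

# The same metadata pybind11's buffer_info exposes as ptr/format/itemsize/shape:
print(m.format)    # "d": elements are C doubles
print(m.itemsize)  # 8 bytes per element
print(m.shape)     # a 1D buffer of length 3
print(m.nbytes)    # total size in bytes
```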
nanobind:
#include <nanobind/nanobind.h>
#include <nanobind/ndarray.h>
namespace nb = nanobind;
nb::ndarray<nb::numpy, double, nb::shape<-1>> scale_array(
nb::ndarray<nb::numpy, const double, nb::shape<-1>> input,
double factor) {
size_t n = input.shape(0);
double *result = new double[n];
const double *ptr = input.data();
for (size_t i = 0; i < n; i++)
result[i] = ptr[i] * factor;
size_t shape[1] = { n };
nb::capsule deleter(result, [](void *p) noexcept {
delete[] static_cast<double *>(p);
});
return nb::ndarray<nb::numpy, double, nb::shape<-1>>(
result, 1, shape, deleter);
}
The nanobind approach is different in two important ways:
- Shape and dtype are encoded in the C++ type. The template parameter nb::shape<-1> says “this is a 1D array of unknown length”, and double says “the elements are doubles”. Pybind11 doesn’t provide this level of compile-time type safety — you discover shape mismatches at runtime.
- Memory management is explicit. Nanobind uses a capsule to attach a destructor to the returned array, making ownership completely clear. This is more code, but it eliminates an entire class of memory management bugs that can be subtle in pybind11.
A key advantage of nanobind’s ndarray is that it’s framework-agnostic: the same C++ code can accept arrays from NumPy, PyTorch, JAX, or TensorFlow. You just change the template parameter (nb::numpy, nb::pytorch, etc.) or omit it entirely to accept any framework’s tensor.
Example 3: A class with properties
pybind11:
#include <pybind11/pybind11.h>
namespace py = pybind11;
class Integrator {
double a_, b_;
int n_;
public:
Integrator(double a, double b, int n) : a_(a), b_(b), n_(n) {}
double trapezoid(py::object func) {
double h = (b_ - a_) / n_;
double sum = 0.5 * (func(a_).cast<double>()
+ func(b_).cast<double>());
for (int i = 1; i < n_; i++)
sum += func(a_ + i * h).cast<double>();
return sum * h;
}
double get_a() const { return a_; }
double get_b() const { return b_; }
int get_n() const { return n_; }
std::string describe() const;
};
PYBIND11_MODULE(pybind_example, m) {
py::class_<Integrator>(m, "Integrator")
.def(py::init<double, double, int>(),
py::arg("a"), py::arg("b"), py::arg("n") = 1000)
.def("trapezoid", &Integrator::trapezoid, py::arg("func"))
.def_property_readonly("a", &Integrator::get_a)
.def_property_readonly("b", &Integrator::get_b)
.def_property_readonly("n", &Integrator::get_n)
.def("__repr__", &Integrator::describe);
}
nanobind:
#include <nanobind/nanobind.h>
namespace nb = nanobind;
using namespace nb::literals;
class Integrator {
double a_, b_;
int n_;
public:
Integrator(double a, double b, int n) : a_(a), b_(b), n_(n) {}
double trapezoid(nb::object func) {
double h = (b_ - a_) / n_;
double sum = 0.5 * (nb::cast<double>(func(a_))
+ nb::cast<double>(func(b_)));
for (int i = 1; i < n_; i++)
sum += nb::cast<double>(func(a_ + i * h));
return sum * h;
}
double get_a() const { return a_; }
double get_b() const { return b_; }
int get_n() const { return n_; }
std::string describe() const;
};
NB_MODULE(nanobind_example, m) {
nb::class_<Integrator>(m, "Integrator")
.def(nb::init<double, double, int>(),
"a"_a, "b"_a, "n"_a = 1000)
.def("trapezoid", &Integrator::trapezoid, "func"_a)
.def_prop_ro("a", &Integrator::get_a)
.def_prop_ro("b", &Integrator::get_b)
.def_prop_ro("n", &Integrator::get_n)
.def("__repr__", &Integrator::describe);
}
Note the small API differences: nanobind uses def_prop_ro instead of pybind11’s def_property_readonly, and the casting syntax is inverted — nb::cast<double>(obj) instead of obj.cast<double>(). These are minor, but they add up to slightly more concise code.
Both versions work identically from Python:
import math
integ = Integrator(0.0, math.pi, n=10000)
print(integ) # Integrator(a=0.000000, b=3.141593, n=10000)
print(integ.trapezoid(math.sin)) # 1.9999999836 (≈ 2.0)
print(integ.a, integ.n) # 0.0 10000
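As a sanity check, the trapezoid rule is easy to verify with a pure-Python reference implementation; it computes the same sum as the bound C++ method, so the values should agree:

```python
import math

def trapezoid(func, a, b, n):
    # Composite trapezoid rule: h * (f(a)/2 + f(a+h) + ... + f(b-h) + f(b)/2)
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

print(trapezoid(math.sin, 0.0, math.pi, 10000))  # ≈ 1.9999999836, same as above
```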
Build system: CMake is non-negotiable for nanobind
One area where pybind11 is more flexible is the build process. Pybind11 is header-only, so you can compile a module with a one-liner:
c++ -O2 -shared -std=c++17 -fPIC $(python3 -m pybind11 --includes) \
pybind_example.cpp -o pybind_example$(python3-config --extension-suffix)
That’s it. No CMake, no build system, no fuss. This is great for quick experiments.
Nanobind, on the other hand, strongly expects CMake (or scikit-build-core). It has a pre-compiled core library that needs to be linked in, which means a simple command-line compilation doesn’t work out of the box. The standard CMakeLists.txt for nanobind looks like this:
cmake_minimum_required(VERSION 3.15)
project(nanobind_example)
find_package(Python 3.8 COMPONENTS Interpreter Development.Module REQUIRED)
execute_process(
COMMAND "${Python_EXECUTABLE}" -m nanobind --cmake_dir
OUTPUT_STRIP_TRAILING_WHITESPACE OUTPUT_VARIABLE nanobind_ROOT)
find_package(nanobind CONFIG REQUIRED)
nanobind_add_module(nanobind_example nanobind_example.cpp)
This is clean and well-designed, but it does mean you need CMake in your workflow. For projects that already use CMake (most serious C++ projects), this is a non-issue. For a quick script, it’s a small extra hurdle.
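If you want pip to drive that CMake build, scikit-build-core makes the packaging side nearly as short. A minimal pyproject.toml might look roughly like this (a sketch; the project name and version are placeholders):

```toml
[build-system]
requires = ["scikit-build-core", "nanobind"]
build-backend = "scikit_build_core.build"

[project]
name = "nanobind_example"
version = "0.1.0"
requires-python = ">=3.8"
```

With this in place, pip install . configures and runs the CMakeLists.txt above automatically.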
Benchmarks: binary size and call overhead
I compiled the same binding code with both libraries and measured the results. Here’s what I found:
Binary size
| Library | Module size |
|---|---|
| pybind11 | 318 KB |
| nanobind | 125 KB |
Nanobind is 2.5X smaller. This matters more than you might think — if you’re shipping a Python package with dozens of binding modules, the size difference adds up quickly. It also means faster compilation, since less template code is being instantiated.
Function call overhead
I benchmarked calling add(1.0, 2.0) one million times from Python:
| Library | Time | Per call |
|---|---|---|
| pybind11 | 0.079s | 0.08 µs |
| nanobind | 0.061s | 0.06 µs |
Nanobind is about 1.3X faster in raw call overhead in this micro-benchmark. The difference comes from nanobind’s more efficient function dispatch and argument parsing.
For heavy computational functions where the C++ work dominates, this overhead difference is negligible. But for bindings that make many small calls (like per-element callbacks or fine-grained APIs), it can matter.
Array operations
For passing and returning a 100,000-element NumPy array, both libraries performed nearly identically — the bottleneck is the actual computation and memory allocation, not the binding layer.
When to use which
After working with both libraries, here’s my take:
Use pybind11 when:
- You need maximum compatibility. Pybind11 supports C++11 and has been battle-tested in thousands of projects. If you’re targeting older compilers or platforms, it’s the safer bet.
- You want a quick one-file experiment. The header-only design means you can compile with a single command. No CMake needed.
- You’re working on an existing pybind11 project. Porting to nanobind is possible but requires actual work — it’s not a drop-in replacement. If your bindings work fine, there’s no urgent reason to migrate.
- You rely on specific pybind11 features like py::buffer_protocol or certain automatic conversion behaviors that nanobind intentionally dropped.
Use nanobind when:
- You’re starting a new project. If you’re writing bindings from scratch, nanobind is the better default choice in 2026. It’s faster, produces smaller binaries, and the API is cleaner.
- Binary size matters. If you’re shipping a Python package (especially on PyPI), the 2-3X size reduction is significant.
- You need framework-agnostic tensor support. Nanobind’s ndarray works with NumPy, PyTorch, JAX, and TensorFlow through a single C++ type. Pybind11 only natively supports NumPy.
- You want stricter type safety. Nanobind’s less forgiving type system catches more errors at compile time and at binding time, rather than silently doing the wrong thing.
- You’re already using CMake. The build setup is trivial if CMake is already part of your workflow.
The migration question
If you have an existing pybind11 project, should you migrate? The nanobind documentation includes a migration guide, and the process is mostly mechanical — rename some types, adjust some APIs. The main friction points are:
- py::array_t → nb::ndarray (different API)
- obj.cast<T>() → nb::cast<T>(obj) (inverted syntax)
- def_property_readonly → def_prop_ro (shorter names)
- Implicit conversions that pybind11 allowed may need explicit handling
- GIL handling differs in some details, so check the migration guide rather than assuming pybind11’s behavior carries over
For a small project, this can be done in an afternoon. For a large codebase with complex bindings, it’s a larger effort that should be weighed against the concrete benefits.
Conclusion
Nanobind is what pybind11 would look like if it were designed today, with the benefit of a decade of hindsight. It’s not radically different — the core idea of using C++ templates to generate Python bindings is the same — but it’s smaller, faster, and more carefully designed.
For new projects, I’d reach for nanobind. For existing pybind11 projects, I’d migrate only if the binary size or compilation time improvements justify the effort. Either way, both libraries are excellent tools for bridging C++ and Python, and the fact that they come from the same author means the overall ecosystem is in good hands.
*The full code for the examples in this post is available in the nanobind-vs-pybind11 repo.*