Microbench v2.0 is here.
This is a big release, with many new features based on feedback I've received from users and from questions after a talk I gave at RSECon25.
What is Microbench?
The motivation behind Microbench is that running the same script on a laptop, cloud VM, or cluster can produce different results. This can happen for many reasons, including differences in (for example):
- Hardware: Some cluster nodes have A100 40GB, others have A100 80GB
- Operating system: You might develop on Windows or Mac, but use a Linux VM or cluster
- Software dependencies: You ran `conda update --all` locally but forgot to check the new environment into git
- Analysis script: You forgot to run `git pull` before the cluster run
- Data: You forgot to copy across the new version of the dataset
- Environment: Sometimes an environment variable difference can affect results, as with PyTorch and `OMP_NUM_THREADS`
The central premise of Microbench is to make capturing metadata that can affect reproducibility easy and "cheap", so that it can be done routinely. By default, the output format is JSON lines, which many programs can read, so you're not "locked in" to Microbench. As a side benefit, you also get performance information (runtime duration), which makes it easy to spot performance or configuration differences across a cluster (also useful for sysadmins!).
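To illustrate why JSON lines avoids lock-in, here is a minimal sketch (the field names are made up for illustration and are not Microbench's actual schema): appending and re-reading records needs nothing beyond a JSON parser, one record per line.

```python
import json
import time
import uuid

# Append one benchmark record as a single JSON line
# (field names here are illustrative, not Microbench's real schema)
record = {
    "run_id": str(uuid.uuid4()),
    "start_time": time.time(),
    "duration_s": 1.23,
}
with open("results.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")

# Reading the file back requires only a JSON parser,
# so any language or tool can consume the results
with open("results.jsonl") as f:
    records = [json.loads(line) for line in f]
```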
Microbench can be used as a command line program to wrap other scripts (new in v2.0) or integrated within a Python script for more precise control.
Here's a list of what metadata Microbench can record as of v2.0:
| Metadata | CLI flag | Python mixin |
|---|---|---|
| Run UUID, Microbench version, timezone, duration(s), start/finish times, invocation method | (default) | (default) |
| Python version, prefix, and executable | (default) | (default: included in `MicroBench`) |
| Hostname, OS; CPU core count and total RAM (requires psutil) | (default) | `MBHostInfo` |
| All `SLURM_*` environment variables | (default) | `MBSlurmInfo` |
| Loaded Environment Modules / Lmod stack | (default) | `MBLoadedModules` |
| Current working directory | (default) | `MBWorkingDir` |
| Git repo, commit hash, branch, dirty flag | `--mixin git-info` | `MBGitInfo` |
| SHA-256 (or other) hash of specified files | `--mixin file-hash` | `MBFileHash` |
| Installed Python packages and versions | `--mixin installed-packages` | `MBInstalledPackages` |
| Installed Conda packages and versions | `--mixin conda-packages` | `MBCondaPackages` |
| NVIDIA GPU names, memory, and attributes | `--mixin nvidia-smi` | `MBNvidiaSmi` |
| Cgroup CPU/RAM limits (containers, Linux only) | `--mixin cgroup-limits` | `MBCgroupLimits` |
| Function call arguments | — | `MBFunctionCall` |
| Function return value | — | `MBReturnValue` |
| Peak memory over a function call via tracemalloc | — | `MBPeakMemory` |
| Python packages loaded into the caller's globals | — | `MBGlobalPackages` |
| Line-by-line performance profile | — | `MBLineProfiler` |
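To give a flavour of what a mixin captures, the SLURM entry in the table conceptually reduces to filtering the process environment. The sketch below is a simplified analogue for illustration, not Microbench's actual implementation:

```python
import os

def capture_slurm_env(environ=os.environ):
    """Collect all SLURM_* variables, as SLURM metadata capture does conceptually."""
    return {k: v for k, v in environ.items() if k.startswith("SLURM_")}

# Using a fake environment so the example is self-contained
fake_env = {"SLURM_JOB_ID": "12345", "SLURM_NTASKS": "4", "HOME": "/home/user"}
slurm_info = capture_slurm_env(fake_env)
```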
What Microbench is not
Reproducibility is a big topic - it can affect scientific results or analyses that inform business decisions. Microbench is not a panacea to make everything reproducible; instead, it's designed as a low-friction tool to help identify reproducibility issues.
Making computational pipelines reproducible starts with some basics: code quality, automated testing, version control (e.g. with git), data versioning, virtual environments or containers (to keep package versions consistent). Utrecht University has a free online workshop on reproducibility if you're interested in an introduction to this topic.
Sometimes, even with all these best practices in place, reproducibility differences can crop up, as with the PyTorch and `OMP_NUM_THREADS` example I raised earlier. The solution is a layered approach built on the best practices above. Microbench is designed to be an additional layer of defence against reproducibility differences: one that can be adopted relatively easily and provides a simple way to audit metadata, both for retrospectives and for controlled experiments (e.g. benchmarking a script with different versions of numpy).
And with that - let's look at what Microbench brings in the new release.
What's new in v2.0
Here are the highlights. The changelog contains a full list of changes.
A new CLI
The headline feature is the new command-line interface.
You can now wrap any command and capture timing plus metadata without writing Python code:
```shell
microbench --outfile results.jsonl -- ./run_simulation.sh --steps 1000
```
That makes Microbench useful for:
- shell scripts
- compiled executables
- SLURM jobs
- batch pipelines
- "just benchmark this command quickly" workflows
One of the questions I received at RSECon25 was around applying Microbench to non-Python workflows, so it's nice to announce this capability.
The CLI includes default metadata capture out of the box, and supports extra fields, repeated iterations, warmups, stdout/stderr capture, timeouts, and subprocess monitoring.
For many users, this will be the easiest way to adopt Microbench.
Better organisation of metadata
Records are now cleaner and more structured, with nested namespaces like `mb`, `call`, `host`, `python`, `slurm`, and `git`.
New documentation site
Rather than relying on an increasingly long README, I made the jump to a new documentation site using mkdocs-material; this makes it easier to browse and search the docs.
Better Python ergonomics
New in v2 for the Python API:
- `bench.record(name)` for timing arbitrary code blocks
- `bench.arecord(name)` for async context-manager timing
- `bench.record_on_exit(name)` for timing an entire script or batch job
- `bench.time(name)` for sub-timings inside a single benchmark record
- native async decorator support for `async def`
- `bench.summary()` and `summary(results)` for quick statistics
```python
with bench.record('pipeline'):
    with bench.time('parse'):
        data = parse(raw)
    with bench.time('transform'):
        result = transform(data)
```
This means Microbench now works across a wider range of code shapes:
- classic decorated functions
- notebook or script blocks
- async pipelines
- long-running jobs that should emit one record at process exit
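Conceptually, block timing of this kind can be pictured as a context manager wrapped around `time.perf_counter()`. The sketch below is a simplified, self-contained stand-in for the pattern, not Microbench's implementation:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def record(name):
    # Time the enclosed block and store the elapsed seconds under `name`
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

with record("pipeline"):
    total = sum(range(100_000))
```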
Telemetry renamed to periodic monitoring
"Telemetry" is a term that, for many, conjures up images of spyware and privacy violations - which is not what the Microbench feature does at all! This was another realisation from my RSECon25 talk feedback. Microbench doesn't transmit any data to a third party, so the name needed changing.
The feature has been renamed to Periodic Monitoring. This opt-in feature captures metrics - like memory or CPU usage over a function's lifetime - at a periodic interval for inclusion into the results. Hopefully the new name makes it clearer.
See the periodic monitoring docs page to learn more about this feature.
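Under the hood, periodic monitoring of this kind usually amounts to a background thread sampling at a fixed interval while the monitored code runs. The sketch below is a generic analogue (it records timestamps rather than real CPU/memory metrics), not Microbench's actual code:

```python
import threading
import time

samples = []
stop = threading.Event()

def sample_loop(interval=0.02):
    # Take a sample each interval until asked to stop;
    # a real monitor would read CPU or memory usage here
    while not stop.is_set():
        samples.append(time.perf_counter())
        stop.wait(interval)

worker = threading.Thread(target=sample_loop)
worker.start()
time.sleep(0.1)   # the "monitored function" would run here
stop.set()
worker.join()
```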
More output options
Microbench v2 introduces a pluggable output architecture.
Instead of only writing to one file-like destination, you can now target
multiple sinks at once with outputs=[...].
Built-in sinks include:
- `FileOutput`
- `RedisOutput`
- `HttpOutput`
That makes it easier to:
- append JSONL to a shared results file
- push records into Redis for concurrent cluster jobs (useful for scenarios without a network filesystem)
- POST results to webhooks or internal services (useful for notifications)
And if you need something custom, you can subclass Output.
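The sink pattern itself is straightforward. Here is a hedged sketch, assuming each sink exposes a `write(record)` method (the real `Output` interface may differ), showing how one record fans out to multiple sinks as `outputs=[...]` does conceptually:

```python
class ListOutput:
    """Hypothetical in-memory sink, for illustration only."""
    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)

# Fan one record out to several sinks
sinks = [ListOutput(), ListOutput()]
record = {"duration_s": 1.23}
for sink in sinks:
    sink.write(record)
```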
The `get_results()` API has changed to return a dict by default, so results can be retrieved without Pandas installed. Use `get_results(format="df")` to get a Pandas DataFrame, matching the previous behaviour from v1.x.
Many new mixins
Version 2 adds a new set of mixins including `MBWorkingDir`, `MBLoadedModules`, `MBGitInfo`, `MBPeakMemory`, `MBFileHash`, and `MBCgroupLimits`. See the earlier table for details.
What this means for existing users
v2.0 is a major release, and some migration will be needed. See the changelog, in particular the breaking changes listed there, to learn how to migrate.
The main changes are restructured record fields (now nested under namespaces like `call`, `host`, and `mb`), a new output sink API replacing `MicroBenchRedis`, and `get_results()` now returning a dict by default.
Release notes and docs
For new users, I recommend looking at the getting started page, and the CLI page to get an idea of capabilities and usage.
To explore v2.0 in more detail:
Thank you
This release is the result of a substantial review and cleanup pass across the codebase, docs, tests, and packaging. Thanks to everyone who provided feedback.
I hope the new features and the CLI in particular make Microbench useful to a wider audience.
If you find Microbench useful, please cite our paper.
Thanks for reading!