Microbench v2.0 is here.
This is a big release, with many new features based on feedback I've received from users and from questions after a talk I gave at RSECon25.
What is Microbench?
The motivation behind Microbench is that running the same script on a laptop, cloud VM, or cluster can produce different results. This can happen for many reasons, including differences in (for example):
- Hardware: Some cluster nodes have A100 40GB, others have A100 80GB
- Operating system: You might develop on Windows or Mac, but use a Linux VM or cluster
- Software dependencies: You ran `conda update --all` locally but forgot to check the new environment into git
- Analysis script: You forgot to run `git pull` before the cluster run
- Data: You forgot to copy across the new version of the dataset
- Environment: Sometimes an environment variable difference can affect results, as with PyTorch and `OMP_NUM_THREADS`
The central premise of Microbench is to make capturing metadata that can affect reproducibility easy and "cheap", so that it can be done routinely. By default, the output format is JSON lines, which many programs can read, so you're not "locked in" to Microbench. As a side benefit, you also get performance information (runtime duration), which makes it easy to spot performance or configuration differences across a cluster (also useful for sysadmins!).
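To illustrate why JSON lines avoids lock-in, here is a minimal sketch (the field names are made up for illustration and are not Microbench's actual schema): appending and re-reading records needs nothing beyond a JSON parser, one record per line.

```python
import json
import time
import uuid

# Append one benchmark record as a single JSON line
# (field names here are illustrative, not Microbench's real schema)
record = {
    "run_id": str(uuid.uuid4()),
    "start_time": time.time(),
    "duration_s": 1.23,
}
with open("results.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")

# Reading the file back requires only a JSON parser,
# so any language or tool can consume the results
with open("results.jsonl") as f:
    records = [json.loads(line) for line in f]
```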
Microbench can be used as a command line program to wrap other scripts (new in v2.0) or integrated within a Python script for more precise control.
Here's a list of what metadata Microbench can record as of v2.0:
| Metadata | CLI flag | Python mixin |
|---|---|---|
| Run UUID, Microbench version, timezone, duration(s), start/finish times, invocation method | (default) | (default) |
| Python version, prefix, and executable | (default) | (default: included in `MicroBench`) |
| Hostname, OS; CPU core count and total RAM (requires psutil) | (default) | `MBHostInfo` |
| All `SLURM_*` environment variables | (default) | `MBSlurmInfo` |
| Loaded Environment Modules / Lmod stack | (default) | `MBLoadedModules` |
| Current working directory | (default) | `MBWorkingDir` |
| Git repo, commit hash, branch, dirty flag | `--mixin git-info` | `MBGitInfo` |
| SHA-256 (or other) hash of specified files | `--mixin file-hash` | `MBFileHash` |
| Installed Python packages and versions | `--mixin installed-packages` | `MBInstalledPackages` |
| Installed Conda packages and versions | `--mixin conda-packages` | `MBCondaPackages` |
| NVIDIA GPU names, memory, and attributes | `--mixin nvidia-smi` | `MBNvidiaSmi` |
| Cgroup CPU/RAM limits (containers, Linux only) | `--mixin cgroup-limits` | `MBCgroupLimits` |
| Function call arguments | — | `MBFunctionCall` |
| Function return value | — | `MBReturnValue` |
| Peak memory over a function call via tracemalloc | — | `MBPeakMemory` |
| Python packages loaded into the caller's globals | — | `MBGlobalPackages` |
| Line-by-line performance profile | — | `MBLineProfiler` |
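To give a flavour of what a mixin captures, the SLURM entry in the table conceptually reduces to filtering the process environment. The sketch below is a simplified analogue for illustration, not Microbench's actual implementation:

```python
import os

def capture_slurm_env(environ=os.environ):
    """Collect all SLURM_* variables, as SLURM metadata capture does conceptually."""
    return {k: v for k, v in environ.items() if k.startswith("SLURM_")}

# Using a fake environment so the example is self-contained
fake_env = {"SLURM_JOB_ID": "12345", "SLURM_NTASKS": "4", "HOME": "/home/user"}
slurm_info = capture_slurm_env(fake_env)
```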
What Microbench is not
Reproducibility is a big topic - it can affect scientific results or analyses that inform business decisions. Microbench is not a panacea to make everything reproducible; instead, it's designed as a low-friction tool to help identify reproducibility issues.
Making computational pipelines reproducible starts with some basics: code quality, automated testing, version control (e.g. with git), data versioning, virtual environments or containers (to keep package versions consistent). Utrecht University has a free online workshop on reproducibility if you're interested in an introduction to this topic.
Sometimes, even with all these best practices in place, reproducibility differences can crop up, as with the PyTorch and `OMP_NUM_THREADS` example I raised earlier. The solution is a layered approach built on the best practices above. Microbench is designed to be an additional layer of defence against reproducibility differences: one that can be adopted relatively easily and provides a simple way to audit metadata, both for retrospectives and for controlled experiments (e.g. benchmarking a script with different versions of numpy).
And with that - let's look at what Microbench brings in the new release.
What's new in v2.0
Here are the highlights. The changelog contains a full list of changes.
A new CLI
The headline feature is the new command-line interface.
You can now wrap any command and capture timing plus metadata without writing Python code:
```shell
microbench --outfile results.jsonl -- ./run_simulation.sh --steps 1000
```
That makes Microbench useful for:
- shell scripts
- compiled executables
- SLURM jobs
- batch pipelines
- "just benchmark this command quickly" workflows
One of the questions I received at RSECon25 was around applying Microbench to non-Python workflows, so it's nice to announce this capability.
The CLI includes default metadata capture out of the box, and supports extra fields, repeated iterations, warmups, stdout/stderr capture, timeouts, and subprocess monitoring.
For many users, this will be the easiest way to adopt Microbench.
Better organisation of metadata
Records are now cleaner and more structured, with nested namespaces like `mb`, `call`, `host`, `python`, `slurm`, and `git`.
New documentation site
Rather than relying on an increasingly long README, I made the jump to a new documentation site using mkdocs-material; this makes it easier to browse and search the docs.
Better Python ergonomics
New in v2 for the Python API:
- `bench.record(name)` for timing arbitrary code blocks
- `bench.arecord(name)` for async context-manager timing
- `bench.record_on_exit(name)` for timing an entire script or batch job
- `bench.time(name)` for sub-timings inside a single benchmark record
- native async decorator support for `async def`
- `bench.summary()` and `summary(results)` for quick statistics
```python
with bench.record('pipeline'):
    with bench.time('parse'):
        data = parse(raw)
    with bench.time('transform'):
        result = transform(data)
```
This means Microbench now works across a wider range of code shapes:
- classic decorated functions
- notebook or script blocks
- async pipelines
- long-running jobs that should emit one record at process exit
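Conceptually, block timing of this kind can be pictured as a context manager wrapped around `time.perf_counter()`. The sketch below is a simplified, self-contained stand-in for the pattern, not Microbench's implementation:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def record(name):
    # Time the enclosed block and store the elapsed seconds under `name`
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

with record("pipeline"):
    total = sum(range(100_000))
```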
Telemetry renamed to periodic monitoring
"Telemetry" is a term that, for many, conjures up images of spyware and privacy violations - which is not what the Microbench feature does at all! This was another realisation from my RSECon25 talk feedback. Microbench doesn't transmit any data to a third party, so the name needed changing.
The feature has been renamed to Periodic Monitoring. This opt-in feature captures metrics - like memory or CPU usage over a function's lifetime - at a periodic interval for inclusion into the results. Hopefully the new name makes it clearer.
See the periodic monitoring docs page to learn more about this feature.
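Under the hood, periodic monitoring of this kind usually amounts to a background thread sampling at a fixed interval while the monitored code runs. The sketch below is a generic analogue (it records timestamps rather than real CPU/memory metrics), not Microbench's actual code:

```python
import threading
import time

samples = []
stop = threading.Event()

def sample_loop(interval=0.02):
    # Take a sample each interval until asked to stop;
    # a real monitor would read CPU or memory usage here
    while not stop.is_set():
        samples.append(time.perf_counter())
        stop.wait(interval)

worker = threading.Thread(target=sample_loop)
worker.start()
time.sleep(0.1)   # the "monitored function" would run here
stop.set()
worker.join()
```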
More output options
Microbench v2 introduces a pluggable output architecture.
Instead of only writing to one file-like destination, you can now target
multiple sinks at once with outputs=[...].
Built-in sinks include:
- `FileOutput`
- `RedisOutput`
- `HttpOutput`
That makes it easier to:
- append JSONL to a shared results file
- push records into Redis for concurrent cluster jobs (useful for scenarios without a network filesystem)
- POST results to webhooks or internal services (useful for notifications)
And if you need something custom, you can subclass Output.
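The sink pattern itself is straightforward. Here is a hedged sketch, assuming each sink exposes a `write(record)` method (the real `Output` interface may differ), showing how one record fans out to multiple sinks as `outputs=[...]` does conceptually:

```python
class ListOutput:
    """Hypothetical in-memory sink, for illustration only."""
    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)

# Fan one record out to several sinks
sinks = [ListOutput(), ListOutput()]
record = {"duration_s": 1.23}
for sink in sinks:
    sink.write(record)
```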
The `get_results()` API has changed to return a dict by default, so results can be retrieved without Pandas installed. Use `get_results(format="df")` to get a Pandas DataFrame, matching the previous behaviour from v1.x.
Many new mixins
Version 2 adds a new set of mixins including `MBWorkingDir`, `MBLoadedModules`, `MBGitInfo`, `MBPeakMemory`, `MBFileHash`, and `MBCgroupLimits`. See the earlier table for details.
What this means for existing users
v2.0 is a major release, and some migration will be needed. See the changelog, in particular the breaking changes listed there, to learn how to migrate.
The main changes are restructured record fields (now nested under namespaces like `call`, `host`, and `mb`), a new output sink API replacing `MicroBenchRedis`, and `get_results()` now returning a dict by default.
Release notes and docs
For new users, I recommend looking at the getting started page, and the CLI page to get an idea of capabilities and usage.
To explore v2.0 in more detail:
Thank you
This release is the result of a substantial review and cleanup pass across the codebase, docs, tests, and packaging. Thanks to everyone who provided feedback.
I hope the new features and the CLI in particular make Microbench useful to a wider audience.
If you find Microbench useful, please cite our paper.
Thanks for reading!