
Contribute to Labs.

There are three ways to get involved: research collaboration, open-source contributions, or sharing data. We are selective but genuinely open: the right collaborator on the right problem makes a real difference to what we can learn.

How to get involved

Three contribution paths.

Each path has different expectations and benefits. Read them carefully before reaching out: the clearer you are about which path interests you, the faster we can respond.

Research collaboration

Partner on experiments

If you are working on problems that overlap with our research streams (agent reliability and failure modes, evaluation methodology, spec-to-code pipelines, or developer tooling), we occasionally partner on experiments with external researchers.

Contact us with your research context and what you are working on
We will respond within 5 business days if there is potential overlap
Collaborative experiments are co-authored; data sharing is documented and consented
We do not do "we review your work" collaborations — it is a real partnership or nothing

Open source

Contribute to our repos

Each open-source repository has its own CONTRIBUTING.md with specific guidelines. In general, we welcome PRs, issue reports, documentation improvements, and additional test coverage.

Check the existing issues before opening a new one
For large changes, open an issue to discuss the approach before writing code
PRs must include tests and pass the existing CI pipeline
Documentation PRs are especially welcome — code without docs is half-finished

Share your data

Improve the benchmarks

Anonymised real-world data from your AI systems helps make our benchmarks more representative. We credit data contributors in the methodology notes and share relevant findings back.

Data must be anonymised — no client names, no identifiable project details
You retain ownership; we get a research licence for the specific experiment
We share our findings with you before publishing
We note data contributors in the benchmark methodology section

General principles

How we work with contributors.

These are not formal rules — they are the norms that make collaborations go well. We hold ourselves to them and expect the same from contributors.

Be specific about what you are offering

Vague offers to "help" are hard to act on. Tell us what you have: a dataset, a specific skill, a research context, time to review methodology.

Do not ghost mid-collaboration

Research timelines depend on contributors following through. If circumstances change, tell us early. An early "I cannot continue" is far better than a silent drop-off.

Disagreement is welcome, disrespect is not

We want people who will challenge our methodology and point out where we are wrong. We do not want people who are rude about it. The distinction matters.

Credit goes to the people who did the work

We credit contributions accurately. If you reviewed methodology, you are credited as a reviewer. If you co-designed the experiment, you are a co-author. We do not inflate or deflate credit.

We publish failures too

If a collaboration produces a result that contradicts our prior findings, we publish that. Contributing to a result that overturns a previous conclusion is still a valid contribution.

Timelines are estimates

Research takes longer than planned. We communicate timeline changes proactively and expect the same from collaborators. Build slack into your own commitments.

Current needs

What we are actively looking for.

These are open needs across our four research streams as of March 2026. We update this list quarterly. Reaching out about a specific need here gets a faster response than a general inquiry.

Agent reliability & failure modes

High priority

data

Anonymised logs of agent task failures in production, especially silent failures where the agent returns output that is subtly wrong.

research

Literature review support: surveying existing academic work on AI agent failure taxonomies to compare with our empirically derived classification.

review

Methodology review for our Q2 2026 benchmark update — particularly the section on multi-agent coordination failures.

Evaluation methodology

Open

research

We are designing a new evaluation protocol for code correctness that goes beyond test coverage. Looking for collaborators with evaluation methodology expertise.

data

Real-world code review comments paired with AI-generated code, to understand where human reviewers catch issues that automated tests miss.

Spec-to-code pipelines

Open

data

Feature specifications paired with the code that implemented them — to study how well specifications translate to implementation across different specification styles.

code

Testing harness improvements for our spec-evaluation pipeline. The current harness is slow and we need help optimising it.

Developer tooling

Open

review

UX feedback on our observability dashboard prototype. Looking for engineers who work with AI pipelines daily and can give us informed critique.

code

TypeScript SDK improvements — particularly around the streaming output handling and error classification utilities.

Get in touch

Tell us what you are working on.

We respond to every genuine inquiry, even if the answer is “not right now.” The more context you give us about your background and what you are working on, the more useful our response can be.

Response time: We aim to respond within 5 business days. If you are reaching out about a specific open need listed above, mention it in your message — those get routed directly to the relevant researcher.