Turn natural-language lab protocols into Opentrons scripts — with citations for every extracted value.

Writing OT-2 protocols by hand means translating an experiment into Python, well by well. nl2protocol turns that into a review job. Every value the model picked is cited back to the words you wrote, so you check the citations instead of writing the code.

Provenance labels: from instruction, domain default, inferred, reviewer agreed.

pipeline stages: 7
gap detectors + suggesters: 13
worked example protocols: 13
typical run cost: ~$0.20

How it works

01
Extract

Sonnet reads your instruction and emits a structured spec. Each extracted value records how it got there: either the verbatim quote it came from in your instruction, or, if it was inferred, the reasoning behind the inference.

model claude-sonnet-4-20250514
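One way to picture the per-value provenance record is the sketch below. It assumes a Python dataclass representation; `Source`, `CitedValue`, and `check` are hypothetical names, not nl2protocol's actual spec schema. It encodes the rule described in step 01: a value taken from the instruction must quote it verbatim, and an inferred value must carry its reasoning.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    # The three provenance kinds described on this page (hypothetical encoding).
    INSTRUCTION = "from instruction"   # verbatim quote from the user's text
    DOMAIN_DEFAULT = "domain default"  # a standard lab default
    INFERRED = "inferred"              # model inference with reasoning attached


@dataclass
class CitedValue:
    value: object
    source: Source
    quote: Optional[str] = None       # expected when source is INSTRUCTION
    reasoning: Optional[str] = None   # expected when source is INFERRED

    def check(self, instruction: str) -> list[str]:
        """Return provenance problems for this value; an empty list means it is well-cited."""
        problems = []
        if self.source is Source.INSTRUCTION:
            if not self.quote:
                problems.append("instruction value has no quote")
            elif self.quote not in instruction:
                # The quote must appear character-for-character in the instruction.
                problems.append(f"quote not found verbatim: {self.quote!r}")
        elif self.source is Source.INFERRED and not self.reasoning:
            problems.append("inferred value has no reasoning")
        return problems
```

For example, with the instruction "Transfer 50 uL of buffer from A1 to every well in column 2.", a volume cited as "50 uL" passes, while one cited as "100 uL" is reported because that phrase never appears.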
02
Resolve gaps

Labware descriptions are first mapped to your config, then a loop fills whatever the spec is still missing. Cheap deterministic suggesters run before any model call (a config lookup or a well-capacity default covers most gaps). A Haiku reviewer audits anything the model inferred against your instruction. A gap is filled automatically only when the suggestion is high-confidence and the reviewer agrees. Everything else is handed to you, one decision at a time.

reviewer claude-haiku-4-5
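The cheap-first ordering in step 02 can be sketched as a chain of suggesters tried in order, with the model consulted only if every deterministic suggester declines. The two suggesters mirror the examples named above (config lookup and well-capacity default), but the function names, the `(value, confidence, reasoning)` tuple, and the config keys are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical shape: a suggester returns (value, confidence, reasoning) or None to decline.
Suggestion = tuple[object, float, str]
Suggester = Callable[[str, dict], Optional[Suggestion]]


def config_lookup(field: str, config: dict) -> Optional[Suggestion]:
    # Deterministic: the missing value is already present in the user's config.
    if field in config:
        return config[field], 1.0, f"found '{field}' in config"
    return None


def well_capacity_default(field: str, config: dict) -> Optional[Suggestion]:
    # Deterministic: a missing max-volume field defaults to the well's capacity.
    if field == "max_volume_ul" and "well_capacity_ul" in config:
        return config["well_capacity_ul"], 0.9, "defaulted to well capacity"
    return None


def suggest(field: str, config: dict, suggesters: list,
            model_suggester: Suggester) -> tuple:
    """Try cheap suggesters in order; call the (costly) model only if all of them decline.

    Returns (suggestion, name_of_the_suggester_that_produced_it).
    """
    for s in suggesters:
        result = s(field, config)
        if result is not None:
            return result, s.__name__
    return model_suggester(field, config), model_suggester.__name__
```

Ordering the chain this way means a gap the config can answer never costs a model call, which is one plausible reason the typical run stays cheap.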
03
Simulate

A deterministic constraint check confirms the resolved spec fits the hardware, then a builder turns it into Opentrons Python and runs that script through Opentrons’ own simulator. A simulator failure blocks the run; the script and the failure log are saved for inspection instead of being returned as a usable protocol.

verifier opentrons.simulate
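A deterministic constraint check of the kind step 03 describes might look like the minimal sketch below. It assumes a single pipette with a fixed volume range, transfers that are never split into multiple aspirations, and one uniform well capacity; `Transfer` and `check_constraints` are hypothetical names. It only covers pipette range and well overfill.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    volume_ul: float
    dest_well: str


def check_constraints(transfers, pipette_min_ul, pipette_max_ul, well_capacity_ul):
    """Return human-readable violations; an empty list means the spec fits these limits."""
    errors = []
    filled = {}  # running total of liquid added to each destination well
    for i, t in enumerate(transfers):
        # Each single transfer must fall inside the pipette's working range.
        if not (pipette_min_ul <= t.volume_ul <= pipette_max_ul):
            errors.append(f"step {i}: {t.volume_ul} uL outside pipette range "
                          f"{pipette_min_ul}-{pipette_max_ul} uL")
        # Cumulative volume in a destination well must not exceed its capacity.
        filled[t.dest_well] = filled.get(t.dest_well, 0) + t.volume_ul
        if filled[t.dest_well] > well_capacity_ul:
            errors.append(f"step {i}: {t.dest_well} would hold {filled[t.dest_well]} uL, "
                          f"over capacity {well_capacity_ul} uL")
    return errors
```

One plausible benefit of checking the spec before generating code is that a violation can be reported against a cited value rather than against a line of generated Python.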

What’s different

Every value carries provenance.

Retrieval-augmented (RAG) converters typically emit Opentrons code with no link back to what you asked for. Here, every extracted volume, well, labware, and substance is tagged with its source: a verbatim quote from your instruction, a standard domain default, or an inference with the reasoning attached. Any value in the spec can be traced to where it came from.
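Tracing a quoted value back to the instruction can be as simple as locating the quote's character span, which is what a highlight or arrow in the UI would need. `trace_citations` below is a hypothetical helper, not the project's code, and it matches only the first occurrence of each quote.

```python
from typing import Optional


def trace_citations(instruction: str, quotes: dict) -> dict:
    """Map each field name to the (start, end) span of its quote in the instruction.

    A field maps to None when its quote does not occur verbatim.
    Only the first occurrence is used, so repeated phrases are ambiguous.
    """
    spans: dict[str, Optional[tuple[int, int]]] = {}
    for field, quote in quotes.items():
        start = instruction.find(quote)
        spans[field] = None if start == -1 else (start, start + len(quote))
    return spans
```

A `None` span is exactly the case a reviewer would want surfaced: a value that claims to come from the instruction but cannot be found in it.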

A separate reviewer pass audits the LLM’s guesses.

Values the extractor inferred, rather than quoted from your instruction, are checked by a second Claude call (Haiku) that grades the reasoning before you ever see the gap. A weak or unsupported inference gets flagged for your review instead of passing through silently.
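The auto-fill rule from step 02 and the reviewer flagging described here can be sketched together. The three verdict grades (`AGREE`, `WEAK`, `UNSUPPORTED`) and the 0.9 cutoff for "high confidence" are assumptions for illustration; the page does not state the actual grades or threshold.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical grades the reviewer might assign to an inference.
    AGREE = "agree"
    WEAK = "weak"
    UNSUPPORTED = "unsupported"


AUTO_FILL_THRESHOLD = 0.9  # hypothetical cutoff for "high confidence"


def route_gap(confidence: float, verdict: Verdict) -> str:
    """Auto-fill only when confidence is high AND the reviewer agrees; otherwise ask the user."""
    if confidence >= AUTO_FILL_THRESHOLD and verdict is Verdict.AGREE:
        return "auto_fill"
    return "ask_user"


def flagged_for_review(verdicts: dict[str, Verdict]) -> list[str]:
    """Fields whose inferred value the reviewer did not accept, in sorted order."""
    return sorted(field for field, v in verdicts.items() if v is not Verdict.AGREE)
```

Requiring both conditions means neither a confident extractor nor an agreeable reviewer can push a value through alone.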

The script is simulator-verified.

The generated Python is run through Opentrons’ own simulator before it is returned. Code that would fail to load on the robot is caught here, not on the deck.

Figure 1. The live UI mid-pipeline: instruction on the left, the extracted spec with inline provenance checks in the middle, and a gap-resolution dialog on the right, where the model proposes a value for a missing field along with its reasoning. Faint arrows trace each extracted value back to its source in the instruction.
