Turn natural-language lab protocols into Opentrons scripts — with citations for every extracted value.
Writing OT-2 protocols by hand means translating an experiment into Python, well by well. nl2protocol turns that into a review job. Every value the model picked is cited back to its source (the words you wrote, a domain default, or a stated inference), so you check the citations instead of writing the code.
Every value in the generated spec carries a citation showing where it came from, marked as one of:
- from instruction
- domain default
- inferred
- reviewer agreed
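As a rough illustration of what a cited value might look like in code, here is a minimal sketch. The names `Source`, `CitedValue`, and `citation_problems` are hypothetical and not taken from nl2protocol, and treating "reviewer agreed" as a flag on a value rather than a separate source is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Source(Enum):
    """Where a value came from (hypothetical names mirroring the legend)."""
    INSTRUCTION = "from instruction"   # backed by a verbatim quote
    DOMAIN_DEFAULT = "domain default"  # a standard default value
    INFERRED = "inferred"              # a model inference with reasoning


@dataclass(frozen=True)
class CitedValue:
    name: str
    value: object
    source: Source
    quote: Optional[str] = None            # expected for INSTRUCTION values
    reasoning: Optional[str] = None        # expected for INFERRED values
    reviewer_agreed: Optional[bool] = None  # assumption: a flag, not a source


def citation_problems(cited: CitedValue, instruction: str) -> List[str]:
    """Return human-readable problems with a value's citation (empty if fine)."""
    problems = []
    if cited.source is Source.INSTRUCTION:
        if not cited.quote:
            problems.append(f"{cited.name}: no quote given")
        elif cited.quote not in instruction:
            problems.append(f"{cited.name}: quote is not verbatim in the instruction")
    elif cited.source is Source.INFERRED and not cited.reasoning:
        problems.append(f"{cited.name}: inference has no reasoning attached")
    return problems


instruction = "Transfer 50 uL from A1 to B1 of the plate."
good = CitedValue("volume_ul", 50, Source.INSTRUCTION, quote="50 uL")
bad = CitedValue("volume_ul", 100, Source.INSTRUCTION, quote="100 uL")
guess = CitedValue("pipette", "p300_single_gen2", Source.INFERRED)

for item in (good, bad, guess):
    print(item.name, citation_problems(item, instruction))
```

The check is deliberately mechanical: a quote either appears verbatim in the instruction or it does not, which is what makes reviewing citations faster than reviewing code.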
How it works
- 01 Extract
Claude Sonnet reads your instruction and emits a structured spec. Each extracted value records how it got there: the verbatim quote it came from in your instruction or, if it was inferred, the reasoning behind the inference.
- 02 Resolve gaps
Labware descriptions are first mapped to your config, then a loop fills whatever the spec is missing. Cheap deterministic suggesters run before any model call; a config lookup or a well-capacity default covers most gaps. A Haiku reviewer audits anything the model inferred against your instruction. A gap is filled automatically only when confidence is high and the reviewer agrees. Everything else is handed to you, one decision at a time.
- 03 Simulate
A deterministic constraint check confirms the resolved spec fits the hardware, then a builder turns it into Opentrons Python and runs that script through Opentrons’ own simulator. A simulator failure blocks the run; the script and the failure log are saved for inspection instead of being returned as a usable protocol.
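The gap-resolution rule in step 02 can be sketched roughly as follows. This is an illustration under stated assumptions, not nl2protocol's actual code: `Suggestion`, `resolve_gap`, the `HIGH_CONFIDENCE` threshold of 0.9, and the choice to skip the reviewer for deterministic suggestions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Suggestion:
    value: object
    confidence: float  # 0.0 to 1.0
    inferred: bool     # True if a model guessed it, False if deterministic


Suggester = Callable[[str, Dict[str, object]], Optional[Suggestion]]

HIGH_CONFIDENCE = 0.9  # hypothetical threshold, not from the source


def resolve_gap(
    field: str,
    config: Dict[str, object],
    suggesters: List[Suggester],  # ordered: deterministic first, model last
    reviewer: Callable[[str, Suggestion], bool],
    ask_user: Callable[[str, Optional[Suggestion]], object],
) -> Tuple[object, str]:
    """Fill one missing field; return (value, "auto" or "user")."""
    suggestion = None
    for suggest in suggesters:
        suggestion = suggest(field, config)
        if suggestion is not None:
            break  # the first suggester with an answer wins
    if suggestion is not None:
        # Assumption: only model inferences are sent to the reviewer.
        agreed = reviewer(field, suggestion) if suggestion.inferred else True
        if suggestion.confidence >= HIGH_CONFIDENCE and agreed:
            return suggestion.value, "auto"
    # Anything else becomes a single decision for the user.
    return ask_user(field, suggestion), "user"


def from_config(field, config):
    """Deterministic suggester: plain config lookup."""
    if field in config:
        return Suggestion(config[field], 1.0, inferred=False)
    return None


def model_guess(field, config):
    """Stand-in for a model call; returns a canned guess."""
    if field == "pipette":
        return Suggestion("p300_single_gen2", 0.95, inferred=True)
    return None


config = {"tiprack": "opentrons_96_tiprack_300ul"}
skeptical_reviewer = lambda field, s: False          # always disagrees
canned_user = lambda field, s: "p20_single_gen2"     # stand-in for a prompt

tiprack = resolve_gap("tiprack", config, [from_config, model_guess],
                      skeptical_reviewer, canned_user)
pipette = resolve_gap("pipette", config, [from_config, model_guess],
                      skeptical_reviewer, canned_user)
print(tiprack)
print(pipette)
```

In this sketch the tip rack is filled from config without any model call, while the pipette guess is confident but rejected by the reviewer, so it falls through to the user.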
What’s different
- Every value carries provenance.
RAG-based converters emit Opentrons code with no link back to what you asked for. Here, every extracted volume, well, labware, and substance is tagged with its source: a verbatim quote from your instruction, a standard domain default, or an inference with the reasoning attached. Any value in the spec can be traced to where it came from.
- A separate reviewer pass audits the LLM’s guesses.
Values the extractor inferred, rather than quoted from your instruction, are checked by a second Claude call (Haiku) that grades the reasoning before you ever see the gap. A weak or unsupported inference gets flagged for your review instead of passing through silently.
- The script is simulator-verified.
The generated Python is run through Opentrons’ own simulator before it is returned. Code that would fail to load on the robot is caught here, not on the deck.
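The "blocked on simulator failure" behaviour might look something like the sketch below. It is illustrative only: `gate_on_simulation`, `RunResult`, and the output file names are hypothetical, and `syntax_only_check` is a stand-in that merely compiles the script. In the real pipeline that argument would wrap Opentrons' simulator, which this example does not call.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class RunResult:
    ok: bool
    script: Optional[str]        # set only when simulation passed
    failure_dir: Optional[Path]  # set only when simulation failed


def gate_on_simulation(
    script: str,
    simulate: Callable[[str], None],  # must raise on failure
    out_dir: Path,
) -> RunResult:
    """Return the script only if `simulate` accepts it; otherwise save evidence."""
    try:
        simulate(script)
    except Exception as exc:
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / "protocol_failed.py").write_text(script)
        (out_dir / "simulation.log").write_text(f"{type(exc).__name__}: {exc}\n")
        return RunResult(ok=False, script=None, failure_dir=out_dir)
    return RunResult(ok=True, script=script, failure_dir=None)


def syntax_only_check(script: str) -> None:
    """Stand-in simulator: only checks that the script is valid Python."""
    compile(script, "<generated protocol>", "exec")


GOOD = '''from opentrons import protocol_api

metadata = {"apiLevel": "2.15"}


def run(protocol: protocol_api.ProtocolContext):
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
'''

BAD = "def run(protocol:\n    pass\n"  # unclosed parenthesis

work_dir = Path(tempfile.mkdtemp())
passed = gate_on_simulation(GOOD, syntax_only_check, work_dir / "good")
failed = gate_on_simulation(BAD, syntax_only_check, work_dir / "bad")
print(passed.ok, failed.ok, failed.failure_dir)
```

The point of the design is that a failing script never comes back in the `script` field, so a caller cannot accidentally treat it as a usable protocol.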