This page covers the scope of what nl2protocol can produce, what your config needs to declare, and the patterns in your instruction that produce clean extractions: everything you need to know before you run it. Architectural limits live in the engineering log.
What kind of physical setup the generated scripts target.
Generated scripts target the OT-2 Python API. The Opentrons Flex has a different protocol API surface; nl2protocol doesn’t emit Flex-compatible code in this version.
Like the physical OT-2, your config can declare at most one left-mounted and one right-mounted pipette. Each must be paired with a tip rack also declared in the config.
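A minimal sketch of that mount rule as a config check. The schema here is illustrative (the real config format may differ; field names like `pipettes` and `tip_rack` are assumptions), but the constraint is the one described above: one pipette per mount, each paired with a declared tip rack.

```python
# Hypothetical config shape -- the real schema may differ.
config = {
    "pipettes": {
        "left": {"model": "p20_single_gen2", "tip_rack": "tips_20"},
        "right": {"model": "p300_single_gen2", "tip_rack": "tips_300"},
    },
    "labware": {
        "tips_20": "opentrons_96_tiprack_20ul",
        "tips_300": "opentrons_96_tiprack_300ul",
    },
}

def validate_mounts(config):
    """Reject configs that use an unknown mount or pair a pipette
    with a tip rack the config doesn't declare."""
    for mount, pip in config["pipettes"].items():
        if mount not in ("left", "right"):
            raise ValueError(f"unknown mount: {mount}")
        if pip["tip_rack"] not in config["labware"]:
            raise ValueError(
                f"{mount} pipette references undeclared tip rack "
                f"{pip['tip_rack']!r}"
            )
```

Because the dict is keyed by mount, "at most one left and one right pipette" falls out of the structure itself; only the tip-rack pairing needs an explicit check.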
Each Opentrons pipette has a fixed range (e.g. P20:
1–20uL, P300: 20–300uL,
P1000: 100–1000uL). If your instruction asks
for a volume outside any mounted pipette’s range, the
constraint checker will surface a violation and ask you what to
do — not silently generate broken code.
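A sketch of that range check, with illustrative names (this is not nl2protocol's actual internals, just the logic the paragraph describes):

```python
# Single-aspiration ranges from the paragraph above.
PIPETTE_RANGES_UL = {
    "p20_single_gen2": (1, 20),
    "p300_single_gen2": (20, 300),
    "p1000_single_gen2": (100, 1000),
}

def volume_violation(volume_ul, mounted_pipettes):
    """Return a violation message if no mounted pipette covers the
    requested volume, else None (no violation)."""
    for model in mounted_pipettes:
        lo, hi = PIPETTE_RANGES_UL[model]
        if lo <= volume_ul <= hi:
            return None
    return f"{volume_ul}uL is outside the range of every mounted pipette"
```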
Heater-shaker, thermocycler, magnetic, temperature modules: the extractor will reference them only when your config lists them. Mentioning “heat to 95C” in your instruction without a thermocycler or temperature module in the config produces a constraint violation rather than a fabricated module reference.
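The same idea applies to modules. A sketch with hypothetical action and module names (the real checker's vocabulary will differ):

```python
# Which config-declared modules can satisfy each action -- illustrative.
MODULE_REQUIREMENTS = {
    "heat": {"temperature_module", "thermocycler"},
    "shake": {"heater_shaker"},
    "magnetize": {"magnetic_module"},
}

def module_violations(actions, config_modules):
    """Surface a violation for each action that needs a module the
    config doesn't declare, instead of fabricating a module reference."""
    declared = set(config_modules)
    out = []
    for act in actions:
        needed = MODULE_REQUIREMENTS.get(act, set())
        if needed and not (needed & declared):
            out.append(f"'{act}' needs one of {sorted(needed)}; none declared")
    return out
```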
The system can only reference labware your config declares. Everything else surfaces as an ambiguity for you to resolve.
Your config’s labware entries must use Opentrons’ official
labware names (e.g.
corning_96_wellplate_360ul_flat,
opentrons_24_tuberack_eppendorf_2ml_safelock_snapcap).
Custom or third-party labware isn’t supported in v1.
The labware resolver tries to map your instruction’s wording (“tube rack”, “the reservoir”) to a config-declared label. Anything it can’t map confidently becomes a per-piece confirmation you resolve in the UI. Anything truly absent from the config halts the pipeline.
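An illustrative sketch of that three-way outcome (confident match, per-piece confirmation, halt) using plain string similarity. The real resolver is LLM-assisted; `resolve_labware`, the normalization, and both thresholds here are assumptions:

```python
import difflib

def resolve_labware(phrase, config_labels, halt_below=0.6, confirm_below=0.8):
    """Map an instruction phrase to the closest config label.
    Returns (label, confident): low-confidence matches become
    per-piece confirmations; no plausible match at all halts."""
    norm = phrase.lower().replace("the ", "").replace(" ", "_")
    score, best = max(
        (difflib.SequenceMatcher(None, norm, label.lower()).ratio(), label)
        for label in config_labels
    )
    if score < halt_below:
        return None, False            # truly absent from config -> halt
    return best, score >= confirm_below  # below threshold -> confirm in UI
```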
A1, B2, H12 — row
letters A–H (or A–P for 384-well), column numbers 1–12 (or
1–24). The extractor can sometimes infer wells from
descriptions like “tube 1”, but explicit
notation is more reliable.
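A sketch of what explicit notation buys you: a well token either validates against the plate footprint or it doesn't, with no inference needed (the parser itself is illustrative, not the extractor's):

```python
import re

WELL_RE = re.compile(r"^([A-P])(\d{1,2})$")

def parse_well(token, rows=8, cols=12):
    """Validate explicit well notation like 'A1' or 'H12' against a
    plate footprint (8x12 for 96-well, 16x24 for 384-well).
    Returns (row_index, col_index) or None."""
    m = WELL_RE.match(token.strip().upper())
    if not m:
        return None  # e.g. "tube 1" -- needs inference, less reliable
    row = ord(m.group(1)) - ord("A")
    col = int(m.group(2)) - 1
    if row >= rows or col >= cols:
        return None
    return row, col
```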
What your natural-language instruction needs to look like.
Stage 1 is a Haiku classifier that decides whether your text is a plausible protocol instruction. Questions (“what’s a transfer?”), pure vagueness (“do an experiment”), and non-liquid-handling operations (“centrifuge this”) get rejected cheaply before any expensive call runs.
Prompts and few-shot examples are written in English. Other languages may work in theory but aren’t tested and may produce worse extractions.
“100uL” is grounded; the extractor cites
it verbatim. “about 100uL” is still cited but
flagged exact: false. “a little” or
“some” forces the extractor to either infer
a default (which you’ll see flagged in the UI) or leave the
volume blank (which becomes a gap to resolve).
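The three cases above can be sketched as a small classifier. The `exact` and `gap` field names mirror this page's description; the function and its regexes are illustrative, not the extractor's actual grounding logic:

```python
import re

def ground_volume(text):
    """Classify a volume phrase: bare number -> exact, hedged
    number -> exact=False, non-numeric -> gap to resolve."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*u[lL]", text)
    if m is None:
        # "a little", "some": no citable number at all
        return {"value": None, "exact": False, "gap": True}
    hedged = re.search(r"\b(about|around|roughly|approximately)\s*$",
                       text[:m.start()])
    return {"value": float(m.group(1)), "exact": hedged is None, "gap": False}
```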
Bullet points, numbered lists, or clear paragraph boundaries per step. Run-on paragraphs with multiple actions in one sentence make it more likely the extractor drops or merges steps.
If your config labels a labware "sample_rack", calling
it “the sample rack” in the instruction (rather
than “the tube holder”) gives the labware resolver
a much higher-confidence match. Not required, but reduces the
number of per-piece confirmations you have to make.
LLM-driven extraction has predictable failure modes. The visual surface exists so you can catch these in column 2 (extracted spec) before the pipeline proceeds.
Sonnet sometimes silently omits a step in protocols longer than ~15 steps. Nothing in the downstream pipeline catches this — the orchestrator and constraint checker only see what was extracted. Always count steps in column 2 against your original instruction.
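Since nothing downstream catches a dropped step, the count check is on you. A rough client-side heuristic for eyeballing it (this is not part of nl2protocol, and the bullet/number regex is an assumption about how you format instructions):

```python
import re

def count_instruction_steps(instruction):
    """Count lines that look like numbered or bulleted steps."""
    return sum(
        1 for line in instruction.splitlines()
        if re.match(r"^(\d+[.)]|[-*])\s", line.strip())
    )

def dropped_steps_likely(instruction, extracted_steps):
    """True if column 2 shows fewer steps than the instruction lists."""
    expected = count_instruction_steps(instruction)
    return expected > 0 and len(extracted_steps) < expected
```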
Ambiguous wording like “pipette up and down 3 times in
well A1” may extract as a transfer
(source=A1, dest=A1) rather than a mix. The
generated code still runs but the action semantics differ. Look
for steps where source and destination are the same well.
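That lint is easy to run yourself over the extracted steps. The step shape (dicts with `action`, `source`, `dest`) is an assumption for illustration:

```python
def suspicious_self_transfers(steps):
    """Flag transfer steps whose source and destination are the same
    well -- often a mix that was mis-extracted as a transfer."""
    return [
        s for s in steps
        if s.get("action") == "transfer" and s.get("source") == s.get("dest")
    ]
```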
“wells B1 through B4” usually expands cleanly to
[B1, B2, B3, B4]. “wells B1-D4” can
be ambiguous — row-by-row vs column-by-column expansion order
is not always inferred correctly. Use explicit lists for
non-contiguous or large ranges.
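To see why the rectangular shorthand is ambiguous, here are both expansion orders side by side (an illustrative helper; within a single row, as in "B1 through B4", the two orders agree, which is why that case expands cleanly):

```python
from string import ascii_uppercase

def expand_range(start, end, order="row"):
    """Expand a rectangular well range like B1-D4 in row-by-row or
    column-by-column order."""
    r0, c0 = ascii_uppercase.index(start[0]), int(start[1:])
    r1, c1 = ascii_uppercase.index(end[0]), int(end[1:])
    if order == "row":   # B1 B2 B3 B4 C1 ... D4
        return [f"{ascii_uppercase[r]}{c}"
                for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    return [f"{ascii_uppercase[r]}{c}"   # B1 C1 D1 B2 ... D4
            for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]
```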
“Distribute the buffer” without naming targets
can make the extractor invent a well set (often A1–A12
on a 96-well). Always be explicit about destinations when
distributing.
When wells in a transfer step were cited from multiple bullet points and the citation phrasing doesn't literally name each well, the verifier may complain even though the extraction itself was fine; the verifier is simply too strict on this shape. If you see a fabrication gap on a wells field that visibly matches your instruction, accept the suggestion and move on. (Tracked as a known bug; see the engineering log for the fix plan.)
Categories of protocol the system isn’t designed for. If your protocol falls in here, the system will either reject the instruction at Stage 1 or produce a constraint violation.
Generated scripts are single-session. Anything that requires the user to step away for hours and come back to continue (overnight incubation, multi-day cell culture) needs to be split into separate scripts, one per session.
The Opentrons API exposes pause with an optional
note ("user monitors heat shock timing", "swap tube rack")
for manual steps. Anything more elaborate — physical
rearrangement of decks, custom hardware operations — is
outside the API surface and the generated script.
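`protocol.pause(msg)` is real Opentrons Python API v2; an emitter that renders a manual step as that call might look like the sketch below (the emitter itself is illustrative, not nl2protocol's code generator):

```python
def emit_manual_step(note):
    """Render a manual step as the pause call the generated script
    would contain, escaping quotes in the operator note."""
    safe = note.replace('"', '\\"')
    return f'protocol.pause("{safe}")'
```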
Centrifugation, gel electrophoresis, autoclave cycles, plate
readers, manual cell counting — the OT-2 doesn’t do these,
and the extractor will reject the instruction if it’s
fundamentally a non-liquid-handling protocol. Workflows that
combine liquid handling with external equipment work as long
as the external steps map to pause calls.
Architectural and internal-engineering limits (where the implementation drifts from intent, contract gaps, known internal bugs) live in the engineering log. The full pipeline walkthrough is in the architecture doc on GitHub.