How to use nl2protocol.
Four sections: try the demo, write a good instruction, write a config, and read the visual surface. If something doesn’t work, the limitations page covers known failure modes and recommendations.
01 · Quick start
The 30-second flow to run your first protocol.
- Open demo.nl2protocol.com. The form is the landing page there.
- Pick an example from the dropdown or click Upload your own… and choose your instruction.txt + config.json files.
- Paste your Anthropic API key. Get one at console.anthropic.com if you don’t have one. Your key stays in the request and is never stored.
- Click ▶ Start pipeline. The three-column visual surface fills in as each stage runs. Modals appear when the system needs your input.
Cost. Each run is multiple Anthropic calls (one Sonnet extraction, one Sonnet labware-resolve, a Haiku reviewer over non-deterministic suggestions, plus possible LLM gap-suggesters). Typical protocol: $0.05–$0.30 per run on your Anthropic account.
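To budget a batch of runs, the per-run range above can be multiplied out. This is a minimal sketch that assumes every run falls inside the quoted typical range; `budget_range` is a hypothetical helper, not part of nl2protocol.

```python
# Per-run range quoted above, kept in integer cents to avoid float rounding.
LOW_CENTS_PER_RUN = 5    # $0.05
HIGH_CENTS_PER_RUN = 30  # $0.30

def budget_range(n_runs: int) -> tuple[float, float]:
    """Return (low, high) estimated spend in dollars for n_runs typical runs."""
    if n_runs < 0:
        raise ValueError("n_runs must be non-negative")
    return (n_runs * LOW_CENTS_PER_RUN / 100, n_runs * HIGH_CENTS_PER_RUN / 100)

low, high = budget_range(20)
print(f"20 runs: ${low:.2f} to ${high:.2f}")
```

Actual spend depends on how many gap-suggester calls a given protocol triggers, so treat the result as a rough bracket.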
02 · Writing an instruction
The instruction is what you’d tell a careful labmate. The extractor (Sonnet) reads it once and tries to produce a structured spec with citations back to your text.
Patterns that produce clean extractions
- Quantitative volumes. 100uL > about 100uL > some buffer. The first cites verbatim; the second is flagged hedged; the third forces inference or leaves a gap.
- Specific labware descriptions. If your config calls something "sample_rack", calling it “the sample rack” in the instruction (rather than “the tube holder”) gives the resolver a high-confidence match and avoids a per-piece confirmation.
- One action per sentence. Run-on paragraphs make it more likely the extractor merges or drops steps.
- Bullet or numbered steps. Especially for protocols longer than ~5 steps. Clear boundaries help the LLM segment.
- Standard well notation. A1, B2, H12 are unambiguous. “tube 1” usually works but requires inference.
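You can pre-check an instruction against these patterns before submitting it. The sketch below is a hypothetical local lint, not something nl2protocol ships. The hedge-word list (beyond “about”), the liquid-handling verbs, and the 96-well A–H / 1–12 assumption are all illustrative choices.

```python
import re

# Assumed hedge words; the text above only names "about" explicitly.
HEDGE_WORDS = ("about", "around", "roughly", "approximately", "~")
VOLUME_RE = re.compile(r"\d+(?:\.\d+)?\s*(?:uL|µL|mL)\b", re.IGNORECASE)
# Standard well notation for a 96-well layout: rows A-H, columns 1-12 (assumption).
WELL_RE = re.compile(r"\b[A-H](?:1[0-2]|[1-9])\b")
# Assumed set of verbs that imply moving liquid.
LIQUID_VERB_RE = re.compile(r"\b(?:add|transfer|dispense|aspirate)\b", re.IGNORECASE)

def lint_line(line: str) -> list[str]:
    """Return warnings for one instruction line; an empty list means it looks clean."""
    warnings = []
    volumes = list(VOLUME_RE.finditer(line))
    for match in volumes:
        # A hedge word directly before the number means the volume is not exact.
        before = line[:match.start()].rstrip().lower()
        if before.endswith(HEDGE_WORDS):
            warnings.append(f"hedged volume: {match.group(0)}")
    if LIQUID_VERB_RE.search(line):
        if not volumes:
            warnings.append("no quantitative volume")
        if not WELL_RE.search(line):
            warnings.append("no standard well notation")
    return warnings

for text in ("Transfer 100uL from A1 to B1",
             "Add about 100uL to B2",
             "Add some buffer to tube 1"):
    print(text, "->", lint_line(text))
```

The three sample lines mirror the 100uL > about 100uL > some buffer ordering: the first is clean, the second is flagged as hedged, and the third has neither a volume nor a standard well.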
Patterns to avoid
- Pronouns spanning paragraphs. “Add it to the previous well” 5 sentences later: the LLM may resolve it wrong.
- Range expressions without explicit listing. B1-B4 usually expands. B1-D4 is ambiguous (row-major vs column-major).
- Operations Opentrons doesn’t support. “centrifuge”, “sonicate”, “run gel”: these fail at the input validator (Stage 1).
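The range ambiguity is easy to see when the expansion is written out. The sketch below assumes a range names a rectangular block of wells; `expand_range` is a hypothetical helper, not nl2protocol's own expansion logic.

```python
import re

def expand_range(expr: str, order: str = "row") -> list[str]:
    """Expand a range like 'B1-D4' into wells.

    order='row' walks across each row; order='column' walks down each column.
    """
    match = re.fullmatch(r"([A-Z])(\d+)-([A-Z])(\d+)", expr.strip())
    if not match:
        raise ValueError(f"not a well range: {expr!r}")
    row_start, col_start, row_end, col_end = match.groups()
    rows = [chr(code) for code in range(ord(row_start), ord(row_end) + 1)]
    cols = list(range(int(col_start), int(col_end) + 1))
    if order == "row":
        return [f"{r}{c}" for r in rows for c in cols]
    if order == "column":
        return [f"{r}{c}" for c in cols for r in rows]
    raise ValueError(f"unknown order: {order!r}")

# Single-row range: both readings agree.
print(expand_range("B1-B4", "row"), expand_range("B1-B4", "column"))
# Multi-row range: the two readings visit the wells in different orders.
print(expand_range("B1-D4", "row")[:5])
print(expand_range("B1-D4", "column")[:5])
```

Because the two orders pair sources and destinations differently, listing the wells explicitly in the instruction removes the guess.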
Example: a working instruction
Perform heat shock bacterial transformation with 4 different plasmids:
Setup:
- Temperature module is used for heat shock (keep at 4C initially)
- Tube rack A1-A4 contains 4 different plasmid DNA samples (5uL each)
- Tube rack B1-B4 contains competent cells (50uL each)
Steps:
1. Transfer 5uL plasmid from rack A1 to rack B1
2. Transfer 5uL plasmid from rack A2 to rack B2
3. Transfer 5uL plasmid from rack A3 to rack B3
4. Transfer 5uL plasmid from rack A4 to rack B4
5. Set temperature module to 42C
6. Pause 30 seconds (heat shock)
7. Set temperature module back to 4C
Notice: quantitative volumes, explicit per-step transfers (no “for each” looping), specific labware naming, named module references that match the config.
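For longer protocols, writing every transfer by hand gets tedious. One option is to generate the explicit lines and paste them into instruction.txt. This is a sketch only; `explicit_transfer_steps` is a hypothetical helper and the wording simply copies the example above.

```python
def explicit_transfer_steps(volume_ul, sources, dests, substance, start=1):
    """Build one numbered, fully explicit transfer sentence per source/destination pair."""
    if len(sources) != len(dests):
        raise ValueError("sources and dests must have the same length")
    return [
        f"{number}. Transfer {volume_ul}uL {substance} from rack {src} to rack {dst}"
        for number, (src, dst) in enumerate(zip(sources, dests), start=start)
    ]

steps = explicit_transfer_steps(
    5, ["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"], "plasmid"
)
print("\n".join(steps))
```

The output reproduces steps 1 to 4 of the example, so the extractor sees one action per line and never has to unroll a “for each”.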
For extraction patterns the system is known to handle poorly, see limitations §04, which lists what to double-check in column 2 before the pipeline proceeds.
03 · Writing a config
A JSON file describing your physical Opentrons setup — which labware on which deck slots, which pipettes mounted, which modules present. The extractor sees this so it can ground references to “the rack” or “the heater-shaker” in your actual hardware.
Minimum shape
{
  "labware": {
    "source_plate": {
      "load_name": "corning_96_wellplate_360ul_flat",
      "slot": "1"
    },
    "tube_rack": {
      "load_name": "opentrons_24_tuberack_eppendorf_2ml_safelock_snapcap",
      "slot": "2"
    },
    "tiprack": {
      "load_name": "opentrons_96_tiprack_300ul",
      "slot": "3"
    }
  },
  "pipettes": {
    "left": {
      "model": "p300_single_gen2",
      "tipracks": ["tiprack"]
    }
  }
}
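To see how these fields relate to the generated script, here is a rough mapping from the config to Opentrons Python API load calls (`load_labware` and `load_instrument`), emitted as text. This is an illustration, not nl2protocol's actual generator; `load_calls` and the emitted variable names are hypothetical.

```python
# The minimum config from above, as a Python dict.
config = {
    "labware": {
        "source_plate": {"load_name": "corning_96_wellplate_360ul_flat", "slot": "1"},
        "tube_rack": {"load_name": "opentrons_24_tuberack_eppendorf_2ml_safelock_snapcap", "slot": "2"},
        "tiprack": {"load_name": "opentrons_96_tiprack_300ul", "slot": "3"},
    },
    "pipettes": {"left": {"model": "p300_single_gen2", "tipracks": ["tiprack"]}},
}

def load_calls(config: dict) -> list[str]:
    """Return the load statements a protocol script would need for this config."""
    lines = []
    # Labware first, so tip rack variables exist before pipettes reference them.
    for key, item in config["labware"].items():
        lines.append(f'{key} = protocol.load_labware("{item["load_name"]}", "{item["slot"]}")')
    for mount, pipette in config.get("pipettes", {}).items():
        racks = ", ".join(pipette.get("tipracks", []))
        lines.append(
            f'{mount}_pipette = protocol.load_instrument("{pipette["model"]}", "{mount}", tip_racks=[{racks}])'
        )
    return lines

print("\n".join(load_calls(config)))
```

Each labware key becomes a variable, and each entry in tipracks must name one of those keys, which is why the pipette block refers to "tiprack" rather than to a load name.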
With modules
{
  "labware": { ... },
  "pipettes": { ... },
  "modules": {
    "thermocycler": {
      "model": "thermocyclerModuleV2"
    },
    "heater_shaker": {
      "model": "heaterShakerModuleV1",
      "slot": "1"
    }
  }
}
Field reference
- labware.<key>.load_name: An official Opentrons labware definition name (e.g. corning_96_wellplate_360ul_flat). Custom labware isn’t supported.
- labware.<key>.slot: Deck slot the labware sits in. OT-2 has slots 1–11.
- pipettes.{left,right}.model: Pipette model name (p20_single_gen2, p300_single_gen2, p1000_single_gen2, multi-channel variants, etc.).
- pipettes.{left,right}.tipracks: List of labware keys (from the labware block) that this pipette will pull tips from. Each tip rack referenced must be the right capacity for the pipette.
- modules.<key>.model: Opentrons module model identifier. Common ones: thermocyclerModuleV2, heaterShakerModuleV1, magneticModuleV2, temperatureModuleV2.
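Several of these rules can be checked locally before uploading a config. The sketch below covers only the structural ones stated above (slots 1–11, left/right mounts, tip racks that point at real labware keys). `validate_config` is a hypothetical helper, and it does not check tip capacity or whether a load_name is an official definition.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    errors = []
    labware = config.get("labware", {})
    for key, item in labware.items():
        if not item.get("load_name"):
            errors.append(f"labware.{key}: missing load_name")
        slot = str(item.get("slot", ""))
        # OT-2 deck slots are numbered 1 through 11.
        if not (slot.isdecimal() and 1 <= int(slot) <= 11):
            errors.append(f"labware.{key}: slot {slot!r} is not an OT-2 slot (1-11)")
    for mount, pipette in config.get("pipettes", {}).items():
        if mount not in ("left", "right"):
            errors.append(f"pipettes.{mount}: mount must be 'left' or 'right'")
        if not pipette.get("model"):
            errors.append(f"pipettes.{mount}: missing model")
        for rack in pipette.get("tipracks", []):
            # tipracks holds labware keys, not load names.
            if rack not in labware:
                errors.append(f"pipettes.{mount}: tiprack {rack!r} is not a labware key")
    return errors

bad = {
    "labware": {"tiprack": {"load_name": "opentrons_96_tiprack_300ul", "slot": "12"}},
    "pipettes": {"left": {"model": "p300_single_gen2", "tipracks": ["tips"]}},
}
for problem in validate_config(bad):
    print(problem)
```

Tip capacity is left out on purpose: which racks a pipette accepts depends on Opentrons' own compatibility rules, so that check is better done against their documentation than guessed from names.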
Tip: copy one of the example configs in the repo and modify. There are 13 worked examples covering common setups (transfers, PCR, magnetic-bead cleanup, ELISA, etc.).
04 · Reading the visual surface
The page that fills in as the pipeline runs. Three columns, left to right, plus modals when the system needs you.
The three columns
- Instruction. Your input text. Phrases that became citations have colored underlines — the hue tells you what kind of value they grounded.
- Protocol steps. The live spec. It starts as the extracted state (Stage 2) and is mutated in place as labware resolves (Stage 3), gaps fill (Stage 5), and constraints are checked (Stage 6) — so what used to be separate extracted, resolved, and validated views are now one column that accrues content. Each value is encoded on two axes: color is its citation hue, outline is its provenance source (from instruction, domain default, or inferred). Per-field revision chains and any reviewer verdicts show inline. Hover any value to highlight its citation in column 1.
- Generated script. The Opentrons Python that goes to the robot. Hover any line to highlight the step it came from; step blocks link to the lines they produced.
Cross-column arrows
When you hover a value in the protocol-steps column, the system draws an arrow back to its citation in the instruction column, and step blocks point to the script lines they generated. This lets you verify the grounding visually.
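One way to picture what the arrows rely on: each value carries its provenance plus an optional verbatim citation, and the citation can be located in the instruction text. The data model below is a guess for illustration. `SpecValue`, `Provenance`, and `citation_span` are hypothetical names, not nl2protocol's internal types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Provenance(Enum):
    # The three provenance sources named for the protocol-steps column.
    INSTRUCTION = "from instruction"
    DOMAIN_DEFAULT = "domain default"
    INFERRED = "inferred"

@dataclass
class SpecValue:
    value: str
    provenance: Provenance
    citation: Optional[str] = None  # verbatim phrase from the instruction, if any

def citation_span(instruction: str, item: SpecValue) -> Optional[tuple[int, int]]:
    """Return (start, end) of the cited phrase in the instruction, or None if it is absent."""
    if item.citation is None:
        return None
    start = instruction.find(item.citation)
    if start == -1:
        return None  # the cited text does not occur in the instruction
    return (start, start + len(item.citation))

instruction = "Transfer 5uL plasmid from rack A1 to rack B1"
grounded = SpecValue("5", Provenance.INSTRUCTION, citation="5uL")
fabricated = SpecValue("10", Provenance.INSTRUCTION, citation="10uL")
print(citation_span(instruction, grounded))
print(citation_span(instruction, fabricated))
```

A value that claims to come from the instruction but whose citation cannot be found is the kind of case a fabricated-citation check would surface.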
Modals during the run
The system stops to ask you when it can’t decide confidently. Four modal types:
- Initial contents. A table of (labware, well, substance, volume) rows for things you pre-fill into wells. Defaults are pre-populated (italic-dim). Edit volumes or accept.
- Source containers. Y/N on inferred source-only wells. Answers “you need to physically pre-fill these wells before running.”
- Labware assignments. For each labware description that didn’t map to a config label confidently, you pick from a dropdown. Reasoning shown inline under each row.
- Per-Gap modal. For individual missing values, fabricated citations, or constraint violations. Four buttons: Accept the suggestion, Edit a custom value, Override (fabrication only), or Quit.
What to check before the pipeline proceeds
The protocol-steps column is your single most important review point, especially the extracted state right after Stage 2. The downstream pipeline trusts that the extraction is faithful to your instruction. Specifically:
- Step count matches your instruction
- Action types are right (transfer vs mix vs pause)
- Volumes match what you wrote
- Wells aren’t invented (each well referenced should be cited)
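The same checklist can be expressed as a small script if you have the extracted steps in hand. The step shape used here (dicts with `action`, `volume_ul`, `wells`) is an assumption made for illustration, not nl2protocol's real spec format, and `review_extraction` is a hypothetical helper. Action types are still best checked by eye.

```python
import re

def review_extraction(instruction: str, steps: list[dict], expected_steps: int) -> list[str]:
    """Compare extracted steps against the instruction text and return any mismatches."""
    issues = []
    if len(steps) != expected_steps:
        issues.append(f"step count: extracted {len(steps)}, expected {expected_steps}")
    for number, step in enumerate(steps, start=1):
        volume = step.get("volume_ul")
        if volume is not None:
            # The volume should appear in the instruction as written, e.g. "5uL".
            pattern = rf"\b{re.escape(str(volume))}\s*uL\b"
            if not re.search(pattern, instruction, re.IGNORECASE):
                issues.append(f"step {number}: volume {volume}uL not found in instruction")
        for well in step.get("wells", []):
            if not re.search(rf"\b{re.escape(well)}\b", instruction):
                issues.append(f"step {number}: well {well} not found in instruction")
    return issues

instruction = "1. Transfer 5uL plasmid from rack A1 to rack B1\n2. Pause 30 seconds"
faithful = [
    {"action": "transfer", "volume_ul": 5, "wells": ["A1", "B1"]},
    {"action": "pause"},
]
suspect = [{"action": "transfer", "volume_ul": 50, "wells": ["A1", "C3"]}]
print(review_extraction(instruction, faithful, expected_steps=2))
print(review_extraction(instruction, suspect, expected_steps=2))
```

The second call reports a dropped step, a volume that was never written, and a well that does not appear in the instruction.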
Architecture deep-dive lives in the PIPELINE.md doc on GitHub. The GAP_LIFECYCLE.md doc has function-level sequence diagrams for the gap-resolution machinery. Both are also linked from the engineering log.