
How to use nl2protocol.

Four sections: try the demo, write a good instruction, write a config, and read the visual surface. If something doesn’t work, the limitations page covers known failure modes and recommendations.

01 · Quick start

The 30-second flow to run your first protocol.

Open nl2protocol.com. The form is the landing page.
Pick an example from the dropdown or click Upload your own… and choose your instruction.txt + config.json files.
Paste your Anthropic API key. Get one at console.anthropic.com if you don’t have one. Your key stays in the request and is never stored.
Click ▶ Start pipeline. The five-column visual surface fills in as each stage runs. Modals appear when the system needs your input.

Cost. Each run makes several Anthropic API calls (one Sonnet extraction, one Sonnet labware-resolve, a Haiku reviewer over non-deterministic suggestions, plus possible LLM gap-suggesters). A typical protocol costs $0.05–$0.30 per run, billed to your Anthropic account.

02 · Writing an instruction

The instruction is what you’d tell a careful labmate. The extractor (Sonnet) reads it once and tries to produce a structured spec with citations back to your text.

Patterns that produce clean extractions

Quantitative volumes. 100uL > about 100uL > some buffer. The first cites verbatim; the second is flagged hedged; the third forces inference or leaves a gap.
Specific labware descriptions. If your config calls something "sample_rack", calling it “the sample rack” in the instruction (rather than “the tube holder”) gives the resolver a high-confidence match and avoids a per-piece confirmation.
One action per sentence. Run-on paragraphs make it more likely the extractor merges or drops steps.
Bullet or numbered steps. Especially for protocols longer than ~5 steps. Clear boundaries help the LLM segment.
Standard well notation. A1, B2, H12 are unambiguous. “tube 1” usually works but requires inference.
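
A quick way to see what counts as standard well notation is a pattern check. This is a minimal sketch, not part of nl2protocol: the function names are hypothetical, and it assumes a 96-well layout (rows A to H, columns 1 to 12). A 24-tube rack would need a narrower pattern.

```python
import re

# Assumption: 96-well layout, rows A-H and columns 1-12.
WELL_PATTERN = re.compile(r"[A-H](?:1[0-2]|[1-9])")


def is_standard_well(name: str) -> bool:
    """Return True if `name` is a well such as A1, B2, or H12."""
    return WELL_PATTERN.fullmatch(name.strip().upper()) is not None


def find_wells(text: str) -> list[str]:
    """Return the standard well names found in free text, in order of appearance."""
    return re.findall(r"\b[A-H](?:1[0-2]|[1-9])\b", text)
```

Running find_wells over a step such as "Transfer 5uL plasmid from rack A1 to rack B1" returns A1 and B1, while "tube 1" yields nothing, which is why it needs inference.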

Patterns to avoid

Pronouns spanning paragraphs. If “Add it to the previous well” appears five sentences after the thing it refers to, the LLM may resolve it wrong.
Range expressions without explicit listing. B1-B4 usually expands. B1-D4 is ambiguous (row-major vs column-major).
Operations Opentrons doesn’t support. “centrifuge”, “sonicate”, “run gel” — these fail at the input validator (Stage 1).
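
To see why B1-D4 is ambiguous while B1-B4 is not, here is a small sketch that expands a range both ways. It is illustrative only: expand_range is a hypothetical helper, not the pipeline's own expansion logic, and it assumes the range means a rectangular block of wells.

```python
def expand_range(start: str, end: str, order: str = "row") -> list[str]:
    """Expand a well range such as B1-D4 into an explicit list.

    Assumption: the range denotes the rectangular block between the two corners.
    order="row" walks each row left to right; order="column" walks each column top to bottom.
    """
    first_row, first_col = start[0].upper(), int(start[1:])
    last_row, last_col = end[0].upper(), int(end[1:])
    rows = [chr(code) for code in range(ord(first_row), ord(last_row) + 1)]
    cols = range(first_col, last_col + 1)
    if order == "row":
        return [f"{row}{col}" for row in rows for col in cols]
    if order == "column":
        return [f"{row}{col}" for col in cols for row in rows]
    raise ValueError(f"unknown order: {order!r}")
```

For B1-B4 both orders give B1, B2, B3, B4. For B1-D4 the row order starts B1, B2, B3, B4, C1 while the column order starts B1, C1, D1, B2, so the same text describes two different transfer sequences. Listing the wells explicitly removes the guess.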

Example: a working instruction

Perform heat shock bacterial transformation with 4 different plasmids:

Setup:
- Temperature module is used for heat shock (keep at 4C initially)
- Tube rack A1-A4 contains 4 different plasmid DNA samples (5uL each)
- Tube rack B1-B4 contains competent cells (50uL each)

Steps:
1. Transfer 5uL plasmid from rack A1 to rack B1
2. Transfer 5uL plasmid from rack A2 to rack B2
3. Transfer 5uL plasmid from rack A3 to rack B3
4. Transfer 5uL plasmid from rack A4 to rack B4
5. Set temperature module to 42C
6. Pause 30 seconds (heat shock)
7. Set temperature module back to 4C

Notice: quantitative volumes, explicit per-step transfers (no “for each” looping), specific labware naming, named module references that match the config.
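
If you want a rough self-check before submitting, the points above can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the Stage 1 validator: lint_instruction is a hypothetical helper, the unsupported-operation list only contains the three examples named on this page, and the hedge words are the two used in the volume examples.

```python
import re

# Assumptions: these lists are illustrative, taken from the examples on this page.
UNSUPPORTED = ("centrifuge", "sonicate", "run gel")
HEDGES = ("about", "some")
VOLUME = re.compile(r"\b\d+(?:\.\d+)?\s*[um]L\b", re.IGNORECASE)
NUMBERED_STEP = re.compile(r"^[ \t]*(\d+)\.[ \t]+(.*)$", re.MULTILINE)


def lint_instruction(text: str) -> list[str]:
    """Return human-readable warnings for patterns that tend to extract badly."""
    warnings = []
    lowered = text.lower()
    for operation in UNSUPPORTED:
        if operation in lowered:
            warnings.append(f"unsupported operation: {operation}")
    for number, body in NUMBERED_STEP.findall(text):
        words = body.lower().split()
        # A transfer step should carry a number with a unit, e.g. 5uL.
        if "transfer" in words and not VOLUME.search(body):
            warnings.append(f"step {number}: transfer without an explicit volume")
        for hedge in HEDGES:
            if hedge in words:
                warnings.append(f"step {number}: hedged wording '{hedge}'")
    return warnings
```

The numbered steps of the example instruction produce no warnings. A step like "Transfer some buffer to A1" produces two: no explicit volume, and hedged wording.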

The extractor has known weaknesses on some instruction patterns. See §04 of the limitations page for what to double-check in column 2 before the pipeline proceeds.

03 · Writing a config

A JSON file describing your physical Opentrons setup — which labware on which deck slots, which pipettes mounted, which modules present. The extractor sees this so it can ground references to “the rack” or “the heater-shaker” in your actual hardware.

Minimum shape

{
  "labware": {
    "source_plate": {
      "load_name": "corning_96_wellplate_360ul_flat",
      "slot": "1"
    },
    "tube_rack": {
      "load_name": "opentrons_24_tuberack_eppendorf_2ml_safelock_snapcap",
      "slot": "2"
    },
    "tiprack": {
      "load_name": "opentrons_96_tiprack_300ul",
      "slot": "3"
    }
  },
  "pipettes": {
    "left": {
      "model": "p300_single_gen2",
      "tipracks": ["tiprack"]
    }
  }
}

With modules

{
  "labware": { ... },
  "pipettes": { ... },
  "modules": {
    "thermocycler": {
      "model": "thermocyclerModuleV2"
    },
    "heater_shaker": {
      "model": "heaterShakerModuleV1",
      "slot": "1"
    }
  }
}

Field reference

labware.<key>.load_name: An official Opentrons labware definition name (e.g. corning_96_wellplate_360ul_flat). Custom labware isn’t supported.
labware.<key>.slot: Deck slot the labware sits in. OT-2 has slots 1–11.
pipettes.{left,right}.model: Pipette model name (p20_single_gen2, p300_single_gen2, p1000_single_gen2, multi-channel variants, etc.).
pipettes.{left,right}.tipracks: List of labware keys (from the labware block) that this pipette will pull tips from. Each tip rack referenced must be the right capacity for the pipette.
modules.<key>.model: Opentrons module model identifier. Common ones: thermocyclerModuleV2, heaterShakerModuleV1, magneticModuleV2, temperatureModuleV2.
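
The field rules above can be checked locally before you upload a config. The sketch below is an assumption-laden illustration, not the validation nl2protocol itself performs: check_config is a hypothetical helper, it only covers the labware and pipettes blocks, and the duplicate-slot check assumes no two labware entries may share a slot.

```python
ALLOWED_SLOTS = {str(n) for n in range(1, 12)}  # OT-2 deck slots 1-11
ALLOWED_MOUNTS = ("left", "right")


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed config (empty if none)."""
    problems = []
    labware = config.get("labware", {})
    slot_owner = {}
    for key, entry in labware.items():
        if not entry.get("load_name"):
            problems.append(f"labware '{key}': missing load_name")
        slot = str(entry.get("slot", ""))
        if slot not in ALLOWED_SLOTS:
            problems.append(f"labware '{key}': slot {slot!r} is outside 1-11")
        elif slot in slot_owner:
            # Assumption: two labware entries cannot occupy the same slot.
            problems.append(f"labware '{key}': slot {slot} already used by '{slot_owner[slot]}'")
        else:
            slot_owner[slot] = key
    for mount, pipette in config.get("pipettes", {}).items():
        if mount not in ALLOWED_MOUNTS:
            problems.append(f"pipette mount '{mount}' must be left or right")
        if not pipette.get("model"):
            problems.append(f"pipette '{mount}': missing model")
        for rack in pipette.get("tipracks", []):
            if rack not in labware:
                problems.append(f"pipette '{mount}': tiprack '{rack}' is not a labware key")
    return problems
```

Load your file with json.load and pass the result in. The minimum-shape config shown earlier returns an empty list. The sketch does not check that a load_name is a real Opentrons definition or that tip capacity suits the pipette.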

Tip: copy one of the example configs in the repo and modify. There are 11 working configs covering common setups (transfers, PCR, magnetic-bead cleanup, ELISA, etc.).

04 · Reading the visual surface

The page that fills in as the pipeline runs. Five columns, left to right, plus modals when the system needs you.

The five columns

Instruction. Your input text. Phrases that became citations have colored underlines — the hue tells you what kind of value they grounded.
Extracted spec. The structured representation after Stage 2. Each value is colored by its provenance: blue for values taken from the instruction, yellow for domain defaults, orange for inferred values. Hover any value to highlight its citation in column 1.
Resolved spec. After Stage 3 (labware resolver maps your descriptions to config labels) and Stage 5 (gap-resolution loop fills missing fields). Any reviewer verdicts show as a tiny tag.
Validated spec. Same content as column 3, after the constraint checker (Stage 6) confirms hardware physics. If there were violations, they’re flagged inline.
Generated script. The Opentrons Python that goes to the robot. Hover any line to highlight the spec field it came from.

Cross-column arrows

When you hover a value in any spec column, the system draws an arrow back to its citation in the instruction column, so you can verify the grounding visually.

Modals during the run

The system stops to ask you when it can’t decide confidently. Four modal types:

Initial contents. A table of (labware, well, substance, volume) rows for things you pre-fill into wells. Defaults are pre-populated (italic-dim). Edit volumes or accept.
Source containers. A yes/no question on wells the system inferred to be source-only. Answering yes confirms that you need to physically pre-fill these wells before running.
Labware assignments. For each labware description that didn’t map to a config label confidently, you pick from a dropdown. Reasoning shown inline under each row.
Per-Gap modal. For individual missing values, fabricated citations, or constraint violations. Four buttons: Accept the suggestion, Edit a custom value, Override (fabrication only), or Quit.

What to check before the pipeline proceeds

Column 2 (extracted spec) is your single most important review point. The downstream pipeline trusts that the extraction is faithful to your instruction. Specifically, check that:

Step count matches your instruction
Action types are right (transfer vs mix vs pause)
Volumes match what you wrote
Wells aren’t invented (each well referenced should be cited)
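
Three of the checks above (step count, volumes, wells) can be sketched as code to make them concrete. Everything about the data shape here is an assumption: this page does not document the extracted spec format, so the step dictionaries with "volume_ul" and "wells" keys are hypothetical, as is the review_extraction helper. Treat it as a description of what to look for in column 2, not as a tool that reads the real spec.

```python
import re


def review_extraction(instruction: str, steps: list[dict]) -> list[str]:
    """Compare a (hypothetical) list of extracted steps against the instruction text."""
    findings = []
    numbered = re.findall(r"^[ \t]*\d+\.[ \t]+\S", instruction, flags=re.MULTILINE)
    # Only compare counts when the instruction uses numbered steps.
    if numbered and len(numbered) != len(steps):
        findings.append(f"step count: instruction has {len(numbered)}, spec has {len(steps)}")
    for index, step in enumerate(steps, start=1):
        for well in step.get("wells", []):
            if not re.search(rf"\b{re.escape(well)}\b", instruction):
                findings.append(f"step {index}: well {well} does not appear in the instruction")
        volume = step.get("volume_ul")
        if volume is not None:
            pattern = rf"\b{re.escape(format(volume, 'g'))}\s*uL\b"
            if not re.search(pattern, instruction, flags=re.IGNORECASE):
                findings.append(f"step {index}: volume {volume:g} uL does not appear in the instruction")
    return findings
```

A literal text match like this flags wells that were legitimately expanded from a range such as B1-B4, so a finding means "look closer", not "the extraction is wrong".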

Architecture deep-dive lives in the PIPELINE.md doc on GitHub. The GAP_LIFECYCLE.md doc has function-level sequence diagrams for the gap-resolution machinery. Both are also linked from the engineering log.