How to use nl2protocol.

Four sections: try the demo, write a good instruction, write a config, and read the visual surface. If something doesn’t work, the limitations page covers known failure modes and recommendations.

The 30-second flow to run your first protocol.

  1. Open nl2protocol.com. The form is the landing page.

  2. Pick an example from the dropdown or click Upload your own… and choose your instruction.txt + config.json files.

  3. Paste your Anthropic API key. Get one at console.anthropic.com if you don’t have one. Your key stays in the request and is never stored.

  4. Click ▶ Start pipeline. The five-column visual surface fills in as each stage runs. Modals appear when the system needs your input.

Cost. Each run makes multiple Anthropic API calls (one Sonnet extraction, one Sonnet labware-resolve, a Haiku reviewer over non-deterministic suggestions, plus possible LLM gap-suggesters). A typical protocol costs $0.05 to $0.30 per run, billed to your Anthropic account.

The instruction is what you’d tell a careful labmate. The extractor (Sonnet) reads it once and tries to produce a structured spec with citations back to your text.

Patterns that produce clean extractions

Patterns to avoid

Example: a working instruction

Perform heat shock bacterial transformation with 4 different plasmids:

Setup:
- Temperature module is used for heat shock (keep at 4C initially)
- Tube rack A1-A4 contains 4 different plasmid DNA samples (5uL each)
- Tube rack B1-B4 contains competent cells (50uL each)

Steps:
1. Transfer 5uL plasmid from rack A1 to rack B1
2. Transfer 5uL plasmid from rack A2 to rack B2
3. Transfer 5uL plasmid from rack A3 to rack B3
4. Transfer 5uL plasmid from rack A4 to rack B4
5. Set temperature module to 42C
6. Pause 30 seconds (heat shock)
7. Set temperature module back to 4C

Notice: quantitative volumes, explicit per-step transfers (no “for each” looping), specific labware naming, named module references that match the config.
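
If your protocol repeats the same transfer across many wells, writing every step by hand gets tedious. The sketch below is one way to generate the explicit per-step lines instead of a "for each" sentence. The helper name explicit_transfers and its parameters are hypothetical and not part of nl2protocol; it only produces text for you to paste into instruction.txt.

```python
def explicit_transfers(volume_ul, what, source, source_wells, dest, dest_wells, start=1):
    """Write one numbered transfer sentence per source/destination well pair."""
    if len(source_wells) != len(dest_wells):
        raise ValueError("source and destination well lists must be the same length")
    lines = []
    for number, (src, dst) in enumerate(zip(source_wells, dest_wells), start=start):
        lines.append(
            f"{number}. Transfer {volume_ul}uL {what} from {source} {src} to {dest} {dst}"
        )
    return lines


# Reproduces steps 1-4 of the example instruction above.
steps = explicit_transfers(
    5, "plasmid",
    "rack", ["A1", "A2", "A3", "A4"],
    "rack", ["B1", "B2", "B3", "B4"],
)
print("\n".join(steps))
```

The length check is deliberate: a mismatched well list is easier to catch here than after extraction.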

The extractor has known weaknesses on some patterns. See §04 of the limitations page for what to double-check in column 2 before the pipeline proceeds.

The config is a JSON file describing your physical Opentrons setup: which labware sits on which deck slots, which pipettes are mounted, and which modules are present. The extractor sees this so it can ground references to “the rack” or “the heater-shaker” in your actual hardware.

Minimum shape

{
  "labware": {
    "source_plate": {
      "load_name": "corning_96_wellplate_360ul_flat",
      "slot": "1"
    },
    "tube_rack": {
      "load_name": "opentrons_24_tuberack_eppendorf_2ml_safelock_snapcap",
      "slot": "2"
    },
    "tiprack": {
      "load_name": "opentrons_96_tiprack_300ul",
      "slot": "3"
    }
  },
  "pipettes": {
    "left": {
      "model": "p300_single_gen2",
      "tipracks": ["tiprack"]
    }
  }
}

With modules

{
  "labware": { ... },
  "pipettes": { ... },
  "modules": {
    "thermocycler": {
      "model": "thermocyclerModuleV2"
    },
    "heater_shaker": {
      "model": "heaterShakerModuleV1",
      "slot": "1"
    }
  }
}

Field reference

labware.<key>.load_name
An official Opentrons labware definition name (e.g. corning_96_wellplate_360ul_flat). Custom labware isn’t supported.
labware.<key>.slot
Deck slot the labware sits in. OT-2 has slots 1–11.
pipettes.{left,right}.model
Pipette model name (p20_single_gen2, p300_single_gen2, p1000_single_gen2, multi-channel variants, etc.).
pipettes.{left,right}.tipracks
List of labware keys (from the labware block) that this pipette will pull tips from. Each tip rack referenced must be the right capacity for the pipette.
modules.<key>.model
Opentrons module model identifier. Common ones: thermocyclerModuleV2, heaterShakerModuleV1, magneticModuleV2, temperatureModuleV2.
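
Before uploading, it can help to sanity-check a config against the rules in the field reference. The following is a minimal sketch, not nl2protocol's own validator: check_config is a hypothetical helper, and the duplicate-slot check is an assumption about what a sensible config looks like. It does not verify that load names are official Opentrons definitions or that tip rack capacity matches the pipette.

```python
OT2_SLOTS = {str(n) for n in range(1, 12)}  # OT-2 deck slots 1-11, as strings


def check_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    labware = config.get("labware", {})
    slot_owner = {}  # slot -> labware key that first claimed it
    for key, entry in labware.items():
        if not entry.get("load_name"):
            problems.append(f"labware.{key}: missing load_name")
        slot = str(entry.get("slot", ""))
        if slot not in OT2_SLOTS:
            problems.append(f"labware.{key}: slot {slot!r} is not one of 1-11")
        elif slot in slot_owner:
            # Assumption: two labware entries should not share a deck slot.
            problems.append(f"labware.{key}: slot {slot} is already used by {slot_owner[slot]}")
        else:
            slot_owner[slot] = key
    for mount, pipette in config.get("pipettes", {}).items():
        if mount not in ("left", "right"):
            problems.append(f"pipettes.{mount}: mount must be left or right")
        if not pipette.get("model"):
            problems.append(f"pipettes.{mount}: missing model")
        for rack in pipette.get("tipracks", []):
            if rack not in labware:
                problems.append(f"pipettes.{mount}: tiprack {rack!r} is not a labware key")
    return problems


config = {
    "labware": {
        "source_plate": {"load_name": "corning_96_wellplate_360ul_flat", "slot": "1"},
        "tiprack": {"load_name": "opentrons_96_tiprack_300ul", "slot": "3"},
    },
    "pipettes": {"left": {"model": "p300_single_gen2", "tipracks": ["tiprack"]}},
}
for problem in check_config(config):
    print(problem)
```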

Tip: copy one of the example configs in the repo and modify it. There are 11 working configs covering common setups (transfers, PCR, magnetic-bead cleanup, ELISA, etc.).

The visual surface is the page that fills in as the pipeline runs. It has five columns, left to right, plus modals when the system needs you.

The five columns

  1. Instruction. Your input text. Phrases that became citations have colored underlines — the hue tells you what kind of value they grounded.
  2. Extracted spec. The structured representation after Stage 2. Each value is colored by its provenance: blue for values taken from the instruction, yellow for domain defaults, orange for inferred values. Hover any value to highlight its citation in column 1.
  3. Resolved spec. After Stage 3 (labware resolver maps your descriptions to config labels) and Stage 5 (gap-resolution loop fills missing fields). Any reviewer verdicts show as a tiny tag.
  4. Validated spec. Same content as column 3, after the constraint checker (Stage 6) confirms hardware physics. If there were violations, they’re flagged inline.
  5. Generated script. The Opentrons Python that goes to the robot. Hover any line to highlight the spec field it came from.
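
To give a rough sense of how config entries relate to lines you might see in column 5, the sketch below renders Opentrons Python API load calls (protocol.load_labware and protocol.load_instrument) from a config dictionary. This is an illustration only: render_load_calls and the variable naming are hypothetical, and the script nl2protocol actually generates may be laid out differently.

```python
def render_load_calls(config):
    """Render one Opentrons Python load call per labware entry and per pipette."""
    lines = []
    for key, entry in config.get("labware", {}).items():
        lines.append(
            f'{key} = protocol.load_labware("{entry["load_name"]}", "{entry["slot"]}")'
        )
    for mount, pipette in config.get("pipettes", {}).items():
        # Tip racks are referenced by the labware variable names created above.
        racks = ", ".join(pipette.get("tipracks", []))
        lines.append(
            f'{mount}_pipette = protocol.load_instrument('
            f'"{pipette["model"]}", "{mount}", tip_racks=[{racks}])'
        )
    return lines


config = {
    "labware": {
        "source_plate": {"load_name": "corning_96_wellplate_360ul_flat", "slot": "1"},
        "tiprack": {"load_name": "opentrons_96_tiprack_300ul", "slot": "3"},
    },
    "pipettes": {"left": {"model": "p300_single_gen2", "tipracks": ["tiprack"]}},
}
print("\n".join(render_load_calls(config)))
```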

Cross-column arrows

When you hover a value in any spec column, the system draws an arrow back to its citation in the instruction column, so you can verify the grounding visually.
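
The same grounding check can be expressed in code. The sketch below assumes a simplified data model that is not taken from nl2protocol's source: SpecValue, its fields, and the provenance labels are hypothetical stand-ins for "a value plus the phrase it cites". It flags any value that claims to come from the instruction but whose citation does not appear verbatim in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecValue:
    field: str
    value: str
    provenance: str  # assumed labels: "instruction", "domain_default", "inferred"
    citation: Optional[str] = None  # phrase in the instruction that grounds the value


def ungrounded(values, instruction):
    """Return instruction-sourced values whose citation is missing or not found verbatim."""
    return [
        v for v in values
        if v.provenance == "instruction"
        and (not v.citation or v.citation not in instruction)
    ]


instruction = "Transfer 5uL plasmid from rack A1 to rack B1"
values = [
    SpecValue("volume", "5uL", "instruction", citation="5uL plasmid"),
    SpecValue("source_well", "A2", "instruction", citation="rack A2"),  # not in the text
    SpecValue("tip_policy", "new tip each step", "domain_default"),
]
for v in ungrounded(values, instruction):
    print(f"check {v.field}: {v.value!r} cites {v.citation!r}")
```

Values marked as defaults or inferred are skipped on purpose, since they have no phrase in the instruction to point back to.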

Modals during the run

The system stops to ask you when it can’t decide confidently. Four modal types:

What to check before the pipeline proceeds

Column 2 (extracted spec) is your single most important review point. The downstream pipeline trusts that the extraction is faithful to your instruction. Specifically:

Architecture deep-dive lives in the PIPELINE.md doc on GitHub. The GAP_LIFECYCLE.md doc has function-level sequence diagrams for the gap-resolution machinery. Both are also linked from the engineering log.