Local-first prompt evaluation

Prompt Tournament Runner

Compare 3–5 prompt variants on the same task, score each output from 1 to 5, pick a winner, and keep a local record so you never have to re-litigate the same prompt. No account, no API keys, no telemetry: it runs entirely on your machine.

Quick start

npm ci
npm run serve        # builds, then serves the UI on http://localhost:8790

# reachable over LAN / Tailscale so you can score from your phone:
npm run serve -- --host 0.0.0.0 --port 8790

The loop

  1. Capture the task (and optional context)
  2. Enter 3–5 prompt variants
  3. Run them and paste the outputs back in
  4. Score each 1–5 and pick a winner
  5. The run is saved locally for later review
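
To make the loop concrete, here is a minimal TypeScript sketch of the data it implies. The type and field names (`Variant`, `TournamentRun`, `suggestWinner`) are hypothetical and are not taken from the app's source; the sketch also assumes scores are whole numbers. Since you pick the winner yourself, the helper only suggests the top scorer.

```typescript
// Hypothetical shapes for illustration only; the app's real schema may differ.
interface Variant {
  label: string;   // e.g. "A", "B", "C"
  prompt: string;  // the prompt text being compared
  output: string;  // the output pasted back in
  score: number;   // assumed to be an integer from 1 to 5
}

interface TournamentRun {
  task: string;
  context?: string;    // optional, as in step 1
  variants: Variant[]; // 3 to 5 entries
  winner: string;      // label of the variant you picked
  savedAt: string;     // ISO timestamp (assumed field)
}

// Enforce the constraints stated in the loop: 3-5 variants, scores 1-5.
function validateVariants(variants: Variant[]): void {
  if (variants.length < 3 || variants.length > 5) {
    throw new RangeError(`expected 3-5 variants, got ${variants.length}`);
  }
  for (const v of variants) {
    if (!Number.isInteger(v.score) || v.score < 1 || v.score > 5) {
      throw new RangeError(`score for ${v.label} must be an integer 1-5`);
    }
  }
}

// Suggest the highest-scoring variant; on a tie the earlier entry is kept.
function suggestWinner(variants: Variant[]): Variant {
  validateVariants(variants);
  return variants.reduce((best, v) => (v.score > best.score ? v : best));
}
```

A tie is resolved in favour of the earlier variant here purely for determinism; in practice a tie is a good signal to look at the outputs again before choosing.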

Use it for

  • choosing a system prompt with evidence, not vibes
  • comparing outputs across models or temperatures
  • prompt regression checks before you ship
  • scoring a session from your phone over Tailscale

Privacy

100% local. The app never calls an external API and has no analytics. Runs are written to .prompt-tournament-runs.json in your working directory — yours to keep, diff, or delete.
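
Because the runs live in a plain file, you can inspect them with a few lines of Node. The sketch below assumes only that the file contains valid JSON, as its name suggests; its internal structure is not documented here, so the hypothetical helper `describeRunsFile` just reports the top-level shape rather than guessing at fields.

```typescript
import { existsSync, readFileSync } from "node:fs";

// File name taken from the Privacy section; resolved against the working directory.
const RUNS_FILE = ".prompt-tournament-runs.json";

// Report what the runs file looks like without assuming a schema.
function describeRunsFile(path: string = RUNS_FILE): string {
  if (!existsSync(path)) {
    return `${path}: not found (no runs saved yet?)`;
  }
  const data: unknown = JSON.parse(readFileSync(path, "utf8"));
  if (Array.isArray(data)) {
    return `${path}: array with ${data.length} entries`;
  }
  if (data !== null && typeof data === "object") {
    return `${path}: object with keys ${Object.keys(data).join(", ")}`;
  }
  return `${path}: ${typeof data} value`;
}

console.log(describeRunsFile());
```

Run it from the same directory you launched the app in, since the file is written to the working directory.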