Scout · Forge · Prove.
Three multi-LLM pipelines, designed to run in sequence or independently. Each does one job well — find gaps, generate ideas, debate them — and hands off clean structured state to the next.
Run them as a chain (Scout → Forge → Prove) for a $3-$10 end-to-end venture thesis (cheap default ~$3, top-tier ~$9). Or one at a time when you just want fresh signals, rapid ideation, or a pre-mortem on an existing idea.
Scout
Daily market intelligence — find the gaps the market hasn't noticed yet.
What Scout does
Scout runs over a daily snapshot of 79 RSS sources + 100 community-pain sources (Reddit, HN, Lobsters, GitHub Issues), prefiltered by the sectors you select. Five stages:
- Fetch — pull cached articles + pain posts for your sectors
- Score — LLM scores each article (idea_potential, confidence A-D), and clusters pain posts into themes by sector + frequency
- Curate — pick top 8 articles, top 10 pain clusters, link them into cross-signals (article × pain → startup wedge)
- Topics — synthesize 3 venture-grade topic cards with trend signal, severity-tagged pain signals, and a core question
- Brief — assemble the daily brief (overview, takeaway, narratives, sector heatmap)
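The Curate stage can be sketched roughly as follows. This is a minimal illustration, not Scout's actual code: the `Article` and `PainCluster` types, the numeric scale for `idea_potential`, the sort order, and the "same sector" rule for linking cross-signals are all assumptions made for the example. Only the 8-article and 10-cluster limits come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    sector: str
    idea_potential: float  # hypothetical numeric score from the Score stage
    confidence: str        # "A" (best) through "D"

@dataclass
class PainCluster:
    theme: str
    sector: str
    frequency: int         # how often the pain shows up in community posts

# Assumed ordering: A beats B beats C beats D when scores tie.
CONFIDENCE_RANK = {"A": 0, "B": 1, "C": 2, "D": 3}

def curate(articles, clusters, max_articles=8, max_clusters=10):
    """Pick the top articles and pain clusters, then pair them by sector."""
    top_articles = sorted(
        articles,
        key=lambda a: (-a.idea_potential, CONFIDENCE_RANK[a.confidence]),
    )[:max_articles]
    top_clusters = sorted(clusters, key=lambda c: -c.frequency)[:max_clusters]
    # Assumed linking rule: an article and a pain cluster form a
    # cross-signal candidate when they share a sector.
    cross_signals = [
        {"article": a.title, "pain": c.theme, "sector": a.sector}
        for a in top_articles
        for c in top_clusters
        if a.sector == c.sector
    ]
    return top_articles, top_clusters, cross_signals
```

In the real pipeline the link from article and pain to a startup wedge is produced by an LLM, so the sector match here only stands in for that step.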
Quality benchmark
On the claude-sonnet-4-6 baseline (3 sectors, ~$1.73): 13K-char daily brief, 10 sharp cross-signals linking news to pain to wedge, 3 topics with concrete market wedges + competitor pricing, 30 keywords reflecting real domain vocabulary (after our stopword + per-cluster cap fixes).
MiniMax M2.7 produces nearly identical structural quality at ~$0.45. Grok 4 was weaker on Scout — generic topics, hallucinated sources — and is no longer offered.
Forge
Multi-agent ideation — turn gaps into screened, ranked startup ideas.
What Forge does
Forge takes context (a Scout report, or your own free-form input) and runs a five-round structured conversation between agents:
- Round 1 — Pain discovery (gated): Proposer drafts pain points; Trend Scout, Contrarian, Gap Finder, Benchmark Hunter, Evidence Hunter add competitive + adjacent-market context
- Rounds 2-4 — Iterative deepening: Defender plays creative coach pushing on differentiation, pricing, and the "stop-scrolling sentence"; Proposer commits to specifics
- Round 5 — Top-3 selection with explicit hybrid & portfolio analysis
- Screening pass — All 5 agents cast a kill vote and a RICE score; tie-breaks resolved via aggregate RICE total
Each surviving idea ships with 20 structured fields including moat, problem, why-now, target market, revenue model, competitive landscape, kill switches with thresholds, and a 3-step validation plan with numeric success criteria.
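To make the screening pass concrete, here is a small sketch of how kill votes and RICE scores could be combined into a ranking. It uses the standard RICE formula (Reach × Impact × Confidence ÷ Effort). The `Ballot` type and the rule "fewest kill votes first" are assumptions for illustration; the only part taken from the description above is that ties are broken by the aggregate RICE total.

```python
from dataclasses import dataclass

@dataclass
class Ballot:
    agent: str
    kill: bool
    reach: float
    impact: float
    confidence: float  # fraction between 0 and 1
    effort: float      # must be greater than 0

    def rice(self) -> float:
        # Standard RICE score: (Reach * Impact * Confidence) / Effort
        return self.reach * self.impact * self.confidence / self.effort

def rank_ideas(ballots_by_idea: dict[str, list[Ballot]]) -> list[str]:
    """Order ideas by fewest kill votes (assumed), then by highest total RICE."""
    def sort_key(idea: str):
        ballots = ballots_by_idea[idea]
        kill_votes = sum(b.kill for b in ballots)
        rice_total = sum(b.rice() for b in ballots)
        return (kill_votes, -rice_total)
    return sorted(ballots_by_idea, key=sort_key)
```

With five agents voting, two ideas that draw the same number of kill votes would be separated by whichever has the larger summed RICE score.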
Quality benchmark
On claude-opus-4-7 ($2.19/run): three venture-grade ideas with concrete pricing tiers, traceable Reddit/Trustpilot/G2 sources, full RICE scoring from each agent, and explicit kill votes with reasoning.
Sonnet 4.6 and Gemini Flash variants are restricted to Scout — we observed quality drops in Forge's 5-round screening with those models. The cheapest viable Forge model is MiniMax M2.7 (~$0.45/run, well-balanced). Opus 4.7 / GPT-5.5 are the quality picks for venture-grade output.
Prove
Multi-agent debate — pre-mortem the idea before you build it.
What Prove does
Prove runs an adversarial debate: Proposer defends the idea, Challenger attacks market viability, Analyst pressure-tests the unit economics, Defender plays steelman, and Reviewer audits every factual claim against URLs in a Phase A5 pass. Sub-agents (Contrarian, Gap Finder, Trend Scout, Evidence Hunter, Benchmark Hunter) inject competitive context.
After each round the panel can vote on one of four verdicts:
- APPROVED — strong consensus to build
- CONDITIONAL — proceed if conditions are met
- REJECTED — Challenger's market-viability veto fires
- PIVOT OUT — the idea changed category mid-debate; the panel recommends pivoting and produces a Pivot Report instead of an execution plan
Verdicts are idempotent: replaying the same X-Payment transaction returns the same verdict and does not charge you twice.
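One way to picture the replay behavior is a store keyed by the payment transaction. This is a sketch under assumptions, not Prove's implementation: `VerdictStore`, `get_verdict`, the in-memory dictionary, and the `run_debate` callback are hypothetical names. Only the four verdict labels and the "same transaction, same verdict, one charge" behavior come from the text above.

```python
from enum import Enum

class Verdict(Enum):
    APPROVED = "APPROVED"
    CONDITIONAL = "CONDITIONAL"
    REJECTED = "REJECTED"
    PIVOT_OUT = "PIVOT OUT"

class VerdictStore:
    """Hypothetical in-memory store: one debate and one charge per payment tx."""

    def __init__(self, run_debate):
        self._run_debate = run_debate  # callable: idea -> Verdict
        self._by_tx = {}
        self.charges = 0

    def get_verdict(self, payment_tx: str, idea: str) -> Verdict:
        # A replayed transaction returns the stored verdict without
        # running the debate or charging again.
        if payment_tx in self._by_tx:
            return self._by_tx[payment_tx]
        self.charges += 1
        verdict = self._run_debate(idea)
        self._by_tx[payment_tx] = verdict
        return verdict
```

A production version would need durable storage and protection against concurrent replays; the dictionary only shows the lookup-before-charge order.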
Quality benchmark
On gpt-5.5 ($5.50/run): all 5 main agents + all 5 sub-agents fire with 5K-25K chars of analysis each. Phase A5 fact-checks every claim against cited URLs. The output is a verdict, plus an 18.8K-char Pivot Report when the panel pivots.
How they chain together
Each pipeline writes structured state to its session table. The next pipeline reads that state to seed its prompt:
- Scout → Forge — Forge reads scout_reports.topics and scout_reports.daily_brief as Round 0 context
- Forge → Prove — pick one idea from forge_sessions.top_ideas; Prove reads its 20 structured fields to seed the debate
- Standalone — every pipeline also accepts free-form input, so you can skip the chain and feed your own idea directly into Forge or Prove
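The handoffs can be sketched as plain functions over the stored state. The field names `topics`, `daily_brief`, and `top_ideas` come from the description above; everything else is an assumption for illustration, including the dictionary shapes, the function names, and the prompt layout. The real schemas may differ.

```python
def build_round0_context(scout_report: dict) -> str:
    """Turn a Scout report row into Round 0 context text for Forge (assumed layout)."""
    lines = ["Daily brief:", scout_report["daily_brief"], "", "Topics:"]
    # Number the topic cards so later rounds can refer to them.
    lines += [f"{i}. {topic}" for i, topic in enumerate(scout_report["topics"], start=1)]
    return "\n".join(lines)

def pick_idea_for_prove(forge_session: dict, index: int) -> dict:
    """Select one idea from a Forge session to seed a Prove debate."""
    ideas = forge_session["top_ideas"]
    if not 0 <= index < len(ideas):
        raise IndexError(f"idea index {index} is out of range")
    return ideas[index]
```

Free-form input would bypass both functions: the user's own text takes the place of the Round 0 context or the selected idea.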
Cost summary at a glance
| Pipeline | Cheap default | Quality pick | Duration |
|---|---|---|---|
| Scout | MiniMax M2.7 — $0.45 | Sonnet 4.6 — $1.50 | ~6 min |
| Forge | MiniMax M2.7 — $0.45 | Opus 4.7 — $2.20 | ~30 min |
| Prove | GPT-5.4 — $1.20 | GPT-5.5 — $5.50 | ~20 min |
All costs are pass-through to your LLM provider — GapSmith doesn't take a margin on token spend. Your purchase covers software access; your API key covers compute.
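As a quick check on chain cost, the per-pipeline figures in the table can be summed per tier. The sketch below keeps amounts in cents to avoid floating-point drift; the dictionary and function names are illustrative only, and real spend will vary with sector count and token usage.

```python
# Per-run costs from the table above, in cents.
COSTS_CENTS = {
    "Scout": {"cheap": 45, "quality": 150},
    "Forge": {"cheap": 45, "quality": 220},
    "Prove": {"cheap": 120, "quality": 550},
}

def chain_total_usd(tier: str) -> float:
    """Sum one tier across Scout, Forge, and Prove and return dollars."""
    return sum(costs[tier] for costs in COSTS_CENTS.values()) / 100

print(chain_total_usd("cheap"))    # 2.1
print(chain_total_usd("quality"))  # 9.2
```

On the table's numbers alone, a full chain comes to about $2.10 on the cheap defaults and about $9.20 on the quality picks.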
Agent API vs Done-For-You
These pipelines are also available as paid services. We deliberately split them into two tiers based on the price-quality trade-off:
Agent API: endpoints under /api/v1/* run on a balanced, cost-effective LLM (MiniMax / Sonnet 4.6 tier) so per-call USDC pricing stays in the $0.05–$15 range. This is the right tier when an agent just needs fresh signal at machine speed.
Done-For-You: we run the full pipeline on Claude Opus 4.7 / GPT-5.5 Pro with a human pass on top of every report. This is the right tier when quality matters more than per-call cost. $39 / $99 / $149 per run.