Pipelines

Scout · Forge · Prove.

Three multi-LLM pipelines, designed to run in sequence or independently. Each does one job well — find gaps, generate ideas, debate them — and hands off clean structured state to the next.

Takeaway

Run them as a chain (Scout → Forge → Prove) for a $3-$10 end-to-end venture thesis (cheap default ~$3, top-tier ~$9), or one at a time when you just want fresh signals, rapid ideation, or a pre-mortem on an existing idea.

1

Scout

Daily market intelligence — find the gaps the market hasn't noticed yet.

Cost: ~$1.50
Duration: ~6 min
Sectors: up to 10
Articles ingested: 70-90
Pain signals: 200-400
Top ideas surfaced: 3 topics

What Scout does

Scout runs over a daily snapshot of 79 RSS sources plus 100 community-pain sources (Reddit, HN, Lobsters, GitHub Issues), prefiltered by the sectors you select. It works in five stages:

  • Fetch — pull cached articles + pain posts for your sectors
  • Score — LLM scores each article (idea_potential, confidence A-D), and clusters pain posts into themes by sector + frequency
  • Curate — pick top 8 articles, top 10 pain clusters, link them into cross-signals (article × pain → startup wedge)
  • Topics — synthesize 3 venture-grade topic cards with trend signal, severity-tagged pain signals, and a core question
  • Brief — assemble the daily brief (overview, takeaway, narratives, sector heatmap)
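The Score and Curate hand-off can be pictured with a small sketch. It is an illustration, not Scout's code: the `Article` and `PainPost` shapes, the `idea_potential` scale, and the rule that an article links only to pain clusters from the same sector are assumptions. Only the top-8 article and top-10 cluster cut-offs come from the stage list above.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Article:
    title: str
    sector: str
    idea_potential: float  # LLM-assigned score (hypothetical scale)

@dataclass(frozen=True)
class PainPost:
    sector: str
    theme: str  # theme label assigned by the LLM clustering step

def cluster_pain(posts: list[PainPost]) -> list[tuple[tuple[str, str], int]]:
    """Group pain posts by (sector, theme) and sort clusters by frequency, largest first."""
    return Counter((p.sector, p.theme) for p in posts).most_common()

def cross_signals(articles: list[Article], posts: list[PainPost],
                  top_articles: int = 8, top_clusters: int = 10) -> list[tuple[str, str, int]]:
    """Pair each top article with top pain clusters from the same sector (assumed linking rule)."""
    best_articles = sorted(articles, key=lambda a: a.idea_potential, reverse=True)[:top_articles]
    best_clusters = cluster_pain(posts)[:top_clusters]
    signals = []
    for article in best_articles:
        for (sector, theme), size in best_clusters:
            if sector == article.sector:
                signals.append((article.title, theme, size))
    return signals
```

Each resulting tuple (article, pain theme, cluster size) is the raw material for one "article × pain → startup wedge" cross-signal; turning it into a wedge is the LLM's job in the later stages.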

Quality benchmark

On the claude-sonnet-4-6 baseline (3 sectors, ~$1.73): 13K-char daily brief, 10 sharp cross-signals linking news to pain to wedge, 3 topics with concrete market wedges + competitor pricing, 30 keywords reflecting real domain vocabulary (after our stopword + per-cluster cap fixes).
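The stopword and per-cluster cap fixes could look roughly like the sketch below. The stopword list, the cap of 3 words per cluster, and the `top_keywords` helper are assumptions for illustration; the idea is that capping each cluster stops one large cluster from filling all 30 keyword slots with its own vocabulary.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "for", "in", "is", "it", "with", "on", "my"}

def top_keywords(clusters: dict[str, list[str]], per_cluster_cap: int = 3,
                 limit: int = 30) -> list[str]:
    """Collect keywords cluster by cluster, skipping stopwords and tokens of two
    characters or fewer, taking at most `per_cluster_cap` words from each cluster."""
    picked: list[str] = []
    for texts in clusters.values():
        counts = Counter(
            word
            for text in texts
            for word in re.findall(r"[a-z0-9]+", text.lower())
            if word not in STOPWORDS and len(word) > 2
        )
        for word, _count in counts.most_common(per_cluster_cap):
            if word not in picked:  # a word shared by two clusters is kept once
                picked.append(word)
    return picked[:limit]
```

With a cap of 3, ten clusters can contribute at most 30 words, so every cluster gets a voice in the final keyword list.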

MiniMax M2.7 produces nearly identical structural quality at ~$0.45. Grok 4 was weaker on Scout — generic topics, hallucinated sources — and is no longer offered.

2

Forge

Multi-agent ideation — turn gaps into screened, ranked startup ideas.

Cost: ~$0.45-$2.20
Duration: ~30 min
Rounds: 5 + screen
Agents: 5 main + 5 sub
Top ideas: 3 ranked
Fields per idea: 20

What Forge does

Forge takes context (a Scout report, or your own free-form input) and runs a five-round structured conversation among its agents, followed by a screening pass:

  • Round 1 — Pain discovery (gated): Proposer drafts pain points; Trend Scout, Contrarian, Gap Finder, Benchmark Hunter, Evidence Hunter add competitive + adjacent-market context
  • Rounds 2-4 — Iterative deepening: Defender plays creative coach pushing on differentiation, pricing, and the "stop-scrolling sentence"; Proposer commits to specifics
  • Round 5 — Top-3 selection with explicit hybrid & portfolio analysis
  • Screening pass — All 5 agents cast a kill vote and a RICE score; ties are broken by the aggregate RICE total
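The screening pass can be sketched with the standard RICE formula (Reach × Impact × Confidence / Effort). Two parts are assumptions, not documented Forge behavior: an idea is dropped once more than 2 of the 5 agents vote to kill it, and survivors are ranked by fewest kill votes before the aggregate RICE total breaks ties. `Vote` and `screen` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vote:
    agent: str
    kill: bool
    reach: float       # people affected per period
    impact: float      # e.g. 0.25 (minimal) to 3 (massive)
    confidence: float  # 0.0 to 1.0
    effort: float      # person-months, must be > 0

    def rice(self) -> float:
        """Standard RICE score: Reach x Impact x Confidence / Effort."""
        return self.reach * self.impact * self.confidence / self.effort

def screen(ideas: dict[str, list[Vote]], max_kills: int = 2) -> list[str]:
    """Drop ideas with more than `max_kills` kill votes (assumed threshold), then rank
    survivors by fewest kill votes, breaking ties by the aggregate RICE total."""
    ranked = []
    for name, votes in ideas.items():
        kills = sum(v.kill for v in votes)
        if kills > max_kills:
            continue
        total_rice = sum(v.rice() for v in votes)
        ranked.append((kills, -total_rice, name))  # negate RICE so higher sorts first
    ranked.sort()
    return [name for _, _, name in ranked]
```

Keeping kill votes and RICE as separate signals means a high-RICE idea that several agents want dead still cannot outrank an idea the whole panel is willing to keep.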

Each surviving idea ships with 20 structured fields including moat, problem, why-now, target market, revenue model, competitive landscape, kill switches with thresholds, and a 3-step validation plan with numeric success criteria.
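Two of those fields, kill switches with thresholds and the validation plan with numeric success criteria, are easiest to understand as data. The sketch below is a hypothetical encoding (the class names, the `op` strings, and the three outcome labels are assumptions) showing how such fields could be checked against observed metrics.

```python
import operator
from dataclasses import dataclass

# Comparison operators a kill switch may use (hypothetical encoding).
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

@dataclass(frozen=True)
class KillSwitch:
    """Fires when the observed `metric` crosses `threshold` in the direction of `op`."""
    metric: str
    op: str
    threshold: float

    def fires(self, observed: dict[str, float]) -> bool:
        value = observed.get(self.metric)
        return value is not None and _OPS[self.op](value, self.threshold)

@dataclass(frozen=True)
class ValidationStep:
    """One step of the validation plan, with a numeric success criterion."""
    description: str
    metric: str
    success_at_least: float

    def passed(self, observed: dict[str, float]) -> bool:
        return observed.get(self.metric, float("-inf")) >= self.success_at_least

def review(kill_switches: list[KillSwitch], plan: list[ValidationStep],
           observed: dict[str, float]) -> tuple[str, list[str]]:
    """Return ("kill", fired metrics), ("validated", []) or ("keep testing", [])."""
    fired = [k.metric for k in kill_switches if k.fires(observed)]
    if fired:
        return "kill", fired
    if all(step.passed(observed) for step in plan):
        return "validated", []
    return "keep testing", []
```

For example, a kill switch of `KillSwitch("waitlist_signups", "<", 50)` ends the idea as soon as a landing-page test draws fewer than 50 signups, regardless of how the other steps are going.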

Quality benchmark

On claude-opus-4-7 ($2.19/run): three venture-grade ideas with concrete pricing tiers, traceable Reddit/Trustpilot/G2 sources, full RICE scoring from each agent, and explicit kill votes with reasoning.

Sonnet 4.6 and Gemini Flash variants are restricted to Scout — we observed quality drops in Forge's 5-round screening with those models. The cheapest viable Forge model is MiniMax M2.7 (~$0.45/run, well-balanced). Opus 4.7 / GPT-5.5 are the quality picks for venture-grade output.

3

Prove

Multi-agent debate — pre-mortem the idea before you build it.

Cost: ~$2.50-$5.50
Duration: ~20 min
Agents: 5 main + 5 sub
Rounds: up to 4
Verdicts: 4 outcomes
Fact-check: Phase A5

What Prove does

Prove runs an adversarial debate: Proposer defends the idea, Challenger attacks market viability, Analyst pressure-tests the unit economics, Defender plays steelman, and Reviewer audits every factual claim against its cited URLs in a Phase A5 pass. Sub-agents (Contrarian, Gap Finder, Trend Scout, Evidence Hunter, Benchmark Hunter) inject competitive context.
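A Phase A5 style check can be sketched as labeling each claim by what its citation supports. This is a simplified stand-in: the `Claim` shape, the `key_phrase` field, the four labels, and the plain case-insensitive substring test are assumptions, and pages are passed in as already-fetched text so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

@dataclass(frozen=True)
class Claim:
    text: str
    key_phrase: str            # the part of the claim the cited page must contain
    url: Optional[str] = None  # cited source, if any

def fact_check(claims: list[Claim], pages: dict[str, str]) -> dict[str, str]:
    """Label each claim by checking it against the already-fetched text of its cited URL."""
    results: dict[str, str] = {}
    for claim in claims:
        if not claim.url or urlparse(claim.url).scheme not in ("http", "https"):
            results[claim.text] = "no citation"
        elif claim.url not in pages:
            results[claim.text] = "source unavailable"
        elif claim.key_phrase.lower() in pages[claim.url].lower():
            results[claim.text] = "supported"
        else:
            results[claim.text] = "unsupported"
    return results
```

A real reviewer needs semantic matching rather than substring search, but the output shape (one label per claim) is what lets unsupported claims be flagged before a verdict is issued.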

After each round the panel can vote on one of four verdicts:

  • APPROVED — strong consensus to build
  • CONDITIONAL — proceed if conditions are met
  • REJECTED — Challenger's market-viability veto fires
  • PIVOT OUT — the idea changed category mid-debate; the panel recommends pivoting and produces a Pivot Report instead of an execution plan
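One way to map a round's panel signals onto these four verdicts is sketched below. The precedence order (veto first, then pivot, then consensus) and the 4-of-5 approval threshold for "strong consensus" are assumptions, as are the function and parameter names.

```python
from enum import Enum

class Verdict(Enum):
    APPROVED = "APPROVED"
    CONDITIONAL = "CONDITIONAL"
    REJECTED = "REJECTED"
    PIVOT_OUT = "PIVOT OUT"

def resolve(approve_votes: int, challenger_veto: bool, category_changed: bool,
            min_approvals: int = 4) -> Verdict:
    """Map one round's panel signals to a verdict (precedence and threshold are assumptions)."""
    if challenger_veto:                 # market-viability veto overrides everything else
        return Verdict.REJECTED
    if category_changed:                # idea drifted into a new category mid-debate
        return Verdict.PIVOT_OUT
    if approve_votes >= min_approvals:  # "strong consensus", assumed 4 of 5 main agents
        return Verdict.APPROVED
    return Verdict.CONDITIONAL          # otherwise proceed only if conditions are met
```

Checking the veto before anything else mirrors the description of REJECTED as something that "fires" on its own, independent of how many other agents approve.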

Verdict logic is idempotent: replaying the same X-Payment transaction returns the same verdict and does not charge twice.
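The idempotency rule amounts to keying results on the payment transaction ID. The sketch below is hypothetical (the `VerdictLedger` class and its callbacks are illustrative, and the in-memory dict stands in for what would need to be a durable store with a unique key on the transaction ID).

```python
from typing import Callable, Dict

class VerdictLedger:
    """In-memory stand-in for a durable store keyed by payment transaction ID."""

    def __init__(self, run_debate: Callable[[str], str], charge: Callable[[str], None]):
        self._run_debate = run_debate
        self._charge = charge
        self._verdicts: Dict[str, str] = {}

    def submit(self, tx_id: str, idea: str) -> str:
        # Replay of a known transaction: return the stored verdict and charge nothing.
        if tx_id in self._verdicts:
            return self._verdicts[tx_id]
        verdict = self._run_debate(idea)
        self._charge(tx_id)  # charged once, only after a verdict exists
        self._verdicts[tx_id] = verdict
        return verdict
```

Charging only after the debate produces a verdict means a failed run leaves nothing stored, so a retry with the same transaction simply runs again rather than returning a half-finished result.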

Quality benchmark

On gpt-5.5 ($5.50/run): all 5 main agents and all 5 sub-agents fire, producing 5K-25K chars of analysis each. Phase A5 fact-checks every claim against its cited URLs. When the panel pivoted, the run returned the verdict plus an 18.8K-char Pivot Report.

How they chain together

Each pipeline writes structured state to its session table. The next pipeline reads that state to seed its prompt:

  • Scout → Forge — Forge reads scout_reports.topics and scout_reports.daily_brief as Round 0 context
  • Forge → Prove — pick one idea from forge_sessions.top_ideas; Prove reads its 20 structured fields to seed the debate
  • Standalone — every pipeline also accepts free-form input, so you can skip the chain and feed your own idea directly into Forge or Prove
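The hand-off is essentially a read from one session table into the next pipeline's prompt. The sketch uses an in-memory SQLite database; the table and column names (`scout_reports.topics`, `scout_reports.daily_brief`, `forge_sessions.top_ideas`) come from the list above, while the `id` columns, JSON-in-TEXT storage, the topic keys `title` and `core_question`, and the helper names are assumptions.

```python
import json
import sqlite3

def create_tables(conn: sqlite3.Connection) -> None:
    # Assumed schema: structured state stored as JSON text.
    conn.execute("CREATE TABLE scout_reports (id INTEGER PRIMARY KEY, topics TEXT, daily_brief TEXT)")
    conn.execute("CREATE TABLE forge_sessions (id INTEGER PRIMARY KEY, top_ideas TEXT)")

def forge_round0_context(conn: sqlite3.Connection, report_id: int) -> str:
    """Build Forge's Round 0 context from a Scout report's brief and topic cards."""
    row = conn.execute(
        "SELECT topics, daily_brief FROM scout_reports WHERE id = ?", (report_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no scout report with id {report_id}")
    topics_json, brief = row
    topic_lines = [f"- {t['title']}: {t['core_question']}" for t in json.loads(topics_json)]
    return "Daily brief:\n" + brief + "\n\nTopics:\n" + "\n".join(topic_lines)

def prove_seed(conn: sqlite3.Connection, session_id: int, idea_index: int) -> dict:
    """Return the structured fields of one Forge idea to seed a Prove debate."""
    row = conn.execute(
        "SELECT top_ideas FROM forge_sessions WHERE id = ?", (session_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no forge session with id {session_id}")
    return json.loads(row[0])[idea_index]
```

In standalone mode these reads are skipped and the free-form input takes the place of the generated context.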

Cost summary at a glance

Pipeline | Cheap default         | Quality pick        | Duration
Scout    | MiniMax M2.7 ($0.45)  | Sonnet 4.6 ($1.50)  | ~6 min
Forge    | MiniMax M2.7 ($0.45)  | Opus 4.7 ($2.20)    | ~30 min
Prove    | GPT-5.4 ($1.20)       | GPT-5.5 ($5.50)     | ~20 min
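Because each pipeline is billed per run, a chain's cost is just the sum of the tier picked for each stage. The helper below is illustrative (its name and dictionary layout are assumptions); the figures are copied from the table.

```python
# Per-run cost (USD) and approximate duration (minutes) per pipeline, from the table above.
COST_USD = {
    "scout": {"cheap": 0.45, "quality": 1.50},
    "forge": {"cheap": 0.45, "quality": 2.20},
    "prove": {"cheap": 1.20, "quality": 5.50},
}
DURATION_MIN = {"scout": 6, "forge": 30, "prove": 20}

def chain_estimate(tiers: dict[str, str]) -> tuple[float, int]:
    """Sum cost and approximate duration for the pipelines and tiers chosen."""
    cost = sum(COST_USD[p][tier] for p, tier in tiers.items())
    minutes = sum(DURATION_MIN[p] for p in tiers)
    return round(cost, 2), minutes

print(chain_estimate({"scout": "quality", "forge": "quality", "prove": "quality"}))  # (9.2, 56)
```

All quality picks come to $9.20 and roughly an hour end to end; passing only the pipelines you run (for example just Forge and Prove) prices a partial chain the same way.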

All costs are pass-through to your LLM provider — GapSmith doesn't take a margin on token spend. Your purchase covers software access; your API key covers compute.

Agent API vs Done-For-You

These pipelines are also available as paid services, deliberately split into two tiers based on the price-quality trade-off:

Agent API
Cost-effective LLM

Endpoints under /api/v1/* run on a balanced, cost-effective LLM (MiniMax / Sonnet 4.6 tier), so per-call USDC pricing stays in the $0.05–$15 range. This is the right tier when an agent just needs fresh signals at machine speed.

Done-For-You
Top-tier LLM + human review

We run the full pipeline on Claude Opus 4.7 / GPT-5.5 Pro with a human pass on top of every report. This is the right tier when quality matters more than per-call cost. Pricing: $39 / $99 / $149 per run.

Want to see real output? Try the live Scout, Forge, and Prove pages, or call the Agent API directly (see the API reference).