Pipelines

Scout · Forge · Prove.

Three multi-LLM pipelines, designed to run in sequence or independently. Each does one job well — find gaps, generate ideas, debate them — and hands off clean structured state to the next.

Takeaway

Run them as a chain (Scout → Forge → Prove) for a $3-$10 end-to-end venture thesis (cheap default ~$3, top-tier ~$9), or run one at a time when you just want fresh signals, rapid ideation, or a pre-mortem on an existing idea.

Scout

Daily market intelligence — find the gaps the market hasn't noticed yet.

Cost: ~$1.50
Duration: ~6 min
Sectors: up to 10
Articles ingested: 70-90
Pain signals: 200-400
Top ideas surfaced: 3 topics

What Scout does

Scout runs over a daily snapshot of 79 RSS sources + 100 community-pain sources (Reddit, HN, Lobsters, GitHub Issues), prefiltered by the sectors you select. Five stages:

Fetch — pull cached articles + pain posts for your sectors
Score — LLM scores each article (idea_potential, confidence A-D), and clusters pain posts into themes by sector + frequency
Curate — pick top 8 articles, top 10 pain clusters, link them into cross-signals (article × pain → startup wedge)
Topics — synthesize 3 venture-grade topic cards with trend signal, severity-tagged pain signals, and a core question
Brief — assemble the daily brief (overview, takeaway, narratives, sector heatmap)
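
The Curate stage can be pictured as a top-k selection followed by a join. The sketch below is illustrative only: the `Article` and `PainCluster` shapes, the sample rows, and the rule that an article and a pain cluster link when they share a sector are all assumptions, not Scout's actual schema or linking logic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Article:
    title: str
    sector: str
    idea_potential: float  # LLM score from the Score stage (scale assumed)
    confidence: str        # letter grade "A".."D"


@dataclass
class PainCluster:
    theme: str
    sector: str
    frequency: int  # number of pain posts in the cluster


def curate(
    articles: List[Article],
    clusters: List[PainCluster],
    top_articles: int = 8,
    top_clusters: int = 10,
) -> Tuple[List[Article], List[PainCluster], List[Tuple[Article, PainCluster]]]:
    """Keep the strongest articles and pain clusters, then pair them by sector."""
    best_articles = sorted(articles, key=lambda a: a.idea_potential, reverse=True)[:top_articles]
    best_clusters = sorted(clusters, key=lambda c: c.frequency, reverse=True)[:top_clusters]
    # Assumed linking rule: same sector means the pair is a candidate cross-signal.
    cross_signals = [(a, c) for a in best_articles for c in best_clusters if a.sector == c.sector]
    return best_articles, best_clusters, cross_signals


# Made-up sample data, for illustration only.
articles = [
    Article("Open banking rules tighten", "fintech", 0.9, "A"),
    Article("New GPU pricing", "ai-infra", 0.7, "B"),
    Article("Minor funding round", "fintech", 0.2, "D"),
]
clusters = [
    PainCluster("Reconciliation takes days", "fintech", 42),
    PainCluster("Inference bills unpredictable", "ai-infra", 17),
]
top, pains, signals = curate(articles, clusters, top_articles=2)
for article, cluster in signals:
    print(f"{article.title} x {cluster.theme}")
```

In a real run the linking step is presumably done by an LLM rather than a plain sector match; the join here only shows where the cross-signal list comes from.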

Quality benchmark

On the claude-sonnet-4-6 baseline (3 sectors, ~$1.73): 13K-char daily brief, 10 sharp cross-signals linking news to pain to wedge, 3 topics with concrete market wedges + competitor pricing, 30 keywords reflecting real domain vocabulary (after our stopword + per-cluster cap fixes).

MiniMax M2.7 produces nearly identical structural quality at ~$0.45. Grok 4 was weaker on Scout — generic topics, hallucinated sources — and is no longer offered.

Forge

Multi-agent ideation — turn gaps into screened, ranked startup ideas.

Cost: ~$0.45 - $2.20
Duration: ~30 min
Rounds: 5 + screen
Agents: 5 main + 5 sub
Top ideas: 3 ranked
Fields per idea: 20

What Forge does

Forge takes context (a Scout report, or your own free-form input) and runs a five-round structured conversation between agents:

Round 1 — Pain discovery (gated): Proposer drafts pain points; Trend Scout, Contrarian, Gap Finder, Benchmark Hunter, Evidence Hunter add competitive + adjacent-market context
Rounds 2-4 — Iterative deepening: Defender plays creative coach pushing on differentiation, pricing, and the "stop-scrolling sentence"; Proposer commits to specifics
Round 5 — Top-3 selection with explicit hybrid & portfolio analysis
Screening pass — All 5 agents cast a kill vote and a RICE score; tie-breaks resolved via aggregate RICE total
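
The screening pass can be sketched as a vote tally plus a score sum. RICE is the standard reach × impact × confidence / effort formula. Everything else here is an assumption for illustration: the `Ballot` shape, the rule that a strict majority of kill votes drops an idea, ranking by fewest kill votes first, and the placeholder agent names and numbers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Ballot:
    agent: str
    kill: bool
    reach: float
    impact: float
    confidence: float  # expressed as a fraction, 0..1
    effort: float      # must be greater than zero

    @property
    def rice(self) -> float:
        # Standard RICE formula: (reach * impact * confidence) / effort
        return self.reach * self.impact * self.confidence / self.effort


def screen(ballots_by_idea: Dict[str, List[Ballot]]) -> List[str]:
    """Drop ideas a majority voted to kill, then rank the survivors."""
    survivors = []
    for idea, ballots in ballots_by_idea.items():
        kills = sum(b.kill for b in ballots)
        if kills * 2 > len(ballots):  # assumed rule: strict majority kills the idea
            continue
        total_rice = sum(b.rice for b in ballots)
        survivors.append((kills, -total_rice, idea))
    # Fewest kill votes first; the aggregate RICE total breaks ties.
    return [idea for _, _, idea in sorted(survivors)]


def ballots(kills: List[bool], reach: float, impact: float, confidence: float, effort: float) -> List[Ballot]:
    """Helper that gives every placeholder agent the same RICE inputs."""
    return [Ballot(f"agent-{i + 1}", k, reach, impact, confidence, effort) for i, k in enumerate(kills)]


votes = {
    "idea-a": ballots([False] * 5, reach=1000, impact=2, confidence=0.8, effort=4),
    "idea-b": ballots([False] * 5, reach=500, impact=3, confidence=0.5, effort=1),
    "idea-c": ballots([True, True, True, False, False], reach=2000, impact=3, confidence=0.9, effort=2),
}
ranking = screen(votes)
print(ranking)
```

Here `idea-c` is dropped by three kill votes out of five, and `idea-b` outranks `idea-a` on aggregate RICE because both have zero kill votes.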

Each surviving idea ships with 20 structured fields including moat, problem, why-now, target market, revenue model, competitive landscape, kill switches with thresholds, and a 3-step validation plan with numeric success criteria.
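
As a rough picture of what such a record might look like, the sketch below models only the fields named above; the remaining fields are not listed in this section and are left out rather than guessed. The class names, the field types, the "observed value below threshold" reading of a kill switch, and the sample values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KillSwitch:
    metric: str
    threshold: float  # assumed semantics: abandon if the observed value falls below this

    def tripped(self, observed: float) -> bool:
        return observed < self.threshold


@dataclass
class ValidationStep:
    description: str
    success_metric: str
    target: float  # numeric success criterion


@dataclass
class IdeaCard:
    # Only the fields named in the text are modeled here.
    moat: str
    problem: str
    why_now: str
    target_market: str
    revenue_model: str
    competitive_landscape: str
    kill_switches: List[KillSwitch] = field(default_factory=list)
    validation_plan: List[ValidationStep] = field(default_factory=list)


# Placeholder values, for illustration only.
card = IdeaCard(
    moat="placeholder",
    problem="placeholder",
    why_now="placeholder",
    target_market="placeholder",
    revenue_model="placeholder",
    competitive_landscape="placeholder",
    kill_switches=[KillSwitch(metric="landing_page_conversion", threshold=0.02)],
    validation_plan=[ValidationStep("Run a landing-page test", "signups", 50)],
)
tripped = [k.metric for k in card.kill_switches if k.tripped(observed=0.01)]
print(tripped)
```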

Quality benchmark

On claude-opus-4-7 ($2.19/run): three venture-grade ideas with concrete pricing tiers, traceable Reddit/Trustpilot/G2 sources, full RICE scoring from each agent, and explicit kill votes with reasoning.

Sonnet 4.6 and Gemini Flash variants are restricted to Scout — we observed quality drops in Forge's 5-round screening with those models. The cheapest viable Forge model is MiniMax M2.7 (~$0.45/run, well-balanced). Opus 4.7 / GPT-5.5 are the quality picks for venture-grade output.

Prove

Multi-agent debate — pre-mortem the idea before you build it.

Cost: ~$2.50 - $5.50
Duration: ~20 min
Agents: 5 main + 5 sub
Rounds: up to 4
Verdicts: 4 outcomes
Fact-check: Phase A5

What Prove does

Prove runs an adversarial debate: Proposer defends the idea, Challenger attacks market viability, Analyst pressure-tests the unit economics, Defender plays steelman, and Reviewer audits every factual claim against URLs in a Phase A5 pass. Sub-agents (Contrarian, Gap Finder, Trend Scout, Evidence Hunter, Benchmark Hunter) inject competitive context.

After each round the panel can vote on one of four verdicts:

APPROVED — strong consensus to build
CONDITIONAL — proceed if conditions are met
REJECTED — Challenger's market-viability veto fires
PIVOT OUT — the idea changed category mid-debate; the panel recommends pivoting and produces a Pivot Report instead of an execution plan

Verdicts are idempotent: replaying the same X-Payment transaction returns the same verdict and does not charge twice.
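
The four verdicts and the replay rule can be sketched as an enum plus a lookup keyed by the payment transaction. This is a minimal in-memory illustration: `VerdictStore`, `verdict_for`, the `charges` counter, and the stubbed debate function are hypothetical names, and the real service presumably persists this state rather than holding it in a dict.

```python
from enum import Enum
from typing import Callable, Dict


class Verdict(Enum):
    APPROVED = "APPROVED"
    CONDITIONAL = "CONDITIONAL"
    REJECTED = "REJECTED"
    PIVOT_OUT = "PIVOT OUT"


class VerdictStore:
    """In-memory stand-in for whatever persistence the real service uses."""

    def __init__(self, run_debate: Callable[[str], Verdict]):
        self._run_debate = run_debate
        self._by_tx: Dict[str, Verdict] = {}
        self.charges = 0  # counts how many times a payment was actually consumed

    def verdict_for(self, payment_tx: str, idea: str) -> Verdict:
        # A replayed transaction id returns the stored verdict:
        # no new debate is run and no new charge is recorded.
        if payment_tx in self._by_tx:
            return self._by_tx[payment_tx]
        self.charges += 1
        verdict = self._run_debate(idea)
        self._by_tx[payment_tx] = verdict
        return verdict


# Stub debate that always returns CONDITIONAL, for illustration only.
store = VerdictStore(run_debate=lambda idea: Verdict.CONDITIONAL)
first = store.verdict_for("tx-123", "example idea")
replay = store.verdict_for("tx-123", "example idea")
print(first is replay, store.charges)  # True 1
```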

Quality benchmark

On gpt-5.5 ($5.50/run): all 5 main agents and all 5 sub-agents fire, each producing 5K-25K chars of analysis. Phase A5 fact-checks every claim against its cited URLs. The output is a verdict, plus an 18.8K-char Pivot Report when the panel pivots.

How they chain together

Each pipeline writes structured state to its session table. The next pipeline reads that state to seed its prompt:

Scout → Forge — Forge reads scout_reports.topics and scout_reports.daily_brief as Round 0 context
Forge → Prove — pick one idea from forge_sessions.top_ideas; Prove reads its 20 structured fields to seed the debate
Standalone — every pipeline also accepts free-form input, so you can skip the chain and feed your own idea directly into Forge or Prove
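
The hand-offs above amount to reading a few columns and turning them into prompt text. The sketch below assumes each row is available as a plain dict; the `title` and `core_question` keys on a topic, the function names, and the sample rows are hypothetical. Only `topics`, `daily_brief`, and `top_ideas` come from the description above.

```python
import json
from typing import Any, Dict, List


def forge_round0_context(scout_report: Dict[str, Any]) -> str:
    """Turn a scout_reports row into Round 0 seed text for Forge."""
    topics: List[Dict[str, Any]] = scout_report["topics"]
    lines = ["DAILY BRIEF", scout_report["daily_brief"], "", "TOPICS"]
    for i, topic in enumerate(topics, start=1):
        lines.append(f"{i}. {topic['title']}: {topic['core_question']}")
    return "\n".join(lines)


def prove_seed(forge_session: Dict[str, Any], index: int) -> str:
    """Serialize one idea from forge_sessions.top_ideas as the debate seed."""
    idea = forge_session["top_ideas"][index]
    return json.dumps(idea, indent=2, sort_keys=True)


# Made-up rows, for illustration only.
scout_row = {
    "daily_brief": "Placeholder brief text.",
    "topics": [
        {"title": "Topic one", "core_question": "Who pays for this today?"},
        {"title": "Topic two", "core_question": "Why has nobody fixed it?"},
    ],
}
forge_row = {"top_ideas": [{"problem": "placeholder", "moat": "placeholder"}]}

context = forge_round0_context(scout_row)
seed = prove_seed(forge_row, index=0)
print(context)
print(seed)
```

Because every pipeline also accepts free-form input, the same seed strings could be replaced by hand-written text when running Forge or Prove standalone.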

Cost summary at a glance

Pipeline	Cheap default	Quality pick	Duration
Scout	MiniMax M2.7 — $0.45	Sonnet 4.6 — $1.50	~6 min
Forge	MiniMax M2.7 — $0.45	Opus 4.7 — $2.20	~30 min
Prove	GPT-5.4 — $1.20	GPT-5.5 — $5.50	~20 min

All costs are pass-through to your LLM provider — GapSmith doesn't take a margin on token spend. Your purchase covers software access; your API key covers compute.
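
For a quick end-to-end estimate, the table's per-pipeline figures can simply be summed. The snippet below uses only the numbers in the table above; the `COSTS` mapping and `chain_cost` helper are illustrative names. Actual spend depends on sector count and model choice, and the per-pipeline ranges quoted elsewhere on this page are wider than these point figures.

```python
from decimal import Decimal
from typing import Sequence

# Per-run figures copied from the cost table (USD).
COSTS = {
    "Scout": {"cheap": Decimal("0.45"), "quality": Decimal("1.50")},
    "Forge": {"cheap": Decimal("0.45"), "quality": Decimal("2.20")},
    "Prove": {"cheap": Decimal("1.20"), "quality": Decimal("5.50")},
}


def chain_cost(tier: str, pipelines: Sequence[str] = ("Scout", "Forge", "Prove")) -> Decimal:
    """Sum the table figure for each selected pipeline at the given tier."""
    return sum((COSTS[p][tier] for p in pipelines), Decimal("0"))


print(chain_cost("cheap"))                       # 2.10
print(chain_cost("quality"))                     # 9.20
print(chain_cost("quality", ["Forge", "Prove"])) # 7.70
```

Decimal is used instead of float so the sums print exactly as dollar amounts.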

Agent API vs Done-For-You

These pipelines are also available as paid services. We deliberately split them into two tiers along the price-quality trade-off:

Agent API

Cost-effective LLM

Endpoints under /api/v1/* run on a balanced cost-effective LLM (MiniMax / Sonnet 4.6 tier) so per-call USDC pricing stays in the $0.05–$15 range. Right tier when an agent just needs fresh signal at machine speed.

Agent API reference →

Done-For-You

Top-tier LLM + human review

We run the full pipeline on Claude Opus 4.7 / GPT-5.5 Pro with a human pass on top of every report. Right tier when quality matters more than per-call cost. $39 / $99 / $149 per run.

Done-For-You details →


Want to see real output? Try the live Scout / Forge / Prove pages. Or hit the Agent API directly — see the API reference.