Changelog
Hand-curated highlights of what we've shipped on GapSmith — features users see, fixes that change observable behavior, and agent-API additions. Repository: github.com/balflee/GapSmith.
May 15, 2026
- [Feature] /free-trial: sign up, verify email, get 3 free runs
New dedicated landing page for paid acquisition. After email signup and verification, a trigger auto-grants 1 Scout + 1 Forge + 1 Prove run, no credit card required. Trial runs use a company-funded MiniMax key server-side (no BYOK setup needed for new visitors). After the 3 runs are consumed, the existing 402 → /pricing flow takes over. Email verification is now required for all new signups as an anti-abuse measure: no quota is granted until the link is clicked. Existing paid users are unaffected.
May 14, 2026
- [Fix] Prove: "PROCEED 2 / REJECT 0 → REJECTED" no longer happens silently
Two changes ship together. (1) Engine: Challenger's hidden veto threshold was ≤4/10 in R2+, which fired even when both Analyst and Reviewer voted PROCEED — producing unexplainable REJECTED verdicts. Loosened to ≤3/10 (same as R1), reserving veto for clearly-bad market reads (1-3 / 10). (2) UI: the verdict card now shows the Challenger's score and a "Veto triggered" badge when relevant, plus a one-line explainer when the Challenger overrode the binary vote. Existing sessions render correctly after deploy — per-round Challenger score was always in the DB, just never surfaced.
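A minimal sketch of the veto rule described above, for readers who want to see how a 2-0 PROCEED vote could still end in REJECTED. The function name, argument shapes, and return format are hypothetical; only the thresholds (≤4 before, ≤3 now) come from this entry, and the real engine may combine votes differently.

```python
from collections import Counter

# Challenger score at or below this value triggers a veto.
# Per the entry above: was 4 in R2+, now 3 (same as R1).
VETO_THRESHOLD = 3


def resolve_verdict(votes, challenger_score, veto_threshold=VETO_THRESHOLD):
    """Return (verdict, veto_triggered) for one round (hypothetical shape).

    votes: list of "PROCEED" / "REJECT" strings from the binary voters.
    challenger_score: Challenger's market-read score on a 1-10 scale.
    """
    tally = Counter(votes)
    majority = "PROCEED" if tally["PROCEED"] > tally["REJECT"] else "REJECTED"
    # The veto only counts as "triggered" when it overrides a PROCEED majority.
    if challenger_score <= veto_threshold and majority == "PROCEED":
        return "REJECTED", True
    return majority, False
```

With the old threshold, `resolve_verdict(["PROCEED", "PROCEED"], 4, veto_threshold=4)` yields a vetoed REJECTED; with the new default the same call proceeds. The second tuple element corresponds to the "Veto triggered" badge.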
May 13, 2026
- [Feature] Scout / Forge / Prove: leave the page, come back, runs are still there
Pipelines have always run in the background on Railway — but the browser was throwing away the session id when you left, so it felt like restarting. Now: dispatch pushes ?session=<id> into the URL (bookmark-able, refresh-able), "Past Sessions" rows for in-flight runs are clickable ("Click to watch live"), and stale links to /scout-report?id=X mid-run cleanly redirect to the live progress view instead of dead-ending on "No Report Found". Zero engine changes — pure UI wiring.
edf0a53
May 12, 2026
- [Feature] /lab/debate-room streams each agent reply as it lands
Lab debates now render message-by-message in real time — Proposer's bubble appears as soon as their LLM call returns, then Challenger's, then Analyst's — instead of dropping a whole round at once like Prove does. Auto-scrolls; sub-agent tool calls (Trend Scout / Benchmark Hunter / Evidence Hunter) thread indented under their parent persona. A typing pill at the bottom mirrors the engine's progress message so you know who's about to speak. Pivots /lab from "watch the result" to "watch them argue."
303d39d
- [Fix] Lab debate room polish: clean topic header + actionable errors
Two annoyances on the live lab page fixed. (1) Pasting a long markdown idea brief no longer dumps the entire wall of text into the sticky header — we extract a real title (strip ##, **, #N: prefixes, cap at 120 chars). (2) When a run fails (OpenAI insufficient_quota, invalid key, model-not-found, rate limit, context overflow), the error card translates the litellm exception into a one-line user fix and names which persona's model triggered it (e.g. "OpenAI quota exhausted on Challenger + Defender — top up or pick a different provider"). Raw error stays available behind a collapsible for bug reports.
0ff03f4
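The title extraction in fix (1) above could look roughly like the sketch below. The function name and the exact patterns are assumptions; only the stripped prefixes (##, **, #N:) and the 120-character cap come from the entry.

```python
import re

MAX_TITLE_CHARS = 120  # cap stated in the entry above


def extract_title(brief: str) -> str:
    """Pick a short header title from a pasted markdown idea brief."""
    for line in brief.splitlines():
        text = line.strip()
        if not text:
            continue  # skip blank lines before the first real line
        text = re.sub(r"^#+\s*", "", text)      # leading heading marks such as "##"
        text = text.replace("**", "")           # bold markers
        text = re.sub(r"^#?\d+:\s*", "", text)  # "#3:" style numbering prefix
        text = text.strip()
        if text:
            return text[:MAX_TITLE_CHARS]
    return ""
```

For example, a brief starting with `## **#3: AI invoice chaser**` would produce the header `AI invoice chaser`, and the rest of the brief never reaches the sticky header.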
May 11, 2026
- [Feature] /lab/debate-room/new: pick a different LLM per persona, BYOK
The big one: lab debates can now run each of the 6 personas on a different LLM. Claude Opus on Proposer, MiniMax on Challenger, Gemini Pro on Analyst, GPT-5.5 on Defender — or any combination, including all-same-model. Strict BYOK (your keys decrypted in-memory at dispatch, never logged); free for testing (no Prove quota consumed; runs land in a separate lab_sessions table so experiments don't pollute the production dataset). Sticky header shows per-persona model chips while running. Engine reuses the full Prove debate logic — same gates, same verdict YAML, same sub-agents — just with per-persona LLM bindings.
d7355eb
- [Fix] Forge + Prove waiting UX: honest time estimates + heartbeat
Forge previously said "results in 20–40 seconds" — accurate for MiniMax, misleading for Claude/Gemini with native search where Round 1 alone can run 1–8 minutes. Now: model-aware time estimate up front, plus a client-side activity heartbeat after 90s of no engine progress so the page never looks frozen during slow LLM calls. No more "is it stuck or is it thinking?" tickets.
878243b
May 7, 2026
- [Fix] Upstream LLM 5xx no longer eats your run quota
When the AI model provider (Gemini, Anthropic, MiniMax, etc.) returns a 503/429/connection error mid-pipeline, the engine now classifies it, refunds the quota unit you spent at /start, and surfaces a green "Your run quota was NOT used — retry anytime" badge in the run-page error card. Run failures from upstream outages cost you nothing. UI also recognizes 503 / "service unavailable" / connection errors directly (was generic "Something Went Wrong" before).
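The classify-then-refund behavior above can be sketched as follows. All names are hypothetical, and treating every 5xx plus 429 as an upstream outage is an assumption drawn from the entry's title and its 503/429 examples, not a description of the engine's actual classifier.

```python
def is_upstream_outage(status_code=None, error=None):
    """Classify a failure as the provider's fault (assumed rule)."""
    if isinstance(error, ConnectionError):
        return True  # connection reset / refused / aborted
    if status_code is None:
        return False
    # Assumption: rate limiting (429) and any 5xx count as upstream trouble.
    return status_code == 429 or 500 <= status_code <= 599


def settle_failed_run(quota_remaining, status_code=None, error=None):
    """Return (new_quota, refunded) after a run fails mid-pipeline."""
    if is_upstream_outage(status_code, error):
        # Give back the unit that was spent at /start.
        return quota_remaining + 1, True
    return quota_remaining, False
```

The `refunded` flag is what would drive the green "Your run quota was NOT used" badge in the error card.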
aacf908
- [Fix] Prove no longer false-rejects ADJUSTED debates
PIVOT_OUT detection moved from regex to a mandatory YAML verdict block (status: STRENGTHENED | ADJUSTED | VULNERABLE | PIVOT_OUT) the agent must emit. Three rounds of regex tightening upstream still couldn't handle every false-positive variant — a Defender stats-table row "| 🔴 PIVOT_OUT | 0 |" reporting zero pivots was triggering REJECTED on debates that were actually ADJUSTED. Verified end-to-end on MiniMax-M2.7 ($0.022 smoke).
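To illustrate why a structured status field avoids the table-row false positive, here is a stdlib-only sketch that reads the `status:` key and ignores every other mention of PIVOT_OUT. The function name and the line-based parsing are assumptions; a real implementation would more likely hand the block to a YAML parser.

```python
ALLOWED_STATUSES = {"STRENGTHENED", "ADJUSTED", "VULNERABLE", "PIVOT_OUT"}


def read_verdict_status(agent_output: str) -> str:
    """Return the declared status, or raise if the mandatory block is missing."""
    for line in agent_output.splitlines():
        key, sep, value = line.partition(":")
        # Only a line whose key is exactly "status" counts as the verdict.
        if sep and key.strip() == "status":
            status = value.strip()
            if status in ALLOWED_STATUSES:
                return status
            raise ValueError(f"unknown status: {status!r}")
    raise ValueError("missing mandatory verdict block")
```

A stats-table row such as `| PIVOT_OUT | 0 |` contains the keyword but no `status` key, so it no longer influences the verdict; output with no status line fails loudly instead of being guessed at.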
3e41c59
- [Feature] Forge gets a fourth competitive category: RECONSTRUCT
Forge ideation Step 3 used a 3-category schema (BLUE_OCEAN / IMPROVABLE / RED_OCEAN) and Step 4 told Proposer to skip RED_OCEAN, which meant that opportunities in the class of Notion (vs Confluence), Linear (vs Jira), and Stripe (vs PayPal) were systematically filtered out. New RECONSTRUCT category surfaces ideas where the incumbent looks healthy but the Job-To-Be-Done has shifted underneath. Plus a "why hasn't anyone done this?" sanity gate on BLUE_OCEAN to catch survivor-bias wedges before debate kills them.
ce017ab
May 6, 2026
- [Feature] /lab/debate-room: visualized 6-persona Prove debate (WIP)
Microsoft-Teams-style chat replay of a real, paid mainnet Prove session. 6 AI personas with editorial-illustration avatars, phase progress (A → A.5 → B → C → D → vote), expandable sub-agent tool calls (Trend Scout / Benchmark Hunter / Evidence Hunter), verdict reveal with kill-brief banner. Read-only replay for now; mixed-model debates ship next.
37a63fd
- [Feature] Live mainnet traction strip on homepage
Verifiable on-chain numbers beneath the hero — sessions count, USDC settled, paid agent API calls — with a Solscan link to the merchant wallet. Honest small numbers preferred over vanity metrics.
c7186aa
May 5, 2026
- [Feature] /docs/api/playground: interactive API explorer
Pick any of the 7 endpoints, tweak query/body params via a form, and copy a runnable curl / Python / TypeScript snippet. Sample-response tab shows real production payloads (gaps, pain clusters, kill briefs) so judges and integrators can see actual output shapes without spending USDC.
c6b5d84
- [API] PIVOT_OUT is now a distinct verdict on /api/v1/prove/debate
When a panelist self-declares the idea unsalvageable mid-debate, agents now see verdict="PIVOT_OUT" instead of REJECTED — no more inspecting report.pivot_report to disambiguate. OpenAPI spec lists all four verdicts (APPROVED, CONDITIONAL_APPROVED, REJECTED, PIVOT_OUT) and which report.* field to read for each path.
dcd4886
- [Fix] Vote-rejected Prove now ships a Strategist kill brief, not silence
When the panel voted REJECTED via final tally (not via in-round PIVOT_OUT), the Strategist was never called and report.output / summary / analysis shipped empty. Now: dedicated kill-brief synthesis with top 3 reasons cited by persona/round, salvage paths, and 1-page decision summary. Verified by job_moskspum: 7045-char output (was 0).
9e45013
- [API] POST /api/v1/prove/debate Compute API live ($25 USDC, ~60 min)
Closes the agent platform gap — Scout (Data) + Forge (Compute) + Prove (Compute) all paid in USDC over x402. Result payload: { verdict, report, rounds, votes }. Webhooks fire on completion.
81c40c2
- [Fix] Pre-payment body validation in withX402Payment
Agents posting an invalid body (wrong enum, missing field, type mismatch) now get 422 BEFORE the 402 advertisement. No USDC burned on requests that would be rejected after on-chain settlement.
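The ordering this fix establishes (validate first, advertise payment second) can be sketched as below. This is not the actual withX402Payment wrapper: the handler signature, the `idea` field, and the response bodies are hypothetical and only illustrate that a 422 is returned before any 402.

```python
def validate_body(body):
    """Hypothetical validator: returns a list of error strings."""
    errors = []
    if not isinstance(body.get("idea"), str):
        errors.append("idea: required string")
    return errors


def handle_request(body, payment_proof=None, validate=validate_body):
    """Return (status_code, payload) for one agent request."""
    errors = validate(body)
    if errors:
        # Reject bad input before asking for money, so nothing settles on-chain.
        return 422, {"errors": errors}
    if payment_proof is None:
        return 402, {"message": "payment required"}
    return 200, {"message": "accepted"}
```

An invalid body gets 422 whether or not a payment proof is attached, so an agent can fix its request before paying.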
08084ad
- [API] Structured session_config object on /forge/ideate
Agents can now pass { profile, budget, timeline, revenue_threshold, founder_signal } with enum-validated values instead of hand-building SESSION_CONFIG.md. Markdown string form still accepted.
0e230fd
May 4, 2026
- [Feature] Forge → Prove SESSION_CONFIG inheritance
When Prove debates an idea generated by Forge, it now inherits the same project context (Profile/Budget/Timeline/Revenue) Forge ranked the idea under — keeping ratings internally consistent. Override toggle available.
8669cba
- [Fix] Forge screening: rank-1 always matches the WINNER badge
Six historical sessions had the lower-RICE idea promoted to rank-1 because the cascade tiebreaker disagreed with the simple summed-totals comparison the WINNER badge uses. Added a simple-max safety override, and each label is now paired with its correct total after the reorder.
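One way to picture the safety override: after the cascade produces its order, move the idea with the highest summed total to the front, carrying label and total together so they cannot drift apart. The function name and the (label, total) tuple shape are assumptions, not the engine's data model.

```python
def enforce_winner_first(ranked):
    """ranked: list of (label, total) pairs in cascade order.

    Returns a new list whose first item has the maximum total.
    Ties keep the cascade's order because max() returns the first maximum.
    """
    if not ranked:
        return []
    best = max(range(len(ranked)), key=lambda i: ranked[i][1])
    # Move the whole pair so the label stays attached to its own total.
    return [ranked[best]] + ranked[:best] + ranked[best + 1:]
```

For instance, a cascade order of `[("A", 41), ("B", 47), ("C", 30)]` becomes `[("B", 47), ("A", 41), ("C", 30)]`, which matches what a simple-max WINNER badge would pick.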
fd59f33
- [Feature] FACT_CLAIMS source-link rule enforced on Forge hard stats
Forge prompts now require every hard fact (competitor names + pricing, funding, ARR, contract status) to either cite an inline [REF: SEARCH] URL, tag as [assumption], or be deleted. Catches the 'fabricated competitor pricing' failure mode that surfaced in earlier Prove fact-checks.
024e3da
- [Feature] Default-expand Project Context card on Forge / Prove
Reduces the silent-defaults problem — most users hit Start without ever knowing the Profile/Budget/Timeline/Revenue knobs existed. Now visible by default; collapsing is one click for users who don't want it.
10c49db
- [Fix] Prove vote-condition deduplication (semantic-aware)
Multiple voters often arrived at the same gating condition with slightly different wording. dict.fromkeys() only caught byte-identical duplicates; the new normalized first-N-words signature catches paraphrases too. 1 production session retroactively cleaned.
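A rough sketch of the difference between the two approaches. The normalization rules and the choice of N = 6 are assumptions for illustration; the entry only states that the signature is built from the normalized first N words.

```python
import re


def signature(condition: str, n_words: int = 6) -> str:
    """Lowercase, drop punctuation, keep the first n_words words."""
    words = re.findall(r"[a-z0-9]+", condition.lower())
    return " ".join(words[:n_words])


def dedupe_conditions(conditions, n_words: int = 6):
    """Keep the first condition seen for each signature, preserving order."""
    seen = set()
    kept = []
    for condition in conditions:
        sig = signature(condition, n_words)
        if sig not in seen:
            seen.add(sig)
            kept.append(condition)
    return kept
```

Given "Validate pricing with 10 paying customers first." and "validate pricing with 10 paying customers before launch", `list(dict.fromkeys(...))` keeps both because the strings differ, while `dedupe_conditions` keeps only the first because their six-word signatures match.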
e3f71cc
- [Fix] Stop leaking [SUB_AGENT_QUALITY_WARNING] into transcripts
The internal quality marker was meant as a downstream signal but ended up rendered to users on /prove-report. 6 historical sessions retroactively cleaned. Quality is now judged downstream from content alone.
1cd4973
- [Feature] SESSION_CONFIG threaded through Forge + Prove
Solo founders / Funded teams / Enterprise users can finally tell the engine their real Budget / Timeline / Team Profile / Revenue threshold instead of being silently rated against generic Small Team / $10K / $100K assumptions. LEAN_FIT bands are now proportional to the user's actual budget.
cecf836
April 30, 2026
- [Feature] GapSmith x402 agent platform: initial commit
Scout / Forge / Prove pipelines, x402 USDC payment rail, /api/v1/* agent API, /docs/api playground, examples/agent_demo.py reference impl. Live at gapsmith.draftlabs.org.
c514f60