Forge — Case Study · nsikakobong.ubom

The problem

Founders waste hours on the blank page. The business plan is the most-requested document of the early stage — investors want it, accelerators want it, the founder's own thinking wants it — and almost every founder I've watched approach it has stalled out on structure before they ever got to substance. Templates exist; they're either too long to be useful or too thin to be credible. Generic LLM prompts produce a generic LLM business plan: ten sections that read like the model averaging across every business plan it has ever seen.

The cost is not the writing. The cost is the hours of blank-page anxiety a founder spends staring at section headings before the writing starts.

The approach

Structure over freeform. A business plan is a constrained document with a known shape. Use that shape as a schema and force the model to produce content that fits it, rather than asking it to write prose and hoping the structure emerges.

A second model to challenge the first. A single model writing in one pass writes too confidently. A second model, with its own prompt and its own context, doing a per-section critic pass, catches the lazy paragraph, the unfounded claim, the duplicate executive summary disguised as a market analysis. The critic is a separate call — not the same model asked to critique its own output, which produces shallow critique because the model defends what it just wrote.

Two agents. One drafts, one critiques. The critic gets a different prompt and a fresh context window. Neither knows the other's reasoning. The output is what survives both.

What I built

Single-input ingestion— a one-line business idea plus a few structured fields (industry, stage, market). No multi-step wizard.
Streamed sectioned generation— the generator agent produces each section against a typed schema; sections stream into the UI as they complete, so the founder sees motion within a second.
Critic agent— a second OpenAI call with its own prompt and context, emitting structured per-section feedback against the same schema. Suggestions surface inline; the founder can accept, edit, or ignore.
Word and PDF export— the schema renders to a downloadable Word doc and a print-ready PDF. No accounts, no email gate.
No persistence by default— the plan lives in the session. Closing the tab loses it. This is a feature, not an oversight.

InputOne-line idea

→

Agent · drafterGenerator

→

Agent · reviewerCritic

Technical decisions worth calling out

Structured outputs over prompting. OpenAI's structured outputs API with a typed schema per section. The schema is the contract; the renderer maps schema to UI and to export. Asking a model to "produce a business plan" and parsing the prose is a worse engineering signal than constraining the model to a shape and trusting it to fill the shape.

Two agents over one bigger prompt. A single fat prompt with "generate then critique" produces a defended draft, not a critiqued one. Splitting drafter and critic across two calls — different prompts, different contexts — produces useful, non-defensive feedback. It costs an extra API call per section; the quality lift is worth it.

FastAPI for orchestration. The Next.js front end handles input and rendering; FastAPI handles the orchestration, schema validation, and the two-agent call chain. Keeping orchestration off the edge means the front end stays a UI, not a half-implemented agent host.

No persistence by default. Founder plans are sensitive. The simplest privacy posture is the one that has nothing to leak. Accounts and saved drafts can come in v1.1 if a real user asks; until then, no auth, no DB, no question.

Hard problem worth mentioning

Tuning the critic. The first cut of the critic was useful but mean — it found real problems and surfaced them in a tone that read as adversarial. Founders bounced off it. The second cut overcorrected and became encouraging-but-useless, finding "opportunities for clarification" where the first version had said "this paragraph contradicts the previous section."

The fix was not a tone-of-voice prompt. It was a structured severity field in the critic's output schema and a default UI that surfaced only medium-and-above issues, with low-severity nits collapsed by default. The critic stays sharp; the founder sees the sharp critique only at the point where they have signal to act on it. Tuning the critic to be useful without being mean was more work than building the critic itself.

Outcome

Live as a personal product, open to use. Used by myself and a handful of founders for first-draft business plans. The most common piece of feedback: the critic surfaced something the founder hadn't noticed in their own thinking, which is the bar I was building for.

What I'd do differently

Version the generator and critic independently. I shipped them as a single coupled deployment, which made it impossible to swap one without retesting the whole pipeline. The two agents are conceptually independent — different prompts, different contexts, different goals — and they should have been independently versioned and rolled from day one. Next iteration treats them as two services with their own release cadence and their own eval suite.