
Orchestrate

From zero to done: Designing AI That Thinks With You, Not For You

Founding Designer
0 → 1
AI Product
SaaS
GenAI
B2C

Orchestrate helps people move from vague, overwhelming goals to clear plans and concrete next steps — using conversational AI that explains its reasoning, not just its output.

Role: Founding AI Product Designer & Product Owner
Responsibilities: Research, product strategy, interaction design, AI system design
Collaborators: 2 consulting engineers
Timeline: 6 months
Scope: Functional prototype, validated with users; designs dev-ready

AI engineering case

The AI Layer: Architecture, Prompt Design & Safety

Outcomes
Day 1: code-first design system
+40%: trust, without changing the AI
25% → 85%: plan acceptance
88 respondents · 6 interviews: research that became the brief

A walkthrough of Orchestrate — the conversation, the generated plan, and the controls users have over it.
01

Problem & Context

Most people don't fail at productivity because they lack the right tool. They fail because of activation energy — the psychological gap between having a goal and knowing how to take the first step. “I don't know where to start” isn't an excuse; it's the actual barrier.

  • The scale is real: 91% of workers experience high or extreme stress, with 82% at risk of burnout, costing $322B annually in lost productivity (APA, 2025). In my own survey, 40% cited “not knowing where to start” as their biggest productivity challenge.

  • Existing tools solve the wrong problem: Every major productivity tool — Todoist, Notion, Google Tasks — helps you schedule and manage tasks. None of them help you generate them. They assume you already know what to do.

  • People want transparency, not automation: Users don't want AI to take over. They want to understand why a plan is structured the way it is, stay in control of their process, and trust what they're looking at — not just accept a black-box output.

02

Who It's For & What Success Looks Like

The “capable but stuck” professional

Someone who has real work already done — photos taken, a project half-finished, a skill developed — but hasn't shipped it yet. Not because they lack ability, but because the gap between “I have this” and “this exists in the world” feels overwhelming or unclear.

They don't need to be told what to do — they roughly know. What they need is someone to help them narrow the scope, make the first step feel small enough to actually do, and hold the bigger picture so they don't have to.

Job To Be Done: When I'm facing a big, undefined goal, I want AI to help me break it into something I can actually start today — so I feel capable rather than overwhelmed.

Success metrics

  • Breakdown accuracy: % of generated tasks rated useful and actionable

  • Time to first actionable task: time from goal input to a task the user can act on today

  • Trust & clarity score: does the user understand and believe the plan? Target: 4+/5

  • Retention intent: would they return after the first session?

03

The Solution — Goal Breakdown Feature

Research revealed that users weren't struggling to schedule tasks — they couldn't generate them in the first place. That insight drove the pivot from energy-aware scheduling to goal breakdown: a conversational AI that takes a vague goal and turns it into a structured, explainable, editable plan.

Manual entry and chat, side by side

Two ways to start a goal, not one. Users can type it directly when they already know what they want, or hand it to the AI when they don't. Chat opens full-page — the conversation is the work, the AI drives. Manual entry stays compact — the user drives. Both paths land in the same editable goal artifact, so the choice up front never locks anyone in.

One entry point, two postures: switch between typing the goal directly and chatting it out. Either way lands in the same editable artifact.

Goal Action Plan with breakdown

The canvas opens progressively once the conversation has enough context — milestones, tasks, and subtasks appear with confidence scores and “Why this?” explanations so users understand the plan before they follow it. Side-by-side layout keeps the conversation visible alongside the breakdown.

Side-by-side view: the chat stays on the left while the generated plan opens on the right, with confidence scores and 'Why this?' inline.

Breakdown with the ability to refine, ask why, and rewrite

Manual input is always available as an escape hatch. Full edit access at every level: the AI proposes, the user decides. No locked outputs, no black-box results.

Refine and edit panel: manual input, inline 'Why this?' explanations, and per-task edit controls. Every level of the breakdown is editable; the AI proposes and the user decides, with no locked outputs.

Goals Dashboard

An at-a-glance overview of all active goals, showing progress, task counts, deadlines, and status. A persistent AI assistant column sits alongside the goal cards, surfacing a daily focus suggestion, next actions per goal, and at-risk alerts — turning the dashboard from a passive tracker into an active accountability partner.

Goals dashboard: active goals with progress, task counts, and deadlines, paired with a persistent AI assistant column that surfaces today's focus, next actions, and at-risk alerts.

04

Process

4.1 Discovery: From ‘better scheduling’ to ‘lowering activation energy’

Research methods: a survey (88 respondents), 6 in-depth interviews, and a 4-week competitive immersion covering UseMotion and Reclaim as the main competitors, plus Todoist and Notion for broader context on task management.

I always struggled with deciding what to work on next — I'd open my goals and just freeze. I want something that just tells me what to focus on right now, so I skip that whole mental loop.

Jeny

I do weekly reviews every Sunday and try to keep my goals updated — but the moment a new task pops up mid-week, my schedule fills up in two days and I lose track of where to fit it.

Andres

Main insights that informed the design direction

  • Decision fatigue around prioritization: Users felt paralyzed when opening their goals, unsure what to work on first. 40% cited "not knowing where to start" as their biggest productivity challenge, not lack of time or tools.

  • Schedule fit and effort estimation: Users struggled to estimate how long large goals would realistically take. Without visibility into the full timeline, they frequently overcommitted.

  • Fragmented tools, no single source of truth: Users were spread across multiple apps — notes, calendars, task managers — with no unified place to track goals, tasks, and progress together.

  • AI as a thinking partner, not a dictator: Fear of over-automation was real. 80% were already using fragmented workarounds to stay in control. People want AI that collaborates, not one that decides for them.

  • Trust requires transparency: Users need to understand why a plan is structured the way it is. Without reasoning, they don't follow it.

How this changed the product

  • → V1 focus shifted entirely to goal breakdown — generating the task list users couldn't create themselves
  • → Explainability and user control became non-negotiable UX requirements from day one

4.2 AI System Design: Reliable, Explainable, and Cost-Aware

Orchestrate's AI isn't a single model answering a question. It's a pipeline of four specialised agents — each responsible for one cognitive task: routing, analysing, breaking down, validating. The architecture was designed this way because goal planning is too complex and too high-stakes for a one-shot response.

Multi-agent AI pipeline: the Orchestrator routes work to the Analyzer, Breakdown, and Validator agents. Four specialised agents, each owning one cognitive task (routing, analysing, breaking down, validating), with explicit handoffs.

  • Why multi-agent vs. single prompt: A single prompt doing everything produced shallow breakdowns and unpredictable failures. Separating concerns (analyze → break down → validate) made each step more reliable and easier to improve independently.

  • Why this model architecture: GPT-4o handles the breakdown, where output quality matters most; lighter rules-based logic handles validation to avoid unnecessary API costs. The result was a 10x cost reduction vs. an all-LLM approach.

  • How guardrails improve the user experience: The Validator agent catches unsafe goals (for example, ones that invite dangerous health advice), unrealistic timelines, and low-confidence outputs before they reach the user. Accuracy improved from 68% to 92% after the guardrails were introduced.
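The handoffs above can be sketched in TypeScript. This is a minimal, hypothetical sketch, not Orchestrate's code: the type names, the `UNSAFE_TERMS` keyword list, the 0.5 confidence threshold, and the 4-hours-per-day realism bound are all assumptions, and the Orchestrator's routing is simplified to one fixed path. It shows the cost split described above: the only LLM-backed step (`BreakdownModel`) is injected as a function, while screening and validation are plain rules that make no API calls.

```typescript
// Hypothetical sketch: agent names follow the case study; every type,
// threshold, and keyword list below is an illustrative assumption.

interface Task {
  title: string;
  minutes: number;    // AI-estimated duration, editable by the user
  confidence: number; // 0..1, shown to the user as a confidence score
  why: string;        // text behind the "Why this?" affordance
}

interface GoalContext {
  goal: string;
  deadlineDays: number;
}

type Verdict = { ok: true; tasks: Task[] } | { ok: false; reason: string };

// The single LLM-backed step is injected, so no specific model API is assumed.
type BreakdownModel = (ctx: GoalContext) => Promise<Task[]>;

const UNSAFE_TERMS = ["stop taking medication", "fast for"]; // assumed examples
const MIN_CONFIDENCE = 0.5;  // assumed threshold
const MAX_HOURS_PER_DAY = 4; // assumed realism bound

// Analyzer: normalises raw input into structured context (stubbed).
function analyze(goal: string, deadlineDays: number): GoalContext {
  return { goal: goal.trim(), deadlineDays };
}

// Validator, part 1: cheap rules-based screen that runs before any model call.
function screenGoal(ctx: GoalContext): string | null {
  const text = ctx.goal.toLowerCase();
  return UNSAFE_TERMS.some((term) => text.includes(term)) ? "unsafe goal" : null;
}

// Validator, part 2: rules-based checks on the generated plan.
function validate(ctx: GoalContext, tasks: Task[]): Verdict {
  const totalHours = tasks.reduce((sum, t) => sum + t.minutes, 0) / 60;
  if (totalHours > ctx.deadlineDays * MAX_HOURS_PER_DAY) {
    return { ok: false, reason: "unrealistic timeline" };
  }
  const shaky = tasks.filter((t) => t.confidence < MIN_CONFIDENCE);
  if (shaky.length > 0) {
    return { ok: false, reason: `low confidence on ${shaky.length} task(s)` };
  }
  return { ok: true, tasks };
}

// Orchestrator: routes a request through analyze -> screen -> break down -> validate.
async function orchestrate(
  goal: string,
  deadlineDays: number,
  model: BreakdownModel,
): Promise<Verdict> {
  const ctx = analyze(goal, deadlineDays);
  const blocked = screenGoal(ctx);
  if (blocked !== null) return { ok: false, reason: blocked };
  const tasks = await model(ctx);
  return validate(ctx, tasks);
}
```

A real Orchestrator would also route follow-up turns (refinements, "Why this?" questions) to different agents, and a keyword list is far too blunt for real safety screening; it stands in here only to show where the check sits relative to the model call.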

Ethics & Transparency by Design

Before writing a single prompt, I ran a bias mapping exercise with the developer and consulted an AI specialist to cross-check assumptions. The goal was to identify potential harms before they became design decisions, not after.

Bias mapping board: user needs, AI capabilities, potential biases, mitigations, and reporting mechanisms.

Key biases identified and how they shaped the product

  • More tasks = better: Removed gamification and streaks that would encourage unhealthy productivity patterns.

  • Universal task granularity: Made breakdown detail adjustable; neurodivergent users may need a different level of structure than neurotypical ones.

  • AI as motivator: Positioned Orchestrate as organisational support, not a productivity coach, with no guilt-inducing language or prompts.

  • Black-box suggestions: Added confidence scores and "Why this?" explanations in direct response to this bias; users need to be able to interrogate the AI, not just trust it.

4.3 Interaction & Iteration: Making a Complex AI Feel Understandable

To move quickly through layout exploration, I used UXpilot and v0 to prototype and visualise ideas before committing to Figma — compressing what would have been weeks of pixel work into hours.

The core challenge: AI interactions are non-linear, but interfaces are usually built for linear flows. Early designs assumed users would move step-by-step — they didn't.

Traditional UX flow

AI UX flow, with exit/loop options at every stage

Key decisions

1. Arriving at one AI, several postures

The breaking point was a small moment with a big tell: in manual mode, with a chat also reachable, users hit a “wait — what does this chat even know?” pause. The toggle had quietly promised two separate AIs, and neither knew about the other's work. The tempting fix was to add a side panel to manual mode too — but that just multiplied the problem into three AI surfaces that looked like three different products.

The reframe that resolved it: the toggle was never a mode, it was a first-run choice. One AI that shows up in different postures — full-page when the conversation is the work, a docked co-pilot when the form is the work, an inline suggestion when a single field is. Held together by four rules (one character, always a diff, always shows what it's reading, never silent), the three surfaces stop competing and become one ladder of commitment users slide along as their goal takes shape.

Trade-offs. Collapsing the toggle means losing an explicit, legible “switch to chat” affordance — discoverability now leans on the persistent co-pilot, which I'd want to validate with real users rather than assume. The morph from conversation to form is also more complex to build and animate than two static modes, and it asks the system to infer intent from a first sentence, which can guess wrong. And “one assistant everywhere” only pays off if every team holds the four rules consistently — the moment one surface drifts, the illusion of a single AI breaks and you're back to three products.

Chat (full-page) — the posture for users who don't yet know the shape of what they want. Nothing exists yet (no form, no goal, no artifact), so the AI takes the whole screen — there's nothing else to look at. It leads: asks questions, draws intent out of the user. The AI drives the conversation; the user follows.

Co-pilot (side panel) — the posture for users who already know what they're building. The form is the protagonist now, sitting center stage; the AI moves into a narrow rail beside it, watching, suggesting, pointing. It stops leading and starts assisting — the user drives, the AI helps. That's why it shows the “reading from your form” strip: once there's something to read, the AI has to prove it can see what the user sees.

They're not two features — they're the same assistant in two postures, and the morph between them is continuous.

Chat is the AI when you have nothing; co-pilot is the AI when you have something.

2. Conversational over instant generation

Instant plan generation felt impersonal: users couldn't see themselves in the plan and could only accept it or restart. A conversational flow, where the AI asks questions, generates iteratively, and adapts mid-conversation, gave users a sense of ownership and a much higher plan acceptance rate (85%, up from 25%).

Conversation (selected)

How it works

A back-and-forth conversation thread — AI asks a clarifying question, user responds, AI refines the plan. No binary buttons; the conversation continues naturally.

Result

85% acceptance rate, up from 25% (+240%)

Feels like working with someone. I liked that it asked me clarifying questions, and the plan looks legit and I know where to start.

3. Editable time estimates and full control over AI suggestions

Every task has a clickable time badge you can adjust. This acknowledges that AI-generated estimates are a starting point, not gospel — the user owns the plan.

Each task carries a time badge. Tap to adjust — the AI's estimate is a starting point, not the final number.

4. Structured options inside the chat

When the AI needs a decision, it offers a small picker — buttons for the obvious choices, free text for everything else. The format keeps the conversation moving without forcing users down a pre-defined path.

When the AI needs a decision, it shows a small picker: numbered options, a free-text fallback, and Skip, never a forced path.

  • Structured vs. open-ended — buttons are faster, but they shouldn't trap users. That's why “Something else” (free text) and “Or reply directly...” (full chat input) stay available.
  • Guidance vs. autonomy — the picker suggests answers but never blocks the user's own phrasing. “Skip” reinforces this.
  • Density vs. clarity — 2–4 options per question, max 3 questions. More than that and it becomes a form, not a conversation.
  • Mobile-first — large tap targets, numbered for keyboard users, single-column layout.
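The picker rules above are concrete enough to encode as a check. A hypothetical TypeScript sketch, assuming a simple data shape (`PickerQuestion`, `PickerTurn`) that does not come from the source: `checkPicker` enforces 2–4 options per question and at most 3 questions, and `renderQuestion` always appends the free-text and Skip escape hatches.

```typescript
// Hypothetical shape for the in-chat picker; names are illustrative only.
interface PickerQuestion {
  prompt: string;
  options: string[]; // the "obvious choices", rendered as numbered buttons
}

interface PickerTurn {
  questions: PickerQuestion[];
  allowFreeText: true; // literal type: "Something else" can never be switched off
  allowSkip: true;     // literal type: "Skip" can never be switched off
}

const MIN_OPTIONS = 2;
const MAX_OPTIONS = 4;
const MAX_QUESTIONS = 3;

// Returns rule violations; an empty array means the turn stays a conversation, not a form.
function checkPicker(turn: PickerTurn): string[] {
  const problems: string[] = [];
  if (turn.questions.length === 0) problems.push("no questions");
  if (turn.questions.length > MAX_QUESTIONS) {
    problems.push(`too many questions (${turn.questions.length} > ${MAX_QUESTIONS})`);
  }
  turn.questions.forEach((q, i) => {
    const n = q.options.length;
    if (n < MIN_OPTIONS || n > MAX_OPTIONS) {
      problems.push(`question ${i + 1}: ${n} option(s), expected ${MIN_OPTIONS}-${MAX_OPTIONS}`);
    }
  });
  return problems;
}

// Renders one question as numbered options plus the escape hatches that are always present.
function renderQuestion(q: PickerQuestion): string[] {
  return [
    q.prompt,
    ...q.options.map((opt, i) => `${i + 1}. ${opt}`),
    "Something else",
    "Skip",
  ];
}
```

Typing `allowFreeText` and `allowSkip` as the literal `true` turns a picker without escape hatches into a compile-time error rather than something caught in design review.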

5. Resource tracking

Testing revealed users naturally wanted to attach resources to goals and tasks — links, documents, references. This came up consistently enough to flag as a near-term design requirement.

Resources at the task level: a reference attached to an individual task, exactly where it's needed.
Same primitive at the goal level: a goal-level resource library gathering every link and reference in one place.
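One way to read "same primitive" is a single resource type whose owner is either a goal or a task. A hypothetical TypeScript sketch in which every name is an assumption: the task view filters to one task, and the goal library collects both goal-level and task-level resources under that goal.

```typescript
// Hypothetical model: one Resource type reused at goal and task level.
type ResourceOwner =
  | { kind: "goal"; goalId: string }
  | { kind: "task"; goalId: string; taskId: string };

interface Resource {
  url: string;
  label: string;
  owner: ResourceOwner;
}

// Task view: only the resources attached to that exact task.
function resourcesForTask(all: Resource[], taskId: string): Resource[] {
  return all.filter((r) => {
    const owner = r.owner;
    return owner.kind === "task" && owner.taskId === taskId;
  });
}

// Goal library: goal-level resources plus every task-level one under the same goal.
function goalLibrary(all: Resource[], goalId: string): Resource[] {
  return all.filter((r) => r.owner.goalId === goalId);
}
```

Because every owner carries a `goalId`, the library view needs no separate aggregation step: attaching a link to a task automatically makes it visible at the goal level.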

6. How to maintain momentum

Big goals create friction. Motivation drops when the next step feels too large or too vague. Orchestrate reduces that friction by structuring the plan so the first move is always easy, the calendar is already cleared, and the bigger picture stays visible without getting in the way.

First tasks are short and achievable

The first tasks in every action plan are intentionally small: 15–30 minute units designed to create early wins. Research suggests that small, completable tasks lower the activation energy needed to start and increase follow-through. Instead of one large block, the plan breaks work into granular steps a user can pick up without feeling overwhelmed.

First tasks are short and achievable: the first three tasks in every plan are deliberately 15–30 minute units, designed to produce an early win and lower activation energy.
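The 15–30 minute rule can be expressed as a post-processing step on the generated plan. A hypothetical sketch, assuming tasks carry a `minutes` estimate; the function names are illustrative. `splitTask` cuts any block longer than 30 minutes into near-equal parts (with n = ⌈m/30⌉ parts, each part is more than 15 and at most 30 minutes), and `smallStart` applies it until the plan has at least three leading units. Merging tasks shorter than 15 minutes is left out; `startsSmall` simply reports them.

```typescript
// Hypothetical plan item; only the duration matters for this rule.
interface PlanTask {
  title: string;
  minutes: number;
}

const MIN_MINUTES = 15;
const MAX_MINUTES = 30;
const SMALL_START_COUNT = 3; // "the first three tasks in every plan"

// Splits a block longer than 30 minutes into near-equal parts that each fall in [15, 30].
function splitTask(task: PlanTask): PlanTask[] {
  if (task.minutes <= MAX_MINUTES) return [task];
  const parts = Math.ceil(task.minutes / MAX_MINUTES);
  const base = Math.floor(task.minutes / parts);
  const extra = task.minutes % parts; // the first `extra` parts get one more minute
  return Array.from({ length: parts }, (_, i) => ({
    title: `${task.title} (part ${i + 1} of ${parts})`,
    minutes: base + (i < extra ? 1 : 0),
  }));
}

// Splits tasks at the front of the plan until at least three small units lead it.
function smallStart(tasks: PlanTask[]): PlanTask[] {
  const result: PlanTask[] = [];
  for (const task of tasks) {
    if (result.length < SMALL_START_COUNT) result.push(...splitTask(task));
    else result.push(task);
  }
  return result;
}

// True when the first three units all sit inside the 15-30 minute window.
function startsSmall(tasks: PlanTask[]): boolean {
  return tasks
    .slice(0, SMALL_START_COUNT)
    .every((t) => t.minutes >= MIN_MINUTES && t.minutes <= MAX_MINUTES);
}
```

For example, a 70-minute "Edit photos" block becomes parts of 24, 23, and 23 minutes, and later tasks are left at their original size.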

AI-suggested scheduling

When calendar access is granted, Orchestrate scans for the nearest available slot and proactively recommends a time to begin. Rather than leaving the user to figure out when to start, the AI surfaces a concrete moment — closing the gap between planning and doing.

AI-suggested start time: once calendar access is granted, Orchestrate finds the nearest free slot in the user's calendar and proposes a concrete time to begin.
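Finding "the nearest open slot" is a small interval-scanning problem. A hypothetical TypeScript sketch, assuming busy events have already been fetched from the connected calendar as `[start, end)` timestamps in milliseconds; the function name and the search horizon parameter are assumptions, and working hours, time zones, and buffers between events are ignored.

```typescript
// Hypothetical: busy events as half-open [start, end) intervals in epoch milliseconds.
interface Interval {
  start: number;
  end: number;
}

const MINUTE = 60_000;

// Returns the earliest start time >= `from` where a block of `durationMin` minutes
// fits without overlapping any busy interval, or null if none fits within `horizonMs`.
function nearestFreeSlot(
  busy: Interval[],
  from: number,
  durationMin: number,
  horizonMs: number,
): number | null {
  const need = durationMin * MINUTE;
  const sorted = [...busy].sort((a, b) => a.start - b.start);
  let cursor = from;
  for (const event of sorted) {
    if (event.end <= cursor) continue;        // event already over at the cursor
    if (event.start - cursor >= need) break;  // gap before this event is big enough
    cursor = event.end;                       // otherwise jump past the event
  }
  return cursor + need <= from + horizonMs ? cursor : null;
}
```

Sorting once and sweeping a cursor forward keeps this O(n log n) in the number of events, and overlapping events need no special handling because the cursor only ever moves forward to a later end time.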

Future north star — ideas added to the backlog

Based on the goal and the conversation context, Orchestrate surfaces what might be worth doing next — features, directions, or ideas that aren't part of the immediate plan but shouldn't be forgotten. These go into a backlog automatically, giving the user a sense of the bigger picture without cluttering the active plan.

Ideas that aren't part of today's plan land in a backlog automatically — bigger picture preserved without cluttering the active goal.

Code-First from the Start

Most projects live in Figma until handoff. This one didn't.

A few high-fidelity screens set the visual direction. Everything else was built directly in code using Claude with MCP (Model Context Protocol): production-ready, with no translation layer. The design system followed the same logic: tokens, rules, and components built in code rather than Figma, applying an engineering mindset to decide what's globally updatable, what's a reusable component, and what's a one-off.

Version controlled in GitHub. Code quality reviewed and refactored with engineering guidance throughout.

No handoff gap — because there was no handoff.
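The "globally updatable vs. reusable vs. one-off" split maps naturally onto layered design tokens. A hypothetical TypeScript sketch, not Orchestrate's actual system: the token names and values are invented, semantic tokens store references to primitives rather than copied values, and `toCssVariables` emits CSS custom properties.

```typescript
// Hypothetical primitive layer: raw values, the only place a colour or size is written.
const primitives: Record<string, string> = {
  blue600: "#2563eb",
  gray900: "#111827",
  space4: "16px",
};

// Semantic layer stores *references* (primitive names), not copied values,
// so editing one primitive updates every semantic token that points at it.
const semantic: Record<string, string> = {
  colorAction: "blue600",
  colorText: "gray900",
  spaceCardPadding: "space4",
};

// Resolves a semantic token to its current primitive value, failing loudly on typos.
function resolve(token: string): string {
  const ref = semantic[token];
  if (ref === undefined) throw new Error(`unknown semantic token: ${token}`);
  const value = primitives[ref];
  if (value === undefined) throw new Error(`${token} points at missing primitive ${ref}`);
  return value;
}

// Emits CSS custom properties, e.g. colorAction -> --color-action.
function toCssVariables(): string {
  return Object.keys(semantic)
    .map((name) => {
      const kebab = name.replace(/[A-Z]/g, (c) => `-${c.toLowerCase()}`);
      return `--${kebab}: ${resolve(name)};`;
    })
    .join("\n");
}
```

Components would then read only semantic tokens, so a rebrand touches the primitive layer once, while a genuine one-off stays a local value inside its component.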

Orchestrate design system: tokens and typography

05

Outcomes & What's Next

Day 1: code-first design system

Built the design system directly in code from day one — every token, component, and primitive lived in the codebase, not in Figma. Zero design-to-code translation lag, one source of truth across the team, and new screens shipping at the speed of the design decision behind them.

+40%: trust, without changing the AI

Confidence scoring showed users where the system was uncertain rather than projecting false confidence. Self-reported trust jumped from 3.1 to 4.3/5 — same AI, more honesty.

25% → 85%: plan acceptance

100% of testers wanted to refine the plan themselves, not have the AI decide for them. Designing for collaboration over dictation (editable history, iterative refinement) cut the restart rate from 28% to 5%.
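The headline percentages are relative changes and can be checked directly from the figures reported above:

```latex
\text{trust: } \frac{4.3 - 3.1}{3.1} = \frac{1.2}{3.1} \approx 0.387 \approx +40\%
\qquad
\text{acceptance: } \frac{85 - 25}{25} = \frac{60}{25} = 2.4 = +240\%
```

The trust gain works out to about 38.7%, rounded to the +40% quoted.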

88 respondents · 6 interviews: research that became the brief

Every major design decision traces back to something a real person said. The "I'm Stuck" mode, confidence scoring, guardrails against toxic productivity — all came from research, not assumption.

What I'd do next

  • Test with users over 2–4 weeks of real use (diary study): prototype testing validated the mechanics, but longitudinal data would reveal whether task quality holds up over time

  • Integrate a scheduling layer — once a goal is broken down, helping users place tasks into their week is the natural next step

  • Explore adaptive personalization — using edits and feedback to improve breakdown quality over time, while keeping goal data private

What this project taught me

01.

Rapid prototyping as a thinking tool

Using UXpilot and v0 to prototype early meant I could reason through AI behaviour before committing to a direction.

02.

Designing alongside evolving technology

The more valuable skill wasn't designing the right solution, it was designing systems flexible enough to stay relevant as the technology matured.

03.

Non-linear flows are more natural — and more complex to design for

The design challenge shifts from guiding users through defined steps to creating a flow that feels open but stays purposeful.

Current status

Functional prototype. Designs validated with developers. Next step is securing a development partner and moving into build.