Problem & Context
Most people don't fail at productivity because they lack the right tool. They fail because of activation energy — the psychological gap between having a goal and knowing how to take the first step. “I don't know where to start” isn't an excuse; it's the actual barrier.
The scale is real: 91% of workers experience high or extreme stress and 82% are at risk of burnout, costing $322B annually in lost productivity (APA, 2025). In this project's own survey, 40% of respondents cited "not knowing where to start" as their biggest productivity challenge.
Existing tools solve the wrong problem: Every major productivity tool — Todoist, Notion, Google Tasks — helps you schedule and manage tasks. None of them help you generate them. They assume you already know what to do.
People want transparency, not automation: Users don't want AI to take over. They want to understand why a plan is structured the way it is, stay in control of their process, and trust what they're looking at — not just accept a black-box output.
Who It's For & What Success Looks Like
The “capable but stuck” professional
Someone who has real work already done — photos taken, a project half-finished, a skill developed — but hasn't shipped it yet. Not because they lack ability, but because the gap between “I have this” and “this exists in the world” feels overwhelming or unclear.
They don't need to be told what to do — they roughly know. What they need is someone to help them narrow the scope, make the first step feel small enough to actually do, and hold the bigger picture so they don't have to.
Job To Be Done: When I'm facing a big, undefined goal, I want AI to help me break it into something I can actually start today — so I feel capable rather than overwhelmed.
Success metrics
Breakdown accuracy: % of generated tasks rated as useful and actionable
Time to first actionable task: from goal input to a task the user can act on today
Trust & clarity score: does the user understand and believe the plan? Target: 4+/5
Retention intent: would they return after the first session?
The Solution — Goal Breakdown Feature
Research revealed that users weren't struggling to schedule tasks — they couldn't generate them in the first place. That insight drove the pivot from energy-aware scheduling to goal breakdown: a conversational AI that takes a vague goal and turns it into a structured, explainable, editable plan.
Manual entry and chat, side by side
Two ways to start a goal, not one. Users can type it directly when they already know what they want, or hand it to the AI when they don't. Chat opens full-page — the conversation is the work, the AI drives. Manual entry stays compact — the user drives. Both paths land in the same editable goal artifact, so the choice up front never locks anyone in.
Goal Action Plan with breakdown
The canvas opens progressively once the conversation has enough context — milestones, tasks, and subtasks appear with confidence scores and “Why this?” explanations so users understand the plan before they follow it. Side-by-side layout keeps the conversation visible alongside the breakdown.
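A plan like this is essentially a tree of milestones, tasks, and subtasks, each carrying its own confidence score and rationale. The TypeScript sketch below shows one possible shape for that data, assuming a 0 to 1 confidence scale. `PlanItem`, `GoalPlan`, `whyThis`, `lowConfidenceItems`, the 0.6 threshold, and the example data are hypothetical illustrations, not Orchestrate's actual schema.

```typescript
// Hypothetical plan shape: milestones contain tasks, tasks contain subtasks.
interface PlanItem {
  id: string;
  title: string;
  confidence: number; // assumed 0..1 scale: how sure the model is about this item
  whyThis: string; // rationale shown behind the "Why this?" affordance
  estimateMinutes?: number;
  children: PlanItem[];
}

interface GoalPlan {
  goal: string;
  milestones: PlanItem[];
}

// Walk the whole tree and collect items below the threshold,
// e.g. so the UI can mark them as uncertain instead of hiding the doubt.
function lowConfidenceItems(plan: GoalPlan, threshold = 0.6): PlanItem[] {
  const result: PlanItem[] = [];
  const visit = (item: PlanItem): void => {
    if (item.confidence < threshold) result.push(item);
    item.children.forEach((child) => visit(child));
  };
  plan.milestones.forEach((m) => visit(m));
  return result;
}

// Example data, invented for illustration.
const plan: GoalPlan = {
  goal: "Publish my photo portfolio",
  milestones: [
    {
      id: "m1",
      title: "Select photos",
      confidence: 0.9,
      whyThis: "You said the photos are already taken.",
      children: [
        { id: "t1", title: "Shortlist 20 favourites", confidence: 0.85, whyThis: "A small first cut keeps the choice manageable.", estimateMinutes: 30, children: [] },
        { id: "t2", title: "Choose a hosting platform", confidence: 0.5, whyThis: "Depends on a budget you haven't mentioned yet.", children: [] },
      ],
    },
  ],
};

console.log(lowConfidenceItems(plan).map((item) => item.id)); // only t2 falls below 0.6
```

Keeping the rationale on the same node as the task means a "Why this?" explanation cannot drift out of sync with the item it explains.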
Breakdown with the ability to refine, ask why, and rewrite
Manual input is always available as an escape hatch. Full edit access at every level: the AI proposes, the user decides. No locked outputs, no black-box results.
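One way to make "the AI proposes, the user decides" concrete is to treat every AI suggestion as a field-level diff that changes nothing until the user accepts it. This is a minimal TypeScript sketch under that assumption; `TaskFields`, `diffProposal`, and `applyAccepted` are hypothetical names, and a real editor would also need undo and conflict handling.

```typescript
// Hypothetical task fields the AI is allowed to suggest changes to.
interface TaskFields {
  title: string;
  estimateMinutes: number;
  notes: string;
}
type FieldName = keyof TaskFields;

interface Change {
  field: FieldName;
  from: string | number;
  to: string | number;
}

// Turn an AI proposal into a list of field-level changes; unchanged fields are dropped.
function diffProposal(current: TaskFields, proposed: Partial<TaskFields>): Change[] {
  const changes: Change[] = [];
  (Object.keys(proposed) as FieldName[]).forEach((field) => {
    const to = proposed[field];
    if (to !== undefined && to !== current[field]) {
      changes.push({ field, from: current[field], to });
    }
  });
  return changes;
}

// Apply only the changes the user explicitly accepted; the original object is never mutated.
function applyAccepted(current: TaskFields, changes: Change[], accepted: FieldName[]): TaskFields {
  const next: TaskFields = { ...current };
  const writable = next as Record<FieldName, string | number>;
  changes.forEach((c) => {
    if (accepted.indexOf(c.field) >= 0) writable[c.field] = c.to;
  });
  return next;
}

const task: TaskFields = { title: "Shortlist photos", estimateMinutes: 30, notes: "" };
const changes = diffProposal(task, { title: "Shortlist photos", estimateMinutes: 45 });
console.log(changes); // one change: estimateMinutes 30 -> 45
console.log(applyAccepted(task, changes, []).estimateMinutes); // 30, because nothing was accepted
```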

Goals Dashboard
An at-a-glance overview of all active goals, showing progress, task counts, deadlines, and status. A persistent AI assistant column sits alongside the goal cards, surfacing a daily focus suggestion, next actions per goal, and at-risk alerts — turning the dashboard from a passive tracker into an active accountability partner.

Process
4.1 Discovery: From ‘better scheduling’ to ‘lowering activation energy’
Research methods: surveys (88 respondents), 6 in-depth interviews, and a 4-week competitive immersion covering UseMotion and Reclaim as the main competitors, plus Todoist and Notion for broader context on task management.
I always struggled with deciding what to work on next — I'd open my goals and just freeze. I want something that just tells me what to focus on right now, so I skip that whole mental loop.
Jeny
I do weekly reviews every Sunday and try to keep my goals updated — but the moment a new task pops up mid-week, my schedule fills up in two days and I lose track of where to fit it.
Andres
Main insights that informed the design direction
Decision fatigue around prioritization: Users felt paralyzed when opening their goals, unsure what to work on first. 40% cited "not knowing where to start" as their biggest productivity challenge, not lack of time or tools.
Schedule fit and effort estimation: Users struggled to estimate how long large goals would realistically take. Without visibility into the full timeline, they frequently overcommitted.
Fragmented tools, no single source of truth: Users were spread across multiple apps — notes, calendars, task managers — with no unified place to track goals, tasks, and progress together.
AI as a thinking partner, not a dictator: Fear of over-automation was real. 80% were already using fragmented workarounds to stay in control. People want AI that collaborates, not one that decides for them.
Trust requires transparency: Users need to understand why a plan is structured the way it is. Without reasoning, they don't follow it.
How this changed the product
- → V1 focus shifted entirely to goal breakdown — generating the task list users couldn't create themselves
- → Explainability and user control became non-negotiable UX requirements from day one
4.2 AI System Design: Reliable, Explainable, and Cost-Aware
Orchestrate's AI isn't a single model answering a question. It's a pipeline of four specialised agents — each responsible for one cognitive task: routing, analysing, breaking down, validating. The architecture was designed this way because goal planning is too complex and too high-stakes for a one-shot response.

Why multi-agent vs. single prompt: A single prompt doing everything produced shallow breakdowns and unpredictable failures. Separating concerns (analyze → break down → validate) made each step more reliable and easier to improve independently.
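The separation of concerns (route, analyse, break down, validate) can be sketched as a chain of small, independently testable stages. In this TypeScript sketch every stage is a hypothetical stub: the keyword router and regex-based analysis stand in for what would be model calls or richer rules in a real system, and names such as `routeMessage`, `analyse`, `breakDown`, and `runPipeline` are illustrative only.

```typescript
// Every stage below is a stub standing in for a model call or richer rules.
type Intent = "plan_goal" | "other";

interface Analysis {
  goal: string;
  hasTimeframe: boolean;
}

interface Draft {
  tasks: { title: string; confidence: number }[];
}

type Result =
  | { kind: "reply"; text: string } // not a planning request
  | { kind: "question"; text: string } // clarifying question back to the user
  | { kind: "plan"; draft: Draft }
  | { kind: "rejected"; reason: string };

// 1. Route: decide whether the message is a goal to plan at all (toy keyword check).
const routeMessage = (msg: string): Intent =>
  /\b(want|goal|plan|launch|learn|build|finish)\b/i.test(msg) ? "plan_goal" : "other";

// 2. Analyse: extract what the breakdown needs (here, only whether a timeframe was given).
const analyse = (msg: string): Analysis => ({
  goal: msg.trim(),
  hasTimeframe: /\b(day|days|week|weeks|month|months|by)\b/i.test(msg),
});

// 3. Break down: a real system would call a model here.
const breakDown = (a: Analysis): Draft => ({
  tasks: [{ title: `Define the smallest first step for: ${a.goal}`, confidence: 0.8 }],
});

// 4. Validate: plain checks before anything reaches the user.
const validate = (d: Draft): string | null => (d.tasks.length === 0 ? "empty plan" : null);

function runPipeline(msg: string): Result {
  if (routeMessage(msg) === "other") {
    return { kind: "reply", text: "Happy to help. What goal are you working on?" };
  }
  const analysis = analyse(msg);
  if (!analysis.hasTimeframe) {
    return { kind: "question", text: "When would you like this done by?" };
  }
  const draft = breakDown(analysis);
  const problem = validate(draft);
  return problem === null ? { kind: "plan", draft } : { kind: "rejected", reason: problem };
}

console.log(runPipeline("I want to launch my portfolio").kind); // question (no timeframe yet)
console.log(runPipeline("I want to launch my portfolio within a month").kind); // plan
```

Because each stage has a narrow input and output, one can be swapped (for example, replacing the stub `breakDown` with a model call) without touching the others.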
Why this model architecture: GPT-4o handles the breakdown, where output quality matters most; lighter rules-based logic handles validation to avoid unnecessary API calls. The result was a 10x cost reduction compared with an all-LLM approach.
How guardrails improve the user experience: The Validator agent catches unsafe goals (for example, ones that would call for dangerous health advice), unrealistic timelines, and low-confidence outputs before they reach the user. Accuracy improved from 68% to 92% after the guardrails were introduced.
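Because validation is rules-based rather than model-based, each guardrail can be a plain check that costs no API call. The TypeScript sketch below shows what the three checks could look like; the regex patterns, the 5 hours-per-week budget, the 0.5 confidence floor, and all type and function names are hypothetical placeholders, not the values or code Orchestrate uses.

```typescript
interface DraftTask {
  title: string;
  estimateMinutes: number;
  confidence: number;
}

interface DraftPlan {
  goal: string;
  deadlineWeeks: number;
  tasks: DraftTask[];
}

type Rule = "unsafe_goal" | "unrealistic_timeline" | "low_confidence";
interface Issue {
  rule: Rule;
  detail: string;
}

// Placeholder patterns for goals that would need dangerous health advice.
const UNSAFE_PATTERNS: RegExp[] = [
  /\blose \d+ ?(kg|lbs?) in a week\b/i,
  /\bfast(ing)? for \d+ days\b/i,
];

function validatePlan(plan: DraftPlan, hoursPerWeek = 5, minConfidence = 0.5): Issue[] {
  const issues: Issue[] = [];

  // Rule 1: flag goals that match a known unsafe pattern.
  for (const pattern of UNSAFE_PATTERNS) {
    if (pattern.test(plan.goal)) {
      issues.push({ rule: "unsafe_goal", detail: `goal matches ${pattern.source}` });
    }
  }

  // Rule 2: total estimated work must fit the time the user actually has.
  const totalHours = plan.tasks.reduce((sum, t) => sum + t.estimateMinutes, 0) / 60;
  const availableHours = plan.deadlineWeeks * hoursPerWeek;
  if (totalHours > availableHours) {
    issues.push({
      rule: "unrealistic_timeline",
      detail: `${totalHours.toFixed(1)}h of work vs ${availableHours}h available`,
    });
  }

  // Rule 3: flag items the model itself was unsure about.
  for (const t of plan.tasks) {
    if (t.confidence < minConfidence) {
      issues.push({ rule: "low_confidence", detail: t.title });
    }
  }
  return issues;
}

const draft: DraftPlan = {
  goal: "Run a half marathon",
  deadlineWeeks: 2,
  tasks: [
    { title: "Plan weekly training runs", estimateMinutes: 30, confidence: 0.9 },
    { title: "Complete 10 long runs", estimateMinutes: 900, confidence: 0.4 },
  ],
};
console.log(validatePlan(draft).map((i) => i.rule)); // timeline and low-confidence issues
```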
Ethics & Transparency by Design
Before writing a single prompt, I ran a bias mapping exercise with the developer and consulted an AI specialist to cross-check assumptions. The goal was to identify potential harms before they became design decisions, not after.

Key biases identified and how they shaped the product
More tasks = better — Removed gamification and streaks that would enable unhealthy productivity patterns.
Universal task granularity — Made breakdown detail adjustable; neurodivergent users need different structure levels than neurotypical ones.
AI as motivator — Positioned Orchestrate as organisational support, not a productivity coach; no guilt-inducing language or prompts.
Black-box suggestions — Added confidence scores and "Why this?" directly in response to this bias; users need to interrogate the AI, not just trust it.
4.3 Interaction & Iteration: Making a Complex AI Feel Understandable
To move quickly through layout exploration, I used UXpilot and v0 to prototype and visualise ideas before committing to Figma — compressing what would have been weeks of pixel work into hours.
The core challenge: AI interactions are non-linear, but interfaces are usually built for linear flows. Early designs assumed users would move step-by-step — they didn't.


Key decisions
1. Arriving at one AI, several postures
The breaking point was a small moment with a big tell: in manual mode, with a chat also reachable, users hit a “wait — what does this chat even know?” pause. The toggle had quietly promised two separate AIs, and neither knew about the other's work. The tempting fix was to add a side panel to manual mode too — but that just multiplied the problem into three AI surfaces that looked like three different products.
The reframe that resolved it: the toggle was never a mode, it was a first-run choice. One AI that shows up in different postures — full-page when the conversation is the work, a docked co-pilot when the form is the work, an inline suggestion when a single field is. Held together by four rules (one character, always a diff, always shows what it's reading, never silent), the three surfaces stop competing and become one ladder of commitment users slide along as their goal takes shape.
Trade-offs. Collapsing the toggle means losing an explicit, legible “switch to chat” affordance — discoverability now leans on the persistent co-pilot, which I'd want to validate with real users rather than assume. The morph from conversation to form is also more complex to build and animate than two static modes, and it asks the system to infer intent from a first sentence, which can guess wrong. And “one assistant everywhere” only pays off if every team holds the four rules consistently — the moment one surface drifts, the illusion of a single AI breaks and you're back to three products.
Chat (full-page) — the posture for users who don't yet know the shape of what they want. Nothing exists yet (no form, no goal, no artifact), so the AI takes the whole screen — there's nothing else to look at. It leads: asks questions, draws intent out of the user. The AI drives the conversation; the user follows.
Co-pilot (side panel) — the posture for users who already know what they're building. The form is the protagonist now, sitting center stage; the AI moves into a narrow rail beside it, watching, suggesting, pointing. It stops leading and starts assisting — the user drives, the AI helps. That's why it shows the “reading from your form” strip: once there's something to read, the AI has to prove it can see what the user sees.
They're not two features — they're the same assistant in two postures, and the morph between them is continuous.
Chat is the AI when you have nothing; co-pilot is the AI when you have something.
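The posture rule can be read as a function of what already exists on screen. This TypeScript sketch assumes a workspace state with an optional goal draft and a currently focused field; `WorkspaceState`, `choosePosture`, and `readingStrip` are hypothetical, and treating a focused field as the trigger for inline suggestions is an assumption made for illustration.

```typescript
type Posture = "chat" | "copilot" | "inline";

interface WorkspaceState {
  goalDraft: { title?: string; tasks: string[] } | null; // null until anything exists
  focusedField: string | null; // the single field being edited, if any
}

// Nothing exists -> full-page chat; a field in focus -> inline; otherwise -> docked co-pilot.
function choosePosture(s: WorkspaceState): Posture {
  if (s.goalDraft === null) return "chat";
  if (s.focusedField !== null) return "inline";
  return "copilot";
}

// "Always shows what it's reading": summarise the context the assistant was given.
function readingStrip(s: WorkspaceState): string {
  if (s.goalDraft === null) return "";
  const title = s.goalDraft.title ?? "Untitled goal";
  const n = s.goalDraft.tasks.length;
  return `Reading from your form: ${title}, ${n} task${n === 1 ? "" : "s"}`;
}

const state: WorkspaceState = {
  goalDraft: { title: "Photo portfolio", tasks: ["Shortlist photos"] },
  focusedField: null,
};
console.log(choosePosture(state)); // copilot
console.log(readingStrip(state)); // Reading from your form: Photo portfolio, 1 task
```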
2. Conversational over instant generation
Instant plan generation felt impersonal — users couldn't see themselves in it and were stuck with accept or restart. A conversational flow where the AI asks questions, generates iteratively, and adapts mid-conversation gave users a sense of ownership and significantly higher plan acceptance.

How it works
A back-and-forth conversation thread — AI asks a clarifying question, user responds, AI refines the plan. No binary buttons; the conversation continues naturally.
Result
85% acceptance rate (+240%)
Feels like working with someone. I liked that it asked me clarifying questions and the plan look legit and I know where to start.
3. Editable time estimates and full control over AI suggestions
Every task has a clickable time badge you can adjust. This acknowledges that AI-generated estimates are a starting point, not gospel — the user owns the plan.
4. Structured options inside the chat
When the AI needs a decision, it offers a small picker — buttons for the obvious choices, free text for everything else. The format keeps the conversation moving without forcing users down a pre-defined path.
- Structured vs. open-ended — buttons are faster, but they shouldn't trap users. That's why “Something else” (free text) and “Or reply directly...” (full chat input) stay available.
- Guidance vs. autonomy — the picker suggests answers but never blocks the user's own phrasing. “Skip” reinforces this.
- Density vs. clarity — 2–4 options per question, max 3 questions. More than that and it becomes a form, not a conversation.
- Mobile-first — large tap targets, numbered for keyboard users, single-column layout.
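The picker constraints are simple enough to enforce in code, which keeps them from eroding as new questions are added. A hypothetical TypeScript sketch follows; the limits mirror the 2 to 4 options and 3-question cap described above, while the type names, numeric keys, and the `s` key for Skip are assumptions.

```typescript
interface PickerQuestion {
  prompt: string;
  options: string[];
}

interface RenderedOption {
  key: string; // keyboard shortcut shown next to the option
  label: string;
  kind: "choice" | "free_text" | "skip";
}

const MIN_OPTIONS = 2;
const MAX_OPTIONS = 4;
const MAX_QUESTIONS = 3;

// Returns human-readable problems; an empty array means the picker stays conversational.
function checkPicker(questions: PickerQuestion[]): string[] {
  const errors: string[] = [];
  if (questions.length > MAX_QUESTIONS) {
    errors.push(`too many questions: ${questions.length} (max ${MAX_QUESTIONS})`);
  }
  questions.forEach((q, i) => {
    if (q.options.length < MIN_OPTIONS || q.options.length > MAX_OPTIONS) {
      errors.push(`question ${i + 1} has ${q.options.length} options`);
    }
  });
  return errors;
}

// Escape hatches are appended to every question so the buttons never trap the user.
function renderOptions(q: PickerQuestion): RenderedOption[] {
  const choices = q.options.map(
    (label, i): RenderedOption => ({ key: String(i + 1), label, kind: "choice" })
  );
  return choices.concat([
    { key: String(choices.length + 1), label: "Something else", kind: "free_text" },
    { key: "s", label: "Skip", kind: "skip" },
  ]);
}

const question: PickerQuestion = {
  prompt: "Where should the portfolio live?",
  options: ["Instagram", "Own website"],
};
console.log(checkPicker([question])); // empty array: within limits
console.log(renderOptions(question).map((o) => `${o.key}. ${o.label}`));
```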
5. Resource tracking
Testing revealed users naturally wanted to attach resources to goals and tasks — links, documents, references. This came up consistently enough to flag as a near-term design requirement.


6. How to maintain momentum
Big goals create friction. Motivation drops when the next step feels too large or too vague. Orchestrate reduces that friction by structuring the plan so the first move is always easy, the calendar is already cleared, and the bigger picture stays visible without getting in the way.
First tasks are short and achievable
The first tasks in every action plan are intentionally small — 15–30 minute units designed to create early wins. Research shows small, completable tasks lower the activation energy needed to start and increase follow-through. Instead of one large block, the plan breaks work into granular steps a user can pick up without feeling overwhelmed.
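One way to guarantee a small opening step is to split an oversized first task into the fewest near-equal sessions of at most 30 minutes; for any task longer than 30 minutes this yields sessions between 15 and 30 minutes. The TypeScript sketch below illustrates that idea under the stated assumption; `splitMinutes` and `frontLoadSmallSteps` are hypothetical helpers, not a documented part of Orchestrate.

```typescript
const MAX_SESSION_MINUTES = 30;

// Split `total` minutes into the fewest near-equal sessions that respect the cap.
function splitMinutes(total: number, max = MAX_SESSION_MINUTES): number[] {
  if (total <= max) return [total];
  const n = Math.ceil(total / max);
  const base = Math.floor(total / n);
  const remainder = total % n; // spread leftover minutes over the first sessions
  const parts: number[] = [];
  for (let i = 0; i < n; i++) parts.push(base + (i < remainder ? 1 : 0));
  return parts;
}

interface Task {
  title: string;
  minutes: number;
}

// Only the opening task is split: the aim is an easy first move, not uniform granularity.
function frontLoadSmallSteps(tasks: Task[]): Task[] {
  if (tasks.length === 0) return tasks;
  const first = tasks[0];
  const parts = splitMinutes(first.minutes);
  const opening: Task[] =
    parts.length === 1
      ? [first]
      : parts.map((m, i) => ({ title: `${first.title} (part ${i + 1} of ${parts.length})`, minutes: m }));
  return opening.concat(tasks.slice(1));
}

const plan = frontLoadSmallSteps([
  { title: "Edit portfolio photos", minutes: 100 },
  { title: "Write an About page", minutes: 45 },
]);
plan.forEach((t) => console.log(`${t.minutes} min  ${t.title}`));
```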

AI-suggested scheduling
When calendar access is granted, Orchestrate scans for the nearest available slot and proactively recommends a time to begin. Rather than leaving the user to figure out when to start, the AI surfaces a concrete moment — closing the gap between planning and doing.
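Finding "the nearest available slot" is a classic gap search over sorted busy intervals. The TypeScript sketch below assumes the calendar has already been fetched and flattened into same-day intervals measured in minutes since midnight, with a hypothetical 09:00 to 18:00 working window; `findNearestSlot` and its parameters are illustrative, not a real calendar API.

```typescript
interface Interval {
  start: number; // minutes since midnight
  end: number;
}

function findNearestSlot(
  busy: Interval[],
  durationMin: number,
  now: number,
  dayStart = 9 * 60,
  dayEnd = 18 * 60
): Interval | null {
  let cursor = Math.max(now, dayStart);
  const sorted = busy.slice().sort((a, b) => a.start - b.start);
  for (const block of sorted) {
    if (block.end <= cursor) continue; // block already over
    if (block.start - cursor >= durationMin) break; // the gap before this block fits
    cursor = Math.max(cursor, block.end); // otherwise jump past the block
  }
  return cursor + durationMin <= dayEnd ? { start: cursor, end: cursor + durationMin } : null;
}

const busy: Interval[] = [
  { start: 9 * 60, end: 10 * 60 },
  { start: 10 * 60 + 30, end: 12 * 60 },
];
const toClock = (m: number): string => {
  const mins = m % 60;
  return `${Math.floor(m / 60)}:${mins < 10 ? "0" : ""}${mins}`;
};
const slot = findNearestSlot(busy, 30, 9 * 60);
console.log(slot ? `${toClock(slot.start)}-${toClock(slot.end)}` : "no slot today"); // 10:00-10:30
```

With meetings from 09:00 to 10:00 and 10:30 to 12:00, and the search starting at 09:00, a 30-minute task lands at 10:00, while a 45-minute task moves to 12:00 because the half-hour gap is too short.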

Future north star — ideas added to the backlog
Based on the goal and the conversation context, Orchestrate surfaces what might be worth doing next — features, directions, or ideas that aren't part of the immediate plan but shouldn't be forgotten. These go into a backlog automatically, giving the user a sense of the bigger picture without cluttering the active plan.
Code-First from the Start
Most projects live in Figma until handoff. This one didn't.
A few high-fidelity screens set the visual direction. Everything else was built directly in code using Claude with MCP — production-ready, no translation layer. The design system followed the same logic: tokens, rules, and components built in code, not Figma, applying the same thinking as engineering — what's globally updatable, what's a reusable component, what's a one-off.
Version-controlled in GitHub. Code quality reviewed and refactored with engineering guidance throughout.
No handoff gap — because there was no handoff.
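Deciding what is globally updatable versus what is a reusable component maps naturally onto code: tokens hold the global values, and components read only from tokens. The TypeScript sketch below illustrates that split; every token name and value, `badgeStyle`, and `toCssVariables` are hypothetical examples rather than Orchestrate's actual design system.

```typescript
type TokenGroup = Record<string, string | number>;

// Globally updatable: change a value here and every consumer picks it up.
const tokens: Record<"color" | "space" | "radius", TokenGroup> = {
  color: { surface: "#FFFFFF", text: "#1A1A1A", accent: "#4F46E5", warning: "#B45309" },
  space: { xs: 4, sm: 8, md: 16, lg: 24 },
  radius: { sm: 4, md: 8 },
};

// Reusable component: reads only from tokens, never from hard-coded values.
function badgeStyle(tone: "default" | "warning"): Record<string, string> {
  return {
    padding: `${tokens.space.xs}px ${tokens.space.sm}px`,
    borderRadius: `${tokens.radius.sm}px`,
    color: String(tone === "warning" ? tokens.color.warning : tokens.color.accent),
  };
}

// Emit tokens as CSS custom properties so any styling layer shares one source of truth.
function toCssVariables(): string {
  const lines: string[] = [];
  (Object.keys(tokens) as Array<keyof typeof tokens>).forEach((group) => {
    const values = tokens[group];
    Object.keys(values).forEach((name) => {
      const value = values[name];
      const cssValue = typeof value === "number" ? value + "px" : value;
      lines.push(`  --${group}-${name}: ${cssValue};`);
    });
  });
  return `:root {\n${lines.join("\n")}\n}`;
}

console.log(badgeStyle("warning"));
console.log(toCssVariables());
```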

Outcomes & What's Next
Built the design system directly in code from day one — every token, component, and primitive lived in the codebase, not in Figma. Zero design-to-code translation lag, one source of truth across the team, and new screens shipping at the speed of the design decision behind them.
Confidence scoring showed users where the system was uncertain rather than projecting false confidence. Self-reported trust jumped from 3.1 to 4.3/5 — same AI, more honesty.
100% of testers wanted to refine the plan rather than being limited to accepting it or starting over. Designing for collaboration over dictation (editable history, iterative refinement) cut the restart rate from 28% to 5%.
Every major design decision traces back to something a real person said. The "I'm Stuck" mode, confidence scoring, guardrails against toxic productivity — all came from research, not assumption.
What I'd do next
Test with users over 2–4 weeks of real use (diary study): prototype testing validated the mechanics, but longitudinal data would reveal whether task quality holds up over time
Integrate a scheduling layer — once a goal is broken down, helping users place tasks into their week is the natural next step
Explore adaptive personalization — using edits and feedback to improve breakdown quality over time, while keeping goal data private
What this project taught me
Rapid prototyping as a thinking tool
Using UXpilot and v0 to prototype early meant I could reason through AI behaviour before committing to a direction.
Designing alongside evolving technology
The more valuable skill wasn't designing the right solution; it was designing systems flexible enough to stay relevant as the technology matured.
Non-linear flows are more natural — and more complex to design for
The design challenge shifts from guiding users through defined steps to creating a flow that feels open but stays purposeful.
Current status
Functional prototype. Designs validated with developers. Next step is securing a development partner and moving into build.

