Consumer AI · 0-to-1 · PRD · LLM Evaluation

Roadworthy

An AI packing copilot that thinks like a seasoned traveler — so you don't have to.

Solo PM — Strategy, Research, PRD, GTM

Problem

Travelers over-pack or forget essentials due to generic, static packing lists

Solution

Personalized AI packing recommendations based on trip context, weather, and user history

Stage

Concept to full PRD

Methods

User Personas, Hypothesis Statement, Eval Framework, RAG Architecture, Phased GTM

“Packing lists haven't changed in 30 years. The traveler has.”

Most packing apps give you the same static checklist regardless of whether you're heading to a business summit in Chicago or a surf trip in Costa Rica. The context is always missing.

The real problem isn't that people don't know what to pack — it's that the cognitive load of trip-specific planning falls entirely on the traveler. Every trip starts from zero.

Roadworthy was designed to eliminate that friction by building a packing copilot that learns trip context, reasons about it, and delivers recommendations that feel like advice from someone who's been there.

The users

Marcus Reed

The Frequent Business Traveler

34, management consultant, flies 3x per month

Packs on autopilot but over-relies on memory, so he forgets context-specific items like adapters, timezone meds, and client-dinner attire.

Needs: Speed, reliability, and zero surprises on travel day

“I don't need a list. I need a copilot.”

Jordan Hayes

The Intentional Leisure Traveler

29, takes 4–6 trips per year, different types each time

Spends too long researching what to pack for new destinations or activity types.

Needs: Contextual guidance without having to start from scratch every time

“Every trip feels like the first time I've ever packed.”

My approach

From assumption to AI product spec.

01

Define the Problem

Wrote a hypothesis statement anchored to a specific, measurable user outcome. Framed the core assumption: if Roadworthy delivers contextually accurate recommendations on the first attempt, users will skip manual list-building entirely.

02

Research the Solution Space

Audited existing packing apps including PackPoint, TripList, and Google Trips. Identified the gap: none use trip context dynamically. All are static or semi-static list tools.

03

Design the AI Layer

Defined the RAG architecture approach: retrieve relevant packing data and augment LLM prompts with destination, weather, activity type, and past-trip context before generation. Wrote prompt constraints to reduce hallucination and enforce category-based output structure.
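A minimal sketch of how the retrieve-then-augment step might look, assuming a tiny in-memory snippet store in place of a real retrieval index. Every name here (`TripContext`, `PACKING_SNIPPETS`, `retrieve`, `build_prompt`) is hypothetical and chosen for illustration. The PRD does not prescribe this implementation, and the call to the model itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TripContext:
    # Field names are illustrative assumptions, not the PRD's schema.
    destination: str
    weather: str
    activity_type: str
    past_trip_notes: list[str] = field(default_factory=list)


# Hypothetical knowledge base standing in for a real retrieval index.
PACKING_SNIPPETS = {
    "surf": "Surf trips: reef-safe sunscreen, rash guard, quick-dry towel.",
    "business": "Business trips: wrinkle-resistant blazer, laptop charger, adapter.",
    "hiking": "Hiking trips: broken-in boots, rain shell, blister kit.",
}


def retrieve(context: TripContext, limit: int = 2) -> list[str]:
    """Naive keyword retrieval: keep snippets whose key appears in the activity type."""
    activity = context.activity_type.lower()
    hits = [text for key, text in PACKING_SNIPPETS.items() if key in activity]
    return hits[:limit]


def build_prompt(context: TripContext) -> str:
    """Augment the instruction with trip context and retrieved snippets before generation."""
    retrieved = retrieve(context)
    lines = [
        "You are a packing assistant. Recommend only items supported by the context below.",
        "Group items by category.",
        f"Destination: {context.destination}",
        f"Weather: {context.weather}",
        f"Activity type: {context.activity_type}",
        "Past-trip notes: " + ("; ".join(context.past_trip_notes) or "none"),
        "Reference material:",
    ]
    lines += [f"- {snippet}" for snippet in retrieved] or ["- none found"]
    return "\n".join(lines)
```

Calling `build_prompt(TripContext("Costa Rica", "hot, humid", "surf trip"))` would yield a prompt that carries the surf snippet but not the business one, which is the contextual filtering the architecture is meant to provide.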

04

Build the PRD

Produced a full PRD covering problem framing, personas, functional requirements, model evaluation criteria, eval framework, edge case handling, and a phased GTM rollout plan.

The PRD highlights

Model Evaluation Framework

Defined success criteria for AI output quality: accuracy, relevance, completeness, and tone. Built a scoring rubric for human eval and automated regression testing.
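One way the rubric could be expressed in code, as a sketch only. The four criteria come from the framework above. The 1-to-5 rating scale, the weights, and the regression tolerance are assumptions made for illustration, and the function names are hypothetical.

```python
CRITERIA = ("accuracy", "relevance", "completeness", "tone")

# Hypothetical weights; the framework names the criteria but not their relative weight.
WEIGHTS = {"accuracy": 0.4, "relevance": 0.3, "completeness": 0.2, "tone": 0.1}


def rubric_score(ratings: dict[str, int]) -> float:
    """Combine 1-5 ratings per criterion into a weighted score between 0 and 1."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for criterion in CRITERIA:
        if not 1 <= ratings[criterion] <= 5:
            raise ValueError(f"{criterion} rating must be between 1 and 5")
    # Rescale each rating from 1..5 to 0..1 before weighting.
    return sum(WEIGHTS[c] * (ratings[c] - 1) / 4 for c in CRITERIA)


def regression_failures(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return test-case ids whose score dropped by more than `tolerance` versus the baseline."""
    return [
        case_id
        for case_id, old_score in baseline.items()
        if candidate.get(case_id, 0.0) < old_score - tolerance
    ]
```

Human raters would supply the `ratings` dictionaries. The regression check then compares per-case scores between a baseline prompt version and a candidate, and treats a case missing from the candidate run as a failure.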

Prompt Constraints

Established hard constraints to prevent the model from generating unsafe, irrelevant, or over-generic recommendations. Defined output schema: categories, quantities, priority flags.
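The output schema could be enforced with a small validator along these lines. This is a sketch under stated assumptions: the three fields (category, quantity, priority flag) come from the schema described above, while the specific category and priority vocabularies, and the names `PackingItem` and `validate_items`, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; the PRD summary does not list the actual values.
ALLOWED_CATEGORIES = {"clothing", "toiletries", "electronics", "documents", "health"}
ALLOWED_PRIORITIES = {"essential", "recommended", "optional"}


@dataclass(frozen=True)
class PackingItem:
    name: str
    category: str
    quantity: int
    priority: str


def validate_items(raw_items: list[dict]) -> tuple[list[PackingItem], list[str]]:
    """Split model output into valid items and human-readable rejection reasons."""
    valid, errors = [], []
    for index, raw in enumerate(raw_items):
        name = str(raw.get("name", "")).strip()
        category = raw.get("category")
        quantity = raw.get("quantity")
        priority = raw.get("priority")
        if not name:
            errors.append(f"item {index}: missing name")
        elif category not in ALLOWED_CATEGORIES:
            errors.append(f"item {index}: unknown category {category!r}")
        elif not isinstance(quantity, int) or isinstance(quantity, bool) or quantity < 1:
            errors.append(f"item {index}: quantity must be a positive integer")
        elif priority not in ALLOWED_PRIORITIES:
            errors.append(f"item {index}: unknown priority {priority!r}")
        else:
            valid.append(PackingItem(name, category, quantity, priority))
    return valid, errors
```

Rejecting off-schema items after generation complements the prompt-level constraints: the prompt asks for the structure, and the validator catches the cases where the model ignores it.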

Phased GTM Rollout

Phase 1: iOS MVP with manual trip input and AI list generation.

Phase 2: Weather and calendar API integrations.

Phase 3: Learning loop, where user edits feed back into the personalization layer.

Hypothesis statement

“We believe that frequent travelers who struggle with context-specific packing decisions will successfully generate a complete, trip-accurate packing list on their first Roadworthy session — without editing more than 20% of AI-generated items — because the RAG-augmented prompt delivers contextual specificity that static list tools cannot match.

We will know this is true when 70% of beta users complete their first packing list without requesting a regeneration.”
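The two thresholds in the hypothesis (at most 20% of items edited, and 70% of beta users finishing without a regeneration) could be computed from first-session data roughly as follows. The `FirstSession` record and its field names are hypothetical, since the analytics events are not specified here.

```python
from dataclasses import dataclass


@dataclass
class FirstSession:
    # Hypothetical event fields; real analytics names would differ.
    items_generated: int
    items_edited: int
    regenerations: int
    completed: bool


def edit_rate(session: FirstSession) -> float:
    """Share of AI-generated items the user changed (0.0 when nothing was generated)."""
    if session.items_generated == 0:
        return 0.0
    return session.items_edited / session.items_generated


def no_regeneration_rate(sessions: list[FirstSession]) -> float:
    """Share of first sessions completed without any regeneration request."""
    if not sessions:
        return 0.0
    passed = sum(1 for s in sessions if s.completed and s.regenerations == 0)
    return passed / len(sessions)


def within_edit_budget_rate(sessions: list[FirstSession], max_edit_rate: float = 0.20) -> float:
    """Share of first sessions where the user edited no more than `max_edit_rate` of items."""
    if not sessions:
        return 0.0
    passed = sum(1 for s in sessions if edit_rate(s) <= max_edit_rate)
    return passed / len(sessions)


def hypothesis_holds(sessions: list[FirstSession], target: float = 0.70) -> bool:
    """True when the no-regeneration rate meets the 70% success threshold."""
    return no_regeneration_rate(sessions) >= target
```

Tracking the edit-budget rate alongside the headline regeneration metric would show whether users who skip regeneration are also accepting most of the list as generated.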

What I learned

AI PM work is system design work.

The most important product decision on Roadworthy wasn't the feature set — it was deciding what the AI should never do. Prompt constraints and eval frameworks aren't just engineering concerns. They're core PM responsibilities on any AI product.

Defining the hypothesis before building the PRD kept every feature decision anchored to a real user outcome. It also made trade-off conversations easier — if a feature didn't move the hypothesis metric, it didn't make Phase 1.

This project sharpened my conviction that great AI product management is less about prompting the model and more about designing the system around it.