PRDs to Prototypes: Unlock LLMs for Product Management

By Shea Lutton and Eric Harper

For product managers, a prototype is an invaluable tool to validate that you’re solving customer needs and to communicate exactly how a feature should look and feel. But prototypes come with a critical tradeoff: your development team has to set aside its current feature work to build the prototype, slowing down feature delivery. LLMs are eliminating this tradeoff by enabling PMs to build prototypes independently.

ChatGPT allows product managers to go far beyond simply distilling customer needs into product requirement documents. In 2025, PMs can use the power of LLMs to test customer needs themselves by building working prototypes and directly testing feature value with customers.

In our experience, using these PM techniques on revenue-generating features (at a small scale) has made us 4x more effective than traditional PM workflows. It reduces our time to validate concepts with customers, boosts our personal productivity, avoids spending developer time on prototyping, increases clarity (and avoids meetings!), and ultimately lets us focus development effort on features with proven value for our customers.

Forget product requirement documents: the future of product management is independently building feature prototypes for direct customer feedback.

As early adopters, engineers have written about the many ways that LLMs make them more efficient. Less has been written about the impact of LLMs on other roles such as product management. What is the state of the art in product management in the LLM era? Simply using ChatGPT to write emails, PRDs, or roadmap documents is a marginal gain, maybe 5% to 10% more effective week over week.

LLMs for PMs

LLMs are starting to show how useful they can be for non-engineering professions, and that usefulness will only grow as more people adopt these tools in different settings.

PMs can use LLMs to replicate the productivity of an entire team of PMs, UX/UI designers, engineering managers, and software engineers by developing business and feature context for ChatGPT. Since the goal of a PM is to distill the most valuable customer needs into a clear picture of what should be built, building the prototype yourself helps you iterate faster to validate those needs. It turns a long prototype cycle into much shorter feedback loops, moving from this workflow:

2024 Workflow:

To a shorter, independent feedback loop. Fundamentally, when PMs can work independently and get further into the development process, they can significantly accelerate the pace of feature discovery. PMs can talk to customers, build a hypothesis for which features drive the most value, independently build a prototype, and gain direct feedback on that value. That short cycle will be the way great PMs work by the end of 2025 (if they are not already!).

2025 Workflow:

The ability for PMs to validate prototypes in this way is a major step toward realizing the teachings of “The Lean Startup” by Eric Ries and “The Startup Owner’s Manual” by Steve Blank. PMs can validate ideas and iterate without having to dedicate the critical cycles of a development team to test a new feature.

In modern software development, the development team is the key constraint that limits your rate of progress; for readers of “The Goal” or “The Phoenix Project”, this is the central insight of both books. Having PMs confirm that they are solving the right problem means that development teams stay focused on qualified customer problems.

Planning

We’ll show you the tools, methods, and structures we use in our modern PM workflows. But first, let’s take a second to reflect on what parts of the PM process are still valuable. 

Good planning is key for LLMs just as it is for humans. A one-shot prompt is about as likely to deliver a valuable customer feature as 1,000 monkeys are to write the next Principia.

Our first takeaway is that the PM process of aggregating many customer needs into a priority list, creating feature briefs for the most compelling needs, and validating those features in a detailed product requirement document (PRD) is still a highly useful exercise. It’s the most effective way to confirm that you’re focused on the most important problems for your customers. Without this examination, organizations tend to focus on easy problems, regardless of how valuable they are to customers. Our workflow uses a standard template to help us deliver organized ideas in a one-page format to test the concept. If that meets our requirements, we progress into a full plan using AI tools.

All of your business, customer, and technical context sits outside the feature requirements, but it’s important to the final product. We create a structured library of prompts that lets us load the context we need from various personas and perspectives to generate code for our business needs. This includes business context about your industry, strategy, product value, and customer roles, plus details of your tech stack (languages, frameworks, tenets).

The same way a clean code repo breaks source code into logical structures, we break our prompt context into user roles (both internal and external stakeholders), brand style guidelines, go-to-market standards, and the principles and standards for how our companies build software. PM leaders should intentionally develop these libraries across their teams so that principles and standards are applied consistently across prototypes.
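As a rough illustration, such a library can be as simple as a folder of plain-text chunks composed on demand. The directory layout, file names, and load_context helper below are hypothetical examples, not a specific tool we use.

    # Hypothetical layout for a prompt-context library: one plain-text chunk
    # per role or standard, assembled on demand. File names are illustrative.
    from pathlib import Path

    PROMPT_LIBRARY = Path("prompt-library")
    # prompt-library/
    #   business/strategy.md          company strategy and product value
    #   roles/customer-admin.md       external customer persona
    #   roles/security-architect.md   internal stakeholder persona
    #   brand/style-guide.md          colors, fonts, presentation
    #   engineering/standards.md      languages, frameworks, tenets

    def load_context(*chunk_names: str) -> str:
        """Concatenate the requested chunks into one context block."""
        return "\n\n".join(
            (PROMPT_LIBRARY / name).read_text() for name in chunk_names
        )

    # Example: assemble the context for a first prototyping pass.
    context = load_context("business/strategy.md",
                           "roles/customer-admin.md",
                           "engineering/standards.md")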

We split our prompt context into small chunks to balance the cost and speed of token processing against the quality of the results. We find we get better results by starting our code prompts with business context and the feature requirements from our planning work. We then take the resulting code, add additional role context, and iterate several times, for example loading the security architect role and revising the code for security quality. Even for limited prototypes, security is tremendously important, and there is no substitute for engaging our own brains in deep security reviews of the code produced. Balancing token/context size may be a temporary limitation as ChatGPT rapidly advances, but in early 2025, adding context and doing multiple revision passes has given us the best results.
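The multi-pass flow can be sketched as a short loop: generate code from business context plus the feature requirements, then revise the result once per role. This is a sketch under assumptions: the chunk names mirror the hypothetical library above, and the OpenAI client and model name are one example of how the call could be made, not a prescription.

    # A sketch of the multi-pass revision loop described above. Assumes the
    # hypothetical prompt-library layout from the previous snippet and an
    # OPENAI_API_KEY in the environment; the model name is an assumption.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    def chunk(name: str) -> str:
        # Small reader standing in for the load_context helper shown earlier.
        return (Path("prompt-library") / name).read_text()

    def call_llm(prompt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    def build_prototype(feature_requirements: str) -> str:
        # First pass: business context plus requirements from the planning work.
        code = call_llm(chunk("business/strategy.md") + "\n\n" +
                        feature_requirements +
                        "\n\nGenerate a working prototype for this feature.")
        # Revision passes: load one role at a time to keep each prompt small.
        for role in ("roles/security-architect.md",
                     "brand/style-guide.md",
                     "engineering/standards.md"):
            code = call_llm(chunk(role) +
                            "\n\nRevise the prototype below from this role's "
                            "perspective and return the full updated code:\n\n" +
                            code)
        return code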

Prototype Development

From this point, PMs can start using AI tools to directly code working mockups. Why have PMs do this and not pass the task off to a developer? Speed, accuracy, and money. The PM already has the context of the customer’s need in their head. LLMs use the context library to develop code for a working prototype, and the PMs can take those mockups directly to customers for feedback. Only when customers have reacted favorably to a feature mockup and given positive sales signals will you pass a prototype to the development team. 

In our workflows, this means PMs create new dev branches in git and paste Linear ticket information into Cursor along with relevant prompts from the prompt library, such as the style guide (colors, fonts, and presentation), the design principles, the needs of various customer roles, and internal standards for frameworks, languages, and architecture. The output of this work will be qualified customer feedback.

If it’s valuable to customers, then it’s worth having your development team turn it into a real feature (with real security, real authentication, real redundancy, real SRE, and real business continuity). The ability for PMs to independently generate prototypes does not lessen the need to build secure and reliable products. You still need your engineering team. 

What to Watch For

What can go wrong building a prototype this way? When the output is wrong, it usually fits into one of these categories:

  1. Tech incorrect – The code or solution does not work (needs iteration)
  2. Tech correct, but missed the broader purpose – When your code works but it’s only solving part of a broader issue (revise planning)
  3. Business incorrect – The result works but is not helpful to customers as expected (revise business context and replan)

As you catch these errors, add material to your prompt library to correct the misunderstanding and iterate. It’s also helpful to add negative prompts, such as a “Never Do” section that heads off recurring coding mistakes. Also explicitly ask ChatGPT what questions it has and what assumptions it made.

Changing Needs

In 2024, a great PM knew their user base, knew their product inside and out, and wrote clear documents for how the next feature should be delivered for customers. In 2025, using AI-enabled tools, a great PM can go much further, replicating the productivity of a team of six to eight people by building prototypes themselves to confirm whether features are valuable to customers.

Your team should collaborate on a team-wide set of role prompts and business-value prompts to start producing full working prototypes. PMs need to advance their skills to take advantage of the opportunity in front of them.

In our next post, we will look at the suite of tools that Eric Harper and Shea Lutton use to accelerate product development, such as DrawCast and RepoPrompt, to boost your team’s productivity. If you would like us to speak with your team about how the AI era can boost your company’s productivity, please contact us.

Measuring Billionths of Seconds

Recently I was asked to help investigate the performance of a fancy bit of hardware. The device in question was an xCelor XPM3, an ultra-low-latency Layer 1 switch. Layer 1 switches are often used by trading firms to replicate data from one source out to many destinations. The exciting thing about these switches is that they can take network packets in one port and redirect them back out another port in 3 billionths of a second. That is fast. It may be no surprise, but something that fast is pretty hard to measure.

To measure something in nanoseconds, billionths of a second, you need some equally exotic gear. I happened to have an FPGA-based packet capture card with a clock disciplined by a high-end GPS receiver, some optical taps, and a pile of Twinax cables. Oh boy, let the fun begin. Even with toys like these, the minimum resolution of my packet capture system was 8 nanoseconds, nearly three times coarser than the time the XPM3 needs to move a packet. To get around this problem, I replicated each packet through every port on the XPM3, bouncing it all the way down the switch and back. Physically this meant that every port was diagonally connected with Twinax cables like this:

And inside the XPM3 it was moving data between ports like this:

The problem now is that I have two unknowns. Sending a packet down the switch this way means that it moves through 32 replication ports (r) and 30 Twinax cables (t). After running 10 million packets through this test setup, I knew that 32r + 30t + 35 = 212.93259 nanoseconds on average. The ‘35’ is the number of nanoseconds it took for the packet capture system to timestamp the arriving packets. But how could I separate the time spent in replication from the time spent in the Twinax cables? The answer was to get a second equation so that I could solve for both.

I ran a second trial using just ports 1-8 instead of the full 32 ports. This gave me 8r + 6t + 35 = 75.958503 nanoseconds. Now, with two equations and two unknowns, I could solve the system: a replication port took about 3.35 nanoseconds per hop and each 0.5-meter Twinax cable added about 2.34 nanoseconds.
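For anyone who wants to check the arithmetic, the system solves in a few lines of Python:

    # Solving the two-equation system from the measurements above.
    # 32r + 30t + 35 = 212.93259   (full 32-port bounce)
    #  8r +  6t + 35 =  75.958503  (8-port bounce)
    a1, b1, c1 = 32, 30, 212.93259 - 35
    a2, b2, c2 = 8, 6, 75.958503 - 35

    det = a1 * b2 - a2 * b1          # Cramer's rule for the 2x2 system
    r = (c1 * b2 - b1 * c2) / det    # ns per replication hop
    t = (a1 * c2 - c1 * a2) / det    # ns per 0.5 m Twinax cable

    print(f"replication hop = {r:.2f} ns, cable = {t:.2f} ns")
    # -> roughly 3.36 ns per hop and 2.35 ns per cable, in line with the
    #    3.35/2.34 figures above.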

Divvy Bike Shares in Chicago

The Chicago-based bike-sharing company Divvy hosted a contest this past winter. They released anonymized ride data on over 750,000 rides taken in 2013. The contest had several categories, to see who could draw the most meaning from the data and who could design the most beautiful representation of the rides. I entered the contest as a way to learn about D3.js, a new data visualization tool that is amazingly powerful. And complicated.

I thought it would be fun to see where most people were coming from and going to. When I start a play-project like this, I reach for my two favorite data analysis machetes, Postgres and Python. Cleaning and loading the data into Postgres was pretty straightforward, which led to the fun part: trying to derive a meaningful framework with which to examine the ride data.

Pretty quickly it became apparent that breaking the day down into small time slices and aggregating the top departure points would yield interesting insights. It became even more interesting when the top departure points were categorized by their corresponding top destinations.

2pm

At different times of the day the pattern of rides looks wildly different. Early in the morning, a massive influx of riders picks up Divvy bikes near the city’s two main train stations. In the middle of the day, bike usage centers around the primary tourist attractions, with everyone coming and going to the same places. And in the small hours of the morning, the bikes serve as cab replacements in the neighborhoods with lots of bars.

3am
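The aggregation behind those snapshots can be sketched in a few lines. Here it is with pandas instead of SQL; the file name and column names (starttime, from_station_name, to_station_name) are my assumptions about the released CSV, not guaranteed field names.

    # Time-slice aggregation: for a given hour, find the busiest departure
    # stations and the top destinations from each. Column names are assumed.
    import pandas as pd

    rides = pd.read_csv("Divvy_Trips_2013.csv")            # hypothetical filename
    rides["hour"] = pd.to_datetime(rides["starttime"]).dt.hour

    def top_flows(df, hour, n_departures=10, n_destinations=3):
        hour_slice = df[df["hour"] == hour]
        top_from = hour_slice["from_station_name"].value_counts().head(n_departures)
        flows = {}
        for station in top_from.index:
            dests = (hour_slice[hour_slice["from_station_name"] == station]
                     ["to_station_name"].value_counts().head(n_destinations))
            flows[station] = dests.to_dict()
        return flows

    print(top_flows(rides, hour=14))   # the 2pm picture
    print(top_flows(rides, hour=3))    # the 3am picture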

With the ride data extracted, I used D3 to make it beautiful. D3 allows shapes to move and change color in seemingly magical ways inside a web browser. Each departure point can be linked to its top destinations and they will arrange themselves. Crain’s Chicago Business newspaper saw my entry and is running a special print edition of the graphic in an upcoming paper. You can see the online edition here.

Predicting the future is for suckers

I spent the weekend thinking through a trading strategy dubbed the “Common Sense” (CS) strategy by a Wall St. Journal reporter (see Part I). It turns out that common sense was a disaster when tested against historical data. The original formula was to buy or sell 5% of your cash or share value whenever the market moved 5%. I will refer to these levels as the market threshold and the aggression level. Using that formula, the CS strategy faithfully sold shares at market peaks and bought in the market valleys, but it sold off too many shares over time and paid out too much money in taxes. It was substantially worse than a “Buy and Hold” (B&H) strategy that bought at the low points and never sold.

Knowing that the shortcomings of the original CS trading strategy are incurring taxes and holding too much money in cash, can it be improved? Two things come to mind: search out a better set of values for when and how much to buy or sell, and alter the algorithm to buy more aggressively than it sells. Raising the market threshold will cause the strategy to trade less, and lowering the percentage of assets bought and sold will mean less taxable income. Computing power is cheap, so I plowed through hundreds of combinations of inputs to find the most profitable settings and, to prove the results are not a fluke of timing, ran the search over four different historical time periods.

Start Date   End Date    Length     Significance
1/1/06       12/31/12    7 years    Short term
8/11/87      12/31/12    25 years   Before the 1987 crash
10/19/87     12/31/12    25 years   After the 1987 crash
1/1/50       12/31/12    63 years   Long term
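The sweep itself is just a nested loop over the two parameters and the four periods above. A skeleton of it looks like the sketch below, where backtest() stands in for the simulation in the linked CS.Trading.Strat.Source and the grids shown are illustrative rather than the exact values I searched.

    # Skeleton of the parameter sweep: try every (threshold, aggression) pair
    # over each period and keep the most profitable. The grids are examples.
    from itertools import product

    THRESHOLDS = [0.02, 0.05, 0.10, 0.15, 0.20]    # market move that triggers a trade
    AGGRESSIONS = [0.01, 0.05, 0.25, 0.50, 1.00]   # fraction of cash/shares traded

    PERIODS = {
        "Short term":            ("1/1/06",   "12/31/12"),
        "Before the 1987 crash": ("8/11/87",  "12/31/12"),
        "After the 1987 crash":  ("10/19/87", "12/31/12"),
        "Long term":             ("1/1/50",   "12/31/12"),
    }

    def backtest(threshold, aggression, start, end):
        """Stand-in for the real simulation; returns the final portfolio value."""
        raise NotImplementedError

    def run_sweep():
        best = {}
        for name, (start, end) in PERIODS.items():
            results = {(th, ag): backtest(th, ag, start, end)
                       for th, ag in product(THRESHOLDS, AGGRESSIONS)}
            best[name] = max(results, key=results.get)    # most profitable pair
        return best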

Buy and Hold is hard to beat. Starting with the longest term, 1/1950 through 12/2012, the B&H strategy earned $855,902. The best CS configuration returned 45% less money, or $472,129. Ouch. The best-performing version of the CS strategy was to trade only when the market moved wildly, by 20%, and to buy or sell very small percentages of your holdings each time, around 1%. This meant that a minimal amount of value was lost to taxes. The best B&H configuration, on the other hand, spent 100% of its free cash the first time the market moved 2% and never traded again.

Graph.1950.Total

For fun, I chose two medium-length periods on either side of the great 1987 crash. The Before graph starts in August of 1987 and the After graph starts in October 1987. The results are very similar, except that, as you would expect, buying at a low point right after a market crash earned more money overall. The best-performing inputs were nearly identical: B&H performed best when it bought aggressively after the first market move and never traded again, and CS performed best when it bought or sold 1% when the market moved 20%.

Before the 1987 crash (August): Graph.1987B.Total

After the 1987 crash (October): Graph.1987A.Total

Now for the short term, where things get interesting. The tables turn thanks to the most recent market crash. The CS strategy came out on top, but the result reinforces why this is a poor trading strategy for most people. I will just say that the CS strategy gets lucky here:

Graph.2006.Total

Unlike the longer time periods, the best-performing versions of the CS strategy in the short run were very aggressive. When the market moved 20%, it bought and sold 100% of its positions. This is what you would do if you had a crystal ball: sell everything when the market is high, and buy back when the market is low. The pink-shaded sections are the times when the CS strategy owned zero shares of stock. So why is this bad? It is a strategy that counts on huge volatility to be successful, and historically the markets just don’t fluctuate that much. If the market entered a period of calm, sustainable growth, you would be caught with your money on the sidelines earning no return. That is why, historically, the CS strategy works best when it makes very small moves. So unless you can predict the future and you *know* that there is a major market crash coming, you can’t win by timing the market with this strategy.

At the start of this post I mentioned that there might be a second way to improve the CS strategy, by buying and selling at different rates. That might get around the tax issue while preventing too much money from sitting around in cash. But this post is getting long, so I will come back to that another day.

Here is the source code and data if you want to play yourself:  CS.Trading.Strat.Source

Debunking the “Common Sense” trading strategy

Most small investors are not that good at predicting the financial future. I am certainly awful at it, and I worked at a trading company for five years. I had a front-row seat watching how the big guys run a trading firm today, and I can tell you that it takes a lot of specialist knowledge, technology, and money. My job was to run the technology. It is a world that is far removed from the tools and timescales of everyday people.

For investors who don’t have supercomputers and 10Gb links to the New York Stock Exchange, how do you know when to make trading decisions? A writer for the Wall St. Journal wrote an article a year or two back claiming that a “common sense” trading strategy was the right move for the average Joe. He claimed that investors should buy whenever the market fell by 5% and sell whenever it rose by 5%. The intuition was clear: this model would force people to buy low and sell high.

But does it work? I built a model to test this trading strategy; a simplified sketch of it follows the assumption lists below. The original article was a little fuzzy on some key details, such as how much of your wealth to move when the market crosses the 5% threshold, so I assumed that to be 5% as well.

Here are the key assumptions for the “Common Sense” (CS) trading strategy:

  • Buy or sell when the market falls/rises by 5%
  • Every sale results in 15% of your gains going to long-term capital gains taxes
  • Every sale is 5% of the value of held shares
  • Every purchase uses 5% of free cash
  • With an initial $10,000, $7K is invested on day 1 and $3k is held in cash
  • Investment was started on 1/3/2006

To compare, here are the assumptions for the “Buy and Hold” (B&H) trading strategy:

  • Buy when the market falls by 5% from the last high
  • Every purchase uses 5% of free cash
  • Never sell
  • With an initial $10,000, $7K is invested on day 1 and $3k is held in cash
  • Investment was started on 1/3/2006
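Here is a minimal sketch of both models under those assumptions. It takes a list of daily closing prices (for example, the S&P 500 series included with the source linked at the end of the post); the reference-price and tax bookkeeping reflect my reading of the rules and may differ in detail from the original script.

    # Simplified versions of the two strategies under the assumptions above.
    # prices is a list of daily closes; the first element is day 1.

    def simulate_cs(prices, threshold=0.05, fraction=0.05, tax_rate=0.15,
                    start_cash=3_000.0, start_invested=7_000.0):
        shares = start_invested / prices[0]      # $7K invested on day 1
        cash = start_cash                        # $3K held in cash
        cost_basis = prices[0]                   # average purchase price
        ref = prices[0]                          # price at the last trade
        for price in prices[1:]:
            move = (price - ref) / ref
            if move <= -threshold:               # market fell 5%: buy with 5% of cash
                spend = cash * fraction
                bought = spend / price
                cost_basis = (cost_basis * shares + spend) / (shares + bought)
                shares += bought
                cash -= spend
                ref = price
            elif move >= threshold:              # market rose 5%: sell 5% of shares
                sold = shares * fraction
                gain = max(0.0, (price - cost_basis) * sold)
                cash += sold * price - tax_rate * gain   # 15% tax on the gains
                shares -= sold
                ref = price
        return cash + shares * prices[-1]

    def simulate_bh(prices, threshold=0.05, fraction=0.05,
                    start_cash=3_000.0, start_invested=7_000.0):
        shares = start_invested / prices[0]
        cash = start_cash
        high = prices[0]                         # last market high
        for price in prices[1:]:
            high = max(high, price)
            if price <= high * (1 - threshold) and cash > 0:
                spend = cash * fraction          # buy the dip, never sell
                shares += spend / price
                cash -= spend
                high = price                     # wait for the next 5% fall
        return cash + shares * prices[-1]

Running both over the 2006-2012 closes is the experiment described below; the precise dollar figures will depend on bookkeeping details this sketch glosses over.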

Here are two graphs of the S&P 500 index with the transactions of the two strategies overlaid. The CS strategy has both purchases and sales, the B&H has only purchases. Both strategies make purchases when the market has fallen by 5%. A glance at the graphs confirms that both strategies are doing pretty well at buying during market low points.

First the CS graph. Notice that it Buys (blue) when the market falls and Sells (red) on the way up. It does a great job of hitting all the peaks on the graph:

Graph.2006.CS.1

The B&H has the same pattern of purchases, but never sells any shares:

Graph.2006.BH.1

So how did they do? The “common sense” trading strategy is a disaster. Buy and Hold finished with $11,586.81 and “common sense” ended up losing money, ending with only $9,711.43 of the original $10,000. That is $1,875 or 16% worse than the B&H strategy over the seven years.

Final Earnings

What happened? Three things are going wrong. First, long-term capital gains taxes suck out a ton of your profits. At 15% of your gains, every sale nibbles away at your purchasing power. The CS strategy paid $1,021 in taxes over the seven years, which accounts for a majority of the performance difference between the two strategies. Over those seven years, that $1,021 would have grown by 6.5% to $1,088, and over longer horizons both the money you don’t pay in taxes and its compounded growth really add up.

Second, the CS algorithm is too risk-averse. It moves too much money out of investments and into cash. Since the market has generally risen for the last 70+ years, there are more selling events than buying events, so the number of shares owned decreases over time as cash begins to equal the value of the shares you hold. You can see in this graph how the B&H strategy ends up with 2.6x more shares. When your money sits in cash, it does not grow.

Graph.2006.Shares.Owned

Lastly, as you purchase new shares over time, you increase the average purchase price of the shares you hold. Since the price of stocks has continued to rise over time, when the market dips by 5% in the future, shares will still cost more than they did on day one. Buying the small dips does not help because they are very rarely deep enough to lower the average cost of your purchases. What seems like a perfectly good idea turns out to be horrible in real life. But are there ways to save the CS strategy? I will give this some more thought and follow up with another post.

Here is the python source code and data if you want to play. CS.Trading.Strat.Source

Python pipeline with gap detection and replay using zeromq

I needed a way to process a ton of information recently. I had a bunch of systems that I could use, each with wildly different levels of resources. I needed a way to distribute work to all of those CPUs and gather the results back without missing any data. The answer I came up with was to create a processing pipeline using ZeroMQ as a message broker. I abstracted the process into three parts: task distribution, processing, and collection.

Since Python has a global interpreter lock, I needed to distribute my workload using something other than threads. I turned to ZeroMQ, which has a robust set of messaging patterns. One of those patterns, the pipeline, was a great jumping-off point for my needs, but the basic pipeline doesn’t have any sort of tracking or flow control. I added a feedback loop to keep track of the total items sent, plus gap detection. The idea is that every item of work gets a sequential key. There are three pieces of the app: Start, Middle, and Finish. Start and Finish use the sequential keys to understand where they are in the process and to replay any missing items.

There can be gotchas. If you don’t set the timing correctly, you might start replaying long processing jobs too quickly and create a process storm. But for the type of work I was doing, it was fairly easy to pick reasonable timeouts.
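Here is a minimal sketch of the pattern rather than the original Pipeline Files: Start numbers each task, Finish tracks the sequence numbers it has seen, and a feedback socket lets it ask for gaps to be replayed. The addresses, timeout, and do_work step are all placeholders, and each function would run as its own process.

    # Minimal PUSH/PULL pipeline with sequence numbers, gap detection, and
    # replay. Addresses and timeout values are illustrative.
    import zmq

    TASKS_ADDR = "tcp://127.0.0.1:5557"     # Start -> Middle
    RESULTS_ADDR = "tcp://127.0.0.1:5558"   # Middle -> Finish
    REPLAY_ADDR = "tcp://127.0.0.1:5559"    # Finish -> Start (feedback loop)

    def do_work(item):
        return item                          # placeholder for real processing

    def start(items):
        ctx = zmq.Context()
        tasks = ctx.socket(zmq.PUSH); tasks.bind(TASKS_ADDR)
        replay = ctx.socket(zmq.PULL); replay.bind(REPLAY_ADDR)
        for seq, item in enumerate(items):
            tasks.send_json({"seq": seq, "item": item})
        while True:                          # resend whatever Finish reports missing
            missing = replay.recv_json()["missing"]
            if not missing:
                break
            for seq in missing:
                tasks.send_json({"seq": seq, "item": items[seq]})

    def middle():
        ctx = zmq.Context()
        tasks = ctx.socket(zmq.PULL); tasks.connect(TASKS_ADDR)
        results = ctx.socket(zmq.PUSH); results.connect(RESULTS_ADDR)
        while True:
            msg = tasks.recv_json()
            msg["result"] = do_work(msg["item"])
            results.send_json(msg)

    def finish(total, timeout_ms=5000):
        ctx = zmq.Context()
        results = ctx.socket(zmq.PULL); results.bind(RESULTS_ADDR)
        replay = ctx.socket(zmq.PUSH); replay.connect(REPLAY_ADDR)
        results.setsockopt(zmq.RCVTIMEO, timeout_ms)   # give slow jobs a chance
        seen = set()
        while len(seen) < total:
            try:
                seen.add(results.recv_json()["seq"])
            except zmq.Again:                # timed out: report the gaps
                replay.send_json({"missing": sorted(set(range(total)) - seen)})
        replay.send_json({"missing": []})    # all accounted for

Setting timeout_ms too low recreates the process-storm problem mentioned above, so it should comfortably exceed the longest expected job.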

Pipeline Files

 

Visual analysis of building activity in Chicago 2006-2012

In 2008 the property bubble burst in Chicago. It is hard to gauge a recession without some hard numbers. In this case, a visual representation gives a powerful view into the scale of the decline in building activity, measured by the total value of building permits pulled by large builders. A big thank you to Chicago’s Open Data Portal for providing the data to work with.

The Data Portal has all of Chicago’s building permits available online, and they are a great metric for building activity. I narrowed the permits down to construction activity (elevator repair and fire alarm systems didn’t count) and used Python and Gephi to graph the connections. Take a look at the result:

Yearly building activity of the largest builders in Chicago, 2006-2012

It was important to filter out smaller builders to get a clear image. The threshold for a builder to make the graph was at least 100 building permits or a total permit value of over two million dollars. Each year is scaled to the total value of the building permits for that year, ranging from $8.3 billion in 2006 down to $752 million in 2009 and back up to $4 billion in 2011. Look at what happened to John C. Hanna’s activity. In 2006, Hanna’s firm was the most active by number of properties. In 2007 and 2008 its activity was significantly reduced, and it failed to make the graph at all in 2009. By 2010 Hanna was back on the graph, and by 2011 the firm was growing again.
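The filtering step might look roughly like this in pandas; the CSV export and column names (PERMIT_TYPE, ESTIMATED_COST, CONTACT_1_NAME) are my assumptions about the Data Portal’s schema at the time, not exact field names.

    # Keep construction-style permits, then keep only the large builders:
    # 100+ permits or more than $2M in total permit value. Column names assumed.
    import pandas as pd

    permits = pd.read_csv("chicago_building_permits.csv")    # hypothetical export

    maintenance = ["PERMIT - ELEVATOR EQUIPMENT", "PERMIT - SIGNS"]  # example types
    construction = permits[~permits["PERMIT_TYPE"].isin(maintenance)]

    by_builder = construction.groupby("CONTACT_1_NAME").agg(
        permit_count=("PERMIT_TYPE", "size"),
        total_value=("ESTIMATED_COST", "sum"),
    )
    large = by_builder[(by_builder["permit_count"] >= 100) |
                       (by_builder["total_value"] > 2_000_000)]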

If you would like the higher resolution version or a PDF of the image, contact me.