AI Exploration Is Just A/B Testing

I recently built a portfolio site by generating 40 different prototypes with Claude — fluid simulations, particle systems, magnetic field visualizations, all kinds of stuff. The process that made it work wasn't some novel AI methodology. It was A/B testing logic, applied to creative work.

Here's what I mean.

The same instinct, different scale

In A/B testing, the whole point is that you don't know which version will perform better. So you run both, measure, and let the data decide. You resist the urge to go with your gut before you've seen the results.

Working with AI on creative projects poses the same problem. You don't know which direction is best until you've seen enough options. The difference is that AI lets you test at a scale that would have been absurd before. Instead of A vs. B, you can run A through Z, and you should, because the best idea is rarely the first one or the second one.

Where it maps cleanly

Hypothesis-driven exploration. In A/B testing, each variant starts with a hypothesis: "a shorter headline will convert better." In my process, each prototype started with one too: "what if scrolling changed the physics instead of the color?" or "what if the layout was asymmetric?" The hypothesis is what makes a variant worth building: it ensures the variant is not just different, but meaningfully different.

Predefined success metrics. You don't launch an A/B test without knowing what you're measuring. I didn't evaluate 40 prototypes without criteria either. I scored each one on interaction quality, technical depth, concept, performance, portfolio fit, and emotional register. Same instinct — decide what "winning" means before you see the results, so you're not rationalizing after the fact.
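One way to make "decide what winning means first" concrete is to write the rubric down as code before scoring anything. The sketch below is a hypothetical illustration, not the scoring sheet used for the portfolio: the six criteria come from the post, but the 1-to-5 scale, the equal weights, and the names `Prototype` and `total_score` are assumptions.

```python
from dataclasses import dataclass

# The six criteria named in the post. The 1-to-5 scale and the equal
# weights are illustrative assumptions, not the author's actual rubric.
CRITERIA = (
    "interaction_quality",
    "technical_depth",
    "concept",
    "performance",
    "portfolio_fit",
    "emotional_register",
)
WEIGHTS = {criterion: 1.0 for criterion in CRITERIA}


@dataclass(frozen=True)
class Prototype:
    name: str
    scores: dict[str, int]  # criterion -> score on the assumed 1-to-5 scale


def total_score(proto: Prototype, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of a prototype's scores across every criterion.

    Rejects incomplete or out-of-range scorecards, so no prototype can be
    judged on a subset of criteria picked after the fact.
    """
    missing = set(weights) - set(proto.scores)
    if missing:
        raise ValueError(f"{proto.name} is missing scores for {sorted(missing)}")
    for criterion in weights:
        if not 1 <= proto.scores[criterion] <= 5:
            raise ValueError(f"{proto.name}: {criterion} must be between 1 and 5")
    weighted = sum(w * proto.scores[c] for c, w in weights.items())
    return weighted / sum(weights.values())


# Hypothetical scorecards, for illustration only.
prototypes = [
    Prototype("prototype_a", {c: 4 for c in CRITERIA}),
    Prototype("prototype_b", {**{c: 5 for c in CRITERIA}, "performance": 1}),
]
for proto in sorted(prototypes, key=total_score, reverse=True):
    print(f"{proto.name}: {total_score(proto):.2f}")
```

With equal weights, prototype_b's perfect scores elsewhere outweigh its poor performance (about 4.33 versus 4.00). Tripling the weight on performance flips the order (3.5 versus 4.0), which is exactly the kind of decision worth making before the first prototype is scored rather than after.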

Statistical humility. Good A/B testers know that early results lie. A variant that looks great after 100 visitors might look average after 10,000. Similarly, the prototype that impressed me most on first viewing (Magnetic Field, E26) turned out to lack the content layer it needed to work as an actual portfolio. First impressions aren't conclusions.
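The "early results lie" point has simple arithmetic behind it. A minimal sketch, assuming a hypothetical variant with a 10% observed conversion rate and the textbook normal-approximation (Wald) interval: the margin of error shrinks with the square root of the sample size, so the interval after 100 visitors is ten times wider than after 10,000 (roughly ±5.9 percentage points versus roughly ±0.6). The function name `margin_of_error` is illustrative.

```python
import math


def margin_of_error(rate: float, visitors: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed conversion rate.

    Uses the normal-approximation (Wald) formula z * sqrt(p * (1 - p) / n),
    which is reasonable for large n and rates not too close to 0 or 1.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return z * math.sqrt(rate * (1.0 - rate) / visitors)


# Hypothetical variant with a 10% observed conversion rate.
for visitors in (100, 10_000):
    moe = margin_of_error(0.10, visitors)
    print(f"{visitors:>6} visitors: 10.0% +/- {moe * 100:.1f} percentage points")
```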

Killing your darlings with data. The hardest part of A/B testing is shutting down a variant you personally like because the numbers say otherwise. Same thing here. Some of my favorite explorations scored poorly on portfolio fit or performance. The scoring framework gave me permission to let them go without second-guessing.

Where it goes further

The A/B testing parallel breaks down in one important way: A/B tests pick a winner. My process picked a cast.

The top six explorations didn't compete for one slot. Each filled a different role: the ship-ready option, the memorable alternative, the safe fallback, the conceptual peak. That's less like A/B testing and more like casting a movie. You're not looking for six versions of the same character. You're looking for a team where each member does something the others can't.

This is maybe the most useful takeaway for anyone working with AI on open-ended problems. Don't just rank your outputs from best to worst. Ask what unique job each one does. You'll often find that the "winner" is obvious once you frame it that way.
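To make the difference between ranking and casting concrete, here is a minimal sketch. The post names the roles and the scoring criteria but not how they relate, so the `ROLES` mapping below is entirely hypothetical, as is the `cast` helper. Instead of sorting every prototype by one total, each role gets its own fit function, and the best remaining prototype fills it. The fill is greedy: roles are filled in a fixed order, so an earlier role can claim a prototype a later role needed more. A truly optimal assignment would call for something like the Hungarian algorithm.

```python
from typing import Callable

Scorecard = dict[str, int]

# Hypothetical mapping from the post's four roles to its six criteria.
# These fit functions are assumptions made for illustration.
ROLES: dict[str, Callable[[Scorecard], int]] = {
    "ship-ready option": lambda s: s["performance"] + s["portfolio_fit"],
    "memorable alternative": lambda s: s["emotional_register"] + s["interaction_quality"],
    "safe fallback": lambda s: min(s.values()),  # rewards having no weak spot
    "conceptual peak": lambda s: s["concept"] + s["technical_depth"],
}


def cast(
    scorecards: dict[str, Scorecard],
    roles: dict[str, Callable[[Scorecard], int]] = ROLES,
) -> dict[str, str]:
    """Greedily fill each role, in order, with the best prototype not yet cast."""
    available = dict(scorecards)
    casting: dict[str, str] = {}
    for role, fit in roles.items():
        if not available:
            break  # more roles than prototypes
        best = max(available, key=lambda name: fit(available[name]))
        casting[role] = best
        del available[best]
    return casting
```

Because each prototype can fill only one role, a strong all-rounder that would top a plain ranking may end up as the safe fallback, while a lopsided prototype with one standout strength still earns a place in the cast.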

The point

If your team already thinks in experiments, hypotheses, and metrics, you're better prepared for creative AI work than you think. The skills transfer directly. The only shift is recognizing that AI lets you run your experiments at a scale that changes what's possible — not just faster versions of the same two options, but a genuinely broad exploration of the space.

And just like with A/B testing, the discipline isn't in generating the variants. It's in how you evaluate them.