Bob, our head of product design at Pinterest, asked me to write up a post on the perils and benefits of AB testing after reading my post on building cross-functional teams. This is me obliging.
It is never difficult to convince an engineer to run an experiment. In general, this is a good thing. W. Edwards Deming, the famous engineer and statistician, is often credited with saying, “In God we trust; all others bring data.” AB experiments generate data, and data settles arguments. AB experiments have helped us move from product decisions made by HIPPO (highest paid person’s opinion) to those made by measured effectiveness. As a result, we build better products that delight more people.
An AB experiment culture can also have a dark side, though. Once people figure out that AB experiments can settle disputes where multiple viewpoints are valid, they can stop making decisions at all and test everything instead. I liken this approach to being in a dark room and feeling around for a door. If you blindly experiment, you might find the way out, but turning the light on would make it far easier. Rabid AB testing can be an excuse for not searching for those light bulbs. Experimenting can feel easier than actually talking to your users or developing a strategy or vision for a product. Here are some perils and benefits of AB testing to think about as you experiment in your organization.
Benefits
1) Quantifying Impact
This one is pretty obvious. AB experiments tell you how much of a lift or decrease a treatment causes versus a control, within the statistical confidence of the test. So, you can answer the question of what an investment in Project X did for the company.
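To make the idea of lift concrete, here is a minimal sketch of how a treatment's lift over control might be computed, along with a rough check of whether the difference is likely to be noise. It assumes the metric is a simple conversion rate and uses a standard two-proportion z-test; the function name and the example numbers are made up for illustration and are not from any real experiment.

```python
import math


def lift_and_p_value(control_conversions, control_users,
                     treatment_conversions, treatment_users):
    """Return (relative lift, two-sided p-value) for a conversion-rate test.

    Hypothetical helper for illustration; uses a pooled two-proportion z-test.
    """
    control_rate = control_conversions / control_users
    treatment_rate = treatment_conversions / treatment_users

    # Relative lift: how much better (or worse) treatment is than control.
    lift = (treatment_rate - control_rate) / control_rate

    # Pooled rate under the assumption that both groups behave the same.
    pooled = (control_conversions + treatment_conversions) / (
        control_users + treatment_users
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_users + 1 / treatment_users)
    )
    if std_err == 0:
        return lift, 1.0

    z = (treatment_rate - control_rate) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, p_value


# Made-up example: 5.0% conversion in control vs 5.5% in treatment.
lift, p_value = lift_and_p_value(500, 10_000, 550, 10_000)
print(f"lift: {lift:.1%}, p-value: {p_value:.3f}")
```

In this made-up example the treatment shows a 10% relative lift, but the p-value is above the usual 0.05 threshold, so a careful team would keep the experiment running rather than declare a win.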
2) Limiting Effort on Bad Ideas
Another great benefit of AB testing is that you can receive feedback on an idea before committing a ton of effort to it. Not sure if an investment in a new project is worth it from a design and engineering perspective? Whip up a quick prototype and put it in front of people. If they don’t like it, then you didn’t waste a lot of time on it.
3) Limiting Negative Impact of Projects
Most additions to a product hurt performance. AB testing allows you to test on only a segment of your audience and receive feedback without affecting all users. Obviously, the larger the company's user base, the smaller the percentage of users you need to include in an experiment to get a solid read.
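The point about company size can be sketched with a standard sample size approximation for comparing two conversion rates. This is a rough illustration, assuming a conversion-rate metric, a 5% significance level, and 80% power; the function names and the numbers are hypothetical, and a real experiment framework will have its own way of sizing tests.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in each arm to detect a given relative lift.

    Hypothetical helper using the usual normal approximation for two proportions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def traffic_share_needed(total_users, baseline_rate, relative_lift):
    """Fraction of all users needed across control and treatment combined."""
    needed = 2 * users_per_arm(baseline_rate, relative_lift)
    return min(1.0, needed / total_users)


# Made-up example: 5% baseline conversion, hoping to detect a 10% relative lift.
for total in (100_000, 1_000_000, 100_000_000):
    share = traffic_share_needed(total, 0.05, 0.10)
    print(f"{total:>11,} users -> {share:.2%} of traffic")
```

The number of users required stays the same regardless of company size, so the share of traffic exposed shrinks as the user base grows. A small site may need most of its users in the experiment, while a very large one can get a read from a fraction of a percent.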
4) Learning What Makes Something Work
In AB experiments, you isolate one variable at a time, so you know exactly what causes a change in metrics. You don’t have to debate about whether it was a headline or background color or the logo size anymore.
Perils
1) Not Building a Strategy or Vision
Many companies convince themselves that testing is a strategy in and of itself. AB testing can inform a strategy or vision, but it cannot replace one. What happens in these cases is that companies run tons of random experiments in totally different directions, and their failure rate is very high because they have no unifying vision of what they are trying to do.
2) Wasting Time
AB testing can slow companies down when they convince themselves that every single thing needs to be tested thoroughly. Everyone knows the anecdote from Google where they tested 41 shades of blue.
3) Optimizing for the Wrong Metric
AB experiments are designed to measure the impact of a change on a metric. If you don’t pick the right metric or do not evaluate all of the important ones, you can be making tradeoffs in your product that you do not realize. For example, you might be increasing revenue at the expense of user engagement.
4) Hitting a Local Maximum
AB experiments do a very good job of helping optimize a design. They do not do as well at helping to identify bold new designs that may one day beat current designs. This leads many companies to stay stuck in a design rut where their current experience is so well optimized that they can’t change anything. You need to be able to challenge assumptions and invest in new designs that may need quite a bit of time to beat control. This is why most travel sites look like they were last redesigned in 2003.
So, I’d prefer to optimize Deming’s quote to “When the goal is quantified and the ROI is worth it, run an AB experiment. All others bring vision.” It doesn’t have quite the same ring to it.
Currently listening to Forsyth Gardens by Prefuse 73.