How A/B Testing at LinkedIn, Wealthfront and eBay Made Me a Better Manager

Instacart VP of Product Elliot Shmukler shares how he uses A/B testing as a management framework to not only accelerate product decisions, but also empower product teams.

At the time of this writing, Elliot Shmukler was an executive at Instacart. He's currently the co-founder and CEO of Anomalo.

Over his career, Instacart VP of Product Elliot Shmukler has seen the interplay of people and products unfold across many industries. For instance, the former Wealthfront and LinkedIn product leader knows that much of what takes people sideways with money and career involves emotions. He’s also seen the same dynamic at play as product managers make decisions to pursue ideas, essentially their currency at a startup. Add different product management styles and shrinking launch windows to the equation, and the pressure compounds. That’s why Shmukler champions A/B testing not only as a sound product development practice, but also as an effective management tool.

While at eBay, Shmukler pioneered the use of A/B testing to improve eBay’s Search experience and helped launch eBay Express, its largest product bet at the time. At LinkedIn, he led a 15-person product team that was responsible for nearly half of LinkedIn’s page views. While he was the VP of Product at Wealthfront, Shmukler not only helped the startup grow from $150 million to over $3 billion in client assets, but also launched Direct Indexing, Single-Stock Diversification Service and its first mobile app. Over his career, he’s learned to extract the full benefits of A/B testing for his products and team.

In this interview, Shmukler shows why he uses A/B testing as a management framework, illustrating how it works to not only accelerate decisions, but also empower the teams making them. He outlines the framework’s benefits and challenges, as well as how to implement and scale it at a startup. Any product or growth leader will learn from his data-driven approach to product and team management.

A Tale of Two Decisions

At the heart of any management framework is the goal of making better decisions. As people naturally gravitate to the different permutations of better — quicker, higher quality, cheaper, etc. — they neglect to take a nuanced look at the distinct types of decisions that a company makes. In an annual letter to shareholders, Amazon CEO Jeff Bezos speaks of two varieties of decisions: the irreversible (Type 1) and the changeable (Type 2). Here’s how he describes them:

  • Type 1 decisions: “Some decisions are consequential and irreversible or nearly irreversible – one-way doors – and these decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before.”
  • Type 2 decisions: “But most decisions aren’t like that – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through. Type 2 decisions can and should be made quickly by high judgment individuals or small groups.”

As a company grows, there’s a tendency to broadly apply the Type 1 decision-making process to every choice, including Type 2 decisions. To paraphrase Bezos, what results is a slowness and an unthoughtful risk aversion that leads to a failure to experiment and diminished invention. If Type 2 decision-making is applied indiscriminately, he argues that most companies will go extinct before they get large, having used a lightweight process for irreversible choices.

Shmukler believes universal A/B testing is an ideal way to focus an organization on using Type 2 decision making for most choices. “In the traditional sense, A/B testing is about having at least two versions of the product live: an A version, typically the original implementation or control, and a B version, which you think might be better,” says Shmukler. “Thus when A/B testing is applied to Type 2 management decisions, it’s very easy to walk back through the door as Bezos suggests by simply turning off B and returning to A. Universal A/B testing may even highlight when a Type 1 decision is needed. If you’re having significant trouble coming up with a way to test a direction or sense you won’t be able to roll back the test without consequence, you might be dealing with a Type 1 decision.”

While Bezos warns of “one-size-fits all” decision-making from the helm of Amazon, Shmukler has seen similar pitfalls at early-stage startups, especially in the product management function. The notion of PM as CEO of her product is well-known, but when it comes to decision-making, there’s a real truth to it. The path to shipping a product takes an acute ability to expertly triage different types of decisions along the way. To complicate matters, product managers naturally bring their own management style to how they make the decisions.


A Case For A/B Testing as a Management Framework

Early in his career, Shmukler co-founded Sombasa Media, which he eventually sold to About.com. It was at his startup that he first noted two different product management styles of making decisions: the way of the visionary and the data-driven PM. “The visionary product manager reads the tea leaves and decides more on gut feel, while the data-driven PM uses experimentation and analysis to make calls,” he says. “There are many effective decision-making frameworks out there, but I wanted to use one that would simultaneously surface the best choice for the product while still encouraging the inherently different approaches to ideation among my product managers.”

For visionary PMs, a lot of power is generated when they know something works and they want to bring it to the rest of the world. Shmukler admits that many of the visionary PMs that he’s known have been or become founders or CEOs, because of their deep belief in taking something personally relevant to scale. “Reid Hoffman built LinkedIn to reflect how he approached professional networking and his vision for how it should work online. Obviously, the product has evolved over the years but the site is still largely true to his original vision, which was right on the money.”

Data-driven PMs develop their insights methodically and are less headstrong before making decisions. “I’d characterize these product managers with the phrase: ‘strong opinions, held weakly.’ Data-driven PMs come to their stance not so much from how they're living their life, but by looking at and collecting new data. As more information comes in, they’re much more likely to refine those insights,” says Shmukler. “It’s more of a challenge to give recognizable names for these types of PMs because the data is at the forefront. But you’ll find them working on growth teams and gravitating toward roles where data already has a very strong role.”

Given these vastly different styles, it can be a challenge as a leader to reconcile these approaches to product management. For heads of product, this scenario may sound familiar: a visionary PM says, “We need to do X.” Then, a data-driven PM responds, “No, X is wrong. It won’t work. We need to do Y.” Shmukler faced this countless times, and on a daily basis as his team grew. “It’s draining and very hard to resolve without seeming to favor a side,” says Shmukler. “Instead of giving a verdict, test both theories and let data be the judge. At first pass, this method may seem to favor the data-driven people, but it empowers each PM to push ideas forward. They learn independently rather than feeling that a decision was made for them.”

Your team's undoubtedly better with diversity of thought, but trigger the relief valve when tension mounts.

To illustrate its simple effectiveness, here’s how Shmukler has applied A/B testing as a management practice. “A group of visionary PMs wanted to change how we talked about the company, specifically in the language on our homepage. The more metrics-driven PMs said they had run those experiments before and that it hadn’t had an impact on sign-ups,” he says. “To resolve that situation, someone has to make the decision to deploy time and headcount to do the test or not. Either way, someone will think you’re making a bad decision. Visionary PMs will say you’re too data-driven and not being grounded in the bigger goal, and data-driven PMs will say it’s a waste of time. Universal A/B testing solves this, as no such decision has to be made. Instead, there’s an understanding that any idea from a trusted team will be tested in a lightweight way. The results, rather than a single arbiter, decide if ideas should be pursued further.”

In this case, Shmukler asked the team to work together to put the theory to the test to gauge impact. “The visionary PMs wrote the new content for the homepage and the data-driven PMs took point on structuring the experiment. It took a day or two to implement and we collected data over a month,” he says. “All the tests’ results are public on a dashboard that the entire company can access. Both camps are happy: the data-driven PMs got to use data to experiment and determine results, and the visionary PMs got their idea out there. In this specific case, the new homepage language did not have a significant impact. There were no hard feelings. The visionary PMs recalibrated and redirected their energy to new, different ideas.”

Universal A/B testing also allows Shmukler to say “yes” to multiple product ideas, with the only restriction being that they need to be tested. “It’s a great way to implement a Type 2 decision making process and prevent the team from getting bogged down with a Type 1 approach unnecessarily,” says Shmukler. “It not only helps the team steer clear of the ‘one-size-fits-all’ trap that Bezos cites, but it increases experimentation, autonomy and learning throughout the organization. Most critically, it fosters goodwill among smart, but very different, PMs who want to try out their ideas.”

Regardless of which idea wins, there isn't a lot of conflict among PMs because the results come from an authoritative test, not an authority. The key is that it’s done quickly and transparently to everyone. Here’s a summary of this method’s benefits and challenges:

Benefits

  • Autonomy. “One thing I found pretty valuable in running things this way is everyone basically gets their ideas to come to life, regardless of rank or tenure. As a manager, I am less a dictator and more an arbiter, stepping in only to help refine future ideas.”
  • Lower risk. “Once there’s a tool to quickly run tests, the company bears very little cost to experimentation but the act of doing experiments reduces overall risk. If an idea is a bad — or potentially damaging — one, we’ll know quickly and not move ahead with the idea. If it’s a good idea, there may be benefits that would not have been discovered without testing.”
  • Employee engagement. “This approach keeps product managers on my team a lot happier because they know there’s a channel to get their ideas out there and to prove that their ideas work. They’ve told me they learn and develop at a faster clip because they can try a bunch of ideas and learn from the results.”
  • Changes actions, not styles. “This system allows data-driven PMs to evaluate ideas rationally and lets visionary PMs more quickly hone their instinct by literally ‘experiencing’ more of their ideas faster. It embraces — not alters — their styles.”

Challenges

  • User sample size. “At the very beginning, startups need to operate more in visionary mode. To run this system effectively, you’ll need a bit of traction and some user base. There's no airtight rule of thumb, but if you're trying to test around a particular action that your users take — signing up, clicking a button or sharing a post — you probably want hundreds of people taking that action a week at a minimum. Ideally, you’re testing in the thousands for faster results and to run multiple tests concurrently. Of course, companies that throw millions of users into an experiment have significant advantage in leveraging these approaches.”
  • Legacy decision-making methods. “In one of my jobs, my company didn’t know about A/B testing. Instead they required each person to prepare a detailed financial and impact analysis on how each idea would move the needle. To me, that’s a surefire way to kill ideas and manufacture models with gobs of assumptions. But some companies still take this approach.”
  • A culture of metrics, but not around experimentation. “I was at one company that was very good at monitoring its business metrics every day, but didn’t do A/B testing or small experimentation. It put its faith in those high-level metrics, but when things went south, it couldn’t isolate the problem quickly. One time, it had launched seven releases, all without an A/B test, when a major issue surfaced. The team had to roll back most of the changes in order to A/B test and find the issue.”

Get ahead of complaints from crestfallen colleagues about squashed ideas. It’s so easy to test. Soon you’ll have a dozen A/B tests running.
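
The sample-size challenge listed above can be made concrete with a rough calculation. The sketch below uses the standard normal-approximation formula for comparing two conversion rates; it is not a method described in the interview, and the inputs (a 10% baseline rate, a 20% relative lift, 2,000 users per week) are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def users_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed in EACH group (A and B) to detect the lift.

    Uses the common normal-approximation formula for two proportions.
    alpha is the two-sided false-positive rate; power is the chance of
    detecting the lift if it is real. Both defaults are conventions,
    not values from the article.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def weeks_to_finish(per_group, weekly_users):
    """Weeks needed if all weekly users are split evenly between A and B."""
    return ceil(2 * per_group / weekly_users)


# Illustrative inputs only: 10% baseline, hoping to detect a 20% relative lift.
needed = users_per_group(baseline_rate=0.10, relative_lift=0.20)
print(needed, weeks_to_finish(needed, weekly_users=2000))
```

Under these assumptions the test needs roughly 3,800 users in each group, which is why a product with only a trickle of weekly traffic waits a long time for an answer, and why larger effects or larger audiences shorten the wait.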

How to Implement and Scale A/B Testing

Through the years, Shmukler has heard all the objections around implementing A/B tests not just for product development, but as a management tool. “For growth and later stage companies, it nearly always takes a crisis. At scale, people get stuck in their ways and it’s hard to convince them otherwise merely on theoretical merit,” he says. “But for startups, the protest is always around time and cost. I’ve used this system at a startup when it was 20 employees and barely retrofitted it as it grew past 100 people.”

Shmukler has worked at both large companies and startups, and has found that, regardless of size, this testing framework adds very little friction to launching an experiment, because it’s not a drag on time or cost. Here’s how:

Invest in a lightweight experimentation tool. In Shmukler’s experience, it takes just a handful of engineers to do the initial build of a tool and dashboard to run, monitor and evaluate A/B tests. “Most folks who’ve never done this think the resources required are more than they actually are. With less than a month of work, startups can get A/B tests practically automated. PMs can get the results quickly and without a lot of work. If they want to dig into data, they can, but they don't have to since most of the core evaluation is being done for them,” says Shmukler.

If you don’t have the engineering bandwidth or chops, there are many tools available. “If you're using a tool like Optimizely, you probably don't even have to write any code to enable A/B testing,” says Shmukler. “It takes care of it all for you: the testing, the data gathering and the evaluation. Of course, you have to build the different features, but that must be done in any model. If you're using an A/B testing tool, it's almost no additional work for you. There may be some cost, but usually there are freemium options for early-stage startups. It’s a worthy investment when you consider that it’ll help you make sound product decisions and align your team. In the long run, rolling out releases blindly will always take longer.”

If you’re still not convinced, Shmukler breaks it down further: “Really, what you’re doing is choosing a user, flipping a coin and putting them in either the A or B group. Depending on that association, you’ll show users a different page or send them to a different flow,” he says. “At the start, you can quickly develop something simple. Just get the experimentation going. Continue to collect and test data for all that’s happening with your product.”
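
The coin flip Shmukler describes can be sketched in a few lines. This is a minimal illustration, not the tooling used at any of the companies mentioned: it assumes users have a stable ID and replaces a literal random flip with a hash of that ID, a common way to make sure a returning user always lands in the same group. The function names, the experiment name and the page names are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str) -> str:
    """Return 'A' or 'B' for this user; the answer is stable across calls.

    Hashing the experiment name together with the user ID behaves like a
    fair coin flip per user, and different experiments get independent
    flips for the same user.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 2
    return "A" if bucket == 0 else "B"


def homepage_for(user_id: str) -> str:
    """Pick which (hypothetical) homepage a user sees based on their group."""
    variant = assign_variant(user_id, "homepage_copy")
    return "homepage_new_copy" if variant == "B" else "homepage_original"


print(homepage_for("user-42"))
```

Because the assignment is derived from the ID rather than stored, nothing needs to be written to a database to remember who is in which group, which keeps the first version of such a tool small.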

Reduce the barrier to test to one decision. Making A/B tests a universal practice means making each experiment an easy, instant choice. “The friction to launch a test should be as low as possible. Essentially, you want the only hurdle to be just the decision to test at all. After that, the system will essentially run and evaluate it all,” says Shmukler. “It comes down to building great frameworks. For engineers, you need a framework where launching a test is incredibly easy — maybe it's a couple of lines of code to create a new experiment. For everyone else, this is where you need, essentially, an internal dashboard that automatically evaluates the test.”

Shmukler recommends that product teams use dashboards that automatically recognize when new experiments are launched. He describes one that he’s built in the past: “It lets you pick a goal for that experiment. Then, based on that goal, it shows all the metrics that you should be looking at for that experiment,” says Shmukler. “Then it immediately evaluates the data and gives a simple red or green signal to indicate if the change should roll out or roll back. If it doesn’t have enough data, it gives a neutral indication to recommend that the test should keep running. That's the level of simplicity and automation you want. You shouldn’t feel like you’re paying a price at each step to experiment.”
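
The red, green or neutral signal described above could be produced by a small evaluation function. The sketch below is one possible approach, not the dashboard Shmukler built: it assumes the goal metric is a conversion rate, uses a two-proportion z-test, and treats the 5% significance threshold and the minimum of 100 users per group as arbitrary illustrative defaults.

```python
from math import sqrt
from statistics import NormalDist


def signal(conv_a, n_a, conv_b, n_b, alpha=0.05, min_per_group=100):
    """Return 'green' (roll out B), 'red' (roll back) or 'neutral' (keep running).

    conv_a / conv_b are conversions and n_a / n_b are users in each group.
    """
    if min(n_a, n_b) < min_per_group:
        return "neutral"  # not enough data yet
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return "neutral"  # no variation at all, nothing to compare
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value >= alpha:
        return "neutral"  # difference could plausibly be noise
    return "green" if z > 0 else "red"


print(signal(conv_a=100, n_a=1000, conv_b=150, n_b=1000))
```

A production dashboard would need more care, for example guarding against conclusions drawn from checking the result many times while the test is still running, but the shape of the decision is the same: one function, three possible answers.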

Untangle interactions among tests. As with any system, it’s prone to break at scale. “At LinkedIn, we eventually noted interactions between experiments. For example, tests on a sign-up flow and a profile page could color the other’s results. This becomes hard to untangle,” says Shmukler. “Use statistical methods to alert your dashboard if there’s a problematic interaction. It should allow you to choose to shut down one of the experiments and evaluate your remaining test. If you anticipate an issue before testing, consider using different selection criteria to route specific users into experiments to avoid those kinds of interactions altogether.”

At LinkedIn, Shmukler and his team used sophisticated selection criteria to avoid interaction between experiments. “We’d select from users that weren't in other experiments or sub-segment them to make sure that a segment was only in a single experiment at a time. You end up having to build more complex processes for selecting someone into an experiment and then evaluating the experiment over time,” he says.
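
One simple way to keep a segment in only a single experiment at a time is to carve the user base into numbered slots and give each experiment its own non-overlapping range. The sketch below illustrates that idea only; it is not LinkedIn's actual selection system, and the experiment names, slot ranges and function names are hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical allocation: each experiment owns a disjoint range of 100 slots.
# Users whose slot is unowned (60-99 here) are in no experiment at all.
SLOT_OWNERS = {
    "signup_flow_test": range(0, 30),
    "profile_page_test": range(30, 60),
}


def slot_for(user_id: str, slots: int = 100) -> int:
    """Map a user to a stable slot number between 0 and slots - 1."""
    digest = hashlib.sha256(f"exclusive-layer:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % slots


def experiment_for(user_id: str) -> Optional[str]:
    """Return the single experiment this user belongs to, or None."""
    slot = slot_for(user_id)
    for name, owned_slots in SLOT_OWNERS.items():
        if slot in owned_slots:
            return name
    return None


print(experiment_for("user-42"))
```

Since the ranges never overlap, no user can be counted in both tests, so the sign-up flow result cannot be colored by the profile page change. The cost is that each experiment sees a smaller share of traffic and therefore takes longer to reach a conclusion.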

As the complexity increases with concurrent experimentation, dashboards must become easier to interpret. “The best dashboards are the simple ones — where it gives an indicative green or a red signal — but you still must read the results. You may want to run a red experiment for longer to gather and interpret more data,” says Shmukler. “But as you scale, you want less and less of that interpretation happening because you're going to distribute this information across a much larger team. The likelihood that everyone will extract the same interpretation decreases. So, fool-proof your dashboard so that it gives anyone the full — and same — picture.”

Understand the broad impact of your experiment. Both visionary and data-driven PMs will put increasing faith in A/B testing as the tool itself sharpens. “Most A/B tests tell you whether the experiment is a good idea or a bad idea. The next step is to develop a dashboard that shows the impact of the experiments on the key metrics for the entire product or the entire company,” says Shmukler. “Tests don’t always make that obvious. For example, you may optimize some small part of the sign-up flow and see a 10% lift in completion of that step, but that doesn’t necessarily translate to 10% more sign-ups because not every user may encounter that step.”
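
The sign-up example above comes down to simple arithmetic, sketched here. The figures are illustrative assumptions rather than numbers from the interview: 1,000 sign-ups per week, 30% of which pass through the optimized step, and no change in how users behave after that step.

```python
def projected_signups(baseline_signups, share_through_step, step_lift):
    """Project total sign-ups after improving one step of the flow.

    Only the sign-ups that pass through the improved step benefit from
    the lift; everything else stays at its baseline level.
    """
    through_step = baseline_signups * share_through_step
    other_paths = baseline_signups - through_step
    return other_paths + through_step * (1 + step_lift)


baseline = 1000  # illustrative weekly sign-ups
projected = projected_signups(baseline, share_through_step=0.30, step_lift=0.10)
overall_lift = projected / baseline - 1
print(f"{projected:.0f} sign-ups, overall lift {overall_lift:.1%}")
```

With these inputs, a 10% lift on the step becomes about a 3% lift in total sign-ups. A dashboard that reports the second number alongside the first helps teams compare experiments on the same footing.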

Deepening the capability of your testing software to translate an experimental result into top level impact should be on every product team’s A/B testing roadmap. “It’ll help you understand what to expect in top level numbers as a unit and organization. Also, it helps people select experiments a little better by rooting them in the context of the company. It’s important to know if it’s a 10% win for the entire product or just for a small corner of the product. By reflecting the impact of an experiment on business metrics, the hope is to build each team’s awareness of these experiments, making them more meaningful to run.”

Tying it All Together

Many startups understand the value of A/B testing as a tool to grow a product, but it’s also a built-in mechanism to better develop a product team. Universal A/B testing not only resolves the tension between different product manager types, but it also does so while giving them autonomy and acknowledgement of their personal style. Even if startups face challenges with small sample sizes and alternative systems, investing in a lightweight experimentation tool is always worth it. As the company and number of experiments grow, so must the testing software — and it must employ simpler signals so any employee can understand the big picture.

“A/B testing does more than elevate products faster — it does the same for teams. Using universal A/B testing as a management tool sends a message to PMs of all stripes: intelligently tested ideas will rise to the top, regardless of who thinks of them,” says Shmukler. “They say that to go fast, go alone. And to go far, go together. This system allows you to be both ambitious and autonomous. Most people want to be both in their work — and everyone wants to see their ideas come to life.”