How to build an Experimentation and Testing Bench

Posted in Analytics Strategy, Data Science, RecSys, Technology

How do you, and those around you, deal with failure?

Because if the answer is anything but “well” or “every fail has a lesson”, then progressive, iterative experimentation and A/B testing really, really isn’t for you.

Go away. Stop reading. It’s not everyone.

The way to build an experimentation and testing bench is entirely by setting sail for fail.

How to build your Experimentation and Testing Bench

Your bench will be built:

  • 95% with culture
  • 5% with technology

The technology is well thought out and several vendors are excellent in this space. The technology isn’t a problem. Sure, there are a few pretty shockingly bad systems out there. Every industry has a phony.

Building a testing bench begins with policy.

That policy is shaped by these questions:

  • Why are you testing? (What’s the optimization objective?)
  • Why does the optimization objective happen? (What are the causal factors?)
  • What causes changes to those causal factors? (What matters to test?)
  • How is knowledge recorded?
  • Who tests?

Question by Question

Why are you testing? / What is the Optimization Objective?

Is it sales? Is it trial? Is it number of leads landed?

Is it ad revenue?

If you select an optimization objective that aligns with what you, and the business, really care about, you’re building a bench that works and has focus.

If you can’t focus long enough to get through a policy document, you’ve got more fundamental problems than testing.

Inability to state a primary optimization objective is the number one cause of analytics bench failure.

Why does the optimization objective happen? / What are the causal factors?

Suppose that the optimization objective is to drive trial. Suppose that the CEO is obsessed with driving trial because it relates directly to her business plan for that particular phase of the product lifecycle.

Alright – what causes people to trial a product?

Curiosity? Trust? Existing problem? Belief that the product is likely to solve their problem? Belief that the product will cause an unfair advantage? Ease of trial? Awareness that the trial is free? Is it the relatively few barriers between wanting to trial and trying it? Is it brand? Is it social proof? Is it any combination of the above stated reasons and so much more?

Different people have different truths.

I’ve met brand creatives, real traditional people, who place no credence in the effective ad frequency curve; and I know catalog scientists that assign little credit to the impact of creative.

Your rationale for why people do what they do will vary.

And that’s great. Let’s embrace the heterogeneity of opinion.

What causes changes in those causal factors? / What matters to test?

Assume that you pick Trust, Ease, and Social Proof as your three big pillars.

What causes trust?

What causes ease?

What causes social proof?

Trust is associated with specific iconography on the page. In some sectors it’s accreditation. In others it’s associations and memberships. The whole ‘client tile’ wall of logos is another one. After all, “Surely, if P&G is a client, they must be good!” There are all sorts of trust marks that can be arranged visually.

Ease is associated with just how many blockers you’re going to put between people and the demo. Are you going to demand a credit card number? A name? Or are you going to skip the barriers entirely and just let people try it?

Social proof involves all sorts of tactics – pictures of people, testimonials, tweets from others, links to trusted authorities. Tweets from friends of the visitor. There are all sorts of social proof components.

If you believe in all of that, there are already thousands of tests that could be run, simply probing each one of those hypotheses.
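As a rough, scaled-down illustration (the pillar and treatment names below are hypothetical, not a prescribed list), even three pillars with a handful of treatments each multiply quickly — add more pillars and more treatments per pillar and the backlog runs into the thousands:

```python
from itertools import product

# Hypothetical treatment ideas per causal pillar -- illustrative only.
factors = {
    "trust": ["accreditation badge", "client logo wall", "press mentions", "security icons"],
    "ease": ["no-form demo", "email-only form", "full lead form"],
    "social_proof": ["testimonials", "tweet embeds", "customer photos",
                     "expert endorsements", "usage counts"],
}

# Single-factor tests: each treatment tried against the current page.
single_factor_tests = sum(len(v) for v in factors.values())

# Full-factorial variants: every combination of one treatment per pillar.
combinations = 1
for treatments in factors.values():
    combinations *= len(treatments)

print(single_factor_tests)  # 12 single-factor hypotheses
print(combinations)         # 60 full-factorial variants
```

Twelve simple tests or sixty factorial variants from just this toy list; the real space of probes grows multiplicatively with every pillar you believe in.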

How is knowledge recorded?

Where are the reasons for believing things recorded? And, where are the results of your tests going to reside?

A culture of experimentation and testing is likely to far outlive your tenure at the firm…so how is that knowledge recorded and managed over time? How do you make it really easy for people to test, or to search for past tests?
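One minimal way to make that knowledge durable and searchable is a structured registry — a sketch, with field names and sample entries that are entirely illustrative:

```python
from dataclasses import dataclass, field
from datetime import date

# A minimal test-registry record. Field names are an assumption of this
# sketch, not a standard; the point is recording rationale BEFORE the test.
@dataclass
class TestRecord:
    hypothesis: str          # the belief being probed
    rationale: str           # why you believed it, written down up front
    variant: str             # what was actually changed
    result: str              # e.g. "no significant difference"
    significant: bool
    run_date: date
    tags: list = field(default_factory=list)

# Illustrative entries only.
registry = [
    TestRecord("Client logos increase trust", "sales anecdotes",
               "logo wall on landing page", "+1.8% demo starts",
               True, date(2024, 3, 1), ["trust"]),
    TestRecord("Shorter form increases ease", "funnel drop-off data",
               "email-only form", "no significant difference",
               False, date(2024, 4, 2), ["ease"]),
]

# Past tests become searchable: everything ever tried under "trust".
trust_tests = [t for t in registry if "trust" in t.tags]
print(len(trust_tests))  # 1
```

Whether this lives in a database, a wiki, or a vendor tool matters less than the discipline of recording the rationale before the result.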

Who tests?

Do agile teams test whatever they need to test to validate their assumptions? Can only the CEO authorize a test? Is anybody empowered to test a hypothesis?

The answers to all of those questions form a critical component of your test bench.

Technology

The technology problem is largely solved. Pick the right technology for the answers to the questions above.

If you’re pursuing a democratic strategy, then the testing tool should be dead simple to use.

If you’re pursuing a distrust-the-employees-what-do-they-know strategy, then the testing tool should be very difficult for laypeople to use and should require a mountain of fighting with the IT department and four VP signatures to get anything done. (But dammit, at least you’ve got testing in!)

Technology is a very small part of the overall bench.


Fail

Chances are very good that your understanding of the customer, of mass audiences, will be wrong.

Chances are excellent that you won’t nail it the first time. It’s extremely rare to be absolutely certain and absolutely right about something in the absence of evidence.

You can have a very detailed model of what’s important to people when they engage in a trial and still be wrong about a given segment – (“Oh, it turns out that if you animate a video of a cat in the background licking the ‘TRY IT NOW’ button, you’ll drive mad breakthroughs amongst redditors, what an insight!”)

Wisdom begins with knowing what you know so you can chart what you know you don’t know.

Wisdom is also accumulated learning. And before you can really start learning, you have to be acclimated to failure.

Can your bench handle sustained failure? Can your bench handle 100 hypotheses and not one, single, statistically significant, sustainable, learning? Can your bench scale fail?
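A back-of-the-envelope sanity check (assuming independent tests at the conventional alpha of 0.05 — numbers assumed for illustration, not from the post) shows just how surprising 100 straight null results would actually be:

```python
# Even if NONE of your 100 hypotheses had any real effect, each test
# still has a 5% chance of a false positive at alpha = 0.05.
alpha = 0.05
n_tests = 100

# Expected false positives under the null across the whole program.
expected_false_positives = alpha * n_tests

# Probability that ALL 100 tests come back non-significant under the null.
p_all_null = (1 - alpha) ** n_tests

print(expected_false_positives)  # 5.0
print(round(p_all_null, 4))      # 0.0059
```

In other words, a bench that runs 100 tests should expect a handful of “wins” by chance alone — which is exactly why the knowledge registry, and a culture that tolerates reversals, matter.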

I’m not quite certain where things went wrong in digital – when people actually started taking it so seriously that they couldn’t try something new, fail – and not risk the entirety of their careers. I am fairly certain that such digital cultures are unlikely to generate sustainable competitive advantage.

Success

Success means taking 1000 small risks and walking away with 10 pretty interesting validations.

Success means making tactical gains.

Those tactical gains may add up into tens of millions of dollars.
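As a hedged illustration of how that adds up (the lift size and revenue baseline below are assumptions for the example, not figures from the post), modest validated lifts compound multiplicatively:

```python
# Ten validated tests, each delivering a modest 1.5% lift, applied to an
# assumed $100M annual revenue baseline. Illustrative numbers only.
baseline_revenue = 100_000_000
lift_per_win = 0.015
wins = 10

# Lifts compound multiplicatively, not additively.
compound_lift = (1 + lift_per_win) ** wins - 1
incremental_revenue = baseline_revenue * compound_lift

print(round(compound_lift, 4))  # 0.1605 -> ~16% overall lift
print(round(incremental_revenue / 1_000_000, 1))  # ~16.1 ($M)
```

Ten unglamorous 1.5% wins on a nine-figure baseline is already “tens of millions” territory.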

Structured, persistent, scientific management of digital enables the rare possibility of making truly strategic breakthroughs. It’s possible, if it’s theory driven and the optimization objective is allowed to drift often. But it’s not guaranteed.

And that, I believe, is how you build an experimentation and testing bench.