{PODCAST} In-Ear Insights: Best Practices for A/B Testing for Marketers


In this episode of In-Ear Insights, the Trust Insights podcast, Katie and Chris discuss the similarities and differences between A/B testing and clinical trials. What aspects should marketers keep from the life or death world of healthcare, and what aspects don’t apply to the realm of marketing? Learn some best practices for A/B testing when you tune in.


Watch the video here:

{PODCAST} In-Ear Insights: Best Practices for A/B Testing for Marketers

Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.

Christopher Penn 0:02
This is In-Ear Insights, the Trust Insights podcast.

In this week’s In-Ear Insights, we have a listener slash reader question from Doug, who was asking about proper A/B testing.

So Katie, here’s the scenario.

Doug says: I have an interesting non-random case that I have a hard time explaining to the team, namely that what follows does not count as an A/B test. We give a list to sales reps each month that they have to go through to ask people to renew their memberships. We believe they go through it in order and usually finish about 80% of the list. There’s no particular logic to the ordering; the list is just the output of a script. The situation seems pretty close to random, except that a rep can choose to skip someone, and the list does have an order that’s probably close to random.

We wanted to see the success rate between those who are contacted and those who are not.

What’s wrong here that I’m missing?

So Katie, you spent 10 years in pharma doing clinical trials, and an A/B test is nothing more than a properly blinded, randomized controlled trial, right? From your perspective, is this an A/B test? And if it isn’t a randomized controlled trial, what do you see wrong with it?

Katie Robbert 1:12
Um, I mean, it can be, if you follow the procedures correctly.

And so, you know, the first thing that sticks out to me in what this person is communicating is that the sales reps can skip over people, and that nullifies the test. Now, this is obviously not a clinical trial.

This is, you know, a sales process.

And theoretically, it’s not a life or death situation.

In a true clinical trial, you can’t just randomly decide to skip over people, because you don’t feel like calling them or you don’t feel like communicating with them, because it could be life or death.

And you do invalidate your research findings, because you did not follow the preset procedure.
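As a rough illustration of what Katie is describing, here is a minimal Python sketch of a pre-registered split. The member IDs and numbers are invented purely for illustration; the point is that the contact/holdout assignment is decided randomly up front, before any rep ever sees the list, so nobody can skip their way out of the design.

```python
import random

def assign_holdout(member_ids, holdout_rate=0.2, seed=42):
    """Randomly pre-assign each member to 'contact' or 'holdout' before the
    list ever reaches a sales rep, so the split cannot be changed later."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible and auditable
    return {
        member: ("holdout" if rng.random() < holdout_rate else "contact")
        for member in member_ids
    }

# Hypothetical membership list, purely for illustration
members = [f"member_{i:04d}" for i in range(1, 501)]
groups = assign_holdout(members)
print(sum(1 for g in groups.values() if g == "holdout"), "members held out of contact")
```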

And so, I think one of the things that we like to do is borrow some of the guidelines from that more stringent testing. But because we’re talking about, you know, calling someone or not calling someone to buy your piece of software, it’s probably not as critical to follow that stringent A/B test structure.

And so, you know, in this case, it sounds like he was just kind of curious about something.

I mean, I don’t know, we would need to find out: based on what you find out, are you going to change your sales process or not? That’s probably the first question.

If it’s more of just an “I’m curious,” then it really doesn’t matter whether the sales reps are stringently following the rules about calling, not calling, or otherwise communicating with the people on the list.

What do you think, Chris?

Christopher Penn 2:51
Well, the first thing that came to me was: what are we testing? Are we A/B testing the sales reps? Are we A/B testing the audience? Are you looking for a difference in the audience, or in the procedures you’re using? So my first question was that there’s not really a hypothesis, and a good test has a clearly defined hypothesis.

And this one doesn’t. You know, “success rate between those who are contacted and those who are not” can mean a lot of things.

And there are a lot of variables to unpack in there, as opposed to saying, you know, we’re going to test calling on Tuesdays at 9am versus calling on Wednesdays at 9am.

To see which is the better time to call, like, that’s a very clear hypothesis.

And you could say, we believe calling Wednesdays at 9am will have a 5% increase in sales.

That’s a provably true or false statement.

Whereas, you know, which is more successful, like you said, it’s not really a test, per se.

It’s just kind of an “I wonder,” and there’s nothing wrong with that.

I mean, that’s the start of a good test, because it has a good question behind it.

But it’s like, not rigorous enough.
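To make that “provably true or false” idea concrete, here is a rough sketch of how the Tuesday-versus-Wednesday hypothesis could be checked in Python with a two-proportion z-test from the statsmodels library. The call and sale counts are made up purely for illustration.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: (sales, calls made) for each arm
tuesday_sales, tuesday_calls = 42, 500      # calls placed Tuesdays at 9am
wednesday_sales, wednesday_calls = 61, 500  # calls placed Wednesdays at 9am

# Test whether the Wednesday conversion rate differs from the Tuesday rate
stat, p_value = proportions_ztest(
    count=[wednesday_sales, tuesday_sales],
    nobs=[wednesday_calls, tuesday_calls],
)

print(f"Wednesday rate: {wednesday_sales / wednesday_calls:.1%}")
print(f"Tuesday rate:   {tuesday_sales / tuesday_calls:.1%}")
print(f"p-value: {p_value:.3f}")  # a small p-value is evidence the difference is real
```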

Katie Robbert 3:50
It sounds to me like the goal of this was to test the overall outcome.

And so the hypothesis, if you start to piece it together, is that people who are contacted are more likely to convert than people who aren’t.

And so, to your point, Chris, there’s not a whole lot of guideline around what that means.

You know, or is the outcome that they picked up the phone and are re-engaged?

And so there are a lot of missing pieces.

But again, this isn’t a clinical trial.

This is someone who is just trying to see if they can maybe get a few more sales through the door, I’m assuming.

Unknown Speaker 4:27
I would assume.

Katie Robbert 4:30
I would assume so.

So, you had said, are we testing the audience or are we testing the sales reps? This, to me, doesn’t sound like something that’s being set up to test the sales reps, per se; the outcome would have to be something different for that.

So I guess you would give the sales reps different messaging but the same audience; the audience would be the control, and what the sales rep is working with might be the experimental part.

But in this case, the way in which you’re interacting with the audience is the experimental part and the sales rep is the control.

And so the sales rep is the constant.

And then the audience is getting subjected to different methods.

Whereas if you were testing the sales reps, it would be sort of the opposite.

Christopher Penn 5:24
And that’s a really important point, I think, because again, if you don’t clearly define what it is you’re testing, on a good day you’ll get mixed results, and on a bad day you’ll have a completely invalid test, because there’s literally no way to figure out what’s going on.

The other part that really stood out was what you had said about people being able to skip things, right?

You know, a silly slash not-silly example would be, if the sales reps were all guys thinking, I just want to talk to other guys and get some business done, right, and they skip over people with female-sounding names, you are then partitioning the audience in a way that is contradictory to the idea of a randomized trial.

Katie Robbert 6:05
Unless it is built into the test plan that you will only talk to, you know, male test subjects or something like that. All of that has to be predefined; you can’t just randomly decide it after the fact.

That was something that we dealt with a lot.

And it actually made recruiting for the clinical trials a little bit more tricky, trying to get that, you know, blended mix of genders and backgrounds and ethnicities.

And depending on what it is that you’re trying to test, that can be really tricky.

Just as an example, as a quick anecdote, one of the products we were running a clinical trial on was to find out about the efficacy of self-reporting your substance use and abuse to a computer versus to a person.

And one of the populations that we were testing was the Asian population.

And that, you know, what we learned was that that population tends to be very private, and not talk about those kinds of things.

Like, it’s just you don’t, you don’t talk about it, you don’t acknowledge it.

And so it made recruiting for the clinical trial very, very tricky, because we couldn’t find anyone who was willing to participate, because that would then mean that they would be open enough to talk about whatever it was they had going on.

And so, in a true clinical trial, those are things that you have to factor in.

But again, going back to this example of, you know, just testing the sales process, it probably isn’t a bad idea to first take a look at the type of audience you have and what information you already have about them.

And so, you know, I know when I sign up for certain things, a lot of times it asks me about my preferred communication method: do you want to get a text message? Do you want to get a phone call? Do you want to get an email? And you know, 10 times out of 10, I don’t want to be called.

So I just want to email.

And so that might be something to take a look at first.

And so if you are, you know, all of a sudden calling people who said “don’t call me,” then, yeah, maybe that’s worth testing, because maybe they clicked the wrong button, or maybe they just really don’t want to talk to you.

So like, there’s a lot of different ways to look at your audience.

But you really need to understand them first.

Before you can say, and this is what I’m going to introduce to experiment with.

Christopher Penn 8:36
That’s a really important point, and probably one that never happens before an A/B test.

If I think about all of our client interactions, we’ve had people doing, you know, different A/B tests and things.

I don’t think anybody sits down and actually does exploratory data analysis to look at the audience to begin with and say, okay, is there a substantial difference on factors we know about, like gender? Is there a difference in who is sponsoring you? Katie, one of the things we found with Trust Insights is that, for whatever reason, our points of contact and our decision makers, a substantial majority of the time, skew toward people who identify as female.

It’s overwhelming, I would say.

And so without that initial data exploration, it’s very hard to even come up with a hypothesis because you don’t know what you don’t know at that point.
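As a sketch of what that initial exploratory data analysis might look like, here is a minimal Python example on a tiny, invented membership table. The column names and values are assumptions for illustration only; in practice the data would come from a CRM export.

```python
import pandas as pd

# Hypothetical membership data, invented purely for illustration
df = pd.DataFrame({
    "gender":       ["F", "F", "M", "F", "M", "M", "F", "M", "F", "F"],
    "tenure_years": [1, 4, 2, 7, 1, 3, 5, 1, 2, 6],
    "renewed":      [1, 1, 0, 1, 0, 1, 1, 0, 0, 1],
})

# Before designing a test, look at how the outcome already varies across
# factors you know about; this is where hypotheses come from.
print("Overall renewal rate:", df["renewed"].mean())
print(pd.crosstab(df["gender"], df["renewed"], normalize="index"))
print(df.groupby(pd.cut(df["tenure_years"], bins=[0, 2, 5, 10]), observed=True)["renewed"].mean())
```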

Katie Robbert 9:28
Right.

Well, and that statement is, you know, an anecdotal sort of back-of-the-envelope observation.

You know, I’m sure if we really broke it down, we could come to very clear buckets of our audience: those who identify as male, those who identify as female, communicative, non-communicative, those kinds of things.

And so that definitely might be something for this person to consider: what do they already know about the audience, and what do they already know about the purchasing patterns of this audience? Do they tend to renew, or are they a one-time client? If they’re one-time, what does it take to get them to renew? Are the renewals in their best interest? There are a lot of different questions that this person could be asking.

And so, obviously, we don’t have a lot of information to go on; they’re basically just saying, we gave the sales reps a list that was randomized to either contact or not contact people.

But there’s a lot of missing information that would make this a more useful exercise.

Christopher Penn 10:44
Yep, exactly.

And the other thing that I think is really important about that exploratory data analysis: we’ve got another client we’re working with, you know, processing some survey results.

And at the end of the day, I think the intent of, you know, some of these questions is: does this thing that we’re asking about actually make a difference in the business result we care about, right? So if you want to make more sales, does gender matter? Does time of day matter? Does the salesperson matter? And one of the challenges that I see with a lot of data analysis is that people put together tons and tons of charts and graphs and pie charts and all sorts of interesting visualizations and stuff.

But they never actually sit down and answer the question, yes, we think this matters, or no, we don’t think this matters, and then say, okay, we think this matters, let’s build a test plan around it.

It’s just kind of, you know, as Avinash Kaushik says, sort of data puking, and it’s left as an exercise to the reader to try and figure out, yeah, this is something we probably should test for.

Katie Robbert 11:43
Yeah, and, you know, so I guess that then begs the question, why did this particular person decide to test this specific thing with the sales reps, you know, communicate versus not communicate? You know, anecdotally, my instinct would be that, you know, you probably do want to be communicating with your customers, you probably do want to be giving them new information.

So it is a little bit of an odd A/B test, per se. Um, you know, when I think of an A/B test, it’s not the, you know, control-and-experimental setup; it’s more of the, okay, here are two things, which one do you prefer, A or B?

And so, when you really break it down, those are two different kinds of tests.

Because when this person describes the sales team as going to contact or not contact people, the people who aren’t being contacted don’t know that they’re a part of this test.

And so in that kind of A/B test, you do know that you’re part of the test, and you’re given two options, or two or more options, an A/B/C test or whatever.

And so that might also be part of the problem: the approach of thinking that it’s this or that. For the subjects in the test, they need to know they are getting this or that; they need to sign up to say, yes, I want to be part of this.

And then be given the instructions to say, you will either be given this or you will be given that. But again, we don’t have all the information; it sounds like what’s happening is they just decided, let’s try this thing.

But the consent from the subjects, being their customers, the audience, didn’t happen.

And so therefore, the audience doesn’t know that they’re part of this test.

Christopher Penn 13:44
So it’s interesting, because when we do a lot of A/B testing, like in Google Optimize, for example, on our web pages, or in our email marketing software, again, the audience doesn’t know that they’re part of a test, right? You just see the version of the website that you see.

But I think your point about the existence of a control or not, is really important.

Because we’ve seen plenty of cases where marketers running tests badly don’t have a control.

They’re like, okay, we’ve got this button on the website, let’s test red or blue, and you know, the button’s currently yellow. It’s like, well, I mean, I guess you can do that.

But if you don’t have the yellow button in there as a control, you have no idea whether or not this is actually better.

In the same timeframe, I think.

So.

Experiment design is a part that’s clearly lacking there.

But again, the good news is that reputable software for doing this in marketing, like Google Optimize, for example, doesn’t really give you a choice, right? You have to use a control.

You’re not given a choice. But that differentiation between an A/B test versus an actual controlled trial, where there is a control group and then there’s an experimental group, is really important.

It’s something that you should not go without, I would think.
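To illustrate why the existing yellow button has to stay in the test as the control, here is a minimal sketch with invented visitor counts; without the control rows, there is nothing to measure lift against.

```python
import pandas as pd

# Hypothetical experiment log: one row per visitor, with the assigned button
# variant and whether they clicked. All numbers are made up for illustration.
df = pd.DataFrame({
    "variant": ["yellow_control"] * 400 + ["red"] * 400 + ["blue"] * 400,
    "clicked": [1] * 30 + [0] * 370 + [1] * 44 + [0] * 356 + [1] * 36 + [0] * 364,
})

rates = df.groupby("variant")["clicked"].mean()
control_rate = rates["yellow_control"]

# Each variant is compared against the current yellow button, not against nothing.
for variant, rate in rates.items():
    print(f"{variant}: {rate:.1%} (lift vs control: {rate - control_rate:+.1%})")
```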

Katie Robbert 14:55
Well, and to your point, let’s say, for example, and again, we keep saying we don’t have all the information because I don’t want to misspeak about what this company is doing.

But let’s say the guideline is just communicate with this list, don’t communicate with this list.

Okay. There are a lot of different ways to communicate with a customer: you can email them, you can message them on social media, you can call them, you can text them. That’s already four different things that you’re testing, compared to this other side of the bucket where nothing is being tested.

And so let’s say that the group that was communicated with does better. Great, but you’ve tested four different things within that test.

So you don’t know which of those things was actually effective.

And so you need to break it down a little bit more clearly.

And so if you were to say, I want to A/B test email versus DMing on social media, okay, that I can understand, because each group is getting something.

And so that would be a valid way to think about, you know, understanding which communication method is more effective.

Whereas if you’re saying, we’re going to communicate with them by all means necessary versus not at all, you still don’t have valid results, because you don’t know what worked.
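As a sketch of the cleaner design Katie describes, where each arm changes exactly one channel and a no-contact control is kept for comparison, here is a minimal Python example. The member IDs and arm names are hypothetical.

```python
import random

# Each arm changes exactly one thing; the no-contact control stays in for comparison.
ARMS = ["control_no_contact", "email_only", "social_dm_only"]

def assign_arms(member_ids, seed=2024):
    """Shuffle the list once, then deal members round-robin across the arms."""
    rng = random.Random(seed)          # fixed seed so the assignment is auditable
    shuffled = list(member_ids)
    rng.shuffle(shuffled)
    return {member: ARMS[i % len(ARMS)] for i, member in enumerate(shuffled)}

members = [f"member_{i:03d}" for i in range(1, 31)]   # hypothetical member IDs
arms = assign_arms(members)
print({arm: sum(1 for a in arms.values() if a == arm) for arm in ARMS})
```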

Christopher Penn 16:14
Exactly.

And here’s the part that I know for sure marketers don’t test and don’t look at, whereas the clinical world does, and the surveying world, if your market research has been done properly, does: that’s non-response bias.

To your point.

Is there a difference not only in the people who responded, but also in the people who don’t respond? You know, for example, people in our generation and older than us are generally less hesitant about answering the phone, because we grew up with telephones, right? It was one of the primary methods of communication, whereas people, for example, in my child’s generation will not answer the phone, period, for any reason.

And so you have an age difference there, a non-response bias, where it’s not just, you know, did the person answer or not answer, did they respond or not respond.

But the question we have to ask as data scientists is: is there a statistically significant difference in the people who didn’t respond? Even when we’re looking at our own marketing automation software, looking at things retrospectively: is there a difference in people who do and don’t open our newsletter, for example? Is there a difference in the people who do and don’t come back to our website? And if so, is that difference statistically significant? And the question you always ask is, so what? Should we do something about it? Right? It may turn out, like in your example, if you’re texting and emailing and calling and sending postal mail, and a certain percentage of the population in the texting bucket doesn’t respond, is there a difference? And if so, that might be a setting for follow-up, just to say, yeah, okay, these people who didn’t respond to a text are all above the age of 55; maybe we should try a different method.
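As an illustration of that non-response check, here is a rough sketch of a chi-squared test on invented counts of responders and non-responders by age band; a small p-value would suggest who responds is not random with respect to age, which is the non-response bias Chris describes.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts of responders / non-responders by age band, for illustration only
counts = pd.DataFrame(
    {"responded": [64, 45, 22], "no_response": [136, 155, 178]},
    index=["18-34", "35-54", "55+"],
)

chi2, p_value, dof, expected = chi2_contingency(counts)
print(counts)
print(f"chi-squared p-value: {p_value:.4f}")
# A small p-value suggests response differs by age band, i.e. non-response bias
# worth acting on (for example, trying a different channel for older members).
```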

Katie Robbert 18:01
And that’s absolutely valid.

That was one of the challenges we would run into when we were doing our clinical trials was getting people to show up and participate.

And so when you’re asking people to be a part of this thing, typically there needs to be something in it for them as well.

And so it doesn’t have to be like a monetary thing or some sort of, you know, form of payment.

But, you know, if you are working with your audience to test different methods of communication, you also need to think about what’s in it for them. What are they getting out of this? For you, what you’re getting out of it is the likelihood of increased revenue.

But if I’m the person who’s being contacted, if I’m the person who’s being experimented with, what do I get out of it? Do I get a better user experience? Do I get clearer preferences around how I do or don’t want to be communicated with? Am I being heard? Am I being listened to? Are you actually, you know, responding to my feedback on the thing that you’re trying to sell me? So what am I, the user, getting out of it? Not just what are you, the company, getting out of it.

Christopher Penn 19:09
But I assume that is also part of experiment design, which, again, is something that we don’t spend a whole lot of time on in marketing.

Katie Robbert 19:16
It’s absolutely true, because we don’t think about it in those terms.

You know, marketing tends to be very selfish.

And I mean, that in the sense of, you know, in my experience, and you know, we’ve all been guilty of this, we tend to think about it from the lens of what do I need to happen? What do I need to communicate? What do I need people to know? And so I make it all about me.

And I keep thinking about it from my perspective, whereas more intuitive, more empathetic marketers think about it from the sense of, what do my customers need to know? How does my audience benefit from this information? And so it’s thinking about marketing in that sort of flipped way.

And that’s not easy to do, especially if you don’t have a lot of that background information about your audience.

But marketing does tend to be a very selfish thing.

And, again, to go back to that comparison with a clinical trial, it doesn’t matter what you want.

All that matters is that you are working toward the hypothesis that you have set up. That has nothing to do with you; it has everything to do with the outcome of the drug, the intervention, whatever the thing is. It doesn’t care what you want to happen; the data is going to be the data.

And that is one of the big differences, obviously aside from the whole life-and-death thing, between a clinical trial and what we do in marketing. We tend to not really care if we’re getting the result we want, because we can just pivot and change course and do something different until we do get the result that we want.

And that’s not true of clinical trials, like the data is the data is the data.

Whereas in marketing, even though we’re borrowing from, you know, the structure of a clinical trial, we can still be selfish and start to pivot if we’re not getting the result that we want.

Christopher Penn 21:18
Should we be borrowing more from clinical trials?

Katie Robbert 21:23
You know, it really depends.

And so this is something that we say: process for the sake of process is not good process.

And so structure for the sake of structure doesn’t necessarily get you better results. Some kind of structure, some kind of foundation, is obviously good; being able to trace everything back to the question that you’re trying to answer, to some original hypothesis, is very good.

But the rigidity of a clinical trial could get in the way of being agile.

Because if you borrow too much from the structure of a clinical trial, then you’re putting yourself in this very stringent, unbendable box.

And the thing that’s really nice about marketing is that you can start off with a hypothesis and find out very quickly, okay, this is never gonna work, this is never going to see the light of day, let me just go ahead and spin up something different, or, you know, the data that I’m collecting isn’t going to get me to the answer.

Therefore, you know, let me try different kinds of data.

And so with a clinical trial, you do have those phase ones that are essentially a proof of concept.

In marketing, what we’ve talked about is that the exploratory data analysis is essentially that phase one.

But the big difference is, with the marketing exploratory data analysis, it’s much easier to pivot than it is in something like a clinical trial.

So yes, you can borrow, but I wouldn’t borrow so strictly that you are making yourself unproductive and unable to move forward.

Christopher Penn 23:13
Got it. So to recap Doug’s question of whether this is an A/B test:

The answer is no; in the form presented, it does not appear to be a good A/B test. A good test is a trial that has a hypothesis that’s clearly defined, so you know what you’re testing for. It has randomization, so that you’re eliminating biases as much as possible, and it might even be blinded.

Again, to the example of whether someone should be able to skip who they’re calling, the answer is no, not if you want it to be a good design.

So if you’ve got questions about an A/B test… oh, go ahead, Katie.

Katie Robbert 23:50
Well, the other thing that I would add to that is that you have, you know, a set of results that you actually do something with.

Christopher Penn 23:58
Come on.

That’s crazy talk.

Katie Robbert 24:00
Well, again, so what? What’s the point of testing if you’re not going to make a change?

Christopher Penn 24:05
Exactly.

If you’ve got questions about an A/B test you’re thinking about running, or you’re thinking about designing one and you want to ask about it, pop over to our free Slack group, Trust Insights dot AI slash analytics for marketers, where over 1700 other marketers are asking and answering analytics questions, beginner and advanced, all day long.

And wherever it is that you’re watching or listening, if you want to get the show on the channel of your choice, hop on over to Trust Insights dot AI slash TI podcast, where you can find every place that we publish the show.

Thanks for tuning in, and we’ll talk to you next time.

Want help solving your company’s data analytics and digital marketing problems? Visit TrustInsights.ai today and let us know how we can help you.


Need help with your marketing AI and analytics?

You might also enjoy:

Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!

Click here to subscribe now »

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.
