You Ask, I Answer: Data Quality and AI?

Warning: this content is older than 365 days. It may be out of date and no longer relevant.

You Ask, I Answer: Data Quality and AI?

Sampurnakumar asks, “What level of data quality do you need for AI to be successful? Does it require the best data and best data usage to solve problems?”

One of the problems with AI, conceptually, is that it’s seen as a mysterious entity that we don’t fully understand. Any qualified AI practitioner should debunk this notion immediately, because AI as it is currently used in the commercial space is anything but mysterious. Substitute the word “spreadsheet” for “AI” and see how the question changes, because at its heart, AI is just math.


Watch this video on YouTube.



Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, Sampurnakumar (I think I got that right) asks, “What level of data quality do you need for AI to be successful? Does it require the best data and the best data usage to solve problems?”

So one of the problems with AI today, conceptually, is that it’s seen as this mysterious entity that we don’t fully understand, that no one could understand what’s going on in the machine, which is patently untrue.

Any qualified AI practitioner should immediately debunk this concept that we don’t know what’s going on inside the AI.

AI, at least as it is being used in commercial business applications, is not a mystery.

I’m sure there are some things in academia that people are still researching (hence, it’s research), but for commercial use, for the things we’re doing in business, if you don’t know what’s going on inside the box, you did it wrong.

With AI, particularly traditional machine learning, there’s no excuse for not knowing what the machine is doing.

So when we take the phrase “AI” away, one of the tricks I like to use is to substitute the word “spreadsheet”, because at its core AI is just math. It’s nothing more than math, statistics, and probability.

Now re-ask the question: what level of data quality do you need for a spreadsheet to be successful? That’s a lot easier to understand.

You need a minimum level of data quality, otherwise your spreadsheets will be wrong. And if your spreadsheets are wrong, you’re going to make bad decisions.

So the question is less about the data quality you need for AI to work and more about the data quality you need to get the outcome you’re looking for. AI does not solve new problems. AI solves existing business, math, and marketing problems that we don’t have the scale or the time to handle, but that we’ve tried to handle before.

Take image classification. That’s something we do all day: you see a crowded store and you recognize the face of a person you know. That’s image recognition, and you do it already.

So you’re not doing anything brand new. AI is just doing it faster and at a greater scale.

Or consider a complex mathematical question.

You could do a 300-variable multiple regression analysis by hand, but you would not do it quickly.

And it wouldn’t be a lot of fun, but it can be done.

It is something that a human being could do.

It is just not efficient for a human being to do so.
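Here is a minimal sketch of multiple regression done the way you could do it by hand, to make the “it’s just math” point concrete. It uses ordinary least squares solved through the normal equations, in plain Python with no libraries. It has two made-up variables instead of 300. The arithmetic is the same, there is just more of it, and that is exactly the kind of work a machine does faster than a person. The data and function names are hypothetical and for illustration only.

```python
# Minimal ordinary-least-squares sketch in plain Python (no libraries).
# Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
# The data and function names are hypothetical, for illustration only.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Each output cell is the dot product of a row of a with a column of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve the square system a x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # scale the pivot row so the pivot is 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] for y ~ intercept + b1*x1 + b2*x2 + ..."""
    x = [[1.0] + list(r) for r in rows]  # leading column of ones for the intercept
    xt = transpose(x)
    xtx = matmul(xt, x)
    xty = [row[0] for row in matmul(xt, [[v] for v in y])]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
print([round(c, 6) for c in fit_ols(rows, y)])  # → [2.0, 3.0, -1.0]
```

In practice you would use a library routine such as NumPy’s `numpy.linalg.lstsq` rather than a hand-rolled solver. The sketch only shows that nothing inside the box is mysterious.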

So think about AI, and the data quality and data usage it needs, this way.

How would a human tackle the problem? What level of data quality would a human being need to make it work? If you had a spreadsheet open, how would you solve the problem with that spreadsheet, and what data quality would you need? A lot of the time, data quality comes down to risk assessment.

What level of error are you comfortable with? What level of error is acceptable? If you’re doing marketing campaign targeting, being off by plus or minus 5% probably isn’t going to break the bank, unless you’re deploying a multibillion-dollar marketing campaign. If you’re dropping $1,000 on a Facebook ad, you’re probably comfortable with a fairly wide margin of error.

On the other hand, suppose you’re building medical devices, and the device you’re programming and building a model for will be implanted in thousands of human beings. Then your margin of error is really small, or it should be if you’re an ethical practitioner. You want as little error as possible, and therefore as few lives at risk as possible. The standard is much higher.

There’s a much lower tolerance for error in cases like that, as it should be.
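One way to make “what level of error are you comfortable with?” concrete is a three-step check:

* hand-audit a sample of records;
* estimate the error rate with a margin of error;
* compare the pessimistic end of that range against a tolerance for the use case.

The sketch below assumes a normal-approximation 95% interval. The tolerance values and function names are hypothetical. Only the rough 5% marketing figure comes from this episode; the 0.1% medical figure is an illustrative stand-in for “really small”.

```python
import math

# Hypothetical tolerances (assumptions): ~5% echoes the marketing example above;
# 0.1% is an illustrative stand-in for the "really small" medical-device margin.
TOLERANCES = {"ad_campaign": 0.05, "medical_device": 0.001}

def error_rate_interval(errors, sample_size, z=1.96):
    """Observed error rate plus an approximate 95% range (normal approximation)."""
    p = errors / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

def within_tolerance(errors, sample_size, use_case):
    """Accept the data only if even the pessimistic (upper) estimate fits the tolerance."""
    _, _, upper = error_rate_interval(errors, sample_size)
    return upper <= TOLERANCES[use_case]

# Hypothetical audit: 12 bad records found in a hand-checked sample of 1,000.
print(within_tolerance(12, 1000, "ad_campaign"))     # True
print(within_tolerance(12, 1000, "medical_device"))  # False
```

The same data can be good enough for one use and unacceptable for another. The normal approximation gets shaky when errors are rare or samples are small. A Wilson score interval is a common, more robust alternative in that situation.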

So data quality, at its core, is really about risk mitigation.

What level of risk are you comfortable with? What level of risk is your organization comfortable with? How wrong are you allowed to be? Remember, when you take the data you have and feed it to AI, all it’s doing is processing that same data.

It’s just doing it at a larger scale, so the margin of error may be the same.

It might be plus or minus 3%.

It’s just that instead of a spreadsheet with 1,000 rows, you may be looking at a dataset with a billion rows. And 5% of a billion is a much larger absolute number than 5% of 1,000.
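The scale point is plain arithmetic: the same percentage error produces far more bad rows in a bigger dataset. Here is a tiny sketch; the function name is hypothetical.

```python
def absolute_errors(rows, error_pct):
    """Number of rows expected to be wrong at a given percentage error rate (hypothetical helper)."""
    return rows * error_pct / 100

# Same 5% error rate, very different absolute damage.
for rows in (1_000, 1_000_000_000):
    print(f"{rows:>13,} rows at 5% error -> {absolute_errors(rows, 5):,.0f} bad rows")
```

At 1,000 rows, 5% is 50 bad records, which you could fix by hand. At a billion rows it is 50 million.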

But if you’re comfortable with that level of error, great. Now, one of the things AI is capable of doing, because again it’s all just math, is identifying very quickly whether something has greater error than we thought. Say you have a piece of software you’ve developed, or a dataset you’re working with. It reports an MAE (mean absolute error), an MSE (mean squared error), an RMSE (root mean squared error), or another error metric such as area under the curve (AUC), and the numbers are wildly off.
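For readers who haven’t met these metrics, here is a minimal, dependency-free sketch of how MAE, MSE, RMSE, and ROC AUC are computed. The example values are made up. This is for illustration only; in practice, scikit-learn provides `mean_absolute_error`, `mean_squared_error`, and `roc_auc_score` in `sklearn.metrics`.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error: squaring punishes big misses more than small ones."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: MSE brought back to the original units."""
    return math.sqrt(mse(actual, predicted))

def auc(labels, scores):
    """ROC AUC: chance a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up regression example: misses of 1, 0, 2, and 0.
actual, predicted = [10, 12, 9, 15], [11, 12, 7, 15]
print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))

# Made-up classification example: 3 of 4 positive/negative pairs ranked correctly.
print(auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.3]))
```

When a number looks wildly off, one quick sanity check is to compare the model’s error with a naive baseline, such as always predicting the mean. A model that can’t beat the baseline often points to a problem in the data rather than the math.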

You’re like, “Huh, that doesn’t look right.”

When you run into situations like that, that’s an opportunity to use these tools and say, “I think there’s more wrong with this data than we thought.”

We don’t do that more because most practitioners who work with data, at least in marketing, have no formal background of any kind in exploratory data analysis. That’s the ability to look at a dataset and say, “Yeah, there are some things wrong here.”

That’s a skill people lack.

And it’s certainly a skill I would like to see more marketers adopt and embrace: using exploratory data analysis to find out how much error is in the data to begin with. If you don’t do that, you can never know.
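Here is a minimal sketch of a data-quality profile as a starting point for exploratory data analysis. It counts missing values, exact duplicate rows, and values outside an expected range. The records, field names, and valid ranges are all hypothetical assumptions; a real dataset needs its own rules. At scale, pandas covers the same ground with `DataFrame.isna`, `DataFrame.duplicated`, and `DataFrame.describe`.

```python
from collections import Counter

# Hypothetical marketing records; field names and values are assumptions.
records = [
    {"email": "a@example.com", "age": 34, "spend": 120.0},
    {"email": "b@example.com", "age": None, "spend": 80.0},
    {"email": "a@example.com", "age": 34, "spend": 120.0},  # exact duplicate
    {"email": "c@example.com", "age": 212, "spend": -5.0},  # implausible values
]

# Hypothetical plausibility rules: (lowest valid, highest valid), inclusive.
VALID_RANGES = {"age": (13, 120), "spend": (0.0, 1_000_000.0)}

def profile(rows, valid_ranges):
    """Summarize basic data-quality problems before any modeling happens."""
    report = {"rows": len(rows)}
    fields = sorted({k for r in rows for k in r})
    # Missing values per field (absent keys and None both count).
    report["missing"] = {f: sum(1 for r in rows if r.get(f) is None) for f in fields}
    # Exact duplicates: every extra copy of an identical row counts once.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    report["duplicates"] = sum(c - 1 for c in counts.values() if c > 1)
    # Non-missing values that fall outside the expected range.
    report["out_of_range"] = {
        f: sum(1 for r in rows if r.get(f) is not None and not lo <= r[f] <= hi)
        for f, (lo, hi) in valid_ranges.items()
    }
    return report

print(profile(records, VALID_RANGES))
```

Even on four rows, this turns “there are some things wrong here” into countable problems, and those counts are what feed the risk question above.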

At that point, you have the worst-case scenario: unknown risk. You don’t know what the risks are.

And that’s very dangerous, because it could be 2%.

It could be 200%.

And you could be in for a really bad time, as the South Park character says. So that’s the answer to the question: substitute the word “spreadsheet”. Then ask yourself what level of risk you’re comfortable with in your data, and use that to decide whether your data is good enough for artificial intelligence and machine learning. Thanks for the question, and please leave your follow-up questions below.

As always, please subscribe to the YouTube channel and the newsletter.

I’ll talk to you soon.

Take care. Want help solving your company’s data analytics and digital marketing problems? Visit TrustInsights.ai and let us know how we can help you.

