A playbook for machine learning projects that work
Author Eric Siegel says most leaders miss the boat when it comes to deploying the biggest breakthrough technology to date.
Eric Siegel sees business executives becoming overly transfixed by systems like ChatGPT and he wants them to snap out of it. Most of the benefits of artificial intelligence (AI) right now are coming not from hyped-up generative AI, but from its weighty older cousin, machine learning (ML).
The problem, says Siegel, a former Columbia University professor and best-selling author, is that companies usually get ML deployment wrong. His new book, The AI Playbook, offers readers a six-step methodology that he calls bizML to help organizations avoid the many potential pitfalls. Understanding it doesn’t require an advanced degree in data science.
The book title aims to get readers’ attention. They’ve all seen articles about AI, but they may not recognize that the underlying technology is actually machine learning. As Siegel says, “You have to start by bringing their attention to the topic under the terminology they’re already familiar with.”
The crux of the matter, he says, is for management to view each ML project as a business project, and not become enchanted by the technology itself. Each initiative must proceed from a concrete understanding of the ML use case (the process outcomes it will predict and how the organization will use the predictions) as well as how to measure the benefits.
Leading with business value may be standard for any major technology initiative. But Siegel may be the first to put together a detailed road map for the business practice of running a machine learning project. As a bonus, he thinks his bizML methodology for implementing ML can just as easily be used for many generative AI deployments.
We talked with Siegel about how to ground ML projects in reality.
SAP Insights: Why did you want to write this book?
Eric Siegel: I could see there was a need for a more formal methodology for how to run machine learning projects so that they successfully deploy. The book is first and foremost for non-data scientists: stakeholders who are responsible for delivering business value. They need to ramp up on a semi-technical understanding, which basically comes down to what to predict, how well their ML models can predict it, and how to change their operations accordingly.
Q: So, you don't use ML or AI just to do it. Is that right?
Siegel: That's right. Most of us on the technical side got into machine learning because of the cool factor. That was certainly the case for me, more than 30 years ago. You're going to use data from some limited number of previous cases to draw generalizations that pertain to new situations that have never before been encountered. Scientifically, that idea is awesome.
But the science fiction intrigue of ML pumps it up. Certainly, stakeholders understand they are running a business, and that any technology needs to serve business objectives. But the hype is so intense, it often overrides those purely practical impulses. Everybody is sort of seduced by the core technology. It's like being more excited about the rocket science than the launch of the rocket.
So we must start by defining the business objective – but that is only the first of six steps in the bizML framework.
Improving the business is about changing processes. Machine learning is an unmatched way to accomplish that. Prediction is the holy grail for improving almost any kind of large-scale operation in which you're driving hundreds of thousands of decisions a week, a day, an hour – whatever the scale is.
Q: In the book, you discuss the “ML Paradox”: for the technology to succeed, we need improvements in human understanding and leadership more than in the technology itself. Is that a major sticking point?
Siegel: The core technology is sound. The models that are generated from data tend to perform well and, most often, we can avoid any technical “gotchas.” The dire issue is that, in the end, the organization just isn't ready to act on the predictions – to operationalize them.
So, the bad news is already out there. A recent IBM survey found that the average return from AI projects is essentially zero – lower than the cost of capital. The good news is that we can address this problem – first by recognizing that it is not a problem with the technology itself.
Q: Talk about the concept that you call the accuracy fallacy. What do people misunderstand about the predictive accuracy of machine learning?
Siegel: The accuracy fallacy is one of the ways AI tends to be hyped in newspaper and magazine headlines. You’ll read that AI/ML systems are able to predict “accurately” all sorts of things, such as whether you're going to become a criminal or whether you're going to have a heart attack next week. It can be valuable to put odds on these kinds of human outcomes and behaviors, but we don't have clairvoyance – and neither do our computers – no matter how amazing and advanced the technology gets. There is a ceiling on the accuracy of such predictions.
With ML you can predict a lot better than just guessing. But you're not going to have super high confidence that this customer is definitely going to buy an ice cream cone next Wednesday or that the weather is going to be a certain way on a certain day, three months from now. There's a whole host of things that are literally impossible to predict with high confidence. Many human behaviors defy reliable, definitive prediction.
Q: What metrics do organizations need to assess the predictive performance of ML? In the book, you discuss several, including lift.
Siegel: How good is the model or algorithm at predicting the behavior we’re interested in? You have to put some numbers on it. Accuracy is usually a misleading measure.
Lift is a technical metric that can tell you how much better your model predicts something, such as who is most likely to respond to your marketing message, in comparison to random guesswork. But lift doesn’t give you any direct insight into how much value or profit was derived, or how many customers you saved, or whatever the business metric or business goal may be.
Instead, you have to convert the results into something like profit: If I use this model to target my marketing, this is how much it costs to contact everyone, and this is the average return we get when a customer responds. Then we can do the arithmetic and estimate the profit – not just how well it predicts or how often it predicts correctly.
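As a sketch of the conversion Siegel describes, the arithmetic might look like the following. All numbers, function names, and the per-contact economics are invented for illustration; they are not from the book.

```python
# Hypothetical illustration of lift and the conversion to profit.
# All figures below are made up for the example.

def lift(targeted_response_rate, baseline_response_rate):
    """How many times better the model-targeted group responds vs. random guesswork."""
    return targeted_response_rate / baseline_response_rate

def campaign_profit(n_contacted, response_rate, cost_per_contact, revenue_per_response):
    """Convert a response rate into an expected profit figure: revenue minus contact cost."""
    cost = n_contacted * cost_per_contact
    revenue = n_contacted * response_rate * revenue_per_response
    return revenue - cost

baseline_rate = 0.005   # 0.5% respond if we mail at random
model_rate = 0.015      # 1.5% respond among the model's top-ranked customers

print(lift(model_rate, baseline_rate))                       # 3.0: three times the baseline
print(campaign_profit(100_000, model_rate, 1.00, 200.00))    # 200000.0: expected profit
```

Note that lift alone (3x) says nothing about money; only after plugging in contact costs and revenue per response does the business value appear.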
For any given situation, the model could simply predict wrong. But there are two types of wrong, and sometimes the metrics that organizations use don’t distinguish between them: false positives and false negatives. This matters because, from a business standpoint, the costs of those two kinds of errors differ greatly. Usually, a false negative will cost a lot more than a false positive. Business metrics must integrate that information to strike the right balance between the errors. They’ll never be eliminated, but you can usually trade one off against the other.
In some cases, establishing the relative costs of these errors can be tough. How do you put a number on the cost of a heart attack going undetected, of misinformation staying on a social media platform, or of a legitimate e-mail message being routed to your spam folder?
The relative weight of those two types of errors, false positives and false negatives, is going to be in the system somewhere, at least implicitly. So even if it’s a hard call, it’s one that’s got to be made.
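One way to make that implicit weighting explicit is to score a model's mistakes by their business cost rather than by raw accuracy. This is a minimal sketch with invented cost figures, not a method prescribed in the book:

```python
# Hypothetical sketch: weighing false positives against false negatives explicitly.
# The dollar costs are invented; in practice they come from the business case.

COST_FALSE_POSITIVE = 5      # e.g., wasted outreach to a customer who was staying anyway
COST_FALSE_NEGATIVE = 200    # e.g., a churner we failed to flag and then lost

def error_cost(false_positives, false_negatives):
    """Total cost of a model's mistakes under the chosen business weighting."""
    return (false_positives * COST_FALSE_POSITIVE
            + false_negatives * COST_FALSE_NEGATIVE)

# Two models with the same total number of errors, but opposite mixes:
model_a = error_cost(false_positives=900, false_negatives=100)   # 24500
model_b = error_cost(false_positives=100, false_negatives=900)   # 180500
# Plain accuracy rates the two models identically; the cost-weighted
# metric shows model_a is far cheaper under this weighting.
```

The point of the sketch is Siegel's: even a rough, debatable pair of cost numbers forces the trade-off between the two error types into the open, instead of leaving it buried in the metric.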
Q: You say your bizML methodology can also be used when deploying a generative AI project. What would that look like?
Siegel: The book mostly focuses on predictive ML use cases – which provide the greatest opportunities for improving existing large-scale operations. With generative AI, by contrast, most of the use cases are focused on having the machine create new content, often a first draft. It's going to write for you; it's going to draw a picture for you. These are very different kinds of projects.
But even within the generative AI arena, the same concepts apply. Instead of just saying we need to start using, perhaps, ChatGPT, you should begin by answering a number of questions up front. Exactly how are you going to use this technology? Which employees, conducting which procedures, are going to use the output of a generative AI model as a first draft of their work? For which types of cases? How many times a day will they use it? So, broadly speaking, the project should follow a sequence of steps similar to bizML's steps.
Q: You end the book by reminding readers that a successful ML project starts with a successful pitch. Can you give us a brief example?
Siegel: The premise is simple: reframe ML projects as “operations-improvement projects that use ML.” Don’t let the hype get to you. Don’t lead with the scientific virtues and quantitative capabilities of the technology.
Instead, tell a simple story about how processes will improve. Something like, “Our direct mail is ineffective. Only half a percent respond. If we could increase that to 1.5% using machine learning, that would mean a projected $500,000 increase in revenue in return for our current marketing spend, tripling the ROI of marketing campaigns.”
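The arithmetic behind a pitch like that can be checked in a few lines. The response rates and the $500,000 figure come from the quote; the mailing size and revenue per response are assumptions chosen to make the numbers work out:

```python
# Checking the pitch arithmetic. Response rates (0.5% -> 1.5%) and the
# $500K increase are from the quote; n_mailed and revenue_per_response
# are illustrative assumptions.

n_mailed = 500_000
revenue_per_response = 100.00

baseline_revenue = n_mailed * 0.005 * revenue_per_response   # current 0.5% response
improved_revenue = n_mailed * 0.015 * revenue_per_response   # targeted 1.5% response

print(improved_revenue - baseline_revenue)    # 500000.0: the projected revenue increase
print(improved_revenue / baseline_revenue)    # 3.0: ROI triples at the same spend
```

Since the marketing spend stays fixed, tripling the response rate triples revenue, which is exactly why the pitch can claim a tripled ROI.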
Then ask what they think and answer their questions. Opening up a dialogue is critical.