Machine Learning and Predictive Analytics have been receiving a lot of attention lately! Without question, this is an exciting technology with extremely broad applicability. After all, who wouldn’t want to be able to predict the future? Still, with hype comes confusion, and there is a lot of confusion today about what exactly Machine Learning is and how to use it.
I have good news! There are really only 4 (yes, four) Machine Learning problems. For anyone who wants to explore the value of Machine Learning, it’s important to understand them, because the first step in any Machine Learning process is to figure out which of these problems you are trying to solve. Data Science teams address this question before they begin designing a Machine Learning model. If your problem does not fit into one of these buckets, forget the hype! You’re better off taking a simpler approach.
Classification – in this machine learning problem, we’re trying to figure out if some bit of data (an observation) represents something simple which we already understand (a Label). This label can either be a Yes or No decision (Two Class), or it can be one of a set of possible answers (Multi Class). In order for this to work well, you need to provide the Machine Learning model with labeled examples first. Applications include:
- Facial Recognition – is this picture an image of my customer?
- Voice Recognition – what word is represented by this sound?
- Handwriting Recognition – which letter in the alphabet does this image represent?
- Fraud Detection – is this transaction fraudulent?
- Medical Outcomes – will this person have a stroke in the next year?
- Proactive Maintenance – will this piece of machinery fail in the next 72 hours?
- Credit Default Risk – will this borrower default on their loan?
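To make the idea of "learning from examples" concrete, here is a minimal sketch of a Two Class classifier in plain Python. It uses a nearest-neighbor rule, which is only one of many possible techniques and is chosen here for brevity. The function name, the two features (transaction amount and hour of day), and the example data are all hypothetical.

```python
import math

def nearest_neighbor_label(examples, observation):
    """Return the label of the labeled example closest to the observation."""
    _, closest_label = min(
        examples, key=lambda ex: math.dist(ex[0], observation)
    )
    return closest_label

# Hypothetical labeled examples: (amount in dollars, hour of day) -> label
examples = [
    ((12.0, 14), "legitimate"),
    ((25.0, 10), "legitimate"),
    ((980.0, 3), "fraudulent"),
    ((1500.0, 2), "fraudulent"),
]

# A new, unlabeled observation: a large purchase at 4 in the morning
prediction = nearest_neighbor_label(examples, (1100.0, 4))
print(prediction)
```

A real fraud model would need far more examples and would scale the features so that dollar amounts do not swamp everything else, but the shape of the problem is the same: labeled examples in, a label out.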
Regression – in this machine learning problem, a Yes or No answer is not going to be enough. In order to solve this problem, the machine needs to predict a value (e.g. a price, a temperature, a measurement) by understanding the numeric relationship of that value to other values (or Factors). If you took Calculus, this might sound like a simple “Rate of Change” function: you’re on the right track. Just as with Classification, Regression problems need some examples in order to work well. Applications include:
- Cost Analysis – when is the best time to buy something?
- Demand Prediction – how many widgets will we sell next year?
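As a rough illustration of predicting a value from a Factor, the sketch below fits a straight line to a handful of examples using ordinary least squares, then uses that line to forecast. This is the simplest form of Regression, with a single factor. The function name and the numbers (advertising spend versus widgets sold) are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # the "rate of change" of y with respect to x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical examples: advertising spend (thousands of dollars) vs. widgets sold
spend = [1.0, 2.0, 3.0, 4.0]
sold = [110.0, 210.0, 310.0, 410.0]

slope, intercept = fit_line(spend, sold)

# Forecast widgets sold at a spend level we have not seen yet
forecast = slope * 5.0 + intercept
print(slope, intercept, forecast)
```

The slope here plays the role of the "Rate of Change" mentioned above: it is the number of additional widgets sold per additional thousand dollars of spend, as estimated from the examples.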
Clustering – this is where things get complicated! With the first two problems, we have examples we can use to “train” our machines to predict a label or a value, AND we can test them with labeled observations (known to Data Scientists as “Ground Truth”). But what if we don’t have a ground truth? The best we can do is identify clusters of similar observations. Fair warning: without ground truth, evaluating the results will be a challenge. Still, some applications include:
- Grouping of Content – Grouping Today’s News into Categories, or Documents into Topics
- Materials Classification – take a Raw Materials Master File and organize it into a taxonomy
- Customer Segmentation – identify similar customers based upon purchase behavior
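Here is a small sketch of what "identifying clusters" can look like in practice, using k-means, one common clustering technique among several. Note that no labels are supplied: the algorithm only sees the observations and the number of clusters we ask for. The function name, the naive choice of starting centroids, and the customer data (orders per year, average order value) are assumptions made for illustration.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (assignments, centroids)."""
    # Naive deterministic start: use the first k points as centroids.
    centroids = list(points[:k])
    assignments = []
    for _ in range(iterations):
        # Step 1: assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return assignments, centroids

# Hypothetical customers: (orders per year, average order value in dollars)
customers = [(2, 20), (40, 200), (3, 25), (38, 210), (1, 18), (42, 190)]

assignments, centroids = k_means(customers, k=2)
print(assignments)
print(centroids)
```

The output tells you which customers ended up together, but not what each group means. Deciding that one cluster is "occasional shoppers" and the other is "high-value regulars" is still up to you, which is exactly the evaluation challenge that comes with having no ground truth.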
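Recommender – have you ever been on a website which presented a recommendation of something you might “Like”? In this machine learning problem, we’re trying to predict which items a person will prefer, based on their own past behavior and the behavior of similar people. Movie recommendations on Netflix, product recommendations on Amazon, or advertisements on your apps: if you are familiar with the internet, you probably understand the premise here.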
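For a feel of how a recommendation can be produced, the sketch below finds the person whose "Likes" overlap most with yours and suggests whatever they liked that you have not seen. This is a deliberately tiny version of one approach (often called collaborative filtering), not a description of how Netflix or Amazon actually do it. The function name, the people, and their likes are invented.

```python
def recommend(likes, user):
    """Suggest items liked by the most similar other user but not yet by `user`."""
    liked = likes[user]
    others = [u for u in likes if u != user]
    # Similarity here is simply the number of liked items in common.
    most_similar = max(others, key=lambda u: len(liked & likes[u]))
    return sorted(likes[most_similar] - liked)

# Hypothetical viewing history: person -> set of movies they liked
likes = {
    "ana": {"Alien", "Blade Runner", "Arrival"},
    "ben": {"Alien", "Blade Runner", "Dune"},
    "cat": {"Notting Hill", "Amelie"},
}

suggestions = recommend(likes, "ana")
print(suggestions)
```

Production recommenders work with millions of people and items and use much richer measures of similarity, but the premise is the one above: people who agreed in the past are likely to agree in the future.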
That’s it. Now you know how to recognize a problem which Machine Learning can help you with. If your business problem does not fall into one of these four, you don’t need a machine learning model to solve it. More importantly, if you know the factors which drive a business outcome, just build a model in Excel – you don’t need a Data Science team for that.