Archive for Predictive Analytics

Our 5 Rules of Data Science

In manufacturing, the better the raw materials, the better the product. The same goes for data science: a team cannot be effective unless the right raw materials are available to it. In this realm, data is the raw material from which a prediction is produced. However, raw materials alone are not sufficient. Business people who oversee machine learning teams must demand that best practices be applied; otherwise, investments in machine learning will produce dubious business results. These best practices can be summarized in our five rules of data science.

For the purpose of illustration, let’s assume the data science problem our team is working on is related to the predictive maintenance of equipment on a manufacturing floor. Our team is working on helping the firm predict equipment failure, so that operations can replace the equipment before it impacts the manufacturing process.

1. Have a Sharp Question

A sharp question is specific and unambiguous. Computers do not appreciate nuance: they cannot sort events into yes/no buckets if the question is “Is component X ready to fail?” Nor does the question need to concern itself with causes. Computers do not ask why; they calculate probability based upon correlation. “Will component X overheat?” is a question posed by a human who believes that heat contributes to equipment failure. A better question is: “Will component X fail in the next 30 minutes?”
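A sharp question can be turned directly into a yes/no label for every historical sensor reading, which is what a computer needs in order to learn. The Python sketch below assumes readings and recorded failures are available as timestamps; the helper name `label_readings` and the sample times are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Prediction horizon taken from the sharp question: "fail in the next 30 minutes".
HORIZON = timedelta(minutes=30)


def label_readings(reading_times, failure_times, horizon=HORIZON):
    """Return 1 for each reading followed by a failure within `horizon`, else 0."""
    labels = []
    for t in reading_times:
        # Positive if some failure happens after the reading and no later than t + horizon.
        fails_soon = any(t < f <= t + horizon for f in failure_times)
        labels.append(1 if fails_soon else 0)
    return labels


# Hypothetical sample data for one component.
readings = [datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 8, 45), datetime(2024, 1, 1, 9, 10)]
failures = [datetime(2024, 1, 1, 9, 20)]
print(label_readings(readings, failures))  # [0, 0, 1]
```

A vaguer question such as “Is component X ready to fail?” offers no rule like this for deciding which label each reading should get.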

2. Measure at the Right Level

Supervised learning requires real examples from which a computer can learn. The data you use to produce a successful machine learning model must include cases where failure has occurred, as well as cases where equipment continued to operate smoothly. We must be able to unambiguously identify which events were failure events; otherwise, we will not be able to train the machine learning model to classify data correctly.

3. Make Sure Your Data is Accurate

Did a failure really occur? If not, the machine learning model will not produce accurate results. Computers are naïve: they believe what we tell them. Data science teams should be more skeptical, particularly when they believe they have made a breakthrough discovery after months of false starts. Data science leaders should avoid getting caught up in the irrational exuberance of a model that appears to provide new insight. As in any scientific endeavor, test your assumptions, beginning with the accuracy and reliability of the observations used to create the model.
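A simple first test of accuracy is to cross-check every failure in the training data against an independent record. The sketch below assumes a maintenance work-order log exists and that matching on asset ID and date is good enough; both the data layout and the helper name are hypothetical.

```python
def split_confirmed_failures(flagged_failures, work_orders):
    """Separate flagged failures into those confirmed by a work order and those not.

    Both arguments are lists of (asset_id, date_string) tuples (hypothetical layout).
    """
    confirmed = set(work_orders)
    kept = [f for f in flagged_failures if f in confirmed]
    suspect = [f for f in flagged_failures if f not in confirmed]
    return kept, suspect


# Hypothetical example: the sensor data flags two failures, maintenance confirms one.
flagged = [("pump-7", "2024-03-01"), ("pump-9", "2024-03-02")]
orders = [("pump-7", "2024-03-01"), ("press-2", "2024-03-05")]
kept, suspect = split_confirmed_failures(flagged, orders)
print("confirmed:", kept)
print("needs review:", suspect)
```

Unconfirmed records are not necessarily wrong, but they deserve a human look before they are used to train a model.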

4. Make Sure Your Data is Connected

The data used to train your model may be anonymized, because the factors that correlate closely with machine failure are measurements, not identifiers. However, once the model is in use, the new data must be connected to the real world; otherwise, you will not be able to take action. If you have no central, authoritative record of “things”, you may need to develop a master data management solution before your Internet of Things (IoT) predictive maintenance solution can yield value. Your response to a prediction should also be connected: once a prediction of failure has been obtained, management should already know what needs to happen, so that the insight translates into swift action.
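The sketch below shows what “connected” can mean at scoring time: anonymous predictions keyed by sensor ID are joined to a master record so that someone knows which machine to service. The asset master, the 0.8 action threshold and the work-item wording are all hypothetical.

```python
# Hypothetical asset master: the central, authoritative record of "things".
ASSET_MASTER = {
    "sensor-101": {"asset": "Press 4", "line": "Line A", "owner": "Maintenance Team 2"},
    "sensor-102": {"asset": "Conveyor 1", "line": "Line B", "owner": "Maintenance Team 1"},
}

# Assumed action threshold; choosing it is a business decision, not a given.
ACTION_THRESHOLD = 0.8


def to_work_items(predictions, master, threshold=ACTION_THRESHOLD):
    """Connect anonymous (sensor_id, failure_probability) predictions to real assets.

    Returns (work_items, unknown_sensors); unknown sensors have no master record,
    so nobody can be told which machine to service.
    """
    work_items, unknown = [], []
    for sensor_id, probability in predictions:
        if probability < threshold:
            continue  # not likely enough to act on
        record = master.get(sensor_id)
        if record is None:
            unknown.append(sensor_id)
            continue
        work_items.append(
            f"Inspect {record['asset']} on {record['line']} "
            f"(notify {record['owner']}), p={probability:.2f}"
        )
    return work_items, unknown


# Hypothetical model output from the scoring pipeline.
predictions = [("sensor-101", 0.91), ("sensor-102", 0.12), ("sensor-999", 0.95)]
items, unknown = to_work_items(predictions, ASSET_MASTER)
print(items)
print("no master record:", unknown)
```

The `unknown` list is the practical cost of missing master data: a confident prediction that nobody can act on.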

5. Make Sure You Have Enough Data

The accuracy of predictions improves with more data. Make sure you have sufficient examples of both positive and negative outcomes; otherwise, it will be difficult to be certain that you are truly gaining information from the exercise.
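Before training, it is worth simply counting the examples in each outcome. The sketch below assumes labels of 0 (no failure) and 1 (failure); the minimum of 50 per class is an arbitrary placeholder, since the right number depends on the problem.

```python
from collections import Counter

# Assumed minimum per class; the right number depends on the problem and is not given in the text.
MIN_EXAMPLES_PER_CLASS = 50


def missing_classes(labels, minimum=MIN_EXAMPLES_PER_CLASS):
    """Return per-class counts and the classes (0 = no failure, 1 = failure) below the minimum."""
    counts = Counter(labels)
    short = [cls for cls in (0, 1) if counts[cls] < minimum]
    return counts, short


# Hypothetical history: 970 normal readings and 30 failures.
labels = [0] * 970 + [1] * 30
counts, short = missing_classes(labels)
print(dict(counts), "classes needing more examples:", short)
```

Failures are usually rare, so the failure class is typically the one that runs short.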

The benefits of predictive maintenance, and other applications of machine learning, are being embraced by businesses everywhere. For some, the process may appear a bit mysterious, but it needn’t be. The goal is to create a model which, when fed real-life data, improves the decision-making of the humans involved in the process. To achieve this, data science teams need the right data and the right business problem to solve. Management should make sure that these five rules are satisfied before investing in data science activities.

Not sure if you have the right raw materials? Talk to BlumShapiro Consulting about your machine learning ambitions. Our technology team is building next-generation predictive analytics solutions that connect to the Internet of Things. We are helping our clients along each step of their digital transformation journey.

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.

Using Real Time Data Analytics and Visualization Tools to Drive Your Business Forward

Business leaders need timely information about the operations and profitability of the businesses they manage to help them make informed decisions. But when information delivery is delayed, decision makers lose precious time to adjust and respond to changing market conditions, customer preferences, supplier issues or all three. When thinking about any business analytics solution, a critical question to ask is: how frequently can we (or should we) update the underlying data? Often, the first answer from business stakeholders is “as frequently as possible.” The concept of “real time analytics,” with data provided up-to-the-minute, is usually quite attractive. But there may be some confusion about what this really means.

While the term real time analytics does refer to data which is frequently changing, it is not the same as simply refreshing data frequently. Traditional analytics packages which take advantage of data marts, data warehouses and data cubes are often collectively referred to as a Decision Support System (DSS). A DSS helps business analysts, management and ownership understand historical trends in their business, perform root cause analysis and make strategic decisions. Whereas a DSS aggregates and analyzes sales, costs and other transactions, a real time analytics system ingests and processes events. One can imagine a $25 million business recording 10,000 transactions a day. One can imagine that same business recording events on its website: logins, searches, shopping cart adds, shopping cart deletes and product image zooms. If the business is 100% online, how many events would that be? The answer may astonish you.

Why Real Time Analytics?

DSS solutions answer questions such as “What was our net income last month?”, “What was our net income compared to the same month last year?” or “Which customers were most profitable last month?” Real time analytics answers questions such as “Is the customer experience positive right now?” or “How can we optimize this transaction right now?” In the retail industry, listening to social media channels to hear what customers are saying about their experience in your stores can drive service level adjustments or pricing promotions. When that analysis is real-time, store managers can make adjustments the same day to optimize profitability. Some examples:

  1. Social media sentiment analysis – addressing customer satisfaction concerns
  2. Eliminating business disruption costs with equipment maintenance analytics
  3. Promotion and marketing optimization with web and mobile analytics
  4. Product recommendations throughout the shopping experience, online or “brick and mortar”
  5. Improved health care services with real time patient health metrics from wearable technology

In today’s world, customers expect world class service. Implicit in that expectation is the assumption that companies with whom they do business “know them”, anticipate their needs and respond to them. That’s easy to say, but harder to execute. Companies that must meet that expectation need technology leaders who are aware of three concepts critical to making real time analytics a reality.

The first is the Internet of Things, or IoT. The velocity and volume of data generated by mobile devices, social media, factory floor sensors and similar sources is the basis for real time analytics. “Internet of Things” refers to devices or sensors which are connected to the internet, providing data about their usage or simply about the physical environment in which they operate. Like social media and mobile devices, IoT sensors can generate enormous volumes of data very, very quickly; this is the “big data” phenomenon.

The second is Cloud Computing. Handling the massive scale of IoT and big data requires cloud-scale data storage and cloud-scale data processing. Unless your company’s name is Google, Amazon or Microsoft, you probably cannot keep up on your own. So, to achieve real time analytics, you must embrace cloud computing.

The third is Intelligent Systems. IBM’s “Watson” computer achieved a significant milestone by outperforming human champions on the quiz show Jeopardy! Since then, companies have been integrating artificial intelligence (AI) into large scale systems. AI in this sense is simply a mathematical model which calculates the probability that data represents something a human would recognize: a supplier disruption, a dissatisfied customer about to cancel an order, an equipment breakdown. Using real time data, machine learning models can recognize events which are about to occur. From there, they can automate a response or raise an alert to the humans involved in the process. Intelligent systems help humans make nimble adjustments to improve the bottom line.
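To make “a mathematical model which calculates the probability” concrete, the sketch below uses a logistic function, one common choice among many, to turn two sensor readings into a failure probability and then picks a response: do nothing, alert a human, or automate. The feature names, weights, bias and both thresholds are invented for illustration; in practice they would come from training and from business rules.

```python
import math

# Hypothetical coefficients, standing in for what a trained model would learn.
WEIGHTS = {"temperature_c": 0.08, "vibration_mm_s": 0.6}
BIAS = -9.0


def failure_probability(reading):
    """Logistic model: squash a weighted sum of the readings into a probability in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * reading[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def respond(reading, alert_at=0.5, automate_at=0.9):
    """Choose a response from the probability; both thresholds are assumptions."""
    p = failure_probability(reading)
    if p >= automate_at:
        return "automate", p  # e.g. trigger a shutdown or work order automatically
    if p >= alert_at:
        return "alert", p  # raise an alert to the humans involved
    return "none", p


for reading in (
    {"temperature_c": 60, "vibration_mm_s": 2},
    {"temperature_c": 90, "vibration_mm_s": 6},
    {"temperature_c": 100, "vibration_mm_s": 9},
):
    action, p = respond(reading)
    print(reading, action, round(p, 3))
```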

What technologies will my company need to make this happen?

From a technology perspective, a clear understanding of cloud computing is essential. When evaluating a cloud platform, CIOs should look for breadth of capability and support for multiple frameworks. As a Microsoft Partner, BlumShapiro Consulting works with Microsoft Azure and its Cortana Intelligence platform. This gives our clients cloud scale, low cost and a wide variety of real time and big data processing options.

This diagram describes the Azure resources which comprise Cortana Intelligence. The most relevant resources for real time analytics are:

  1. Event Hubs ingest high-velocity streaming data sent by event providers (e.g., sensors and devices)
  2. Data Lake Store provides low-cost cloud storage with no practical limits
  3. Stream Analytics performs in-flight processing of streaming data
  4. Machine Learning, or AzureML, supports the design, evaluation and integration of predictive models into the real-time pipeline
  5. Cognitive Services are out-of-the-box artificial intelligence services addressing a broad range of common machine intelligence scenarios
  6. Power BI supports streaming datasets, making them visible in dashboards
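In-flight processing usually means aggregating events over time windows as they arrive. Stream Analytics expresses this in its SQL-like query language (for example with a `TumblingWindow`, which splits the stream into fixed, non-overlapping intervals). The plain-Python sketch below only illustrates the tumbling-window idea on a small batch of events; it is not the Stream Analytics API, and the event layout (timestamp in seconds, device ID, reading) is an assumption.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds=60):
    """Average readings per device over fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, device_id, value) tuples (assumed layout).
    Returns {(window_start, device_id): average_value}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, device, value in events:
        window_start = ts - (ts % window_seconds)  # align each event to the start of its window
        sums[(window_start, device)] += value
        counts[(window_start, device)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical events from two devices.
events = [(0, "d1", 10.0), (30, "d1", 20.0), (65, "d1", 40.0), (10, "d2", 5.0)]
print(tumbling_window_avg(events))
```

A real stream processor computes these aggregates incrementally as events arrive, rather than over a finished list as this sketch does.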

Four Steps to Get Started with Real Time Analytics

Start with the Eye Candy – If you do not have a dashboard tool which supports real-time data streaming, consider solutions such as Power BI. Even if you are not ready to implement an IoT solution, Power BI makes monitoring social media or customer marketing campaigns much more feasible. Power BI can connect to databases, data marts, data warehouses and data cubes, and is valuable as a dashboard and visualization tool for existing DSS solutions. Without visualization, it will be very difficult to turn any kind of data, slow or fast, into human insight and action.

Get to the Cloud – For most companies, the low cost of cloud storage and the scale of cloud processing are the only mechanisms that make real time analytics economically feasible. Learn how investing in technologies like cloud computing can really help move your business forward.

Embrace Machine Intelligence – To make intelligent systems a reality, you will need to understand machine learning technologies, if only at a high level. Historically, this has meant building a team of data scientists, many of whom have PhDs in Mathematics or Statistics, working with open source tools like R or Python. Today, machine learning is more accessible than it has ever been. AzureML helps to fast-track both the evaluation and operationalization of predictive models.

Find the Real-Time Opportunity – As the technology leaders in the organization, CIOs will need to work closely with other business leaders to understand where real-time information can increase revenue, decrease costs or both. This may require imagination. Start with the question: what would we like to know faster? If we knew our customer was going to do this sooner, how would we respond? If we knew our equipment was going to fail sooner, how would we respond? If we knew there was an opportunity to sell more, how would we respond?

3 Tips to Jump Start your Data Science Plan

Are you looking to form a Data Science capability at your company?

If you answered yes, then you probably already get the Machine Learning concept (The 4 Machine Learning Problems).  Maybe you are coming from either a Statistics or a Computer Science background.  Either way, you see the potential of Data Science and Predictive Analytics, and you’re ready to demonstrate some tangible benefit to management.

How are you getting started?  I’m hearing about two core hurdles:

  1. We’re looking for a great business problem to solve, one which could reasonably be solved with data the business already collects
  2. Our internal resources have very little practical experience working on a formal data science team, and don’t understand how it aligns with more traditional project teams

Time to value is critical, but you need to achieve it with a formal process for managing risk, one which can be communicated inside and outside the team.  Here are the things you want to have in place in order to launch your first project.

Establish Your Data Science Methodology – every project has a project plan, and data science projects are no different.  What should the data science plan look like?  Several teams of very smart people have already asked this question and independently arrived at the same conclusion.  My favorite is the “Cross Industry Standard Process for Data Mining” (CRISP-DM) because it calls out the need for a basic Business Understanding of the problem first.  Basically, the process has 6 phases:

  1. Set the Business Objectives
  2. Find the Data
  3. Prep and Cleanse the Data
  4. Do the Machine Learning Work
  5. Evaluate the Model You Created – does it meet the Business Objectives?
  6. Deploy the Model

Need a picture?  Note the backwards arrows – Data Science is an iterative process.
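The backward arrows carry the key idea, so here is that iteration expressed as a small Python sketch. It simplifies CRISP-DM to a single loop from evaluation back to the business objectives (the full process also allows steps back between intermediate phases), and `meets_objectives` is a hypothetical stand-in for a real evaluation against the business objectives.

```python
# The six phases as listed in the post.
PHASES = [
    "Set the Business Objectives",
    "Find the Data",
    "Prep and Cleanse the Data",
    "Do the Machine Learning Work",
    "Evaluate the Model You Created",
    "Deploy the Model",
]


def run_project(meets_objectives, max_iterations=3):
    """Walk phases 1-5, deploy only if evaluation passes, otherwise start over.

    meets_objectives: hypothetical callable taking the iteration number and
    returning True when the model meets the business objectives.
    Returns a log of (iteration, phase) pairs.
    """
    log = []
    for iteration in range(1, max_iterations + 1):
        for phase in PHASES[:-1]:  # everything up to and including evaluation
            log.append((iteration, phase))
        if meets_objectives(iteration):
            log.append((iteration, PHASES[-1]))  # deploy
            return log
    return log  # objectives never met, so the model is never deployed


# Example: the first model misses the objectives, the second one meets them.
for step in run_project(lambda iteration: iteration >= 2):
    print(step)
```

The `max_iterations` cap is an assumption that mirrors a real project constraint: at some point the team reports back that the data cannot support the objective.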

Assess your Data Capabilities – Data Science needs Data.  Teams that try to predict outcomes without relevant data are set up for failure.  An example: let’s say that you would like to forecast demand for your products in order to reduce your inventory.  You might start with basic sales data and find that you are not getting the level of prediction accuracy you expected.  What other factors might be driving demand?  Customer Satisfaction might be one you decide to include.  But what if your company is not measuring customer satisfaction in any quantifiable way?  Data Science leaders need to understand the capabilities of their company (in effect, the Data Science customer) with respect to data assets, in order to determine which business problems are ripe for prediction.

Outsource the Team – Data Science requires a very specialized set of skills.  You probably have some of those skills yourself: Computer Science, Statistics and an understanding of the principles behind Machine Learning.  These three are important, but equally important is Business and Domain Knowledge.  Do you have a team of resources which possess all four?  If you are working with a technology provider who already understands your business and has demonstrated capability in delivering data science value, then outsourcing the work to that team becomes very attractive.  If you don’t have such a resource, consider a business and technology consulting partner such as BlumShapiro Consulting.  Provided you already understand the CRISP-DM process, you’ll be able to effectively manage a seasoned team of business and data science pros.

Can Data Science increase your bottom line?  Improve Customer Loyalty?  Drive down costs?  Yes, it can, provided you have a methodology to manage the work as a project, data to support it and a capable team.  If you’re convinced the opportunity is there, follow these tips and Data Science will have a strategic role within your company after your first big win!

The 4 Machine Learning Problems, Explained

Machine Learning and Predictive Analytics have been receiving a lot of attention lately!  Without question, this is an exciting technology with extremely broad applicability.  After all, who wouldn’t want to be able to predict the future?  Still, with hype comes confusion, and there is a lot of confusion today about what exactly Machine Learning is and how to use it.

I have good news!  There are really only 4 (yes, four) Machine Learning problems.  For anyone who wants to explore the value of Machine Learning, it’s important to understand them, because the first step in any Machine Learning process is to figure out which of these problems you are trying to solve.  Data Science teams address this question before they begin designing a Machine Learning model.  If your problem does not fit into one of these buckets, forget the hype! You’re better off taking a simpler approach.

Classification – in this machine learning problem, we’re trying to figure out if some bit of data (an observation) represents something simple which we already understand (a Label).  This label can either be a Yes or No decision (Two-Class), or it can be one of a set of possible answers (Multi-Class).  In order for this to work well, you need to provide the Machine Learning model with labeled examples first.  Applications include:

  1. Facial Recognition – is this picture an image of my customer?
  2. Voice Recognition – what word is represented by this sound?
  3. Handwriting Recognition – which letter in the alphabet does this image represent?
  4. Fraud Detection – is this transaction fraudulent?
  5. Medical Outcomes – will this person have a stroke in the next year?
  6. Proactive Maintenance – will this piece of machinery fail in the next 72 hours?
  7. Credit Default Risk – will this borrower default on his/her loan?
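To show the “examples first” idea, the sketch below uses k-nearest neighbours, one of the simplest classification techniques (chosen here for brevity; the post does not prescribe a technique). The Proactive Maintenance readings and labels are invented: each example pairs (temperature, vibration) with a Two-Class label. It needs Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

# Invented labeled examples: ((temperature, vibration), label).
EXAMPLES = [
    ((60, 2), "ok"), ((62, 3), "ok"), ((58, 2), "ok"),
    ((95, 8), "fail"), ((90, 7), "fail"), ((97, 9), "fail"),
]


def knn_classify(examples, observation, k=3):
    """Label an observation by majority vote of its k nearest labeled examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], observation))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify(EXAMPLES, (92, 8)))  # fail
print(knn_classify(EXAMPLES, (61, 2)))  # ok
```

The same code handles Multi-Class problems unchanged: the labels simply take more than two values.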

Regression – in this machine learning problem, a Yes or No answer is not going to be enough.  In order to solve this problem, the machine needs to predict a value (e.g. a price, a temperature, a measurement) by understanding the numeric relationship of that value to other values (or Factors).  If you took Calculus, this might sound like a simple “Rate of Change” function: you’re on the right track.  Just as with Classification, Regression problems need some examples in order to work well.  Applications include:

  1. Cost Analysis – when is the best time to buy something?
  2. Demand Prediction – how many widgets will we sell next year?
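The “rate of change” intuition maps directly onto the simplest regression technique, an ordinary least squares line, whose slope is the rate of change of the predicted value. The sketch below fits one to a tiny, made-up demand history (year index against units sold, in thousands); a real demand model would use more factors and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # the "rate of change" of y with respect to x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up history: years 1..4 and units sold (thousands).
years = [1, 2, 3, 4]
units = [3, 5, 7, 9]
intercept, slope = fit_line(years, units)
forecast = intercept + slope * 5  # predicted units for year 5
print(intercept, slope, forecast)  # 1.0 2.0 11.0
```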

Clustering – this is where things get complicated (!!)  With the first two problems, we have examples we can use to “train” our machines to predict a label AND we can test them with labeled observations (known to Data Scientists as “Ground Truth”).  But what if we don’t have a ground truth?  The best we can do is identify clusters of observations.   Fair warning: without ground truth, evaluating the results will be a challenge.  Still, some applications include:

  1. Grouping of Content – Grouping Today’s News into Categories, or Documents into Topics
  2. Materials Classification – take a Raw Materials Master File and organize it into a taxonomy
  3. Customer Segmentation – identify similar customers based upon purchase behavior
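k-means is one widely used clustering technique (named here as an example; the post does not prescribe one). The sketch below groups invented customer data, (orders per year, average order value in hundreds), into two segments without any labels. Nothing in the output says what each segment means; interpreting it is exactly the evaluation challenge that comes without ground truth. It needs Python 3.8 or later for `math.dist`.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters


# Invented customers: (orders per year, average order value in hundreds).
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```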

Recommender – have you ever been on a website which presented a recommendation of something you might “Like”?  Movie recommendations on Netflix, product recommendations on Amazon, or advertisements on your apps – if you are familiar with the internet, you probably understand the premise here.
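The premise can be sketched with user-based collaborative filtering, one common recommender technique: score the items a user has not seen by how much similar users liked them. The ratings, names and cosine-similarity weighting below are illustrative assumptions, not how Netflix or Amazon actually build their recommendations.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
RATINGS = {
    "ana": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben": {"Movie A": 4, "Movie B": 5, "Movie D": 4},
    "cara": {"Movie C": 5, "Movie D": 2},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries (0.0 if nothing in common)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=1):
    """Rank unseen items by similarity-weighted ratings from the other users."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", RATINGS))  # ['Movie D']
```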

That’s it.  Now you know how to recognize a problem which Machine Learning can help you with.   If your business problem does not fall into one of these four, you don’t need a machine learning model to solve it.  More importantly, if you know the factors which drive a business outcome, just build a model in Excel – you don’t need a Data Science team for that.

Good luck!