Our 5 Rules of Data Science

In manufacturing, the better the raw materials, the better the product. The same goes for data science, where a team cannot be effective unless the raw materials of data science are available to them. In this realm, data is the raw material which produces a prediction. However, raw materials alone are not sufficient. Business people who oversee machine learning teams must demand that best practices be applied; otherwise, investments in machine learning will produce dubious business results. These best practices can be summarized into our five rules of data science.

For the purpose of illustration, let’s assume the data science problem our team is working on is related to the predictive maintenance of equipment on a manufacturing floor. Our team is working on helping the firm predict equipment failure, so that operations can replace the equipment before it impacts the manufacturing process.

1. Have a Sharp Question

A sharp question is specific and unambiguous. Computers do not appreciate nuance. They are not able to classify events into yes/no buckets if the question is: “Is Component X ready to fail?” Nor does the question need to concern itself with causes. Computers do not ask why – they calculate probability based upon correlation. “Will component X overheat?” is a question posed by a human who believes that heat contributes to equipment failure. A better question is: “Will component X fail in the next 30 minutes?”
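As a rough illustration, a sharp question like “Will component X fail in the next 30 minutes?” maps directly onto a yes/no label for each sensor reading. The sketch below is only one way such a label might be built. The function name label_readings, the horizon parameter, and the sample timestamps are all hypothetical, not taken from any real system.

```python
from datetime import datetime, timedelta

def label_readings(reading_times, failure_times, horizon=timedelta(minutes=30)):
    """Return 1 for each reading followed by a failure within `horizon`, else 0."""
    labels = []
    for t in reading_times:
        # A reading is a positive example if any failure falls in (t, t + horizon].
        fails_soon = any(t < f <= t + horizon for f in failure_times)
        labels.append(1 if fails_soon else 0)
    return labels

# Hypothetical data: one reading every 20 minutes, one recorded failure at 10:05.
readings = [datetime(2024, 1, 1, 9, 0) + timedelta(minutes=20 * i) for i in range(5)]
failures = [datetime(2024, 1, 1, 10, 5)]
labels = label_readings(readings, failures)
print(labels)
```

Note that a vaguer question (“ready to fail”) gives no horizon to plug in, so no label of this kind could be computed from it.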

2. Measure at the Right Level

Supervised learning requires real examples from which a computer can learn. The data you use to produce a successful machine learning model must demonstrate cases where failure has occurred. It must also demonstrate examples where equipment continues to operate smoothly. We must be able to unambiguously identify which events were failure events; otherwise, we will not be able to train the machine learning model to classify data correctly.

3. Make Sure Your Data is Accurate

Did a failure really occur? If not, the machine learning model will not produce accurate results. Computers are naïve – they believe what we tell them. Data science teams should be more skeptical, particularly when they believe they have made a breakthrough discovery after months of false starts. Data science leaders should avoid getting caught up in the irrational exuberance of a model that appears to provide new insight. Like any scientific endeavor, test your assumptions, beginning with the accuracy and reliability of the observations you started with to create the model.

4. Make Sure Your Data is Connected

The data used to train your model may be anonymized, because factors that correlate closely to machine failure are measurements, not identifiers. However, once the model is ready to be used, the new data must be connected to the real world – otherwise, you will not be able to take action. If you have no central authoritative record of “things”, you may need to develop a master data management solution before an Internet of Things predictive maintenance model can yield value. Your response to a prediction should be connected as well. Once a prediction of failure has been obtained, management should already know what needs to happen – use insights to take swift action.
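To make the idea of “connected” data concrete, here is a minimal sketch of turning a prediction into an action by looking the sensor up in a master record. Everything in it is an assumption made for illustration: the ASSET_REGISTRY dictionary, the sensor identifier, the 0.8 threshold, and the to_work_order function are hypothetical stand-ins for whatever master data and maintenance systems a firm actually runs.

```python
# Hypothetical master record: sensor id -> real-world asset details.
ASSET_REGISTRY = {
    "sensor-017": {"asset": "Conveyor motor 3", "location": "Line B"},
}

def to_work_order(sensor_id, failure_probability, threshold=0.8):
    """Turn a prediction into a work order, or None if no action is needed."""
    if failure_probability < threshold:
        return None
    asset = ASSET_REGISTRY.get(sensor_id)
    if asset is None:
        # Without a link to a real asset, the prediction cannot be acted on.
        raise LookupError(f"No master record for {sensor_id}")
    return {"asset": asset["asset"], "location": asset["location"], "action": "replace"}

order = to_work_order("sensor-017", 0.93)
print(order)
```

The LookupError branch is the point of the rule: a confident prediction about an unknown “thing” is of no use to operations.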

5. Make Sure You Have Enough Data

The accuracy of predictions improves with more data. Make sure you have sufficient examples of both positive and negative outcomes; otherwise, it will be difficult to be certain that you are truly gaining information from the exercise.
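A quick count of positive and negative examples is often the first sanity check. The sketch below assumes labels are encoded as 1 (failure) and 0 (no failure). The class_summary function and the minimum of 50 examples per class are hypothetical choices for illustration, not a recommended threshold.

```python
from collections import Counter

def class_summary(labels, min_per_class=50):
    """Count positive (1) and negative (0) examples and flag thin classes."""
    counts = Counter(labels)
    positives, negatives = counts.get(1, 0), counts.get(0, 0)
    return {
        "positives": positives,
        "negatives": negatives,
        "enough": positives >= min_per_class and negatives >= min_per_class,
    }

# Hypothetical data set: failures are rare compared with normal operation.
summary = class_summary([1] * 12 + [0] * 4000)
print(summary)
```

In predictive maintenance, failures are usually the scarce class, so a large data set overall can still hold too few failure examples.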

The benefits of predictive maintenance, and other applications of machine learning, are being embraced by businesses everywhere. For some, the process may appear a bit mysterious, but it needn’t be. The goal is to create a model which, when fed real-life data, improves the decision making of the humans involved in the process. To achieve this, data science teams need the right data and the right business problem to solve. Management should work to ensure that these five rules are followed to their satisfaction before investing in data science activities.

Applied Machine Learning: Optimizing Patient Care in Hospitals, Profitably

Why are so many industries exploring Machine Learning as a means of delivering innovation and value? In my view, the technology speaks to a primitive urge – machine learning is like having a crystal ball, telling you what will happen next. For a business, it can convey information about a customer before they introduce themselves. On a personal level, when I consider what I would like to have information about in advance, the first thing that springs to mind is obvious: my health. Am I about to get sick? How can I improve my wellness and overall health? If you are wearing a Fitbit right now, then you probably agree with me.

In my last blog, I shared some real-world examples of how the Hospitality Industry applies Machine Learning. What about Health Care and Hospitals? While hospitals have similar challenges, in that they accommodate guests who stay overnight, the objectives in health care are quite different, and changing rapidly. The Affordable Care Act is driving new business models, incentivizing outcome-based reimbursement as opposed to volume-based reimbursement. Unlike hotels, today’s hospitals are interested in ensuring their guests do not have to return, at least not in the short term. They also need to manage costs in a way they have not been incentivized to do in the past. Hospitals across the country are considering how predictive analytics can have a meaningful impact on operations, improving both patient health and the bottom line.

The cost savings opportunity for health care providers is startling. Here are just three examples:

Reducing Hospital Readmissions – in 2014, Medicare fined 2,610 hospitals $428 million for having high hospital readmission rates. Leaving actual fines aside, industry analysts estimate that the overall cost of preventable readmissions approaches $25 billion annually. As a result, hospital systems all over the nation are mobilizing to intervene, using ML to identify risk factors which are highly predictive of readmission. Carolinas Healthcare System, partnering with Microsoft, did just that. Using data from 200,000 patient-discharge records, they created a predictive model to deliver customized discharge planning, saving the hospital system hundreds of thousands of dollars annually. Read the article in Healthcare IT News.

Clinical Variation Management – Mercy Hospital is partnering with Ayasdi to find the optimal care path for common surgical procedures. Using knee replacement as an example, the Clinical Variation Management software helps hospital administrators find clusters of patient outcomes, then enables the exploration of those clusters in order to correlate a metric (e.g. Length of Stay) with a certain regimen or activity. Watch this video to learn how Mercy Hospital saved $50 million by applying Machine Learning to an extremely common procedure.

Improving Population Health – Dartmouth Hitchcock, a healthcare network affiliated with Dartmouth College, is piloting a remote monitoring system for patients requiring chronic care. 6,000+ patients are permitting the hospital to collect biometric data (e.g. blood pressure, temperature) so that nurses and health coaches can monitor their vital signs, and machines can predict good days and bad days. Quite the opposite of hospitality: Dartmouth Hitchcock is trying to keep the guests from needing to check in! Read more about the Case Study from Microsoft.

Are machines taking over for physicians? No. The patient-physician relationship remains (and I think will always remain) central to the delivery of personal health care. However, it seems clear that the ACA is providing significant rewards to health care providers who manage population risk better. Machines can help here: machine learning can find risk “hiding in the data”.

Contact Blum Shapiro Consulting to learn more about how Azure Machine Learning can curtail hospital readmissions, identify variations in common clinical procedures and improve patient population health.

3 Tips to Jump Start your Data Science Plan

Are you looking to form a Data Science capability at your company?

If you answered yes, then you probably already get the Machine Learning concept (The 4 Machine Learning Problems). Maybe you are coming from either a Statistics or Computer Science background. Either way, you see the potential of Data Science and Predictive Analytics and you’re ready to demonstrate some tangible benefit to management.

How are you getting started?  I’m hearing about two core hurdles:

  1. We’re looking for a great business problem to solve, one which could reasonably be solved with data the business already collects
  2. Our internal resources have very little practical experience working on a formal data science team, and don’t understand how it aligns to more traditional project teams

Time to Value is critical, but you need to deliver that value through a formal process for managing risk, one which can be communicated inside and outside the team. Here are the things you want to have in place in order to launch your first project.

Establish Your Data Science Methodology – every project has a project plan and data science projects are no different. What should the Data Science one look like? Several teams of very smart people have already asked this question and independently arrived at the same conclusion. My favorite is the “Cross Industry Standard Process for Data Mining” (CRISP-DM) because it calls out the need for basic Business Understanding of the problem first. Basically, there are six phases of the process:

  1. Set the Business Objectives
  2. Find the Data
  3. Prep and Cleanse the Data
  4. Do the Machine Learning Work
  5. Evaluate the Model You Created – does it meet the Business Objectives?
  6. Deploy the Model

Need a picture? In the standard CRISP-DM process diagram, note the backwards arrows – Data Science is an iterative process.
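For readers who prefer code to diagrams, the iterative nature of the six phases above can be sketched as a loop that only reaches deployment once evaluation meets the business objective. This is a toy outline under stated assumptions, not part of CRISP-DM itself: the run_crisp_dm function, the target score, and the made-up scores are all hypothetical.

```python
def run_crisp_dm(evaluate, target_score, max_iterations=5):
    """Repeat the phases until the model meets the business objective."""
    history = []
    for iteration in range(1, max_iterations + 1):
        # Phases 1-4 (objectives, data, preparation, modeling) are stand-ins here.
        score = evaluate(iteration)  # Phase 5: evaluate against the objective
        history.append(score)
        if score >= target_score:
            # Phase 6: deploy only once the objective is met.
            return {"deployed": True, "iterations": iteration, "history": history}
    # The "backwards arrows": the objective was never met, so nothing is deployed.
    return {"deployed": False, "iterations": max_iterations, "history": history}

# Hypothetical evaluation scores for three successive passes through the phases.
scores = [0.70, 0.80, 0.90]
result = run_crisp_dm(lambda i: scores[i - 1], target_score=0.85, max_iterations=3)
print(result)
```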

Assess your Data Capabilities – Data Science needs Data. Teams that try to predict outcomes without relevant data are set up for failure. An example: let’s say that you would like to forecast demand for your products, in order to reduce your inventory. You might start with basic sales data and find that you are not getting the level of prediction accuracy you expected. What other factors might be driving demand? Customer Satisfaction might be one you decide to include. But what if your company is not measuring customer satisfaction in any quantifiable way? Data Science leaders need to understand the capabilities of their company (in effect, the Data Science customer) with respect to data assets, in order to effectively determine which business problems are ripe for prediction.

Outsource the Team – Data Science requires a very specialized set of skills. You probably have some of those skills yourself: Computer Science, Statistics and an understanding of the principles behind Machine Learning. These three are important, but equally important is Business and Domain Knowledge. Do you have a team whose members together possess all four? If you are working with a technology provider who already understands your business and who also has demonstrated capability in delivering data science value, then outsourcing the work to that team becomes very attractive. If you don’t have such a resource, consider a business and technology consulting partner such as Blum Shapiro Consulting. Provided you already understand the CRISP-DM process, you’ll be able to effectively manage a seasoned team of business and data science pros.

Can Data Science increase your bottom line?  Improve Customer Loyalty?  Drive down costs?  Yes it can, provided you have a methodology to manage the work as a project, data to support it and a capable team.  If you’re convinced the opportunity is there, follow these tips and Data Science will have a strategic role within your company after your first big win!