Tag Archive for Data Science

Two Key Benefits of HR Analytics

In my last article, I wrote about the definition of HR Analytics and the skills needed to be successful in this field. In this article, I want to discuss two key benefits of HR analytics to the HR function in an organization and to the business: Evidence Based Decisions and Reducing Human Bias.

HR professionals want to be strategic partners with business leaders, not simply a cost center designed to maintain policies and procedures. While these policies are important, analytics provides HR with a means to demonstrably improve the efficiency of a company’s people resources. It does this in several ways.

Evidence Based Management Decisions

Through its dependence upon data and facts, HR Analytics delivers evidence, and evidence trumps intuition. To support these benefits, I’ll ask two questions:

Is your interview process optimized to find the best candidate for a position?

If you have ever participated in an interview process from the hiring perspective, you may be aware that at many companies, interviewing candidates can be an informal, non-standardized process. At worst, interviewees are simply asked by HR “What did you think?” More sophisticated HR methodologies define a standardized process for who the candidate meets and what questions are asked. At each stage, feedback is collected and quantified, typically in the form of ratings. Are these ratings predictive of future performance in the job role to be filled? HR Analytics can tell you the factors that are predictive of high performers in certain job roles (or tell you that you don’t know and that you should either change your process or collect different data points).

Does internal employee training improve company performance?

Most HR professionals would say “Yes, employee training is a good thing and we need to do it.” Many top companies spend precious resources to train their sales staff or send aspiring leaders to leadership training. Does this training have a material impact on performance? On the company’s bottom line? HR Analytics aspires to quantify that benefit. To do this, we may need to pull together data from several systems, such as on-the-job performance data, financial data and data collected during the training process. We should define the performance metrics that are most important in that job role. We must also establish a baseline of performance (i.e., comparable employees who did not take the training). By taking a more scientific approach, we can quantify the benefit and produce evidence of impact. We may also demonstrate that certain training is ineffective.
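As a minimal sketch of the baseline comparison described above, the Python snippet below compares the mean performance of trained employees against a comparable untrained group. The sales figures and the `uplift` helper are hypothetical and exist only for illustration. A difference in means on a sample this small is a starting point, not proof that the training caused the improvement.

```python
from statistics import mean

# Hypothetical quarterly sales (in $k); not data from the article.
trained = [112, 98, 130, 121, 105]    # employees who took the training
baseline = [101, 95, 110, 99, 100]    # comparable employees who did not


def uplift(treated, control):
    """Return the absolute and relative difference in mean performance."""
    diff = mean(treated) - mean(control)
    return diff, diff / mean(control)


abs_diff, rel_diff = uplift(trained, baseline)
print(f"Mean uplift: {abs_diff:.1f}k ({rel_diff:.1%})")
```

In practice, the two groups would need to be matched on role, tenure and territory before a number like this could be taken seriously.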

Reducing Human Bias

If you have read Michael Lewis’s book The Undoing Project, then you know about the work done by psychologists in the last 50 years to explain how bias interrupts the human mind’s ability to perceive information. Literally, our personal bias leads us to see things that simply are not there. We all have expectations, and these expectations are based upon hard-won human experience, most of which has served us very well in life. But in the case of making HR judgments, or indeed any judgment requiring us to process large amounts of information, bias is quite detrimental.

In the questions/examples provided above, we see the opportunity for human bias to creep into common HR processes and potentially undermine them. First, let’s examine the interviewing process. As people, we may have expectations about how a qualified candidate dresses, how they speak, and which personality traits are most prominent in a good candidate. These are likely informed by our own experience, and by colleagues who may have made a deep impression on us. Just as likely, information contradicting the same bias is dismissed. This means that our human minds are not able to process large amounts of information in a uniform and objective manner. When applied correctly, HR analytics can do this much better. For example, an HR analytics team would consider data collected during the evaluation phase and performance data for successful applicants; in other words, before and after hire. Hopefully, many applicants become very successful at your firm, but you also know that many do not. We can apply a label to each candidate profile, recognizing that the candidate either was or was not successful. We can then train our analytics algorithms to learn what a successful employee will look like, mathematically, at hire time and reduce our human bias. Bear in mind that bias can still creep into the process if interviewers fail to recognize the need for standardization and quantification.
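To make the labeling idea concrete, here is a small Python sketch that attaches a success label to each candidate profile and measures how strongly the interview rating tracks that label. The candidate records, the 1-to-5 rating scale and the `pearson` helper are assumptions made for illustration; a real project would use many more profiles and features, and likely a proper modeling library.

```python
from math import sqrt

# Hypothetical records: (average interview rating on a 1-5 scale,
# whether the hire was later judged successful).
candidates = [
    (4.5, True), (3.0, False), (4.0, True), (2.5, False),
    (3.5, True), (4.0, False), (5.0, True), (3.0, False),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


ratings = [rating for rating, _ in candidates]
labels = [1 if successful else 0 for _, successful in candidates]  # the label
r = pearson(ratings, labels)
print(f"Correlation between rating and success: {r:.2f}")
```

A value near zero would suggest that the ratings, as currently collected, say little about later success, which is itself a useful finding.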

Similarly, as it relates to evaluating training against performance, we see an opportunity for bias to lead to conclusions that are false, or at least for which there is no evidence. Business leaders can (and should) demand this evidence from HR, so that they know that capital is being deployed correctly in support of the firm’s financial well-being. To be clear, it can be very difficult to prove causation between training and financial ratios (i.e., that training causes an increase in Net Income). However, HR should be able to provide evidence demonstrating correlation between employees who perform well on the job (be that metric in sales figures or on-time delivery) and those who attend certain training activities. When HR provides evidence of this correlation, it becomes a strategic partner with business leaders, helping them see and understand the patterns in human behavior.

See Differently, Know the Facts

Analytics offers HR professionals an opportunity to approach decision making differently. Measurements and quantification of candidate and employee characteristics and performance can provide evidence of correlation between the policies HR is supporting and the outcomes the business seeks to drive. By thinking differently about HR, we can reduce our propensity to see things that are not there, replacing that vision with a clear eyed, scientific, data-driven approach.

Want to learn more about the world of HR Analytics? We are speaking at this year’s CBIA Human Resources Conference on the topic. We hope to see you there!

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.

5 Critical Skillsets for HR Analytics

Increasingly, companies are applying analytics and data science procedures to new areas of their business. Human Resources (HR) management, with its central role in managing the People in a business, is one such area. HR Analytics is a fact-based approach to managing people. A fact-based approach helps organizations validate their assumptions about how best to manage their people. This makes good business sense: on average, companies spend 70% of their budget on personnel expenses.

Using data and statistical methods, HR may look to examine people-oriented questions, such as:

  • Can we better understand employee absenteeism rates at a labor-intensive business, such as retail, food service or industrial manufacturing? Can we predict it?
  • Do our compensation realities reflect fair and balanced job classification policies? Asked differently, which factors are most predictive of compensation: ones we want to reward (e.g., education level, on-the-job performance) or ones we need to ignore (e.g., gender, age or race)?
  • What is our real employee churn rate? Can we identify employees headed out the door and take preventive steps?
  • Are our service response times keeping pace with spikes in customer demand?

These questions, and many more, can be answered with datasets, data science and statistics. But how? Analytics involves skill sets that go beyond those considered “traditional.” Knowledge of recruitment, hiring, firing and compensation is key to understanding HR processes. However, HR professionals often struggle to answer these questions in a data-driven manner, because they lack the diverse skills required to perform advanced analytics. These skills include statistical and data analytical techniques, data aggregation, and mathematical modelling. Finding the right data can be another challenge. Data analytics requires data, and that data is likely to reside in several different systems, so IT professionals play a critical role. Finally, communication to the business is a key skill. HR Analytics projects may produce analysis and models that contradict conventional wisdom. Action on these insights requires the team to communicate the what, why and how of Data Science.

HR Analytics projects require five distinct skillsets to create value for an organization.

  • Without Business input, HR Analytics projects may answer questions with no value added to the organization.
  • Without Marketing input, insights from HR Analytics will fail to be adopted by the business.
  • Without HR input, the team will struggle to recognize relevant data and interpret the outcomes.
  • Without Data Analytics input, analysis will be “stuck in first gear,” producing basic descriptive statistics (e.g., averages and totals), but never advancing to diagnostic (e.g., root cause) or predictive (e.g., Machine Learning) models.
  • Without IT input, the team struggles to acquire relevant data in a usable format.

HR leaders must engage all the required perspectives and skillsets to be successful with analytics. Business, marketing, HR and IT are common perspectives found in most organizations. But Data Analytics professionals, able to cleanse data, identify candidate predictive models and evaluate model output, are typically lacking.  We encourage HR professionals, interested in learning more about The Power of Data, to reach out to our Data Analytics Advisory Services team. Our goal is to help you understand the data science process, identify business opportunities, and potentially offer analytics services that fill in the missing pieces for your puzzle.


Create a Pareto Chart in Power BI

“Baseball is 90% mental, and the other half is physical.” – Yogi Berra

You just have to love Yogi Berra quotes like this. We all pretty much know what he’s talking about, even if his math is not spot on. It’s a restatement of the Pareto Principle, the 80/20 rule! It applies to just about anything in life or business. If I had to write a definition of it for technical documentation, it would look something like this:

“A situation where eighty percent of events attributed to a group are caused by twenty percent of the members of the group.”

Re-stated as examples:

  • Eighty percent of your human resource issues are caused by twenty percent of your employees.
  • Eighty percent of your maintenance issues are caused by twenty percent of your equipment.
  • Eighty percent of your sales are attributed to twenty percent of your products.
  • Eighty percent of the wealth is controlled by twenty percent of the population.


(And I’m certainly not in that last twenty percent. If I were, I wouldn’t have to write articles like this one!)

Now that we’ve got an understanding of the principle, let’s look at how it can be visualized. Excel has a very simple wizard for creating a Pareto Chart that can be found on the Insert menu:

But we want one of these in Power BI. And Power BI doesn’t have one (yet, maybe later). We’ll need to ‘roll our own’.  Let’s discuss the various parts of the chart itself, so we know what we’re shooting for.

  • The categories, or series, at the bottom (One, Two, Three, etc.) represent the different members of the ‘group’ we are trying to analyze. They may be employees, machines on the manufacturing line, or products in our catalog.
  • The blue bar above each ‘member’ is its respective measurement (count of HR issues, money spent on maintenance, annual sales, etc.). The scale of this measurement is on the left side, in our case going from 0 to 120.
  • The final element is the curved line and the right-side scale measuring from zero to one hundred percent. This represents, at each category member, the percentage of the cumulative total of all members to the left of the member in question, inclusive. To put it another way, as we add each category’s number to the running total of those on its left, the line represents that running total divided by the entire total for all members of the category.

The arrow points to the spot on the line where it crosses 80%, in our case, after about the first four members, as can be seen by following the green dashed line from right to left, then down. The first four members would be the ‘twenty percent’ of the Pareto Principle, and their cumulative measure would be the eighty.

Note: Math wizards may point out that four members divided by a total member count of fifteen is closer to thirty percent than twenty, but remember that this is a rule of thumb, and we all know that some thumbs are bigger or smaller than others.
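The crossing point can also be found numerically. The sketch below, in Python, sorts a set of hypothetical measures from largest to smallest and reports how many members it takes to reach 80% of the total. The fifteen values and the `members_to_reach` helper are made up for illustration and are not the data behind the chart described above.

```python
def members_to_reach(measures, threshold=0.80):
    """Count the largest members needed for the running total to reach
    `threshold` of the grand total; also return that count as a share."""
    ordered = sorted(measures, reverse=True)
    total = sum(ordered)
    running = 0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= threshold * total:
            return count, count / len(ordered)
    return len(ordered), 1.0


# Hypothetical measures for fifteen members.
measures = [120, 95, 80, 60, 20, 15, 12, 10, 8, 6, 5, 4, 3, 2, 1]
count, share = members_to_reach(measures)
print(f"{count} of {len(measures)} members ({share:.0%}) reach 80% of the total")
```

With these made-up numbers, four of fifteen members carry the eighty percent, which is the same "closer to thirty than twenty" situation the note describes.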

To plot some data in a Pareto Chart, we’ll need a couple of pieces of information from it:

  • Each member’s respective total
  • The grand total
  • The running total at each member, sorted from largest to smallest
  • The percentage that running total represents compared to the grand total
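Before building these in Power BI, it may help to see the four pieces computed in a few lines of Python. This is only a sketch of the arithmetic: the member names, the values and the `pareto_table` helper are hypothetical, and it assumes the measures are positive so that the grand total is not zero.

```python
from itertools import accumulate

# Hypothetical member totals.
data = {"One": 40, "Two": 110, "Three": 25, "Four": 75}


def pareto_table(measures):
    """Rank members by measure (largest first) and attach the running
    total and the running percent of the grand total."""
    ordered = sorted(measures.items(), key=lambda kv: kv[1], reverse=True)
    grand_total = sum(value for _, value in ordered)
    running = accumulate(value for _, value in ordered)
    return [
        {
            "member": member,
            "measure": value,
            "rank": rank,
            "running_total": running_total,
            "running_percent": running_total / grand_total,
        }
        for rank, ((member, value), running_total)
        in enumerate(zip(ordered, running), start=1)
    ]


rows = pareto_table(data)
for row in rows:
    print(row)
```

The bars of the chart correspond to `measure` and the line to `running_percent`.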

Now that we understand what we’re shooting for, let’s get started.

If your data includes a running sum of the measurement for each member, sorted by the respective member’s measurement, then you’re golden and can skip to the section titled Add the Grand Total and Running Percentage. Your data may include a Ranking column, in which case you can skip the ranking step in each of the following two sections. For the rest of you, keep reading. We’ll look at two approaches to getting the intermediate bits of data: Power Query (M), and DAX.

Create the Rank and Running Sum in Power Query

Let’s start with some simple data in Excel, in fact the same data used to generate the Excel Pareto chart we used to explain the concepts:

We’ll load this data (it’s in an Excel table called “Table1”) and edit it in the Power BI Query Editor. First, we need to sort the data by the [Measure] column, sorted descending. Click the down-arrow next to the Measure column title and select Sort Descending.

Next, on the Add Column menu, select Index Column. Keep the defaults of Starting Index of 1 and Increment of 1.

I renamed my column to [Power Query Rank] to differentiate it from the ranking step we’ll introduce in the model later via DAX.

Next, we’ll add the running total as a Custom Column with a formula as shown below:

Hint: If you can’t read the formula from the screen shot, it is:

= Table.Range ( #"Renamed Columns", 0, [Power Query Rank] )

Attribution should go to Sam Vanga and SQL Server Central for this bit of M code: http://www.sqlservercentral.com/blogs/samvangassql/

The Power Query function Table.Range can be explained like this: Given a table of data, in our case the last of our query steps, a.k.a. #"Renamed Columns", start at row 0 (the top), and go down the number of rows represented by the value in column [Power Query Rank]. The result is a table associated with each row in the query. The first row of the query has a table with one row of data in it. The second row has a table with two rows, and so forth. This table is represented by the word “Table” on each row of the column we just added.
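For readers who think more easily in a general-purpose language, the same logic can be sketched in Python. The `table_range` helper below is a rough analogue of Table.Range written for this illustration, not a Power Query API, and the rows are hypothetical.

```python
def table_range(table, offset, count):
    """Rough analogue of Table.Range: `count` rows starting at `offset`."""
    return table[offset:offset + count]


# Hypothetical rows, already sorted descending with a 1-based index column.
renamed_columns = [
    {"Member": "Two", "Measure": 110, "Power Query Rank": 1},
    {"Member": "Four", "Measure": 75, "Power Query Rank": 2},
    {"Member": "One", "Measure": 40, "Power Query Rank": 3},
    {"Member": "Three", "Measure": 25, "Power Query Rank": 4},
]

running_totals = []
for row in renamed_columns:
    # Each row gets its own nested "table": the top N rows, N being its rank.
    nested = table_range(renamed_columns, 0, row["Power Query Rank"])
    running_totals.append(sum(r["Measure"] for r in nested))

print(running_totals)  # [110, 185, 225, 250]
```

Summing the nested table is what the Aggregate step in the next instruction does. Note that this Python version rescans the top of the table for every row, which is fine for a few dozen members.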

From here, click on the ‘expand’ arrow in the column header and select the Aggregate radio button, check off the “Sum of Measure” column, and un-check “Use original column name as prefix”:

I renamed the resulting column [Power Query Running Total] (not shown).

Click Close and Apply on the Home menu.

Create the Rank and Running Sum in DAX

As with all things Microsoft, there is more than one way to accomplish a goal. In our case, the goal is to get the running total, and just like before, we’ll need the ranking first. For this exercise, we’ll be using DAX instead of Power Query, but should get the same results.

Create a column with the formula as follows:

DAX Rank = RANKX ( ALL ( Table1 ), Table1[Measure] )

Next, create the DAX Running Total measure as:

DAX Running Total =
CALCULATE (
    SUM ( Table1[Measure] ),
    FILTER (
        ALL ( Table1 ),
        Table1[DAX Rank] <= MAX ( Table1[DAX Rank] )
    )
)

This DAX formula does much the same thing as the Power Query Table.Range function above. The only difference is that it includes the aggregate within, eliminating the need for an extra column.

Note: Know the difference between Columns and Measures in DAX. Mistaking the two will cause errors, frustration, and hair loss.

Plotting all these columns and measures on a simple table visual shows that Power Query and DAX come up with the same answers for Rank and Running Total, a good sanity check. Also, the ranks are easy to verify as to accuracy, and with a little mental math, running totals are as well. I had to re-format some of the numbers to make them show without decimals.
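One caveat on this sanity check: the two approaches should agree when every member has a distinct measure, but they can diverge on ties. An index column gives tied rows consecutive numbers, while RANKX, with its default tie handling, gives them the same rank. The Python sketch below imitates both behaviors on hypothetical data; the two helper functions are illustrations of the logic, not Power BI code.

```python
def index_running_totals(values):
    """Index-column style: row N sums the top N rows after sorting."""
    ordered = sorted(values, reverse=True)
    return [sum(ordered[:n]) for n in range(1, len(ordered) + 1)]


def rank_running_totals(values):
    """Rank style: each row sums every row whose rank is <= its own,
    so tied rows share a rank and a running total."""
    ordered = sorted(values, reverse=True)
    # Rank = 1 + number of strictly larger values (ties share a rank).
    ranks = [1 + sum(1 for other in ordered if other > v) for v in ordered]
    return [
        sum(v for v, r in zip(ordered, ranks) if r <= rank)
        for rank in ranks
    ]


measures = [50, 30, 30, 10]  # hypothetical, with a tie at 30
print(index_running_totals(measures))  # [50, 80, 110, 120]
print(rank_running_totals(measures))   # [50, 110, 110, 120]
```

If your data contains ties, decide which behavior you want on the chart before choosing between the two approaches.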

Add the Grand Total and Running Percentage

There are two more pieces we need: [Grand Total], which is self-explanatory, and [Running Percent], which is the percentage of the [Running Total] compared to the [Grand Total]. These can only be done in DAX. Add a measure as follows:

Grand Total = CALCULATE ( SUM ( Table1[Measure] ) , ALL ( Table1 ) )

This calculates the Grand Total and makes it available at every slice (row of each Member).

Now add the last item, a column with the expression:

Running Percent = [Power Query Running Total] / [Grand Total]

or, using DIVIDE, which returns a blank when the denominator is zero:

Running Percent = DIVIDE ( [Power Query Running Total] , [Grand Total] )

Note: The measure [DAX Running Total] would work just as well as its Power Query equivalent, since we know it returns the same number.

Format this last one as a percent.

Create the Chart

Now the fun part. For this we’ll need either a “Line and Stacked Column Chart” or a “Line and Clustered Column Chart”. This is the easiest part of the whole exercise:

  • The Shared Axis is the [Member] column (“One”, “Two”, “Three”, etc.)
  • The Column values is the [Measure] column
  • The Line values is the [Running Percent] column

Like I said, simple if you have all of the data pieces in front of you.

Need help getting the right data pieces? Not sure what charts you can generate from the data pieces you have? There’s probably a way to get to where you want to be. Reach out to our team of data scientists at BlumShapiro Consulting to learn more about how data can help guide your organization into the future.

Our 5 Rules of Data Science

In manufacturing, the better the raw materials, the better the product. The same goes for data science, where a team cannot be effective unless the raw materials of data science are available to them. In this realm, data is the raw material which produces a prediction. However, raw materials alone are not sufficient. Business people who oversee machine learning teams must demand that best practices be applied, otherwise investments in machine learning will produce dubious business results. These best practices can be summarized into our five rules of data science.

For the purpose of illustration, let’s assume the data science problem our team is working on is related to the predictive maintenance of equipment on a manufacturing floor. Our team is working on helping the firm predict equipment failure, so that operations can replace the equipment before it impacts the manufacturing process.


1. Have a Sharp Question

A sharp question is specific and unambiguous. Computers do not appreciate nuance. They are not able to classify events into yes/no buckets if the question is: “Is Component X ready to fail?” Nor does the question need to concern itself with causes. Computers do not ask why – they calculate probability based upon correlation. “Will component X overheat?” is a question posed by a human who believes that heat contributes to equipment failure. A better question is: “Will component X fail in the next 30 minutes?”
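A sharp question also tells you how to label the data. As a sketch of what “Will component X fail in the next 30 minutes?” means for a training set, the Python snippet below marks each sensor reading as positive when a recorded failure falls within the following 30 minutes. The timestamps and the `label_readings` helper are hypothetical.

```python
from datetime import datetime, timedelta


def label_readings(reading_times, failure_times, horizon=timedelta(minutes=30)):
    """Label a reading True if any failure occurs after it and no later
    than `horizon` after it."""
    return [
        any(t < failure <= t + horizon for failure in failure_times)
        for t in reading_times
    ]


# Hypothetical sensor reading times and one recorded failure at 10:00.
readings = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 35),
    datetime(2024, 1, 1, 9, 45),
    datetime(2024, 1, 1, 10, 30),
]
failures = [datetime(2024, 1, 1, 10, 0)]

labels = label_readings(readings, failures)
print(labels)  # [False, True, True, False]
```

The vaguer question, “Is Component X ready to fail?”, offers no such rule for deciding which readings count as positive examples.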

2. Measure at the Right Level

Supervised learning requires real examples from which a computer can learn. The data you use to produce a successful machine learning model must demonstrate cases where failure has occurred. It must also demonstrate examples where equipment continues to operate smoothly. We must be able to unambiguously identify events that were failure events, otherwise, we will not be able to train the machine learning model to classify data correctly.

3. Make Sure Your Data is Accurate

Did a failure really occur? If not, the machine learning model will not produce accurate results. Computers are naïve – they believe what we tell them. Data science teams should be more skeptical, particularly when they believe they have made a breakthrough discovery after months of false starts. Data science leaders should avoid getting caught up in the irrational exuberance of a model that appears to provide new insight. Like any scientific endeavor, test your assumptions, beginning with the accuracy and reliability of the observations you started with to create the model.

4. Make Sure Your Data is Connected

The data used to train your model may be anonymized, because factors that correlate closely to machine failure are measurements, not identifiers. However, once the model is ready to be used, the new data must be connected to the real world – otherwise, you will not be able to take action. If you have no central authoritative record of “things”, you may need to develop a master data management solution before your Internet of Things with predictive maintenance machine learning can yield value. Also, your response to a prediction should be connected. Once a prediction of failure has been obtained, management should already know what needs to happen – use insights to take swift action.

5. Make Sure You Have Enough Data

The accuracy of predictions improves with more data. Make sure you have sufficient examples of both positive and negative outcomes, otherwise it will be difficult to be certain that you are truly gaining information from the exercise.
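A quick way to check this before any modeling starts is to count the examples in each class. The Python sketch below does so for a hypothetical set of labels. The `minimum=30` threshold is an arbitrary placeholder chosen for illustration, not a recommended value, and both helpers are made up for this example.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def has_enough_examples(labels, minimum=30):
    """True only if both failure (True) and non-failure (False) examples
    meet the minimum; `minimum` is a placeholder threshold."""
    counts = Counter(labels)
    return counts[True] >= minimum and counts[False] >= minimum


# Hypothetical labels: 5 failure events among 100 observations.
labels = [False] * 95 + [True] * 5
balance = class_balance(labels)
print(balance)
print(has_enough_examples(labels))
```

Here the failure class makes up only five percent of the data, so the check fails even though one hundred observations may sound like plenty.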

The benefits of predictive maintenance, and other applications of machine learning, are being embraced by businesses everywhere. For some, the process may appear a bit mysterious, but it needn’t be. The goal is to create a model which, when fed real-life data, improves the decision making of the humans involved in the process. To achieve this, data science teams need the right data and the right business problem to solve. Management should work to ensure that these five rules are satisfied before investing in data science activities.

Not sure if you have the right raw materials? Talk to BlumShapiro Consulting about your machine learning ambitions. Our technology team is building next generation predictive analytics solutions that connect to the Internet of Things. We are helping our clients along each step of their digital transformation journey.
