Archive for Big Data

6 Steps For Creating Golden Records

If you are an organization seeking to improve the quality of the data in your business systems, begin by automating the creation of Golden Records. What is a Golden Record? A Golden Record is the most accurate, complete and comprehensive representation of a master data asset (e.g., Customer, Product, Vendor). Golden Records are created by pulling together incomplete data about some “thing” from the systems in which that data was entered. The System of Entry for a customer record may be a Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) system. Having multiple Systems of Entry for customer data can lead to poor-quality customer master data, and can even give your employees bad information to work from.

But why not simply integrate the CRM and ERP systems, so that each system has the same information about each customer? In theory, this is a perfect solution; in practice, it can be difficult to achieve. Consider these problems:

  1. What if there are duplicate records in the CRM? Should two records be entered into each ERP? Or the reverse: what if one CRM customer should generate two customers in the ERP (each with different pricing terms, for example)?
  2. What if one or more ERP systems require data to create a record, but that data is not typically (or ever) collected in the CRM? Should the integration process fail, what will be the remediation process?
  3. What if one of your ERP systems cannot accommodate the data entered in CRM or other systems? For example, what if one of your ERP systems cannot support international postal codes? Are you prepared to customize or upgrade that system?

There are many more compatibility issues that can occur. The more Systems of Entry you must integrate, the more likely you are to have many obstacles standing between you and full integration. If your business process assumptions change over time, the automated nature of systems integration itself can become a source of data corruption, as mistakes in one system are automatically mirrored in others.

Golden Record Management, by contrast, offers a significantly less risky approach. Golden Records are created in the Master Data Management (MDM) system, not in the business systems. This means that corrections and enhancements to the master data can be made without impacting your current operations.

6 Steps For Creating Golden Records

At a high level, the process of creating Golden Records looks like this:

  1. Create a model for your master data in the master data management system. This model should include all the key attributes MDM can pull from Systems of Entry that could be useful in creating a Golden Record.
  2. Load data into the model from the various Systems of Entry (SOEs) available. These can be business systems, spreadsheets, or external data sources. Maintain the identity of each record, so that you know where the data came from and how the SOE identifies it (for example, the System ID for the record).
  3. Standardize the attributes that will be used to create clusters of records. For Customers and Vendors, location and address information should be standardized.
  4. If possible, verify attributes that will be used to create clusters of records.
  5. Create clusters of records by matching key attributes. The cluster identifier becomes the Golden Record identifier. You can also think of this as a hierarchy: the Golden Record is the parent and the source records are its children.
  6. Populate the Golden Record, created in MDM, with attributes from the records in its cluster (the source data). This final step, called Survivorship, requires a deeper understanding of how the source data was entered than the previous five steps. We want to create a Golden Record that contains all the best data, so we need to make some judgments about which of the SOEs is also the best System of Record for a given attribute (or set of attributes).
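The six steps above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not how any particular MDM product works. The `SourceRecord` model, the match key (standardized name plus postal code), the `GR-` identifiers and the `SURVIVORSHIP` priority table are all hypothetical. Step 4 (verification) is left out because it usually depends on an external address or reference-data service.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SourceRecord:
    """Step 1: a hypothetical, minimal customer model in the MDM."""
    system: str          # System of Entry, e.g. "CRM" or "ERP"
    system_id: str       # the SOE's own identifier, kept for lineage (step 2)
    name: str
    postal_code: str
    phone: str = ""

LEGAL_SUFFIXES = r"\b(INCORPORATED|INC|CORPORATION|CORP|LLC|CO)\b"

def match_key(rec: SourceRecord) -> tuple:
    """Step 3: standardize the attributes used for clustering."""
    name = re.sub(r"[^A-Z0-9 ]", " ", rec.name.upper())          # drop punctuation
    name = " ".join(re.sub(LEGAL_SUFFIXES, " ", name).split())    # drop legal suffixes
    postal = rec.postal_code.strip().upper().split("-")[0]        # US ZIP+4 -> 5-digit ZIP
    return name, postal

def build_clusters(records):
    """Step 5: records with equal match keys form one cluster (Golden Record id -> children)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[match_key(rec)].append(rec)
    return {f"GR-{i}": kids for i, kids in enumerate(clusters.values(), start=1)}

# Step 6: hypothetical survivorship policy, i.e. which SOE is the System of Record per attribute.
SURVIVORSHIP = {"name": ["ERP", "CRM"], "postal_code": ["ERP", "CRM"], "phone": ["CRM", "ERP"]}

def survive(children):
    golden = {}
    for attr, priority in SURVIVORSHIP.items():
        ranked = sorted(children, key=lambda r: priority.index(r.system)
                        if r.system in priority else len(priority))
        # take the first non-empty value from the highest-priority system
        golden[attr] = next((getattr(r, attr) for r in ranked if getattr(r, attr)), "")
    golden["children"] = [(r.system, r.system_id) for r in children]
    return golden

records = [
    SourceRecord("CRM", "C-001", "Acme Corp.", "06103", "860-555-0100"),
    SourceRecord("ERP", "100045", "ACME CORPORATION", "06103-1234"),
    SourceRecord("ERP", "100046", "Beta LLC", "02110"),
]
golden_records = {gid: survive(kids) for gid, kids in build_clusters(records).items()}
```

The CRM and ERP records for Acme end up in the same cluster once punctuation, legal suffixes and the ZIP+4 extension are removed. Survivorship then takes the name from the ERP and the phone number from the CRM, and `children` keeps each source system's own ID.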

Great! We’ve consolidated our master data, entered from a variety of systems, into one system which also contains a reference to a parent record, called the Golden Record. This Golden Record is our best representation of the “thing” we need to understand better.

But wait! The systems of entry, the systems your business USES to operate, have not been updated. Can you still take advantage of these Golden Records?

The answer is “yes” – you can take advantage of the Golden Records in two ways:

  1. As the basis for reporting, because each Golden Record is also a “roll-up” of real system records that are referenced by orders, returns, commissions, etc. Golden Records provide a foundation for consistent Enterprise Reporting.
  2. As the basis for data quality improvements in each system of entry, assuming these systems can import a batch of data and update existing records that match a system ID.
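Both uses rely on a cross-reference from each source record's system ID to its Golden Record. The sketch below assumes such a cross-reference exists. `XREF`, `GOLDEN` and the system names `ERP1`/`ERP2` are all hypothetical. It shows two things: a reporting roll-up, and an update batch keyed on one target system's own IDs, which is the kind of import the second use assumes.

```python
from collections import defaultdict

# Hypothetical MDM cross-reference: (system, system ID) -> Golden Record ID
XREF = {
    ("ERP1", "100045"): "GR-1",
    ("ERP2", "A-77"): "GR-1",
    ("ERP1", "100046"): "GR-2",
}

# Hypothetical surviving attributes per Golden Record
GOLDEN = {
    "GR-1": {"name": "ACME CORPORATION", "phone": "860-555-0100"},
    "GR-2": {"name": "BETA LLC", "phone": ""},
}

def sales_by_golden_record(orders):
    """Use 1: roll orders recorded against source-system IDs up to Golden Records."""
    totals = defaultdict(float)
    for system, system_id, amount in orders:
        totals[XREF[(system, system_id)]] += amount
    return dict(totals)

def update_batch(target_system, attrs):
    """Use 2: one row per record in the target system, keyed on that system's own ID."""
    return [
        {"system_id": system_id, **{a: GOLDEN[gid][a] for a in attrs}}
        for (system, system_id), gid in XREF.items()
        if system == target_system
    ]

orders = [("ERP1", "100045", 1200.0), ("ERP2", "A-77", 300.0), ("ERP1", "100046", 50.0)]
report = sales_by_golden_record(orders)
batch = update_batch("ERP2", ["name", "phone"])
```

Acme's orders from both ERPs roll up to GR-1, even though neither ERP has been changed.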

These benefits of Golden Records are gained without the high risk and high costs that come with systems integration. Further, if you have modeled your master data correctly, it is possible to automate the data quality benefits of Golden Records Management by updating these systems in real time. See how BlumShapiro can help with your master data needs and golden record creation.

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.

The Value of Golden Records

Running multiple ERP systems simultaneously can be quite painful for any mid-size organization. Since each ERP maintains its own chart of accounts, financial consolidation and reporting can become all-consuming for the finance team. When each ERP has its own Customer Master, the sales team's visibility into strategic accounts is limited, while smaller accounts receive terms that can become big problems for AR. These separate ERP systems cause issues for other departments, too: marketing wants a single comprehensive product master, and supply chain managers want a single comprehensive vendor master.

Obviously, there is hyperbole involved in my description. However, these are some of the many reasons executive management would like all business units working from a single ERP, with integrated financial reporting, consistent business processes for the whole company and lower operating costs.

So, you initiated a multi-year ERP implementation / migration / consolidation project.

At the outset, each ERP specialist is skeptical of the consolidation strategy. “Our ERP is tailored to our business unit” is a common argument for keeping each ERP running. When asked, “How’s the quality of the data?” the same ERP specialists may complain that the data quality is poor. Unfortunately, data problems don’t get better by maintaining the status quo.

Severe master data quality problems present an obstacle to an efficient ERP transition. Let’s think about the customer: if you were to bring all customer master records into a new system wholesale, you’d have many duplicated accounts. You’d have diverse naming convention issues. You’d have some accounts that refer to distribution centers, some to end users, some to drop ship locations. You’d have a wide variety of payment terms.

Get your ERP ambitions moving again by focusing on data quality in a way that enables the final goal: centralized and integrated business processes. Here’s how:

  1. Build Golden Records for Customer. A Golden Record is the representation of your master data that holds the fullest, cleanest and most accurate information available. Golden Records are created by consolidating master data from multiple Systems of Record (ERPs and other systems), standardizing that data, verifying its accuracy where possible, and then building clusters of similar records. This matching process makes it possible to create Golden Records that contain the best information from all the master data in each cluster.
  2. Do the same for Product
  3. Do the same for Vendor
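The "clusters of similar records" in step 1 don't require identical values. The code below is a rough sketch of similarity-based clustering. It uses the ratio from Python's standard-library `difflib.SequenceMatcher` as the similarity score. It makes a greedy pass, comparing each name with the first member of every existing cluster. The 0.85 threshold is an arbitrary assumption, and commercial MDM matching engines use considerably more sophisticated techniques.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_by_similarity(names, threshold=0.85):
    clusters = []  # each cluster is a list; its first member acts as the representative
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:  # no existing cluster was similar enough
            clusters.append([name])
    return clusters

clusters = cluster_by_similarity(["Acme Industrial Supply", "ACME Industrial Suply", "Beta Logistics"])
```

Each name is compared only with a cluster's representative, so the result can depend on input order. This is one of the simplifications a real matching engine avoids.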

Are you sensing a pattern? Provided your systems of record have a reasonable amount of data characterizing each row, similarity clusters can be built. Inaccurate, non-standard data makes the process a little harder, but still feasible. Accounting master data (e.g., GL accounts) benefits further from a Uniform Chart of Accounts, to which all other systems can be mapped.
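Mapping to a Uniform Chart of Accounts can be pictured as a lookup from each ERP's local GL account to a uniform account, followed by consolidation. In the minimal sketch below, the ERP names, account numbers and uniform account codes are all hypothetical. Unmapped accounts are reported rather than silently dropped, because they are exactly the gaps the mapping exercise needs to close.

```python
from collections import defaultdict

# Hypothetical mapping: (source ERP, local GL account) -> uniform account
UNIFORM_COA_MAP = {
    ("ERP1", "4000"): "REV-PRODUCT",
    ("ERP2", "40-100"): "REV-PRODUCT",
    ("ERP1", "6100"): "EXP-FREIGHT",
    ("ERP2", "61-200"): "EXP-FREIGHT",
}

def consolidate(balances):
    """Sum local GL balances by uniform account; return totals plus unmapped accounts."""
    totals = defaultdict(float)
    unmapped = []
    for erp, account, amount in balances:
        uniform = UNIFORM_COA_MAP.get((erp, account))
        if uniform is None:
            unmapped.append((erp, account))
        else:
            totals[uniform] += amount
    return dict(totals), unmapped

totals, unmapped = consolidate([
    ("ERP1", "4000", 100000.0),
    ("ERP2", "40-100", 25000.0),
    ("ERP2", "61-200", 4000.0),
    ("ERP2", "99-999", 10.0),
])
```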

Golden Records Management is a non-intrusive, low-risk tool for accelerating the ERP migration process. Building Golden Records is repeatable for many types of master data and provides a means for preparing the best possible data for import into any new system. In Part 2, I’ll talk about how Golden Records and Master Data Management deliver a perpetual framework for Data Quality, extending the lifetime of legacy systems.

Want to learn more about the impact of master data on your organization? Join us on December 6 in Hartford, CT for our half-day workshop Discovering the Value in Your Data. Hear from data governance experts from BlumShapiro Consulting and Profisee as they address key topics for business, finance and technology leaders on data and master data management.

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.

See the Impact Digital Transformation Can Have on Your Bottom Line

Digital Transformation has become an industry buzzword. We’re here to clarify what it means in dollars and cents.

Digital transformation represents an organizational change in which data becomes relevant and valuable. Once transformed, these organizations use data to improve decision making, connect with their customers, improve vendor relationships and allow employees to apply higher-level skills that bring more value to the organization.

Digitally transformed organizations think about their products and services in both physical and digital spaces, use technology to improve customer service, and often have an enhanced perspective on their market and on how their business model operates within that redefined market.

We believe that digital transformation is a qualification to compete in today’s business environment.

The question people often ask next is: how much does this cost? We posit that the answer is nothing. The cost (and risk) lies in remaining stagnant. Digital transformation uncovers assets the organization previously underutilized. The proper investments in digital transformation will empower your organization to survive and thrive, and when done properly they should yield an immediate and direct ROI.

We’ve developed a Digital Transformation Accounting Worksheet ROI calculator for you to experiment with. Punch in your numbers and let us know what you think. We would be happy to discuss your digital transformation in more detail.
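The worksheet's own formulas aren't shown here. As a generic illustration of the arithmetic a calculator like this involves, the sketch below uses the textbook simple-ROI and payback-period formulas. The function names and input figures are hypothetical and don't come from the worksheet.

```python
def simple_roi(total_benefit: float, total_cost: float) -> float:
    """Classic ROI: net gain divided by cost (0.5 means a 50% return)."""
    return (total_benefit - total_cost) / total_cost

def payback_months(upfront_investment: float, monthly_net_benefit: float) -> float:
    """Months until cumulative net benefit covers the upfront investment."""
    if monthly_net_benefit <= 0:
        return float("inf")  # never pays back
    return upfront_investment / monthly_net_benefit

roi = simple_roi(total_benefit=150_000, total_cost=100_000)
months = payback_months(upfront_investment=60_000, monthly_net_benefit=5_000)
```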

About Noah:

Noah is the Director of Business Development for BlumShapiro’s Technology Consulting Group. He brings over 25 years of business experience, from entrepreneurial startups to more than a decade at Microsoft in various sales, marketing and business development roles. Noah has launched Windows XP, Office XP, Tablet PC, Media Center PC, MSN Direct Smartwatches (an early IoT attempt), several videogames, a glove controller, and a wine import company/brand. Noah spent three years living overseas building out Microsoft’s Server and Tools business in Eastern Europe, working with the IT Pro and developer communities. He considers himself a futurist, likes science fiction and loves applying what was recently science fiction to real-world problems and opportunities.

Create a Pareto Chart in Power BI

“Baseball is 90% mental, and the other half is physical.” – Yogi Berra

You just have to love Yogi Berra quotes like this. We all pretty much know what he’s talking about, even if his math is not spot on. It’s a restatement of the Pareto Principle, the 80/20 rule! It applies to just about anything in life or business. If I had to write a definition of it for technical documentation, it would look something like this:

“A situation where eighty percent of events attributed to a group are caused by twenty percent of the members of the group.”

Re-stated as examples:

  • Eighty percent of your human resource issues are caused by twenty percent of your employees.
  • Eighty percent of your maintenance issues are caused by twenty percent of your equipment.
  • Eighty percent of your sales are attributed to twenty percent of your products.
  • Eighty percent of the wealth is controlled by twenty percent of the population.

(And I’m certainly not in that last twenty percent. If I were, I wouldn’t have to write articles like this one!)

Now that we’ve got an understanding of the principle, let’s look at how it can be visualized. Excel has a very simple wizard for creating a Pareto Chart, found on the Insert menu.

But we want one of these in Power BI, and Power BI doesn’t have one (yet; maybe later), so we’ll need to ‘roll our own’. Let’s discuss the various parts of the chart itself, so we know what we’re shooting for.

  • The categories, or series, at the bottom (One, Two, Three, etc.) represent the different members of the ‘group’ we are trying to analyze. They may be employees, machines on the manufacturing line, or products in our catalog.
  • The blue bar above each ‘member’ is its respective measurement (count of HR issues, money spent on maintenance, annual sales, etc.). The scale of this measurement is on the left side, in our case going from 0 to 120.
  • The final element is the curved line and the right-side scale running from zero to one hundred percent. At each category member, the line shows the cumulative total of that member and all members to its left, as a percentage of the grand total. To put it another way, as we add each member’s number to the running total of those on its left, the line plots that running total divided by the total for all members.

The arrow points to the spot on the line where it crosses 80%, in our case, after about the first four members, as can be seen by following the green dashed line from right to left, then down. The first four members would be the ‘twenty percent’ of the Pareto Principle, and their cumulative measure would be the eighty.

Note: Math wizards may point out that four members divided by a total member count of fifteen is closer to thirty percent than twenty, but remember that this is a rule of thumb, and we all know that some thumbs are bigger or smaller than others.

To plot data in a Pareto Chart, we’ll need a few pieces of information from it:

  • Each member’s respective total
  • The grand total
  • The running total at each member, with members sorted from largest to smallest
  • The percentage that running total represents compared to the grand total
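Those four pieces can be computed in a few lines of Python. This sketch uses a small, made-up data set, not the data behind the Excel chart above. `pareto_table` returns the rank, member, measure, running total and running percent for each member. The hypothetical helper `vital_few` returns the members up to and including the first one whose running percent reaches 80%.

```python
from itertools import accumulate

def pareto_table(measures):
    """measures: {member: value}. Returns (rank, member, value, running_total, running_percent)."""
    ordered = sorted(measures.items(), key=lambda kv: kv[1], reverse=True)  # largest first
    grand_total = sum(value for _, value in ordered)
    running_totals = accumulate(value for _, value in ordered)
    return [
        (rank, member, value, running, running / grand_total)
        for rank, ((member, value), running) in enumerate(zip(ordered, running_totals), start=1)
    ]

def vital_few(rows, cutoff=0.80):
    """Members up to and including the first whose running percent reaches the cutoff."""
    for i, row in enumerate(rows):
        if row[4] >= cutoff:
            return [r[1] for r in rows[: i + 1]]
    return [r[1] for r in rows]

rows = pareto_table({"One": 10, "Two": 50, "Three": 25, "Four": 15})
```

Sorting before accumulating is what makes the running total meaningful. A running total over unsorted members would not show which few members account for most of the measure.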

Now that we understand what we’re shooting for, let’s get started.

If your data already includes a running sum of the measurement for each member, sorted by each member’s measurement, then you’re golden and can skip to the section titled Add the Grand Total and Running Percentage. If your data includes a ranking column, you may be able to skip the respective steps in each of the following two sections. For the rest of you, keep reading: we’ll look at two approaches to getting the intermediate bits of data, Power Query (M) and DAX.

Create the Rank and Running Sum in Power Query

Let’s start with some simple data in Excel; in fact, it’s the same data used to generate the Excel Pareto chart we used to explain the concepts.

We’ll load this data (it’s in an Excel table called “Table1”) and edit it in the Power BI Query Editor. First, we need to sort the data by the [Measure] column in descending order. Click the down-arrow next to the Measure column title and select Sort Descending.

Next, on the Add Column menu, select Index Column, with a Starting Index of 1 and an Increment of 1. The index must start at 1, because the next step uses it as a row count.

I renamed my column to [Power Query Rank] to differentiate it from the ranking step we’ll introduce in the model later via DAX.

Next, we’ll add the running total as a Custom Column with a formula as shown below:

Hint: If you can’t read the formula from the screen shot, it is:

= Table.Range ( #"Renamed Columns", 0, [Power Query Rank] )

Attribution should go to Sam Vanga and SQL Server Central for this bit of M code: http://www.sqlservercentral.com/blogs/samvangassql/

The Power Query function Table.Range can be explained like this: given a table of data (in our case the output of our last query step, a.k.a. #"Renamed Columns"), start at row 0 (the top) and take the number of rows given by the value in the [Power Query Rank] column. The result is a table associated with each row in the query: the first row of the query has a table with one row of data in it, the second row has a table with two rows, and so forth. Each of these tables appears as the word “Table” on its row of the column we just added.

From here, click the ‘expand’ arrow in the column header, select the Aggregate radio button, check the “Sum of Measure” column, and uncheck “Use original column name as prefix”.

I renamed the resulting column [Power Query Running Total] (not shown).
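To make the Power Query steps concrete, here is a rough Python analogue (not M code). It sorts descending and adds a 1-based index. For each row it takes the first N rows, as `Table.Range(..., 0, [Power Query Rank])` does, and then sums each nested table the way the Aggregate option does. The list of dicts is a stand-in for the query, and the sample data are made up.

```python
def add_power_query_running_total(table):
    """table: list of {"Member": ..., "Measure": ...} dicts (a stand-in for the query)."""
    # Sort Descending on [Measure]
    ordered = sorted(table, key=lambda row: row["Measure"], reverse=True)
    # Index Column starting at 1, renamed [Power Query Rank]
    ranked = [{**row, "Power Query Rank": i} for i, row in enumerate(ordered, start=1)]
    result = []
    for row in ranked:
        # Like Table.Range(#"Renamed Columns", 0, [Power Query Rank]): the first N rows
        nested = ranked[0:row["Power Query Rank"]]
        # Like expanding with Aggregate > Sum of Measure
        result.append({**row, "Power Query Running Total": sum(r["Measure"] for r in nested)})
    return result

result = add_power_query_running_total([
    {"Member": "One", "Measure": 10},
    {"Member": "Two", "Measure": 50},
    {"Member": "Three", "Measure": 25},
])
```

In this sketch each row re-sums its own prefix, so the work grows with the square of the row count. That's fine for a small table of members.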

Click Close and Apply on the Home menu.

Create the Rank and Running Sum in DAX

As with all things Microsoft, there is more than one way to accomplish a goal. In our case, the goal is to get the running total, and just like before, we’ll need the ranking first. For this exercise, we’ll be using DAX instead of Power Query, but we should get the same results.

Create a calculated column with the following formula:

DAX Rank = RANKX ( ALL ( Table1 ), Table1[Measure] )

Next, create the DAX Running Total measure as:

DAX Running Total =
CALCULATE (
    SUM ( Table1[Measure] ),
    FILTER (
        ALLSELECTED ( Table1 ),
        Table1[DAX Rank] <= MAX ( Table1[DAX Rank] )
    )
)

This DAX formula does pretty much the same thing as the Power Query Table.Range function above. The only difference is that the aggregation happens inside the measure, which eliminates the need for an extra column.
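As a rough Python analogue of the DAX rank column and running-total measure, the sketch below works under two assumptions: no slicers or filters are applied (so `ALLSELECTED ( Table1 )` sees every row), and there is one row per member. By default, `RANKX` ranks in descending order, gives tied values the same rank, and skips the next rank. The running total then sums every row whose rank is at most the current row's rank, as the `FILTER` does.

```python
def rankx_skip(values):
    """Like RANKX ( ALL ( Table1 ), Table1[Measure] ): 1 + number of strictly larger values."""
    return [1 + sum(1 for other in values if other > v) for v in values]

def dax_running_total(values):
    ranks = rankx_skip(values)
    # For each row: SUM of Measure over rows where [DAX Rank] <= this row's [DAX Rank]
    return [sum(v for v, r in zip(values, ranks) if r <= current) for current in ranks]

values = [10, 50, 25, 15]          # made-up [Measure] values, in table order
ranks = rankx_skip(values)
running = dax_running_total(values)
tied_running = dax_running_total([50, 25, 25, 10])
```

The last example shows where the two approaches can part ways. With tied [Measure] values, both tied members get the same DAX rank and the same running total (100 here). The Power Query index column, by contrast, assigns them consecutive ranks and running totals of 75 and 100. Without ties, the results match.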

Note: Know the difference between Columns and Measures in DAX. Mistaking the two will cause errors, frustration, and hair loss.

Plotting all these columns and measures on a simple table visual shows that Power Query and DAX come up with the same answers for Rank and Running Total, a good sanity check. The ranks are easy to verify by eye, and with a little mental math, so are the running totals. I had to reformat some of the numbers to make them display without decimals.

Add the Grand Total and Running Percentage

There are two more pieces we need: [Grand Total], which is self-explanatory, and [Running Percent], which is the percentage the [Running Total] represents compared to the [Grand Total]. We’ll create both in DAX. Add a measure as follows:

Grand Total = CALCULATE ( SUM ( Table1[Measure] ) , ALL ( Table1 ) )

This calculates the Grand Total and makes it available at every slice (row of each Member).

Now add the last item, a column with the expression:

Running Percent = [Power Query Running Total] / [Grand Total]

Or, using DIVIDE, which returns a blank instead of an error when the denominator is zero:

Running Percent = DIVIDE ( [Power Query Running Total] , [Grand Total] )

Note: The measure [DAX Running Total] would work just as well as its Power Query equivalent, since we know it produces the same numbers.

Format this last one as a percent.

Create the Chart

Now the fun part. For this we’ll need either a “Line and Stacked Column Chart” or a “Line and Clustered Column Chart”. This is the easiest part of the whole exercise:

  • The Shared Axis is the [Member] column (“One”, “Two”, “Three”, etc.)
  • The Column values are the [Measure] column
  • The Line values are the [Running Percent] column

Like I said, simple if you have all of the data pieces in front of you.

Need help getting the right data pieces? Not sure what charts you can generate from the data pieces you have? There’s probably a way to get to where you want to be. Reach out to our team of data scientists at BlumShapiro Consulting to learn more about how data can help guide your organization into the future.