Tag Archive for Big Data

6 Steps For Creating Golden Records

If you are an organization seeking to improve the quality of the data in your business systems, begin by automating the creation of Golden Records. What is a Golden Record? A Golden Record is the most accurate, complete and comprehensive representation of a master data asset (e.g. Customer, Product, Vendor). Golden Records are created by pulling together incomplete data about some “thing” from the systems in which it was entered. The System of Entry for a customer record may be a Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) system. Having multiple Systems of Entry for customer data can lead to poor-quality customer master data, and it can leave your employees working from bad information.

But why not simply integrate the CRM and ERP systems, so that each system has the same information about each customer? In theory, this is a perfect solution; in practice, it can be difficult to achieve. Consider these problems:

  1. What if there are duplicate records in the CRM? Should two records be entered into each ERP? Or the reverse: what if one CRM customer should generate two customers in the ERP (each with different pricing terms, for example)?
  2. What if one or more ERP systems require data to create a record, but that data is not typically (or ever) collected in the CRM? If the integration process fails, what will the remediation process be?
  3. What if one of your ERP systems cannot accommodate the data entered in CRM or other systems? For example, what if one of your ERP systems cannot support international postal codes? Are you prepared to customize or upgrade that system?

There are many more compatibility issues that can occur. The more Systems of Entry you must integrate, the more obstacles stand between you and full integration. If your business process assumptions change over time, the automated nature of systems integration itself can become a source of data corruption, as mistakes in one system are automatically mirrored in others.

Golden Record Management, by contrast, offers a significantly less risky approach. Golden Records are created in the Master Data Management (MDM) system, not in the business systems. This means that corrections and enhancements to the master data can be made without impacting your current operations.

6 Steps For Creating Golden Records

At a high level, the process of creating Golden Records looks like this:

  1. Create a model for your master data in the Master Data Management system. This model should include all the key attributes MDM can pull from the Systems of Entry (SOEs) that could be useful in creating a Golden Record.
  2. Load data into the model from the variety of SOEs available. These can be business systems, spreadsheets, or external data sources. Maintain the identity of each record, so that you know where the data came from and how the SOE identifies it (for example, the System ID for the record).
  3. Standardize the attributes that will be used to create clusters of records. For Customers and Vendors, location and address information should be standardized.
  4. If possible, verify attributes that will be used to create clusters of records.
  5. Create clusters of records by matching key attributes, which groups the master data records. The cluster identifier becomes the Golden Record identifier. You can also think of this in terms of a hierarchy: the Golden Record is the parent and the source records are the children.
  6. Populate the Golden Record, created in MDM, with attributes from the records in its cluster (the source data). This final step, called Survivorship, requires a deeper understanding of how the source data was entered than the previous five steps. We want to create a Golden Record that contains all the best data, so we need to make some judgments about which of the SOEs is also the best System of Record for a given attribute (or set of attributes). A minimal sketch of the matching and clustering steps follows this list.
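
To make steps 5 and 6 more concrete, here is a minimal Python sketch of matching and clustering, assuming the attributes have already been standardized (steps 3 and 4). The record layout, the exact-match key and the system names are illustrative only; a real MDM platform would use fuzzy matching and its own survivorship engine.

```python
from collections import defaultdict

# Illustrative source records pulled from two Systems of Entry (step 2).
# Field names and system names are hypothetical.
source_records = [
    {"source": "CRM", "system_id": "C-1001", "name": "Acme Corp.", "postal_code": "06103"},
    {"source": "ERP", "system_id": "E-77",   "name": "ACME CORP",  "postal_code": "06103"},
    {"source": "CRM", "system_id": "C-1002", "name": "Widget LLC", "postal_code": "06460"},
]

def match_key(record):
    """Build a simple match key from standardized attributes (steps 3-5).
    Exact matching on normalized fields keeps the sketch short."""
    name = record["name"].upper().replace(".", "").replace(",", "").strip()
    return (name, record["postal_code"].strip())

# Step 5: cluster records that share a match key; the cluster id becomes the Golden Record id.
clusters = defaultdict(list)
for record in source_records:
    clusters[match_key(record)].append(record)

for golden_id, (key, children) in enumerate(clusters.items(), start=1):
    print(f"Golden Record {golden_id}: {key}")
    for child in children:  # parent/child hierarchy: Golden Record -> source records
        print(f"  child from {child['source']} ({child['system_id']})")
```

Keeping the source and System ID on every child record is what later lets you push corrections back to the systems of entry, or roll transactions up to the Golden Record for reporting.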

Great! We’ve consolidated our master data, entered from a variety of systems, into one system which also contains a reference to a parent record, called the Golden Record. This Golden Record is our best representation of the “thing” we need to understand better.

But wait! The systems of entry, the systems your business USES to operate, have not been updated. Can you still take advantage of these Golden Records?

The answer is “yes” – you can take advantage of the Golden Records in two ways:

  1. As the basis for reporting, because each Golden Record is also a “roll-up” of real system records that are referenced by orders, returns, commissions, etc. Golden Records provide a foundation for consistent Enterprise Reporting (see the sketch after this list).
  2. As the basis for data quality improvements in each system of entry, assuming these systems can import a batch of data and update existing records that match a system ID.
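
Here is a minimal sketch of the reporting roll-up described in the first point, assuming your transactions still reference the source-system customer IDs and that the matching step produced a cross-reference from each source record to its Golden Record. The IDs and amounts are invented for illustration.

```python
# Cross-reference produced by the matching step: source-system customer id -> Golden Record id.
xref = {"C-1001": "G-1", "E-77": "G-1", "C-1002": "G-2"}

# Orders still carry the ids of the Systems of Entry, untouched by MDM.
orders = [
    {"customer_id": "C-1001", "amount": 1200.0},
    {"customer_id": "E-77",   "amount":  800.0},
    {"customer_id": "C-1002", "amount":  150.0},
]

# Roll revenue up to the Golden Record for consistent Enterprise Reporting.
revenue_by_golden_record = {}
for order in orders:
    golden_id = xref[order["customer_id"]]
    revenue_by_golden_record[golden_id] = revenue_by_golden_record.get(golden_id, 0.0) + order["amount"]

print(revenue_by_golden_record)  # {'G-1': 2000.0, 'G-2': 150.0}
```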

These benefits of Golden Records are gained without the high risk and high cost that come with systems integration. Further, if you have modeled your master data correctly, it is possible to automate the data quality benefits of Golden Record Management by updating these systems in real time. See how BlumShapiro can help with your master data needs and Golden Record creation.

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.

Technology Talks Podcast

Listen to our new podcast, Technology Talks, hosted by Hector Luciano, Consulting Manager at BlumShapiro Consulting. Each month, Hector will talk about the latest news and trends in technology with different leaders in the field.

Catch up with our first two episodes today:

In the first episode, Hector speaks with Noah Ullman, Director at BlumShapiro Consulting, about the 4th Industrial Revolution and Digital Transformation. The two discuss what digital transformation means for your organization and how you can prepare to be a leader in this new digital age.


In episode two, Hector speaks with Brian Berry, Director at BlumShapiro Consulting, about big data, the role it can play for your organization and how it connects to Digital Transformation and the 4th Industrial Revolution.

Do Data Scientists Fear for Their Jobs?

What happened in this last election, November 2016? Rather, what happened to the analysts in this last election? Just about every poll and news report prediction had Hillary Clinton leading by a comfortable margin over Donald Trump. In every election I can recall from years past, the number crunchers have been pretty accurate in their predictions, at least on who would win, if not the actual numerical results. However, this turned out not to be the case for the 2016 presidential race.

But this is not the first time this has happened. In 1936, Franklin Delano Roosevelt defeated Alfred Landon, much to the chagrin of The Literary Digest, a magazine that collected two and a half million mail-in surveys, roughly five percent of the voting population at the time. George Gallup, on the other hand, predicted a Roosevelt victory with a mere 3,000 interviews. The difference, according to the article’s author, was that The Literary Digest’s mailing lists were sourced from vehicle registration records. How did this impact the results? In 1936 not everyone could afford a car; therefore, the Literary Digest sample was not a truly representative sample of the voting population. This is known as sampling bias, where the very method used to collect the data points skews the numbers collected. Gallup’s interviews, on the other hand, were more in line with the voting public.

The article cited above also mentions Boston’s ‘Street Bump’ smartphone app “that uses the phone’s accelerometer to detect potholes… as citizens of Boston … drive around, their phones automatically notify City Hall of the need to repair the road surface.” What a great idea! Or was it? The app was only collecting data from people who a) owned a smartphone, b) were willing to download the app, and c) drove regularly. Poorer neighborhoods were largely left out of the equation. Again, an example of sample bias.

The final case, and not to pick on Boston: I recently heard that data scientists analyzing Twitter feeds for positive and negative sentiment had to factor in the term “wicked” as a positive sentiment signal, but only for greater Boston. Apparently, that adjective doesn’t mean what the rest of the country assumes it means.
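
As a rough illustration of how such a regional adjustment could be applied in a simple lexicon-based scorer, here is a toy Python sketch. The word scores, the region lookup and the whitespace tokenization are invented for illustration and are far simpler than any production sentiment model.

```python
# Toy sentiment lexicon; scores are made up for illustration.
BASE_LEXICON = {"great": 1.0, "terrible": -1.0, "wicked": -0.5}
# Regional override: treat "wicked" as positive for greater Boston.
REGIONAL_OVERRIDES = {"boston": {"wicked": 0.8}}

def sentiment(text, region=None):
    lexicon = dict(BASE_LEXICON)
    lexicon.update(REGIONAL_OVERRIDES.get(region, {}))
    return sum(lexicon.get(word, 0.0) for word in text.lower().split())

print(sentiment("wicked good chowder", region="boston"))  # positive with the override
print(sentiment("wicked good chowder"))                   # skews negative without it
```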

Along with sampling bias, another driver of erroneous conclusions when analyzing data is the ‘undocumented confounder.’ Suppose, for example, you wanted to see which coffee people prefer, that from Starbucks or Dunkin’ Donuts. For this ‘experiment’, we’re interested only in the coffee itself, nothing else. So we have each shop prepare several pots with varying additions like ‘cream only’, ‘light and sweet’, ‘black no sugar’, etc. We then take these to a neutral location and do a side-by-side blind taste comparison. From our taste results we draw some conclusions as to which coffee the sample population prefers. But unbeknownst to us, when the individual shops prepared their various samples of coffee, one shop used brown sugar and one used white sugar, or one used half-and-half while the other used heavy cream. The cream and sugar are now both undocumented confounders of the experiment, possibly driving results one way or the other.
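
A tiny simulation makes the point. Assume, purely for illustration, that tasters respond to the richness of the cream as well as to the coffee itself; the numbers below are made up, but they show how the shop with the better coffee can still lose the taste test to an undocumented confounder.

```python
import random

random.seed(42)

def taste_score(coffee_quality, cream_richness):
    # Tasters respond to the cream as much as the coffee, but we only record the shop.
    return coffee_quality + 2.0 * cream_richness + random.gauss(0, 0.5)

# Shop A brews better coffee but used half-and-half; Shop B used heavy cream (the confounder).
shop_a = [taste_score(coffee_quality=7.0, cream_richness=0.3) for _ in range(100)]
shop_b = [taste_score(coffee_quality=6.5, cream_richness=0.8) for _ in range(100)]

print(sum(shop_a) / len(shop_a))  # about 7.6 on average
print(sum(shop_b) / len(shop_b))  # about 8.1 -- the "worse" coffee wins because of the cream
```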

So, back to the elections: how did this year’s political analysts miss the mark? Without knowing their sampling methods, I’m willing to suggest that some form of sample bias or confounder may have played a part. Was it the well-known ‘cell-only problem’ again (households with no landline are less likely to be reached by pollsters)? Did they take into consideration that Trump used Twitter as a means to deliver sound-bite-like messages to his followers, bypassing the mainstream media’s content filters? Some other factor as yet unidentified? As technology advances and societal trends morph over time, so must political polling and data analysis methods.

Pollsters and data scientists are continually refining their collection methods, compensation factors and models to eliminate any form of sample bias and get closer to the ‘truth.’ My guess is that the election analysts will eventually figure out where they went wrong. After all, they’ve got three years to work it out before the next presidential race starts. Heck, they probably started slogging through all the data the day after the election!

One needs to realize that data science is just that, a science, and not something that can simply be stepped into without knowledge of the complexities of the discipline. Attempting to do so without a full understanding of sample bias, undocumented confounders and a host of other factors will lead you down the path to a wrong conclusion, a.k.a. ‘failure’. History has shown that, for ANY science, there are many failed experiments before a breakthrough. Laboratory scientists need to exercise caution and adhere to strict protocols to keep their work from being ruined by outside contaminants. The same goes for data scientists, who continually refine their collection methods and models after experiments that fail.

So what about the ‘data science’ efforts for YOUR business? Are you trying to predict outcomes from limited datasets and rudimentary Excel skills, then wondering why you can’t make any sense of your analysis models? Do you need help identifying and eliminating sample bias, or accounting for those pesky ‘undocumented confounders’? Social media sentiment analysis is a big buzzword these days, with lots of potential for companies to combine it with their own performance metrics. But many just don’t know how to go about it, or are afraid of the cost.

At BlumShapiro Consulting, our team of consultants is constantly looking at the latest trends and technologies associated with data collection and analysis. Some of the same principles associated with election polling can be applied to your organization through predictive analytics and demand planning. Using Microsoft’s Azure platform, we can quickly develop a prototype solution that can help take your organization’s data reporting and prediction to the next level.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence. He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011.


Three Steps to High Quality Master Data

Data quality is critical to business, because poor business data leads to poor operations and poor management decisions. For any business to succeed, especially now in this digital-first era, data is “the air your business needs to breathe.” If leadership at your organization is starting to consider what digital transformation means for your business or industry, and how the business needs to evolve to thrive in these changing times, they will likely assess the current state of the business and its technology. One of the most common observations management makes is that the business systems are “outdated” and “need to be replaced.” As a result, many businesses resolve to replace legacy systems with modern business systems as part of their digital transformation strategy.

Digital Transformation Starts with Data

More than likely, those legacy systems did a terrible job with your business data. They often permitted numerous incomplete master data records to be entered into the system. Now you have customer records which aren’t really customers. The “Bill-To’s” are “Sold-To’s”, the “Sold-To’s” are “Ship-To’s”, and the data won’t tell you which is which. You might even have international customers with all of their pertinent information in the NOTES section. Each system which shares customer master data with other systems contains just a small piece of the customer, not the complete record.

This may have been the way things were “always done,” or departments made do with the systems available, but now it’s a much larger problem, because in order to transform itself, a business must leverage its data assets. It’s a significant problem when you consider all the data your legacy systems maintain. Parts, assets, locations, vendors, materials, GL accounts: each suffers from its own, slightly nuanced data quality problems. Now it hits you: your legacy systems have resulted in legacy data. And as the old saying goes, “garbage in, garbage out.” In order to modernize your systems, you must first get a handle on your data and your data practices.

Data Quality Processes

The data modernization process should begin with Master Data Management (MDM), because MDM can be an effective data quality improvement tool to launch your business’ Digital Transformation journey. Here’s how a data quality process works in MDM.

Data Validation – Master Data Management systems provide the ability to define data quality rules for the master data. You’ll want these rules to be robust, checking for both completeness and accuracy. Once defined and applied, these rules highlight the gaps in your source data and anticipate problems which will present themselves when that master data is loaded into your shiny new modern business applications.
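
As a minimal sketch of what completeness and accuracy rules look like, here they are expressed as plain Python checks. Real MDM platforms let you declare similar rules against the master data model; the field names and rule set below are illustrative, not a specific product’s syntax.

```python
import re

# Each rule returns an error message when a record fails, or None when it passes.
RULES = [
    ("name required",       lambda r: None if r.get("name", "").strip() else "name is missing"),
    ("country code format", lambda r: None if re.fullmatch(r"[A-Z]{2}", r.get("country", "")) else "country must be a two-letter ISO code"),
    ("postal code present", lambda r: None if r.get("postal_code") else "postal code is missing"),
]

def validate(record):
    errors = []
    for rule_name, rule in RULES:
        message = rule(record)
        if message:
            errors.append((rule_name, message))
    return errors

record = {"name": "Acme Corp.", "country": "usa", "postal_code": ""}
for rule_name, message in validate(record):
    print(f"{rule_name}: {message}")
```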

Data Standardization – Master Data thrives in a standardized world. Whether it is address, ISO, UPC or DUNS standardization, standards assist greatly with the final step in the process.
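
Here is a toy normalization pass for one slice of this, street addresses. Production systems rely on postal reference data or third-party address verification services, so this only illustrates the idea; the abbreviation table and sample address are invented.

```python
# Map common street-address words to their standard abbreviations (illustrative subset).
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "SUITE": "STE"}

def standardize_address(line):
    tokens = line.upper().replace(".", "").replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

print(standardize_address("29 South Main Street, Suite 300"))  # 29 SOUTH MAIN ST STE 300
```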

Matching and Survivorship – If you have master data residing in more than one system, then your data quality process must consider the creation of a “golden record”. The golden record is the best, single representation of the master data, and it must be arrived at by matching similar records from heterogeneous systems and grouping them into clusters. Once these clusters are formed, a golden record emerges which contains the “survivors” from the source data. For example, the data from a CRM system may be the most authoritative source for location information, because service personnel are working in CRM regularly, but the AR system may have the best DUNS credit rating information.
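
A minimal sketch of attribute-level survivorship, using a source-precedence map along the lines of the CRM-for-address, AR-for-DUNS example above. The field names, system names and precedence order are illustrative only.

```python
# Which System of Entry is treated as the System of Record for each attribute (illustrative).
SURVIVORSHIP_PRECEDENCE = {
    "address": ["CRM", "AR", "ERP"],
    "duns":    ["AR", "ERP", "CRM"],
}

def build_golden_record(cluster):
    """cluster: the list of source records that the matching step grouped together."""
    golden = {}
    for attribute, precedence in SURVIVORSHIP_PRECEDENCE.items():
        for source in precedence:
            candidates = [r for r in cluster if r["source"] == source and r.get(attribute)]
            if candidates:  # first non-empty value from the highest-precedence source survives
                golden[attribute] = candidates[0][attribute]
                break
    return golden

cluster = [
    {"source": "CRM", "address": "29 South Main St", "duns": None},
    {"source": "AR",  "address": "PO Box 272",       "duns": "06-989-9988"},
]
print(build_golden_record(cluster))  # {'address': '29 South Main St', 'duns': '06-989-9988'}
```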

Modernize Your Data and Modernize Your Business


These three data quality processes result in a radical transformation in the quality of master data, laying the foundation for the critical steps which follow. Whether or not your digital transformation involves system modernization, your journey requires clean, usable data. Digital transformation can improve your ability to engage with customers, but only if you have a complete view of who your customers are. Digital transformation can empower your employees, but only if your employees have accurate information about the core assets of the business. Digital transformation can help optimize operations, but only if management can make informed, data-driven decisions. Finally, digital transformation can drive product innovation, but only if you know what your products can and cannot currently do.

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.
