5 Reasons to Keep CRM and MDM Separate

In previous articles, I identified 5 Critical Success Factors for Initiating Master Data Management at your organization and delved more deeply into the first of these: creating a new system which intentionally avoids becoming another silo of information. The second critical success factor is recognizing that MDM tools work best when kept separate from the sources of master data. A prime example of this is CRM. Customer Relationship Management (CRM) solutions come up often in my discussions with clients, chiefly with respect to a proposed Customer MDM solution. I'm going to use CRM to demonstrate why organizations fail to implement Data Governance when they integrate MDM processing into an existing operational system.

It can be enticing to think of CRM as a good place to "do" Customer MDM. Modern CRM systems are built from the ground up to promote data quality, and they have an extensible data model that makes it easy to add data from other systems. Customer data often starts in CRM, so by the "Garbage In, Garbage Out" maxim it seems important to get it right there first. Finally, software vendors often claim to offer an integrated MDM component for Customers, Products, Vendors and others.

But here are the problems this approach creates:

  1. More than Data Quality – if an operational system like CRM can offer address standardization, third-party verification of publicly available data, or de-duplication of records, then you should leverage these services. But keep in mind – these services exist to help you achieve quality data so that operations in that system run smoothly. If you have only one operational system, then you probably have no need for MDM. If you have more than one and you pick one as the winner, you'll tie your master data so tightly to that system that future integrations become extremely daunting.
  2. Data Stewardship Matters – Data Stewardship refers to a role responsible for the maintenance and quality of data used throughout the organization. In a well-designed Data Governance Framework, data stewards report to the governance team. It's not always possible for an organization to have dedicated data stewards; more often, "Data Steward" is one role added to operational responsibilities. Now, I would love to tell you that CRM users care about data quality; many of them do. But sales professionals are often focused on the data they need to close a deal, not the myriad other pieces of information needed to truly drive customer engagement. Asking them to take responsibility for all of that sets the organization up for failure.
  3. Governors Don't Play Favorites – an MDM system should be able to store and represent data as it actually exists and is used in ANY source of master data. Without this, your data stewardship team cannot really see the data. If you insist on making CRM the source for master data, your technology team will spend all of their time mapping and normalizing data to match what CRM needs and wants. This is a waste of time. The Federated MDM model is designed to move data in quickly and show data stewards how things really look. Then, and only then, can decisions be made (and automated) about which systems adhere most closely to Enterprise Standards for quality.
  4. Information Silo or Reference Data Set – CRM meets the definition of an Information Silo: it has its own database, and it invents its own identities for Customers, Accounts, Leads, etc. What happens when an account must be deactivated or merged with another account in order to streamline operational processes? Well, if any systems are using CRM as their Reference Data Set, you will have massive problems.
  5. Present at Creation – you probably realize that there are lots of sources of Customer Data, some the business likes to talk about and some it doesn't. I like to separate the two into Sanctioned and Unsanctioned Master Data. Unlike Sanctioned Master Data, which lives in CRM and ERP and other operational systems managed by IT, Unsanctioned Master Data lives in spreadsheets, small user databases (e.g., Microsoft Access) or even websites. This may surprise you – unsanctioned master data is often the most valuable data in the governance process! It is where your analysts and knowledge workers store important attributes and relationships about your customers, and it is often the source of real customer engagement. MDM needs to make room for it.

One of the most common misconceptions about how to build an MDM system is the idea that Master Data Management is best achieved by maintaining a Golden Record in one of many pre-existing operational systems. This can be a costly mistake and sink your prospects for achieving Data Governance in the long term. A well-implemented Master Data Management system has no operational aim other than high-quality master data. It must take this stance in order to accept representations of Master Data from all relevant sources. When this is accomplished, it creates a process-agnostic place for stewardship, governance and quality to thrive.

Fun with Machine Learning

All my blog articles over the years have been technical in nature. I decided to break out of that mold today. I almost titled this article “It’s not a train robbery, it’s a science experiment” (Doc Brown, in Back to the Future III). I hope you enjoy reading it as much as I did writing it.

The title is not meant to imply that machine learning isn't inherently fun (I personally happen to think it's a cool use of aggregated technologies). Rather, it's to say that we're going to have some fun with machine learning in a way you wouldn't have otherwise considered. But in order to do so, the reader must understand at least the fundamental concepts of machine learning. Don't worry, we're not going to be diving into data mining algorithms or the R language or Python code or anything remotely technical. Instead, a real-life analogy is best, and we'll dumb this one right down to the level of a two-year-old toddler! Kids between the ages of about one and six are GREAT at 'machine learning,' but NOT on the LEARNING side of machine learning. No, they're on the TEACHING side of machine learning, the 'writing of the algorithms,' the 'Python and R code,' that the 'machines' (their parents) use to learn. Let's take a look at how this works.

Ever try to get a two-year-old to eat something he or she just does NOT want to eat? Like broccoli or cauliflower? Even adults are split about evenly on the likes and dislikes of vegetables. Two-year-olds, on the other hand, tend to swing to the dislike side on just about all varieties. So what happens? The child absolutely will not eat said vegetables. Babies and toddlers being spoon-fed from a jar tend to take a different and sometimes visually humorous approach: they let you spoon it into their mouth, but it quickly comes back out like toothpaste accompanied by a grimace. Having wasted an entire jar of baby food on the bib, the father (as a new father I had to take my turn feeding the kids!) turns to his wife and says, “Honey, he doesn’t like the green beans, but he loves the applesauce.” “OK,” comes the reply, “I won’t buy the beans again.”

What just happened here? Believe it or not, that was "machine learning" on a micro scale. The 'machine', the parents, just 'learned' something. Two data points, in fact. Green beans are icky, while applesauce gets a 'thumbs up.' Now if all the toddlers in town were to teach those various bits of knowledge to their respective parents, you would have built yourself a 'reference dataset.' Suppose now a bunch of those mothers interact at the weekly "Mommy and Me". Just now joining their group is a new mother whose daughter is ready to switch from the bottle to semi-solid food. The discussion is likely to turn to what each child likes and dislikes in that area. The new mother listens intently and comes away with knowledge of what her daughter is MOST LIKELY to prefer, but WITHOUT actually having to experience a bib full of pureed sweet potatoes! This is machine learning in action. The machine has applied an algorithm to a reference dataset to predict a probable outcome.

Now, no child dislikes ALL foods, not even three- and four-year-olds, much as some parents tend to perceive otherwise. (My six-year-old son wouldn't eat a peanut butter and honey sandwich unless it was cut diagonally! Go figure!) If you think your child dislikes ALL foods, it's more likely he or she only dislikes all the foods YOU like. Since you're not likely to buy stuff you personally wouldn't eat, the child has no chance to find what he or she actually enjoys. The parents will then broaden their variety to find something acceptable.

Let's take a look at another real-world scenario, this time closer to the topic at hand.

Many online retailers use machine learning and data mining to present consumers with the things they are MOST LIKELY to purchase, based on any number of information points and reference datasets. These include your past purchases, your demographics, and the things other consumers have purchased together. The algorithms employed can be 'market basket analyses', 'clustering', or others (and I promise that's as technical as we'll get in this article). We've all seen it in action at Amazon and Netflix. "Based on your viewing history…" or "People who bought X also bought…" Even grocery stores learned that beer was often purchased in conjunction with diapers. Seems that young mothers often sent their husbands to the store in times of diaper needs, hence the beer.
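(For the curious, here is a deliberately tiny sketch of the "people who bought X also bought Y" idea. The baskets and product names below are made up, and real retailers use far more sophisticated market basket algorithms; this just counts which items show up in the same basket.)

    from collections import Counter
    from itertools import combinations

    # Hypothetical order history: each set is one shopper's basket.
    orders = [
        {"diapers", "beer", "wipes"},
        {"diapers", "beer"},
        {"diapers", "wipes"},
        {"beer", "chips"},
    ]

    # Count how often each pair of items appears in the same basket.
    co_purchases = Counter()
    for basket in orders:
        for pair in combinations(sorted(basket), 2):
            co_purchases[pair] += 1

    def also_bought(item):
        """Items most often purchased alongside the given item."""
        related = Counter()
        for (a, b), count in co_purchases.items():
            if item == a:
                related[b] += count
            elif item == b:
                related[a] += count
        return related.most_common()

    print(also_bought("diapers"))  # [('beer', 2), ('wipes', 2)]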

I decided to try an experiment this morning, and this is where the fun comes in. I wanted to take a finicky two-year-old's stance on my internet streaming audio. Pandora, Rhapsody, iHeartRadio and the like often apply machine-learning-style logic to decide the next song to queue to your personal listening stream based on your likes and dislikes. What would happen if I started a new 'radio station', then flagged every song it presented to me as 'thumbs down'? Would it just keep letting me spit out the offerings until it found something I actually liked? What if I didn't like ANYTHING? Would it cut me off and kick me out for being impossible to please? I decided I just had to find out.

I started by naming my new station "Billy Joel." (Hey, if the experiment were to fail, I figured why not fail with something decent!) Within 5 seconds of starting the first song, I had hit the 'Thumbs down' button. OK, no problem, it moved on to the next. Six more songs were dispatched in similar fashion. "Hey, this is fun," I thought. On the next song, however, it allowed me to dislike it, but I was forced to listen to the entire song while a banner displayed a message about not being fed that particular vegetable variety again. Five more disliked songs all brought up the same message while still playing the song to completion. Oh, well, at least I had some good music to listen to. After a dozen similar results, and realizing I wasn't getting anywhere trying to fool the machine, I threw it a curve and hit the 'Thumbs up' on the next few tracks. (I think I smelled smoke coming from my router.) The next six tracks I flagged as disliked and skipped, in similar fashion to the first batch. I settled into a back-and-forth of liking and disliking songs in groups. In the end, I had to like at least a couple of songs it presented to me before I could dislike AND SKIP a bunch of other tracks.

After two hours the machine won, as I had to produce some useful work at the office. There was a practical limit to how much it could 'learn' from this picky two-year-old music consumer. Likewise, parents all think they win in the end, too, or do they? They will tell you they eventually 'got their child to like' certain foods when in fact they simply settled on a repertoire of foods their child wouldn't reject, kind of like…wait for it…machine learning.

Avoid Creating Another Information Silo with MDM

In my last article, I discussed the five most critical success factors when implementing a Master Data Management solution. At the top of the list was a warning: “Don’t Create Another Information Silo”. What does that mean? Why is that important? What is different about this new system we are calling “MDM”?

I define “Information Silo” as an application which has the following characteristics:

1. The application has its own database.

2. The database is highly normalized, because the designer of the data model has made an effort to reduce duplication of data.

3. Identity is invented in the database, because things like Customer, Product, User all need primary keys to identify the record within the system.

4. Domain Logic controls most, if not all, of the data quality requirements, because encapsulating business logic outside a database promotes re-use.

Each of these principles makes perfect sense for a custom application. But if you use them in your Master Data Management solution, you will create another Information Silo, and you'll be right back where you started.
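To make the pattern concrete, here is a minimal sketch of characteristics 1 and 3 in a typical custom application (the table, columns and values are invented for illustration): the application owns its own database and mints surrogate keys that mean nothing outside of it.

    import sqlite3

    # A typical operational application: its own private database, its own invented identity.
    conn = sqlite3.connect(":memory:")  # stand-in for the application's database
    conn.execute("""
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY AUTOINCREMENT,  -- identity invented inside this system
            Name       TEXT NOT NULL,
            Email      TEXT UNIQUE                         -- quality rules enforced by this app alone
        )
    """)
    conn.execute("INSERT INTO Customer (Name, Email) VALUES (?, ?)",
                 ("Blum Shapiro & Co", "info@example.com"))

    # This key is meaningful only inside this application; no other system knows or shares it.
    customer_id = conn.execute("SELECT CustomerId FROM Customer").fetchone()[0]
    print(customer_id)  # 1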

Certainly, our Master Data Management solution will include a database, the “MDM Hub”, which will be the central place for data in our system to reside. But we need to design a different kind of database model, unlike many other systems we have designed for the enterprise. What should this data model look like? There are 3 modeling approaches to consider. I’ll explain each, and then tell you which is the best and why.

A Registry approach is the lightweight model whereby only pointers to source data are stored in the Master Data Hub. We capture only minimal attributes in the hub with this pattern, and consumers of data are required to seek out an authoritative data source elsewhere. Essentially, this MDM pattern is designed to store almost nothing and only tell consumers where to get the data. This does not work well when it comes time to implement Data Governance, because the MDM system has not defined a Jurisdiction for the Data Governance team to work in.
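As a rough sketch (the field names below are my own, not from any particular MDM product), a registry-style hub record is little more than a pointer back to the source:

    from dataclasses import dataclass

    @dataclass
    class RegistryEntry:
        """A registry-style hub record: a pointer to the real data, not the data itself."""
        domain: str         # e.g. "Customer"
        source_system: str  # e.g. "SFDC"
        source_key: str     # the identifier inside that source system
        # Note: no business attributes here -- consumers must go back to the source for those.

    pointer = RegistryEntry(domain="Customer", source_system="SFDC", source_key="100054")
    print(f"Look up {pointer.domain} {pointer.source_key} in {pointer.source_system}")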

A Transaction approach represents the opposite end of the spectrum from Registry, and it helps to address the common fear of “Garbage In, Garbage Out” which so many first-timers experience. The idea with Transaction is that Master Data should be created in a tightly controlled environment which imposes rigor on the master data creation process and ensures that ALL data is collected up front. This approach sounds worthwhile, until you consider what it will take to build such a system: you may as well build a new ERP system. This is the classic trap leading directly to another silo of information!

A Federated approach represents a middle ground between Registry and Transaction. Think of this as the "come as you are" alternative, because we are going to pull master data from the sources with as little translation as possible, and we are going to leverage the source identity in MDM by combining the name of the source system with the source system's identifier. The Federated approach recognizes that in order for the data governance team to govern effectively, it needs enough master data attributes in the MDM hub to discern critical differences, but not all of them.

Here is an example of how the Federated model works. Let's say that we have 5 sources of Customer Master data: three ERP systems (JD Edwards EnterpriseOne, SAP and Dynamics AX), one CRM (Salesforce) and one custom SQL solution used by a website. A Federated design would dictate:

  • An Entity named “Source System” whose members would define the sources of data (i.e. JDE, SAP, AX, SFDC, SQL)
  • An Entity named "Customer" whose members would have an identity formed by combining the actual Id value from the source system with the Source System code as a prefix (e.g. "JDE-100054", "SAP-000005478", "SQL-1"). As long as the actual source system Id is used, these MDM identifiers will be unique in the Federated model.
  • Some number of additional attributes which help the stewards of the master data understand whether these records represent a common customer. This set of attributes needs to be defined by the business stakeholders, not the database administrators.
  • A solution for creating a Golden Record in the MDM hub. This solution should match members based upon matching rules and group them together as proposed units, with the common grouping serving as a reference to the Golden Record. For example, the solution should present a hierarchy with the Golden Record as the parent of the source records, as shown below (a minimal code sketch follows this list):
  • MDM-100 : “BlumShapiro” (the Golden Record)
      • JDE-100054 : “Blum Shapiro & Co”
      • SAP-0000005478 : “BlumShapiro”
      • SQL-1 : “Bloom Shapiro”
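Here is a minimal sketch of that matching-and-grouping step. The match rule, the threshold, the MDM-100 numbering and the extra Salesforce record are simplified assumptions for illustration; a real MDM tool would use far richer matching and survivorship rules to pick the Golden Record's name.

    import re
    from difflib import SequenceMatcher
    from itertools import count

    # Source records arrive "as is": (source system code, source id, customer name).
    source_records = [
        ("JDE",  "100054",     "Blum Shapiro & Co"),
        ("SAP",  "0000005478", "BlumShapiro"),
        ("SQL",  "1",          "Bloom Shapiro"),
        ("SFDC", "00150000",   "Some Other Customer"),   # hypothetical non-matching record
    ]

    def federated_id(source, source_id):
        """MDM identity = source system code + the source system's own identifier."""
        return f"{source}-{source_id}"

    def is_match(a, b, threshold=0.75):
        """A deliberately naive matching rule: compare lowercased, letters-only names."""
        normalize = lambda name: re.sub(r"[^a-z]", "", name.lower())
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

    # Group matching records under a proposed Golden Record (MDM-100, MDM-101, ...).
    golden_ids = count(100)
    golden_records = []   # each entry: {"id", "name", "members"}
    for source, source_id, name in source_records:
        member = (federated_id(source, source_id), name)
        for golden in golden_records:
            if is_match(name, golden["name"]):
                golden["members"].append(member)    # proposed as part of this Golden Record
                break
        else:
            golden_records.append({"id": f"MDM-{next(golden_ids)}",
                                   "name": name,    # survivorship rule here is simply "first wins"
                                   "members": [member]})

    for golden in golden_records:
        print(golden["id"], ":", golden["name"])
        for fed_id, member_name in golden["members"]:
            print("   ", fed_id, ":", member_name)

The point of the sketch is not the matching logic itself, but that the hub keeps the source identities intact while proposing groupings for the data stewards to confirm.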

Advantages to this way of working include:

A. Because we are bringing source data in "as is," data can be loaded quickly into the MDM hub.

B. A Federated MDM solution can produce reports which tie out to legacy reports, because they use the same Master Data.

C. Your Data Stewardship team works with the data as it is, without translation or normalization, and has enough data to begin defining data quality rules.

D. The solution is positioned nicely for source system synchronization, because a one-to-one relationship exists between the Authoritative record in MDM and the target record in each source system.

A Federated MDM Data Model is the best approach for getting your Master Data Management program started. It is simple to design, easy for business users to grasp, and avoids creating another data silo. In fact, it does nothing more than aggregate data from existing silos and group it for matching, harmonization and coherence. Most importantly, this approach gets you something fast. It gets your governance team something they can touch and feel. An MDM initiative must deliver value to the business quickly, and to do this it must be relevant as soon as possible.

Next time, I'll talk about how an MDM solution differs from CRM, and why you should treat your CRM(s) as "just another source of master data".

The Business Value of Microsoft Azure – Part 3 – Backup

This article is part 3 of a series focused on the Business Value of Microsoft Azure. Microsoft Azure provides a variety of cloud-based technologies that can enable organizations in a number of ways. Rather than focusing on the technical aspects of Microsoft Azure (there's plenty of that content out there), this series focuses on business situations and how Microsoft Azure services can benefit them.

In our last article we focused on data security from the perspective of users purposefully or inadvertently causing data to leave the organization. Today we're going to focus on data loss from the standpoint of system failure, corruption or another disaster that requires access to a backup.

Many organizations still rely on tape-based backup systems as the primary means of backing up critical business data. Let's take the typical municipal office. Chances are that our fictional town of Gamehendge has either a traditional backup-to-tape solution or perhaps a disk-based virtual tape system from which copies are then made to physical tapes. These tapes are sent offsite to a facility that manages tape archiving for disaster protection purposes. While this seems reasonable, our town faces a problem.

If a production system fails, or if data needs to be restored due to user deletion or corruption, it might take up to 24 hours for the IT department to work with the off-site records management company to request, locate and deliver the appropriate tape, and then to actually recover the data.

One solution to this problem might be to set up a co-location arrangement with a hosting provider and replicate certain servers. Again, this is a fairly common practice. However, replicating all the servers in the environment is costly, so only a handful of the highest-priority systems are replicated. This approach, while a step in the right direction, only allows a few key systems to be restored in 2-3 hours, leaving the remaining systems with a 24-hour recovery period.

Our fictional town wants to free up its IT folks to spend time on value-added activities. Right now the amount of time spent managing backups, restoring data and managing the replication processes makes this a challenge. Our town's budget is just as tight as everyone else's, so finding a creative way to address this issue without needing to hire another resource is critical.

Enter the Microsoft Azure StorSimple family of hybrid cloud storage solutions.

StorSimple is an on-premises enterprise storage area network that interoperates with Microsoft Azure to provide hybrid cloud storage, data archiving, and fast disaster recovery. The solution replaces traditional backup processes with "cloud snapshots" that automate the creation of remote copies of data in Azure cloud storage.

With our data securely backed up in Microsoft Azure cloud storage, our town has a couple of options. It can purchase a second StorSimple appliance and deploy it at its co-location facility, or it can use a virtual StorSimple appliance in the Microsoft Azure cloud to quickly bring data or a virtual machine back online, resulting in a significantly faster recovery time compared to tape storage.

Our town has realized additional benefits by pursuing this solution. Beyond the backup capability, the StorSimple device provides "bottomless" storage. With three tiers of data storage (SSD, HDD and cloud), the device intelligently moves data from higher-cost, higher-performing storage (SSD) to lower-cost HDD and eventually Azure cloud storage. This happens automatically based on usage characteristics and other criteria. Further, with the deduplication and compression technologies in the device, our town has been able to reduce the total amount of storage space needed to protect its data.
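As a rough illustration of the tiering idea only (this is not StorSimple's actual algorithm; the thresholds below are invented, and real appliances weigh usage patterns rather than simple age), the decision logic looks something like this:

    from datetime import datetime, timedelta

    # Hypothetical tiering thresholds -- purely illustrative, not StorSimple's real policy.
    SSD_WINDOW = timedelta(days=7)    # recently used data stays on fast, expensive SSD
    HDD_WINDOW = timedelta(days=90)   # older data moves to cheaper local HDD

    def choose_tier(last_accessed, now=None):
        """Pick a storage tier for a block of data based on how recently it was used."""
        now = now or datetime.now()
        age = now - last_accessed
        if age <= SSD_WINDOW:
            return "SSD"
        if age <= HDD_WINDOW:
            return "HDD"
        return "Azure cloud storage"

    print(choose_tier(datetime.now() - timedelta(days=2)))    # SSD
    print(choose_tier(datetime.now() - timedelta(days=30)))   # HDD
    print(choose_tier(datetime.now() - timedelta(days=365)))  # Azure cloud storage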

There are other approaches that can be implemented using Microsoft Azure cloud storage as well, but the StorSimple device provides a significant step forward for cities and towns that have struggled to keep up with ever-growing demands for storage. Every town that has considered implementing a cop-cam (officer-mounted video camera) solution will immediately face significant data storage and backup needs, and StorSimple can play a key role in managing these costs.

As a partner with BlumShapiro Consulting, Michael Pelletier leads our Technology Consulting Practice. He consults with a range of businesses and industries on issues related to technology strategy and direction, enterprise and solution architecture, service oriented architecture and solution delivery.