
The Business Value of Microsoft Azure – Part 4 – Virtual Machines

This article is part 4 of a series that focuses on the Business Value of Microsoft Azure. Microsoft Azure provides a variety of cloud-based technologies that can enable organizations in a number of ways. Rather than focusing on the technical aspects of Microsoft Azure (there’s plenty of that content out there), this series focuses on business situations and how Microsoft Azure services can help address them.

In our last article we focused on data loss from the standpoint of system failure, corruption or other disasters that require access to a backup. Today, and I’m surprised it took me until part 4 to get to it, we’re going to focus on virtualization. One of the simplest and most common Infrastructure as a Service (IaaS) solutions is virtualization, or the creation of virtual machines in a cloud infrastructure.

Take, for example, the story of a local town. They have an ERP system that is currently running on Windows Server 2003. The nature of the application is such that it runs best through Remote Desktop (RDP), which is how both their local and remote users access the system. Like many towns, they were wrestling with the best path forward for their infrastructure. Here are a few characteristics that defined them:

  1. They had an older ERP system that they needed to upgrade because it was running on, and only supported on, Windows Server 2003, which hits end of life in July 2015.
  2. They weren’t certain that the next version of their ERP system was where they wanted to be in the long run, but they hadn’t found a suitable replacement yet.
  3. Any investment in hardware or software to support the new ERP would therefore be questionable, since it was conceivable they would run it for only another year.
  4. They had several locations throughout the town that all connected over RDP to access the ERP system because it was not designed to run well as a client/server system over a WAN.

After an initial assessment it was determined that their existing infrastructure would not be able to support the new environment. A local IT vendor quoted them approximately $50,000 in hardware and software to create a new virtual server environment on-premise. Were it not for the ERP upgrade requirements, their existing hardware and software would have remained sufficient for a number of years. $50,000 is a significant amount given that the town wasn’t sure it would stick with the ERP system. What else could this town do?

Enter Microsoft Azure Virtual Machines

To give itself some breathing room on a possible switch to a different ERP system, and to ease the pressure to upgrade its on-premise infrastructure, the town turned to Microsoft Azure. Using the virtualization capabilities of the Azure platform, the town created a new RDP environment along with the ERP system server and database. This solution, which the town connected to its existing environment using the site-to-site VPN capabilities of Azure, provided a secure, reliable and easily expandable environment to meet its needs.

The key benefits of this approach were as follows:

  1. Eliminated $50,000 of up-front cost for revamping their existing hardware and shifted them to a reasonable $1,000/month Azure subscription model (a rough cost comparison is sketched after this list)
  2. Avoided a sunk cost should the town decide to move to a different ERP solution, perhaps one that follows a SaaS model. With Azure, if a set of services is no longer needed you simply turn them off and you don’t get billed further.
  3. Allowed them to continue to get life out of their existing on-premise infrastructure
  4. Established a pattern that could be followed for other applications: the town can now quickly and easily add additional virtual machines to its Azure subscription to support other workloads.
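
To put benefit #1 in perspective, here is a minimal sketch comparing the two spending patterns. The $50,000 and $1,000/month figures come from the story above; the time horizons, and the assumption that no other costs (maintenance, power, staff time) differ between the options, are purely illustrative.

    # Rough cost comparison for benefit #1. The $50,000 up-front quote and the
    # $1,000/month Azure estimate come from the article; everything else here
    # is an illustrative assumption.
    UPFRONT_ON_PREM = 50_000   # quoted on-premise hardware/software refresh ($)
    AZURE_PER_MONTH = 1_000    # estimated Azure subscription ($/month)

    for months in (12, 24, 36, 50):
        azure_total = AZURE_PER_MONTH * months
        print(f"{months:>2} months: on-premise ${UPFRONT_ON_PREM:,} vs Azure ${azure_total:,}")

    # Month at which cumulative Azure spend reaches the quoted up-front cost.
    break_even = UPFRONT_ON_PREM // AZURE_PER_MONTH
    print(f"Azure spend reaches the up-front quote after about {break_even} months")

On these assumptions the subscription does not reach the up-front quote for roughly four years, which is exactly the breathing room the town was looking for.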

Before your town or business elects to go down the traditional path of investing tens of thousands of dollars in a new on-premise infrastructure, take a look at Microsoft Azure for your virtualization needs.

As a partner with BlumShapiro Consulting, Michael Pelletier leads our Technology Consulting Practice. He consults with a range of businesses and industries on issues related to technology strategy and direction, enterprise and solution architecture, service oriented architecture and solution delivery.

5 Reasons to Keep CRM and MDM Separate

In previous articles, I have identified 5 Critical Success Factors for Initiating Master Data Management at your organization, and delved more deeply into the first of these: the creation of a new system which intentionally avoids creating another silo of information.  The second critical success factor is to recognize that MDM tools work best when kept separate from the sources of master data.  A prime example of this is CRM.  Customer Relationship Management (CRM) solutions are often a key topic in my discussions with clients, chiefly with respect to a proposed Customer MDM solution.   I’m going to use CRM to demonstrate why organizations fail to implement Data Governance when they elect to integrate MDM processing into an existing operational system.

It can be enticing to think of CRM as a good place to “do” Customer MDM. Modern CRM systems are built from the ground up to promote data quality. They also have an extensible data model, making it easy to add data from other systems. Customer data often starts in CRM: following the “Garbage In, Garbage Out” maxim, it seems important to get it right there first. Finally, software vendors often claim to have an integrated MDM component for Customers, Products, Vendors and others.

But here are the problems this approach creates:

  1. More than Data Quality – if an operational system like CRM can offer address standardization, third-party verification of publicly available data, or de-duplication of records, then you should leverage those services. But keep in mind – these services exist to help you achieve quality data so that operations in that system run smoothly. If you have only one operational system, then you probably have no need for MDM. If you have more than one and you pick one as the winner, you’ll tie MDM and that system together very closely, making future integrations extremely daunting.
  2. Data Stewardship Matters – Data Stewardship refers to a role responsible for the maintenance and quality of data needed throughout the organization. In a well-designed Data Governance Framework, data stewards report to the governance team. It’s not always possible for an organization to have dedicated data stewards; more often, “Data Steward” is a role layered on top of operational responsibilities. Now, I would love to tell you that CRM users care about data quality; many of them do. But sales professionals are often focused on the data they need to close a deal, not the myriad other pieces of information needed to truly drive customer engagement. Asking them to be responsible for that broader data set sets the organization up for failure.
  3. Governors Don’t Play Favorites – an MDM system should have the ability to store and represent data as it actually exists and is used in ANY source of master data. Without this, your data stewardship team cannot really see the data. If you insist on making CRM the source for master data, your technology team will spend all of their time mapping and normalizing data to match what CRM needs and wants. This is a waste of time. The Federated MDM model is designed to move data in quickly and show data stewards how things really look. Then, and only then, can decisions be made (and automated) about which systems adhere most closely to Enterprise Standards for quality.
  4. Information Silo or Reference Data Set – CRM meets the definition of an Information Silo: it has its own database, and it invents its own identities for Customers, Accounts, Leads, etc. What happens when an account must be deactivated or merged with another account in order to streamline operational processes? Well, if any systems are using CRM as their Reference Data Set, you will have massive problems.
  5. Present at Creation – you probably realize that there are lots of sources of Customer Data, some the business likes to talk about, and some it doesn’t. I like to separate the two into Sanctioned and Unsanctioned Master Data. Unlike Sanctioned Master Data, which lives in CRM, ERP and other operational systems managed by IT, Unsanctioned Master Data lives in spreadsheets, small user databases (e.g. Microsoft Access) or even websites. This may surprise you – unsanctioned master data is often the most valuable data in the governance process! It is where your analysts and knowledge workers store important attributes and relationships about your customers, and it is often the source of real customer engagement. MDM needs to make room for it.

One of the most common misconceptions about how to build an MDM system is the idea that Master Data Management can best be achieved by maintaining a Golden Record in one of many pre-existing operational systems. This can be a costly mistake and sink your prospects for achieving Data Governance in the long term. A well-implemented Master Data Management system has no operational process aim other than high-quality master data. It must take this stance in order to accept representations of Master Data from all relevant sources. When this is accomplished, it creates a process-agnostic place for stewardship, governance and quality to thrive.

Fun with Machine Learning

All my blog articles over the years have been technical in nature. I decided to break out of that mold today. I almost titled this article “It’s not a train robbery, it’s a science experiment” (Doc Brown, in Back to the Future III). I hope you enjoy reading it as much as I did writing it.

The title is not meant to imply that machine learning isn’t inherently fun (I personally happen to think it’s a cool use of aggregated technologies). Rather, it’s to say that we’re going to have some fun with machine learning in a way you wouldn’t have otherwise considered. But in order to do so, the reader must understand at least the fundamental concepts of machine learning. Don’t worry, we’re not going to be diving into data mining algorithms or the R language or Python code or anything remotely technical. Instead, a real-life analogy is best, and we’ll dumb this one right down to the level of a two-year-old toddler! Kids between the ages of about one and six are GREAT at ‘machine learning,’ but NOT on the LEARNING side of machine learning. No, they’re on the TEACHING side of machine learning: the ‘writing of the algorithms,’ the ‘Python and R code,’ that the ‘machines’ (their parents) use to learn. Let’s take a look at how this works.

Ever try to get a two-year-old to eat something he or she just does NOT want to eat? Like broccoli or cauliflower? Even adults are split about evenly on the likes and dislikes of vegetables. Two-year-olds, on the other hand, tend to swing to the dislike side on just about all varieties. So what happens? The child absolutely will not eat said vegetables. Babies and toddlers being spoon-fed from a jar tend to take a different and sometimes visually humorous approach: they let you spoon it into their mouth, but it quickly comes back out like toothpaste accompanied by a grimace. Having wasted an entire jar of baby food on the bib, the father (as a new father I had to take my turn feeding the kids!) turns to his wife and says, “Honey, he doesn’t like the green beans, but he loves the applesauce.” “OK,” comes the reply, “I won’t buy the beans again.”

What just happened here? Believe it or not, that was “machine learning” on a micro scale. The ‘machine’, the parents, just ‘learned’ something. Two data points, in fact: green beans are icky, while applesauce gets a ‘thumbs up.’ Now, if all the toddlers in town taught those various bits of knowledge to their respective parents, you would have built yourself a ‘reference dataset.’ Suppose a bunch of those mothers interact at the weekly “Mommy and Me” group. Just now joining them is a new mother whose daughter is ready to switch from the bottle to semi-solid food. The discussion is likely to turn to what each child likes and dislikes in that area. The new mother listens intently and comes away with knowledge of what her daughter is MOST LIKELY to prefer, but WITHOUT actually having to experience a bib full of pureed sweet potatoes! This is machine learning in action. The machine has applied an algorithm to a reference dataset to predict a probable outcome.

Now, no child dislikes ALL foods, not even a three- or four-year-old, however much some parents perceive otherwise. (My six-year-old son wouldn’t eat a peanut butter and honey sandwich unless it was cut diagonally! Go figure!) If you think your child dislikes ALL foods, it’s more likely he or she only dislikes all the foods YOU like. Since you’re not likely to buy stuff you personally wouldn’t eat, the child has no chance to find what he or she actually enjoys. The parents will then need to broaden the variety to find something acceptable.

Let’s take a look at another real world scenario, this time closer to the topic at hand.

Many on-line retailers use machine learning and data mining to present to the consumer the things they are MOST LIKELY to purchase, based on any number of information points and reference datasets. These include your past purchases, your demographics, and the things other consumers have purchased together. The algorithms employed can be ‘market basket analyses,’ ‘clustering,’ or others (and I promise that’s as technical as we’ll get in this article). We’ve all seen it in action at Amazon and Netflix: “Based on your viewing history…” or “People who bought X also bought…” Even grocery stores learned that beer was often purchased in conjunction with diapers. It seems that young mothers often sent their husbands to the store in times of diaper need, hence the beer.
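
For readers who want to peek behind the curtain just once (despite the promise above), here is a toy sketch of the “people who bought X also bought…” idea. The baskets are made up, and real retailers use far richer market basket and clustering algorithms than this simple co-occurrence count.

    from collections import Counter
    from itertools import combinations

    # Made-up purchase history: each inner list is one shopping basket.
    baskets = [
        ["diapers", "beer", "wipes"],
        ["diapers", "beer"],
        ["diapers", "formula"],
        ["bread", "beer"],
    ]

    # Count how often each pair of items shows up in the same basket.
    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1

    def also_bought(item, top_n=3):
        """Items most often purchased together with `item`."""
        scores = Counter()
        for (a, b), count in pair_counts.items():
            if a == item:
                scores[b] += count
            elif b == item:
                scores[a] += count
        return [other for other, _ in scores.most_common(top_n)]

    print(also_bought("diapers"))   # ['beer', 'wipes', 'formula']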

I decided to try an experiment this morning, and this is where the fun comes in. I wanted to take a finicky two-year-old’s stance on my internet streaming audio. Pandora, Rhapsody, iHeartRadio and the like often apply machine-learning-style logic to decide the next song to queue in your personal listening stream, based on your likes and dislikes. What would happen if I started a new ‘radio station,’ then flagged every song it presented to me as ‘thumbs down’? Would it just keep letting me spit out the offerings until it found something I actually liked? What if I didn’t like ANYTHING? Would it cut me off and kick me out for being impossible to please? I decided I just had to find out.

I started by naming my new station “Billy Joel.” (Hey, if the experiment were to fail, I figured why not fail with something decent!) Within 5 seconds of starting the first song, I had hit the ‘thumbs down’ button. OK, no problem, it moved on to the next. Six more songs were dispatched in similar fashion. “Hey, this is fun,” I thought. On the next song, however, it let me dislike the track but forced me to listen to it in full while a banner displayed a message about not being fed that particular vegetable variety again. Five more disliked songs all brought up the same message while still playing to completion. Oh well, at least I had some good music to listen to. After a dozen similar results, and realizing I wasn’t getting anywhere trying to fool the machine, I threw it a curve and hit the ‘thumbs up’ on the next few tracks. (I think I smelled smoke coming from my router.) The next six tracks were all flagged as disliked and skipped, in similar fashion to the first batch. I settled into a back-and-forth of liking and disliking songs in bunches. In the end, I had to like at least a couple of the songs it presented before I could dislike AND SKIP a bunch of other tracks.

After two hours the machine won, as I had to produce some useful work at the office. There was a practical limit to how much it could ‘learn’ from this picky two-year-old music consumer. Likewise, parents all think they win in the end, too. Or do they? They will tell you they eventually ‘got their child to like’ certain foods, when in fact they simply settled on a repertoire of foods their child wouldn’t reject, kind of like…wait for it…machine learning.

Avoid Creating Another Information Silo with MDM

In my last article, I discussed the five most critical success factors when implementing a Master Data Management solution. At the top of the list was a warning: “Don’t Create Another Information Silo”. What does that mean? Why is that important? What is different about this new system we are calling “MDM”?

I define “Information Silo” as an application which has the following characteristics:

1. The application has its own database.

2. The database is highly normalized, because the designer of the data model has made an effort to reduce duplication of data.

3. Identity is invented in the database, because things like Customer, Product, User all need primary keys to identify the record within the system.

4. Domain Logic controls most, if not all, of the data quality requirements, because encapsulating business logic outside a database promotes re-use.

Each of these principles makes perfect sense for a custom application. But if you use them in your Master Data Management solution, you will make another Information Silo, and you’ll be right back where you started.

Certainly, our Master Data Management solution will include a database, the “MDM Hub”, which will be the central place for data in our system to reside. But we need to design a different kind of database model, unlike many other systems we have designed for the enterprise. What should this data model look like? There are 3 modeling approaches to consider. I’ll explain each, and then tell you which is the best and why.

A Registry approach is a lightweight model in which only pointers to the source data are stored in the Master Data Hub. With this pattern we capture only minimal attributes in the hub, and consumers of data are required to seek out an authoritative data source elsewhere. Essentially, this MDM pattern is designed to store almost nothing, only to tell consumers where to get the data. It does not work well when it comes time to implement Data Governance, because the MDM system has not defined a Jurisdiction for the Data Governance team to work in.

A Transaction approach represents the opposite end of the spectrum from Registry, and it helps to address the common fear of “Garbage In, Garbage Out” which so many first-timers experience. The idea with Transaction is that Master Data should be created in a tightly controlled environment which imposes rigor on the master data creation process and ensures that ALL data is collected up front. This approach sounds worthwhile, until you consider what it will take to build such a system: you may as well build a new ERP system. This is the classic trap leading directly to another silo of information!

A Federated approach represents a middle ground between Registry and Transaction. Think of this as the “come as you are” alternative: we pull master data from the sources with as little translation as possible, and we leverage the source identity in MDM by combining the name of the source system with the source system’s identifier. The Federated approach recognizes that, in order for the data governance team to govern effectively, the hub needs enough master data attributes to discern critical differences, but not all of them.

Here is an example of how the Federated model works. Let’s say that we have 5 sources of Customer Master data: three ERP systems (JD Edwards Enterprise One, SAP and Dynamics AX), one CRM (Salesforce) and one custom SQL solution used by a website. A Federated design would dictate:

  • An Entity named “Source System” whose members would define the sources of data (i.e. JDE, SAP, AX, SFDC, SQL)
  • An Entity named “Customer” whose members would have an identity formed by combining the actual Id value from the source system with the Source System code as a prefix (e.g. “JDE-100054”, “SAP-0000005478”, “SQL-1”). Because the actual source system Id is used, these MDM identifiers will be unique in the Federated model.
  • Some number of additional attributes which help the stewards of the master data understand if these records represent a common customer. This needs to be defined by the business stakeholders, not the database administrators.
  • A solution for creating a Golden Record in the MDM hub. This solution should match members based upon matching rules and group them together as proposed units, with the common grouping acting as a reference to the Golden Record. For example, it could present a hierarchy with the Golden Record as the parent of the source records (a small code sketch of this grouping follows the list):
      • MDM-100 : “BlumShapiro”
          • JDE-100054 : “Blum Shapiro & Co”
          • SAP-0000005478 : “BlumShapiro”
          • SQL-1 : “Bloom Shapiro”
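
To make the identity scheme above concrete, here is a minimal sketch of the federated keys and the Golden Record grouping. The MasterRecord class, the choice of a name attribute, the similarity function and the 0.8 threshold are illustrative assumptions, not the behavior of any particular MDM product; real matching rules are defined by the data stewardship team.

    from dataclasses import dataclass
    from difflib import SequenceMatcher

    @dataclass
    class MasterRecord:
        source: str      # member of the "Source System" entity, e.g. "JDE"
        source_id: str   # the identifier exactly as it exists in that source
        name: str        # one of the attributes chosen by the business stakeholders

        @property
        def mdm_id(self):
            # Federated identity: source system code + the source's own id.
            return f"{self.source}-{self.source_id}"

    # Records loaded "as is" from the sources in the example above.
    records = [
        MasterRecord("JDE", "100054", "Blum Shapiro & Co"),
        MasterRecord("SAP", "0000005478", "BlumShapiro"),
        MasterRecord("SQL", "1", "Bloom Shapiro"),
    ]

    def similar(a, b):
        """Very naive name similarity; real rules come from the stewards."""
        norm = lambda s: s.replace(" ", "").lower()
        return SequenceMatcher(None, norm(a), norm(b)).ratio()

    # Propose a Golden Record and group the source records beneath it.
    golden_name = "BlumShapiro"
    golden = {
        "MDM-100": [r.mdm_id for r in records if similar(r.name, golden_name) > 0.8]
    }
    print(golden)   # {'MDM-100': ['JDE-100054', 'SAP-0000005478', 'SQL-1']}

The point is the shape of the keys: each record keeps the identity its source system gave it, and the Golden Record is simply a grouping layered on top.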

Advantages to this way of working include:

A. Because we are bringing source data in “as is”, data can be loaded quickly into the MDM hub

B. A Federated MDM solution can produce reports which tie out to legacy reports, because they use the same Master Data.

C. Your Data Stewardship team works with the data as it is, without translation or normalization, and has enough data to begin defining data quality rules

D. The solution is positioned nicely for source system synchronization, because a one-to-one relationship exists between the Authoritative record in MDM and the target record in each source system

A Federated MDM data model is the best approach for getting your Master Data Management program started. It is simple to design, easy for business users to grasp, and avoids creating another data silo. In fact, it only aggregates from existing silos, grouping records for matching, harmonization and coherence. Most importantly, this approach gets you something fast; it gets your governance team something they can touch and feel. An MDM initiative must deliver value to the business quickly, and to do this it must be relevant as soon as possible.

Next time, I’ll talk about how an MDM solution differs from CRM, and why you should treat your CRM system(s) as “just another source of master data.”