I thought it would be a good idea to write up our thinking on MDM solution architectures. There are many ways to do MDM: some work well with SQL Server 2008 R2 Master Data Services, some not so much, and some really require collaboration with another system to handle message brokering or human workflow. I like to think of the four sample architectures listed here as starting points; in reality, most enterprises have a good example of at least one of each in their systems. We believe that a successful MDM solution starts with understanding the models, assessing what is working and what is not, and then applying the tools to take advantage of these models in a rational manner. In other words, before we can really understand what an MDM solution might look like for your firm, we need to evaluate the architectural models to see which will be a good fit for your enterprise architecture.
So before jumping into any discussion of key concepts in Master Data Services with an audience who is not familiar with MDM work, I think it is helpful to describe four architectural models. In discussing these, it helps to recall the fundamental data issues we are trying to solve:
1. Who owns the Master Data?
2. Where will the Master Data reside?
3. How should Master Data subscribers be notified of updates?
If these are simple questions to answer, then you likely do not need to go down the MDM path. In practice, however, there is rarely a straightforward or easy answer to any of them, and MDM architectures are designed to provide alternatives for exactly those cases.
On one end of the spectrum is the Repository architecture. This architecture describes a single data source, a services layer or gateway, and multiple subscribing systems which consume Master Data from the services layer.
In the Repository architecture, we have a single data location, and all of the master data is stored and managed there. Subscribing applications do not have a local copy of the master data. This can present some issues with data connectivity and response time for those applications; in response, solution architects often employ a data caching approach either at the Service Gateway layer, at the subscribing application layer, or both. I’m amused by the fact that when I look at this picture, it’s immediately clear to me that this represents an ideal state – the design for a system which is service-oriented and can provide data pervasively throughout the enterprise. When I design an enterprise application, I tend to think of this architecture first.
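To make the caching idea concrete, here is a minimal sketch of a service gateway fronting a single master data store. The store contents, key names, and the `ServiceGateway` class are all hypothetical illustrations, not part of any Microsoft product; the point is simply the read-through TTL cache that offsets the cost of every subscriber hitting one central repository.

```python
import time

# Hypothetical in-memory store standing in for the central master data repository.
REPOSITORY = {"customer:42": {"name": "Contoso", "segment": "Retail"}}

class ServiceGateway:
    """Sketch of a gateway that fronts the single repository, with a simple
    time-to-live (TTL) read-through cache to improve subscriber response time."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self._cache = {}  # key -> (expires_at, record)

    def get(self, key):
        now = time.monotonic()
        hit = self._cache.get(key)
        if hit and hit[0] > now:
            return hit[1]                      # served from the gateway cache
        record = REPOSITORY.get(key)           # cache miss: read the repository
        if record is not None:
            self._cache[key] = (now + self.ttl, record)
        return record

gateway = ServiceGateway()
gateway.get("customer:42")  # first call reads the repository
gateway.get("customer:42")  # second call is served from cache until the TTL expires
```

The same cache could equally live inside each subscribing application; the trade-off is staleness (bounded by the TTL) against round-trips to the single data location.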
Of course the problem with Repository is that it is best implemented when you have a blank canvas to start with.
On the other end of the spectrum is the Registry architecture. In this architecture, each application owns its own data – it has a local copy. The MDM solution itself has no local copy of the data; instead, it has pointers to where the data can be retrieved from. The column on the left refers to a single Master Data Record or Entity. Each application owns a piece of that record. I’ve color-coded the applications to illustrate which attributes of the entity are owned by which application. Applications which need data they do not own may look to the Registry to find it; however, they are not permitted to write to that attribute. If an end-user wants to update a master record attribute owned by App 1, they must log in to App 1 to perform that operation.
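The ownership rule above can be sketched in a few lines. The application stores, attribute names, and helper functions below are all hypothetical: the registry holds no data of its own, only pointers from each attribute to its owning system, and writes are refused unless they come from the owner.

```python
# Hypothetical per-application stores; each owns a slice of the master record.
APP1 = {"cust-7": {"name": "Fabrikam"}}
APP2 = {"cust-7": {"credit_limit": 50000}}
APP3 = {"cust-7": {"ship_to": "12 Main St"}}

# The registry stores no data, only pointers: attribute -> owning system.
REGISTRY = {"name": APP1, "credit_limit": APP2, "ship_to": APP3}

def read_master_record(key):
    """Assemble a complete master record by following registry pointers."""
    record = {}
    for attribute, owner in REGISTRY.items():
        piece = owner.get(key, {})
        if attribute in piece:
            record[attribute] = piece[attribute]
    return record

def write_attribute(key, attribute, value, writer):
    """Only the owning application may write its attribute."""
    owner = REGISTRY[attribute]
    if writer is not owner:
        raise PermissionError(f"{attribute!r} is not owned by the caller")
    owner.setdefault(key, {})[attribute] = value

print(read_master_record("cust-7"))
# → {'name': 'Fabrikam', 'credit_limit': 50000, 'ship_to': '12 Main St'}
```

Note that every read of the complete record fans out to every owning system – which is exactly the responsiveness concern raised below.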
Based upon the potential for “spaghetti” integration (all of those lines scare me!), this may seem like an irrational approach, but in fact this architecture describes a great many ad-hoc MDM solutions. If you do MDM integration at the data tier, you may be using Views or Linked Servers to accomplish this. If you are farther along towards SOA, you may be employing an Enterprise Service Bus to pick up all of the data from source systems and respond with a complete data set for service clients looking for a complete Master Data Record. The point of this approach, I think, is that each application is permitted to maintain stewardship and governance over its piece of the Master Data. Further, the registry helps keep the clients insulated from changes to other subscribing applications.
I think there are two big problems with the Registry approach. First, I expect that system responsiveness would be intolerable if each application must fetch data from multiple places in order to compile a complete picture of the Master Data record. Second, it seems to me that it is very difficult to enforce coherent business rules for a Master Data Record when the complete data record lives in multiple systems. Changes in App 1 may invalidate data in App 4 (or simply make that data functionally unusable). With these issues in mind, I’m not wholly certain if this architecture represents a solution, or the problem itself.
Somewhere between the repository and registry approaches lie two architectures which I think appear more realistic.
The Federation architecture attempts to reconcile these two approaches by allowing each subscribing application to keep a local copy of its data while maintaining the “Master” copy of the Master Data in a central repository. The result is an architecture which resembles a Database Replication strategy, with a database acting as the Publisher to a number of subscription databases.
Federation ensures that you indeed have a single version of the truth while also helping each application remain responsive to end users. The downside is that, unlike the Registry architecture wherein each application owns a segment of the master data, the applications in the Federated approach do not have ownership of any attributes which comprise the master data record. Data Stewards who are tasked with maintaining the master data are therefore required to move outside of the subscribing applications and interact with a new Stewardship application. Depending upon the current state of your systems, this approach may be extremely difficult to implement; depending on the disposition of your users, it may make your MDM solution appear worse than the problem.
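The replication flavor of Federation can be sketched as a publisher pushing changes to read-only local copies. The class and application names here are hypothetical illustrations: the central hub holds the single version of the truth, and each subscriber reads from its own local copy for responsiveness.

```python
class MasterPublisher:
    """Sketch of the Federation model: one central 'master' copy, with changes
    pushed out to the local copies held by each subscribing application –
    much like database replication from a publisher to its subscribers."""

    def __init__(self):
        self._master = {}       # the single version of the truth
        self._subscribers = []

    def subscribe(self, app):
        self._subscribers.append(app)

    def update(self, key, record):
        self._master[key] = record
        for app in self._subscribers:     # replicate to every local copy
            app.local_copy[key] = record

class SubscribingApp:
    def __init__(self, name):
        self.name = name
        self.local_copy = {}  # fast local reads; writes go through the hub

    def read(self, key):
        return self.local_copy.get(key)

hub = MasterPublisher()
crm, erp = SubscribingApp("CRM"), SubscribingApp("ERP")
hub.subscribe(crm)
hub.subscribe(erp)
hub.update("product:9", {"sku": "P-9", "list_price": 19.99})
# crm.read("product:9") and erp.read("product:9") now return the same record.
```

Notice what the sketch leaves out: there is no `write` method on `SubscribingApp`, which is precisely the stewardship problem – all changes must go through the hub.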
The Hybrid architecture offers a great deal of benefit in the way of addressing issues uncovered in the first three architectures. First, each subscribing application has ownership of its own data; it can read and write to its local data. Second, each subscribing application has a complete picture of the master data records it requires. Third, management and stewardship of the master data occurs outside of each application, within the Master Data Hub, which acts as a Broker orchestrating the notification and validation of changes across each subscribing application in the enterprise. The Master Data Repository keeps a copy of the Master Data for the purposes of defining business rules and notification requirements for the broker.
This approach is quite complex and includes heavy reliance upon using the right tool to act as the “Man in the Middle” – the Master Data Broker. It should be clear from this picture that a lot of coordination and orchestration is occurring within that broker system: this system is responsible for notifying each of the subscribing systems of changes to its master data. The opportunity for applications to get out of sync is enormous. However, for those enterprises which have heavy investments in ERP packages which are extremely difficult to customize, a hybrid approach to MDM may be the correct solution.
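To illustrate the broker's role, here is a minimal sketch of that coordination, with hypothetical class names and business rules (nothing here reflects BizTalk or MDS APIs): each application keeps and edits its own copy, but changes pass through a broker that validates them against central rules and notifies the other subscribers.

```python
class MasterDataBroker:
    """Sketch of the Hybrid model's 'Man in the Middle': the broker validates
    each proposed change against central business rules, records it in the
    Master Data Repository, and notifies every other subscribing system."""

    def __init__(self, rules):
        self.rules = rules        # attribute -> validation callable
        self.repository = {}      # master copy used for rules and notification
        self.subscribers = []

    def subscribe(self, app):
        self.subscribers.append(app)

    def propose_change(self, source, key, attribute, value):
        validate = self.rules.get(attribute, lambda v: True)
        if not validate(value):
            return False          # change rejected; nothing is propagated
        self.repository.setdefault(key, {})[attribute] = value
        for app in self.subscribers:
            if app is not source:         # notify everyone but the originator
                app.on_master_change(key, attribute, value)
        return True

class App:
    def __init__(self, name):
        self.name, self.data = name, {}

    def on_master_change(self, key, attribute, value):
        self.data.setdefault(key, {})[attribute] = value

broker = MasterDataBroker(rules={"credit_limit": lambda v: v >= 0})
crm, billing = App("CRM"), App("Billing")
broker.subscribe(crm)
broker.subscribe(billing)

broker.propose_change(crm, "cust-1", "credit_limit", 10000)  # accepted, Billing notified
broker.propose_change(crm, "cust-1", "credit_limit", -5)     # rejected by the rule
```

The sync risk mentioned above shows up plainly here: if a notification to one subscriber fails or is missed, its local copy silently diverges from the repository – which is why the broker tooling matters so much in this model.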
In my next post, I’ll talk about Microsoft technologies such as SQL Server 2008 R2 Master Data Services, Microsoft Office InfoPath 2010, Microsoft SharePoint Server 2010 and Microsoft BizTalk Server 2009. I’ll identify which products are well suited to each of these architectures and decompose the pieces to show the aspects fulfilled by each product. Stay tuned.
I want to offer acknowledgements to Roger Wolter at Microsoft and David Loshin at Knowledge Integrity, Inc. for setting us along a reasonable path. Please feel free to offer your thoughts on these approaches – which have worked well for you?