Tag Archive for Master Data Management

Data Science Foundations – Classification and Regression

The Big Data journey has to start somewhere.  My observation in talking to Microsoft technologists is that, while Big Data is fascinating and exciting, they don’t know where to start.  Should we start by learning Hadoop?  R? Python?

Before we jump into tools, let’s understand how data science works and what can be gained from it.  By now, you understand that Predictive Analytics (or Machine Learning) is a relatively new branch of Business Intelligence.  Instead of asking how our business/department/employee has been performing (recently, and as compared to historical trends), we are now seeking to predict what will happen in the future, based upon data collected in the past.  We can do this at a very granular level.  We can identify “which thing” will behave “which way”.  Some examples: which customer is likely to cancel their subscription plan, which transactions are fraudulent, which machine on the factory floor is about to fail.

There are several approaches to applying statistics and mathematics to answer these questions.  In this blog post, I will focus on two data science tasks: Classification and Regression.

Classification is used to predict which of a small set of classes a thing belongs to.  Ideally, the classes are a small set and mutually exclusive (Male or Female, Republican or Democrat, Legitimate or Fraudulent).   They need not be “either/or”, but it is easiest to think of them in that manner.

Closely related to Classification is the task of predicting the probability that the thing is classified that way.  This is called Class Probability Estimation.  We can determine that a transaction is “Legitimate” with 72.34% certainty, for example.
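To make this concrete, here is a minimal sketch of both tasks using scikit-learn (a library choice of mine, not mentioned above) on invented transaction data.  The classifier answers the Classification question ("which class?"), and predict_proba supplies the Class Probability Estimate.

```python
# Minimal sketch: Classification plus Class Probability Estimation
# using scikit-learn on invented transaction data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [transaction amount, hour of day]
X = np.array([[25.0, 14], [980.0, 3], [12.5, 11], [1500.0, 2],
              [60.0, 18], [2200.0, 4], [35.0, 9], [875.0, 1]])
# Labels: 0 = Legitimate, 1 = Fraudulent
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

new_txn = np.array([[45.0, 13]])
label = model.predict(new_txn)[0]        # Classification: which class?
proba = model.predict_proba(new_txn)[0]  # Class Probability Estimation
print(f"Predicted class: {'Fraudulent' if label else 'Legitimate'}")
print(f"P(Legitimate) = {proba[0]:.4f}, P(Fraudulent) = {proba[1]:.4f}")
```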

What can be gained from Classification?  There are many iconic stories of forward-thinking companies anticipating business issues before they arise – and then taking action.  My favorite is the story of Signet Bank, whose credit card division was unprofitable, due to "bad" customers defaulting on loans and "good" customers being lost to larger financial institutions which could offer better terms and conditions.  The answer, revolutionary at the time, was to apply Classification to their customer data.  They separated the "Bad" from the "Good", cut the "Bad" ones loose and nurtured the "Good" ones with offers and incentives.  Today, we know them as Capital One.

Regression, on the other hand, is a task used to estimate the numeric value of some variable for a given thing.  For example, "How much should I expect to pay for a given commodity?" or "How hot will the temperature in my home get before a human turns the heat down?"  Regression is often confused with Class Probability Estimation.  The two tasks are related, but they have different goals: Classification determines whether something will happen, while Regression determines how much of something will happen.

What can be gained from Regression?  In manufacturing, it is very useful to understand how much use a particular machine part should be expected to deliver, before performance degrades below an acceptable tolerance level.  Any financial services firm does this routinely to price securities and options.
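As a sketch of that manufacturing scenario, here is a simple linear regression on made-up data (the features, numbers and the choice of scikit-learn are all mine, not from any real project).  Note that the model estimates a numeric quantity – expected useful life in hours – rather than a class.

```python
# Minimal sketch: Regression estimates "how much", not "which class".
# Invented data: predict machine part life from load and temperature.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features: [average load (%), operating temperature (C)]
X = np.array([[40, 60], [55, 70], [70, 80], [85, 90], [60, 75], [90, 95]])
# Target: hours of use observed before performance fell below tolerance
y = np.array([9200, 7800, 6100, 4300, 7000, 3500])

model = LinearRegression().fit(X, y)
estimate = model.predict(np.array([[65, 78]]))[0]
print(f"Expected useful life: {estimate:.0f} hours")
```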

In my next blog post, I will discuss the data science tasks behind "Customers who bought this, also bought that".

Master Data Maestro 3.0 Released

If you are working with SQL Server Master Data Services 2012 to develop real-world Master Data Models for your enterprise, then you have likely struggled with the model design environment provided out of the box by Microsoft.  This environment does not support large data models well.

Here are some common scenarios:

1. When adding new attributes to an existing entity, the design environment displays a very short list box containing all existing attributes.  It's very difficult to see the complete list of attributes and their Master Data types, and re-ordering attributes for ease of browsing must be done with up/down arrows.

2. When organizing attributes into attribute groups, a different interface is used.  Again, it's a web interface, and again a short list box is the only means of organizing the attribute group.

I am currently working with a Product model which includes over 75 entities and over 1000 attributes; the largest entity contains nearly 400 attributes.  After the initial design session with the data governance team, my team brainstormed how best to create the model in Master Data Services.  Each of us knew that working directly in the web design interface would be extremely painful.

We finally resolved to use SQL Server itself.  We created a database with tables and columns, each annotated with Extended Properties.  For example, a property of the database recorded the name of the Model which the database represents; each table included an Entity name property; and each column had several properties recording the name of the attribute, any attribute groups to which it belonged, the type of attribute (Free-form, Domain-based or File), the sort order for the attribute, the entity to which it referred (if it was Domain-based), and so on.  Finally, we created an application which read the schema for the database and, using the MDS API, generated the desired model.
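Our actual tool was built in .NET against the MDS API, but a rough sketch of the schema-reading half might look like the following Python/pyodbc snippet.  The connection string, database name and property-naming convention are illustrative, not the real ones.

```python
# Sketch: read model metadata stored as Extended Properties in SQL Server.
# Property names (Entity, AttributeName, AttributeType, ...) follow the
# convention described above but are hypothetical. Database-level
# properties (e.g. the Model name) live at class = 0 and are omitted here.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=ProductModelStaging;Trusted_Connection=yes;"
)

# sys.extended_properties holds the annotations; join back to the
# tables (entities) and columns (attributes) they describe.
sql = """
SELECT t.name AS table_name, c.name AS column_name,
       ep.name AS property_name, CAST(ep.value AS nvarchar(256)) AS value
FROM sys.extended_properties ep
JOIN sys.tables t ON ep.major_id = t.object_id
LEFT JOIN sys.columns c ON ep.major_id = c.object_id
                       AND ep.minor_id = c.column_id
WHERE ep.class = 1          -- object- and column-level properties
ORDER BY t.name, ep.minor_id, ep.name
"""

for table, column, prop, value in conn.execute(sql):
    target = f"{table}.{column}" if column else table
    print(f"{target}: {prop} = {value}")
```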

These issues are now fully addressed with Advanced Modeling in Master Data Maestro: Profisee announced the GA release of Master Data Maestro 3.0 last week.  The Advanced Modeling tool is built directly into Maestro and shows all attributes for an entity in a grid.


This allows you to see and change data types easily, and to drag and drop attributes to adjust the order in which they are presented.

You also get a rich UI for adding attributes to an Attribute Group and ordering the attributes in that group correctly.


Master Data Maestro is a critical tool for enterprises looking to deliver real-world master data models using Microsoft’s Master Data Services.

Where is my “Homeless” Master Data?

The first question to be asked in any Master Data Management project is: Where is my Master Data? The prevailing assumption seems to be that master data lives in an ERP table called "the Customer Master" or "the Item Master", for example. From here, project stakeholders focus intensely upon making the ERP data complete, aligned and in sync. These analyses are all valid and important.

But consider the analytical side of MDM: analytical databases provide the ability to aggregate and roll-up similar entities, or concepts. Therefore, reporting systems (OLAP or Business Intelligence systems) need and thrive upon data consolidation concepts – roll-ups, hierarchies, collections of master data which can be used to construct dimensional analysis.

For example, a customer may be a stand-alone business, but more often a place of business is owned by a legal entity. Credit Analysts want to see the total credit being extended to a business, not simply to a single customer. In manufacturing, the term "chain" is commonly used to describe a collection or consolidation of customers. In order to provide intelligent customer chaining, the master data needs to include these kinds of "sibling" relationships.
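As a tiny illustration (names, numbers and the choice of pandas are all invented), here is the roll-up a credit analyst actually wants once the customer-to-chain relationship exists as master data:

```python
# Sketch: rolling customer-level credit up to the owning legal entity
# ("chain"). Data is invented; the point is that the customer-to-chain
# mapping is itself master data that must live somewhere.
import pandas as pd

customers = pd.DataFrame({
    "customer":     ["Store 101", "Store 102", "Store 201", "Depot A"],
    "chain":        ["Acme Corp", "Acme Corp", "Beta LLC", "Beta LLC"],
    "credit_limit": [50_000, 75_000, 40_000, 120_000],
})

# Total credit extended per legal entity, not per place of business.
exposure = customers.groupby("chain")["credit_limit"].sum()
print(exposure)
# chain
# Acme Corp    125000
# Beta LLC     160000
```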

Some ERP systems do this well, and therefore are able to offer analysts a tightly integrated Business Intelligence experience over the ERP data. But no matter what your ERP system, this approach assumes that the enterprise is under a single ERP; indeed, this is rarely the case. If you are an organization which has grown by mergers and acquisition, you may have dozens of ERP systems in the enterprise, at varying levels of capability. Those that do have BI capabilities often promote a fairly rigid, out-of-the-box solution to hierarchy management, incompatible with other systems.

So what happens? These consolidations become mapping tables in Excel and Access applications, and analysts continually scramble to keep their version of this institutional data up to date. These common data assets are essentially "living on the streets" – not inside an ERP system and not inside an MDM solution. And the astonishing thing here is: this data is highly valuable master data. Without it, the enterprise continually struggles to achieve simple and reasonable Business Intelligence goals.

It’s this recognition, I believe, which has driven Microsoft in SQL Server 2012 to deliver an Excel 2010 Add-In for Master Data Services. This add-in should help ease the transition for analysts and Information Workers who have taken the homeless data in. Master Data Services provides a full-featured MDM home.

Brian Berry is a Director of Technology Consulting with BlumShapiro, focusing on Microsoft Business Intelligence solutions, with a strong focus on Systems Integration, Master Data Management and PerformancePoint Services. He has been helping companies optimize their investments in Microsoft technology for over 12 years.

Now Officially Partners with Profisee

BlumShapiro is now one of a select few consulting firms working directly with Profisee to deliver rich tools for SQL Server 2008 R2 Master Data Services.

This makes life in our Microsoft BI and Master Data practice much easier because I can freely demo both the Master Data Maestro Client Application and the Master Data Maestro server solution coming in Version 2 of the product.

Version 1 of Master Data Maestro offered several benefits to organizations looking to leverage Master Data Services:

1. Workspaces for Data Stewards working with Master Data day in, day out

2. Merging to the "Golden Record", exposing native merge capabilities found in the WCF Services layer which are difficult to leverage out of the box

3. Hierarchy Navigation and Management, because nobody wants to manage a hierarchy in a browser (trust me)

Version 2 adds a Server component to the product which addresses a key Data Quality ask from nearly every one of my clients: Address Standardization with Bing Maps. I can't wait to get the beta installed in the Blum Lab.