
T-SQL Inner Join Need Not Always Be an Equal Join

Have you ever had to write a T-SQL query that joined a table to itself? The classic example is an employee table that lists the ID of each person’s supervisor or boss, which refers to another row in the same table. (Hint: if you do have to do that, search for “recursive hierarchy”.) Our example today is something different.

In our example we’ll work from the following scenario: given a list of cities on a map, generate a table that can record the distance between any two of them. In other words, create a list where every value in the table is paired with every other value. For this exercise we’re not going to calculate the actual distances, simply create the records and leave the space for someone else to fill in. Sounds simple enough. Let’s start with the simplest example, TWO points: New York and San Francisco.

We could easily create a simple three column table with the following entries:
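The table itself was shown as an image in the original post; here is a minimal T-SQL sketch of the same idea (the table and column names are my assumption, not taken from the original):

-- A simple three-column table: a pair of cities plus a distance
-- to be filled in later by someone else
CREATE TABLE Distances (
    City1    VARCHAR(50) NOT NULL,
    City2    VARCHAR(50) NOT NULL,
    Distance INT NULL
);

INSERT INTO Distances (City1, City2) VALUES
    ('New York', 'San Francisco'),
    ('San Francisco', 'New York');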

But the distance in one direction is the same as the return trip so really, we only need one row. After deleting the redundant record and adding Chicago to our list of cities, we get this:

City 1           City 2           Distance
New York         San Francisco    ?
New York         Chicago          ?
San Francisco    Chicago          ?

As you can see, with the addition of just one value to our list of cities, we need two additional rows in the Distances table. As you might expect, the fourth city in the first table would require three more rows in the second, the fifth city four more, and so on. The growth is quadratic: at 100 cities we would need 4,950 pairs of cities in the distance table, and at 1,000 we would need close to half a million. The actual number is

N * (N - 1) / 2

Time to automate, which is the focus of this article.

Let’s settle on a total of five cities, with the last two being New Orleans and Las Vegas. We could easily build the grid below and fill in any of the appropriate blocks. The black squares are same-city pairs (New York to New York, or Chicago to Chicago, etc.) so we don’t need to worry about them. And if we can get the distance from New York to San Francisco, for example, we don’t need the distance from San Francisco back to New York, so we can also eliminate all the squares marked in gray:

Let’s replace our city names with numbers. After all, any decent database design will have a unique key assigned to each row of the City table.
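The original City table was shown as an image; assuming the IDs were assigned in the order we introduced the cities, the mapping would be:

City ID    City Name
1          New York
2          San Francisco
3          Chicago
4          New Orleans
5          Las Vegas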

And our grid now looks like this:

          City 2
City 1    1    2    3    4    5
  1       X    ?    ?    ?    ?
  2       -    X    ?    ?    ?
  3       -    -    X    ?    ?
  4       -    -    -    X    ?
  5       -    -    -    -    X

(X = same-city pairs, the black squares; - = redundant return trips, the gray squares; ? = distances we still need)

So how do we create this mega list of relationships from ONE table? An INNER JOIN of the City table to itself on the City ID column will generate the squares marked in black. But that is exactly what we DON’T want. What about a CROSS JOIN?
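The original post showed the query and its results as images; here is a minimal sketch of the CROSS JOIN, assuming a City table with CityID and CityName columns (my naming, not necessarily the original’s):

-- Pair every city with every other city, no join condition at all
SELECT c1.CityID AS City1, c2.CityID AS City2
FROM City AS c1
CROSS JOIN City AS c2;
-- With 5 cities this returns 5 x 5 = 25 rows: every pairing in both
-- directions, plus the 5 same-city pairs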

That will do it, but with over twice the results we need. Remember that once we know the distance from, say, Chicago to Las Vegas, we don’t also need to record the distance from Las Vegas to Chicago. And we’re still left with some extra rows where the cities are the same, like the distance from New Orleans to New Orleans.

Comparing the City ID columns in the result set above with the City ID axis of the grid, we can see that we’re only looking for pairs (cells) where the ID of City 2 is greater than the ID of City 1. From here, we have two choices: Stick with the CROSS JOIN and add a WHERE clause, or go back to the INNER JOIN and change the ON clause:
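Using the same assumed City table, the two options might look like this:

-- Option 1: CROSS JOIN plus a WHERE clause
SELECT c1.CityID AS City1, c2.CityID AS City2
FROM City AS c1
CROSS JOIN City AS c2
WHERE c2.CityID > c1.CityID;

-- Option 2: INNER JOIN with GREATER THAN in the ON clause
SELECT c1.CityID AS City1, c2.CityID AS City2
FROM City AS c1
INNER JOIN City AS c2
    ON c2.CityID > c1.CityID;

-- Either version returns exactly N * (N - 1) / 2 rows: 10 pairs for our 5 cities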

The results are the same. In fact, the estimated execution plan from the query optimizer is the same as well.

The point of this whole exercise is this: an INNER JOIN need not always be an EQUAL JOIN, though that is by far the most common occurrence. When faced with unique query needs, think outside the box. In this case, simply substituting a GREATER THAN for the EQUAL in the ON clause did the trick.

Learn more about how BlumShapiro Consulting can help with a project like this and additional application development.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence. He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011.

Master Data Maestro 3.0 Released

If you are working with SQL Server Master Data Services 2012 to develop real-world Master Data Models for your enterprise, then you have likely struggled with the model design environment provided out of the box by Microsoft.  This environment does not support large data models well.

Here are some common scenarios:

1. When adding new attributes to an existing entity, the design environment displays a very short list box containing all existing attributes.  It’s very difficult to see the complete list of attributes and their Master Data types, and re-ordering attributes for ease of browsing is accomplished with up/down arrows.

2. When organizing attributes into attribute groups, a different interface is used.  Again, it’s a web interface, and again a short list box is the only means of organizing the attribute group.

I am currently working with a Product model which includes over 75 entities and over 1000 attributes; the largest entity contains nearly 400 attributes.  After the initial design session with the data governance team, my team brainstormed how best to create the model in Master Data Services.  Each of us knew that working directly in the web design interface would be extremely painful.

We finally resolved to use SQL Server itself.  We created a database with tables and columns, each annotated with Extended Properties.  For example, a property of the database recorded the name of the Model which the database represents; each table included an Entity name property to tell us the name of the entity; and each column had several properties to tell us the name of the attribute, any attribute groups to which it belonged, the type of attribute (FreeForm, Domain Based or File), the sort order for the attribute, the entity to which it referred (if it was Domain Based), and so on.  Finally, we created an application which read the schema of the database and, using the MDS API, generated the desired model.
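The post does not include the annotation script itself, but a minimal sketch of the approach might look like the following; the property name, table and column here are hypothetical examples, not the actual model:

-- Record, on a column, the MDS attribute metadata the generator reads
-- (N'MDS_AttributeName' is a hypothetical property name)
EXEC sys.sp_addextendedproperty
    @name = N'MDS_AttributeName',
    @value = N'Color',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'Product',
    @level2type = N'COLUMN', @level2name = N'Color';

-- The generator application can then read every annotation back out:
SELECT OBJECT_NAME(ep.major_id)           AS TableName,
       COL_NAME(ep.major_id, ep.minor_id) AS ColumnName,
       ep.name                            AS PropertyName,
       ep.value                           AS PropertyValue
FROM sys.extended_properties AS ep
WHERE ep.class = 1;  -- objects and columns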

These issues are now fully addressed with Advanced Modeling in Master Data Maestro: Profisee announced the GA release of Master Data Maestro 3.0 last week.  The Advanced Modeling tool is built directly into Maestro and shows all attributes for an entity in a grid.


This allows you to see and change data types easily, and to drag and drop attributes to adjust the order in which they are presented.

You also get a rich UI for adding attributes to an Attribute Group and ordering the attributes in that group correctly.


Master Data Maestro is a critical tool for enterprises looking to deliver real-world master data models using Microsoft’s Master Data Services.

Securing Large Data Models in Master Data Services 2012

I ran into this issue today when experimenting with the security capabilities in SQL Server 2012 Master Data Services.  I am working with a very large Product data model for a Manufacturing client; the core entity has 391 attributes at this point in time.  I was attempting to understand how I can secure this entity such that users responsible for mass updating a set of attributes can refine their view to only those attributes.

In the prior version of MDS (SQL Server 2008 R2), Attribute Groups were the solution to this problem. In 2012, it would seem that Attribute Groups are not secured in the same way. But that is a topic for another time.

When I attempted to set a Deny permission on a single attribute in the model, I received a generic error in the Web Application interface: “An Unknown Error has occurred.” When I turned on the Trace in web.config (located under C:\Program Files\SQL Server\110\Master Data Services\WebApplication), I saw the following error:

MDS Error: 0 : LogError : Operation is not valid due to the current state of the object.
    DateTime=2013-01-18T16:46:19.5107689Z

StackOverflow to the rescue!  This error is apparently a common one in ASP.NET web applications where large collections are being serialized.  By adding an appSettings key to the web.config, I was able to increase the limit of collection keys/members:

<appSettings>
    <add key="aspnet:MaxHttpCollectionKeys" value="2001" />
</appSettings>

http://stackoverflow.com/questions/8832470/operation-is-not-valid-due-to-the-current-state-of-the-object-error-during-pos

And that solved the problem!

Master Data Services API at Code Camp 5

This weekend was a bit of a stretch for me: 2 presentations on MDM and 1 production “Go-Live” with MDS, BizTalk and JD Edwards 9.  Stressful.  But Bob and SB, who run the CT Code Camps, made the second presentation pretty easy – thanks guys!

Code Camp is a great opportunity to step back and try to package “what I do” into bite-size chunks for my fellow .NET solutions developers. It’s also an opportunity to take some time and learn about stuff I don’t have time to read up on.  Metro UI, Speech Recognition, SQL Azure, SharePoint Online … lots of valuable information.

My friend John Flannery at Evolution1 did an excellent job covering SQL Server 2012 enhancements.  Nice job, John!

Another shout out to Brian Zebarth and Matt Rankin, my colleagues from BlumShapiro who handled all the arrangements around our sponsorship of the event.  Thanks a million guys!

My presentation was titled “SQL Server Master Data Services API Fundamentals” – aka MDS for Developers.  I wanted to keep this sharply focused on:

  1. What is Master Data Services?
  2. Why would I use it?
  3. How do I setup a Development Environment?
  4. How do I get data in?
  5. How do I get data out?
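For questions 4 and 5, the talk focused on the web service API, but it is worth noting that MDS 2012 also answers both in plain T-SQL via entity-based staging tables and subscription views. A minimal sketch, assuming an entity named Product (the staging table, procedure and view names below are illustrative, not from the presentation):

-- Getting data in: load the entity's leaf staging table
-- (ImportType 0 = create new members / update existing ones)
INSERT INTO stg.Product_Leaf (ImportType, ImportStatus_ID, BatchTag, Code, Name)
VALUES (0, 0, N'InitialLoad', N'PRD-001', N'Widget');

-- ...then start the staging process for that batch
EXEC stg.udp_Product_Leaf
    @VersionName = N'VERSION_1',
    @LogFlag     = 1,
    @BatchTag    = N'InitialLoad';

-- Getting data out: query a subscription view created in the MDS web UI
SELECT Code, Name
FROM mdm.Product;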

And for those who would like to review my slides and code samples:

PowerPoint- http://sdrv.ms/Lwbpnc

Code Samples – http://sdrv.ms/LWXEgP

See you next time!

Brian Berry is a Director of Technology Consulting with BlumShapiro, focusing on Microsoft Business Intelligence solutions, with particular emphasis on Systems Integration, Master Data Management and PerformancePoint Services. He has been helping companies optimize their investments in Microsoft technology for over 12 years.