Archive for Todd Chittenden

Create a Pareto Chart in Power BI

“Baseball is 90% mental, and the other half is physical.” – Yogi Berra

You just have to love Yogi Berra quotes like this. We all pretty much know what he’s talking about, even if his math is not spot on. It’s a restatement of the Pareto Principle, the 80/20 rule! It applies to just about anything in life or business. If I had to write a definition of it for technical documentation, it would look something like this:

“A situation where eighty percent of events attributed to a group are caused by twenty percent of the members of the group.”

Re-stated as examples:

  • Eighty percent of your human resource issues are caused by twenty percent of your employees.
  • Eighty percent of your maintenance issues are caused by twenty percent of your equipment.
  • Eighty percent of your sales are attributed to twenty percent of your products.
  • Eighty percent of the wealth is controlled by twenty percent of the population.

 

(And I’m certainly not in that last twenty percent. If I was, I wouldn’t have to write articles like this one!)

Now that we’ve got an understanding of the principle, let’s look at how it can be visualized. Excel has a very simple wizard for creating a Pareto Chart that can be found on the Insert menu:

But we want one of these in Power BI. And Power BI doesn’t have one (yet, maybe later). We’ll need to ‘roll our own’.  Let’s discuss the various parts of the chart itself, so we know what we’re shooting for.

  • The categories, or series, at the bottom (One, Two, Three, etc.) represent the different members of the ‘group’ we are trying to analyze. They may be employees, machines on the manufacturing line, or products in our catalog.
  • The blue bar above each ‘member’ is its respective measurement (count of HR issues, money spent on maintenance, annual sales, etc.). The scale of this measurement is on the left side, in our case going from 0 to 120.
  • The final element is the curved line and the right-side scale measuring from zero to one hundred percent. This represents, at each category member, the percentage of the cumulative total of all members to the left of the member in question, inclusive. To put it another way, as we add each category’s number to the running total of those on its left, the line represents that running total divided by the entire total for all members of the category.

The arrow points to the spot on the line where it crosses 80%, in our case, after about the first four members, as can be seen by following the green dashed line from right to left, then down. The first four members would be the ‘twenty percent’ of the Pareto Principle, and their cumulative measure would be the eighty.

Note: Math wizards may point out that four members divided by a total member count of fifteen is closer to thirty percent than twenty, but remember that this is a rule of thumb, and we all know that some thumbs are bigger or smaller than others.

To plot some data in a Pareto Chart, we’ll need a couple of pieces of information from it:

  • Each member’s respective total
  • The grand total
  • The running total at each member, sorted from largest to smallest
  • The percentage that running total represents compared to the grand total

Now that we understand what we’re shooting for, let’s get started.

If your data includes a running sum of the measurement for each member, sorted by the respective member’s measurement, then you’re golden and can skip to the section titled Add the Grand Total and Running Percent. Your data may include a Ranking column so you may be able to skip the respective steps in each of the following two sections. For the rest of you, keep reading. We’ll look at two approaches to getting the intermediate bits of data: Power Query (M), and DAX.

Create the Rank and Running Sum in Power Query

Let’s start with some simple data in Excel, in fact the same data used to generate the Excel Pareto chart we used to explain the concepts:

We’ll load this data (it’s in an Excel table called “Table1”) and edit it in the Power BI Query Editor. First, we need to sort the data by the [Measure] column, sorted descending. Click the down-arrow next to the Measure column title and select Sort Descending.

Next, on the Add Column menu, select Index Column. Keep the defaults of Starting Index of 1 and Increment of 1.

I renamed my column to [Power Query Rank] to differentiate it from ranking step we’ll introduce in the model later via DAX.

Next, we’ll add the running total as a Custom Column with a formula as shown below:

Hint: If you can’t read the formula from the screen shot, it is:

= Table.Range ( #”Renamed Columns”, 0, [Power Query Rank] )

Attribution should go to Sam Vanga and SQL Server Central for this bit of M code: http://www.sqlservercentral.com/blogs/samvangassql/

The Power Query function Table.Range can be explained like this: Given a table of data, in our case the last of our query steps, a.k.a. #”Renamed Columns”, start at the 0 row (top), and go down the number of rows represented by the value in column [Power Query Rank]. The result is a table associated with each row in the query. The first row of the query has a table with one row of data in it. The second row has a table with two rows, and so forth. This table is represented by the word “Table” on each row of the column we just added.

From here, click on the ‘expand’ arrow in the column header and select the Aggregate radio button, check off the “Sum of Measure” column, and un-check “Use original column name as prefix”:

I renamed the resulting column [Power Query Running Total] (not shown).

Click Close and Apply on the Home menu.

Create the Rank and Running Sum in DAX

As with all things Microsoft, there is more than one way to accomplish a goal. In our case, the goal is to get the running total, and just like before, we’ll need the ranking first. For this exercise, we’ll be using DAX instead of Power Query, but should get the same results.

Create a column with the formula as follows:

DAX Rank = RANKX (All ( Table1 ), [Measure] )

Next, create the DAX Running Total measure as:

DAX Running Total =

CALCULATE (

SUM ( Table1[Measure] ),

FILTER (

ALLSELECTED ( Table1 ),

Table1[DAX Rank] <= MAX ( Table1[DAX Rank] )

)

)

This DAX formula does pretty much the same thing as the Power Query Range.Table function above, the only difference is that it includes the aggregate within, eliminating the need for an extra column.

Note: Know the difference between Columns and Measures in DAX. Mistaking the two will cause error, frustration, and hair loss.

Plotting all these columns and measures on a simple table visual shows that Power Query and DAX come up with the same answers for Rank and Running Total, a good sanity check. Also, the ranks are easy to verify as to accuracy, and with a little mental math, running totals are as well. I had to re-format some of the numbers to make them show without decimals.

Add the Grand Total and Running Percentage

There’s two more pieces we need: [Grand Total] which is self-explanatory, and [Running Percent], which is the ‘percentage of the [Running Total] compared to the [Grand Total]’. These can only be done in DAX. Add a measure as follows:

Grand Total = CALCULATE ( SUM ( Table1[Measure] ) , ALL ( Table1 ) )

This calculates the Grand Total and makes it available at every slice (row of each Member).

Now add the last item, a column with the expression:

Running Percent = [Power Query Running Total] / [Grand Total]

Or:

Running Percent = DIVIDE ( [Power Query Running Total] , [Grand Total] )

Note: The column [DAX Running Total] would work just as well as its Power Query equivalent since we know it has the same number.

Format this last one as a percent.

Create the Chart

Now the fun part. For this we’ll need either a “Line and Stacked Column Chart” or a “Line and Clustered Column Chart”. This is the easiest part of the whole exercise:

  • The Shared Axis is the [Member] column (“One”, “Two”, “Three”, etc.)
  • The Column values is the [Measure] column
  • The Line values in the [Running Percent] column

Like I said, simple if you have all of the data pieces in front of you.

Need help getting the right data pieces? Not sure what charts you can generate from the data pieces you have? There’s probably a way to get to where you want to be. Reach out to our team of data scientists at BlumShapiro Consulting to learn more about how data can help guide your organization into the future.

T-SQL Inner Join Need Not Always Be an Equal Join

Have you ever had to write a T-SQL query that joined a table to itself? The classic example is an employee table that lists the ID of the person’s supervisor or boss, which refers to another row in the same table. This example is nothing like that (Hint: If you do have to do that, do a search for “recursive hierarchy”). Our example today is something different.

In our example we’re going off of the following scenario: given a list of cities on a map, generate a table that can record the distance between any two. In other words, create a list where every value in the table is paired with every other value. For this exercise, we’re not going to be calculating the actual distance, simply creating the records and leaving the space for someone else to fill in. Sounds simple enough. Let’s start with the simplest example: TWO points: New York and San Francisco:

We could easily create a simple three column table with the following entries:

But the distance in one direction is the same as the return trip so really, we only need one row. After deleting the redundant record and adding Chicago to our list of cities, we get this:

 

As you can see, with the addition of just one value to our list of cities, we need two additional rows in the Distances table. As you might expect, the fourth city in the first table would require three more rows in the second. The fifth city would require four more and so on. The growth is almost exponential. At 100 cities, we would need 4950 pairs of cities in the distance table, and at 1,000 we would need close to half a million. The actual number is

{ N * ( N – 1 ) / 2 }.

Time to automate, which is the focus of this article.

Let’s settle on a total of five cities, with the last two being New Orleans and Las Vegas. We could easily build the grid below and fill in any of the appropriate blocks. The black squares are same-city pairs (New York to New York, or Chicago to Chicago, etc.) so we don’t need to worry about them. And if we can get the distance from New York to San Francisco, for example, we don’t need the distance from San Francisco back to New York, so we can also eliminate all the squares marked in gray:

Let’s replace our city names with numbers. After all, any decent database design will have a unique key assigned to each row of the City table.

And our grid now looks like this:

 

So how do we create this mega list of relationships from ONE table? An INNER JOIN of the City table to itself on the City ID column will generate the squares marked in black. But that is exactly what we DON’T want. What about a CROSS JOIN?

That will do it, but with over twice the results we need. Remember that once we know the distance from, say, Chicago to Las Vegas, we don’t also need to record the distance from Las Vegas to Chicago. And we’re still left with some extra rows where the cities are the same, like the distance from New Orleans to New Orleans.

Comparing the City ID columns in the result set above with the City ID axis of the grid, we can see that we’re only looking for pairs (cells) where the ID of City 2 is greater than the ID of City 1. From here, we have two choices: Stick with the CROSS JOIN and add a WHERE clause, or go back to the INNER JOIN and change the ON clause:

The results are the same. In fact, the estimated execution plan from the query optimizer is the same as well.

The point of this whole exercise is this: An INNER JOIN need not always be an EQUAL JOIN, though that is by far the most common occurrence. When faced with unique query needs, think outside the box. In this case, simple substituting a GREATER THAN for the EQUAL in the ON clause did the trick.

Learn more about how BlumShapiro Consulting can help with a project like this and additional application development.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990’s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence . He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011. 

Row Level Security in Power BI – Part 2

In Part 1 of this series, we discussed Row Level Security in Power BI, that it is different from RLS in SQL Server 2016, and then went on to demonstrate two simple scenarios where RLS can be used to filter data in a model based on Role assignment utilizing some DAX filter expressions. We introduced the USERNAME() DAX function and demonstrated its usefulness. In this second article, we’ll be diving a little deeper into RLS.

Row Level Security in Power BI Scenario 3:

Those first two scenarios from Part 1 were not that bad to implement. One line of a DAX expression and you’ve got a simple filter covered. But life seldom ever hands us a scenario that is so cut and dry. For this scenario, we’re going to add some requirements that might get handed down by the business users such as:

  • Ability to grant access to a group of countries, like continents, or regions.
  • Ability to have any one country be in multiple geo-political groups.
  • Ability to grant and revoke access to an entire group of people at once rather than needing to address each person individually
  • Ability to grant access on any one country to any one individual

For this we’re going to again go back to the database and create a few tables to help us. We already have the Country table so no need to do anything there. But we’ll add four more:

  • Users
  • Groups
  • User Group Membership
  • Country Group Membership

The T-SQL code for this can be found in the attachment, and a simple database drawing is shown here:

Row Security 3-1-1

We’ll also add some rows to the respective tables via basic INSERT statements:

For those of you who, like me, are full-blooded SQL nerds, I have included the CREATE TABLE, INSERT INTO, and CREATE VIEW scripts that can be executed in your database.  But we don’t need to import all four additional tables into your model, we simply need the distinct list of Users and the Countries to which each has access. This is represented in the [dbo].[SecurityQuery] view (Script #05). If you look at the view definition, note the DISTINCT key word in the SELECT clause, the optional fields to show the User’s Full Name and the Country Name, the absence of any fields from the [Group] or[ GroupUser] tables, and the WHERE clause at the bottom that filters for only Active records.

First, we’ll need to remove the Continent table from the model, then import the SecurityQuery dataset. Since this is not an exercise in how to import data, I’ll leave it up to you to get that done. And while you’re adding it, you might as well add the [SecurityReference] view as well and we’ll cover its usage at the end of the article.

Once you have the [SecurityQuery] in your model, it needs to be joined to the Country table, on Country code:

Row 3-1-2

 

Hint: If you have sufficient rows in the Security Query dataset, when you set up the relationship between [Country Code] in the Country table and the [Country Code] in the Security Query table, Power BI will recognize that this is a one-to-many relationship with the Country table being at the “one” side.

As before, we will use the Manage Roles button on the Modelling tab to configure the DAX Filter for the Roles. We’ll start with what we learned from before:  a filter on the [UserName] field and making use of the USERNAME() DAX function as before:

Row 3-1-3

But if we were to browse the model now for any one member in our set, we’ll see that he or she is limited in the rows returned by the Security Query, but NOT limited by rows returned in the Country table. Why not? It worked before. It is because the table on which the USERNAME() filter is applied is not at the top of the hierarchy, it’s at the bottom. To make things more complicated, there are two branches from the top of that hierarchy, one branch to sales, through the country table, and one branch directly to the security table. If only there was a way to filter the countries in that table based on those listed in the Security Query table. We need something more.

Enter the CONTAINS() DAX function. The description from MDSN is as follows:

Syntax

CONTAINS(<table>, <columnName>, <value>[, <columnName>, <value>]…)

Return Value

A value of TRUE if each specified value can be found in the corresponding columnName, or are contained, in those columns; otherwise, the function returns FALSE.

If you’re like me, you’ll read that and think, “say WHAT?” Let’s implement it first, then we’ll sort out and explain how the parameters are used. Back under the Manage Roles, add a Filter to the Country table as the following (hint: it doesn’t matter which column you select as you will be replacing it with the entire text below):

CONTAINS (

‘SecurityQuery’,

‘SecurityQuery'[UserName],

USERNAME(),

‘SecurityQuery'[CountryCode],

‘Country'[CountryCode]

)

Note: The line breaks are not required in this, but are added for clarity of reading.

Now here’s the layman’s description of how the DAX statement above works with the five parameters, in order of their appearance:

Go to the ‘SecurityQuery’ table (1st parameter), and in the column ‘SecurityColumn’[UserName] (2nd parameter), look for any rows that match the value returned by the function USERNAME() (3rd parameter).  Then for those rows, take the values in the column ‘SecurityQuery’[CountryCode] (4th), and see if those values exist in the column ‘Country’[CountryCode] (5th). If that [CountryCode] value is found return TRUE for that row, and allow it to be viewed in this context.

There are a couple of things to watch for when implementing this approach. First, you need to have a relationship defined between the Country table and the SecurityQuery table. Second, the <value> parameter (3rd parameter) of the CONTAINS() function can be just that, a single value, not a list or table. But the USERNAME() function fits this bill nicely. Third, you still need the [UserName] = USERNAME() security filter on the Security Query dataset.

If you followed the INSERT scripts included, you will recall that there were only two people involved: Fred and Bob. And you may recall that we granted Fred access to a Group called “Pacific Rim” (which included countries such as Japan, Hong Kong, and the Philippines) but that [GroupMembership] row was NOT flagged with [IsActive] = 0. Fred’s mapped data looks like this:

row 3-1-4

 

Now here’s where the [SecurityRefernce] dataset comes in. After adding it as part of the model, you need to make sure that A) it has NO relationships to any other tables and that B) it also has a [UserName] = USERNAME() DAX Filter expression applied.

I added a report page to my Power BI model and a simple Matrix visualization based on this table and configured it as follows:

  • Rows = [Country Name]
  • Columns = [GroupName]
  • Values = [IsActive], with a SUM aggregation
  • For the [IsActive], I set up Conditional Formatting with the following properties

row 3-1-5

The resulting visualization is a nice way to see exactly HOW a person has gotten access to any one particular country (via membership in which Group), and what countries are in any Group they have are also a member of, even if that membership is not active.

Row 3-1-6

 

It is true that there is a lot introduced here that may not specifically be “Row Level Security” stuff, but rather T-SQL overhead. After all, all you really need for this 3rd scenario to work is the distinct list of Users and Countries that are allowed access. But given the 173 countries in the world and say, two dozen people to which access control is required, that’s potentially up to 4,000 rows of data controlling who has access to what. Breaking it into Users and Groups and Memberships is a way to manage the mess.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990’s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence . He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011. 

Power BI Demo CTA

 

Row Level Security in Power BI – Part 1

The folks at Microsoft have been steadily adding features to both the Power BI service and the downloadable Power BI Desktop over the past eighteen months since it went General Availability in July of 2015. Row Level Security is one such feature, allowing the developer to restrict which data is seen by users. Like most other major features, it was introduced into Power BI Service first, then eventually added to the Desktop. However, unlike other features, upon its addition to the Desktop, it, or most of it, was removed from the Service. After all, you really only need it in one place. Having it in both places would just cause confusion and conflicts. However, there are some lingering aspects in the Power BI service that require attention after deploying a model that has Row Level Security defined. In this post, we’ll look at this feature of Power BI, how to get it up and running in the Desktop, what needs to be done on the Service, and some common scenarios for when it might be useful.

But before we get started, you need to be made aware of two points: First is that RLS in Power BI is NOT the same as RLS in SQL Server 2016. Yes, both are great new feature in their respective products, and do pretty much the same thing of restricting user access to certain rows of data. They even share the same acronym, and the results are pretty much the same, but the implementation methods of each are quite different. RLS in SQL Server uses T-SQL functions and other artifacts inside the SQL database, while Power BI uses DAX. Another major difference is that RLS in Power BI can be used regardless of the data source type.

The second major point is that RLS and Power BI Dashboard’s Q&A feature are mutually exclusive. You can have one or the other, but not both. While some readers may not really care about that, others may simply stop reading right here based on that revelation alone. But before you click off the page because of this limitation, take a moment to voice your opinion at the Power BI Feedback site for this particular issue. Even if you don’t care, some day you might and I encourage you to vote on the item. Go ahead, we’ll wait.

Ok, you’re back? Good, let’s get started.

Row Level Security in Power BI Scenario 1

The first scenario we’re going to explore is one where you have hard-coded roles for each particular segment of data. For the demo model for all these scenarios, I generated data that was spread across the globe, and tied the access directly to continents, then later to countries. This makes it very easy to determine at a glance if data is being filtered properly. The Power BI map visualization is the ideal choice for this as your eyes can easily determine which geographic entities are represented by the data. We’ll see the map again later on, but for starters, the unfiltered data maps like this:

Row Level Security 1

 

Before we set the security for this data, I’ll explain a little about this particular model’s data structure: It is made up of three tables as shown here: country, customer, and Salessales. The country table lists all of the countries in the world, and which continent they are on. Also, that table is at the “one” side of a one-to-many relationship to Customer (on [CountryCode]), which is in turn on the “one” side of its relationship to the Sales table.

Row Level Security 21

 

If we can limit the countries to only those in, say, North America, that would limit the customers to those that are located there, and that would limit the sales, in a cascading filter sort of way. We end up seeing only sales to Customers that are in North America. It would be like adding a WHERE predicate to a SQL query which filters as follows:

SELECT * FROM dbo.Country WHERE Continent = ‘North America’

Let’s get started.

  • On the Modelling tab, click Manage Roles.
  • Click Create to add a new Role, and name it “North America”.
  • Under Tables, click the ellipsis next to Country and select Add Filter, then select the [Continent] field
  • Edit the Table Filter DAX Expression it generates, substituting “Value” for “North America”

row 3

 

To test out this Role, back on the Modelling tab, click View As Roles. Select the North America role. My resulting map looks like this: Row 4

 

Notice the yellow band at the top of the report page that not only shows the security context under which the data is being viewed, but also offers a link to go back to unfiltered data.

We’ve successfully created a Security Role for North America. With only six distinct continents in the Country table, it is a simple operation to add additional roles for the remaining five, each appropriately named.

After deploying the model to the Power BI service, we have some additional work. Locate the data set for the model, click the ellipsis next to it and select security. This is where you would specify who has membership in which role. And obviously, membership in any one role gives you access to see the data associated with that particular continent. There is nothing stopping you from adding any one name to multiple roles. As you can see from the screenshot below, I have already added someone to the North America role.

row 5

 

Get used to this operation as it will be referenced in each of the other two scenarios, but not directly explained again as the operation is exactly the same. Only the name of the Role(s) will change.

This Row Level Security approach is useful if you have a relatively small number of distinct values at the top of a hierarchy (Continent / Country / Sales in this case), and that list is not likely to change. Changes in that list of values would dictate corresponding changes to the Roles, and re-deployments of the Power BI model. But I think we’re safe with our six continents on this earth for at least another googol years or so.

Row Level Security in Power BI Scenario 2

In this scenario, we’re going to take the security role assignments out of the Power BI Service and hold them at the database layer. We’ll use the same model from the previous scenario, minus the six roles we defined before. This model includes one additional table, Continent as shown below: row 6

And as you can imagine, this table now moves the top of our de-facto hierarchy, with a one-to-many relationship between it and the Country table, on field [Continent]:

row 7

To set security in the model we will do the same thing using the Manage Roles dialog box. If you are continuing on from Scenario 1 and have added the Continent table, you can delete the six Roles you added before as they are not needed here. But this time the single role of “Users” will filter on the Continent table, with the DAX expression of: [UserName] = USERNAME()

Row 8

 

The DAX USERNAME() function returns the login name and domain of the logged in user in the format of <name>@<domain>. This returned value ties nicely with the [UserName] column of the Continent table.

To test out this method of securing the data, we will again go to the View As Roles button on the Modelling tab, but this time we’ll check off both the Role of “Users”, and also fill in the “Other user” option by supplying a user name. If you recall the Continent table above, Eustice is assigned to Europe, so she only sees data for Europe. Notice below also how the Continent slicer gets filtered to just one value, and the Country slicer below it gets filtered as well to only Countries to which Eustice has access — those in Europe.

Row 9

After deploying the model, we still need to add email addresses to our User role just like before, but this time we have one role to worry about, and so all six names can be added under that one Role. Control over who gets to see what continent is defined in the rows of the Continent table back in the database. By changing or removing a name and refreshing the data to Power BI, we will alter data access rights without the need to redeploy the model or make adjustments to the Dataset’s Security settings on the site.

This scenario is handy when you again have a clearly defined top-down hierarchy, and when the users’ access is mutually exclusive. One and only one person can have access to each continent. I foresee this being implemented in a model where a salesperson table is at the head of the hierarchy and by filtering the salesperson, the customers and their respective sales get filtered as well. One certainly would not want to be creating a role for each salesperson, re-deploying the model, and then assigning each salesperson to their respective role as we did in Scenario 1. Instead, this approach handles the security assignment dynamically with the USERNAME() function.

Stay tuned for Part 2 where we will look at a third scenario which is a bit more involved, with multiple users, multiple groups, multiple group memberships, and multiple geo-political regions.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990’s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence . He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011. 

Power BI Demo CTA