Tag Archive for SQL Server

T-SQL Inner Join Need Not Always Be an Equal Join

Have you ever had to write a T-SQL query that joined a table to itself? The classic example is an employee table that lists the ID of each person's supervisor or boss, which refers to another row in the same table. (Hint: if you do have to do that, do a search for "recursive hierarchy.") Our example today is something different.

In our example we're going off of the following scenario: given a list of cities on a map, generate a table that can record the distance between any two. In other words, create a list where every value in the table is paired with every other value. For this exercise, we're not going to be calculating the actual distances, simply creating the records and leaving the space for someone else to fill in. Sounds simple enough. Let's start with the simplest example, TWO points: New York and San Francisco.

We could easily create a simple three column table with the following entries:
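As a rough T-SQL sketch (the table and column names here are my own placeholders, since the original screenshot is not reproduced), that simple table and its two entries might look like this:

-- A hypothetical three-column table: two cities and the distance between them
CREATE TABLE dbo.Distance (
    City1  VARCHAR(50) NOT NULL,
    City2  VARCHAR(50) NOT NULL,
    Miles  INT NULL  -- left empty for someone else to fill in later
);

INSERT INTO dbo.Distance (City1, City2)
VALUES ('New York', 'San Francisco'),
       ('San Francisco', 'New York');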

But the distance in one direction is the same as the return trip so really, we only need one row. After deleting the redundant record and adding Chicago to our list of cities, we get this:

 

As you can see, with the addition of just one value to our list of cities, we need two additional rows in the Distances table. As you might expect, the fourth city in the first table would require three more rows in the second, the fifth city four more, and so on. The growth is quadratic rather than linear: at 100 cities we would need 4,950 pairs of cities in the distance table, and at 1,000 we would need close to half a million. The actual number of pairs is

N * (N - 1) / 2.

Time to automate, which is the focus of this article.

Let’s settle on a total of five cities, with the last two being New Orleans and Las Vegas. We could easily build the grid below and fill in any of the appropriate blocks. The black squares are same-city pairs (New York to New York, or Chicago to Chicago, etc.) so we don’t need to worry about them. And if we can get the distance from New York to San Francisco, for example, we don’t need the distance from San Francisco back to New York, so we can also eliminate all the squares marked in gray:

Let’s replace our city names with numbers. After all, any decent database design will have a unique key assigned to each row of the City table.
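A minimal sketch of such a City table (names and data types assumed for illustration; the article's actual schema is not shown):

-- Hypothetical City table; CityID is the unique key mentioned above
CREATE TABLE dbo.City (
    CityID    INT IDENTITY(1,1) PRIMARY KEY,
    CityName  VARCHAR(50) NOT NULL
);

INSERT INTO dbo.City (CityName)
VALUES ('New York'), ('San Francisco'), ('Chicago'), ('New Orleans'), ('Las Vegas');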

And our grid now looks like this:

 

So how do we create this mega list of relationships from ONE table? An INNER JOIN of the City table to itself on the City ID column will generate the squares marked in black. But that is exactly what we DON’T want. What about a CROSS JOIN?
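Using the assumed dbo.City sketch from above, the CROSS JOIN might be written like this:

-- Every city paired with every city, itself included: N * N rows
SELECT c1.CityID AS City1_ID,
       c2.CityID AS City2_ID
FROM   dbo.City AS c1
CROSS JOIN dbo.City AS c2;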

That will do it, but with over twice the results we need. Remember that once we know the distance from, say, Chicago to Las Vegas, we don’t also need to record the distance from Las Vegas to Chicago. And we’re still left with some extra rows where the cities are the same, like the distance from New Orleans to New Orleans.

Comparing the City ID columns in the result set above with the City ID axis of the grid, we can see that we’re only looking for pairs (cells) where the ID of City 2 is greater than the ID of City 1. From here, we have two choices: Stick with the CROSS JOIN and add a WHERE clause, or go back to the INNER JOIN and change the ON clause:
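With the same assumed names, the two options might look like this (both return only the pairs where the ID of City 2 is greater than the ID of City 1):

-- Option 1: CROSS JOIN with a WHERE clause
SELECT c1.CityID AS City1_ID, c2.CityID AS City2_ID
FROM   dbo.City AS c1
CROSS JOIN dbo.City AS c2
WHERE  c2.CityID > c1.CityID;

-- Option 2: INNER JOIN with a GREATER THAN in the ON clause
SELECT c1.CityID AS City1_ID, c2.CityID AS City2_ID
FROM   dbo.City AS c1
INNER JOIN dbo.City AS c2
        ON c2.CityID > c1.CityID;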

The results are the same. In fact, the estimated execution plan from the query optimizer is the same as well.

The point of this whole exercise is this: An INNER JOIN need not always be an EQUAL JOIN, though that is by far the most common occurrence. When faced with unique query needs, think outside the box. In this case, simply substituting a GREATER THAN for the EQUAL in the ON clause did the trick.

Learn more about how BlumShapiro Consulting can help with a project like this and additional application development.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990's. When SQL Server 2005 was introduced, he quickly became certified in Microsoft's latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence. He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011.

How Much is Your Data Worth?

Data is the new currency in today’s modern businesses. From the largest international conglomerate down to the smallest neighborhood Mom-and-Pop shop, data is EVERYTHING! Without data, you don’t know who to bill for services, or for how much. You don’t know how much inventory you need on hand, or who to buy it from if you run out. Seriously, if you lost all of your data, or even a small but vitally important piece of it, could your company recover? I’m guessing not.

“But,” you say, “We have a disaster recovery site we can switch to!”

That's fine: if your racks melt down into a pool of heavy metals on the server room floor, then yes, by all means switch over to your disaster recovery site, because molten discs certainly qualify as a "disaster!" Databases hosted on private or public cloud virtual machines are less susceptible, but not immune, to hardware failures. But what about a failure of a lesser nature? What if one of your production databases gets corrupted by a SQL injection hack, cleaned out by a disgruntled employee, or accidentally purged because a developer thought he was working against the DEV environment? Inadvertent changes to data are no respecter of where, or how, that data is stored!

And, sorry to say, clustering or other HADR solutions (High Availability/Disaster Recovery, such as SQL Server Always On technology) may not be able to save you in some cases. Suppose some data gets deleted or modified in error. These 'changes', be they accidental or on purpose, may get replicated to the inactive node of the cluster before the issue is discovered. After all, the database system doesn't know whether it should stop such changes from happening when the command to modify data is issued; how can it tell an 'accidental purge' from regular record maintenance? So the system replicates those changes to the failover node, and you end up with TWO copies of an incorrect database instead of one good one and one bad! Worse yet, depending on your data replication latency from the primary site to the disaster recovery site, and how quickly you stop the DR site from replicating, THAT copy may get hosed too if you don't catch it in time!

Enter the DATABASE BACKUP AND RESTORE, the subject of this article. Database backups have been around as long as Relational Database Management Systems (RDBMS). In my humble opinion, a product cannot be considered a full-featured RDBMS unless it can perform routine backups and allow for granular restore to a point in time. (Sorry, but Microsoft Excel and Access simply do not qualify.) Being a Microsoft guy, I'm going to zero in on their flagship product, SQL Server, but Oracle, SAP, IBM and many others offer similar functionality. (See the Gartner Magic Quadrant for database systems for a quick look at the various vendors, including Microsoft, a clear leader in this Magic Quadrant.)

So what is a BACKUP? “Is it not simply a copy of the database?” you say, “I can make file copies of my Excel spreadsheet. Isn’t that the same as a backup?” Let me explain how database backups work and then you can decide the answer to that question.

First of all, you’ll need the system to create a FULL database backup. This is a file generated by the database server system, stored on the file system, the format of which is proprietary to the system. Typically, full backups are taken once per night for a moderately sized database, for example under 100 GB, and should be handled via an automated scheduling service such as SQL Agent.
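A minimal sketch of the command (the database name, file path and options here are placeholders, not a prescription):

-- Nightly full backup
BACKUP DATABASE SalesDB
    TO DISK = N'D:\Backups\SalesDB_Full.bak'
    WITH INIT, CHECKSUM;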

Next, you'll need TRANSACTION LOG backups. Log backups, as they are known, record every single change in the database that has occurred since the last full or log backup. A good starting point is scheduling log backups at least every hour, with possible tightening down to every few minutes if the database is extremely active.
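The corresponding log backup, again with placeholder names:

-- Transaction log backup, scheduled every hour (or more often)
BACKUP LOG SalesDB
    TO DISK = N'D:\Backups\SalesDB_Log_1400.trn'
    WITH CHECKSUM;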

Now, to restore a database in the event of a failure, you need to do one very important step: back up the transaction log one last time if you want to have any hope of restoring to a recent point. To perform the actual restore, you'll need what is known as the 'chain of backups': the most recent full backup and every subsequent log backup. During the restore, you will be able to specify a point in time anywhere from the time of the full backup to the time of the latest log backup, right down to the second or millisecond.
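Sketched out with the same placeholder names, a point-in-time restore might proceed like this:

-- 1. Back up the tail of the log before touching the damaged database
BACKUP LOG SalesDB
    TO DISK = N'D:\Backups\SalesDB_TailLog.trn'
    WITH NORECOVERY;

-- 2. Restore the most recent full backup, leaving the database ready for log restores
RESTORE DATABASE SalesDB
    FROM DISK = N'D:\Backups\SalesDB_Full.bak'
    WITH NORECOVERY;

-- 3. Restore each log backup in the chain; on the last one, stop at the chosen moment
RESTORE LOG SalesDB
    FROM DISK = N'D:\Backups\SalesDB_Log_1400.trn'
    WITH STOPAT = '2017-06-01 14:35:00', RECOVERY;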

So we're all set, right? Almost. The mantra of Database Administrators the world over regarding backups is this: "The backups are only as good and sure as the last time we tested the RESTORE capability." In other words, if you haven't tested your ability to restore your database to a particular point in time, you can't be sure you're doing it right.

Case in point: I saw a backup strategy once where the FULL backups were written directly to a tape drive every night; then, first thing in the morning, the IT guys would dutifully eject the tapes and immediately ship them out to an off-site storage location. How can you restore a database if your backups are not available?

Case two: The IT guys, not understanding SQL backup functionality and benefits, used a third-party tool to take database backups but didn't bother with the logs. After four years of this, they had a log that was 15 times the size of the database! So big, in fact, that there was no space available to hold its backup. About a year after I got the situation straightened out, with regular full AND transaction log backups going, the physical server (virtualization was not common practice then) experienced a debilitating hardware failure and the whole system was down for three days. Once running again, the system (a financials software package with over 20,000 tables!) was restored to a point in time right before the failure. Having the daily FULL backups saved the financials system (and the company), and having the log backups as well saved many people a full day's worth of work that would have been lost had we only been able to go back to the latest FULL backup.

So, what’s your data worth? If your data is critical to your business, it is critical that you properly back up the data. Talk to us to learn how we can help with this.



Electrify Your Business with Data

Data is a lot like electricity – a little bit at a time, not too much direct contact, and you’re fine. For example, a single nine-volt battery doesn’t provide enough power to light a single residential bulb. In fact, it would take about a dozen nine-volt batteries to light that single bulb, and it would only last about an hour. It’s only when you get massive amounts of electricity flowing in a controlled way that you can see real results, like running electric motors, lighting up highway billboards, heating an oven or powering commuter trains.

It's the same with data. The sale of a single blue shirt at a single outlet store is not much data. And it's still not much even when you combine it with all the sales for that store in a single day. But what about a year's worth of data, from multiple locations? Massive amounts of data can do amazing things for us as well. We have all seen in today's data-centric business environment what controlled usage of data can do.

Some examples include:

  • The National Oceanic and Atmospheric Administration (NOAA) can now more accurately predict a hurricane's path thanks to data that has been collected over time.
  • Marketing firms can save money by culling down distribution lists based on customer demographics, shopping habits and preferences.
  • Medical experts can identify and treat conditions and diseases much better based on a patient's history, life risks and other factors.
  • Big 'multi-plex' movie houses can more accurately predict the number of theatres they will need to provision for the latest summer blockbuster by analyzing Twitter and other social media feeds related to the movie.

All of this can be done thanks to controlled data analytics.

The key word here is "controlled." With a background in marine engineering and shore-side power generation, I have seen my share of what can happen when electricity and other sources of energy are not kept 'controlled.' Ever see what happens when a handful of welding rods go through a steam turbine spinning at 36,000 RPM and designed for tolerances of thousandths of an inch? It's not pretty. After as many years in database technologies, data analysis and visualizations, I have also seen the damage resulting from large quantities of uncontrolled data. In his book Signal: Understanding What Matters in a World of Noise, author Stephen Few shows a somewhat tongue-in-cheek graph that 'proves' that the sale of ice cream is a direct cause of violent crime. Or was it the other way around? It's an obvious comic hyperbole that serves to illustrate his point that we need to be careful about how we analyze and correlate data.

With the 'big data' explosion, proponents will tell you that 'if a little is good, then more is better.' It's an obvious extension, but is it accurate? Is there such a thing as 'too much data'?

Let’s say you are a clothing retail store in the mall. Having data for all of your sales over the past ten years, broken down by item, store, date, salesperson and any number of other dimensions may be essential. What if we were to also include ALL the sales of ALL competitors’ products, seasonal weather history, demographic changes, foot traffic patterns in the mall and just about anything else that could influence a customer’s decision to buy your product even down to what they had for lunch just before they made the purchase? The result would most likely be UN-controlled data analysis. This tends to lead to erroneous correlations and bad decisions. For instance, you just might discover that customers are four times as likely to make a purchase if they had pizza for lunch, never realizing that there are more pizza restaurants near your stores than any other type of food service!

When it comes to data, stick with what you know is good data. Make sure it's clean, reliable and, most of all, relevant. And above all, make sure you CONTROL your data. Without control, there may be no stopping the damage that results.



Dynamic Cell Formatting in SSRS

Early in my Business Intelligence Designer days using other report designer tools, I was all about the data. Get the data onto the report, black and white, never mind fancy fonts or colors. After all, the numbers won't be any different if you try to make them prettier. A bit like putting lipstick on a pig! Then I came to realize that judicious deviation from the default fonts was not only acceptable but desirable, as evidenced in SQL Server Reporting Services report templates. Sub-total numbers started showing up in a heavier and more prominent font and weight, grand total numbers even more so. Such emphasis on the more important numbers of a report helps the eyes quickly identify them as such, leaving the brain to concentrate on other things, such as the numbers themselves. You end up spending more time thinking about the data, and less time finding what you want.

But report cell formatting can go so much further than simply putting more emphasis on numbers at the bottom of the page. With SQL Server Reporting Services, you have the ability to do 'conditional formatting', much like what can be done in Excel or Access. (If one number is greater than another, format the cell one way; if not, do something else.) To take advantage of the rich formatting options available in SSRS, you need to understand a few basic concepts, and some nuances of Reporting Services.

Know Your Properties

Click on a tablix cell in a report, then click the Properties button.


Note the myriad of properties that can be set just for the cell formatting. Each property, when you go to access its options via the drop-down menu, will have a choice labeled “Expression…”. This is where we will be putting in the ‘conditions’ of the conditional formatting.


Know your Expression Editor, Program Flow expressions, and Property expressions

The Expression Editor in Reporting Services is a powerful tool. For the purposes of our conditional formatting, we'll be using it to a) examine some value, b) set a Property, and c) supply a default. For the Font Color property of a cell, click the Expression… option below the color palette (see the screen shot above). At this point, the Expression will most likely contain the single word "Black", assuming you are going with a standard font color. Also note that the word Black is NOT enclosed in double quotes.

 

In the Category list in the lower left, expand the Common Functions node, then click on Program Flow. Note that there are three functions in the Item list:

Choose
IIf
Switch

Each of these can be used effectively for this task, but we’ll stick with the IIf ( ) function as it is probably the easiest to understand. Basically it works like this: IF a certain condition evaluates to TRUE, THEN do what it says in the second section, OTHERWISE, do what it says in the third section (the default). To see it in action, enter the following formula:

=IIf(1 = 1, "Black", "Red")

Click OK, then Preview the report. Because 1 always equals 1, the first section evaluates to TRUE, so the second section is used, in this case “Black”. Note that we need to enclose our color names in double quotes in the Expression Editor for this to work. Now edit the formula and replace the 1 = 1 with 1 = 0. The preview will show the cell contents in a red font. Armed with this knowledge, we can replace the 1 = 0 with a more appropriate business condition. Delete the 1 = 0 text, but leave the cursor in that spot. Click on the Fields (DatasetName) node to see a list of fields available for the Dataset. Double click one to have it inserted in the formula. Add some examination criteria for it that makes sense to your business users. Remember that this part of the expression must evaluate to either True or False. Your final Expression might look like this:

=IIf(Fields!YTD_Sales.Value >= Fields!PriorYear_YTD_Sales.Value, "Black", "Red")

The above conditional expression template can also be used to set any other property available. The only trick is to know what to put in place of the "Black" and "Red" components. The easiest way to determine that is to open the list of choices for the property, and use any of the values you see displayed, as long as you enclose them in double quotes in your expression. Shown below is a list of possible values for the Font Weight property.

 

Using the same True/False comparison from the previous example, the Font Weight property could contain the expression:

=IIf(Fields!YTD_Sales.Value >= Fields!PriorYear_YTD_Sales.Value, "Bold", "Normal")

Font Size is a little trickier as it does not list any valid values. Just know that “8pt” is an 8 point font, “10pt” is 10 point, and so on.

Other Conditional functions

We skipped over the Choose () and Switch () functions, but they operate in a similar manner to IIf(). They return a value based on some comparison. Microsoft’s example of the Choose() function leaves out one simple element: the return value. Sure, it is nice to show the function in use:

=Choose(2, "13", "15", "21")

But it would be even better to explain that the function would be ‘choosing’ the second option, which is “15”. How could we use this in a report? Suppose your Salespeople are ranked on performance, and that ranking is available as a report field as an integer. You could select a Font Weight based on the ranking as follows:

=Choose(Fields!EmployeeRank.Value, "Heavy", "ExtraBold", "Bold", "Semibold", "Medium", "Normal")

The Switch() function lets you list ‘comparison/return’ pairs, all separated by commas, and the function will take the return value that matches the first comparison that evaluates to True. The basic format for this function is:

=Switch(Comparison 1, Return Value 1, Comparison 2, Return Value 2, … Comparison N, Return Value N)

A Font Size expression could be written as:

=Switch(Fields!EmployeeLevel.Value = "Manager", "16pt",
        Fields!AnnualSales.Value <= Fields!SalesGoal.Value, "14pt",
        Fields!AnnualSales.Value > Fields!SalesGoal.Value, "10pt")

Note here that we have used a different field in the second and third comparisons than is used in the first. There is no rule that says that the comparison elements need to look at the same fields.

Summary

It’s easy to see that the Expression editor in SQL Server Reporting Services can be used to build powerful and dynamic expressions that drive various formatting properties. By manipulating multiple Property Expressions for a single cell, the report designer can draw attention to various numbers and report elements if and when the numbers being reported warrant it. It is a much better viewing experience than a simple black on white report where every number is given the same emphasis.