Tag Archive for Big Data

Technology Talks Podcast

Listen to our new podcast, Technology Talks, hosted by Hector Luciano, Consulting Manager at BlumShapiro Consulting. Each month, Hector will talk about the latest news and trends in technology with different leaders in the field.

Catch up with our first two episodes today:

In the first episode, Hector speaks with Noah Ullman, Director at BlumShapiro Consulting, about the 4th Industrial Revolution and Digital Transformation. The two discuss what digital transformation means for your organization and how you can prepare to be a leader in this new digital age.


In episode two, Hector speaks with Brian Berry, Director at BlumShapiro Consulting, about big data, the role it can play for your organization and how it connects to Digital Transformation and the 4th Industrial Revolution.

Do Data Scientists Fear for Their Jobs?

What happened in this last election, November 2016? Rather, what happened to the analysts in this last election? Just about every poll and news report prediction had Hillary Clinton leading by a comfortable margin over Donald Trump. In every election I can recall from years past, the number crunchers have been pretty accurate on their predictions—at least on who would win if not the actual numerical results. However, this turned out not to be the case for the 2016 presidential race.

But this is not the first time this has happened. In 1936, Franklin Delano Roosevelt defeated Alfred Landon, much to the chagrin of The Literary Digest, a magazine that collected two and a half million mail-in surveys, roughly five percent of the voting population at the time. George Gallup, on the other hand, predicted a Roosevelt victory with a mere 3,000 interviews. The difference, according to the article’s author, was that Literary Digest’s mailing lists were sourced from vehicle registration records. How did this impact the results? In 1936 not everyone could afford a car, so the Literary Digest sample was not truly representative of the population. This is known as sampling bias, where the very method used to collect the data points skews the numbers collected. Gallup’s interviews, by contrast, were more in line with the voting public.
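To see why a huge biased sample can lose to a small random one, here is a minimal simulation sketch. Every number in it is made up for illustration (the share of car owners, how each group votes, the sample sizes); none of it is historical 1936 data, and `simulate_polls` is just a name I picked.

```python
import random

def simulate_polls(seed=1936):
    """Compare a large biased sample with a small random one (all figures hypothetical)."""
    rng = random.Random(seed)

    # Hypothetical electorate: 25% own a car. Car owners back candidate A
    # 40% of the time, everyone else 70% of the time.
    population = []
    for _ in range(200_000):
        owns_car = rng.random() < 0.25
        p_support = 0.40 if owns_car else 0.70
        population.append((owns_car, rng.random() < p_support))

    true_share = sum(vote for _, vote in population) / len(population)

    # "Mailing list" poll: very large, but drawn only from car owners.
    car_owners = [vote for owns_car, vote in population if owns_car]
    biased_sample = rng.sample(car_owners, 25_000)
    biased_share = sum(biased_sample) / len(biased_sample)

    # "Interview" poll: small, but drawn at random from everyone.
    random_sample = rng.sample(population, 3_000)
    random_share = sum(vote for _, vote in random_sample) / len(random_sample)

    return true_share, biased_share, random_share

true_share, biased_share, random_share = simulate_polls()
print(f"true support for A:      {true_share:.3f}")
print(f"25,000 car owners only:  {biased_share:.3f}")
print(f"3,000 random interviews: {random_share:.3f}")
```

Under these assumptions the small random sample lands close to the true figure, while the much larger car-owner sample misses badly. More data does not fix a skewed collection method.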

The article cited above also mentions Boston’s ‘Street Bump’ smartphone app “that uses the phone’s accelerometer to detect potholes… as citizens of Boston … drive around, their phones automatically notify City Hall of the need to repair the road surface.” What a great idea! Or was it? The app was only collecting data from people who a) owned a smartphone, b) were willing to download the app, and c) drove regularly. Poorer neighborhoods were pretty much left out of the equation. Again, an example of sampling bias.

The final case, and not to pick on Boston: I recently heard that data scientists analyzing Twitter feeds for positive and negative sentiment had to treat the term “wicked” as a positive sentiment signal, but only for greater Boston. Apparently, that adjective doesn’t mean what the rest of the country assumes it means.
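A toy sketch of what that kind of regional adjustment might look like follows. The word list, the scores and the `score` function are all hypothetical; real sentiment models are far richer than a word-count lexicon, and I don't know how the analysts in question actually did it.

```python
# Hypothetical base lexicon: +1 for positive words, -1 for negative ones.
BASE_LEXICON = {"great": 1, "good": 1, "love": 1,
                "awful": -1, "terrible": -1, "wicked": -1}

# Hypothetical per-region overrides: in greater Boston, "wicked" reads as positive.
REGIONAL_OVERRIDES = {"boston": {"wicked": 1}}

def score(text, region=None):
    """Sum word scores, letting a region's overrides replace the base values."""
    lexicon = {**BASE_LEXICON, **REGIONAL_OVERRIDES.get(region, {})}
    words = (w.strip(".,!?").lower() for w in text.split())
    return sum(lexicon.get(w, 0) for w in words)

tweet = "The chowder was wicked good!"
print(score(tweet))            # base lexicon only
print(score(tweet, "boston"))  # with the Boston override applied
```

The same tweet scores as neutral with the base lexicon and as clearly positive once the Boston override is applied, which is the whole point: the model has to know where the words came from.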

Along with sampling bias, another driving factor in erroneous conclusions from analyzing data is the ‘undocumented confounder.’ Suppose, for example, you wanted to see which coffee people prefer, that from Starbucks or Dunkin’ Donuts. For this ‘experiment’, we’re interested only in the coffee itself, nothing else. So we have each shop prepare several pots with varying additions like ‘cream only’, ‘light and sweet’, ‘black no sugar’, etc. We then take these to a neutral location and do a side-by-side blind taste comparison. From our taste results we draw some conclusions as to which coffee is preferred by the sample population. But unbeknownst to us, when the individual shops prepared their various samples of coffee, one shop used brown sugar and one used white sugar, or one used half-and-half while the other used heavy cream. The cream and sugar are now both undocumented confounders of the experiment, possibly driving results one way or the other.
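The coffee example can be sketched as a small simulation. It assumes, purely for illustration, that the two coffees are equally good and that brown sugar adds half a point to a taster's rating; the effect size, the rating scale and the `taste_test` name are all invented.

```python
import random
from statistics import mean

# Hypothetical effect of the sweetener on a 10-point rating.
SUGAR_EFFECT = {"white": 0.0, "brown": 0.5}

def taste_test(sugar_a, sugar_b, n=2000, seed=7):
    """Return mean rating of shop A minus shop B.

    Assumption: both coffees have the same underlying quality (6.0),
    so any gap in ratings comes from the sugar plus random noise.
    """
    rng = random.Random(seed)
    ratings_a = [6.0 + SUGAR_EFFECT[sugar_a] + rng.gauss(0, 1) for _ in range(n)]
    ratings_b = [6.0 + SUGAR_EFFECT[sugar_b] + rng.gauss(0, 1) for _ in range(n)]
    return mean(ratings_a) - mean(ratings_b)

confounded = taste_test("brown", "white")  # shops quietly used different sugar
controlled = taste_test("white", "white")  # same sugar at both shops
print(f"gap with different sugars: {confounded:+.2f}")
print(f"gap with the same sugar:   {controlled:+.2f}")
```

With different sugars, shop A appears to have the better coffee even though the coffees are identical by construction. Hold the sugar constant and the gap all but disappears.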

So, back to the elections: how did this year’s political analysts miss the mark? Without knowing their sampling methods, I’m willing to suggest that some form of sampling bias or confounder may have played a part. Was it the well-known ‘cell-only problem’ again (households with no landline are less likely to be reached by pollsters)? Did they take into consideration that Trump used Twitter as a means to deliver sound-bite-like messages to his followers, bypassing the mainstream media’s content filters? Some other factor perhaps as yet unidentified? As technology advances and societal trends morph over time, so must political polling and data analysis methods.

Pollsters and data scientists are continually refining their methods of collection, compensation factors and models to eliminate any form of sampling bias in order to get closer to the ‘truth.’ My guess is that the election analysts will eventually figure out where they went wrong. After all, they’ve got three years to work it out before the next presidential race starts. Heck, they probably started sloshing through all the data the day after the election!

One needs to realize that data science is just that, a science, and not something that can simply be stepped into without knowledge of the complexities of the discipline. Attempting to do so without a full understanding of sampling bias, undocumented confounders and a host of other factors will lead you down the path to a wrong conclusion, aka ‘failure’. History has shown that, for ANY science, there are many failed experiments before a breakthrough. Laboratory scientists need to exercise caution and adhere to strict protocols to keep their work from being ruined by outside contaminants. The same goes for data scientists, who continually refine their collection methods and models when experiments fail.

So what about the ‘data science’ efforts for YOUR business? Are you trying to predict outcomes based on limited datasets and rudimentary Excel skills, then wondering why you can’t make any sense out of your analysis models? Do you need help identifying and eliminating sampling bias, or accounting for those pesky ‘undocumented confounders’? Social media sentiment analysis is a big buzzword these days, with lots of potential for companies to mix this with their own performance metrics. But many just don’t know how to go about it, or are afraid of the cost.

At BlumShapiro Consulting, our team of consultants is constantly looking at the latest trends and technologies associated with data collection and analysis. Some of the same principles associated with election polling can be applied to your organization through predictive analytics and demand planning. Using Microsoft’s Azure framework, we can quickly develop a prototype solution that can help take your organization’s data reporting and predicting to the next level.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence. He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011.

Three Steps to High Quality Master Data

Data quality is critical to business, because poor business data leads to poor operations and poor management decisions. For any business to succeed, especially now in this digital-first era, data is “the air your business needs to breathe”. If leadership at your organization is starting to consider what digital transformation means to your business or industry, and how your business needs to evolve to thrive in these changing times, they will likely assess the current business and technology state. One of the most common conclusions management reaches is that the business systems are “outdated” and “need to be replaced”. As a result, many businesses resolve to replace legacy systems with modern business systems as part of their digital transformation strategy.

Digital Transformation Starts with Data

More than likely, those legacy systems did a terrible job with your business data. They often permitted numerous, incomplete master data records to be entered into the system. Now, you have customer records which aren’t really customers. The “Bill To’s” are “Sold-To’s”, the “Sold-To’s” are “Ship-To’s”, and the data won’t tell you which is which. You might even have international customers with all of their pertinent information in the NOTES section. Each system which shares customer master data with other systems contains just a small piece of the customer, not the complete record.

This may have been the way things were “always done”, or departments made do with the systems available, but now it’s a much larger problem, because in order to transform itself, a business must leverage its data assets. It’s a significant problem when you consider all the data your legacy systems maintain. Parts, assets, locations, vendors, material, GL accounts: each suffers from different, slightly nuanced data quality problems. Now it hits you: your legacy systems have resulted in legacy data. And as the old saying goes, “garbage in, garbage out.” In order to modernize your systems, you must first get a handle on data and your data practices.

Data Quality Processes

The data modernization process should begin with Master Data Management (MDM), because MDM can be an effective data quality improvement tool to launch your business’ Digital Transformation journey. Here’s how a data quality process works in MDM.

Data Validation – Master Data Management systems provide the ability to define data quality rules for the master data. You’ll want these rules to be robust — checking for completeness and accuracy. Once defined and applied, these rules highlight the gaps you have in your source data and anticipate problems which will present themselves when that master data is loaded into your shiny new modern business applications.

Data Standardization – Master Data thrives in a standardized world. Whether it is address standardization, ISO standardization, UPC standardization or DUNS standardization, standards assist greatly with the final step in the process.

Matching and Survivorship – If you have master data residing in more than one system, then your data quality process must consider the creation of a “golden record”. The golden record is the best, single representation of the master data, and it must be arrived at by matching similar records from heterogeneous systems and grouping them into clusters. Once these clusters are formed, a golden record emerges which contains the “survivors” from the source data. For example, the data from a CRM system may be the most authoritative source for location information, because service personnel are working in CRM regularly, but the AR system may have the best DUNS credit rating information.
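The three steps above can be sketched in a few lines of Python. This is an illustration only, not how any particular MDM product works: the sample records, field names, validation rules and source priorities are all hypothetical, and matching on a normalized name is a stand-in for the fuzzy matching a real MDM tool would perform.

```python
import re
from collections import defaultdict

# Hypothetical customer records from two source systems.
RECORDS = [
    {"source": "CRM", "name": "Acme Corp.", "city": "Hartford", "state": "ct", "duns": ""},
    {"source": "AR",  "name": "ACME CORP",  "city": "",         "state": "CT", "duns": "123456789"},
    {"source": "AR",  "name": "Globex Inc", "city": "Boston",   "state": "MA", "duns": ""},
]

REQUIRED_FIELDS = ("name", "state")

def validate(rec):
    """Step 1: return a list of data quality rule violations for one record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not rec.get(field)]
    if rec.get("duns") and not re.fullmatch(r"\d{9}", rec["duns"]):
        problems.append("duns must be 9 digits")
    return problems

def standardize(rec):
    """Step 2: normalize values and derive a key used for matching."""
    out = dict(rec)
    out["state"] = rec["state"].strip().upper()
    out["match_key"] = re.sub(r"[^a-z0-9]", "", rec["name"].lower())
    return out

# Step 3 survivorship rules: which source is most trusted for each field.
SOURCE_PRIORITY = {
    "name": ["CRM", "AR"],
    "city": ["CRM", "AR"],
    "state": ["CRM", "AR"],
    "duns": ["AR", "CRM"],
}

def golden_records(records):
    """Step 3: cluster matching records, then keep the best value per field."""
    clusters = defaultdict(list)
    for rec in map(standardize, records):
        clusters[rec["match_key"]].append(rec)

    golden = []
    for members in clusters.values():
        merged = {}
        for field, priority in SOURCE_PRIORITY.items():
            ranked = sorted(members, key=lambda r: priority.index(r["source"]))
            # First non-empty value from the most trusted source survives.
            merged[field] = next((r[field] for r in ranked if r[field]), "")
        golden.append(merged)
    return golden

for record in golden_records(RECORDS):
    print(record)
```

In this sketch the two Acme records collapse into one golden record that takes its location from CRM and its DUNS number from AR, mirroring the example in the text.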

Modernize Your Data and Modernize Your Business

These three data quality processes result in a radical transformation in the quality of master data, laying the foundation for critical steps which follow. Whether or not your digital transformation involves system modernization, your journey requires clean, usable data. Digital transformation can improve your ability to engage with customers, but only if you have a complete view of who your customers are. Digital transformation can empower your employees, but only if your employees have accurate information about the core assets of the business. Digital transformation can help optimize operations, but only if management can make informed, data-driven decisions. Finally, digital transformation can drive product innovation, but only if you know what your products can and cannot currently do.

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.

A Digital Transformation – From the Printing Press to Modern Data Reporting

Imagine producing, marketing and selling a product that has only a four-hour shelf life! After four hours, your product is no longer of much value or relevance to your primary consumer. After eight hours, you would be lucky to sell any of the day’s remaining stock. Within 24 hours, nobody is going to buy it; you have to start fresh the next morning. There is such a product line being produced, sold to and consumed by millions of people around the world every day. And it’s probably more common than you think.

It’s the daily newspaper.

With such a tight production schedule, news printers have always been under the gun to be able to take the latest news stories and turn them into a finished printed product quickly. Mechanization and automation have pretty much made the production of the modern daily paper a non-event, but it has not always been that way.

150 years ago, the typesetter (someone who set your words, or ‘type’, into a printing press) was the key to getting your printed paper mass produced. With typesetters working faster than your competitors, you could get your product, your story, out to your consumers faster, gaining market share. However, it was still very much a manual process. In the late 1800s the stage was set for a faster method of setting type. One such machine, the Paige Compositor, was as big as a minivan and had about 15,000 moving parts. (Samuel Clemens, a.k.a. Mark Twain, invested hundreds of thousands of dollars in the failed invention, leading to his financial ruin.) On a more personal scale and at the modern end of the spectrum, we think nothing of sending our finished work, perhaps the big annual report, off to the color printer or ‘office machine’, or uploading it to a local printing vendor who will print, collate and bind the whole job for us in a fraction of the time it would take a typesetter to lay out even the first page!

So why am I telling you all this? It’s certainly not for a history lesson. The point is that the printed news industry went through a transformation from nothing (monks with quill pens), to ‘mechanization’ (Gutenberg’s printing press), to ‘automation’ and finally to ‘digitalization.’ And, they had to do so as the news consumer evolved from wanting their printed subscription on a monthly basis, down to the weekly, to the daily and even to the ‘morning’ and ‘evening’ editions. Remember, after four hours, the product is going stale and just about useless. (We could debate whether the faster technologies were what drove news consumers to want information faster, or if the needs of the consumer inspired the advancements in technology, but we won’t.)

Data and reporting have followed the same phases of transformation, albeit along a much accelerated time span. The modern data consumer is no longer satisfied with having to request a green-bar, tractor-fed report from the mainframe, then wait overnight for the ‘job’ to get scheduled and run. They’re not even satisfied with receiving a morning email report with yesterday’s data, or even being able to get the latest analytics report from the server farm on demand. No, they want it now, they want it in hand (smartphones), and they want it concise and relevant. To satisfy this market, products are popping up that fill this need in today’s data reporting market. Products like Microsoft’s Power BI can deliver data quickly and efficiently and in the mobile format demanded due to the industry’s transformation to digital processing. Technologies in Microsoft’s Azure cloud services such as Stream Analytics, coupled with Big Data processing, Machine Learning and Event Hubs, have the capabilities to push data in real time to Power BI. I’ll never forget the feeling of elation I had upon completing a simple real-time Azure solution that streamed data every few seconds from a portable temperature sensor in my hand to a Power BI Dashboard. It must have been something like what Johannes Gutenberg felt after that first page rolled off his printing press.

Gutenberg and Clemens would be amazed at the printing technology available today to the everyday consumer, yet we seem to take it for granted. Having gone through some of the transformation phases with regard to information delivery myself (yes, I do in fact recall 11×17 green-bar tractor-fed reports), I tend to be amazed at what technologies are being developed these days. Eighteen months ago (an eon in technology life) the Apple Watch and Power BI teamed up to deliver KPIs right on the watch! What will we have in another eighteen months? I can’t wait to find out.

About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence. He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of projects for BlumShapiro since 2011.
