Archive for Brian Berry

Our 5 Rules of Data Science

In manufacturing, the better the raw materials, the better the product. The same goes for data science: a team cannot be effective unless the raw materials of data science are available to them. In this realm, data is the raw material which produces a prediction. However, raw materials alone are not sufficient. Business people who oversee machine learning teams must demand that best practices be applied; otherwise, investments in machine learning will produce dubious business results. These best practices can be summarized into our five rules of data science.

For the purpose of illustration, let’s assume the data science problem our team is working on is related to the predictive maintenance of equipment on a manufacturing floor. Our team is working on helping the firm predict equipment failure, so that operations can replace the equipment before it impacts the manufacturing process.

Our 5 Rules of Data Science

1. Have a Sharp Question

A sharp question is specific and unambiguous. Computers do not appreciate nuance. They are not able to classify events into yes/no buckets if the question is: “Is Component X ready to fail?” Nor does the question need to concern itself with causes. Computers do not ask why – they calculate probability based upon correlation. “Will component X overheat?” is a question posed by a human who believes that heat contributes to equipment failure. A better question is: “Will component X fail in the next 30 minutes?”

2. Measure at the Right Level

Supervised learning requires real examples from which a computer can learn. The data you use to produce a successful machine learning model must include cases where failure has occurred, as well as examples where equipment continues to operate smoothly. We must be able to unambiguously identify which events were failures; otherwise, we will not be able to train the machine learning model to classify data correctly.
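To make this concrete, here is a toy Python sketch of "measuring at the right level": each timestamped sensor reading is labeled with whether a recorded failure followed within 30 minutes, which is exactly the sharp question from Rule 1. The readings, failure times and values are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical sensor readings for component X: (timestamp, temperature).
readings = [
    (datetime(2017, 5, 1, 9, 0), 71.2),
    (datetime(2017, 5, 1, 9, 10), 74.8),
    (datetime(2017, 5, 1, 9, 20), 83.5),
    (datetime(2017, 5, 1, 9, 30), 95.1),
]

# Unambiguously identified failure events recorded by operations.
failures = [datetime(2017, 5, 1, 9, 45)]

HORIZON = timedelta(minutes=30)  # "Will component X fail in the next 30 minutes?"

def label(ts, failures, horizon=HORIZON):
    """Return 1 if a failure occurs within `horizon` after this reading, else 0."""
    return int(any(ts < f <= ts + horizon for f in failures))

# Each training example now carries an unambiguous yes/no answer.
labeled = [(ts, temp, label(ts, failures)) for ts, temp in readings]
```

The sharp question is what makes the label computable: a vague question like "Is Component X ready to fail?" gives the labeling function nothing to test against.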

3. Make Sure Your Data is Accurate

Did a failure really occur? If not, the machine learning model will not produce accurate results. Computers are naïve – they believe what we tell them. Data science teams should be more skeptical, particularly when they believe they have made a breakthrough discovery after months of false starts. Data science leaders should avoid getting caught up in the irrational exuberance of a model that appears to provide new insight. Like any scientific endeavor, test your assumptions, beginning with the accuracy and reliability of the observations you started with to create the model.

4. Make Sure Your Data is Connected

The data used to train your model may be anonymized, because the factors that correlate closely to machine failure are measurements, not identifiers. However, once the model is ready to be used, the new data must be connected to the real world – otherwise, you will not be able to take action. If you have no central authoritative record of “things”, you may need to develop a master data management solution before your Internet of Things predictive maintenance initiative can yield value. Your response to a prediction should also be connected: once a prediction of failure has been obtained, management should already know what needs to happen – use insights to take swift action.

5. Make Sure You Have Enough Data

The accuracy of predictions improves with more data. Make sure you have sufficient examples of both positive and negative outcomes; otherwise it will be difficult to be certain that you are truly gaining information from the exercise.
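As an illustrative sanity check before any modeling begins, one might verify the class balance of the training labels. The label counts and the minimum-per-class threshold below are invented for illustration; real thresholds depend on the model and the problem.

```python
from collections import Counter

# Hypothetical training labels: 1 = failure within the horizon, 0 = normal operation.
labels = [0] * 950 + [1] * 50

counts = Counter(labels)
positive_rate = counts[1] / len(labels)  # fraction of failure examples

# Rough sanity check: do we have enough examples of BOTH outcomes?
MIN_PER_CLASS = 30
enough_data = all(counts[c] >= MIN_PER_CLASS for c in (0, 1))
```

A model trained on a set with almost no positive examples can score high accuracy while learning nothing, which is precisely the "are we truly gaining information" concern above.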

The benefits of predictive maintenance, and other applications of machine learning, are being embraced by businesses everywhere. For some, the process may appear a bit mysterious, but it needn’t be. The goal is to create a model which, when fed real-life data, improves the decision making of the humans involved in the process. To achieve this, data science teams need the right data and the right business problem to solve. Management should work to ensure that these five rules are satisfied before investing in data science activities.

Not sure if you have the right raw materials? Talk to BlumShapiro Consulting about your machine learning ambitions. Our technology team is building next generation predictive analytics solutions that connect to the Internet of Things. We are helping our clients along each step of their digital transformation journey.

About Brian: Brian Berry leads the Microsoft Business Intelligence and Data Analytics practice at BlumShapiro. He has over 15 years of experience with information technology (IT), software design and consulting. Brian specializes in identifying business intelligence (BI) and data management solutions for upper mid-market manufacturing, distribution and retail firms in New England. He focuses on technologies which drive value in analytics: data integration, self-service BI, cloud computing and predictive analytics.

Microsoft Announces Power BI Premium: Removes Functionality on Free Version

Many of our clients come to us looking for solutions to help them achieve “Business Intelligence for Everyone” in their organization while avoiding the pitfalls of reporting in Excel. Our response is simple: Microsoft Power BI is an easy-to-use, non-technical business intelligence tool which is far more robust than Microsoft Excel for reporting. End users who rely upon Excel for reporting often view Power BI as a logical step up. With Power BI, users can automate mundane data transformation steps, connect to a broad range of data sources and securely collaborate with colleagues – all within an environment that looks and feels just like Excel. Our clients have reported that Power BI’s Free Edition includes enough functionality to get started on any reporting initiative, automate data extraction and transformation activities and share the results with a team of executives, analysts, managers and colleagues. However, as Power BI data and report volumes grow, organizations may choose to step up to Power BI Pro, which upgrades users from 1 GB to 10 GB of data and enables complex analytics sharing capabilities, even outside the organization.

Finding a Solution for Larger Organizations

The current Power BI service does present some challenges to larger, more sophisticated organizations. Some of the issues include:

  • Sharing and collaboration features often become complex and difficult to manage
  • Compute resources are shared, not dedicated, and there is no ability to provision additional compute resources
  • Structured (paginated) reporting is not well supported alongside the interactive reports and “single pane of glass” dashboards delivered in Power BI

These issues begged for a simpler, more manageable model for large organizations.

Introducing Power BI Premium

In early May 2017, Microsoft announced its intention to introduce a new licensing level for Power BI, Power BI Premium. Power BI Premium is designed to address the shortcomings of Power BI Pro. Here are three things to know about Power BI Premium:

  1. Power BI Premium Edition will support Power BI Apps. Power BI Apps replace Content Packs and Power BI Embedded. Organizations that currently share Power BI content externally with Power BI Embedded should plan to migrate to Power BI Premium Edition.
  2. Power BI Premium Edition offers dedicated capacity for organizations that need more control. Instead of paying strictly per user, Power BI Premium is licensed on a combined capacity and usage model. This enables organizations that struggle with the per-user data limits enforced on Free and Pro Edition users (1 GB and 10 GB maximums, respectively) to load much larger data models. As with other Azure services, organizations can scale capacity up and down as their needs change.
  3. Power BI Premium Edition includes a license for Power BI Report Server—a full-featured on-premises solution supporting both Power BI (interactive) reports and Reporting Services (paginated, structured) reports.

Important Note for Power BI Free Edition Users

Power BI Free Edition became quite attractive because many users within the same organization could share content without paying any fee. Unfortunately, that functionality is changing: as of June 1, users on the Free Edition will no longer be able to share dashboards with colleagues, other than by printing them out or showing their “personal dashboard” in a browser.

June 1st is right around the corner, and some organizations have built fully functional company dashboards using Free Edition licenses. These organizations now face the prospect of either upgrading to Power BI Pro Edition ($10/user/month) or losing vital collaboration features. This is why Microsoft is offering a 1-year trial of Power BI Pro to users who have previously signed up for Power BI Free Edition. This allows organizations to carefully consider which users need Power BI Pro for creating and sharing data models, reports and dashboards, and which do not. Some organizations will stay on the Free Edition and simply share their BI content via PowerPoint. Others will look at Power BI Pro or Premium licensing and continue to see value.

Next Steps

Microsoft has stated that general availability of Power BI Premium is on the horizon, but no specific release date has been communicated. If your organization has many users creating reports and dashboards with the Free Edition, here are some things you can do to get ready for the change.

  1. Take advantage of the 1-year Power BI Pro trial – encourage users to respond to any email communication from Microsoft and take advantage of the grace period
  2. Download the Power BI Report Server and take it for a spin
  3. Review the Power BI Premium Calculator to understand what your costs would look like under the Power BI Premium model

For more information on how to achieve high performance analytics and reporting with Power BI, contact Brian Berry and our Data Analytics team by phone at 860.570.6368.


Using Real Time Data Analytics and Visualization Tools to Drive Your Business Forward

Business leaders need timely information about the operations and profitability of the businesses they manage to help make informed decisions. But when information delivery is delayed, decision makers lose precious time to adjust and respond to changing market conditions, customer preferences, supplier issues or all three. When thinking about any business analytics solution, a critical question to ask is: how frequently can we (or should we) update the underlying data? Often, the first answer from the business stakeholders is “as frequently as possible.” The concept of “real time analytics,” with data being provided up-to-the minute, is usually quite attractive. But there may be some confusion about what this really means.

While the term real time analytics does refer to data which is frequently changing, it is not the same as simply refreshing data frequently. Traditional analytics packages which take advantage of data marts, data warehouses and data cubes are often collectively referred to as a Decision Support System (DSS). A DSS helps business analysts, management and ownership understand historical trends in their business, perform root cause analysis and enable strategic decisions. Whereas a DSS system aggregates and analyzes sales, costs and other transactions, a real time analytics system ingests and processes events. One can imagine a $25 million business recording 10,000 transactions a day. One can imagine that same business recording events on their website: logins, searches, shopping cart adds, shopping cart deletes, product image zoom events. If the business is 100% online, how many events would that be? The answer may astonish you.

Why Real Time Analytics?

DSS solutions answer questions such as “What was our net income last month?”, “What was our net income compared to the same month last year?” or “Which customers were most profitable last month?” Real time analytics answers questions such as “Is the customer experience positive right now?” or “How can we optimize this transaction right now?” In the retail industry, listening to social media channels to hear what customers are saying about their experience in your stores can drive service level adjustments or pricing promotions. When that analysis is real-time, store managers can adjust that day for optimized profitability. Some examples:

  1. Social media sentiment analysis – addressing customer satisfaction concerns
  2. Eliminating business disruption costs with equipment maintenance analytics
  3. Promotion and marketing optimization with web and mobile analytics
  4. Product recommendations throughout the shopping experience, online or “brick and mortar”
  5. Improved health care services with real time patient health metrics from wearable technology

In today’s world, customers expect world class service. Implicit in that expectation is the assumption that companies with whom they do business “know them”, anticipate their needs and respond to them. That’s easy to say, but harder to execute. Companies who must meet that expectation need technology leaders to be aware of three concepts critical to making real time analytics a real thing.

The first is Internet of Things, or IoT. The velocity and volume of data generated by mobile devices, social media, factory floor sensors, etc. is the basis for real time analytics. “Internet of Things” refers to devices or sensors which are connected to the internet, providing data about usage or simply about their physical environment (for example, where the device is powered on). Like social media and mobile devices, IoT sensors can generate enormous volumes of data very, very quickly – this is the “big data” phenomenon.

The second is Cloud Computing. The massive scale of IoT and big data can only be achieved with cloud scale data storage and cloud scale data processing. Unless your company’s name is Google, Amazon or Microsoft, you probably cannot keep up. So, to achieve real-time analytics, you must embrace cloud computing.

The third is Intelligent Systems. IBM’s “Watson” computer achieved a significant milestone by out-performing humans on Jeopardy. Since then, companies have been integrating artificial intelligence (AI) into large scale systems. AI in this sense is simply a mathematical model which calculates the probability that data represents something a human would recognize: a supplier disruption, a dissatisfied customer about to cancel their order, an equipment breakdown. Using real time data, machine learning models can recognize events which are about to occur. From there, they can automate a response, or raise an alert to the humans involved in the process. Intelligent systems help humans make nimble adjustments to improve the bottom line.
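As a minimal sketch of the alert-or-automate idea described above, the toy function below maps a model's predicted failure probability to an operational response. The thresholds are invented for illustration; in practice they would be tuned against the cost of false alarms versus the cost of missed failures.

```python
def respond(failure_probability, alert_threshold=0.8, act_threshold=0.95):
    """Map a predicted probability to an operational response.

    Thresholds are hypothetical; real systems tune them to balance
    the cost of false alarms against the cost of missed failures.
    """
    if failure_probability >= act_threshold:
        return "dispatch maintenance"   # confident enough to automate the response
    if failure_probability >= alert_threshold:
        return "raise alert"            # notify the humans involved in the process
    return "no action"
```

The point is that the model only supplies a probability; the intelligent system is the surrounding logic that turns that probability into a nimble adjustment.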

What technologies will my company need to make this happen?

From a technology perspective, a clear understanding of cloud computing is essential. When evaluating a cloud platform, CIOs should look for breadth of capability and support for multiple frameworks. As a Microsoft Partner, BlumShapiro Consulting works with Microsoft Azure and its Cortana Intelligence platform. This gives our clients cloud scale, low cost and a wide variety of real time and big data processing options.

[Diagram: the Azure resources which comprise Cortana Intelligence]

This diagram describes the Azure resources which comprise Cortana Intelligence. The most relevant resources for real time analytics are:

  1. Event Hubs ingests high-velocity streaming data sent by event providers (i.e. sensors and devices)
  2. Data Lake Store provides low-cost cloud storage with no practical limits
  3. Stream Analytics performs in-flight processing of streaming data
  4. Machine Learning, or AzureML, supports the design, evaluation and integration of predictive models into the real-time pipeline
  5. Cognitive Services are out-of-the-box artificial intelligence services addressing a broad range of common machine intelligence scenarios
  6. Power BI supports streaming datasets made visible in a dashboard context
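To make the in-flight processing idea concrete, here is a toy Python sketch of a tumbling-window aggregation of the kind Stream Analytics performs over a stream of events. The event data and the 60-second window size are invented for illustration.

```python
from collections import defaultdict

# Hypothetical event stream: (seconds since start, temperature reading).
events = [(1, 70.0), (12, 71.0), (61, 90.0), (65, 92.0), (130, 75.0)]

WINDOW = 60  # tumbling (non-overlapping) window size in seconds

# Assign each event to its window and collect the readings per window.
windows = defaultdict(list)
for ts, temp in events:
    windows[ts // WINDOW].append(temp)

# Emit one aggregate per window - the essence of a tumbling-window average.
averages = {w: sum(vals) / len(vals) for w, vals in sorted(windows.items())}
```

A real pipeline would compute this continuously as events arrive from Event Hubs, rather than over a finished list, but the windowing logic is the same.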

Four Steps to Get Started with Real Time Analytics

Start with the Eye Candy – If you do not have a dashboard tool which supports real-time data streaming, consider solutions such as Power BI. Even if you are not ready to implement an IoT solution, Power BI makes analytics for any social media or customer marketing campaign much more feasible. Power BI can be used to connect to databases, data marts, data warehouses and data cubes, and is valuable as a dashboard and visualization tool for existing DSS systems. Without visualization, it will be very difficult to provide human insights and actions for any kind of data, slow or fast.

Get to the Cloud – Cloud storage costs and cloud processing scale are the only mechanisms by which real time analytics is economically feasible (for most companies). Learn how investing in technologies like Cloud Computing can really help move your business forward.

Embrace Machine Intelligence – To make intelligent systems a reality, you will need to understand machine learning technologies, if only at a high level. Historically, this has meant developing a team of data scientists, many of whom have PhDs in Mathematics or Statistics, using open source tools like R or Python. Today, machine learning is much more accessible than it has ever been. AzureML helps to fast-track both the evaluation and operationalization of predictive models.

Find the Real-Time Opportunity – As the technology leader in the organization, CIOs will need to work closely with other business leaders to understand where real-time information can increase revenue, decrease costs or both. This may require imagination. Start with the question – what would we like to know faster? If we knew our customer was going to do this sooner, how would we respond? If we knew our equipment was going to fail sooner, how would we respond? If we knew there was an opportunity to sell more, how would we respond?




4 Cost Saving DevOps Tools on Azure

Technology leaders need to pay attention to DevOps. Yes, it’s a funny little name. Wikipedia states that DevOps is a compound of “development” and “operations” before explaining it as “a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.”

Technology professionals know that identifying, tracking and resolving bugs costs money. If you are the one writing the software (and sooner or later, everyone will), the bugs are on your dime. Good testing practices can help minimize bugs and costs. However, sometimes bugs result from deployment practices. Indeed, the best technology operations focus on standardized, automated testing and release management practices. Under DevOps best practices, software teams treat software deliverables the way a manufacturing company treats finished goods: ruthlessly eliminating deviations with automation.

If you have tried and failed to create innovative solutions within your company by writing software, there could be several reasons why. If the requirements were right, the architecture was sound and your software developers understand the technology, then examine the process of delivering the software to the users.

Delivering Software Cost Effectively

The concept behind DevOps has been known as Continuous Integration (CI), Application Lifecycle Management (ALM) and by other names. Often, IT departments found ALM complex, or did not have the knowledge required to design a pipeline for software development. But the tools have continued to evolve, and the processes have simplified. Today, cloud vendors deliver DevOps services that are very hard for technology professionals to dismiss. Among the very best is Microsoft’s Azure platform. Microsoft Azure provides many tools for standardizing, testing and delivering high quality software.

Here are my four favorites:

Azure Resource Manager (ARM) templates

Azure Resource Manager templates are JSON documents which can be used to describe a complete set of Azure services. These documents can be saved and managed by IT operations personnel. This highlights a key cloud computing value proposition: the cloud offers technology as a “standard service” and each service can be encapsulated to be brought up and down as needed.

ARM templates can describe Infrastructure-as-a-Service offerings (i.e. Virtual Machines, Networks and Storage). This enables Dev / Test Labs to be designed, templated, deployed and undeployed as needed. Technology teams which must plan for an upgrade by providing a test environment no longer need to buy infrastructure to support a virtual environment. Instead, they can define the environment as an ARM template. Azure allows you to build the environment once, extract the ARM template for later use, and then destroy the resources.
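As a rough illustration of the shape of an ARM template, the sketch below assembles a minimal, deliberately incomplete template describing a single storage account and serializes it to JSON. The resource, parameter names and API version are illustrative assumptions, not a production-ready template.

```python
import json

# Minimal sketch of an ARM template's top-level structure. The storage
# account resource and its apiVersion are illustrative examples only.
template = {
    "$schema": "https://schema.management.azure.com/schemas/"
               "2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "storageAccountName": {"type": "string"}
    },
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "name": "[parameters('storageAccountName')]",
            "apiVersion": "2016-01-01",
            "location": "[resourceGroup().location]",
            "sku": {"name": "Standard_LRS"},
            "kind": "Storage",
            "properties": {}
        }
    ]
}

# Saved as a .json file, this document can be version-controlled by IT
# operations and used to deploy or tear down the environment on demand.
document = json.dumps(template, indent=2)
```

The bracketed expressions such as `[parameters('storageAccountName')]` are ARM template functions, evaluated by Azure at deployment time rather than by this script.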

ARM templates can describe Platform-as-a-Service offerings (i.e. Websites, Services, Databases). This enables the exact same concept, with even better results. In the end, you don’t even have any servers to manage or patch: the underlying infrastructure is standardized. This brings me to Deployment Slots.

Deployment Slots

A common best practice in delivering software is to have at least one Quality Assurance (QA) environment. This shadow environment should replicate production as closely as possible. However, in the PaaS world, we don’t have control of the underlying infrastructure – that’s great, it’s standardized and we want to keep it that way. But we don’t want to abandon the practice of performing final testing before deploying to production.

With deployment slots, we get the ability to create a number of “environments” for our applications and services, then switch them back and forth as needed. Let’s say you have a new software release which you want to ensure passes some tests before releasing to the user community. Simply create a slot called “Staging” for deployment, perform your tests, then switch to production.

[Screenshot: swapping a staging deployment slot into production]

Uh oh – we missed something. We’re human after all. Users are reporting bugs and they liked it better the way we had it. Switch it back – no harm no foul.

[Screenshot: swapping the deployment slots back]

There are some important things to consider before adding Deployment Slots to your DevOps pipeline. For example, if your application relies upon a database of some kind, you may need to provision a staging copy for your tests. You also need to be aware that connection strings are one of the configuration values which can switch with the slot, unless configured to do otherwise.
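The slot behavior described above can be modeled with a toy sketch: deployed content and ordinary settings move with the swap, while settings pinned to a slot (such as a connection string configured as a slot setting) stay put. All names and values here are invented for illustration.

```python
# Toy model of a slot swap. In Azure App Service, settings marked as
# "slot settings" remain attached to the slot during a swap; everything
# else travels with the deployment. Names below are hypothetical.

production = {"build": "v1.0", "conn": "prod-db"}
staging = {"build": "v1.1", "conn": "staging-db"}
STICKY = {"conn"}  # settings pinned to the slot, not the deployment

def swap(a, b, sticky=STICKY):
    """Swap everything except sticky keys, which remain with each slot."""
    a_new = {k: (a[k] if k in sticky else b[k]) for k in a}
    b_new = {k: (b[k] if k in sticky else a[k]) for k in b}
    return a_new, b_new

production, staging = swap(production, staging)
# production now runs the new build but still points at its own database
```

This is also why the swap-back in the next paragraph is safe: the sticky configuration never left its slot.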

Deploy to Azure

I was recently treated to some excellent material on the Cortana Analytics Suite of products. Paying close attention (as I sometimes do), I noticed that the lab environment was prepared for me as an ARM template. I was directed to GitHub (an online public software repository) and told to push the button marked “Deploy to Azure”. When I did, I was brought to the Azure portal – and the URL included a reference to the GitHub location, or repository, which I had just visited. The author of the software had placed an ARM template describing the entire lab environment, and included a few parameters so that I could fill in the information from my Azure subscription. Twenty minutes later, I had Machine Learning, Hadoop/Spark, Data Factory and Power BI resources at my fingertips. Later in the day, we deployed again, this time deploying a simple Web app which consumed Advanced Analytics services. When I was finished, I simply deleted the resources – the entire day cost me less than $20 of Azure consumption costs. Deploying an app has never been easier.

Azure Container Service

No discussion of DevOps would be complete without mentioning Docker. Docker is a platform gaining popularity among developers and IT operations because it offers the consistency of virtual machines with much lower overhead. Essentially, Docker runs as a subsystem which hosts containers. A container is similar in spirit to an ARM template: it packages an application and its environment as a portable, reproducible unit.

[Screenshots: Azure Container Service]

DevOps Tools on Azure

Linux or Windows, Open Source or Closed, Infrastructure or Platform, TFS or GitHub. None of that matters anymore. No more excuses – Microsoft Azure provides outstanding DevOps tooling for Modern Application Development. If you have not deployed your first application to Azure, let’s talk. We can get you optimized quickly.

