Developing a data collection process and documenting

When building out your business intelligence solution an important step of developing data collection processes and documenting those processes is critical to your business and its success. Why develop a data collection process? Not only will creating a data collection process standardize the way you collect data for all the groups in your business but it will ensure the integrity of your data is kept high. Processes will ensure that your actions are repeatable, reproducible, accurate, and stable. Think about it: if a business had an employee that was collecting critical data on the business and the business had no idea how it was being collected and that employee left, that would have some impact on the business. Would the business be able to figure out how the employee was doing this but after how long? At what cost to the business? Could there be repercussions? Ensuring processes are in place for your data collection will improve the likelihood that the data is secure, available, clean and the measurements derived from that data will help the business now and in the future.

There are many reasons why every business should be documenting the data collection process. If you are then documenting your processes becomes transparent and your data becomes comprehensible in the future for yourself and others. Your documentation of each of the data collection process should include:

• A good description behind the data that is being collected.
• Provide all the answers: the who, what, why, where and how of the data.
• Provide any conditions of the use of the data as well as the confidentiality.
• Provide any history around the project for collecting this data

Python – Unable to load psycopg2

Well this was very frustrating to fix “Unable to load psycopg2” but I wanted to share my findings with everyone so they will not go through the ordeal that I did.

I installed a new version of python to my work station, updated my python code to support that new version of python. After executing the python script I received the following error message “Library not loaded: @loader_path/../lib/libssl.1.0.0.dylib”. After much searching I found several sites that helped me figure out the issue. So what I figured out is that if I linked the files from /usr/local/lib you don’t get that error message.

You might have to edit the below command line items to your paths but this should give you a pretty good idea on what to do

cd /usr/local/lib
sudo ln -s /Library/PostgreSQL/9.6/lib/libssl.1.0.0.dylib
sudo ln -s /Library/PostgreSQL/9.6/lib/libcrypto.1.0.0.dylib
sudo ln -s /Library/PostgreSQL/9.6/lib/libpq.5.dylib

Who and what does no business intelligence solution impact

Having no business intelligence solution in place impacts many things and many people. It impacts your bottom line, it impacts your customers, it impacts your employees, and it impacts the understanding of your business right now and in the future.

Your bottom line

With having no business intelligence solution in place, a business cannot see the full picture on what is happening. The business cannot see where all its money is being spent, it cannot see what areas are most inefficient, they cannot see what area they are most efficient in and why and then drive those efficiencies to other areas of their business. They slow down and prevent the business from making real-time data driven decisions. Without seeing the big picture, the business can be limited or just wrong on actions that are taken that they believe are good for the business. All of these reasons as well as many more impacts your business’s bottom line.

Your customers

Having no business intelligence solution does impact your customer. I get a lot of feedback on this one and it is usually from the people that do have the silos of data running within their business. They want to feel better by giving me excuses on how not having a business intelligence solution does not impact their customers. I always tell people that if you cannot get a full picture, a full understanding of the journey a customer takes from the start of the relationship to the end of their relationship within your business you sir or mam will never see its full potential. With so many touch points that a customer could have with a business you have the potential of creating many silos of data. These silos can be in reference to marketing, selling customers additional services, interaction with customer support and even the customer on-boarding process. All these systems and any others should be integrated in order for you to get a clear understanding of the experiences your customers will go through. The silos of data, the lack of the single source of truth will hamper that understanding and the business will never understand the customer’s journey. Picture this: a stack chart and on the X axis you have date values representing the last two weeks, and on the Y axis you have a numeric representation of hours. Within the charting you have multiple columns (different colors) representing different touch points that a new customer had to take in order to become a customer and then received their service from your business. Within seconds you can see where you can improve the process and where the process is working. That is just one powerful reason on how business intelligence can impact your customers and having a single source of truth can help your business.

Your employees

How does no business intelligence solution impact your employees? Without a business intelligence solution, you have silo’s that can affect employees when other departments within your business do not wish to share their data. Think about how many times have you heard or have been involved when a department within a business has identified a problem but cannot do anything about it. I have seen a business identify a problem and could not take the appropriate measures to correct that problem because of silos of data. This can be corrected by doing several things. First, leadership must create a unified front and be creative and tactical in their approach. Work towards a common goal. I know each department has its own responsibilities but the business should have one shared vision — the business’s mission statement. Create the data governance group because that will encourage collaboration, build repeatable processes, share measures across the business, and the group will act as one team pushing to that common goal. Most humans instinctively will get behind a common goal and will feel more united when they can share the same measure of excellence to be obtained with the person next to them.

Descriptive, Predictive and Prescriptive Analytics

What I have been seeing with all my clients over the last three years is them trying to get their arms around their data, cleaning it, gathering it into a central location which then they typically create dashboards and reports to see how their business did in the past but some are looking at how they are doing right now. So, the way most of my clients are looking at their data is called descriptive. Descriptive data analysis gives businesses insight into the past. Descriptive looks at the data, summarizes the data and then interprets that data into human readable format to give us analytics of the past. The vast majority of the statistics we use fall into this category. (Think basic arithmetic like sums, averages, percent changes). Most often, the underlying data is an aggregate or count of a filtered column of data to which basic math is applied. For all practical purposes, there are an infinite number of these statistics. Descriptive statistics are useful to show things like total stock in inventory, average dollars spent per customer and year over year, or even change in sales.

When I talk about Predictive data analysis I am looking to understand the future. Predictive analytics want to look at the data and then predict what can happen in the future. Predictive analytics want to give actionable information to its owner on what could be coming. Currently there is no predictive data analysis that can give you with a 100 percent accuracy on what the future holds. A business should take and read the results on what might happen in the future and decide on the path based on that knowledge.

These two statistics — descriptive and predictive — try to take the data that you have, and fill in the missing data with best guesses. They combine historical data found in CRM, ERP, HR and POS systems to identify patterns in the data and apply statistical models and algorithms to capture relationships between various data sets. Businesses use predictive statistics and analytics anytime they want to look into the future. Predictive analytics can be used throughout the organization from forecasting customer behavior and purchasing patterns to identifying trends in sales activities. These statistics also help to forecast demand for inputs from the supply chain, operations and inventory.

The last analytic option we will talk about is prescriptive data analytics. Prescriptive data analytics is when you want to be guided on all the possible outcomes. The relatively new field of prescriptive analytics allows users to “prescribe” a number of separate actions to and direct them towards a solution. These analytics are all about providing direction. Prescriptive analytics attempts to quantify the effect of future decisions in order to advise on all the possible outcomes before the decisions are actually made. When prescriptive analytics are at their best it will help predict not only what will happen, but also why it will happen providing recommendations regarding actions that will take advantage of the predictions. With this type of decision analytics, support business should feel comfortable with the actions that they need to take, either staying the course or pivoting to right the ship.

Which analytics does your business need? Does your business need descriptive, predictive and prescriptive data analytics? I believe in order to answer that question the business needs to know how advanced of a business intelligence solution it needs in order to be successful. In understanding how each descriptive, predictive and prescriptive and what questions they can answer for the business will drive the business to implement a simple or more complex business intelligence solution. One piece of advice that I would like to give here is start off with the simple solution and once that solution is providing the information you need, then enhance your business intelligence into a more and more complex solution. I believe taking this approach will give you a much higher success rate of implementing your business intelligence solution as well as a higher user adaption.

To quickly summarize the last three paragraphs, descriptive as we know answers the question of the how it looks at data in the past. We also reviewed predictive where we talked about how it will most likely answer questions on how something might happen. And lastly prescriptive will give you answers to questions on what actions can happen. Depending on your business goals and what answers you need from your data, the decision on if you need descriptive, predictive and prescriptive data analytics is very personal to you and the business.

I think it is important to show you the different levels of human input to draw conclusions from descriptive, predictive and prescriptive as well how each analytic area answers which questions. This will give you a good sense of employee time that will be needed depending on the way you will be looking at your data

One reason why more businesses are not building a Business Intelligence Solution

I have been asked many times by my clients and by peers why do you think more business are not implementing a business intelligence solution. It is a great question and I am sure many executives, directors, and managers have discussed this in reference to their own businesses and have their own reasons why a business intelligence solution never got started.

One of many reasons is there is no understanding of data that they are collecting. I have found that some of the businesses that I have dealt with do not understand what data they are collecting, why are they collecting that data, where that data is being stored, and when they are collecting it. To the what data, before business start to collect any data from any source (ERP, CRM, financial, web site, etc), I think they have to ask themselves are they collecting the right data? Remember that data can and should be analyzed to make sure your business is not only going in the right direction but help maintain that direction in the future and shift or pivot if need be. Asking solid questions is one of the primary ways of collecting data. As author Edward Hodnett noted, “If you don’t ask the right questions, you don’t get the right answers.” Questions asked in the right way often point to its own answers. Asking questions is the A-B-C of diagnosis. “Only the inquiring mind solves problems.” You are probably saying to yourself that is easy. Well it is easy but like everything else if you do not keep practicing your skills in asking questions you will never get better.

Like most businesses their primary business is most likely not collecting data. Sure, great questions are being asked and are being pinpointed to the problems that your business is trying to solve. You will find your data will be more useful and meaningful to what the mission the business is trying to accomplish as well deriving actions from a business intelligence solution. Identifying all the reasons why a business needs data upfront will not only help with the reporting of that data but significantly with the data collection. If you understand fully all the factors such as location, time, internal and external resources, vendors, etc. you will have the WHY knowledge of the data points that are important to the business and the data you will be collecting will be extremely valuable. The last thing that a business wants to do is to repeat the data collection process over and over because they have failed to think about the data factors in the beginning. Not only will this add time to obtaining valuable information but the data that you have been collecting might have pointed the business in the wrong direction.

Once you have defined the factors, developed the questions, and you are now collecting that data, where are you going to store it? Say you are a business that is collecting data from multiple applications which are important to your business such as custom web application, a ticketing system that helps customer service, and you have some sort of external data coming into a SFTP server. Each one of these applications are storing data in their own way whether it is contained in a data repository or in a flat file. Yes you can extract information from each one and get some type of analysis out of each source of data that can help the business but think about the insight you can get if you could join the data. Each one of these applications or process is a silo and does not know the other exists. Let’s take for example that your custom web application is used to manage projects and crews. Then you have data feeds from an external data source that has data from vendors related to this project. Your project manager looks at one set of data points from the web application and another set of data points in a spreadsheet application. That project manager then has to merge the data manually in some way like a spreadsheet. Moving the data is usually done by copying and pasting the web application data into another tab of the spreadsheet that they are viewing the vendor information from. Once they are done with that they now create another tab to merge the data points they want to see together. Then they will create another tab for graphs and calculations. Think about all the disparate source systems that you have in your network and then picture yourself repeating the above process in spreadsheets over and over. It is error prone, it is not scalable, and it is unmanageable. If that information is invaluable to the business then you need to kick start your business intelligence solution and then watch your business get meaning from that information with one simple point and click of a mouse. I can tell you one thing that a business does not want and that is having each one of their employees being their own data collector (their own single source of truth) because the business as a whole will not be able to identify any of the areas of the business’ shortcomings as well as the areas that they are excelling. The business is flying blind and are making decisions without seeing the full picture or the wrong picture.

SQL Right Joins

RIGHT JOIN Keyword

Since I did LEFT JOIN the other day, I figured I might as well do a blog on the Right Join to keep it fair.

Define


In SQL the RIGHT JOIN keyword will return all records from the right table (table b), and the matched records from the left table (table a). The result is NULL from the left side, if there is no match.

RIGHT JOIN Syntax

SELECT column_names(s)
FROM table_a
RIGHT JOIN table_b ON table_b.column_name = table_a.column_name

Instructional


In this instruction we will use two tables to demonstrate using RIGHT JOIN within a SQL statement. Lets first define two database tables as well as include the data within those tables.

User table

Id Name Address Zip_Code
5 Bob Vance 45 Geear St 12345
6 Dwight Schrute 55 Sweet St 65565
7 Jim Halbret 65 Nice Rd 43455

Order Table

Id User_Id Employee_Id Order_Date
3 5 67 2019-01-01
45 15 89 2019-01-01
55 16 34 2019-01-01

Example


The following SQL statement will select all users, and any orders they might have.

SQL Demo

SELECT User.Name, Order.Id
FROM Order
RIGHT JOIN User ON Order.User_Id = User.Id
ORDER BY User.Name DESC;

SQL Left Joins

LEFT JOIN Keyword

I wanted to write a quick blog on LEFT JOIN because I get so many questions about this topic from young DB Admin’s and developers as well as Business Managers.

Define


In SQL the LEFT JOIN keyword will return all records from the left table (table a), and the matched records from the right table (table b). The result is NULL from the right side, if there is no match.

LEFT JOIN Syntax

SELECT column_names(s)
FROM table_a
LEFT JOIN table_b ON table_a.column_name = tableb.column_name

Instructional


In this instruction we will use two tables to demonstrate using LEFT JOIN within a SQL statement. Lets first define two database tables as well as include the data within those tables.

User table

Id Name Address Zip_Code
5 Bob Vance 45 Geear St 12345
6 Dwight Schrute 55 Sweet St 65565
7 Jim Halbret 65 Nice Rd 43455

Order Table

Id User_Id Employee_Id Order_Date
3 5 67 2019-01-01
45 15 89 2019-01-01
55 16 34 2019-01-01

Example


The following SQL statement will select all users, and any orders they might have.

SQL Demo

SELECT User.Name, Order.Id
FROM User
LEFT JOIN Order ON User.Id = Order.User_Id
ORDER BY User.Name DESC;

Your Single Source of Truth

Purchase The Kindle Edition
Purchase The Paperback Edition
Your Single Source of Truth is a quick-read for busy business and IT professionals struggling to create a Business Intelligence solution. Packed with advice, proven methods, and real-world uses cases, this book provides the knowledge to get you not only started but to keep your Business Intelligence solution going.

This book is intended to help you understand how a business can deal with their epidemic data problems and see a bigger clearer picture from the data they are collecting. There are mountains of data being collected in many different departments each with their own transactional system (silos). And each silo is not being joined to give a bigger and clearer picture to the business. This is a data centric world and businesses are collecting and saving data at an enormous rate but most are doing nothing with that data. They are not learning from the data and not making actionable and informed decisions from the data.

Business Intelligence and silos of data is not just a small business issue — it’s an issue that all different size businesses are facing and are having problems getting their arms around. Whether it is lack of resources, low priority, or a lack of understanding that there is a problem. I believe if I can explain the issue, analyze it and point companies in the direction in solving their Business Intelligence issues then I would get to see many businesses grow and flourish. I want to help businesses answer those questions that I believe every business wants to answer: How is my business doing right now? How is my business doing compared to how it did in the past? Are all my areas of my business performing well? Which areas can have better efficiencies? What are my customers thinking and how can I better serve them? This is just a very small sample of questions that I know a business intelligence solution can help businesses answer and this book will help get you started.

AWS Layers: How to include dependencies with your AWS Lambda Python Function

I wanted to write a quick how to blog in reference to show someone how to include dependencies with their python AWS Serverless function.

The Use Case

You have been assigned a project and you need to create a Lambda Serverless function in python that needs to execute a stored procedure or a SQL statement against a RDS Postgres Database. You need this function to be ran on a time schedule and it has to interact with the Postgres database.


What is AWS Layers

AWS has implemented some nice functionality called layers for us to easily include dependencies that our python script needs. Layers allow you to configure your Lambda function to pull in additional code and content in the form of layers. A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies. With layers, you can use libraries in your function without needing to include them in your deployment package. This allows your deployment package to be smaller.

Prerequisite: You have gone through the steps of writing and saving your Lambda function written in python you now want to add the psycopg2 library for your code to use.

Step one: Download the psycopg2 from https://github.com/jkehler/awslambda-psycopg2

Step two: Create a directory name python and put the psycopg2 folder inside the newly created python folder

Step three: Zip up the python directory that you created in step two

Step four: Save your function and then go to the Lambda home screen and click on “Layers”

Step five: Click the “Create Layer” button located at the top right part of the AWS console

Step six: Layer Configuration. Give your Layer a unique name. Enter a description of your layer. Click the “Upload” button and browse to your python zip file and select. Choose the Runtime version that you want. Click the “Create” button.

Step seven: Now that your layer is created time to setup in your function. Go back to the Lambda home screen and select functions. Choose the function that you created and has the dependency for psycopg2.

Step eight: Select the Layers box right under the name of your function

Step nine: Select the “Add Layer” button

Step ten: Select the Layer your created from the “layer” dropdown. Select the version from the “Version” dropdown. Click the “Add” Button.

You have now added a layer to your Lambda python function and you can go ahead and test it. Hope this helps.

Being a data driven business

Once you start to understand your data and the information is available throughout the business, a whole new world will open up to everyone that has access to the data. You will be able to see not only how the business is doing as a whole, but how each department of the business is doing as well as how employees are doing personally. With the same information (your single source of truth) being accessed across your business and the same story being told to everyone, not only is that powerful because you now have everyone seeing and reading the same data but the entire business is now moving in the same direction. To keep the business moving all in the same direction, to share a common goal, and to make sure all departments are doing what they need to do is extremely important because this will help drive up revenue and this will lower cost by seeing which areas within departments can be enhanced. With all the great information that you start to see, do not ever be satisfied with what your data is telling you. I think the most important thing that a business can do once the business intelligence solution is in place is not stop asking questions around the data. I know I have had many people state to me if you understand data then you can make smart decisions. Understanding the data is really beneficial and really helpful in guiding your business at that moment, I am talking about not being satisfied with all the results that you are getting now from the information the business is seeing. What I am stating is for you and others to keep asking the questions that I have suggested below. You might remember these questions from elementary school in regards to a English class. By asking these questions, you will keep moving the business, the department and the employees in the right direction.

Who: Who is the data about? Who is collecting the data? Collecting data across multiple who’s will give you a firm grasp of what is going on across multiple who’s and not a single who.

What: Of course you want to know what your data is about. Know what makes up the data and know what makes up the numbers behind the data.

When: Have a firm understanding of when your data has been or is being collected. Most data that is being collected is linked to a timeframe and knowing the time of your data is important to what decisions you make now or in the future.

Why: Knowing why the data has been collected can help you understand what needs to be shown to help the business. The constant asking of why this data is being collected will keep the data in check to make sure it has meaning and not tied to some crazy agenda that someone has gone of on. If it is tied to an agenda be cautious and keep asking questions.

How: The last question is how. How was this data collected? If you are purchasing or using an external data file you need to know how this data was collected and/or aggregated. Do you feel comfortable with the data, do you trust it?