Looking at your data in different ways can help your business

You can look at your data many different ways and it will help your business to look at that data in as many different ways as possible and not just one. We will cover some of the basics here and hopefully it will get your mind thinking about what data you currently have and how you can look at it. By looking at your data in different ways or different perspectives you can obtain different business values.


You can group your data. What I mean by grouping is when you have rows and rows of data and you want to bring that data together then grouping logical can help you. Take for instance you have eight to ten sales reps making calls to physicians daily trying to sell them their pharmaceutical drugs. You can group all that data by day for a giving week to show what days the sales people are making the most successful sales calls.


Looking at your data granular is the processing of driving further down into the details of the data. Let’s take the above example of showing successful sales people calls for a giving week. Well we know that Bob on the sales team has be crushing it since he joined, but looking at the data for daily sales calls he and the other sells team members are placing about the same amount of calls Monday through Friday. Let’s get a little more granular and drill down into each of the days calls and see what time he is making his calls. Now the data is showing us that Bob is making all his calls either early in the morning or late afternoon and all the other sales people are consistently calling in between Bob’s hours. Deriving this types of answers and drilling into the data will give you more answers that will help you guide others in the business to be better.


One of the most important ways to look at your data is seeing it in a visual concept or chart. I feel this is one of the most important ways for your data to tell it’s story. One of the biggest reasons visualizing your data is so important is because people are so visual and seeing data in row or tabular form to understand the data can be difficult and sometimes mind numbing. Showing a data story through a picture (chart and/or graph) allows people to understand the meaning quicker. If there was a significant increase or decrease in products sales or customer service calls seeing that spike in a graph will be quick and easy. If there is bad data in your warehouse a picture will definitely tell that story and you will be able to visually see that very quickly. Data visualization tools have greatly changed how businesses have worked with their data as well how businesses have dug deeper into their data gaining valuable insight. So make sure you ask the question “Can this data be visually representing?” and if the answer is yes build out the data visualization and share it with the business. 

Data Relationship

Can my data be related? I think this question not only does not get asked enough, but it is one of the most powerful ways to look at your data. If your data can be related to other data how strong is that relationship? Even though asking about data relationships is one of the most basic questions in looking at your data it is the most overlooked and the most informative. Let’s look at a simple example: You have a subscription form on your website that allows potential customers to fill out and subscribe to your newsletter. You have two sets of data, one set being the customers information and the other being website. If you look at each data separately you can know who the customer is, from the form information. Looking now at just the website data you can see how an unknown person entered your web site and walked it (What I mean by walking, is what web pages did they visit on your web site). Now let’s join the data (the relationship), Once we do that we know who the person is, how they got to the web site, what they did on the web site, what page they came from before filling out the form, and then what they did after filling out the web form. Now you have a complete picture and you can make some pretty good assumptions about the user. Powerful right. Relating your data will take time and thought but trust me it is definitely worth the work.

These are just some of the ways that you can look at your data and get answers to questions that have been asked and possibly questions that have not been asked yet. Never stopping asking questions and never stop looking at data in different ways.

Building out a Business Intelligence Solution? Make sure you document your data collection process

The first step of building out a business intelligence solution is that you have identify all the data sources within the business. It is right after you identify the datasources is that you want to develop  and document all the data collection processes. Why develop a data collection process? Not only will creating a data collection process standardize the way you collect data for all the groups from within your business but it will ensure the integrity of your data is kept very high. Processes will ensure that your actions are repeatable, reproducible, accurate, and stable. Think about it if a business had an employee that was collecting critical data for the business and the business had no idea how it was being collected and that employee left, well I think that would have some impact on the business. The business most likely will be able to figure out how the employee was doing this but after how long? At what cost to the business? Could there be repercussions? I think you are understanding the point I am trying to make.  Ensuring processes are in place for your data collection will improve the likelihood that the data is secure, available, clean and the measurements derived from that data will help the business in the now and in the future.

There are many reasons why every business should be documenting the data collection process. If you are documenting, your processes become transparent and your data becomes comprehensible in the future for yourself and others. Your documentation of each of the data collection process should include:

  • Give a good description behind the data that is being collected.
  • Provide all the answers the who, what, why, where and how of the data.
  • Provide any conditions of the use of the data as well as the confidentiality.
  • Provide any history around the project for collecting this data
  • What are the measures and/or reports that will be built from this data

Even if you currently have a business intelligence solution in place but do not have all your data collection processes documented, go back and start getting them documented as soon as possible.

Advantages of a data warehouse in the AWS cloud

There are several advantages of why I believe you should go with a data warehouse living in the cloud and not on premise. I know everyone has heard all the news and excitement around what AWS has been doing around their data warehouse offerings. AWS has been one of the leaders in the Cloud industry for a while and the way they are continually updating their existing products as well as releasing new cloud products is awesome. Lets now go over the advantages that I see a cloud warehouse has over on-premise.

Speed of implementation: With a cloud solution your server can be up and running within fifteen minutes. For the on-premise you will have to order a server, rack the server, install the software, apply patches, put the server on the network, and all that can take several weeks to months from ordering to your warehouse is live and ready.

Flexibility and scalability: Let’s say in a year or two that your company no longer wants a data warehouse, with the cloud solution you just terminate it and now you are no longer paying for those resources. With the on-premise you have to figure out on how to recoup the capital expense from those purchased server and/or servers. Scalability is what the cloud is made for. You want another server you spin it up. With the on-premise you have to plan accordingly and go through the entire process again order a server, racking the server, install the software, apply patches, putting the server on the network.

Ongoing cost: Hardware usually has a three year life span and then you have to replace the server. Along with replacing the server you will have to plan the migration of the warehouse and then migrate. All of that adds up the cost as well. In the cloud you are not responsible for the hardware so this is all handled without your involvement you can focus on the warehouse schema and the data which is the most important pieces of the business intelligence solution.

Security: All cloud based providers are constantly checking the security of their infrastructure with teams of people that try to compromise the system. To go long with that, infrastructure physical security is very tight around their data centers usually taking two to three methods of identification before being allowed to enter the facility.

I think it is safe to say that I am a little biased when it comes to cloud solutions and what they can offer to businesses. That is not to say that I do not rule out on-premise infrastructures. There are requirements and situations that the business really needs to look hard at and make the decision that on-premise is the best way to go for them.


Seeing the picture

Join Goggle and marketing data to get big picture of your prospects. The why and how

The Why:

When you look at each silo of data individually you do not get a complete picture of what a visitor or prospect is interested in, you only see what that one piece of the puzzle is showing you. You can get good information from your Google Analytics data (silo) when you look at it by itself and you can get good information from your marketing data (silo) when you look at that data by itself. What I am writing about in this blog is bringing those silos of data together to give you more insight into a potential customers journey, like when you put pieces of a puzzle together you see the picture clearer. Lets say that your business is utilizing Pardot as its marketing tool to not only manage email campaigns that are guiding potential customers to landing pages that are hosted on your web site but you are also gather customer information from forms that are being embedded within your business’s website. Wouldn’t it be great to see what content that prospect was looking at prior to and after they filled out a form on your website or where did they navigate to after getting to that landing page from one of your email campaigns? You have all the information in Pardot first name, last name, email address, etc and now you would be able to gather additional intelligence about them by what they are viewing within your business’s web site. You could even use that information to send out more direct campaigns based on the new knowledge that you have obtained from joining the data. For instance the prospect was on your website for 5 minutes and they visited the integration, business analytics, and warehouse pages. I know what I want to send to that prospect now.


The How: 

You have to join that data somewhere so lets bring that data into a warehouse. I used the CopperHill AIR Platform to run all the jobs that I created with Talend to move the data from Google and Pardot into the AIR Warehouse. Lets start with the steps I took with Google first. I opened Google Query Explorer which is a tool that allows you to play with the Core Reporting API by building queries to get data from your Google Analytics views. This tool allows you to set different metrics, dimensions, sorts, filters and segments so that you can see the end results before applying them into your data integration tool. Once I got the queries working just as I wanted them to and the data looked good, I move those exact settings inside my Talend tool. With Talend I had to install the Google Analytics Component as it is not part of the base install. The component is free and very easy to install. Once I setup the component and deployed it to the AIR Platform I was immediately streaming data. Below is just an example of one call to Google’s Core Reporting API within the job. I repeated the same steps for each call I wanted to make to Google Analytics API and all the data transfers were handled inside that one job. Channel Data

The next step was to get Pardot data flowing. Pardot does have a great API that you can access and read your marketing data from. Again I used the AIR Platform to run the job that I created in Talend to move this data out of Pardot and into the AIR warehouse. Let me say this, you do have to have knowledge of the Pardot API or creating this job can be a little tricky. What I did was setup a REST call to each of the Pardot objects that I wanted to receive data from and store within the AIR warehouse. I have attached a portion of the job I configured to retrieve data from the Pardot campaign object. As you can see the setup and configuration is a little more complex then the call to the Google API but the data is flowing just as fast.

Now that all the data from the two sources Google and Pardot are streaming into the AIR warehouse we can create the reports showing exactly what we stated in The Why section. This thought process and build out can be applied to really any marketing tool that a business is using and join that to your Google Analytics data. If you want to gather great actionable information around your prospects then joining google and marketing data is definitely a great idea for you and your business. Finish the data puzzle so you can see the big picture and just do not look at the pieces by themselves.

Data warehouse in the Cloud versus On-premise

There are a lot of misconceived notions that cloud solutions have speed constraints leading to latency and have huge security vulnerabilities. These thoughts and some others have many companies move the cloud solution option to the bottom of the pile and that is simple not a good option for the business. To address the first notion that cloud solutions have speed constraints leading to latency, I think is misleading when people state that. If your warehouse and your visualization tool are all in the same cloud infrastructure then I do not think latency will be an issue at all. It is when your warehouse is in the cloud and visualization tool is hosted on premise is when you can see a little latency if any at all. Both AWS and Azure offer tools to support high speed on-premise to cloud technologies. AWS calls its tool Direct Connect and Azure call theirs ExpressRoute. Direct Connect and ExpressRoute  are essentially private cloud connections that offer faster speeds, more reliability, lower latencies, and higher security than standard Internet connections. These tools essentially connect your network (private) directly to your cloud provider without crossing the public internet.  The second notion in reference to security vulnerabilities. Lets address physical security first. AWS and Microsoft have invested a lot of money into the physical security around protecting their infrastructure and it greatly exceeds any normal IT environment. Now for data security that is something that you have complete control of. Yes it is a little more complex than your traditional on-premise environment but if done properly your data is just as secure. Use all the security control tools that are available deterrent, preventative, detective and corrective and along with creating policies and using best practices you will put in place some outstanding data security measures that will allow everyone in the business to sleep good at night. There are many books out that review this topic in great detail, I do suggest investing in one or two and further educate yourself in this area.

In my opinion cloud based solutions, especially for small to midsize companies is an excellent option and can provide  enterprise like excellent performance and availability. I also believe that most companies can really reap the benefits from the cloud solution which typically can lower cost, scale easier, and be up and running very quickly. Two cloud warehouse solutions you should definitely look at are AWS Redshift and Google Big Query. These are the two leaders in the industry and they have proven themselves as reliable and scalable. If you do review them please reach out to me and let me know your thoughts and which one you decided to go with and why.

How much does it cost?

This question is usually the first asked question by the executive team. If your executive team does not ask this within the first five minutes of your conversation then you have a team that is investing in doing the right thing by looking at what technology is best for the business and not worried about cost. There will be a difference in cost on what solution you do pick whether it is the cloud solution or an on-prem. With an on-prem solution you will have bigger capital expenses as well as the cost of additional software licenses depending what database you choose for your data warehouse and what operating system that database needs to run on. These expenses include but are not limited to hardware (server(s), network switches, ups, etc), space, and power consumption. Do not forgot about the human cost on maintenance of the servers and the database. The cloud solution will already have server, power and space included in the price. The server patches as well as patches to the database will be taken care for you with the cloud solution so no worries about additional man hours around those areas as a monthly resource cost.

There are plenty of articles and tools out there they go over on how to calculate the cost of on-prem and cloud solutions and they are very informative. One of the tools I use often is the AWS total cost of ownership tool https://aws.amazon.com/tco-calculator/. This tool will assist you in seeing how AWS can reduce your capital expenses in a project such as implementing a business intelligence solution as well provide you some very detailed reports that you can utilize in your presentation to management. The one thing I will say if you are only implementing a one server solution the cloud solution might cost a little more but will give you several advantages. In my next blog I will talk about some of those advantages in detail.

data governance

What is Data Governance

Businesses having a data governance committee is one of the most overlooked and undervalued areas when businesses are looking to start and/or maintain their business intelligence solution aka your single source of truth. I have fun exercise for you to do, one day go out and ask different friends and/or family of yours that are in different businesses (it could be a company with 25 employees to a company with thousands). Just ask them these two questions: Do you have data governance in your business? What is your definition of data governance? I think you might be very surprised by the answers that you receive from the people you have asked and especially if you ask people that are from different sizes of businesses.

I define data governance as the overall management (with the help of processes) to ensure the availability, usability, integrity, and security of the data that is employed throughout the entire business. I said it throughout the entire business. Data governance is a cultural change within your business and can be a positive change if it is approached and implemented properly. Data governance can be disruptive in the beginning but that does not mean it cannot be a positive disruption. You have to take the approach that this is an ownership of the data and each department belongs to that feeling of ownership and has a responsibility to themselves and the business to ensure that they are doing everything they can to follow the data governance guidelines. Empower the employees and you will see that data governance within your business will go along way.

Some key benefit that a data governance can give your business:

  • Better data quality
  • Better decision making
  • Increasing Operational Efficiencies
  • Improved Data Knowledge
  • Higher Revenue
  • Regulatory Compliance

The importance of a Data warehouse in a Business Intelligence Solution

I consider the data warehouse as the foundation of a business intelligence solution – your single source of truth. As with any good structure if the foundation is not good the structure will come crumbling down. Before we go any further about the importance of a data warehouse in a bi solution lets define it. A data warehouse is a large repository that contains data that is collected from a wide range of sources. These sources of data are usually the business’s transactional systems. The data warehouse is used to guide decision making at all levels and is optimized from read access. Taking the time here to build your warehouse by defining your schemas and tables  is crucial to the solution and the business. You do not want to rush through this part by any means. By not designing the data warehouse correctly the project’s cost can increase significantly, cause delays as well as potentially incorrect reporting of the data that is being stored in the warehouse incorrectly. Take your time here to ask the right questions, develop the right solution, review the solution with your team, and then implement and test. You will need to make sure the data warehouse is not only designed right but can also grow and scale as your business’s data needs grow. One thing you do not want to happen is in six months to a year  you are getting reports that the system is either slow and/or unresponsive. If one of my clients were asking me about this situation, my first thoughts would be around did the team calculate the right size warehouse or server, did they miss something within the database structure (wrong join, wrong columns were index, etc.). I can not say this enough, the data warehouse is the foundation of the business intelligence solution and it deserves the time to be designed, built, and tested correctly. Like I stated earlier if the foundation is bad then your business intelligence solution will have many issues, if you do not do your due diligence around the building of the data warehouse your solution will have a lot of issues in the future.

Middleware versus custom code

I have been asked many times in my career about my thoughts around using middleware versus writing custom code to integrate systems that need to communicate with each other. My thoughts have changed over the years as my experience has grown significantly. There are  a lot of challenges that need to be thought about and considered when you are making a decision between middleware or writing custom code and what is best solution for your situation.

With whatever solution you go with, the solution will have to govern or handle the following areas.

Internal Applications or SaaS that have changing API’s: A lot of the applications that you might be using have APIs available to allow other applications to integrate with them. These APIs can change with new releases of the application as a company enhances their software to accommodate new functionality and features. If you decide to write custom code to handle the integration you will need to get skilled programmers who will then have to get a firm understanding on how the API works (transaction rules, capabilities, schema, authentication, etc) for each application, after each new revision or deployment. If you are using middleware the configuration of the integration process can be changed easily and quickly when APIs change.

Handling business process changes: As a company grows business process will most likely change. These changes can include new or different business rules, data transformations, and business logic. With constant changes to the business and if you choose to write custom code, that code can grow to become unmanageable. With the use of middleware all changes are done through a graphical user interface which includes connections, transformations, business rules, flow logic, and data mappings. Most if not all these changes are done without writing any code.

Monitoring: More complex integrations require the integration between the systems to be monitored. If you have choose to do the system integration with custom code the developers will need to write all the monitoring and error handling code which can take a significant time to write possibly pushing your project longer then expected. With most middleware solutions monitoring, error handling, and logging is implemented with some point and clicks and an update to the configuration file.

Optimizing Processes: Most API’s have stipulations within them to help manage the integrity and performance of the application. These stipulations can add to the complexity of the integrations by restricting the amount that can be sent or received in each call, required authentication, how many times you can make the call to just name a few. With just some of the stipulations that I have name, the complexity of the custom code goes way up and a significant amount of code must be written and tested to handle them. Middleware usually contains functionality (tested functionality) to handle these stipulations and makes sure all the processes are optimized and the data is delivered faster.

My Conclusion: I have built many middleware solutions in my career and the codebase at times was very large depending on the complexity of the integrations. Within each project the business owner always thought that integration should be easy by saying “Hey you are just moving data from one point to another point”. I can tell you that is usually not the case at all. Based on my experience using a middleware solution is usually the most cost effective solution to go with in delivering a speedy ROI back to the business. Most if not all the functionality is already been developed and tested thoroughly so there is no need to reinvent functionality that has already existed.

Silos of Data

Silos of data

What are silos of data? I define silos of data as points of data that have been collected and being stored in many locations that could bring value to your business. Many times these silos of data store similar to identical information. Usually these systems are considered transactional systems that employees and/or customers are entering data into every single day. These transactional systems typically have different databases that are storing data related to your business and can be hosted on and off premise. Other silos of data that you will need to consider are spreadsheets, word documents, text files that are being maintained by employees on their personal computers. Employees (which includes everyone from the CEO down) are most likely storing data that could be important to the business. Identifying and coming up with a solution to collect all that data will not only help give you the big picture of the business but also protect one of the businesses greatest assets their data.

I would like you to think about not only where all the silos of data are and how valuable it would be once connected but also be thinking about how that data is being protected and who else is viewing that data (if that does not keep you up at night, I do not know what will).

I typically tell my clients, ask yourself two questions every time you walk out your business doors. Is the data on my laptop important to the business and/or sensitive in nature? If it is, how is that data being protected on my laptop? This is an area were I think a lot of businesses fall short and not only do they not have policies in place but do not even think of this situation. I have seen so many times employees storing sensitive information on their laptops, walk out their companies doors without even considering the security and responsibility that they should have in protecting that data.

Why you need a Data warehouse

Let’s start off by defining Data Warehouse. A data warehouse is a central repository of information that can be analyzed to make better-informed decisions. It can also be further defined as a repository that stores large amounts of data that has been collected and integrated from multiple sources – such as a CRM, payroll or accounting software, or inventory and sales systems.

“Why do we need  a data warehouse that is separate from our business transactional systems?”  This is a question that we get asked frequently at CopperHill Consulting. We answer this with several statements:

  • Business transactional systems are built for tasks and very specific work flows.
  • Business transactional systems allows editing, while a data warehouse is read-only.
  • Business transactional systems should only hold current data. A data warehouse can hold historical and current data.


If you want to move forward with your BI strategy, you need a data warehouse.

The data warehouse is a core component of Business Intelligence. Here’s how a data warehouse makes an impact:

  • Maintains a copy of data from your transactional systems. This allows you to keep your transactional systems lean and processing quickly with only the most recent and relevant data visible. It also lets you keep a history of all past transactions for recordkeeping and analysis.
  • Improves the quality of the data. Identify duplicate entries and records. Find anomalies in your data. Build custom views. These are all ways that a data warehouse can help improve the quality of your data both in your transactional systems and in your reporting.
  • Restructures information for different users. Create different user roles to restrict permissions and set different views to make it easier for users to understand the story their data is telling them.
  • Integrates data from multiple transactional systems. This lets you see a bigger and clearer picture of your business across all departments and silos.
  • Delivers excellent query performance without compromising business transactional systems. No need to worry about your systems slowing down, timing out, or crashing.

Your data warehouse will change and evolve as your business gets larger and greater over time. As your company grows, your requirements shift.  Your data warehouse needs to be designed to be flexible and scalable so it can handle changing requirements. Automated integration solutions to move company data from your business transactional systems and flat files to the data warehouse is one way you can make sure your data warehouse can grow with your company. Automations keep costs low as well as lower the chance of errors.