Reading CSV and inserting data with Python

One of the easiest things that you can do in Python is read a CSV file and insert that data into a Postgres database. In this blog I will show you how to open up and read a CSV file and then loop thru that CSV creating an insert statement to move that data into the database.

Table structure in Postgres database
CREATE TABLE user
(
business CHARACTER VARYING(155),
firstname CHARACTER VARYING(75),
lastname CHARACTER VARYING(100),
title CHARACTER VARYING(50)
);

First thing we want to do is create a connection and create a cursor
conn = psycopg2.connect(host=”HOSTNAME“, database=”DATABASE NAME“,user=”USER“,password=”PASSWORD“)
cur = conn.cursor()

Second lets create an insert statement
insert_stmt = ” INSERT INTO TEST (business, firstname, lastname, title) VALUES (%s,%s,%s,%s)”

Third we are now going to open the CSV file, read it, build insert statements and insert. Sample file lines

“BUSINESS”,”FIRSTNAME”,”LASTNAME”,”TITLE”
“Jakes Landscape”,”Jake”,”Switch”,”President”
“Lamps etc”,”Robert”,”Jump”,”Sales”
“PMP Light Bulbs”,”David”,”Downes”,”Warehouse”

file_name =”PATH TO CSV FILE
reader = csv.DictReader(open(file_name))
# Count how many records have been processed
recordcount = 0
for row in reader:

try:
cur.execute(insert_stmt, (row[‘BUSINESS’],
row[‘FIRSTNAME’],
row[‘LASTNAME’],
row[‘TITLE’]))
conn.commit()
recordcount = recordcount+1
except (Exception, psycopg2.DatabaseError) as error:
print(sys.stderr)

conn.close()
cur.close
print (‘Process ‘+str(recordcount)+’ rows from ‘+file_name)

It is as simple as that to read a CSV file and insert its data into a database.

Developing a data collection process and documenting

When building out your business intelligence solution an important step of developing data collection processes and documenting those processes is critical to your business and its success. Why develop a data collection process? Not only will creating a data collection process standardize the way you collect data for all the groups in your business but it will ensure the integrity of your data is kept high. Processes will ensure that your actions are repeatable, reproducible, accurate, and stable. Think about it: if a business had an employee that was collecting critical data on the business and the business had no idea how it was being collected and that employee left, that would have some impact on the business. Would the business be able to figure out how the employee was doing this but after how long? At what cost to the business? Could there be repercussions? Ensuring processes are in place for your data collection will improve the likelihood that the data is secure, available, clean and the measurements derived from that data will help the business now and in the future.

There are many reasons why every business should be documenting the data collection process. If you are then documenting your processes becomes transparent and your data becomes comprehensible in the future for yourself and others. Your documentation of each of the data collection process should include:

• A good description behind the data that is being collected.
• Provide all the answers: the who, what, why, where and how of the data.
• Provide any conditions of the use of the data as well as the confidentiality.
• Provide any history around the project for collecting this data

Being a data driven business

Once you start to understand your data and the information is available throughout the business, a whole new world will open up to everyone that has access to the data. You will be able to see not only how the business is doing as a whole, but how each department of the business is doing as well as how employees are doing personally. With the same information (your single source of truth) being accessed across your business and the same story being told to everyone, not only is that powerful because you now have everyone seeing and reading the same data but the entire business is now moving in the same direction. To keep the business moving all in the same direction, to share a common goal, and to make sure all departments are doing what they need to do is extremely important because this will help drive up revenue and this will lower cost by seeing which areas within departments can be enhanced. With all the great information that you start to see, do not ever be satisfied with what your data is telling you. I think the most important thing that a business can do once the business intelligence solution is in place is not stop asking questions around the data. I know I have had many people state to me if you understand data then you can make smart decisions. Understanding the data is really beneficial and really helpful in guiding your business at that moment, I am talking about not being satisfied with all the results that you are getting now from the information the business is seeing. What I am stating is for you and others to keep asking the questions that I have suggested below. You might remember these questions from elementary school in regards to a English class. By asking these questions, you will keep moving the business, the department and the employees in the right direction.

Who: Who is the data about? Who is collecting the data? Collecting data across multiple who’s will give you a firm grasp of what is going on across multiple who’s and not a single who.

What: Of course you want to know what your data is about. Know what makes up the data and know what makes up the numbers behind the data.

When: Have a firm understanding of when your data has been or is being collected. Most data that is being collected is linked to a timeframe and knowing the time of your data is important to what decisions you make now or in the future.

Why: Knowing why the data has been collected can help you understand what needs to be shown to help the business. The constant asking of why this data is being collected will keep the data in check to make sure it has meaning and not tied to some crazy agenda that someone has gone of on. If it is tied to an agenda be cautious and keep asking questions.

How: The last question is how. How was this data collected? If you are purchasing or using an external data file you need to know how this data was collected and/or aggregated. Do you feel comfortable with the data, do you trust it?

Starting

Getting ready to start a BI Solution?

Before building out the plan for a bi solution project, I always like to think about what other projects that I have done or know about that are similar to the one I am going to embark on. I always ask myself what issues or obstacles did I encounter in those projects? There are so many reasons why projects get dragged out or just do not succeed, either from the start or they dwindle out if the project exceeds its expected delivery date. Lets just review some and see how many you came across in your professional career on why a project has failed.

 

  • No Executive sponsorship. There is no real buy in from the senior management team, which can lead to many different problems for the success of the project. You have to be persistent here and spend time presenting and socializing how important this project is to them and the company.

 

  • Battle between departments. Believe it or not I have run into this issue so many times when I have investigated why so many companies have failed in implementing a business intelligence solution. So what I mean here is that each department is acting like a ten year old child that does not want to share their data or their process because they believe it is their own secret sauce or they have something to hide.

 

  • No real project plan in place. This is were a company decides they would like to shoot from the hip on this project. They have no real kick off, no steps, no definition of what success is for the project, no anything. They fail to realize that this is a company wide project and the need for a project plan not only holds everyone accountable but ensures the right steps are completed before the next step are taken. It is this mentality that will kill any project almost from the start,

 

  • The IT department wants to build everything from scratch. This can be a problem in a lot of companies were the IT department wants to build everything custom. This can be not only very time extensive but also very costly. I always tell my clients to take the build, buy, or align approach when looking at the different areas of a project. There are several reasons when you should consider building a custom application over buying or aligning: Off the shelf products cannot meet every need, off the shelf products are to rigid, off the shelf products may not be compatible with your existing applications. Now let me give you several reason when you should be thinking of buying an off the shelf product: budget is limited, lack of time, lack of technical proficiency, and technology would not give you a competitive edge.

 

Above are just a few examples why your project could take longer than expected to be completed or just fail. I am sure that you can think of many more or even have experienced many more. I know in my long career as an IT professional I have come across reasons why projects have failed that would make people laugh and then some other reason that would make people cry. I will save those stories maybe for another blog. I think the main reason why I wanted to get you thinking about project failure is so that history doesn’t repeat itself and I am a firm believer that if you can learn from all failures, you will have a more successful future.

Looking at your data in different ways can help your business

You can look at your data many different ways and it will help your business to look at that data in as many different ways as possible and not just one. We will cover some of the basics here and hopefully it will get your mind thinking about what data you currently have and how you can look at it. By looking at your data in different ways or different perspectives you can obtain different business values.

Grouping

You can group your data. What I mean by grouping is when you have rows and rows of data and you want to bring that data together then grouping logical can help you. Take for instance you have eight to ten sales reps making calls to physicians daily trying to sell them their pharmaceutical drugs. You can group all that data by day for a giving week to show what days the sales people are making the most successful sales calls.

Granular

Looking at your data granular is the processing of driving further down into the details of the data. Let’s take the above example of showing successful sales people calls for a giving week. Well we know that Bob on the sales team has be crushing it since he joined, but looking at the data for daily sales calls he and the other sells team members are placing about the same amount of calls Monday through Friday. Let’s get a little more granular and drill down into each of the days calls and see what time he is making his calls. Now the data is showing us that Bob is making all his calls either early in the morning or late afternoon and all the other sales people are consistently calling in between Bob’s hours. Deriving this types of answers and drilling into the data will give you more answers that will help you guide others in the business to be better.

Visualizing

One of the most important ways to look at your data is seeing it in a visual concept or chart. I feel this is one of the most important ways for your data to tell it’s story. One of the biggest reasons visualizing your data is so important is because people are so visual and seeing data in row or tabular form to understand the data can be difficult and sometimes mind numbing. Showing a data story through a picture (chart and/or graph) allows people to understand the meaning quicker. If there was a significant increase or decrease in products sales or customer service calls seeing that spike in a graph will be quick and easy. If there is bad data in your warehouse a picture will definitely tell that story and you will be able to visually see that very quickly. Data visualization tools have greatly changed how businesses have worked with their data as well how businesses have dug deeper into their data gaining valuable insight. So make sure you ask the question “Can this data be visually representing?” and if the answer is yes build out the data visualization and share it with the business. 

Data Relationship

Can my data be related? I think this question not only does not get asked enough, but it is one of the most powerful ways to look at your data. If your data can be related to other data how strong is that relationship? Even though asking about data relationships is one of the most basic questions in looking at your data it is the most overlooked and the most informative. Let’s look at a simple example: You have a subscription form on your website that allows potential customers to fill out and subscribe to your newsletter. You have two sets of data, one set being the customers information and the other being website. If you look at each data separately you can know who the customer is, from the form information. Looking now at just the website data you can see how an unknown person entered your web site and walked it (What I mean by walking, is what web pages did they visit on your web site). Now let’s join the data (the relationship), Once we do that we know who the person is, how they got to the web site, what they did on the web site, what page they came from before filling out the form, and then what they did after filling out the web form. Now you have a complete picture and you can make some pretty good assumptions about the user. Powerful right. Relating your data will take time and thought but trust me it is definitely worth the work.

These are just some of the ways that you can look at your data and get answers to questions that have been asked and possibly questions that have not been asked yet. Never stopping asking questions and never stop looking at data in different ways.

Seeing the picture

Join Goggle and marketing data to get big picture of your prospects. The why and how

The Why:

When you look at each silo of data individually you do not get a complete picture of what a visitor or prospect is interested in, you only see what that one piece of the puzzle is showing you. You can get good information from your Google Analytics data (silo) when you look at it by itself and you can get good information from your marketing data (silo) when you look at that data by itself. What I am writing about in this blog is bringing those silos of data together to give you more insight into a potential customers journey, like when you put pieces of a puzzle together you see the picture clearer. Lets say that your business is utilizing Pardot as its marketing tool to not only manage email campaigns that are guiding potential customers to landing pages that are hosted on your web site but you are also gather customer information from forms that are being embedded within your business’s website. Wouldn’t it be great to see what content that prospect was looking at prior to and after they filled out a form on your website or where did they navigate to after getting to that landing page from one of your email campaigns? You have all the information in Pardot first name, last name, email address, etc and now you would be able to gather additional intelligence about them by what they are viewing within your business’s web site. You could even use that information to send out more direct campaigns based on the new knowledge that you have obtained from joining the data. For instance the prospect was on your website for 5 minutes and they visited the integration, business analytics, and warehouse pages. I know what I want to send to that prospect now.

 

The How: 

You have to join that data somewhere so lets bring that data into a warehouse. I used the CopperHill AIR Platform to run all the jobs that I created with Talend to move the data from Google and Pardot into the AIR Warehouse. Lets start with the steps I took with Google first. I opened Google Query Explorer which is a tool that allows you to play with the Core Reporting API by building queries to get data from your Google Analytics views. This tool allows you to set different metrics, dimensions, sorts, filters and segments so that you can see the end results before applying them into your data integration tool. Once I got the queries working just as I wanted them to and the data looked good, I move those exact settings inside my Talend tool. With Talend I had to install the Google Analytics Component as it is not part of the base install. The component is free and very easy to install. Once I setup the component and deployed it to the AIR Platform I was immediately streaming data. Below is just an example of one call to Google’s Core Reporting API within the job. I repeated the same steps for each call I wanted to make to Google Analytics API and all the data transfers were handled inside that one job. Channel Data

The next step was to get Pardot data flowing. Pardot does have a great API that you can access and read your marketing data from. Again I used the AIR Platform to run the job that I created in Talend to move this data out of Pardot and into the AIR warehouse. Let me say this, you do have to have knowledge of the Pardot API or creating this job can be a little tricky. What I did was setup a REST call to each of the Pardot objects that I wanted to receive data from and store within the AIR warehouse. I have attached a portion of the job I configured to retrieve data from the Pardot campaign object. As you can see the setup and configuration is a little more complex then the call to the Google API but the data is flowing just as fast.

Now that all the data from the two sources Google and Pardot are streaming into the AIR warehouse we can create the reports showing exactly what we stated in The Why section. This thought process and build out can be applied to really any marketing tool that a business is using and join that to your Google Analytics data. If you want to gather great actionable information around your prospects then joining google and marketing data is definitely a great idea for you and your business. Finish the data puzzle so you can see the big picture and just do not look at the pieces by themselves.

Why you need a Data warehouse

Let’s start off by defining Data Warehouse. A data warehouse is a central repository of information that can be analyzed to make better-informed decisions. It can also be further defined as a repository that stores large amounts of data that has been collected and integrated from multiple sources – such as a CRM, payroll or accounting software, or inventory and sales systems.

“Why do we need  a data warehouse that is separate from our business transactional systems?”  This is a question that we get asked frequently at CopperHill Consulting. We answer this with several statements:

  • Business transactional systems are built for tasks and very specific work flows.
  • Business transactional systems allows editing, while a data warehouse is read-only.
  • Business transactional systems should only hold current data. A data warehouse can hold historical and current data.

 

If you want to move forward with your BI strategy, you need a data warehouse.

The data warehouse is a core component of Business Intelligence. Here’s how a data warehouse makes an impact:

  • Maintains a copy of data from your transactional systems. This allows you to keep your transactional systems lean and processing quickly with only the most recent and relevant data visible. It also lets you keep a history of all past transactions for recordkeeping and analysis.
  • Improves the quality of the data. Identify duplicate entries and records. Find anomalies in your data. Build custom views. These are all ways that a data warehouse can help improve the quality of your data both in your transactional systems and in your reporting.
  • Restructures information for different users. Create different user roles to restrict permissions and set different views to make it easier for users to understand the story their data is telling them.
  • Integrates data from multiple transactional systems. This lets you see a bigger and clearer picture of your business across all departments and silos.
  • Delivers excellent query performance without compromising business transactional systems. No need to worry about your systems slowing down, timing out, or crashing.

Your data warehouse will change and evolve as your business gets larger and greater over time. As your company grows, your requirements shift.  Your data warehouse needs to be designed to be flexible and scalable so it can handle changing requirements. Automated integration solutions to move company data from your business transactional systems and flat files to the data warehouse is one way you can make sure your data warehouse can grow with your company. Automations keep costs low as well as lower the chance of errors.