Analyzing unstructured data

Before we talk about unstructured data (or unstructured information), let’s define it. Wikipedia defines it as information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents. I think that is pretty spot on.

Today more than ever, businesses are collecting unstructured data, from the vast social media sites to the enormous volumes of email sent and received every day. There is a lot of great data in social media and email that is being collected and desperately needs to be analyzed. This data can be stored in a relational database or a NoSQL database, but my suggestion is to store it where you store all your other data sources: your warehouse.

For example, if you use Salesforce’s Pardot as your marketing tool, you will have all your business’s campaign data, visitor data, and prospect data streaming from Pardot to your warehouse. Say you create a post on Facebook, set up a campaign in Pardot, and then push an email out to all your prospects with a link to that post. You now know who opened that email and who clicked on the link. How important would it be for your business to also know the sentiment of all the comments the prospects left on the Facebook post? If you have not thought about that, trust me, it is powerful and it will make your data actionable. One of the best ways to analyze this unstructured data is Natural Language Processing (NLP).

NLP is a form of artificial intelligence that focuses on analyzing human language to draw insights, create advertisements, and more. NLP is being used more and more and is driving many forms of artificial intelligence (AI). Think about it: you can decipher the sentiment of all the comments left on a post. If the sentiment is negative, you know you need to pivot; better yet, if it is positive, you do not have to do anything. That information is extremely important to your marketing teams, and it also tells you how your product’s message is being received by the public.

There are several important data points you can get from NLP, including sentiment analysis, keyword extraction, syntax analysis, entity recognition, and topic modeling, to name a few. We are going to touch on each of these, both to show how important the information can be for you and your business and to make sure you have a general understanding of each topic. I use Amazon Comprehend, an AWS natural language processing (NLP) service that uses machine learning to discover insights from text. It provides all of the above functionality and returns results in JSON format, which you can store in a database or display in real time inside an application.
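To make the JSON idea concrete, here is a minimal Python sketch that tallies sentiment labels across a batch of comments. It assumes responses shaped like the output of Amazon Comprehend’s DetectSentiment call (a `Sentiment` label plus a `SentimentScore` map). The helper names (`fetch_sentiment`, `summarize_sentiments`, `overall_sentiment`) are my own, `fetch_sentiment` assumes the `boto3` package and configured AWS credentials, and the sample responses are hand-written stand-ins, not real service output.

```python
from collections import Counter


def fetch_sentiment(text, language_code="en"):
    """Ask Amazon Comprehend for the sentiment of one comment.

    Assumes boto3 is installed and AWS credentials and a region are configured.
    """
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    client = boto3.client("comprehend")
    return client.detect_sentiment(Text=text, LanguageCode=language_code)


def summarize_sentiments(responses):
    """Count how many responses carry each sentiment label."""
    return Counter(response["Sentiment"] for response in responses)


def overall_sentiment(responses):
    """Return the most common label, or None when there are no responses."""
    counts = summarize_sentiments(responses)
    return counts.most_common(1)[0][0] if counts else None


# Hand-written stand-ins for DetectSentiment responses (illustrative only).
sample_responses = [
    {"Sentiment": "POSITIVE",
     "SentimentScore": {"Positive": 0.95, "Negative": 0.01, "Neutral": 0.03, "Mixed": 0.01}},
    {"Sentiment": "NEGATIVE",
     "SentimentScore": {"Positive": 0.02, "Negative": 0.90, "Neutral": 0.06, "Mixed": 0.02}},
    {"Sentiment": "POSITIVE",
     "SentimentScore": {"Positive": 0.88, "Negative": 0.03, "Neutral": 0.08, "Mixed": 0.01}},
]

if __name__ == "__main__":
    print(summarize_sentiments(sample_responses))
    print(overall_sentiment(sample_responses))
```

In a real pipeline you would call something like `fetch_sentiment` once per comment, store the raw JSON in your warehouse next to the campaign data, and run the summary on top of that.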

Descriptive, Predictive and Prescriptive Analytics

What I have been seeing with all my clients over the last three years is them trying to get their arms around their data: cleaning it and gathering it into a central location, where they typically create dashboards and reports to see how their business did in the past. Some are also looking at how they are doing right now. The way most of my clients look at their data is called descriptive. Descriptive data analysis gives businesses insight into the past. It looks at the data, summarizes it, and presents it in a human-readable format. The vast majority of the statistics we use fall into this category (think basic arithmetic like sums, averages, and percent changes). Most often, the underlying data is an aggregate or count of a filtered column of data to which basic math is applied. For all practical purposes, there are an infinite number of these statistics. Descriptive statistics are useful for showing things like total stock in inventory, average dollars spent per customer, and year-over-year change in sales.
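The arithmetic behind descriptive statistics really is this basic. A small Python sketch of a total, an average, and a year-over-year percent change follows; the function names and the sales figures are made up for illustration.

```python
def total(values):
    """Sum of a column of numbers."""
    return sum(values)


def average(values):
    """Arithmetic mean of a column of numbers."""
    return sum(values) / len(values)


def percent_change(old, new):
    """Percent change from old to new (old must be non-zero)."""
    return (new - old) * 100.0 / old


# Made-up quarterly sales figures for two years (illustrative only).
sales_last_year = [100.0, 120.0, 80.0]
sales_this_year = [110.0, 150.0, 100.0]

if __name__ == "__main__":
    print("Total this year:", total(sales_this_year))
    print("Average per period:", average(sales_this_year))
    print("Year-over-year change (%):",
          percent_change(total(sales_last_year), total(sales_this_year)))
```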

When I talk about predictive data analysis, I am looking to understand the future. Predictive analytics looks at the data and then predicts what could happen next, giving its owner actionable information about what may be coming. No predictive analysis can tell you with 100 percent accuracy what the future holds. A business should read the results as what might happen and decide on its path based on that knowledge.

Predictive statistics build on descriptive ones: they take the data that you have and fill in the missing data with best guesses. They combine historical data found in CRM, ERP, HR and POS systems to identify patterns, and they apply statistical models and algorithms to capture relationships between various data sets. Businesses use predictive statistics and analytics anytime they want to look into the future. Predictive analytics can be used throughout the organization, from forecasting customer behavior and purchasing patterns to identifying trends in sales activities. These statistics also help forecast demand for inputs from the supply chain, operations and inventory.
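As one very simple illustration of "applying a statistical model to historical data", the Python sketch below fits a straight trend line to past values with ordinary least squares and extends it one period ahead. This is only one of many possible models, chosen here for brevity, and the function names and sales history are made up for illustration. As noted above, the result is a best guess, not a guarantee.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead=1):
    """Extend the fitted trend line periods_ahead steps past the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


# Made-up monthly sales history (illustrative only).
history = [10.0, 12.0, 14.0, 16.0]

if __name__ == "__main__":
    print("Forecast for next month:", forecast(history))
```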

The last analytic option we will talk about is prescriptive data analytics. Prescriptive analytics is for when you want to be guided through all the possible outcomes. This relatively new field allows users to “prescribe” a number of different possible actions and guides them towards a solution. These analytics are all about providing direction. Prescriptive analytics attempts to quantify the effect of future decisions in order to advise on the possible outcomes before the decisions are actually made. At its best, prescriptive analytics predicts not only what will happen but also why it will happen, and it provides recommendations for actions that take advantage of those predictions. With this type of decision support, a business should feel comfortable with the actions it needs to take, whether that means staying the course or pivoting to right the ship.
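To show the shape of "quantify the effect of each decision before making it", here is a toy Python sketch. It scores a handful of candidate prices against an assumed demand model and recommends the one with the highest expected profit. Everything in it is hypothetical: the linear demand model, the parameter values, and the function names are mine, and a real prescriptive system would use far richer models.

```python
def expected_profit(price, unit_cost, base_demand, demand_drop_per_dollar):
    """Expected profit at a price under an assumed linear demand model."""
    demand = max(0.0, base_demand - demand_drop_per_dollar * price)
    return (price - unit_cost) * demand


def recommend(prices, unit_cost, base_demand, demand_drop_per_dollar):
    """Score every candidate price and return (best_price, all_outcomes)."""
    outcomes = {
        price: expected_profit(price, unit_cost, base_demand, demand_drop_per_dollar)
        for price in prices
    }
    best_price = max(outcomes, key=outcomes.get)
    return best_price, outcomes


if __name__ == "__main__":
    # Hypothetical candidate actions and model parameters (illustrative only).
    best, outcomes = recommend(
        prices=[10.0, 15.0, 20.0],
        unit_cost=5.0,
        base_demand=100.0,
        demand_drop_per_dollar=4.0,
    )
    for price, profit in outcomes.items():
        print(f"price {price}: expected profit {profit}")
    print("Recommended price:", best)
```

The point is the pattern rather than the pricing math: list the possible actions, estimate the outcome of each, and surface both the recommendation and the alternatives so the business can see why it was chosen.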

Which analytics does your business need? Does it need descriptive, predictive and prescriptive data analytics? To answer that question, I believe the business needs to know how advanced a business intelligence solution it needs in order to be successful. Understanding how descriptive, predictive and prescriptive analytics each work, and what questions they can answer for the business, will drive the decision to implement a simple or a more complex business intelligence solution. One piece of advice I would like to give here: start with the simple solution, and once that solution is providing the information you need, enhance your business intelligence step by step into a more complex solution. I believe this approach will give you a much higher success rate when implementing your business intelligence solution, as well as higher user adoption.

To quickly summarize: descriptive analytics answers questions about what happened by looking at data from the past. Predictive analytics answers questions about what might happen. Prescriptive analytics answers questions about what actions you can take. Whether you need descriptive, predictive or prescriptive data analytics, or some combination of them, depends on your business goals and the answers you need from your data, so the decision is very personal to you and the business.

I think it is important to show you the different levels of human input needed to draw conclusions from descriptive, predictive and prescriptive analytics, as well as which questions each analytic area answers. This will give you a good sense of the employee time that will be needed, depending on how you will be looking at your data.