Analyzing unstructured data

Before we start to talk about unstructured data (or unstructured information) lets define it. Wikipedia defines it as information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents. I think that is pretty spot on. Today more than ever businesses are collecting more and more unstructured data whether it is from the vast social media sites to the enormous volumes of emails that are being sent out and received every day. There is a lot of great data just within social media sites and emails that are being collected and desperately need to be analyzed. This new data can be stored in a relational database as well as a NoSql database but my suggestion to you is store this data where you are storing all the other data sources — your warehouse. For example, if you are using Salesforce’s Pardot as your marketing tool you will have all your business’s campaign data, visitor data, and prospect data streaming from Pardot to your warehouse. Now you created a post in Facebook, you set up a campaign in Pardot and then you pushed an email out to all your prospects with a link to that post. You now know who opened that email and you know who clicked on the link. Please tell me how important would it be for your business to know the sentiment of all the comments that the prospects have left on the Facebook post? If you have not thought about that trust me it is powerful and it will make your data actionable. To analyze this unstructured data one of the best ways is to use Natural Language Processing (NLP). NLP is a form of artificial intelligence that focuses on analyzing the human language to draw insights, create advertisements and more. NLP is being used more and more and is driving many forms of Artificial Intelligence (AI). Think about it — you can decipher the sentiment of all the comments left on a post and based on the sentiment you need to pivot because the post has a negative sentiment towards it or better yet you do not have to do anything because the posts results have a positive sentiment around them. Not only is that information extremely important to your marketing teams but you will know how your product’s message is being received by the public.

There are a several important data points that you can get from NLP including sentiment analysis, keyword extraction, syntax analysis, entity recognition, and topic modeling to name a few. We are going to touch a little bit on each of these to show you not only how important the information can be for you and your business but to also make sure you have a general understanding of each of the topics. I utilize AWS comprehend which is a natural language processing (NLP) service that uses machine learning to discover insights from text and provides all the above functionality and returns the result in JSON format to either store in a database or display in real time inside an application.

Developing a data collection process and documenting

When building out your business intelligence solution an important step of developing data collection processes and documenting those processes is critical to your business and its success. Why develop a data collection process? Not only will creating a data collection process standardize the way you collect data for all the groups in your business but it will ensure the integrity of your data is kept high. Processes will ensure that your actions are repeatable, reproducible, accurate, and stable. Think about it: if a business had an employee that was collecting critical data on the business and the business had no idea how it was being collected and that employee left, that would have some impact on the business. Would the business be able to figure out how the employee was doing this but after how long? At what cost to the business? Could there be repercussions? Ensuring processes are in place for your data collection will improve the likelihood that the data is secure, available, clean and the measurements derived from that data will help the business now and in the future.

There are many reasons why every business should be documenting the data collection process. If you are then documenting your processes becomes transparent and your data becomes comprehensible in the future for yourself and others. Your documentation of each of the data collection process should include:

• A good description behind the data that is being collected.
• Provide all the answers: the who, what, why, where and how of the data.
• Provide any conditions of the use of the data as well as the confidentiality.
• Provide any history around the project for collecting this data