Before we talk about unstructured data (or unstructured information), let’s define it. Wikipedia defines it as information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents. I think that is pretty spot on.
Today more than ever, businesses are collecting unstructured data, from posts and comments on social media sites to the enormous volumes of email sent and received every day. There is a lot of valuable data in those social media posts and emails, and it desperately needs to be analyzed. This data can be stored in a relational database or in a NoSQL database, but my suggestion is to store it where you store all your other data sources: your warehouse.
For example, if you use Salesforce’s Pardot as your marketing tool, you will have all of your business’s campaign data, visitor data, and prospect data streaming from Pardot to your warehouse. Say you created a post on Facebook, set up a campaign in Pardot, and then pushed an email out to all your prospects with a link to that post. You now know who opened that email and who clicked on the link. How important would it be for your business to also know the sentiment of all the comments that those prospects left on the Facebook post? If you have not thought about that, trust me: it is powerful, and it will make your data actionable. One of the best ways to analyze this unstructured data is Natural Language Processing (NLP).
NLP is a branch of artificial intelligence (AI) that focuses on analyzing human language to draw insights, create advertisements, and more. It is being used more and more, and it drives many AI applications. Think about it: you can decipher the sentiment of all the comments left on a post. If the sentiment toward the post is negative, you know you need to pivot; better yet, if it is positive, you do not have to do anything. That information is extremely important to your marketing teams, and it also tells you how your product’s message is being received by the public.
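To make the "pivot or do nothing" decision concrete, here is a minimal sketch. It assumes each comment has already been given a sentiment label by some NLP service; the label strings, the function name `needs_pivot`, and the 50% threshold are all hypothetical choices for illustration, not a recommended rule.

```python
from collections import Counter


def needs_pivot(labels, negative_threshold=0.5):
    """Return True when the share of negative comments reaches the threshold.

    `labels` is a list of per-comment sentiment labels such as
    "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED" (assumed label names).
    """
    if not labels:
        # No comments yet, so there is nothing to react to.
        return False
    counts = Counter(labels)
    negative_share = counts["NEGATIVE"] / len(labels)
    return negative_share >= negative_threshold


comment_labels = ["NEGATIVE", "NEGATIVE", "POSITIVE", "NEUTRAL"]
print(needs_pivot(comment_labels))
```

In practice you would tune the threshold to your own audience and probably look at the trend over time rather than a single snapshot.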
There are several important data points that you can get from NLP, including sentiment analysis, keyword extraction, syntax analysis, entity recognition, and topic modeling, to name a few. We are going to touch on each of these, both to show how valuable the information can be for you and your business and to make sure you have a general understanding of each topic. I use Amazon Comprehend, an AWS natural language processing (NLP) service that uses machine learning to discover insights from text. It provides all of the functionality above and returns the results in JSON format, which you can either store in a database or display in real time inside an application.
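As a rough illustration of what that looks like in code, the sketch below calls three Comprehend operations through the boto3 client (`detect_sentiment`, `detect_key_phrases`, and `detect_entities`) and collects the parts of each response into one JSON-friendly dictionary. The helper name `analyze_comment` and the shape of the returned dictionary are my own assumptions for this example, and the client is passed in so the function can be tried without AWS access.

```python
import json


def analyze_comment(client, text, language_code="en"):
    """Run three Comprehend detections on one comment and merge the results.

    `client` is expected to behave like boto3.client("comprehend").
    The returned dictionary layout is a hypothetical choice for this sketch.
    """
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    return {
        "text": text,
        "sentiment": sentiment["Sentiment"],
        "sentiment_scores": sentiment["SentimentScore"],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
        "entities": [
            {"text": e["Text"], "type": e["Type"]} for e in entities["Entities"]
        ],
    }


# Usage (needs boto3 plus configured AWS credentials and a region):
#   import boto3
#   comprehend = boto3.client("comprehend")
#   result = analyze_comment(comprehend, "I love the new release!")
#   print(json.dumps(result, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and written to a warehouse table or returned to an application. Topic modeling is not included here because Comprehend runs it as an asynchronous batch job rather than a single per-text call.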