Big data are characterized by a variety of content, including text data such as customer product reviews, service center notes on repairs and product issues, and call center recordings of customer complaints, to mention just a few. Text data carry rich information, but that information must be extracted before it becomes useful. This is why text analytics is essential.
Text data, unlike numerical data, are unstructured: they do not follow well-defined patterns and rules, which makes statistical and mathematical methods difficult to apply directly. They are prone to wide variations in spelling, grammar, idioms, jargon, syntax, and abbreviations that complicate processing and analysis. In addition, text contains many "stop words" (such as "the", "is", and "of") that must be filtered out because they add little or no insight.
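Stop-word filtering is usually one of the first preprocessing steps in text analytics. The following is a minimal Python sketch, not a production method. Its assumptions: the tiny `STOP_WORDS` set and the helper names `tokenize` and `remove_stop_words` are hypothetical and exist only for illustration, and tokenization is a simple lowercase regular-expression split. Lowercasing absorbs capitalization differences, but it does nothing about the spelling, jargon, or abbreviation variations described above.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# real projects typically use a larger curated list.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(text: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Return the tokens of `text` that are not in `stop_words`, in original order."""
    return [tok for tok in tokenize(text) if tok not in stop_words]


if __name__ == "__main__":
    review = "The battery was dead and the screen is cracked."
    print(remove_stop_words(review))  # ['battery', 'dead', 'screen', 'cracked']
```

Only the content-bearing words of the sample review survive the filter. Which words count as "stop words" depends on the task, so the list is kept as a parameter rather than hard-coded into the filter.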