Text analytics is a process for extracting information from documents. It is particularly useful in tasks that involve quantities of information too large to analyze manually. Linguistic and statistical techniques are used to classify and categorize the documents, and to discover concepts and relationships within them. Linguistic techniques include identifying synonyms, determining parts of speech and performing disambiguation, in which context is used to determine which of several possible meanings a word has. Statistical techniques include calculations of word frequency and proximity as well as pattern analysis.
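As a rough illustration of the statistical side, the sketch below counts word frequency and measures proximity (how often two words occur within a few tokens of each other). The function names, the tokenizing rule and the window size are assumptions made for this example; they do not come from any product mentioned in this article.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokenize(text))


def proximity_count(text, a, b, window=5):
    """Count pairs of occurrences of `a` and `b` that lie within `window` tokens."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) <= window)


if __name__ == "__main__":
    sample = "The dishwasher leaks. The dishwasher door leaks water when the cycle ends."
    print(word_frequencies(sample).most_common(3))
    print(proximity_count(sample, "dishwasher", "leaks", window=3))
```

A high proximity count for a product name and a complaint word is the kind of simple signal that more elaborate pattern analysis builds on.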
Although text analytics has long been used for drawing meaning out of large quantities of data in many fields, without a doubt the most dynamic areas right now are e-discovery and analysis of customer information from social media.
LeClairRyan is a law firm that offers corporate and litigation services, including e-discovery collection, review and production services. The firm uses a variety of e-discovery platforms but most often turns to Relativity from kCura, delivered on demand through kCura hosting partner Planet Data Discovery Management Solutions. Relativity is an e-discovery software solution for review and management of both electronic and paper-based documents. LeClairRyan recently added Relativity's text analytics to its platform.
"One of the capabilities we use regularly is document clustering," says William Belt, team leader for e-discovery practice at LeClairRyan. "In the past, we reviewed documents in a linear fashion-for example, in chronological sequence. Having the documents clustered by topic is more efficient because the reviewers do not have to shift gears as much."
Clustering can also help identify relevancy, so that groups of documents that are likely to be highly relevant are together. "This ability helps with early case assessment," Belt says. "Previously we were using key words to identify relevant documents, but text analytics gives us a much clearer picture of the data set." Another benefit of clustering is quickly identifying potential areas of risk. "We can prioritize documents according to their likely level of risk," he adds, "which puts us in a better strategic position in the review process."
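To make the idea of clustering by topic concrete, here is a minimal sketch that groups documents whose vocabularies overlap. It uses a simple greedy pass with cosine similarity over word counts, which is an assumption chosen for brevity and not a description of how Relativity clusters documents. The stopword list and the threshold are likewise illustrative.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "is"}


def vectorize(text):
    """Turn a document into a bag-of-words Counter, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single pass: join the most similar cluster, or start a new one."""
    clusters = []  # each entry: {"centroid": Counter, "members": [doc indices]}
    for idx, text in enumerate(docs):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
        else:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["members"].append(idx)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    docs = [
        "Pricing agreement for the supply contract",
        "Lunch menu for Friday team outing",
        "Revised pricing terms in the supply contract",
        "Team outing moved to Friday lunch",
    ]
    print(cluster(docs))
```

Reviewers working through one cluster at a time see related documents together, which is the efficiency Belt describes.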
Quicker relevance review
The volume of information that must be searched is a primary driver in the quest for greater efficiency. "All the documents eventually get reviewed either by attorneys or paralegals," Belt says. "With Relativity, we are able to separate out tens of thousands of documents from the original group (which might have contained 300,000 documents originally) as not relevant, and then paralegals can quickly review them to verify that classification." With a large number of documents, even a small savings in time for each one adds up to significant savings in both time and money.
kCura releases frequent updates to Relativity. "We release a new version every two months," says Nick Robertson, VP of sales and marketing. "We want Relativity to be a real law firm workhorse. Automated review workflows, integrated productions and visual data analysis have been incorporated into the product for some time. This year we added a lot of new features, such as OCR and search term reports to allow case teams to better understand and work with their documents." Newly introduced APIs allow third parties to more easily integrate additional functionality.
In particular, kCura wants to make text analytics ubiquitous in e-discovery. "We made our index building process faster and easier," says Robertson. The company also introduced a workflow that codes documents based on the decisions of experts. Teams of experts sample the documents and code them for relevancy, and Relativity's text analytics codes the rest of the documents. Past versions of Relativity also allowed users to create relevant samples, but the review workflow is new. Based on kCura's recent look at usage across its customers, text analytics is being used more than 10 times as often in 2011 as it was in the last quarter of 2009.
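The general idea of coding a sample by hand and letting software code the remainder can be sketched as follows. This example assigns each uncoded document the label of its most similar expert-coded document, using Jaccard overlap of terms. That similarity measure and all names here are assumptions for illustration only; the article does not say which method Relativity uses.

```python
import re


def terms(text):
    """Return the set of lowercase word terms in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of terms two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_codes(coded_sample, uncoded):
    """Label each uncoded document with the code of its nearest coded example.

    coded_sample: non-empty list of (text, label) pairs coded by experts.
    uncoded: list of document texts.
    Returns a list of (label, similarity score) pairs, one per uncoded document.
    """
    coded_terms = [(terms(text), label) for text, label in coded_sample]
    results = []
    for text in uncoded:
        doc_terms = terms(text)
        score, label = max(
            ((jaccard(doc_terms, ct), lab) for ct, lab in coded_terms),
            key=lambda pair: pair[0],
        )
        results.append((label, score))
    return results


if __name__ == "__main__":
    sample = [
        ("merger pricing negotiation with supplier", "relevant"),
        ("office holiday party schedule", "not relevant"),
    ]
    rest = ["supplier pricing negotiation update", "holiday party catering schedule"]
    for label, score in propagate_codes(sample, rest):
        print(label, round(score, 2))
```

Returning the score alongside the label lets a review team send low-scoring documents back to a human, in line with the verification step described earlier.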
Voice of the customer
The other booming sector for text analytics is in the area of customer feedback through social media as well as traditional channels such as customer relationship management (CRM) systems. Whirlpool manufactures consumer products under its own brand name and also under Amana, Jenn-Air, KitchenAid and others. Like many consumer products manufacturers, Whirlpool wants to keep a close watch on customer sentiment, and has used software products from Attensity for a number of years to provide an integrated view of its customers' experiences from its CRM system, e-mails and customer service records.
In 2010, Whirlpool began using Attensity360, which monitors social media, and extended its use of Attensity Respond, which allows Whirlpool to respond to and track customer comments online, into its social media monitoring. The company is now able to extract information from online customer conversations and other social media sources to gain insights into possible new products and take action in the case of customer dissatisfaction. Whirlpool also has developed a set of metrics that indicate the number of customers contacted, positive and negative comments about its own brands and those of competitors, and other information that assists in product development and customer service.
Companies are not always sure what they should be doing with social media, either in terms of their own participation in corporate blogs, Facebook and Twitter, or in terms of using such information productively. "We are seeing a new class of buyers of Attensity who have been asked by their CEOs, 'Why don't we know what's going on in social media?'" says Michelle de Haaff, CMO of Attensity. "There is definitely a drive to use text analytics in large companies to unleash the voice of the customer."
Interpreting Tweets poses some special challenges. Attensity360 offers Twitter's "firehose," a streaming API that provides access to Tweets for analysis, as a native service. To make the Tweets meaningful, however, Attensity must translate them into standard English. "We had linguists studying this 'slanguage' for the past two years," de Haaff says, "so that every word is parsed, making it possible to see relationships among words, even if the words are abbreviations, acronyms or emoticons."
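A very small sketch of that kind of translation is a lookup table that expands abbreviations and tags emoticons before any further analysis. The table below is hypothetical and tiny; it is not Attensity's lexicon, and real linguistic parsing goes far beyond token substitution.

```python
# Hypothetical lookup table, invented for this example.
SLANG = {
    "u": "you",
    "gr8": "great",
    "thx": "thanks",
    "4": "for",
    "b4": "before",
    ":(": "[negative emoticon]",
    ":)": "[positive emoticon]",
}


def normalize_tweet(text):
    """Expand known abbreviations and emoticons; strip simple punctuation."""
    out = []
    for tok in text.lower().split():
        if tok in SLANG:  # check the raw token first so emoticons survive
            out.append(SLANG[tok])
            continue
        core = tok.strip(".,!?")
        if core:
            out.append(SLANG.get(core, core))
    return " ".join(out)


if __name__ == "__main__":
    print(normalize_tweet("Thx 4 nothing, my dryer died b4 warranty :("))
```

Once the text is in a standard form, the frequency, proximity and sentiment techniques used on ordinary prose can be applied to it.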
Social media's info avalanche
Attensity also provides in its response software a method for prioritizing responses to customers who ask questions in forums or on a company's Facebook page. "Attensity360 does the prioritizing based on predefined business rules," adds de Haaff. Therefore a telecom company can quickly respond to a customer who is contemplating switching services, one whose sentiment is strongly negative, or someone who counts as a "good" customer.
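Prioritizing by predefined business rules can be pictured as a weighted checklist applied to each incoming post. The sketch below mirrors the telecom example; the rule names, keywords, weights and sentiment threshold are assumptions made up for illustration, and the sentiment score is assumed to come from an earlier analysis step.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    text: str
    sentiment: float  # assumed scale: -1.0 (very negative) to 1.0 (very positive)
    high_value: bool  # assumed flag for a "good" customer


# Each rule: (name, test, weight). All values are hypothetical.
RULES = [
    ("churn risk", lambda p: "switch" in p.text.lower() or "cancel" in p.text.lower(), 50),
    ("strongly negative", lambda p: p.sentiment <= -0.6, 30),
    ("high-value customer", lambda p: p.high_value, 20),
]


def priority(post):
    """Sum the weights of every rule the post triggers."""
    return sum(weight for _, test, weight in RULES if test(post))


def triage(posts):
    """Return posts ordered from highest to lowest priority."""
    return sorted(posts, key=priority, reverse=True)


if __name__ == "__main__":
    queue = [
        Post("ann", "Love the new plan", 0.8, False),
        Post("bob", "Thinking about switching carriers", -0.7, True),
        Post("cy", "Coverage is terrible", -0.9, False),
    ]
    for post in triage(queue):
        print(priority(post), post.author)
```

Keeping the rules in a plain list means a business user can adjust weights or add conditions without touching the sorting logic.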
Social media provides the low-hanging fruit for automated gathering of customer intelligence, according to James Kobielus, senior analyst at Forrester Research. "In the past, companies did surveys or held focus groups," he says, "but now, customers are constantly telling companies what they think via social media such as Twitter and Facebook." The challenge is to capture and make actionable that avalanche of information. Those "listening engines" that tap into social media streams provide different types of feedback to companies ranging from awareness to sentiment to most appropriate offers, promotions and other responses.
One specific insight companies are seeking is a customer's propensity for purchasing a product or service. "It's possible to do deep text analytics and then use the results of those analyses to create predictive models of likely customer behaviors," Kobielus says. "There may be a correlation between a certain price and propensity to buy, and the company can then develop a campaign to validate the propensity model and possibly identify areas where the model needs tweaking." That process, often called "next best offer" modeling, helps companies invest their marketing resources more wisely.
Text analysis meets business analytics
Text analyses also can be combined with quantitative data from business analytics systems. "We are seeing a lot of traction in these types of analyses," says Dan Lahl, senior director of product marketing at Sybase. "For example, if discussions in customer support forums indicate that a product is hard to use, it's helpful to know if it's your top customers who are commenting." However, identifying the top customers depends on certain criteria such as buying patterns and historical revenue amounts that are stored in structured databases.
Sybase IQ is a business analytics solution that indexes and stores unstructured data in the same system as structured data. Therefore, queries that combine both can be run, analyzed and correlated in the same system by Sybase IQ. For example, analysis of the customer comments, which originated as unstructured data, would be coded and stored in the database. Sybase IQ analyzes the coded comments relative to other factors, such as the number of years the customer has used the product.
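The value of keeping coded comments next to structured customer records is that a single query can answer a question such as "which top customers say the product is hard to use?" The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not Sybase IQ, and the table names, columns and sample rows are invented for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical customers and coded comments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT,
            years_as_customer INTEGER,
            annual_revenue REAL
        );
        CREATE TABLE coded_comments (
            customer_id INTEGER,
            topic TEXT,       -- code assigned by text analysis
            sentiment TEXT    -- 'positive' or 'negative'
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [(1, "Acme", 8, 250000.0), (2, "Birch", 1, 12000.0), (3, "Cobalt", 5, 180000.0)],
    )
    conn.executemany(
        "INSERT INTO coded_comments VALUES (?, ?, ?)",
        [
            (1, "hard to use", "negative"),
            (2, "hard to use", "negative"),
            (3, "pricing", "negative"),
            (3, "hard to use", "positive"),
        ],
    )
    return conn


def top_customer_complaints(conn, topic, min_revenue):
    """Join coded comments to customer records and keep high-revenue complainers."""
    query = """
        SELECT c.name, c.years_as_customer, c.annual_revenue
        FROM coded_comments AS m
        JOIN customers AS c ON c.id = m.customer_id
        WHERE m.topic = ? AND m.sentiment = 'negative' AND c.annual_revenue >= ?
        ORDER BY c.annual_revenue DESC
    """
    return conn.execute(query, (topic, min_revenue)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(top_customer_complaints(db, "hard to use", 100000))
```

The revenue threshold stands in for whatever criteria a company uses to define its top customers, such as buying patterns or historical revenue.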
Processing the raw text data is necessary for Sybase IQ to carry out its analyses. Sybase has partnered with ISYS Search Software to parse text for text indexing, and with Kapow Software to extract and transform data from the Web so that it can be consumed by Sybase IQ. "The processing power of Sybase IQ allows us to analyze data in real time so our customers can react quickly to changing circumstances that may be reflected in different types of data," Lahl explains. "Analysis of structured and unstructured data in combination offers insights that neither can provide alone."
Text analytics takes on bribery
In addition to providing more sophisticated e-discovery capability, text analytics technology is being used proactively by companies to search their records for vulnerabilities. "Text analytics can be useful at several stages," says Howard Sklar, senior counsel at Recommind. Recommind offers a family of products for information access, governance, e-discovery and compliance based on its Content Optimized Relevancy Engine (CORE) technology.
Bribery scandals in the mid-1970s prompted the passage of the Foreign Corrupt Practices Act (FCPA) of 1977, which outlawed payments to foreign officials by U.S. companies in order to obtain business. In fact, bribery was quite widespread. A survey conducted by the Securities and Exchange Commission (SEC) indicated that many public companies, including about 20 percent of Fortune 500 companies, bribed foreign officials. In Europe, those payments were not only allowed but were tax-deductible as "ordinary and reasonable" business expenses until anti-bribery laws were passed there in the late 1990s.
The incentives for companies to put an end to those payments are now many and varied. Around 2005, the SEC began prosecuting cases more aggressively, and recently the Department of Justice (DOJ) began prosecuting individuals rather than just corporations. Financial penalties include not only fines, but also legal costs that may far exceed the fines. Shareholders can also sue the corporations, based on violations of the portion of the FCPA that requires accurate financial records. "General counsels at the firms began going to their compliance officers and saying, 'What are we doing about this?'" Sklar explains. "That began a move toward a more proactive approach."
Through the use of text analytics combined with analysis of business intelligence (BI) data, companies can analyze information centered on high-risk employees and high-risk geographic areas and look for key indicators of problems. Detecting issues early is far better than having them discovered through an audit. "Over the next few years companies doing business globally will face increasing needs for risk management," Sklar says, "and text analysis is part of the solution."