What if enterprise search were less about telling and more about asking? When you know how the information you’re looking for fits into the algorithmic structure that powers your enterprise search, you know which keywords get the system to take you where you’re going. But that’s how computers find things, not people. Imagine walking into a colleague’s office and saying "mission statement" … and nothing more. Sure, they might intuit what you wanted, but it would be a good deal more effective to say, "We need to revise our mission statement; could you locate the current one?"
For that matter, human search doesn’t end with document retrieval. When you search for "Budget Report 2008," maybe you’re really wondering, "What is my department’s budget for this quarter?" And maybe the answer spans four or five other documents you hadn’t even considered.
Within the workplace and out on the wider web, semantic and natural language search believers argue it’s time we were able to search for knowledge: a deeper set of information that comprises the pieces of a whole answer.
Maybe computers will never be able to understand what you’re really asking the way a co-worker might. But a number of recent solutions are giving computers a way to infer deeper levels of meaning from natural language queries. They’re defining language rules, customizing dictionaries, and mining data—meta and beyond—to bring enterprise searchers answers, not just information.
A More Natural Approach
The majority of people seem happy with the way they search the web: A 2007 Keynote Systems, Inc. study found that nearly 85% of total internet traffic was driven through search engines. Keyword- and algorithm-driven search engines such as Google, Yahoo!, MSN/Live Search, and AOL swept the top four spots for market share according to May 2008 Nielsen Online rankings. Holding steady at number five is Ask.com, a pioneer of consumer web natural language search. Ask.com (known as AskJeeves until 2005) only commands 2.1% of the consumer-search market, but it is up 35.8% in year-to-year market share growth as of May 2008, even more than Google’s 35.4% increase. And, according to the Keynote study, Ask.com now ranks third for overall search customer satisfaction index, behind only Google and Yahoo!.
As AskJeeves, the site pushed consumers to ask questions rather than punch in keywords. While the site has now evolved to include keyword search, the site ranks results by an authority-based algorithm. Depending on how much they know about what they’re looking for, users can enter either a string of keywords or a natural language query. While order doesn’t matter on Google, Ask.com’s technology takes into account word order and inferred meaning.
For natural language search, the process goes beyond extracting key or repeated phrases and attempts to "understand" the way the language on the page relates to the language of the query. Both might eschew traditional keyword-based algorithms, but natural language search and semantic search aren’t the same thing. Natural language search relies on visible information; semantic search delves into information embedded in RDF, OWL, XML, or other metadata behind the scenes. Natural language search endeavors to relate information inferred from each page, based on how it interprets word meaning and order in the query. Semantic search, on the other hand, processes structured metadata to ask the page directly what knowledge it contains and then presents that in a subject-predicate-object formula that can relate to fully phrased questions.
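The subject-predicate-object idea can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration: the `TRIPLES` list, the predicate names, and the `match` helper are assumptions made for this example, not RDF tooling or any search engine's actual API. A real system would read such statements from RDF or similar metadata rather than a hard-coded list.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts, each written as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("China", "located_in", "Asia"),
    ("Kansas", "located_in", "United States"),
    ("Kansas", "is_a", "state"),
]


def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the fields that are not None."""
    return [
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# "Where is China?" becomes the pattern (China, located_in, ?).
answers = [o for _, _, o in match(subject="China", predicate="located_in")]
print(answers)
```

The point of the sketch is that a fully phrased question maps onto a pattern with one blank slot, and the answer is whatever fills that slot, rather than a list of documents containing the keywords.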
Take Powerset (which Microsoft acquired in July), the latest to employ AskJeeves’ "ask your question" search philosophy. When AskJeeves first debuted, its natural language function excelled when presented with common questions, because past searches or a team of human experts had preselected a variety of resources to answer the question. Likewise, Powerset, a search engine powered by natural language and semantic search technologies licensed from the Palo Alto Research Center, Inc. (PARC), sticks to a preselected set of research-oriented data: Wikipedia.
For natural language search purposes, Wikipedia is an ideal testing ground. Every entry is strictly informational and similarly structured. Different term meanings are neatly disambiguated, so the engine can see different semantic options for words such as "autonomy" (the ruling system) and "Autonomy" (the company). Wikipedia also presents a much smaller subsection to map than the web at large. Powerset is designed to "understand" the information, "understand" the query, and then deliver a list of straightforward links, as well as "Factz" that correlate query language directly to language in the Wikipedia page. For instance, a Factz search for "Where is China?" brings back a map and one word: "Asia."
Another newcomer, Yahoo!’s Microsearch, released at the beginning of 2008, takes a semantic approach to research queries. Rather than try to "read" the pages it indexes, Microsearch uses microformats such as hCard and hCalendar, as well as embedded RDF and RDFa metadata, to relate things such as name, location, and past searches to results to enhance the information’s dimension. Maps, timelines, and tangential search results are displayed alongside more direct pathways to create different layers of meaning tailored to the metadata.
Out on the open web, semantic search engines have hit a major snag by relying on invisible info. Most online creators are more concerned with creating front pages targeted to keyword-based search, rather than investing in back-end content to enable semantic findability. "There is a clear and obvious feedback loop which impels me to improve the page’s appearance," says Edwin Cooper, chief scientist and co-founder of natural language enterprise search vendor InQuira. "Until the payoff for adding semantics to content is a lot more clear and immediate, and visible through an easy feedback loop in my development environment, it is hard for me to believe that it will be done for the web as a whole."
In a Web 2.0 reality, the natural language approach seems better suited to handle the vast amount of unstructured data. Unless web creators are equally willing to create specific, targeted metadata, semantic search results can be limited, outdated, and fail to represent multifaceted content. Hypothetically, this is where natural language search should shine, since it can "read" all those blog entries to figure out whether "cat" means an entry about cat diseases or the cat Fluffy. Actually understanding all that text would take an incredibly fast, incredibly nuanced search solution capable of applying complicated language technology to a vast amount of data. But breadth isn’t natural language search’s strong suit; it’s depth.
Worldwide Versus Work-Wise
Our searches are getting shorter, but our questions are getting more sophisticated, as more knowledge comes online. Reference librarians have devoted centuries to elucidating the subtleties in people’s questions, when the questions aren’t "Who is the U.S. president?" but "How has the U.S. president affected global politics since 2001?" Natural language search may require users to elaborate on their algorithm-pleasing one-or-two-keyword search, but those seeking nuance beyond a standard RDF subject-predicate-object reading might need to rephrase the question several times to get to the real information they’re seeking.
Then there’s the subtlety of human language, which may never be quite computer-compatible. Case in point: A Powerset search for "Where is Kansas?" brings up results identical to those for "Who is Kansas?" By contrast, the Ask.com query "Where is Kansas?" brings up first a map of the state (plus current weather conditions in Topeka), and "Who is Kansas?" displays a picture of the 1970s band. Posing "Where is Kansas?" to Yahoo! Microsearch gets a state-library document that locates any city in the state, a Yahoo! satellite image of Kansas City, and a draggable timeline that doesn’t seem to do much. On the other hand, "Who is Kansas?" retrieves an assortment of state, political, and musical information, plus a world map overlaid by several hyperlinked bubbles, one pointing to the state of Kansas, and the other to an online dating profile for someone from Kansas named Tammy.
"The enormous potential of natural language search on the web is matched by the enormous difficulty of pulling it off," says InQuira’s Cooper. Even distinct who-where-what-when modifiers are extraordinarily hard to comprehend compared to keywords such as "Kansas." But while toddlers use experience, inference, and other human-specific communication skills to learn the meaning of "where" versus "who," computers will probably not develop sufficiently human comprehension skills in the foreseeable future. "Will it happen someday?" speculates Cooper. "Probably. But HAL-like capabilities (‘I’m sorry Dave, I cannot do that’) certainly didn’t happen by 2001, and I’d say they probably won’t happen by 2101 either."
If a semantic search-topia is possible, according to Cooper, it might be the enterprise environment that first fosters it. While offices might have their own data avalanche, it’s nothing compared to the consumer web overload. Correspondingly, says Cooper, "there is a much smaller set of intended actions than you get on Google." Office searchers are more likely to complete the same few kinds of searches repeatedly, giving the engine a "large and useful chunk of natural language understanding" to better tailor results, as opposed to trying to find everything from "What are microprocessors?" to "Where can I buy a minivan?" The vertical nature of the information within closed networks also limits the kinds of search terms and vocabulary the search function needs to "learn," allowing the system to "invest in understanding the specific vocabulary that is used in the context of an enterprise-specific search," says Cooper.
Google Enterprise product manager Cyrus Mistry agrees that the deeper context and intent conveyed in a natural language query can benefit from the niche vocabulary associated with enterprise information. Google.com, the consumer engine, gets its context from behavioral data "derived from millions and millions of queries." However, he says, "in the enterprise, you get the benefit of highly-contextualized dictionaries. In every industry out there, we already have queries and terms. We’ve given every customer the chance to tell us if they make sense."
Tracy Holloway King, area manager for PARC’s Natural Language Theory and Technology team, concurs with Cooper’s optimism about natural language enterprise search. While Google’s link-weighting algorithm may win it web fans, "There is not typically a robust link structure within an enterprise intranet that can help determine document relevance," says King. "This is one reason why enterprise search has historically had poor precision, with irrelevant documents being returned. There is less redundancy in enterprise search compared to full web search, which means that keyword choice strongly affects traditional search methods over enterprise collections."
Results from within a corporate network are less likely to be algorithmically primed to grab top search results. Across the web, "the relationship between search and content is more or less antagonistic," Cooper says. "With enterprise search, on the other hand, the people who make the content are on our side … A virtuous cycle that includes automated analysis of content needs as expressed through search queries, which in turn triggers a workflow process for content changes is a big advance."
Natural language enterprise search doesn’t require any additional layers of metadata from users. But it’s possible the same metadata hurdles that trip up semantic web search wouldn’t be so problematic for enterprise search. Enterprise content creators are already accustomed to working with information in what Cooper calls a "universal structure" with "the same set of tools and procedures" as their co-workers. Although the metadata is still an extra step, users can experience faster results and benefits upfront, encouraging them to help further define the semantic search structure of their own work and building the system from end to end.
Natural Language in the Enterprise
"Ten years ago, we saw search technologies that were developed for the web, and then applied to the enterprise," says Cooper. "That really isn’t feasible in today’s market." Just making a Google-for-your-intranet product won’t suffice, and attempts to do so haven’t led to an enterprise-search breakthrough yet.
"Pure" natural language enterprise solutions can be added on top of existing content management systems, since they read document language, not invisible metadata. They aren’t billed as complete enterprise search solutions, at least not yet. Natural language enterprise search features are primarily being deployed to perform specialized search functions such as broad-based research, knowledge mapping, interacting with consumers, and deep search that mines knowledge from unstructured company data.
In this example, the customer searches for "e700 trouble." InQuira understands the search string in its entirety and first presents a troubleshooting wizard specifically for that product. The other search results show that the engine treats "trouble" as synonymous with "problem" and returns content specific to the model named in the search.
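The behavior described in that example can be approximated with a short Python sketch. It is not InQuira's implementation: the synonym table, the product list, the `DOCS` records, and the `interpret` and `search` functions are all hypothetical names invented here to show the general idea of normalizing synonyms, spotting a product model, and putting a model-specific wizard first.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical synonym table: each word maps to one canonical concept.
SYNONYMS: Dict[str, str] = {
    "trouble": "problem",
    "issue": "problem",
    "problem": "problem",
}
# Hypothetical product catalog.
PRODUCT_MODELS = {"e500", "e700"}

# Hypothetical content records tagged with model, concept, and kind.
DOCS: List[Dict[str, str]] = [
    {"title": "E500 problem FAQ", "model": "e500", "concept": "problem", "kind": "article"},
    {"title": "E700 user guide", "model": "e700", "concept": "setup", "kind": "article"},
    {"title": "E700 problem FAQ", "model": "e700", "concept": "problem", "kind": "article"},
    {"title": "E700 troubleshooting wizard", "model": "e700", "concept": "problem", "kind": "wizard"},
]


def interpret(query: str) -> Dict[str, List[str]]:
    """Split a query into product models and canonical concepts."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    models = [t for t in tokens if t in PRODUCT_MODELS]
    concepts = sorted({SYNONYMS[t] for t in tokens if t in SYNONYMS})
    return {"models": models, "concepts": concepts}


def search(query: str) -> List[str]:
    """Return titles matching both model and concept, wizards first."""
    parsed = interpret(query)

    def score(doc: Dict[str, str]) -> Tuple[bool, bool, bool]:
        return (
            doc["model"] in parsed["models"],
            doc["concept"] in parsed["concepts"],
            doc["kind"] == "wizard",
        )

    hits = [d for d in DOCS if score(d)[0] and score(d)[1]]
    return [d["title"] for d in sorted(hits, key=score, reverse=True)]


print(search("e700 trouble"))
```

Because "trouble" and "problem" resolve to the same concept, `search("e700 problem")` returns the same list as `search("e700 trouble")`, which is the effect the example describes.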
InQuira is taking what vice president of product marketing Nav Chakravarti calls a "very precise, R&D-intensive approach to actually understanding meaning." Its Intelligent Search function draws from a predefined ontology, language rules governing standard query language such as "how" and "where," and parameters defining business rules and functions. As a stand-alone tool or part of InQuira’s Information Manager knowledge management system, it helps employees searching across the corporate intranet, but it also brings less-experienced searchers closer to what they’re seeking.
For instance, customer-service centers have used Intelligent Search to answer common customer queries by automatically generating answers from within corporate data sets based on a natural language interpretation of the customer’s query language. Live "chats" can also be almost entirely computer-generated when natural language processors understand customers’ direct questions and formulate their own natural language reply. Behind the firewall, Intelligent Search can tap unstructured information in emails or chats. "Critical enterprise knowledge tends to exist in unstructured form that makes it hard to harvest without technologies like natural language search solutions that can understand the intent behind the worker’s search and index the searchable information sources for their semantic meaning," says InQuira’s Jason Hekl, vice president of corporate marketing.
The most exciting natural language search solutions might be right around the corner, if the recent flurry of acquisitions and developments is any indicator. In the midst of its ongoing acrimonious negotiations to acquire Yahoo!, Microsoft announced its purchase of Powerset for $100 million. Combined with its acquisition of Norwegian enterprise search-solution company FAST, some speculate Microsoft could be preparing a new hybrid search model combining the enterprise capability and scalability of FAST with the natural language, intelligent knowledge discovery of Powerset. Major players such as Google and IBM will be watching how Microsoft is able to translate Powerset’s ability to mine Wikipedia into something that can be deployed in a widespread way.
However, natural language search has its limits within enterprise search as it stands today. "Better semantic understanding would not necessarily help with knowing which is the latest version of a document or finding the best content to use in a sales proposal," says Lawrence Lee, director of business development at PARC’s Intelligent Systems Laboratory. Making corporate content semantically searchable would require adding searchable metadata to pre-existing information, as well as creating tools that can keep up with any additional content created. "The reason it is not deployable is that it does not scale," says technology analyst Steve Arnold. "What we have is hardware lagging behind algorithms … The technology has to be applied in an extremely focused manner."
Additionally, people might find it hard to break their bad search habits. "There’s no arguing that most people associate the search experience with Google," says Hekl, "and are acclimated to guessing at which combination of words will generate the results they want. It’s a discovery process of sorts—workers may well start with keyword searches, quickly see that the [natural language] results are at least on par with what they had before, and will then be encouraged to be more explicit."
Google has not jumped onto the natural language search bandwagon. The company believes that its approach provides more effective results. With Google OneBox for Enterprise, for example, users can search for highly specific business information and receive real-time information from a variety of sources, such as CRM, ERP, and business intelligence systems.
Even Google thinks the Google-ization of consumer search has raised the bar high on user expectation in the enterprise environment. "They want to give you very little information, they want the result in sub-seconds, and they don’t expect and want to go through three pages" of results, says Google Enterprise’s Mistry. One search box should handle all kinds of searches, from looking specifically within certain sites to telling you exactly how many pints are in a gallon without retrieving a document at all. The consumer version of Google benefits from keyword familiarity and breadth of behavioral context, but the enterprise side of Google is all about wringing context from questions to deliver specific results in a direct answer that cuts to the query’s intent.
"The ultimate search engine," says Mistry, "you’ll ask it what you want, and it will give you that." One of Google’s enterprise product features, OneBox, is borrowed from functionality already familiar to consumer Google enthusiasts. When you search for "Microsoft stock prices," Google.com pulls up the stock’s pertinent statistics for the day first and then lists related documents. Google’s OneBox enterprise tool does the same thing, feeding from live data sets within the intranet. But within a corporate network, where only certain data sets are available, OneBox can infer greater meaning from each search term to bring back pinpointed answers without leaving the search window. For that matter, users don’t need to open a single document.
This isn’t the case for all consumer Google searches; there are just too many different types of searches to prepare canned responses fed constantly by live data. According to Mistry, "20% of queries Google has seen are queries they’ve never seen before." The enterprise environment isn’t much different, says Hekl. "An analysis of query logs from an enterprise telecommunications site shows that the 20 most frequent unique query strings represent just 9% of all queries. When those same queries are categorized and grouped by their intent, the shape of the curve changes dramatically. The most common intent represents about 6% of all queries, and the most frequent 20 intents now cover over 41% of all visitor queries." If consumers switch their language to adjust to the shortened long tail of enterprise search, tools such as Google OneBox or InQuira can better map high-priority areas for quick, accurate answers.
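The kind of analysis Hekl describes, comparing how much of a query log the most frequent strings cover against how much the most frequent intents cover, can be sketched in a few lines of Python. The query log and the query-to-intent table below are tiny invented toy data, not the telecommunications log he cites, and `top_k_coverage` is a hypothetical helper name; the numbers it produces illustrate the method only.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

# Toy query log, invented purely for illustration.
QUERY_LOG: List[str] = [
    "reset voicemail password",
    "voicemail pin forgot",
    "change voicemail password",
    "pay bill online",
    "pay my bill",
    "bill payment",
    "pay bill online",
    "upgrade phone",
]

# Hypothetical hand-built mapping from each query string to an intent.
INTENT_OF: Dict[str, str] = {
    "reset voicemail password": "voicemail_password",
    "voicemail pin forgot": "voicemail_password",
    "change voicemail password": "voicemail_password",
    "pay bill online": "pay_bill",
    "pay my bill": "pay_bill",
    "bill payment": "pay_bill",
    "upgrade phone": "upgrade_device",
}


def top_k_coverage(labels: Sequence[Hashable], k: int) -> float:
    """Fraction of all entries covered by the k most frequent labels."""
    counts = Counter(labels)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(labels)


string_coverage = top_k_coverage(QUERY_LOG, 1)
intent_coverage = top_k_coverage([INTENT_OF[q] for q in QUERY_LOG], 1)

print(f"top query string covers {string_coverage:.0%} of the log")
print(f"top intent covers {intent_coverage:.0%} of the log")
```

Grouping differently worded queries under one intent collapses the long tail, so a small number of well-prepared answers can serve a larger share of searches. In practice the hard part is the mapping itself, which this sketch simply assumes.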
Implementation methods are critical. "The most successful companies are very conscious of adoption and invest both in design and promotion that encourage use," says Hekl. "For example, when the search boxes are larger, centrally-located, and labeled ‘Ask,’ use increases. And I’ve seen great examples of companies that put an entire marketing and communication plan in place to promote the rollout of the new search capabilities."
To Arnold, "natural language search" is just a "marketing buzzword." Even natural language search enthusiasts don’t necessarily envision a "pure" solution that meets all enterprise needs just yet. For InQuira’s developers, it’s a question of how people adapt their search habits from online shopping to intranet knowledge retrieval. "Natural language search on its own will not be the panacea to increasing productivity," says Hekl. "Companies that have had the greatest success with natural language search solutions recognize that search informs content strategy, and an integrated analytics mechanism is needed to determine if the search technology and content management tool are working in concert to deliver the expected value to the knowledge worker."
PARC, which spent the last 30 years fine-tuning the technology that Powerset licensed, also foresees a hybrid solution. "You still need to index document metadata or offer social tagging features like other enterprise search engines," says PARC’s Lawrence Lee. "The natural language search technologies can be integrated with the more traditional key word and link analysis techniques to provide the best of both worlds." According to Lee, PARC is turning its attention to integrating "semantic search with other search approaches" and looking for commercial partners to focus on enterprise solutions. And PARC’s also looking to the wider web and thinking about how natural language processing and semantic indexing can influence the consumer web, especially by exploring user-generated content.
Whether or not natural language search’s time has come on the web, one thing is clear: Enterprise search could use some shaking up, and knowledge-centric search might be the answer. The keyword for semantic search is "indexing"; the keyword for natural language search is "understanding." Co-workers don’t reply to questions with documents; they reply with answers. Many question why all search experiences aren’t that simple.
About the Author
Jessica Dye is an Illinois-based freelance writer.