Many enterprise search tools don’t differ from their web-based counterparts: They center around entering a few keywords into a blank box, then clicking "search." The search tool then looks for information across one or more repositories and produces a list of results that match the keywords. Most users ignore advanced search functions, which also haven’t significantly changed in many years. While there are certainly exceptions (both advanced searchers and more sophisticated tools are out there), a search process typically produces hundreds of hits, with little way of sorting them beyond relevance ranking. Users are left to find the appropriate information by scrolling through the list of results, hoping to find an item that matches their requirements. If there are multiple meanings to the search keywords, it becomes even more difficult to locate the desired information.
Whether the material is on the web, in an RSS feed, or locked in any number of enterprise application repositories, many of today’s enterprise search tools may not be able to find a relevant item simply because the tool doesn’t have access to the right information. Without doubt, there has to be a better way to do this, and some believe it lies in the burgeoning area of semantic search.
A Question of Semantics
Semantic search differs from traditional keyword search in a number of ways. Instead of looking at the words, the search tool attempts to look at the meaning behind the words. It makes use of ontologies (dictionaries defining the meaning and relationships among words), metadata (data about data), and entities, and it attempts to nail down ambiguities when a term has multiple meanings. It is capable of crossing enterprise applications, databases, a variety of content repositories, the open web, and RSS feeds to produce a set of targeted results.
Linda Moulton, a semantic search expert and an analyst at Gilbane and Co., puts it in basic terms. She says, "The first thing is the query itself, the ability to use natural language or meaningful queries to find content through retrieval software designed to understand linguistically meaningful questions and the target content." The objective is to be able to ask a question and have the search engine understand what it is you are asking for.
Semantic search is a tough concept to nail down, according to Bradley Allen, CTO and founder of Siderean Software, Inc. He says when people talk about semantic search, they are generally talking about one of two things: "One is taking the traditional free text search model and putting some kind of intelligence in the query part of what is going on to interpret what the user is asking for and to map that down to a sophisticated representation of what information is out there, rather than what people call the ‘bag of words’ way of looking at a document."
The other part involves breaking that query down to find the underlying meaning. "Think about semantics as something that gets applied at the indexing [stage] of the search process where we take unstructured information, extract the semantics, and create a more structured way of laying out what things are being talked about in these documents, how they are related to other things in the document, as well as standard concepts and topics that are generally relevant to a particular vertical area or domain of discourse." Allen explains that this allows people to mix free-text search with what’s called faceted navigation (or what Siderean calls relational navigation), and exploit the structure that’s been pulled out of the underlying information. Armed with results of this nature, users can not only find the relevant information, but they can also follow logical paths and make more meaningful connections from the results.
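To make the idea of mixing free-text search with faceted navigation concrete, here is a minimal sketch in Python. It is not Siderean’s implementation; the documents, field names, and the `search` and `facet_counts` helpers are all invented for illustration, and it assumes the metadata has already been extracted at indexing time.

```python
from collections import Counter

# Hypothetical documents whose metadata was extracted at indexing time.
DOCS = [
    {"title": "Intro to RDF", "type": "presentation", "topic": "semantic web"},
    {"title": "Ontology basics", "type": "book", "topic": "semantic web"},
    {"title": "Search tuning", "type": "presentation", "topic": "enterprise search"},
]

def search(docs, text="", **facets):
    """Return docs whose title contains `text` and whose metadata matches every facet."""
    text = text.lower()
    return [
        d for d in docs
        if text in d["title"].lower()
        and all(d.get(field) == value for field, value in facets.items())
    ]

def facet_counts(docs, field):
    """Count how many results fall under each value of one facet field."""
    return Counter(d[field] for d in docs)

# Narrow by topic, then see which content types are available to drill into.
results = search(DOCS, topic="semantic web")
print(facet_counts(results, "type"))
```

The counts returned for each facet are what let a user follow a path through the results (for example, from a topic to the presentations about it) instead of scrolling a flat list.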
Eric Miller, who is president of Zepheira, a consulting firm that helps companies develop semantic web strategies, and who helped architect the semantic web standards at the World Wide Web Consortium, explains that this is very different from your traditional web search: "It’s not full text searching, it’s the ability to search a database for a particular concept, say ‘semantic web,’ and not only get the relevant results that you might get from a service such as Google, but find people working on semantic web, presentations about semantic web, upcoming events related to it, books, concepts—all of the different kinds of things that allow us to narrow in on the kinds of search strategies we are interested in." Miller points out that, if you search for "semantic web" with a traditional web search tool, you are going to get results based on popularity—which might not always get you the results you are looking for.
Ontologies Define and Organize
One of the key technologies underlying semantic search is the notion of ontologies, which provide a way to map a term to multiple meanings. For example, if a search includes the word "cell" (which could be related to a prison, a biological entity, a part of a spreadsheet, and so forth), a conventional search wouldn’t automatically sort out which meaning is relevant to your specific needs, and popularity ranking certainly wouldn’t be of much help.
Moulton defines an ontology as "an assembly of concepts in which all possible relationships that might exist among concepts are explicitly mapped." She adds that the goal of semantic search engines is to be able to understand the meaning based on the context and the ontology. "When there are ontologies for semantic search engines to consult, it’s like looking things up in the dictionary. Here’s the word ‘cell’ and what are all the possible ways in which cell could be related to other words." She says that, armed with this knowledge, a search engine can deliver more accurate results.
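As a rough illustration of how an engine might consult an ontology to settle on one meaning of "cell," consider the Python sketch below. The tiny ontology, its related-word lists, and the `disambiguate` function are assumptions made up for this example; real ontologies map far richer relationships, and real engines use more than simple word overlap.

```python
# Toy ontology: each sense of a term lists related concepts (all entries illustrative).
ONTOLOGY = {
    "cell": {
        "prison": {"jail", "inmate", "guard", "sentence"},
        "biology": {"membrane", "nucleus", "organism", "tissue"},
        "spreadsheet": {"row", "column", "formula", "worksheet"},
    }
}

def disambiguate(term, query, ontology=ONTOLOGY):
    """Pick the sense of `term` whose related concepts overlap most with the query words."""
    context = set(query.lower().split()) - {term}
    senses = ontology.get(term, {})
    scores = {sense: len(related & context) for sense, related in senses.items()}
    best = max(scores, key=scores.get, default=None)
    # Return None when no context word supports any sense.
    return best if best is not None and scores[best] > 0 else None

print(disambiguate("cell", "formula in a spreadsheet cell column"))
```

When the query supplies no supporting context, the function returns `None` rather than guessing, which is the point at which an engine might fall back to showing results grouped by each possible meaning.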
Figure Caption: Semantic search engines help generate meaningful results regardless of how the user enters the query.
One of the problems with this approach, however, is the difficulty in building ontologies. It can be a time-consuming and expensive exercise, but Edwin Cooper, founder and chief scientist of semantic search vendor InQuira, says his company has helped minimize the pain by generating generic ontologies for specific vertical markets. He explains that the purpose of these ontologies is to represent the way things work in the real world.
"In general, our approach to semantic search is to have that form or representation have an ontology representing the real world in a variety of different industries. For example, we have an ontology for telecommunications, one in the automotive industry, and one for financial services." He adds that this approach can help nail down the exact document even when the query is not spot-on.
Figure Caption: In this search, users can refine results by author, coverage area, and user rating to narrow results, or they can pivot the result set to other related views for new insight into the data. Moreover, the tool can distinguish between "(Jack) London" the author and "London (England)" the city.
"When we are dealing with the entire web—everything out there—there is a pretty good chance something matches your query exactly and you might not need to understand any of the synonyms or relate it to a specific piece of content. On the other hand, when you are in a particular industry vertical, there may very well be a piece of content that exactly suits what you are looking for, but it may not be an exact match to the question and you might have to do more analysis and get a deeper understanding both of the content and of the question to be able to link those two together, especially in the limited content set of these enterprise verticals," Cooper says.
Tim Shettler, VP of marketing at InQuira, cites Honda’s website as a good example of using synonyms in an ontology to link car colors to search queries. Most car companies use unusual names to represent car colors, ones that the visitor might not know, so if a user enters "gold" into the InQuira search engine, the results come back with sunset metallic, the closest color to gold, rather than telling the user no such car color exists. With some intelligence built into the search process, the site produces meaningful results and becomes far more usable.
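The color example can be sketched as a simple synonym lookup in Python. This is only an illustration of the idea, not how InQuira’s engine works: apart from the "gold" to "sunset metallic" pairing described above, the color names, the synonym table, and the `resolve_color` function are hypothetical.

```python
# Catalog color names. "sunset metallic" comes from the example above;
# the other names are invented for illustration.
CATALOG_COLORS = ["sunset metallic", "glacier pearl", "midnight onyx"]

# Hypothetical synonym table mapping everyday color words to catalog names.
COLOR_SYNONYMS = {
    "gold": "sunset metallic",
    "white": "glacier pearl",
    "black": "midnight onyx",
}

def resolve_color(query):
    """Map a user's color word to a catalog color, or None if nothing is known."""
    q = query.strip().lower()
    if q in CATALOG_COLORS:
        return q  # the user already typed an official color name
    return COLOR_SYNONYMS.get(q)

print(resolve_color("Gold"))
```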
Metadata and Entities Facilitate Machines’ Communication
Another factor in empowering semantic search is metadata, information about the content and entities, which helps break down information into meaningful categories, such as name, product, industry, and so forth. These items help further separate semantic search from keyword search, while also providing a means for machine-to-machine communication, which enables the search engine to search multiple repositories.
Gary Price, editor of ResourceShelf and director of online information resources at Ask.com, defines metadata as using a controlled vocabulary to decide what the record or document is about. Another way of looking at it, according to Chris Davis, CEO of semantic search company Semantra, is to think of metadata as "data about data."
Siderean’s Allen says that while search is first and foremost a human activity, using metadata does help to search across different data sources. "By using semantics you can create overlaying organization coming from disparate sources like structured database, content repositories, RSS feeds, and so forth, and create semantic structures to consider them. Semantics give people a way of organizing information across different sources." He says this enables users to leverage this information to extract data without having to go through pages of results. Like ontologies, creating quality metadata is not easy, but Allen believes that there are ways to simplify the process. He says, "The challenge we face is building products so that burden is lifted off of IT and brought on the machine where it belongs."
Figure Caption: Using metadata enables users to better find content located in multiple repositories. Semantra uses this technology to extract data from structured databases.
One of the ways Siderean and others achieve this is through a process called entity extraction, which looks for broad categories called "entities" and applies the metadata to the entities. Russell Glass, VP of products and marketing at ZoomInfo, a company that uses semantic search techniques to extract business information, says his company uses entities to build richer results. "We crawl the same pages [as a conventional search engine], but instead of caring about keywords, we care about entities. When we look at a set of keywords, we see that keyword is actually referring to person, company, title, or an industry and we start to aggregate these entities and build information around them. And the semantic nature is that we are including extra data on top of keywords to help us understand where they are." For instance, if the keyword is "Joe Smith," they see he is a person who works at Ford Motor Co., which is in the auto industry, and so forth.
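The aggregation step Glass describes might look something like the following Python sketch. It is not ZoomInfo’s method: it assumes an entity extractor has already tagged each mention with a type and a source page, and it links entities by the crude heuristic that they appear on the same page. The mention data (including the "Engineer" title), the page labels, and the `build_profiles` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of an entity extractor: (entity type, value, source page).
MENTIONS = [
    ("person", "Joe Smith", "page1"),
    ("company", "Ford Motor Co.", "page1"),
    ("industry", "auto", "page1"),
    ("person", "Joe Smith", "page2"),
    ("title", "Engineer", "page2"),  # invented title, for illustration only
]

def build_profiles(mentions):
    """Attach every non-person entity to the people mentioned on the same page."""
    by_page = defaultdict(list)
    for etype, value, page in mentions:
        by_page[page].append((etype, value))

    profiles = defaultdict(lambda: defaultdict(set))
    for entities in by_page.values():
        people = [value for etype, value in entities if etype == "person"]
        for person in people:
            for etype, value in entities:
                if etype != "person":
                    profiles[person][etype].add(value)
    return profiles

profiles = build_profiles(MENTIONS)
print({etype: sorted(values) for etype, values in profiles["Joe Smith"].items()})
```

Information gathered from separate pages ends up in one profile for the same person, which is the sense in which entities, rather than keywords, become the unit the search engine reasons about.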
Finding Meaning
Imagine entering a search term or question and having the search engine understand the query, break down the semantic subtleties, and produce a set of results organized and divided further by possible meaning. Moulton says there is currently no such search tool out there that does this explicitly, but that is the ultimate goal of semantic search researchers. They hope to make the search experience approximate the library experience, in which users will browse the possibilities in an organized way to find the types of information they are seeking without having to know any special rules. She says this goes far beyond relevance ranking and the navigation model we use today.
"True semantic search is the idea that nobody has to know any special syntax. You can just start asking questions and by the way you ask the questions and the contextual material you include in the query part, you are going to get back richer results that are right on target. It’s not going to be relevance-ranked. It’s going to be ranked according to the question you asked. That’s really what semantic search is," says Moulton.
She adds that this way of searching more closely mirrors a time-honored information-seeking process. According to Moulton, "It’s beginning to approximate the whole process of the reference model that a librarian would engage in by helping to narrow down the results by entering into a dialog with the person asking the questions. I think the concept of language here is really huge."
With semantic search you can browse the content, so not only can you find the most relevant information, you can see the gaps in content, something that wouldn’t be possible with today’s search technology. In a sense, it brings back the browsing of the search directory, but instead of just humans editing the directory, it adds a level of intelligence unsurpassed in the current generation of search tools.
Semantic search provides the user with the ability to engage in an extended process of finding and following connections. As Allen describes it, "This is a much broader set of activities than relevance ranks can accommodate." He adds that semantics expose relationships that enable the user to find a set of documents that specifically speak to the search query.
Moulton explains that no search vendor is close to achieving true semantic search, but there are many examples out there now of search tools using semantic elements. As search engines become more intelligent, semantic search will enable enterprise users (and consumers) to move beyond keywords to find the information that matters most to them without the constraints of current search technology. We may have a ways to go, but the possibilities are quite intriguing.
About the Author
RON MILLER (ronsmiller@ronsmiller.com) is a freelance technology writer based in Massachusetts.