EnterpriseSearchCenter.com Home
 
RESOURCES FOR EVALUATING ENTERPRISE SEARCH TECHNOLOGIES
October 15, 2008

Table of Contents

Semantic Search Takes Root in the Enterprise
Google Acquires Korean Blogging Software Company TNC
Expert System Introduces New Contextual Advertising Solution
Semantra 2.5 Searches for Business Intelligence
Science.gov 5.0 Released
Caringo Announces CAStor Content Router
Kalido Announces Subscription-Based Offering
Time Inc. and Getty Images Jointly Launch LIFE.com
IBM Expands eDiscovery Portfolio
Exterro and kCura Integrate
Exalead Announces New Product Line

Semantic Search Takes Root in the Enterprise

Many enterprise search tools don’t differ from their web-based cohorts: They center around entering a few keywords into a blank box and then clicking "search." The search tool then looks for information across one or more repositories and produces a list of results that match the keywords. Most users ignore advanced search functions, which also haven’t significantly changed in many years. While there are certainly exceptions (both advanced searchers and more sophisticated tools are out there), a search process typically produces hundreds of hits, with little way of sorting them beyond relevance ranking. Users are left to find the appropriate information by scrolling through the list of results, hoping to find an item that matches their requirements. If the search keywords have multiple meanings, it becomes even more difficult to locate the desired information.

Whether the material is on the web, in an RSS feed, or locked in any number of enterprise application repositories, many of today’s enterprise search tools may not be able to find a relevant item simply because the tool doesn’t have access to the right information. Without doubt, there has to be a better way to do this, and some believe it lies in the burgeoning area of semantic search.

A Question of Semantics

Semantic search differs from traditional keyword search in a number of ways. Instead of looking at the words, the search tool attempts to look at the meaning behind the words. It makes use of ontologies (dictionaries defining the meaning and relationships among words), metadata (data about data), and entities, and it attempts to nail down ambiguities when a term has multiple meanings. It is capable of crossing enterprise applications, databases, a variety of content repositories, the open web, and RSS feeds to produce a set of targeted results.

Linda Moulton, a semantic search expert and an analyst at Gilbane and Co., puts it in basic terms. She says, "The first thing is the query itself, the ability to use natural language or meaningful queries to find content through retrieval software designed to understand linguistically meaningful questions and the target content." The objective is to be able to ask a question and have the search engine understand what it is you are asking for.

Semantic search is a tough concept to nail down, according to Bradley Allen, CTO and founder of Siderean Software, Inc. He says when people talk about semantic search, they are generally talking about one of two things: "One is taking the traditional free text search model and putting some kind of intelligence in the query part of what is going on to interpret what the user is asking for and to map that down to a sophisticated representation of what information is out there, rather than what people call the ‘bag of words’ way of looking at a document."

The other part involves breaking that query down to find the underlying meaning. "Think about semantics as something that gets applied at the indexing [stage] of the search process where we take unstructured information, extract the semantics, and create a more structured way of laying out what things are being talked about in these documents, how they are related to other things in the document, as well as standard concepts and topics that are generally relevant to a particular vertical area or domain of discourse." Allen explains that this allows people to mix free-text search with what’s called faceted navigation (or what Siderean calls relational navigation), and exploit the structure that’s been pulled out of the underlying information. Armed with results of this nature, users can not only find the relevant information, but they can also follow logical paths and make more meaningful connections from the results.
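
As a rough illustration of mixing free-text search with faceted navigation, the sketch below runs a keyword match over a few records and then counts and filters by a structured field. The records, field names, and helper functions are hypothetical and are not drawn from Siderean's product.

```python
from collections import Counter

# Hypothetical documents whose structured fields were extracted at index time.
DOCS = [
    {"title": "The Call of the Wild", "author": "Jack London", "place": "Yukon"},
    {"title": "White Fang", "author": "Jack London", "place": "Yukon"},
    {"title": "A Guide to London", "author": "Ann Reed", "place": "London"},
]

def text_search(docs, term):
    """Free-text step: keep documents where any field mentions the term."""
    term = term.lower()
    return [d for d in docs if any(term in str(v).lower() for v in d.values())]

def facet_counts(docs, field):
    """Count how many matching documents fall under each value of a facet."""
    return Counter(d[field] for d in docs)

def refine(docs, field, value):
    """Narrow the result set to a single facet value."""
    return [d for d in docs if d[field] == value]

hits = text_search(DOCS, "london")
by_author = facet_counts(hits, "author")
london_the_author = refine(hits, "author", "Jack London")
```

The keyword step alone cannot tell the author from the city; the author facet is what lets a user separate the two.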

 

 

Eric Miller, who is president of Zepheira, a consulting firm that helps companies develop semantic web strategies, and who helped architect the semantic web standards at the World Wide Web Consortium, explains that this is very different from your traditional web search: "It’s not full text searching, it’s the ability to search a database for a particular concept, say ‘semantic web,’ and not only get the relevant results that you might get from a service such as Google, but find people working on semantic web, presentations about semantic web, upcoming events related to it, books, concepts: all of the different kinds of things that allow us to narrow in on the kinds of search strategies we are interested in." Miller points out that, if you search for "semantic web" with a traditional web search tool, you are going to get results based on popularity, which might not always get you the results you are looking for.

Ontologies Define and Organize

One of the key technologies underlying semantic search is the notion of ontologies, which provide a way to map a term to multiple meanings. For example, if a search includes the word "cell" (which could be related to a prison, a biological entity, a part of a spreadsheet, and so forth), a conventional search wouldn’t automatically sort out which meaning is relevant to your specific needs, and popularity ranking certainly wouldn’t be of much help.

Moulton defines an ontology as "an assembly of concepts in which all possible relationships that might exist among concepts are explicitly mapped." She adds that the goal of semantic search engines is to be able to understand the meaning based on the context and the ontology. "When there are ontologies for semantic search engines to consult, it’s like looking things up in the dictionary. Here’s the word ‘cell’ and what are all the possible ways in which cell could be related to other words." She says that, armed with this knowledge, a search engine can deliver more accurate results.
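
To make the "cell" example concrete, here is a minimal sketch of how an engine might consult an ontology to pick a meaning from context. The tiny ontology, its related terms, and the overlap-counting rule are all assumptions made for illustration; real ontologies and disambiguation methods are far richer.

```python
# Hypothetical mini-ontology: each sense of a term lists related concepts.
ONTOLOGY = {
    "cell": {
        "prison": {"jail", "inmate", "guard", "sentence"},
        "biology": {"membrane", "nucleus", "tissue", "organism"},
        "spreadsheet": {"row", "column", "formula", "worksheet"},
    }
}

def disambiguate(term, query):
    """Pick the sense whose related concepts overlap most with the query words.

    Returns None when the term is unknown or no sense overlaps the query.
    """
    senses = ONTOLOGY.get(term, {})
    words = set(query.lower().split())
    best_sense, best_score = None, 0
    for sense, related in senses.items():
        score = len(words & related)  # number of shared context words
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```

With no surrounding context, the function returns None rather than guessing, which mirrors the ambiguity a bare keyword leaves unresolved.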

Figure Caption:  Semantic search engines help generate meaningful results regardless of how the user enters the query.

One of the problems with this approach, however, is the difficulty in building ontologies. It can be a time-consuming and expensive exercise, but Edwin Cooper, founder and chief scientist of semantic search vendor InQuira, says his company has helped minimize the pain by generating generic ontologies for specific vertical markets. He explains that the purpose of these ontologies is to represent the way things work in the real world.

"In general, our approach to semantic search is to have that form or representation have an ontology representing the real world in a variety of different industries. For example, we have an ontology for telecommunications, one in the automotive industry, and one for financial services." He adds that this approach can help nail down the exact document even when the query is not spot-on.

Figure Caption:  In this search, users can refine results by author, coverage area, and user rating to narrow results, or they can pivot the result set to other related views for new insight into the data. Moreover, the tool can distinguish between "(Jack) London" the author and "London (England)" the city.

"When we are dealing with the entire web—everything out there—there is a pretty good chance something matches your query exactly and you might not need to understand any of the synonyms or relate it to a specific piece of content. On the other hand, when you are in a particular industry vertical, there may very well be a piece of content that exactly suits what you are looking for, but it may not be an exact match to the question and you might have to do more analysis and get a deeper understanding both of the content and of the question to be able to link those two together, especially in the limited content set of these enterprise verticals," Cooper says.

Tim Shettler, VP of marketing at InQuira, cites Honda’s website as a good example of using synonyms in an ontology to link car colors to search queries. Most car companies use unusual names to represent car colors, ones that the visitor might not know, so if a user enters "gold" into the InQuira search engine, the results come back with sunset metallic, the closest color to gold, rather than telling the user no such car color exists. By building some intelligence into the search process, it produces meaningful results, making the site far more usable.
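
The color example can be sketched as a simple synonym lookup. This is only an illustration of the idea, not InQuira's implementation: the mapping table is hypothetical, and apart from "sunset metallic" the color names are invented.

```python
# Hypothetical catalog color names and the everyday words linked to them.
COLOR_SYNONYMS = {
    "sunset metallic": {"gold", "golden", "bronze"},
    "midnight pearl": {"black", "dark"},  # invented entry for illustration
}

def resolve_color(user_term):
    """Map a visitor's color word to a catalog color name, or None if unknown."""
    term = user_term.strip().lower()
    if term in COLOR_SYNONYMS:
        return term  # the visitor already used the catalog name
    for catalog_name, everyday_words in COLOR_SYNONYMS.items():
        if term in everyday_words:
            return catalog_name
    return None
```

A keyword-only search would return nothing for "gold" here, because that word never appears in the catalog itself.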


Download the free PDF with all the illustrations.

 

 

Metadata and Entities Facilitate Machines’ Communication

Another factor in empowering semantic search is metadata (information about the content) together with entities, which help break down information into meaningful categories, such as name, product, industry, and so forth. These items help further separate semantic search from keyword search, while also providing a means for machine-to-machine communication, which enables the search engine to search multiple repositories.

Gary Price, editor of ResourceShelf and director of online information resources at Ask.com, defines metadata as using a controlled vocabulary to decide what the record or document is about. Another way of looking at it, according to Chris Davis, CEO of semantic search company Semantra, is to think of metadata as "data about data."

Siderean’s Allen says that while search is first and foremost a human activity, using metadata does help to search across different data sources. "By using semantics you can create overlaying organization coming from disparate sources like structured database, content repositories, RSS feeds, and so forth, and create semantic structures to consider them. Semantics give people a way of organizing information across different sources." He says this enables users to leverage this information to extract data without having to go through pages of results. Like ontologies, quality metadata is not easy to create, but Allen believes that there are ways to simplify the process. He says, "The challenge we face is building products so that burden is lifted off of IT and brought on the machine where it belongs."

Figure Caption:  Using metadata enables users to better find content located in multiple repositories. Semantra uses this technology to extract data from structured databases.

One of the ways Siderean and others achieve this is through a process called entity extraction, which looks for broad categories called "entities" and applies the metadata to the entities. Russell Glass, VP of products and marketing at ZoomInfo, a company that uses semantic search techniques to extract business information, says his company uses entities to build richer results. "We crawl the same pages [as a conventional search engine], but instead of caring about keywords, we care about entities. When we look at a set of keywords, we see that keyword is actually referring to person, company, title, or an industry and we start to aggregate these entities and build information around them. And the semantic nature is that we are including extra data on top of keywords to help us understand where they are." For instance, if the keyword is "Joe Smith," they see he is a person who works at Ford Motor Co., which is in the auto industry, and so forth.
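
A minimal sketch of the entity idea follows, assuming a hand-built lookup table (a "gazetteer") of known names. The table, the sample sentence, and the function names are hypothetical; this is not how ZoomInfo or Siderean actually extract entities, and production systems use far more robust matching than plain substring tests.

```python
# Hypothetical gazetteer mapping known strings to entity types.
GAZETTEER = {
    "Joe Smith": "person",
    "Ford Motor Co.": "company",
    "auto": "industry",
}

def extract_entities(text):
    """Return (mention, type) pairs for every gazetteer entry found in the text."""
    return [(name, kind) for name, kind in GAZETTEER.items() if name in text]

def build_profile(text):
    """Group the extracted mentions by entity type."""
    profile = {}
    for mention, kind in extract_entities(text):
        profile.setdefault(kind, []).append(mention)
    return profile

page = "Joe Smith joined Ford Motor Co., one of the largest auto makers."
profile = build_profile(page)
```

The resulting profile treats "Joe Smith" as a person tied to a company and an industry, rather than as two unrelated keywords.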

Finding Meaning

Imagine entering a search term or question and having the search engine understand the query, break down the semantic subtleties, and produce a set of results organized and divided further by possible meaning. Moulton says there is currently no search tool out there that does this explicitly, but that is the ultimate goal of semantic search researchers. They hope to make the search experience approximate the library experience, in which users will browse the possibilities in an organized way to find the types of information they are seeking without having to know any special rules. She says this goes far beyond relevance ranking and the navigation model we use today.

"True semantic search is the idea that nobody has to know any special syntax. You can just start asking questions and by the way you ask the questions and the contextual material you include in the query part, you are going to get back richer results that are right on target. It’s not going to be relevance-ranked. It’s going to be ranked according to the question you asked. That’s really what semantic search is," says Moulton.

She adds that this way of searching more closely mirrors a time-honored information-seeking process. According to Moulton, "It’s beginning to approximate the whole process of the reference model that a librarian would engage in by helping to narrow down the results by entering into a dialog with the person asking the questions. I think the concept of language here is really huge."

With semantic search you can browse the content, so not only can you find the most relevant information, you can also see the gaps in content, something that wouldn’t be possible with today’s search technology. In a sense, it brings back the browsing of the search directory, but instead of just humans editing the directory, it adds a level of intelligence unsurpassed in the current generation of search tools.

Semantic search provides the user with the ability to engage in an extended process of finding and following connections. As Allen describes it, "This is a much broader set of activities than relevance ranks can accommodate." He adds that semantics expose relationships that enable the user to find a set of documents that specifically speak to the search query.

Moulton explains that no search vendor is close to achieving true semantic search, but there are many examples out there now of search tools using semantic elements. As search engines become more intelligent, semantic search will enable enterprise users (and consumers) to move beyond keywords to find the information that matters most to them without the constraints of current search technology. We may have a ways to go, but the possibilities are quite intriguing.




About the Author

RON MILLER (ronsmiller@ronsmiller.com) is a freelance technology writer based in Massachusetts.

 

 

 

 


Google Acquires Korean Blogging Software Company TNC

Google has bought TNC (full name Tatter and Company), a Korea-based blogging software provider. It has been reported that TNC received an undisclosed amount of funding from Softbank Ventures’ Ranger Fund, which invests in web startups. TNC founder Chang Kim says that TNC supplies blogging software and services to more than 400,000 users.


(www.google.com, www.tnccompany.com)


Expert System Introduces New Contextual Advertising Solution

Expert System, a provider of semantic software, announced the launch of COGITO Advertiser, a tool that automatically processes the meaning of text to help ensure ad placement is relevant to its assigned web pages. Using semantic intelligence, COGITO Advertiser analyzes the text on each page and ensures ads are placed appropriately in an effort to increase click-through rates. COGITO analyzes web pages to identify the most relevant topics and classifies content by assigning the category related to the text in real time. COGITO Advertiser provides details on users of specific web sites, so companies can tailor their messaging, plan for future investments, and coordinate timing of ad placements.

(www.expertsystem.net)


Semantra 2.5 Searches for Business Intelligence

On Oct. 1, natural language search company Semantra will release Semantra 2.5 for Microsoft Dynamics CRM. With this new product, Semantra seeks to bring effective, efficient search capabilities to the realm of business intelligence (BI). Semantra 2.5, which sits on top of Microsoft Dynamics CRM, streamlines searching within that enterprise application, with capabilities including summary analytics, linking results back to the CRM, exporting results into Excel, and dynamically adjusting results sets. This set of functions and Semantra’s streamlined search interface are designed to simplify business intelligence and analytics so that every member of a sales team, irrespective of their technical expertise, has access to as much valuable BI information as possible.


"Sales guys are generally not all that technically savvy," says Cody Aufricht, VP of marketing for Semantra. "And they generally don’t have a lot of patience." By providing a search capability within Microsoft Dynamics CRM that looks a lot like the single search box web search that they’re used to, Semantra 2.5 hopes to break down some of the barriers between information users and their critical corporate data. Organizations that were previously forced to rely on a handful of IT or BI specialists to make decisions based on CRM data can now rely on users of all levels to have access to the same important information and make informed decisions.


A May 2008 report from Forrester by Boris Evelson and Matthew Brown titled "Search + BI + Unified Information Access" pointed out the need in the BI marketplace for a product with natural language capabilities. "Most casual BI users can’t translate a business question such as ‘What is the value of our sales pipeline in the first half of the year?’ into the technical language," the report claims. "Semantra specializes in this functionality. Rather than learn an entire application, a sales rep can type into a search box a question in the form of a statement like ‘list accounts scheduled to close before July 2008’ to generate a list of opportunities from a customer relationship management (CRM) system."
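
To show what translating a business question into a structured query can involve, here is a deliberately narrow sketch that handles only the sentence shape quoted in the report. The pattern, record layout, and function names are assumptions for illustration and say nothing about how Semantra actually parses language.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Recognizes only: "list <things> scheduled to close before <Month> <Year>"
PATTERN = re.compile(
    r"list (\w+) scheduled to close before (\w+) (\d{4})", re.IGNORECASE)

def parse(question):
    """Turn the one supported question shape into (entity, cutoff_date), else None."""
    m = PATTERN.fullmatch(question.strip())
    if not m:
        return None
    entity, month, year = m.groups()
    month_num = MONTHS.get(month.lower())
    if month_num is None:
        return None
    return entity.lower(), date(int(year), month_num, 1)

def run(question, records):
    """Apply the parsed cutoff as a filter over hypothetical CRM records."""
    parsed = parse(question)
    if parsed is None:
        return []
    _, cutoff = parsed
    return [r for r in records if r["close_date"] < cutoff]

# Hypothetical CRM rows.
ACCOUNTS = [
    {"name": "Acme", "close_date": date(2008, 5, 20)},
    {"name": "Globex", "close_date": date(2008, 9, 1)},
]
```

A single hard-coded pattern breaks as soon as the wording changes, which is the gap that ontology-driven natural language tools aim to close.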


According to the company, Semantra 2.5 can be installed in about 15 minutes, and it can be used right out of the box. However, Semantra’s Aufricht says that’s not how the product works best. Because every industry and every company has its own unique terminology and nomenclature, a couple of weeks spent developing and refining the product’s ontology will greatly improve the accessibility of information in a company’s CRM database.


Semantra asserts that its partnership with Microsoft on Semantra 2.5 benefits Microsoft as much as it benefits Microsoft’s Dynamics CRM customers. For Microsoft, the benefits of the collaboration include an increased number of licenses sold as a result of the enlarged user base for Dynamics CRM, differentiation from other CRM products, and an ad hoc analytics standard for all Dynamics and Platform products. Over the course of the coming year, Semantra intends to expand its natural language BI search offerings to work with products from Oracle and SAP.


By adhering to the "paradigm of the search box," Semantra is making the promise of business intelligence come true, Aufricht says. "There’s always been a phenomenon in offices where decisions are made based on who has the most compelling argument. Semantra 2.5 gives everyone in an organization easy, rapid access to the same important information, meaning business decisions can be made with confidence in real time."


(www.semantra.com)


Science.gov 5.0 Released

The latest version of Science.gov, Science.gov 5.0, has been launched. It allows users to search additional collections of science resources, target their searches, and find links to information on a variety of science topics. Science.gov 5.0 offers seven new databases and portals that allow researchers to access pages of scientific information. Another feature is a "clustering" tool that helps target searches by grouping results by subtopics or dates. Science.gov 5.0 now provides links to related EurekAlert! Science News and Wikipedia, and provides the capability to download research results into personal files or citation software.

(www.science.gov)


Caringo Announces CAStor Content Router

Caringo, Inc., a provider of content storage software that offers clustered storage infrastructure for both active and archive content, announced the availability of CAStor Content Router. The Content Router gives administrators and application providers the ability to set metadata values to mirror content in clusters and replicate it over distance for disaster recovery (DR). The Content Router uses a rules engine that filters on metadata values to determine how to process the content and where to store it. Rules can be set to distribute content to specified remote clusters. Applications can automatically fail over to a remote cluster in an emergency to ensure continuous data availability and accessibility.
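
The general idea of routing content by metadata rules can be sketched as follows. The rule format, cluster names, and function are hypothetical and do not represent CAStor Content Router's actual configuration or API.

```python
# Hypothetical rules: each pairs a metadata condition with target clusters.
RULES = [
    {"match": {"department": "legal"}, "targets": ["dr-site-east"]},
    {"match": {"retention": "permanent"}, "targets": ["dr-site-east", "dr-site-west"]},
]

def route(metadata, rules=RULES):
    """Return the sorted clusters of every rule whose conditions all match."""
    targets = set()
    for rule in rules:
        if all(metadata.get(key) == value for key, value in rule["match"].items()):
            targets.update(rule["targets"])
    return sorted(targets)
```

Content that matches no rule returns an empty list and, in this sketch, would simply stay in its local cluster.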

(www.caringo.com)


Kalido Announces Subscription-Based Offering

Kalido, an information management company, announced a new subscription-based offering of the Kalido Information Engine that delivers enterprise-class business intelligence (BI) infrastructure to midsize companies. Highlights of the subscription offering include the Kalido Business Information Modeler, the Kalido Dynamic Information Warehouse, and the Kalido Universal Information Director for Microsoft. Kalido can also be used with Microsoft Excel. The Kalido-based subscription offering enables companies to build, deliver, and deploy enterprise-class BI by paying a monthly subscription fee.

(www.kalido.com)


Time Inc. and Getty Images Jointly Launch LIFE.com

Time Inc. and Getty Images will jointly launch LIFE.com. LIFE.com will be owned and operated by Time Inc. and Getty Images, and will provide access to iconic and professional photography collections available online. LIFE.com will offer access to new photographs from Getty Images' photographers, including today's news, entertainment, sports, celebrities, travel, animals, and many others. Consumers will also have access to images from LIFE magazine. The new site, which launches in early 2009, was designed with Getty Images' search technology. Searching for and viewing images on the site will be free.

(www.timeinc.com, www.gettyimages.com)


IBM Expands eDiscovery Portfolio

IBM announced eDiscovery Analyzer, new conceptual search and content analysis software to help legal professionals accelerate eDiscovery as they collect and assess evidence, agree upon search terms, and cull information for use in legal cases. IBM eDiscovery Analyzer, together with the recently announced IBM eDiscovery Manager, provides the foundation for an in-house eDiscovery platform, enabling organizations to control electronic information and respond to litigation. Using the scalable IBM Enterprise Content Management (ECM) platform, IBM eDiscovery collects, organizes, manages, and retrieves relevant enterprise information.

(www.ibm.com)


Exterro and kCura Integrate

September 24, 2008

Exterro, a provider of legal hold and discovery management software solutions for the legal industry, and kCura, a software company providing the web-based litigation support platform Relativity, announced the integration of their flagship products, Exterro Fusion and Relativity. The combined toolset will allow corporate or law firm legal teams to manage the discovery process from identification and legal hold/preservation through review and production of documents. Exterro Fusion’s Legal Hold and Discovery Workflow Management software solutions provide legal teams with an end-to-end workflow, which centralizes discovery management. Relativity provides image and native file review, searching, coding options, workflow capabilities, Unicode support, and conceptual searching.

(www.exterro.com)


Exalead Announces New Product Line

Exalead has introduced the CloudView family of products, which is slated for availability beginning in Q4 2008. The solutions have evolved from the Exalead one:enterprise line with a new architecture designed to better adapt to the scaling requirements of today’s enterprise.

The company believes business users need a new generation of information access systems that bring relevance and context in addressing their information requirements. The industrywide shift to housing data in a large number of silos, which Exalead refers to as "data clouds," creates the demand for information access platforms with better connectivity, interoperability and scalability. As a result, IT executives will have the control they require, but also the adaptability and flexibility to address the needs of the business.

Exalead reports the CloudView product line will offer:

  • a fully distributed architecture with centralized administration,
  • business-level tuning and management of the search experience,
  • ability to extend business intelligence applications to textual search,
  • streamlined administration UI for greater use of search tools in the enterprise,
  • full traceability within the product,
  • WYSIWYG configuration of indexing and search workflows,
  • advanced configuration management system (with built-in version control),
  • improvements in the relevancy model, and
  • provision for additional connectors with simple and advanced APIs for third-party implementations.

 