Enterprise Search Center

RESOURCES FOR EVALUATING ENTERPRISE SEARCH TECHNOLOGIES

September 17, 2008

Table of Contents

Fight Search Fallacies with Well-Formed Content

Automated document organization

X1 takes on SharePoint Search

Emantix brings value to enterprise content

NextLabs Announces Solution to Address Data Loss Due to Misdirected Email

Kalido Announces New Release

Talisma improves search in V. 8.

XyEnterprise announces ContentaView 3.6

Northern Light Launches Enterprise Market Research Portal

Nexidia Announces Release of Enterprise Speech Intelligence 7.0

Mining benefits for readers and newspapers alike

Salesforce.com acquires InStranet

Interwoven and LexisNexis Team Up

Recommind Announces eDiscovery 3.0

Fight Search Fallacies with Well-Formed Content

Most organizations purchase a search system based on a set of goals. They want to save time that staff spends searching through mountains of unstructured "stuff" (meaning full text that is not tagged, organized, or fielded), and they want to preserve the knowledge (dare we say wisdom) of retiring staff members who hold the most information about what the organization does.

Unfortunately, many organizations fall prey to some common fallacies:

1. Full-text search is sufficient for good results.

2. It’s possible to get good search results without structure/metadata/well-formed data.

3. Most search engines automatically know how to make the most of available structure.

4. Taxonomy modules within search engines support a full-featured taxonomy/thesaurus.

What can happen at this point is that the goals become secondary to the software purchase. The software is bought and then implementation is attempted. This is backward. The overall intellectual property (IP) strategy should not be based on the selection of a search solution. Rather, careful consideration of the uses, content, context, and structure of the organization’s current data can make a big difference in the selection of a search tool—one that will really work.

Final Piece of a Big Puzzle

The search software should be the last thing you purchase in an information solution investment, not the first. Most companies overrely on and overinvest in search software as the solution to their information management problems. They treat search software as the centerpiece of a strategy when it should play a much smaller role. In fact, search technology, in its current state, is creating problems rather than solving them.

In the last few years, since search has been widely available within companies for their employees, the amount of time spent searching has increased to the point where staff now spends more time searching than on any other job function, up to 15 hours per week. More time is spent on searching than on thinking about and analyzing the information staff members finally retrieve. Not only researchers and information professionals are affected; those in administration, accounting, human resources, and other departments are facing the same problems. This costs hundreds of millions of dollars per year for large companies. Desperately seeking a better solution, the average large organization has not one but four search software systems and is dissatisfied with all of them.

So what should happen? Ask yourself the following:

What kind of information are searchers looking for?
How is it organized now?
What should the answers look like?
Where is the search software to be used?
Is your new search engine for internal use, a public website, or both?
Is the data really unstructured? Does it have a title? A date? (MS Word files, for example, contain elements of structured metadata that have not been captured.)
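To illustrate the last question, here is a minimal sketch of pulling title, author, and creation date out of a Word file. It assumes the newer .docx (Office Open XML) format, which stores these properties in a part named docProps/core.xml inside a zip package; older binary .doc files need different tooling. The function name read_core_properties is hypothetical, chosen only for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx package.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output field name -> namespaced element to look up.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "created": "dcterms:created",
}


def read_core_properties(docx_file):
    """Return a dict of basic metadata from a .docx path or file-like object.

    Fields that are absent from the document come back as None.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        result[name] = element.text if element is not None else None
    return result
```

Calling read_core_properties on a report saved from Word returns a small dictionary that can be loaded into a search index as fielded data, rather than leaving the file as undifferentiated full text.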

If you want the general public coming to your website to have the same experience it gets at most sites, then you can do it all with almost any search engine on the market. For a basic search solution that meets low-level expectations, consider MySQL, Postgres, or Lucene, which are free. They are also under the hood of many for-fee systems on the market.

However, inside the enterprise, you certainly want users to have an above-average experience, with rich, accurate, and fast search. This requires a robust metadata and taxonomic strategy deployed with the appropriate tools.

Get Organized

Most search engines these days have some categorization and taxonomy capabilities. The reason for this is simple: Search engines don’t work very well without them. Most search software vendors have resorted to kludging on some sort of categorization functionality or rules system to get to an acceptable accuracy rate. Most don’t really believe in taxonomies or thesauri or metadata. They are steeped in the theoretical basis that these add-ons are unnecessary. They are seduced by the thought that human eyes never need to touch the data to make it searchable. However, since it doesn’t work that way in real life, they begrudgingly provide a pseudo taxonomy system with minimal features.

Some of the biggest names in the search software industry have a long and colorful history of publicly denouncing thesauri as worthless and dead. Now they claim to have supported them all along. Search is a complex undertaking with many variables in implementation and many constituencies to serve! If you want your colleagues to be able to develop new products from your treasure house of intellectual property, then you will also need a robust taxonomic strategy deployed with the appropriate tools.
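A toy sketch may help show why a thesaurus matters. It compares plain word matching with a lookup that first maps each word to a preferred term. The vocabulary, the documents, and the function names are all hypothetical, and a real thesaurus would also carry broader, narrower, and related terms that this example leaves out.

```python
# Hypothetical thesaurus: each variant maps to one preferred term.
PREFERRED_TERM = {
    "car": "automobiles",
    "auto": "automobiles",
    "automobile": "automobiles",
    "lorry": "trucks",
    "truck": "trucks",
}

# Hypothetical document collection.
DOCUMENTS = {
    "doc1": "Quarterly review of auto sales in the Midwest",
    "doc2": "Lorry maintenance schedule",
    "doc3": "Car loan policy for employees",
}


def index_terms(text):
    """Return the set of preferred terms that the words of a text map to."""
    words = text.lower().split()
    return {PREFERRED_TERM[w] for w in words if w in PREFERRED_TERM}


def full_text_search(query):
    """Match only documents that contain the query word itself."""
    q = query.lower()
    return sorted(d for d, text in DOCUMENTS.items() if q in text.lower().split())


def thesaurus_search(query):
    """Match documents indexed under the query's preferred term."""
    concept = PREFERRED_TERM.get(query.lower())
    if concept is None:
        return full_text_search(query)
    return sorted(d for d, text in DOCUMENTS.items() if concept in index_terms(text))


print(full_text_search("car"))   # ['doc3']
print(thesaurus_search("car"))   # ['doc1', 'doc3']
```

The full-text search misses the document that says "auto" instead of "car"; the thesaurus lookup retrieves both because each was indexed under the same preferred term.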

Unfortunately, information technology (IT) departments purchase new search software with little consideration of how to make it actually work and, more importantly, how it is going to work with your content. The search software alone will not provide the results hoped for unless the data is "well-formed." To further aggravate the situation, many of the search software companies convince the buyer in IT that the software will work with minimal training and "automatically" tag the data. No need, they say, to add that to the budget. In fact, it takes up to 1 hour per term searched to build the training sets. Most organizations need about 5,000 to 6,000 terms to cover their intellectual holdings. That means up to 6,000 staff hours, or roughly 3 years of one person's time, must be invested before the search can begin to work. This substantial investment in time and money is conveniently not included in the purchase estimate. No wonder many vendors make most of their money on the "associated services" and not on software sales.
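The arithmetic behind that estimate can be checked with a short sketch. The figure of 1 hour per term and the range of 5,000 to 6,000 terms come from the text; the 2,000 working hours per staff year (about 40 hours a week for 50 weeks) is an assumption added here to convert hours into years.

```python
HOURS_PER_TERM = 1.0          # upper bound quoted in the text
HOURS_PER_STAFF_YEAR = 2000   # assumption: about 40 hours x 50 weeks


def training_effort(term_count, hours_per_term=HOURS_PER_TERM):
    """Return (staff hours, staff years) needed to build the training sets."""
    hours = term_count * hours_per_term
    return hours, hours / HOURS_PER_STAFF_YEAR


for terms in (5000, 6000):
    hours, years = training_effort(terms)
    print(f"{terms} terms: {hours:,.0f} staff hours, about {years:.1f} staff years")
```

Under these assumptions the range works out to between 2.5 and 3 staff years, which matches the upper figure quoted above.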

Search software alone will not accomplish organizational retrieval goals. Consider the issue of persistence of retrieved documents among search results. Most search engines will automatically place a document into one or more categories with about 50% accuracy. That means half of the data is correct for the query and half of it is wrong. What a waste of time. Take the document out of a particular search environment and the original categorization is lost. When changes are made to the search system, everything gets moved to new category folders. This lack of persistence and precision frustrates users. Researchers want precision (exactly what they want), recall (all of what they want), and relevance (the data matches their request). This can also be expressed in hits (exactly what humans would choose), misses (stuff they wanted that was not retrieved), and noise (stuff the computer suggests that they didn’t want). Searchers want to feel confident they got all the relevant data and not a lot of extraneous junk.
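The hits, misses, and noise counts map directly onto the two standard measures. As a minimal sketch, precision is the share of retrieved items that were wanted, and recall is the share of wanted items that were retrieved. The function name and the sample counts below are illustrative only.

```python
def precision_recall(hits, misses, noise):
    """Compute precision and recall from hit, miss, and noise counts.

    hits   -- retrieved items the searcher wanted
    misses -- wanted items that were not retrieved
    noise  -- retrieved items the searcher did not want
    """
    retrieved = hits + noise
    wanted = hits + misses
    precision = hits / retrieved if retrieved else 0.0
    recall = hits / wanted if wanted else 0.0
    return precision, recall


# Illustrative counts: half of what comes back is noise.
precision, recall = precision_recall(hits=40, misses=10, noise=40)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these sample counts, precision is 0.50, which corresponds to the situation described above where half of the returned material is wrong for the query.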

Get Your Content in Shape

Under the hood of a search engine is a set of algorithms or logic statements. The kinds of search algorithms used to build and implement a search software system vary widely. There are Bayesian engines, inference, vector, ranking, natural language processing and its parts (semantic, syntactic, phraseological, morphological, grammatical, common sense, etc.), co-occurrence, clustering, sequel rules, neural networks, etc. How does one create content to work with so many different search system options? The basic concepts to support well-formed content while ensuring that the end user will be able to find your data easily, quickly, and accurately have not changed. Good search depends on well-formatted, well-formed data.

Look at your data. Decide how it is currently organized and how you would like it to be organized. What elements do you want to be able to search on? Create a sample of your content identifying those elements. Then investigate which search engines will work with your data. Don’t try to squeeze your data into a tool that is not made for it. Understand the critical elements that result in search success, without being confused by inaccurate claims and being misled into believing fallacies.

About the Author

MARJORIE M.K. HLAVA is president, chairman, and founder of Access Innovations, Inc. (www.accessinn.com). She is past president of NFAIS and the American Society for Information Science and Technology. Hlava has done extensive research and given numerous presentations domestically and internationally on thesaurus development, taxonomy creation, natural language processing, machine translations, and machine-aided indexing.

Automated document organization

iCONECT, a provider of litigation support and collaboration software, and Content Analyst, which develops conceptual search and advanced text analytics software, announced an integration that automatically identifies related documents and groups them into folders in iCONECT software so a document collection is organized in category clusters and ready for review.

Content Analyst works by analyzing entire documents based on concepts rather than keywords or search terms, so it will identify appropriate categories and appropriate documents even if key terms aren't present in those documents. This technology is also resistant to typographical and transcription errors, such as OCR errors, so documents can be correctly grouped and clustered despite inadvertent or even deliberate misspellings.

X1 takes on SharePoint Search

X1 Technologies will unveil two new Content Connectors for customers who require a unified enterprise search solution that encompasses Microsoft SharePoint 2007.

X1 reports the first to be generally available is the new X1 Content Connector for Microsoft SharePoint 2007 that works in conjunction with the X1 Enterprise Server to provide a centralized search solution for the enterprise. This new Content Connector goes beyond the functionality of the Connector for Microsoft SharePoint 2003, says X1, by indexing all data stored within SharePoint while providing a unified (federated) method for searching data both within SharePoint and throughout the enterprise. Enterprises can search for any type of content that may reside in SharePoint, e-mail, personal computers, file servers or enterprise applications. The new server-based Content Connector provides indexing and full X1 search capabilities to all data (over 400 file types) within Microsoft SharePoint 2007.

Along with the server-based Content Connector for Microsoft SharePoint 2007 is the new client-side Content Connector, now available for beta, which allows users of the X1 Professional Client to directly query the Microsoft SharePoint 2007 search engine without the need for a server component. The difference with this direct query to SharePoint 2007 vs. the server-based connector is in the indexing and controls over the viewable data. The Content Connector for the X1 Professional Client will be limited to the access provided by Microsoft SharePoint query protocols and does not provide the full X1 query language capabilities.

Emantix brings value to enterprise content

Emantix has announced a new software platform that automatically determines meaning, context and relevance in content and digital media. The platform uses semantic and statistical methods to analyze millions of words to find the precise definition of each word in any given content. The Emantix platform also assembles a "concept set" of words that may not occur in the content but are highly relevant to the content's context.

The enterprise benefits of the Emantix platform include an enhanced user experience and better interaction with enterprise content. Connections to related information carry rich additional data so that only highly relevant linkages and associations are made. Content can be effectively navigated for knowledge discovery, facilitating learning and expertise sharing. Enterprise search is enhanced by empowering users to refine search queries from a highly relevant set of predetermined concepts.

The Emantix platform can also benefit content external to the enterprise by providing precise associations for advertisers, ensuring that the only advertising displayed is meaningful and relevant to the content. This not only increases effectiveness, click-through and conversion rates, but builds brand loyalty as well.

NextLabs Announces Solution to Address Data Loss Due to Misdirected Email

NextLabs, Inc., a provider of policy-driven information risk management software, announced the availability of Enterprise DLP 3.5, the latest release of its host-based data loss prevention system that proactively applies identity-driven policy to prevent external data leaks, conflict of interest, insider risk, and unauthorized data sharing between employees, trusted partners and customers to achieve continuous compliance.

New capabilities in Enterprise DLP 3.5 include identity-driven policy that detects cases of misdirected email and interactively warns the sender, who can then self-remediate before the email is sent. The release also extends the NextLabs endpoint DLP solution with the Removable Device Enforcer for Windows, which provides granular control over the type, brand, and model of removable devices that can be used at an endpoint. The new version also provides enhanced prevention of data loss from hidden document data, role-based dashboards, and enhanced reporting.

(www.nextlabs.com)

Kalido Announces New Release

Kalido, the active information management company, announced a new release of the core components of the Kalido Information Engine. The new release includes enhancements such as new support for VMware, the ability to run multiple Kalido instances within a single Oracle database, and the ability to validate a subset of master data. Changes to the Kalido Business Information Modeler enable businesses to identify, change, and expand details about their business model. Among the enhanced management capabilities, Kalido Dynamic Information Warehouse and Kalido Master Data Management provide users with control over physical database configuration. This release of the Kalido Dynamic Information Warehouse will include native support for the business intelligence tool QlikView.

(www.kalido.com)

Talisma improves search in V. 8.

Talisma, a provider of customer interaction management software, has introduced Version 8.1 of the Talisma Knowledgebase, which includes significant enhancements to its search capabilities.

Talisma reports that search improvements include:

concept matching, which builds a conceptual understanding of content and ranks search results by relevance to the user’s query;
automatic summarization, which adds value to search results by dynamically building and displaying summaries for each document returned, making it easier for users to identify the right document before they click to view it;
a new auto recommend feature, which automatically builds hyperlinks between related documents based on its conceptual understanding of content, eliminating manual linking and maintenance; and
a link search results feature, which takes users directly to the relevant section of a document, even in extremely long documents and large PDFs.

The company adds that the Talisma Knowledgebase search functionality includes both federated and progressive search features that give users access to remote sites and repositories and powerful tools for refining searches on the fly, as well as spelling correction.

XyEnterprise announces ContentaView 3.6

XyEnterprise has released ContentaView 3.6, an update to its software solution suite for the delivery of interactive, textual and multimedia content via CD-ROM or the Web.

This new version features a powerful Web-based update server, providing a universal means for updating all digital publications with a simple Internet connection. The company explains that the updated server in ContentaView 3.6 makes it fast and easy to push content updates from the server to the desktop and ensures that users are working with the correct version of their publication. Further, it enables unattended updates to be performed at pre-determined intervals. Because these new features in ContentaView 3.6 are Web-based, they do not require any additional hardware or software.

Also included in ContentaView 3.6 is a tracking mechanism that compiles an audit trail of every user who connects to the server and every update that is downloaded. This tracking mechanism provides up-to-date information about all users along with the publications, versions and historical information in each client record.

Additional capabilities of ContentaView 3.6 include a new generic export tool, which allows users to export the XML into any desired file format, such as RTF, Excel, Word, CSV and plain text, making it easier to update files and accommodating users not familiar with XML. Also included are improved search capabilities that permit users to search for full-text or categorical content, using phrase searching, search stemming, search excerpts or saved searches.

Northern Light Launches Enterprise Market Research Portal

Northern Light, provider of strategic research portals, business research content, and search technology to global enterprises, launched a hosted market research portal specifically geared to help companies identify sales opportunities with federal, state, and local governments throughout the United States. The SinglePoint Government Opportunities Edition indexes government buying intention announcements, reports, and news stories from research sources specializing in government procurement. Powered by Northern Light’s search engine, SinglePoint Government Opportunities Edition provides a search index of targeted government RFIs, RFPs, and contract awards. SinglePoint features MI Analyst, an application designed specifically for market intelligence, market research, and product research. Another option, SinglePoint Connects, is a series of "Web 2.0" capabilities that enables users to collaborate online in a variety of ways within the portal environment.

(www.northernlight.com)

Nexidia Announces Release of Enterprise Speech Intelligence 7.0

Nexidia, the provider of audio search and speech analytics solutions, announced the release of Nexidia Enterprise Speech Intelligence (ESI) 7.0, the latest version of its speech analytics solution. New dashboards enable users to pinpoint "hot spots" driving call volume, while reporting features reveal the critical factors impacting service levels, talk times, agent & team performance, among other key metrics.

(www.nexidia.com)

Mining benefits for readers and newspapers alike

Omaha World-Herald Company has deployed text mining technology to automatically generate rich metadata. The deployment enables the company to better leverage a comprehensive inventory of its content.

Omaha World-Herald, which owns daily and weekly newspapers in Nebraska and Iowa, selected Nstein Text Mining Engine to semantically organize its library of media assets. With the solution, it can more efficiently and consistently tag all user and editorial content including articles, images and videos, Nstein recently reported. The tags will also help the company improve search engine optimization on its sites, automate workflows, enable personalization and streamline syndication efforts, according to Nstein.

"When we first saw the product demoed, we were intrigued with how quickly content could be tagged—and found," says Jeff Carney, assistant managing editor, Omaha World-Herald. "Management immediately recognized that if we wanted to create a new vertical on, say, outdoor sports, we would now be able to mine our assets and create new products almost on the fly. Eventually we plan to tag advertising as well—to better associate relevant content with ads."

Salesforce.com acquires InStranet

Salesforce.com has acquired InStranet, a provider of knowledge management technology for business to consumer (B2C) call centers. The amount of the deal was approximately $31.5 million, which includes the assumption of $4.2 million in cash on InStranet's balance sheet.

InStranet adds the customer's context, such as product or geography, to the knowledgebase to facilitate homing in on the right solution and eliminating irrelevant search results. In addition, because of the technology's open architecture, it provides rapid time to value, the companies claim, with deployments taking place in weeks as opposed to months. Salesforce.com is adding this technology innovation as a key component to Salesforce CRM Customer Service & Support, enhancing its Call Center and Customer Portal applications in use today by customers worldwide.

Interwoven and LexisNexis Team Up

Interwoven, Inc., provider of content management solutions, announced it has entered into an agreement with global business information solutions provider, LexisNexis, to integrate and launch Lexis Search Advantage, a new search enrichment offering, with Interwoven Universal Search. The integration of Lexis Search Advantage with Interwoven Universal Search will provide law firm attorneys a single destination to find and leverage internal work product and trusted content from LexisNexis.

(www.lexisnexis.com, www.reedelsevier.com, www.interwoven.com)

Recommind Announces eDiscovery 3.0

Recommind, a provider of enterprise search, email management, and eDiscovery systems for enterprises and law firms, announced the availability of version 3.0 of its advanced Axcelerate eDiscovery software. Axcelerate eDiscovery version 3.0 adds automatic language detection, integrated workflow management, and an administration console. Axcelerate eDiscovery offers First-Pass Review and patent-pending One-Click Coding functionality, which work with any language and any type of structured or unstructured data. With Axcelerate eDiscovery 3.0, Recommind has added automated language detection and filtering, including double-byte languages, allowing the system to index, cull, filter, search, review and code documents in myriad languages. Other new capabilities include: smart filters 2.0, native file viewing, and enhanced processing.

(www.recommind.com)
