Enterprise Search Center

RESOURCES FOR EVALUATING ENTERPRISE SEARCH TECHNOLOGIES

January 10, 2007

Table of Contents

Featured Content: Restoring Browse in the Enterprise

Exalead Announces OEM Agreement With Messaging Architects

Jadu Launches "Web 2.0" Content Management Products

Google Search Appliance Adds New Features to Customize Enterprise Search

Recommind's MindServer 5.0

Traction integrates InStream

SearchInform Technologies Inc. Introduces New Version

Toward the transparent enterprise

More counterterrorism partnering

Reader ready

Featured Content: Restoring Browse in the Enterprise

Before Google's rise to dominance in the consumer search space, portals provided more than a search box. Perhaps best exemplified by Yahoo!, most portals provided a browseable taxonomy of the Web. Users were able to find what they were looking for as they do now, by entering search terms; but they could also click their way to content, browsing the subject hierarchy.

This format has all but disappeared from the IT landscape. The change may be due to the appeal of Google's minimalist aesthetic, or the relative difficulty of maintaining a taxonomy; maybe there are other factors. Whatever the reason, search has largely replaced browse. Powerful desktop search tools offer easier alternatives to file system browsers like Windows Explorer or the Mac Finder. Email systems provide convenient search-as-you-type functionality to locate messages in crowded inboxes. Web browsers have built-in search boxes and incremental search functions. As enterprises are scrambling to find the best search vendors to handle their information retrieval needs, we are left wondering: What happened to browse? And are we really better off without it?


Why Search Isn't Enough

Search is a powerful tool, and its power is constantly growing. Sophisticated algorithms, faster processors, and cheap storage allow near-instant searching of huge volumes of content. Intelligently categorized results, smart summarization, and the ability to find related information—tools developed in the post-browse world—further increase the scope of search.

But one thing hasn't changed. A search always begins with a query—a small chunk of content intended to guide a user to a broader destination. No matter how sophisticated, every search result is the answer to a content-based question. But not all questions are content-based. Put another way, search requires that you know what you're looking for, and sometimes you don't.

Here are some common information-retrieval questions that are not content-based:

What content is available?
What's new and important in my inbox?
What should I do next?
Where is that thing I was looking at the other day?
What's happening in our CRM system?
Which projects need the most attention?
Who's online? Where are they?

These are key questions that search can't answer, and the need for tools to find the answers is growing. Hundreds of new emails, files on shared servers, blogs, RSS feeds, CRM databases, paper trails: These are all unread documents whose content cannot be queried because it is unknown.

Browse, on the other hand, does not require foreknowledge of content. It shifts that responsibility from the user onto the system. This certainly presents challenges, but they are not insurmountable. And due to common vocabularies and taxonomies, the enterprise is often an easier target for browse technologies than the consumer space. Given proper attention, browse tools can become as sophisticated as their search counterparts.

By excluding browse from their information retrieval landscape, enterprises are doing themselves and their employees a disservice.

Faceted Navigation

Browse is not universally ignored. In both the consumer and enterprise spaces, innovation continues on many fronts.

Faceted navigation systems (such as those produced by Endeca and Siderean) provide sophisticated content filtering. Each document or object is defined by a set of facets: things like color, price, size, or manufacturer in an ecommerce context. Users select one facet after another, narrowing their results until they find what they want. Faceted navigation systems are often more powerful and flexible than traditional taxonomies because they're not hierarchical: Selecting a color followed by a price is identical to selecting a price followed by a color. This results in multiple paths to data and thus greater flexibility in responding to users' thoughts.
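The order-independence of facet selection can be illustrated with a minimal Python sketch. The catalog records, facet names, and the narrow and remaining_values helpers are all hypothetical, invented for illustration; they do not represent how any vendor's product works.

```python
# Hypothetical catalog: each item is described by a flat set of facets.
CATALOG = [
    {"name": "Desk lamp", "color": "black", "price": "under $50", "maker": "Acme"},
    {"name": "Floor lamp", "color": "black", "price": "$50-$100", "maker": "Acme"},
    {"name": "Reading lamp", "color": "white", "price": "under $50", "maker": "Bolt"},
    {"name": "Clip lamp", "color": "black", "price": "under $50", "maker": "Bolt"},
]


def narrow(items, facet, value):
    """Keep only the items whose facet has the selected value."""
    return [item for item in items if item.get(facet) == value]


def remaining_values(items, facet):
    """Count the values still available for a facet, to offer as next clicks."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


# Two different click paths through the same facets.
color_then_price = narrow(narrow(CATALOG, "color", "black"), "price", "under $50")
price_then_color = narrow(narrow(CATALOG, "price", "under $50"), "color", "black")
```

Both paths end at the same two items, and remaining_values(color_then_price, "maker") reports which manufacturers could be offered as the next filter.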

These systems can be combined with search. Each can be used as a second step to narrow the other's results. Or, in a more flexible implementation, the user can switch back and forth at will. And, in general, faceted navigation systems are mature and are already integrated into numerous sites and applications.

But faceted navigation requires facets for each document in the collection. These are often difficult, and sometimes impossible, to add manually, particularly for a preexisting collection of thousands or millions of facet-free documents. But hope is in sight: There are technologies that promise to automate this process. (See Categorizers below for some examples.)

Metadata and Taxonomies

Even without the addition of new technologies, existing data and tried-and-true techniques can be used to supplement and enhance search.

Metadata is becoming ubiquitous. Microsoft Office, Adobe Creative Suite, email systems, digital cameras, and other tools embed metadata directly in documents. Some vendors (such as Siderean) can improve its accuracy and extract additional metadata. Search and indexing systems read this metadata. Yet most applications embed less metadata than is available or can be indexed. Enterprises can take advantage of this by integrating more metadata into document creation and other workflows. This can be done manually (by document authors or other workers) as well as automatically via a categorizer.

Browse and search interfaces can take greater advantage of existing metadata. Every document has basic information such as modification date; many documents incorporate more, most notably the author. Some documents (digital photographs, for example) include a rich collection of information. Providing users with a way to browse these fields can assist them in finding information. And with indexing services built into all major operating systems, IT departments and integrators don't have to reinvent the wheel or build a complete custom solution.

Org charts, classification systems, and other preexisting taxonomies can also aid in enhancing browse-based information retrieval. Enterprises should encourage authors and administrators to classify documents within these taxonomies. And metadata can be combined with them to yield more powerful browse capabilities without significant effort. For example, combining the author information present in most Microsoft Office documents with a corporate org chart would allow intranet users to browse documents by department and would not require retrofitting or additional effort by individual document authors.
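As a rough illustration of the org chart example, the following Python sketch groups documents by the department of their author. The names, titles, and the browse_by_department helper are hypothetical; a real deployment would read the author field from an index and the org chart from a directory service.

```python
from collections import defaultdict

# Hypothetical org chart: author name -> department.
ORG_CHART = {
    "A. Rivera": "Engineering",
    "B. Chen": "Marketing",
    "C. Okafor": "Engineering",
}

# Hypothetical documents with the author field an indexer might extract.
DOCUMENTS = [
    {"title": "Q1 roadmap.doc", "author": "A. Rivera"},
    {"title": "Launch plan.ppt", "author": "B. Chen"},
    {"title": "Test results.xls", "author": "C. Okafor"},
    {"title": "Old memo.doc", "author": "D. Unknown"},
]


def browse_by_department(documents, org_chart):
    """Group document titles by the department of each document's author."""
    by_department = defaultdict(list)
    for doc in documents:
        # Authors missing from the org chart fall into a catch-all group.
        department = org_chart.get(doc["author"], "Unassigned")
        by_department[department].append(doc["title"])
    return dict(by_department)


departments = browse_by_department(DOCUMENTS, ORG_CHART)
```

No document author has to do anything extra here: the department view is derived entirely from metadata that already exists.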

Tags and Folksonomies

In the consumer space, user-driven tagging is gaining popularity. Sites like Flickr and Del.icio.us rely on users to categorize the content they submit by applying tags to it. For example, a user uploading vacation photos to Flickr (a photo-sharing site) might type tags like "vacation" or "Denmark". As thousands of users supply tags (often identical for similar content), it becomes possible to browse and search across users. For example, viewing the "Denmark" tag on Flickr will yield vacation photos from multiple users who visited Denmark. This ad hoc categorization scheme is often referred to as a "folksonomy," though it differs from a typical taxonomy in its nonhierarchical nature.

While browse capabilities are very simple in popular folksonomy applications (owing in part, no doubt, to their consumer focus), something more robust like a faceted navigation system could easily be built on top of them, providing filter-style narrowing of content by clicking on one tag (facet) at a time. In effect, the tagging facility then solves the problem of generating facets.

Many tagging sites have introduced an intriguing browse interface, the "tag cloud". This simple visualizer shows a collection of linked tags, each sized according to the number of documents that use it. It's interesting not only for its utility, but also for its simplicity. And again, it could serve as the front end for a multi-level, tag-based filtering system without undue development effort.
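The sizing rule behind a tag cloud is simple enough to sketch in a few lines of Python. The data, the tag_cloud function, and the linear scaling between a minimum and maximum font size are assumptions made for illustration; actual sites may scale differently (for example, logarithmically).

```python
from collections import Counter

# Hypothetical tagged documents.
TAGGED = {
    "photo1.jpg": ["vacation", "denmark"],
    "photo2.jpg": ["denmark", "copenhagen"],
    "photo3.jpg": ["vacation", "denmark", "beach"],
}


def tag_cloud(tagged_docs, min_size=10, max_size=30):
    """Map each tag to a font size scaled linearly by how many documents use it."""
    counts = Counter(tag for tags in tagged_docs.values() for tag in set(tags))
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    spread = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: min_size + (max_size - min_size) * (count - low) // spread
        for tag, count in counts.items()
    }


cloud = tag_cloud(TAGGED)
```

In this sample the most used tag, "denmark", gets the largest size, and tags used only once get the smallest.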

Tagging has significant drawbacks, most notably its ad hoc nature. There is no guarantee that a distributed group of users will generate an effective categorization scheme. Users looking for information must rely on untrained users to supply sufficient, appropriate tags and to use consistent terminology. But enterprises enjoy a shared vocabulary of projects, products, and methods, and shared jargon that could make an intra-organizational folksonomy more effective.

An enterprise might also increase its folksonomy's power and accuracy by using it to supplement an existing corporate taxonomy. For example, workers might use the official taxonomy to begin locating information; where the taxonomy left off they could use the folksonomy to narrow their results further. The two schemes could be provided together for more free-form filtering. An enterprise could even relate the two schemes proactively, associating terms based on frequency of co-occurrence to clean up inconsistencies in tagging.
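One way to picture the co-occurrence idea is the Python sketch below, which links each user-supplied tag to the official taxonomy term it most often appears alongside. The sample documents, the associate_tags function, and the min_count threshold are hypothetical choices made for this illustration, not a description of any existing product.

```python
from collections import Counter, defaultdict

# Hypothetical documents carrying one official taxonomy term and free-form tags.
DOCS = [
    {"term": "Human Resources", "tags": ["hr", "benefits"]},
    {"term": "Human Resources", "tags": ["hr", "hiring"]},
    {"term": "Human Resources", "tags": ["people-ops", "hr"]},
    {"term": "Finance", "tags": ["budget", "hr"]},
    {"term": "Finance", "tags": ["budget", "forecast"]},
]


def associate_tags(docs, min_count=2):
    """Link each tag to the taxonomy term it co-occurs with most often."""
    pairs = defaultdict(Counter)
    for doc in docs:
        for tag in set(doc["tags"]):
            pairs[tag][doc["term"]] += 1
    associations = {}
    for tag, terms in pairs.items():
        term, count = terms.most_common(1)[0]
        if count >= min_count:  # ignore pairings seen too rarely to trust
            associations[tag] = term
    return associations


links = associate_tags(DOCS)
```

Tags that appear only once are left unlinked, which keeps one-off or idiosyncratic tags from distorting the official scheme.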

Categorizers

Providing enough content information to make browsing effective is a key challenge for all browse systems. Traditionally, it has required manual categorization according to a preexisting taxonomy. One reason for the relative obscurity of browse is probably the difficulty of providing the metadata to drive it, particularly in constantly changing, distributed, or user-controlled databases—such as the Web, email, or a user's desktop.

Automated categorization and other text processing technologies provide some hope here. There are numerous approaches, but all provide some way for computers to discern patterns in document collections. While these systems are still evolving and are typically far from being perfectly accurate, they nonetheless allow automation of the categorization process. (And it is worth noting that manual categorization incurs human error and is itself far from 100% accurate.) The picture gets rosier in the enterprise: Most of these technologies work much better when restricted to a particular domain (i.e., the industry or jargon of a given company) or paired with an existing taxonomy.

For an example of a categorizer applied to Web search, try Vivisimo's Clusty (http://clusty.com). There are numerous other vendors of categorization software including Autonomy, ClearForest, Data Harmony, Factiva, Gammasite, Inxight, Lexalytics, Nstein, Recommind, Schemalogic, Semagix, Siderean, Stratify, Temis, and Teragram. Siderean is focused specifically on extracting facets for faceted navigation. A number of open source efforts exist as well. And many vendors incorporate categorization into a larger search product.

For a more detailed look at categorizers, see the 2004 IDC Insight, "Why Categorize?" (IDC document #31717). For an overview of text mining software see the 2005 IDC Opinion, "Text mining: Mining for Gold in Unstructured Information" (IDC document #CA1503SWD, http://idc.com/getdoc.jsp?containerId=CA1503SWD).

Menus, Indexes, and Site Maps

Intranets and other Web sites have built-in navigational hierarchies. Enterprises should focus on making them better and exposing them effectively. Ideally, a user arriving at an intranet should see a clear browse path to her destination, and users frustrated by both navigation and search should have some recourse in the form of a comprehensive site map or index. All too often search becomes a de facto navigation tool when better navigational hierarchies would be more effective. And good navigation involves good categorization, which in turn aids other browse and search interfaces.

Visualizers

More adventurous (or overloaded) enterprises might look into browsable "visualizers," tools that provide a graphical representation of a document collection that allows users to see various aspects of it and drill down to find individual documents. Such tools—also referred to as "dynamic query" systems for the way they break down the query/result barrier—range from the simple tag cloud described above to complex three-dimensional models and incorporate research going back over 20 years. Many are approachable and potentially useful to workers with limited technical backgrounds. Some are commercial, while others are in the public domain.

Search Vendors

Search and browse are complementary, and the ideal information retrieval system incorporates both. Some search vendors do not provide browse functionality, but many do. Search technology should be selected based in part on how well it provides or works with browse tools. It may be helpful to develop a list of common information retrieval questions specific to a workforce and to determine which are "search" and which are "browse." Bringing this list to potential search vendors or using it as a guide might be of benefit to the process of selecting an appropriate technology. More generally, consider vendors whose technologies integrate well with other systems so that existing and future information can be leveraged effectively. And remember that a hybrid solution is an option: Sometimes the best strategy combines the offerings of multiple vendors.

Conclusion

Browse is too often overlooked. Enterprise workers need effective ways to answer key information questions; search can't always help. A better information retrieval strategy, one that combines the benefits of search and browse, will increase worker productivity and bolster the bottom line. The landscape of browse options is varied, ranging from tried-and-true techniques that any enterprise can put into practice to innovative solutions from trusted vendors. Browse requires greater attention to content indexing and categorization, but products and methods exist to automate the process, at least in part. And with research in these areas advancing, the options are continuing to improve. Enterprises that consider browse as part of an integrated information retrieval solution will see concrete benefits; enterprises that do not will see information, time, and money lost.


David Feldman is principal consultant for InterfaceThis, a software and user interface design firm. E-mail: dfeldman@interfacethis.com


Exalead Announces OEM Agreement With Messaging Architects

Exalead, a global provider of search software for business and the web, has announced an OEM agreement with Messaging Architects, a provider of Risk Management software and services for enterprise email systems.

Under the terms of the agreement, Messaging Architects will integrate the exalead one:search platform into its GWArchive 3.5 solution, which is designed to help organizations address the challenges of email retention, regulatory compliance, storage, and retrieval.

By embedding exalead one:search technology into GWArchive, customers will be able to retrieve archived emails through a unified user interface. GWArchive, which is designed for Novell Groupwise Collaboration software users, offers storage management, policy-based retention, full information lifecycle management for email, and long-term data portability.

(www.messagingarchitects.com; www.exalead.com)


Jadu Launches "Web 2.0" Content Management Products

Jadu, a U.K.-based provider of enterprise content management systems, has announced a range of products that apply "Web 2.0" technology to help organize information and content within an organization and enhance corporate communication.

The new products range from feature-enhancing modules for the Jadu CMS, such as Jadu Multimedia, which enables video casting and podcasting, to full platform products such as XFORMS Professional, an eforms system that integrates with CRM systems and enables self-service, and Jadu Intranet 2.0, a social computing web portal for organizing, publishing, and blogging information within an organization.

Jadu has also developed Rupa, a cross-platform software interface for the Google Search Appliance (GSA) which, when integrated with Microsoft Active Directory, enables personalized and controlled enterprise search over an internal network. In addition, Jadu has delivered an internal customer services content management system for local authorities called Services, designed to enable customer services operators to use scripted information on council services and search for information using an in-built Google search.

(www.jadu.co.uk)


Google Search Appliance Adds New Features to Customize Enterprise Search

Google has announced new features for the Google Search Appliance designed to give enterprise customers the ability to customize search results for their individual corporate environments. The Google Search Appliance also offers improved integration with Google services as well as additional content sources.

New features include results hit clustering and source biasing. Results hit clustering groups results into dynamically formed sub-categories based on the results of each search query. These clusters appear at the top of search results and can help searchers refine queries that contain possibly ambiguous terms. Administrators can customize the location and appearance of results hit clustering within search results. Source biasing is designed to enable administrators to assign various weights to search results on their corporate network, based on source or type of content. A menu-driven interface allows weak or strong increases or decreases, and requires no complex coding or scripting.
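The general idea of weighting results by source can be sketched as follows. This is only an illustration of the concept: the bias levels, multipliers, scores, and the rerank function are hypothetical and are not drawn from the Google Search Appliance, whose actual scoring is not described in the announcement.

```python
# Hypothetical bias levels and multipliers, invented for this sketch.
BIAS = {
    "strong decrease": 0.5,
    "weak decrease": 0.8,
    "none": 1.0,
    "weak increase": 1.25,
    "strong increase": 2.0,
}


def rerank(results, source_bias):
    """Re-sort results after scaling each relevance score by its source's bias."""
    def biased_score(result):
        level = source_bias.get(result["source"], "none")
        return result["score"] * BIAS[level]
    return sorted(results, key=biased_score, reverse=True)


# Hypothetical results with base relevance scores.
RESULTS = [
    {"url": "wiki/setup", "source": "wiki", "score": 0.70},
    {"url": "policies/setup", "source": "policies", "score": 0.60},
    {"url": "archive/setup", "source": "archive", "score": 0.65},
]

ranked = rerank(RESULTS, {"policies": "strong increase", "archive": "weak decrease"})
```

With these sample settings the policy page moves from last place to first, while the archived page drops below the wiki page.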

The new version of the Google Search Appliance also adds improved integration with Google Sitemaps export (for simpler export of information about web pages available for crawling), as well as open source connectors for indexing content in SharePoint 2003 and SharePoint 2007.

(www.google.com)


Recommind's MindServer 5.0

Recommind has launched a new version of its popular MindServer platform, which includes new e-discovery functionality that enables enterprises to quickly and easily locate electronically stored information (ESI) that must be preserved for ongoing or anticipated litigation.

The company says MindServer 5.0 is able to quickly identify and find potentially relevant and responsive information within the enterprise that must be preserved as part of a litigation hold, a requirement of any legal proceeding. Further, Recommind reports, in many enterprise environments, the MindServer platform itself can lock down any document or other piece of information returned in a search query. Otherwise, MindServer 5.0 is able to pass the result set from any query to a separate application, such as a content management system or database, for immediate litigation hold lock-down. MindServer 5.0 also supports multiselection filters within the user interface, a prerequisite for the highly comprehensive and detailed searches needed for effective litigation hold.

Recommind adds that the platform's improved APIs enable knowledge and content management, e-mail archiving and e-discovery system vendors to incorporate Recommind's enterprise search functionality into their products.

MindServer 5.0 also incorporates significant improvements to search relevancy and ease of use, delivers even faster query performance, simplifies integration with Microsoft SharePoint portal server and other existing systems and provides enhanced reporting capabilities. Some of these features include:

a significant increase in indexing performance;
customized connectors for document and knowledge-management products from vendors such as EMC/Documentum, LexisNexis and Microsoft;
support for 64-bit hardware; and
higher scalability that supports search parameters across terabytes of information.

Recommind MindServer 5.0 is available immediately.


Traction integrates InStream

Traction Software has integrated FAST InStream search technology to enable secure search, entity extraction and drill down navigation for TeamPage, its wiki and blog offering.

Traction describes the module as an easily installed option that extends Traction's permissioned search model to more than 370 document formats for files attached to TeamPage posts or stored in TeamPage Web folders. FAST's linguistic analysis adds relevance ranking and automatic entity extraction to support interactive permission-filtered drill-down by person, company, location and other attributes.


SearchInform Technologies Inc. Introduces New Version

SearchInform Technologies Inc. has introduced a new version of SearchInform, a program for full-text search and for finding documents with similar content. The new version features enhanced indexing and search algorithms as well as a new request caching system that affects the operation and speed of the system as a whole.

SearchInform 3.0 boasts improved results because of its new request caching system, developed by SearchInform Technologies' technical specialists. Unique queries make up around 20% of all queries generated by users, so SearchInform 3.0, which uses query caching, has to fully process only one fifth of all queries. All other queries have ready results available in cache.
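The arithmetic behind that claim can be shown with a generic query cache in Python. This is a minimal sketch of the general technique, not SearchInform's implementation; the CachingSearch class, the slow_search stand-in, and the sample queries are hypothetical.

```python
class CachingSearch:
    """Wrap a search function so repeated queries are answered from memory."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self._cache = {}
        self.processed = 0  # queries that needed full processing
        self.served = 0     # all queries answered

    def search(self, query):
        # Normalize case and whitespace so trivially different spellings share an entry.
        key = " ".join(query.lower().split())
        self.served += 1
        if key not in self._cache:
            self.processed += 1
            self._cache[key] = self._search_fn(key)
        return self._cache[key]


def slow_search(query):
    """Stand-in for a full index lookup (hypothetical)."""
    return [f"result for {query}"]


engine = CachingSearch(slow_search)
# Ten queries, only two of them distinct: 20% unique.
for q in ["annual report"] * 6 + ["Travel  policy"] * 4:
    engine.search(q)
```

After the loop, engine.processed is 2 and engine.served is 10, so one fifth of the queries required full processing. A production cache would also need to discard entries when the index changes, which this sketch omits.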

SearchInform 3.0 features include: phrase search that takes stemming and a thesaurus into account; new SoftInform to search for similar documents; high indexing speed (from 15 to 30 GB/hour); index size of 15-25% of the actual size of the text data; query caching system; support for more than 60 of the most popular text formats, Outlook and TheBat electronic messages, MP3 and AVI tags, and logs of MSN and ICQ instant messaging programs.

(www.searchinform.com)


Toward the transparent enterprise

Open Text has introduced Livelink ECM 10, which, it says, lets customers implement a legitimate enterprisewide content management strategy, allowing them to preserve and protect intellectual capital, leverage business content across all applications and effectively address governance and regulatory compliance requirements.

Accomplishing those goals allows users to reap the full benefits of Open Text's new initiative, dubbed Enterprise Transparency, which, the company says, reflects the evolution of content management from simply tracking and controlling information to leveraging it for business advantage--setting it in action to drive business processes, create content-centric business applications, bridge structured and unstructured content repositories, and unleash information workers to make better, faster decisions based on a holistic, centralized view of business content.

Open Text reports that some of what's new in Livelink ECM 10 is available immediately, with the rest being rolled out in 2007.

New features in Livelink ECM 10 include:

Enterprise Library Services--In Livelink ECM 10, seamlessly integrated archival, metadata management, enterprise records management and search capabilities are exposed as Enterprise Library Services, providing organizations with the ability to effectively implement enterprisewide retention strategies. In addition to managing content stored in Livelink ECM repositories, Livelink ECM 10 also provides the ability to manage the metadata and life cycle of content stored in enterprise applications from vendors such as Microsoft, SAP and Oracle, as well as business content stored in SharePoint Server 2003, e-mail, file systems and other repositories.

Bridging business content with enterprise applications--Version 10 allows users to access business content in ERP systems, such as customer information or purchase order documents, from their familiar Microsoft Office Outlook 2003 interface.

Flexibility for provisioning basic content services--Open Text reports organizations have the flexibility to build and deploy solutions on any basic content services offering (such as SharePoint Portal Server 2003), while managing the enterprisewide retention of business content with Enterprise Library Services.

Emphasis on user experience--In addition to Livelink ECM's Web-based user interface, Livelink ECM 10 will feature a new rich client interface, providing seamless access to business content from within leading Microsoft desktop applications.

New Web services APIs--Livelink ECM 10 will provide published Web Services APIs for Enterprise Library Services and Livelink Content Services.


More counterterrorism partnering

Inxight Federal Systems Group has partnered with Mosaic, a small business providing consulting to the U.S. government, to offer professional services and knowledge-based solutions to government organizations focused on enemy threats.

Inxight reports its partnership with Mosaic will bring the ability to understand and represent the requirements of analysts in defining and delivering enterprise solutions. Mosaic employs senior technical resources across a broad range of technical competencies that are essential to the future of analytical systems development.


Reader ready

Adobe Systems announces that its Reader 8 software is available as a free download. Adobe explains that in addition to enabling information exchange between enterprises, government agencies, constituents and consumers that view, print, search, digitally sign and collaborate with PDF files, Reader 8 also features a new "Start Meeting" button that launches Adobe Acrobat Connect, an Adobe-hosted software service that provides real-time online collaboration through Adobe Flash Player.

The company says Reader 8 also features a new, streamlined interface with user customizable toolbars. Further, it reports, Adobe Acrobat Professional users can now enable Adobe Reader users to fill and submit forms, save data and digitally sign documents. Adobe Reader 8 also offers graphics processing unit (GPU) acceleration, which boosts performance when viewing graphics-intense PDF files, such as 3-D content, the company says.

Adobe has also published the sixth edition of its PDF Reference, a free guide for developers implementing the open PDF specification in third-party products and plug-ins. The PDF Reference, complete with the current PDF 1.7 specification, is available for download. Further, the Adobe Acrobat 8 Software Development Kit (SDK), which contains documentation and tools that developers need to build Adobe Reader and Acrobat plug-ins, is a free download.

Adobe Reader 8 is now available as a free download at http://www.adobe.com/go/getreader/, and is available in English, French, German and Japanese versions. Chinese and Korean versions are expected to be available in early 2007. Adobe Reader 8 is available on Windows and Mac, and is expected on Linux, HP/UX, AIX, and Solaris (SPARC) platforms in 2007.

