EnterpriseSearchCenter.com Home
 
RESOURCES FOR EVALUATING ENTERPRISE SEARCH TECHNOLOGIES
February 06, 2008

Table of Contents

Sharepoint Search--An Enterprise Contender? (Free PDF)
Oracle to acquire BEA Systems
Coveo Enhances Search Functionality
Kazeon forms alliance with ProStor
Consummate clustering
Attenex and PSS Systems Announce Alliance
National Institutes of Health Selects Collexis
Lycos Incorporates Pixsy
Silobreaker Launches New Service
Researching the power of semantic technology

Sharepoint Search--An Enterprise Contender? (Free PDF)

By Jean Graef

Microsoft is a household word, rivaled only by Google for mind share with "everyman" in the technology sector. However, unlike Google, the business name Microsoft is not synonymous with search. Yet while Google rules the wide-open web, many organizations are heavily invested in Microsoft technologies behind the firewall. Therefore it is worth considering whether or not the search component of Microsoft’s SharePoint suite is a viable option for enterprise search.

Here at the Montague Institute, some of our members have already chosen it, some have tried it and rejected it, and many more are considering it a serious contender. In its "Magic Quadrant for Information Access Technology," Gartner lists it along with Google in its Challenger category (September 2007). The reason is that, with the 2007 release of the product (now called Microsoft Office SharePoint Server or MOSS), SharePoint search now includes most of the basic features we’ve come to expect in enterprise search, along with low cost and tight integration with existing SharePoint installations and other Microsoft applications. As one person put it, "It isn’t the best in class, but it’s good enough."

Click here to download the free PDF.

Whether you deploy MOSS for enterprise search depends on your technology strategy and budget, how much you’ve invested in metadata and taxonomies, and how you plan to search multiple content repositories. If you use SharePoint for collaboration and content management but choose another product for enterprise search, you’ll need to consider two kinds of complementary products: taxonomy management programs that integrate with MOSS search, and search engines that can search SharePoint content.

Either way, you’ll need a strategy that will integrate SharePoint’s bottom-up (decentralized) publication and management model with the top-down (centralized) enterprise search deployment model. You want users to be able to find resources (documents, websites, people) regardless of company location or technology, yet not be overwhelmed by the minutiae of documents generated by local collaboration.

MOSS 2007: A Big Improvement

MOSS 2007 search is a big improvement over the 2003 version. It provides the basic functionality we’ve come to expect from search engines, such as the following:

  • Search "scopes" that allow users to broaden or narrow a search based on a content collection (e.g., intranet, department, or team)
  • Security trimming, so users only see those results that they’re authorized to access
  • Synonym suggestion (Did you mean …?) and term highlighting in search results
  • Improved relevancy and ranking algorithms using such factors as click distance, hyperlink anchor text, URL depth, and metadata extraction
  • Editorial control over which documents show at the top of search results through "Best Bets" and "Authoritative Sources"
  • Greater control over what content is included in search results through crawl rules, immediate removal of any site or item from the search index, and multiple start addresses per content source
  • Better usage reports—Previously, administrators reported having trouble extracting data from search logs; now they can create their own reports from a variety of built-in templates

In addition, MOSS 2007 search has two new features. People search allows users to find employees not only by department or job title but also by expertise, social distance, and common interests; integration with Microsoft’s Active Directory saves implementation time, especially if the directory contains information about security levels and what people actually do. The Business Data Catalog allows users to search content stored in SAP, Oracle, and other databases.
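
To make the scope and security-trimming items in the list above concrete, here is a minimal Python sketch. It is not MOSS code: the `Hit` record, the group names, and the `trim_results` function are all hypothetical, and it assumes each indexed item carries the set of groups allowed to read it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    scope: str                 # hypothetical scope label, e.g. "intranet" or "team"
    allowed_groups: frozenset  # groups permitted to read the item (assumed ACL model)


def trim_results(hits, user_groups, scope=None):
    """Return only the hits the user may read, optionally limited to one scope."""
    user_groups = set(user_groups)
    visible = []
    for hit in hits:
        if scope is not None and hit.scope != scope:
            continue  # outside the scope the user selected
        if hit.allowed_groups & user_groups:
            visible.append(hit)  # user shares at least one group with the item
    return visible


hits = [
    Hit("http://intranet/policies.doc", "intranet", frozenset({"all-staff"})),
    Hit("http://teams/rd/roadmap.ppt", "team", frozenset({"rd"})),
    Hit("http://hr/salaries.xls", "department", frozenset({"hr"})),
]

for hit in trim_results(hits, ["all-staff", "rd"]):
    print(hit.url)
```

In this toy collection, a user in the "all-staff" and "rd" groups sees the first two items but never learns that the HR spreadsheet exists.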

Evaluating Effectiveness

How effective are these new and improved features? One large global organization compared MOSS 2007 search with Google, using a test collection of half a million documents. From a relevancy standpoint, both gave similar results without using metadata cues. Three-fourths of the 500 users enrolled in the test said that MOSS 2007 search was better than what they had used before (a combination of SharePoint 2003 and a well-known enterprise search engine).

Testers especially liked three things: First, results were more informative because document summaries enabled users to determine what they were about, which was a big time saver. Second, the solution was simple and didn’t require users to learn how to search; they got reasonably good results by typing a word or phrase in the search box. Third, the MOSS solution integrated with desktop applications; SharePoint search is available in the upper right-hand corner of the Internet Explorer 7 browser and is integrated with Windows desktop search. (Note that users only see results that they are permitted to access.)

The search manager also reported that MOSS 2007 is easier to administer and maintain, though he said the index update process is still too time-consuming. He liked the variety of usage reports, especially the one that shows the most popular search terms that have no Best Bets assigned to them (i.e., the editors have not selected one or more documents or sites to display at the top of those results lists).
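
The report the search manager singled out can be approximated in a few lines of Python. This is only a sketch of the idea, not the MOSS report itself: it assumes the query log is available as a plain list of strings and that the Best Bet keywords are known as a set, and the function name is hypothetical.

```python
from collections import Counter


def top_terms_without_best_bets(query_log, best_bet_keywords, limit=10):
    """List the most frequent queries that have no Best Bet assigned."""
    counts = Counter(q.strip().lower() for q in query_log)
    covered = {k.strip().lower() for k in best_bet_keywords}
    uncovered = [(term, n) for term, n in counts.most_common() if term not in covered]
    return uncovered[:limit]


query_log = ["vacation", "Vacation", "expenses", "401k", "401k", "401k", "expenses"]
print(top_terms_without_best_bets(query_log, {"expenses"}))
```

Terms near the top of such a list are natural candidates for new Best Bets.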

Room for Improvement

Even those who like MOSS for search point out that there is still room for improvement. Here are some features they would like to see:

  • Wild card search—Enables a user to substitute one or more letters of a search word with an asterisk. Visitors can search effectively for names and products without knowing the exact spelling.
  • Video/image/audio search—Search audio and video files; display image search results as thumbnail images.
  • Support for "near" operator—Enables a user to enter two search terms and specify their proximity to each other (e.g., "defense near 2002" would find documents in which the words "defense" and "2002" occur within a specified number of words within the document text).
  • Document highlighting—MOSS highlights search terms in the list of results, but not in the target webpage.
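
For readers unfamiliar with the first and third items on this wish list, the following Python sketch shows what wildcard and "near" matching mean at the word level. It illustrates the concepts only and says nothing about how any add-on implements them; the function names and the default distance of 10 words are assumptions.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Split text into lowercase words (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def wildcard_match(pattern, text):
    """True if any word in text matches pattern, where * stands for any run of characters."""
    pattern = pattern.lower()
    return any(fnmatchcase(word, pattern) for word in tokenize(text))


def near(term_a, term_b, text, max_distance=10):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


print(wildcard_match("gra*f", "Report by Jean Graef"))
print(near("defense", "2002", "The defense budget for fiscal 2002 rose", max_distance=5))
```

A production engine would answer both kinds of query from positional data in its index rather than by rescanning document text as this sketch does.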

Many, if not all, of these features are available through third-party add-ons from vendors such as Coveo and Mondosoft Ontolica. Unlike other search engine vendors, which provide new features exclusively through the upgrade process, Microsoft encourages its customers to purchase enhancement packages created by independent developers. These add-ons, however, do increase the total cost of MOSS search deployment.

Influence of Strategy and Budget

MOSS search is especially compelling for those organizations that have standardized on Microsoft products as a way to reduce the cost of systems integration and support, or because Microsoft is a major business partner for software consulting services. On the other hand, MOSS is less appealing to organizations that subscribe to a best-of-breed strategy where products from multiple vendors, believed to be best at what they do, are purchased and then integrated.

Moreover, companies selecting MOSS tend to look at search as part of a single system in which people are key information resources; much of the firm’s intellectual capital resides locally in word processor documents, spreadsheets, and presentations (as opposed to centralized in formal document libraries); increasing knowledge worker productivity is an explicit goal; and many document and content management functions occur in semi-autonomous knowledge centers.

In other words, MOSS search is well-suited to organizations that have standardized on the Microsoft technology platform, use SharePoint for collaboration, have a decentralized organization structure, and are in knowledge-intensive industries such as R&D or software consulting.

Investments in Metadata and Taxonomies

Organizations that have invested in populating content with metadata and creating extensive taxonomies naturally want to leverage this effort to enhance enterprise search. MOSS search can use existing metadata in documents, as well as some relationships from an external thesaurus.

The MOSS search crawler will discover metadata embedded within documents, then use it to filter search results and display options in Advanced Search. However, the administrator must first map the crawled metadata elements to "managed properties" (attributes such as author, title, and URL that can be used in search scopes and queries). The Dublin Core metadata library comes with MOSS out of the box.

Some common metadata elements are mapped by default, but it’s also possible to create new managed properties for such attributes as customer name, customer service representative, or customer service region. Managed properties can be incorporated into document and site templates to make it easier to add metadata values at creation time, but MOSS provides no auto-categorization program to add metadata retrospectively to an existing document collection.
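
The mapping step described above can be pictured as a simple lookup table. The Python sketch below is an illustration only: the crawled field names, the managed property names, and both functions are hypothetical and do not reflect MOSS’s actual administration interface.

```python
# Hypothetical mapping from crawled metadata names to managed property names.
PROPERTY_MAP = {
    "dc:creator": "Author",
    "dc:title": "Title",
    "CustName": "CustomerName",
    "Region": "CustomerServiceRegion",
}


def to_managed_properties(crawled, property_map=PROPERTY_MAP):
    """Keep only crawled fields that have been mapped, renamed to their managed property."""
    managed = {}
    for name, value in crawled.items():
        target = property_map.get(name)
        if target is not None:
            managed[target] = value
    return managed


def filter_by_property(docs, prop, value):
    """Return documents whose managed property equals the requested value."""
    return [doc for doc in docs if doc.get(prop) == value]


crawled_docs = [
    {"dc:title": "Q3 account review", "CustName": "Acme", "Region": "EMEA", "Keywords": "review"},
    {"dc:title": "Support summary", "CustName": "Globex", "Region": "APAC"},
]
managed_docs = [to_managed_properties(doc) for doc in crawled_docs]
print(filter_by_property(managed_docs, "CustomerServiceRegion", "EMEA"))
```

Note that the unmapped "Keywords" field is dropped: in this model, crawled metadata that the administrator has not mapped cannot be used in a scope or query.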

Using a Thesaurus

With MOSS keywords and synonyms, it is possible to use some thesaurus data and relationships to expand a search or influence the order of documents in the results list. Keywords in a search engine context are somewhat different from terms in a thesaurus that is used for classification or browsing purposes. In a traditional thesaurus, there are preferred terms, nonpreferred (USE) terms, broader terms, narrower terms, and related terms. In the MOSS thesaurus file (used to expand or redirect a query), there are only three kinds of relationships:

  • Expand search: User enters "Internet Explorer," MOSS also returns documents containing "IE" and "IE7"
  • Replace query term: User enters "NT5," MOSS replaces it with "W2K"
  • Rank order: In the search results, MOSS ranks documents containing the word "automobile" with a weight of "1.0" ahead of those containing the phrase "beach wagon" with a weight of "0.7"

In MOSS you can also associate definitions with keywords.
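
The three relationships listed above can be modeled in a short Python sketch. The real MOSS thesaurus is an XML file, and this code does not reproduce its schema or its ranking behavior; the data structures, function names, and the 1.0 weights on the "Internet Explorer" group are assumptions made for illustration.

```python
import re

# Expansion groups: entering any member searches for all members, each with a weight.
EXPANSION_SETS = [
    {"internet explorer": 1.0, "ie": 1.0, "ie7": 1.0},  # weights assumed
    {"automobile": 1.0, "beach wagon": 0.7},            # weights from the example above
]

# Replacement rules: the entered term is swapped for another term.
REPLACEMENTS = {"nt5": "w2k"}


def rewrite_query(term):
    """Return {search_term: weight} after applying replacement or expansion."""
    key = term.strip().lower()
    if key in REPLACEMENTS:
        return {REPLACEMENTS[key]: 1.0}
    for group in EXPANSION_SETS:
        if key in group:
            return dict(group)
    return {key: 1.0}


def contains(text, term):
    """Whole-word (or whole-phrase) match, ignoring case."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def rank(docs, weighted_terms):
    """Order matching documents by the highest weight of any term they contain."""
    scored = []
    for doc in docs:
        score = max((w for t, w in weighted_terms.items() if contains(doc, t)), default=0.0)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = ["Beach wagon rentals", "Used automobile prices", "Boat sales"]
print(rewrite_query("NT5"))
print(rank(docs, rewrite_query("automobile")))
```

Here the document containing "automobile" sorts ahead of the one containing "beach wagon", and the unrelated document is not returned at all.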

It’s not possible to simply import a traditional thesaurus into the MOSS thesaurus XML format because they’re two different animals. For one thing, a search thesaurus (i.e., a list of synonyms) should contain words that real users will type in the search box. Their words are gathered from search logs; they’re not terms created by a professional indexer, although there will be some overlap. Another reason is that a traditional thesaurus may contain phrases such as "packaging law & legislation," while a search thesaurus should contain single words or, at most, two-word phrases. Finally, there’s no way to show broader or narrower relationships in search results (e.g., "see also" links or an expandable hierarchy of related topics). At least two organizations we know of have bumped into size and performance limitations with the MOSS thesaurus (Microsoft says there’s a 10 MB limit).

Changing the Order of Results

One major use of a thesaurus is to classify documents (i.e., assign terms to them). Organizations that have used a thesaurus in this way, either by using human indexers or an auto-categorization program, can leverage some of this work in MOSS through Best Bets and Authoritative Pages.

With Best Bets, MOSS administrators can associate keywords with specific webpages or sites. When a user types the keyword into the search box, MOSS displays those sites designated as Best Bets either at the top of the results list or in a sidebar and marks them with an icon, such as a star.
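
In outline, Best Bets amount to a keyword lookup that runs alongside the normal query. The Python sketch below assumes the top-of-list presentation rather than the sidebar; the keyword table, URLs, and function name are hypothetical.

```python
# Hypothetical editor-maintained table: keyword -> pages to promote.
BEST_BETS = {
    "expenses": ["http://intranet/finance/expense-policy"],
}


def with_best_bets(query, organic_results, best_bets=BEST_BETS):
    """Return (url, is_best_bet) pairs with Best Bets first and no duplicates."""
    pinned = best_bets.get(query.strip().lower(), [])
    rest = [url for url in organic_results if url not in pinned]
    return [(url, True) for url in pinned] + [(url, False) for url in rest]


organic = [
    "http://teams/sales/expenses-2007.xls",
    "http://intranet/finance/expense-policy",
]
for url, starred in with_best_bets("Expenses", organic):
    print("*" if starred else " ", url)
```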

With Authoritative Sites, administrators increase or decrease the relevance of content within search results by assigning one of four levels to a webpage or site: most authoritative, second-level authoritative, third-level authoritative, or sites to demote in the ranking. Sites that are not assigned an Authoritative Page level are weighted based on their "click distance" from an authoritative site. Click distance refers to the number of links between a page and an authoritative page linking to the content item.
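
Click distance as defined here is a shortest-path count over the link graph, which a breadth-first search computes directly. The Python sketch below assumes the link graph is available as a dictionary of outgoing links; the page names are hypothetical, and how MOSS turns a distance into a ranking weight is not covered.

```python
from collections import deque


def click_distances(links, authoritative):
    """Breadth-first search: fewest link hops from any authoritative page to each page."""
    distance = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:  # first visit is the shortest path in BFS
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


links = {
    "home": ["products", "about"],
    "products": ["widget"],
    "about": ["team"],
    "widget": ["spec"],
}
print(click_distances(links, ["home"]))
```

Pages that cannot be reached from any authoritative page are simply absent from the result, so a caller would need to decide how to weight them.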

So while it’s possible to tweak MOSS search results using a variety of techniques along with some data from an existing thesaurus, it’s a labor-intensive endeavor. For this reason, some organizations with large, complex taxonomies opt to purchase third-party thesaurus management software that integrates with SharePoint, an approach that Microsoft endorses. Examples of MOSS-compatible taxonomy management tools include Factiva Synaptica, Data Harmony Machine-Aided Indexer, Schemalogic SchemaServer, and Interse I-box.

A Consistent Experience

Users want a simple, effective way to search all available content collections—whether they reside in SharePoint, on the company’s intranet, in databases, or in external information services. The ideal is a single search box, a results page that contains relevant listings without duplicates, and a way to match user security profiles with content access levels in each content source.

Within MOSS, an administrator can create a Shared Services Provider (SSP) and instruct it to crawl all the content sources deemed necessary for a particular business function. Sources can include SharePoint content, the company intranet, database applications such as SAP and Oracle, and external information services such as FindLaw. The crawl results are stored in a single index, which makes the search relatively fast and efficient.

However, large organizations typically have multiple SSPs. To allow a user to search all of them from a single user interface, you can purchase a third-party application, such as Mondosoft’s Ontolica (see the federated search option on the Ontolica website), or you can select an enterprise search engine that can crawl and index SharePoint content. Examples include Autonomy, FAST, Longitude (BAInsight), Oracle, Recommind, Vivísimo, and others.
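
The single-interface goal can be sketched as a merge over several independent indexes. The Python code below is a toy model, not a description of any product named above: `make_index` stands in for one SSP’s index, scores are raw word counts, and it assumes scores from different indexes are comparable, which real federated search cannot take for granted.

```python
def make_index(entries):
    """Build a toy search function over {url: text}; score is the count of query-word hits."""
    def search(query):
        words = query.lower().split()
        results = []
        for url, text in entries.items():
            tokens = text.lower().split()
            score = sum(tokens.count(word) for word in words)
            if score:
                results.append((url, score))
        return results
    return search


def federated_search(query, indexes):
    """Query every index, keep the best score per URL, and sort by score."""
    best = {}
    for search in indexes:
        for url, score in search(query):
            if url not in best or score > best[url]:
                best[url] = score  # same item found twice: keep one entry
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


ssp_a = make_index({"http://a/one": "search search tips", "http://shared/doc": "search"})
ssp_b = make_index({"http://shared/doc": "search", "http://b/two": "budget"})
print(federated_search("search", [ssp_a, ssp_b]))
```

The shared document appears once in the merged list, which is the "without duplicates" part of the ideal described in this section.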

Is MOSS 2007 Right for You?

Organizations that use SharePoint for collaboration and content management should consider MOSS for enterprise search. Its tight integration with Microsoft applications (especially Office), its low cost, and its new search features make it a serious contender. Because MOSS is designed for bottom-up implementation, it’s important to get input from business units, as well as the enterprise search team and taxonomy manager, if there is one.

Several of our members at Montague have mentioned the effort needed to customize MOSS search and to set up interfaces to other business applications through the MOSS Business Data Catalog. Added to that is the cost of purchasing third-party programs for enhanced search features and taxonomy management. We suspect that for many organizations, the question is not "Should we use MOSS as our enterprise search engine?" but rather "What’s the best way to integrate our non-Microsoft enterprise search engine with MOSS?"

About the Author

JEAN GRAEF is a founder of the Society of Knowledge Base Publishers, a membership organization sponsored by the Montague Institute, which conducts research and development on issues of information productivity.

Oracle to acquire BEA Systems

Oracle and BEA Systems report that they have entered into a definitive agreement under which Oracle will acquire all outstanding shares of BEA for $19.375 per share in cash.

In a press release, Oracle CEO Larry Ellison said, "The addition of BEA products and technology will significantly enhance and extend Oracle's Fusion middleware software suite. Oracle Fusion middleware has an open 'hot-pluggable' architecture that allows customers the option of coupling BEA's WebLogic Java Server to virtually all the components of the Fusion software suite. That's just one example of how customers can choose among Oracle and BEA middleware products, knowing that those products will gracefully interoperate and be supported for years to come."

The board of directors of BEA Systems has unanimously approved the transaction. It is anticipated to close by mid-2008, subject to BEA stockholder approval, certain regulatory approvals and customary closing conditions.

Coveo Enhances Search Functionality

Coveo Solutions Inc., a global provider of secure enterprise search solutions, has added new capabilities to its Coveo Enterprise Search technology. The following capabilities are now available in Coveo Enterprise Search: greater scalability, with support for Windows 64-bit operating systems; new connectors for Symantec Enterprise Vault v2 that offer more flexibility and control; and out-of-the-box integration with Microsoft Exchange and Symantec Enterprise Vault email archives, which allows for integrated search across all corporate email content.

(www.coveo.com)  

Kazeon forms alliance with ProStor

ProStor, a developer of removable disk storage solutions for data protection and archive applications, and Kazeon, a provider of intelligent e-discovery solutions, have announced that ProStor's InfiniVault archive appliance has been certified to run with Kazeon Information Server software, helping IT departments improve information access, optimize storage utilization and cut costs within tiered storage infrastructures.

The Kazeon Information Server allows companies to perform e-discovery tasks by intelligently searching, classifying and acting on electronically stored information, enabling IT administrators to reduce the amount of effort spent manually gathering information. This approach allows IT and line-of-business managers to focus on higher-value activities such as managing regulatory compliance, information security, privacy and e-discovery efforts; tracking trends for better planning; reviewing utilization by users and groups; and attributing storage consumption to different departments for chargeback.

ProStor’s InfiniVault provides a digital archive and compliance solution that reduces storage costs by combining long-term data retention, regulatory compliance, automated data archiving and disaster recovery protection in a single appliance.

Consummate clustering

Vivisimo has announced new patent-pending technology--Remix clustering--to help users find new topics and gain insights related to their search queries.

The company reports its standard clustering organizes search results into topical folders on the fly, without any pre-processing of source documents, adding that clustering gives a quick overview of the main topics, enables easy access to valuable but low-ranked search results and brings together related documents for joint consideration.

Vivisimo’s Remix clustering is said to take the searcher's productivity to a new level: The user first sees clustered results in the usual style. Then, a single click on Remix reveals submerged or secondary topics that were not generated in the initial clustering. Further, the company reports, it works by feedback: If the user clusters the same search results again, it explicitly ignores the topics that the user already saw.

Remix clustering functionality is built into Vivisimo's Velocity 6.0.

Attenex and PSS Systems Announce Alliance

Attenex Corp., an e-discovery software provider, and PSS Systems, a provider of legal holds and enterprise retention management software, announced an alliance to provide enterprise customers with an end-to-end e-discovery solution. The joint solution is intended to provide corporate legal departments with a greater amount of transparency into the complex e-discovery process, including: end-to-end software tools for managing the e-discovery process from triggering event through review; document-level visibility and tracking from custodian collection to review and coding; reports that aggregate matter data from scope of hold to document production; and tools and information that help corporate legal departments better estimate costs, reduce data volume, and track custodian metrics.

(www.pss-systems.com, www.attenex.com)  

National Institutes of Health Selects Collexis

Collexis Holdings, Inc., a developer of high-definition search and discovery software, announced that the National Institutes of Health (NIH) has selected the company to develop an Advanced Expert Profiling System that will allow more than 8,000 NIH researchers to connect and share their expertise via a web interface. The agreement represents an expansion of Collexis' relationship with the NIH. Previously, the NIH signed an Enterprise License Agreement with Collexis, under which more than $20 billion in grant funding is analyzed using the company’s software platform.

Collexis will compile expert profiles that will then be integrated with Medline, the National Library of Medicine's bibliographic database, which holds references for more than 16 million journal publications in the life sciences. Collexis has designed a virtual knowledge directory for the NIH based on its proprietary next-generation search technology. The directory will provide researchers with immediate access to expert knowledge, objective profiling of that knowledge, a quick and seamless means of locating NIH experts most relevant to the issue at hand, and complete profiles of those experts.

(www.collexis.com)  

Lycos Incorporates Pixsy

Pixsy Corporation, a media search platform that powers private label video and image search engines, announced a distribution deal with Lycos, Inc., a provider of social publishing, media, and search services. Under the terms of the agreement, Lycos will use Pixsy's advanced media search platform to offer video and image search to its network of owned and operated properties and affiliates. Pixsy's media search technology enables Lycos users to search a vast index of the web's latest videos and images.

(www.pixsy.com, www.lycos.com)  

Silobreaker Launches New Service

Silobreaker, a search service for news and current events, announced the official release of its new service. Silobreaker aims to deliver relevant results by looking at the data it finds the way a person does. It recognizes people, companies, topics, places, and keywords; understands how they relate to each other in the news flow; and puts them in context for the user. The graphical search results enable users to understand connections, trends, and topics or navigate deeper into the stories most relevant to them. Silobreaker pulls content on global issues, science, technology, and business from approximately 10,000 news, blog, research, and multimedia sources.

(www.silobreaker.com)  

Researching the power of semantic technology

Expert System has announced a collaborative agreement with Dartmouth College involving use of its semantic software, which searches, classifies and analyzes text information.

The partnership focuses on the use of the technology for managing unstructured information, that is, the natural language in which people speak and communicate. The research will examine in particular the effectiveness and accuracy of semantic intelligence in the analysis of large volumes of documents (reports, e-mail, etc.), according to a press release from Expert System.

Paul Thompson, a professor in the Computer Science Department at Dartmouth, says, "We are using Expert System's semantic technology for research on electronic discovery, or e-discovery, the process which takes place when two or more organizations engaged in litigation request access to each other's documents. We are participating in a U.S. government-sponsored benchmark evaluation of e-discovery technology to determine whether advanced information retrieval technology, such as semantic search, can provide better performance than the current state-of-the-art."

Stefano Spaggiari, CEO of Expert System, says, "Joint technical development initiatives are the most interesting opportunity to create stronger and smarter relationships between university researchers and leading corporations."
