By Jean Graef
Microsoft is a household word, rivaled only by Google for mind share with "everyman" in the technology sector. However, unlike Google, the business name Microsoft is not synonymous with search. Yet while Google rules the wide-open web, many organizations are heavily invested in Microsoft technologies behind the firewall. Therefore it is worth considering whether or not the search component of Microsoft’s SharePoint suite is a viable option for enterprise search.
Here at the Montague Institute, some of our members have already chosen it, some have tried it and rejected it, and many more are considering it a serious contender. In its "Magic Quadrant for Information Access Technology," Gartner lists it along with Google in its Challenger category (September 2007). The reason is that, with the 2007 release of the product (now called Microsoft Office SharePoint Server or MOSS), SharePoint search now includes most of the basic features we’ve come to expect in enterprise search, along with low cost and tight integration with existing SharePoint installations and other Microsoft applications. As one person put it, "It isn’t the best in class, but it’s good enough."
Whether you deploy MOSS for enterprise search depends on your technology strategy and budget, how much you’ve invested in metadata and taxonomies, and how you plan to search multiple content repositories. If you use SharePoint for collaboration and content management but choose another product for enterprise search, you’ll need to consider two kinds of complementary products: taxonomy management programs that integrate with MOSS search, and search engines that can search SharePoint content.
Either way, you’ll need a strategy that will integrate SharePoint’s bottom-up (decentralized) publication and management model with the top-down (centralized) enterprise search deployment model. You want users to be able to find resources (documents, websites, people) regardless of company location or technology, yet not be overwhelmed by the minutiae of documents generated by local collaboration.
MOSS 2007: A Big Improvement
MOSS 2007 search is a big improvement over the 2003 version. It provides the basic functionality we’ve come to expect from search engines, such as the following:
- Search "scopes" that allow users to broaden or narrow a search based on a content collection (e.g., intranet, department, or team)
- Security trimming, so users only see those results that they’re authorized to access
- Synonym suggestion (Did you mean …?) and term highlighting in search results
- Improved relevancy and ranking algorithms using such factors as click distance, hyperlink anchor text, URL depth, and metadata extraction
- Editorial control over which documents show at the top of search results through "Best Bets" and "Authoritative Pages"
- Greater control over what content is included in search results through crawl rules, immediate removal of any site or item from the search index, and multiple start addresses per content source
- Better usage reports: previously, administrators reported having trouble extracting data from search logs; now they can create their own reports from a variety of built-in templates
In addition, MOSS 2007 search has two new features. People search allows users to find employees not only by department or job title but also by expertise, social distance, and common interests; integration with Microsoft’s Active Directory saves implementation time, especially if the directory contains information about security levels and what people actually do. The Business Data Catalog allows users to search content stored in SAP, Oracle, and other databases.
How effective are these new and improved features? One large global organization compared MOSS 2007 search with Google, using a test collection of half a million documents. From a relevancy standpoint, both gave similar results without using metadata cues. Three-fourths of the 500 users enrolled in the test said that MOSS 2007 search was better than what they had used before (a combination of SharePoint 2003 and a well-known enterprise search engine).
Testers especially liked three things: First, results were more informative because document summaries enabled users to determine what they were about, which was a big time saver. Second, the solution was simple and didn’t require users to learn how to search; they got reasonably good results by typing a word or phrase in the search box. Third, the MOSS solution integrated with desktop applications; SharePoint search is available in the upper right-hand corner of the Internet Explorer 7 browser and is integrated with Windows desktop search. (Note that users only see results that they are permitted to access.)
The search manager also reported that MOSS 2007 is easier to administer and maintain, though he said the index update process is still too time-consuming. He liked the variety of usage reports, especially the one that shows the most popular search terms that have no Best Bets assigned to them (i.e., the editors have not selected one or more documents or sites to display at the top of those results lists).
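The logic behind that last report is simple enough to sketch. The following is a minimal illustration, not the MOSS report itself: it assumes the query log is available as a plain list of strings and that Best Bets can be represented as a keyword-to-URLs dictionary. Both structures, the function name, and the sample URLs are hypothetical.

```python
from collections import Counter


def popular_terms_without_best_bets(query_log, best_bets, top_n=10):
    """Return the most frequent queries that have no Best Bet assigned.

    query_log: iterable of raw query strings (hypothetical log format).
    best_bets: dict mapping a keyword to a list of editor-chosen URLs.
    """
    # Normalize case and whitespace so variants of the same query count together.
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    # A keyword only counts as covered if at least one URL is assigned to it.
    covered = {k.strip().lower() for k, urls in best_bets.items() if urls}
    uncovered = [(term, n) for term, n in counts.most_common() if term not in covered]
    return uncovered[:top_n]


# Illustrative data only.
log = ["expense form", "Expense Form", "holiday schedule", "vpn", "vpn", "vpn"]
bets = {"vpn": ["http://intranet.example/it/vpn"]}
print(popular_terms_without_best_bets(log, bets))
```

A list like this gives editors a prioritized to-do list: the terms at the top are the ones where a hand-picked result would help the most users.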
Room for Improvement
Even those who like MOSS for search point out that there is still room for improvement. Here are some features they would like to see:
- Wild card search: Enables a user to substitute one or more letters of a search word with an asterisk. Visitors can search effectively for names and products without knowing the exact spelling.
- Video/image/audio search: Search audio and video files; display image search results as thumbnail images.
- Support for "near" operator: Enables a user to enter two search terms and specify their proximity to each other (e.g., "defense near 2002" would find documents in which the words "defense" and "2002" occur within a specified number of words within the document text).
- Document highlighting: MOSS highlights search terms in the list of results, but not in the target webpage.
Many, if not all, of these features are available through third-party add-ons from vendors such as Coveo and Mondosoft Ontolica. Unlike other search engine vendors, which provide new features exclusively through the upgrade process, Microsoft encourages its customers to purchase enhancement packages created by independent developers. These add-ons, however, do increase the total cost of MOSS search deployment.
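To make two of the requested features concrete, here is a rough sketch of what wild card and "near" matching mean at the level of a single document. It is an illustration under simple assumptions (whitespace-and-punctuation tokenizing, exact word matches) and does not reflect how any of the add-on products implement these operators; the function names are hypothetical.

```python
import fnmatch
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def near(text, term_a, term_b, max_distance=10):
    """True if term_a and term_b occur within max_distance words of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


def wildcard(text, pattern):
    """Return the distinct words in text that match a pattern such as 'grae*'."""
    return sorted({t for t in tokenize(text) if fnmatch.fnmatchcase(t, pattern.lower())})


doc = "The defense budget was revised in 2002 after review."
print(near(doc, "defense", "2002", max_distance=5))
print(wildcard("Graef and Graeff wrote about SharePoint", "grae*"))
```

A production search engine would answer both kinds of query from a positional index rather than by scanning text, but the matching rules are the same.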
Influence of Strategy and Budget
MOSS search is especially compelling for those organizations that have standardized on Microsoft products as a way to reduce the cost of systems integration and support, or because Microsoft is a major business partner for software consulting services. On the other hand, MOSS is less appealing to organizations that subscribe to a best-of-breed strategy where products from multiple vendors, believed to be best at what they do, are purchased and then integrated.
Moreover, companies selecting MOSS tend to look at search as part of a single system in which people are key information resources; much of the firm’s intellectual capital resides locally in word processor documents, spreadsheets, and presentations (as opposed to centralized in formal document libraries); increasing knowledge worker productivity is an explicit goal; and many document and content management functions occur in semi-autonomous knowledge centers.
In other words, MOSS search is well-suited to organizations that have standardized on the Microsoft technology platform, use SharePoint for collaboration, have a decentralized organization structure, and are in knowledge-intensive industries such as R&D or software consulting.
Investments in Metadata and Taxonomies
Organizations that have invested in populating content with metadata and creating extensive taxonomies naturally want to leverage this effort to enhance enterprise search. MOSS search can use existing metadata in documents, as well as some relationships from an external thesaurus.
The MOSS search crawler will discover metadata embedded within documents, then use it to filter search results and display options in Advanced Search. However, the administrator must first map the crawled metadata elements to "managed properties" (attributes such as author, title, and URL that can be used in search scopes and queries). The Dublin Core metadata library comes with MOSS out of the box.
Some common metadata elements are mapped by default, but it’s also possible to create new managed properties for such attributes as customer name, customer service representative, or customer service region. Managed properties can be incorporated into document and site templates to make it easier to add metadata values at creation time, but MOSS provides no auto-categorization program to add metadata retrospectively to an existing document collection.
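The mapping step can be pictured as a simple lookup table. The sketch below is illustrative only: the crawled property names, managed property names, and functions are all hypothetical, and in MOSS the mapping is configured by the administrator rather than written as code.

```python
# Hypothetical mapping from crawled metadata names to managed property names.
CRAWLED_TO_MANAGED = {
    "dc:creator": "Author",
    "dc:title": "Title",
    "CustName": "CustomerName",
    "CSRegion": "CustomerServiceRegion",
}


def to_managed_properties(crawled, mapping=CRAWLED_TO_MANAGED):
    """Keep only crawled values that have been mapped, under their managed names."""
    return {mapping[name]: value for name, value in crawled.items() if name in mapping}


def filter_by_property(documents, prop, value):
    """Return documents whose managed property equals value (case-insensitive)."""
    wanted = value.lower()
    return [d for d in documents if str(d.get(prop, "")).lower() == wanted]


crawled = {"dc:creator": "J. Graef", "CustName": "Contoso", "Unmapped": "x"}
print(to_managed_properties(crawled))
```

The point of the sketch is the last dictionary entry: metadata that the crawler finds but that nobody has mapped is invisible to scopes and Advanced Search.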
Using a Thesaurus
With MOSS keywords and synonyms, it is possible to use some thesaurus data and relationships to expand a search or influence the order of documents in the results list. Keywords in a search engine context are somewhat different from terms in a thesaurus that is used for classification or browsing purposes. In a traditional thesaurus, there are preferred terms, nonpreferred (USE) terms, broader terms, narrower terms, and related terms. In the MOSS thesaurus file (used to expand or redirect a query), there are only three kinds of relationships:
- Expand search: User enters "Internet Explorer," MOSS also returns documents containing "IE" and "IE7"
- Replace query term: User enters "NT5," MOSS replaces it with "W2K"
- Rank order: In the search results, MOSS ranks documents containing the word "automobile" with a weight of "1.0" ahead of those containing the phrase "beach wagon" with a weight of "0.7"
In MOSS you can also associate definitions with keywords.
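The three relationships can be modeled in a few lines. This is a simplified sketch using the examples from the list above; the data structures and function names are hypothetical and do not reproduce the MOSS thesaurus file format or its ranking formula.

```python
import re

# Expansion groups: every member of a group retrieves the others, with a weight.
EXPANSIONS = [
    {"internet explorer": 1.0, "ie": 1.0, "ie7": 1.0},
    {"automobile": 1.0, "beach wagon": 0.7},
]
# Replacements: the typed term is swapped for its substitute before searching.
REPLACEMENTS = {"nt5": "w2k"}


def rewrite_query(query):
    """Return {term: weight} for the terms that would actually be searched."""
    term = query.strip().lower()
    term = REPLACEMENTS.get(term, term)
    for group in EXPANSIONS:
        if term in group:
            return dict(group)
    return {term: 1.0}


def rank(documents, weighted_terms):
    """Order documents by the highest-weight term they contain; drop non-matches."""
    scored = []
    for doc in documents:
        text = doc.lower()
        # Whole-word matching, so "ie" does not match inside "field".
        score = max(
            (w for t, w in weighted_terms.items()
             if re.search(r"\b" + re.escape(t) + r"\b", text)),
            default=0.0,
        )
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = ["A used beach wagon for sale", "New automobile models", "Field notes"]
print(rank(docs, rewrite_query("automobile")))
```

Note what is missing from the model: there is no slot for broader, narrower, or related terms.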
It’s not possible to simply import a traditional thesaurus into the MOSS thesaurus XML format because they’re two different animals. For one thing, a search thesaurus (i.e., a list of synonyms) should contain words that real users will type in the search box. Their words are gathered from search logs; they’re not terms created by a professional indexer, although there will be some overlap. Another reason is that a traditional thesaurus may contain phrases such as "packaging law & legislation," while a search thesaurus should contain single words or, at most, two-word phrases. Finally, there’s no way to show broader or narrower relationships in search results (e.g., "see also" links or an expandable hierarchy of related topics). At least two organizations we know of have bumped into size and performance limitations with the MOSS thesaurus (Microsoft says there’s a 10 MB limit).
Changing the Order of Results
One major use of a thesaurus is to classify documents (i.e., assign terms to them). Organizations that have used a thesaurus in this way, either by using human indexers or an auto-categorization program, can leverage some of this work in MOSS through Best Bets and Authoritative Pages.
With Best Bets, MOSS administrators can associate keywords with specific webpages or sites. When a user types the keyword into the search box, MOSS displays those sites designated as Best Bets either at the top of the results list or in a sidebar and marks them with an icon, such as a star.
With Authoritative Pages, administrators increase or decrease the relevance of content within search results by assigning one of four levels to a webpage or site: most authoritative, second-level authoritative, third-level authoritative, or sites to demote in the ranking. Sites that are not assigned an Authoritative Page level are weighted based on their "click distance" from an authoritative site. Click distance is the number of links a user must follow to get from an authoritative page to the content item.
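Click distance is a shortest-path count, which a breadth-first search over the link graph computes directly. The sketch below assumes the links are available as a dictionary of page URL to outgoing URLs (a hypothetical structure) and only computes the distances; how MOSS converts a distance into a ranking weight is not covered here.

```python
from collections import deque


def click_distances(links, authoritative_pages):
    """Breadth-first search from the authoritative pages over outgoing links.

    links: dict mapping a page URL to the list of URLs it links to.
    Returns {url: clicks from the nearest authoritative page}; pages that
    cannot be reached from any authoritative page are omitted.
    """
    distances = {page: 0 for page in authoritative_pages}
    queue = deque(authoritative_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


site = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/forms"],
    "/it": ["/hr/forms", "/it/vpn"],
    "/hr/forms": ["/hr/forms/leave"],
}
print(click_distances(site, ["/"]))
```

Designating a second authoritative page, such as a departmental home page, shortens the distance to everything beneath it, which is why the choice of authoritative pages matters so much to the final ordering.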
So while it’s possible to tweak MOSS search results using a variety of techniques along with some data from an existing thesaurus, it’s a labor-intensive endeavor. For this reason, some organizations with large, complex taxonomies opt to purchase third-party thesaurus management software that integrates with SharePoint, an approach that Microsoft endorses. Examples of MOSS-compatible taxonomy management tools include Factiva Synaptica, Data Harmony Machine-Aided Indexer, Schemalogic SchemaServer, and Interse I-box.
A Consistent Experience
Users want a simple, effective way to search all available content collections—whether they reside in SharePoint, on the company’s intranet, in databases, or in external information services. The ideal is a single search box, a results page that contains relevant listings without duplicates, and a way to match user security profiles with content access levels in each content source.
Within MOSS, an administrator can create a Shared Services Provider (SSP) and instruct it to crawl all the content sources deemed necessary for a particular business function. Sources can include SharePoint content, the company intranet, database applications such as SAP and Oracle, and external information services such as FindLaw. The crawl results are stored in a single index, which makes the search relatively fast and efficient.
However, large organizations typically have multiple SSPs. To allow a user to search all of them from a single user interface, you can purchase a third-party application, such as Mondosoft’s Ontolica (see the federated search option on the Ontolica website), or you can select an enterprise search engine that can crawl and index SharePoint content. Examples include Autonomy, FAST, Longitude (BAInsight), Oracle, Recommind, Vivísimo, and others.
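Whatever product does the federating, the core job is the same: query each index, drop what the user may not see, and collapse duplicates. The sketch below shows that flow under stated assumptions. Each index is represented by a hypothetical search function returning hits with "url", "score", and "allowed_groups" keys, and scores are treated as directly comparable across indexes, which is rarely true in practice.

```python
def federated_search(query, indexes, user_groups):
    """Query several independent indexes and merge the hits into one list.

    indexes: dict mapping an index name to a search function that returns
             hits as dicts with "url", "score", and "allowed_groups" keys.
    user_groups: set of group names the current user belongs to.
    """
    best_by_url = {}
    for name, search in indexes.items():
        for hit in search(query):
            # Security trimming: skip anything the user is not allowed to see.
            if not user_groups & set(hit["allowed_groups"]):
                continue
            # De-duplication: keep the highest-scoring copy of each URL.
            current = best_by_url.get(hit["url"])
            if current is None or hit["score"] > current["score"]:
                best_by_url[hit["url"]] = {**hit, "source": name}
    return sorted(best_by_url.values(), key=lambda h: h["score"], reverse=True)


# Stub indexes with illustrative data only.
def sharepoint_index(query):
    return [
        {"url": "http://intranet.example/policy.doc", "score": 0.9, "allowed_groups": ["staff"]},
        {"url": "http://intranet.example/salaries.xls", "score": 0.8, "allowed_groups": ["hr"]},
    ]


def file_share_index(query):
    return [
        {"url": "http://intranet.example/policy.doc", "score": 0.6, "allowed_groups": ["staff"]},
        {"url": "http://files.example/handbook.pdf", "score": 0.7, "allowed_groups": ["staff", "hr"]},
    ]


results = federated_search(
    "policy", {"sharepoint": sharepoint_index, "files": file_share_index}, {"staff"}
)
print([r["url"] for r in results])
```

The hard parts in a real deployment are the ones this sketch assumes away: normalizing relevance scores from different engines and mapping one user identity onto each repository’s security model.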
Is MOSS 2007 Right for You?
Organizations that use SharePoint for collaboration and content management should consider MOSS for enterprise search. Its tight integration with Microsoft applications (especially Office), its low cost, and its new search features make it a serious contender. Because MOSS is designed for bottom-up implementation, it’s important to get input from business units, as well as the enterprise search team and taxonomy manager, if there is one.
Several of our members at Montague have mentioned the effort needed to customize MOSS search and to set up interfaces to other business applications through the MOSS Business Data Catalog. Added to that is the cost of purchasing third-party programs for enhanced search features and taxonomy management. We suspect that for many organizations, the question is not "Should we use MOSS as our enterprise search engine?" but rather "What’s the best way to integrate our non-Microsoft enterprise search engine with MOSS?"
About the Author
JEAN GRAEF is a founder of the Society of Knowledge Base Publishers, a membership organization sponsored by the Montague Institute, which conducts research and development on issues of information productivity.