At the Montague Institute, we make a living through our content. Unlike many organizations, we do not use a search engine to find a needle in the intranet haystack. Instead, we use it as one of several discovery options in an environment where authors and editors pay close attention to document selection, preparation, and metadata.
The Montague Institute publishes briefings, course books, and two periodicals: the Montague Institute Review and the Knowledge Base Editor's Digest. Our public website contains abstracts of articles in the Review, while the full text is available to members on a passworded website.
Institute content, on the web since 1995, has reference value for both the public and members. Many people continue to read articles that are several years old. Over time, vendor names change and new buzzwords appear. Part of the value we add is helping readers navigate this changing landscape through cross references and definitions.
How to Give Readers the Best of Both Worlds
By 1999, we realized that both our authors and readers needed a better way to find articles on a specific topic, vendor, or concept. Our CFO, who remembered the value of indexes from his days as a Ph.D. candidate, suggested that we create a topical A-Z index for the Review and other website content. To do this, of course, we needed to develop a list of terms and associate them with web pages. As the index evolved, we added thesaurus terms (cross references) and definitions.
For some tasks, such as finding a known item or a unique word quickly, a search engine works better than an index. For a while, we used the free Google search code on our site, but we soon developed a list of must-have features that the free Google box did not offer:
- Create a list of "Best Bets"—editor-selected pages that display first in the list of search results;
- Customize the search page—e.g., remove the Google logo and add a link to our A-Z index;
- Customize the results page—bypass automatically created document summaries and use our own descriptions;
- Search member content—add a members-only search box that could access the full text of articles on the passworded website;
- Add topics and cross references—add "see also" references and a list of related topics to the search results.
Selecting a Search Engine
In looking for a replacement for the free Google search, we had three basic requirements:
- Leverage our existing metadata to make search more accurate and give users more discovery options.
- No programming required.
- Low cost.
When we went shopping in 2001, Ultraseek (then called Inktomi) was the obvious choice. We bought the Content Classification Engine (at that time an extra-cost option) to provide a topic hierarchy, breadcrumb trail, and related-topic display in the search results.
Installing Inktomi and getting it to crawl our content was relatively easy. Customizing the "look and feel" of the pages was more difficult because it involved changing the complex code of the Inktomi search and results pages. With the current version of Ultraseek, this task is much easier and can be done by making choices in a web-based "style editor." Few code changes are now required.
After the initial cosmetic changes, we turned our attention to customizing Ultraseek's behavior in the following ways:
Searching public and member content. We wanted nonmembers to find out that an article exists, but we did not want them to be able to access the full-text version of it. To do this, we configured two Ultraseek collections: one with article abstracts and all other content on our public website, and the other with full-text articles on a passworded site.
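The two-collection arrangement can be illustrated with a minimal Python sketch. This is not Ultraseek configuration; the collection names, URLs, sample text, and the `search` function are all hypothetical and only show how a search box can be tied to one collection so that nonmembers match abstracts but never full text.

```python
# Hypothetical collections: abstracts on the public site, full text on the member site.
COLLECTIONS = {
    "public": {
        "/abstracts/taxonomy.html": "Abstract: why taxonomies matter.",
    },
    "members": {
        "/members/taxonomy-full.html": "Full text: why taxonomies matter, with case studies.",
    },
}


def search(term, is_member, collections=COLLECTIONS):
    """Return URLs whose text contains the term, searching one collection only."""
    # The members-only search box queries full text; the public box queries abstracts.
    name = "members" if is_member else "public"
    needle = term.lower()
    return [url for url, text in collections[name].items() if needle in text.lower()]
```

In this sketch a nonmember searching for a phrase that occurs only in the full text gets no hits, while a search on a word from the abstract still reveals that the article exists.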
Reorganizing content. We reorganized the folders on our site to isolate the "wheat" (articles and substantive pages) from the "chaff" (nonessential pages like navigation links). Then we instructed Ultraseek to crawl only the folders containing substantive content.
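Restricting a crawl to selected folders amounts to a path-prefix test. The sketch below is a generic illustration, not Ultraseek's own filter syntax; the folder names and the `should_crawl` helper are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical folders holding the "wheat" (articles and substantive pages).
ALLOWED_FOLDERS = ("/review/", "/briefings/")


def should_crawl(url, allowed_folders=ALLOWED_FOLDERS):
    """Return True only when the URL path sits inside an allowed folder."""
    path = urlparse(url).path
    # str.startswith accepts a tuple, so one call checks every allowed folder.
    return path.startswith(allowed_folders)
```

Once substantive pages live in their own folders, a short allow list like this replaces a long and fragile list of pages to exclude.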
Entering topics and "Best Bets." We entered rules that told Ultraseek how to select documents and "Best Bets" for each topic created by the Content Classification Engine (CCE). This task was made easier by the effort we had already put into selecting and classifying documents for the A-Z index.
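As a rough illustration of what a topic rule does, the following Python sketch pairs each topic with keywords that select documents and a list of editor-chosen Best Bets that are shown first. The rule format, topic name, URLs, and `documents_for_topic` function are hypothetical; the CCE has its own rule syntax, which is not reproduced here.

```python
# Hypothetical rule table: one entry per topic.
TOPIC_RULES = {
    "Taxonomies": {
        "keywords": ["taxonomy", "classification"],
        "best_bets": ["/review/taxonomy-basics.html"],
    },
}


def documents_for_topic(topic, documents, rules=TOPIC_RULES):
    """Return URLs for a topic: Best Bets first, then keyword matches."""
    rule = rules[topic]
    # Only promote Best Bets that actually exist in the document set.
    best_bets = [url for url in rule["best_bets"] if url in documents]
    matches = [
        url
        for url, text in documents.items()
        if url not in best_bets
        and any(keyword in text.lower() for keyword in rule["keywords"])
    ]
    return best_bets + matches
```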
Customizing search results. By default, Ultraseek displays a computed summary, the URL, file size, publisher, relevancy percentage, and a "find similar" link for each document it finds. We wanted to omit the file size, relevancy percentage, and "find similar" link, replace the computed summary with a description composed by our editors, and show a true publication date (not the last-modified date) in place of the publisher. This involved making sure both the description and publication date were entered as metadata elements in each document and then telling Ultraseek where to find them.
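The effect of this customization can be sketched in a few lines of Python. The field names (`description`, `publication_date`, `computed_summary`) and the `render_result` function are assumptions for illustration, and the fallback to the computed summary when no editor description exists is our own addition, not something Ultraseek is known to do.

```python
def render_result(doc):
    """Build one plain-text result entry from editor-supplied metadata."""
    # Prefer the editor-written description; fall back to the computed
    # summary only when no description was entered (an assumption here).
    description = doc.get("description") or doc.get("computed_summary", "")
    lines = [doc["title"], description]
    # Show the true publication date rather than a last-modified date.
    if doc.get("publication_date"):
        lines.append("Published: " + doc["publication_date"])
    lines.append(doc["url"])
    return "\n".join(lines)
```

File size, relevancy percentage, and the "find similar" link simply have no place in the rendered entry.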
Entering document metadata. Until recently, Ultraseek looked for metadata within each document. That meant adding metadata tags and values to all the pages that we wanted to appear in the search results. At the time, we used a program called Metabot, which scanned each web page and put existing metadata into a spreadsheet format. Each web page was displayed as a row, and each metatag was displayed as a column. You could add new metatags (columns) and new values in each cell. When you saved your work, Metabot would automatically insert the metatags into the right document. Eventually, we learned how to do the same thing with a relational database. Today, Ultraseek can read metadata directly from an external database, obviating the need to insert tags into documents.
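The spreadsheet view described above (one row per page, one column per metatag) can be approximated with Python's standard library. This is a sketch of the idea only, not a reconstruction of Metabot; the `MetaTagParser` class, the helper functions, and the sample pages are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict of metatag names and values found in the HTML."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


def metadata_table(pages):
    """Return CSV text with one row per page and one column per metatag.

    `pages` maps each URL to its HTML source.
    """
    rows = {url: extract_meta(html) for url, html in pages.items()}
    columns = sorted({name for meta in rows.values() for name in meta})
    buffer = io.StringIO()
    # restval="" leaves a blank cell where a page lacks a given metatag.
    writer = csv.DictWriter(buffer, fieldnames=["url"] + columns, restval="")
    writer.writeheader()
    for url, meta in rows.items():
        writer.writerow({"url": url, **meta})
    return buffer.getvalue()
```

Blank cells in the resulting table show at a glance which pages still need a description or publication date.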
Creating a thesaurus file. Ultraseek can read a table of equivalents ("thesaurus") to expand a search. Typically this file is used for acronyms, spelling variations, and synonyms. We exported the "see also" terms from our A-Z index database in an XML format that Ultraseek can read.
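Query expansion from a table of equivalents works roughly as follows. The sketch uses a plain Python dictionary rather than Ultraseek's XML thesaurus format, which is not reproduced here; the sample terms and the `expand_query` function are hypothetical.

```python
# Hypothetical table of equivalents: acronyms, spelling variations, synonyms.
THESAURUS = {
    "km": ["knowledge management"],
    "taxonomy": ["taxonomies", "classification scheme"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the normalized query followed by any equivalent terms."""
    terms = [query.strip().lower()]
    for equivalent in thesaurus.get(terms[0], []):
        if equivalent not in terms:
            terms.append(equivalent)
    return terms
```

A search engine would then match documents containing any of the returned terms, so a reader who types an acronym also finds pages that spell it out.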