Before Google's rise to dominance in the consumer search space, portals provided more than a search box. Perhaps best exemplified by Yahoo!, most portals provided a browseable taxonomy of the Web. Users were able to find what they were looking for as they do now, by entering search terms; but they could also click their way to content, browsing the subject hierarchy.
This format has all but disappeared from the IT landscape. The change may be due to the appeal of Google's minimalist aesthetic, or the relative difficulty of maintaining a taxonomy; maybe there are other factors. Whatever the reason, search has largely replaced browse. Powerful desktop search tools offer easier alternatives to file system browsers like Windows Explorer or the Mac Finder. Email systems provide convenient search-as-you-type functionality to locate messages in crowded inboxes. Web browsers have built-in search boxes and incremental search functions. As enterprises are scrambling to find the best search vendors to handle their information retrieval needs, we are left wondering: What happened to browse? And are we really better off without it?
Why Search Isn't Enough
Search is a powerful tool, and its power is constantly growing. Sophisticated algorithms, faster processors, and cheap storage allow near-instant searching of huge volumes of content. Intelligently categorized results, smart summarization, and the ability to find related information—tools developed in the post-browse world—further increase the scope of search.
But one thing hasn't changed. A search always begins with a query—a small chunk of content intended to guide a user to a broader destination. No matter how sophisticated, every search result is the answer to a content-based question. But not all questions are content-based. Put another way, search requires that you know what you're looking for, and sometimes you don't.
Here are some common information-retrieval questions that are not content-based:
- What content is available?
- What's new and important in my inbox?
- What should I do next?
- Where is that thing I was looking at the other day?
- What's happening in our CRM system?
- Which projects need the most attention?
- Who's online? Where are they?
These are key questions that search can't answer, and the need for tools to find the answers is growing. Hundreds of new emails, files on shared servers, blogs, RSS feeds, CRM databases, paper trails: These are all unread documents whose content cannot be queried because it is unknown.
Browse, on the other hand, does not require foreknowledge of content. It shifts that responsibility from the user onto the system. This certainly presents challenges, but they are not insurmountable. And due to common vocabularies and taxonomies, the enterprise is often an easier target for browse technologies than the consumer space. Given proper attention, browse tools can become as sophisticated as their search counterparts.
By excluding browse from their information retrieval landscape, enterprises are doing themselves and their employees a disservice.
Browse is not universally ignored. In both the consumer and enterprise spaces, innovation continues on many fronts.
Faceted navigation systems (such as those produced by Endeca and Siderean) provide sophisticated content filtering. Each document or object is defined by a set of facets: things like color, price, size, or manufacturer in an ecommerce context. Users select one facet after another, narrowing their results until they find what they want. Faceted navigation systems are often more powerful and flexible than traditional taxonomies because they're not hierarchical: Selecting a color followed by a price is identical to selecting a price followed by a color. This results in multiple paths to the same data and thus greater flexibility in accommodating how different users think.
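The order-independence described above can be sketched in a few lines of Python. This is a minimal illustration, not how any vendor's product works: the catalog, the facet names and values, and the helper functions narrow and facet_counts are all hypothetical.

```python
from collections import Counter

# Hypothetical catalog: each item is described by a flat set of facets.
CATALOG = [
    {"name": "Trail shoe", "color": "red", "price": "under $50", "manufacturer": "Acme"},
    {"name": "Road shoe", "color": "red", "price": "$50-$100", "manufacturer": "Acme"},
    {"name": "Sandal", "color": "blue", "price": "under $50", "manufacturer": "Birch"},
    {"name": "Boot", "color": "red", "price": "under $50", "manufacturer": "Birch"},
]


def narrow(items, facet, value):
    """Keep only the items whose facet has the selected value."""
    return [item for item in items if item.get(facet) == value]


def facet_counts(items, facet):
    """Count how many of the remaining items carry each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


# Two different click paths through the same facets.
color_then_price = narrow(narrow(CATALOG, "color", "red"), "price", "under $50")
price_then_color = narrow(narrow(CATALOG, "price", "under $50"), "color", "red")

# Both paths arrive at the same items, which is what makes facets non-hierarchical.
print(color_then_price == price_then_color)
print(facet_counts(color_then_price, "manufacturer"))
```

Because each selection is just a filter over a flat set of attributes, the filters commute. The counts returned by facet_counts are the kind of figure an interface could show next to each remaining choice so that users can see where a click will lead.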
These systems can be combined with search. Each can be used as a second step to narrow the other's results. Or, in a more flexible implementation, the user can switch back and forth at will. And, in general, faceted navigation systems are mature and are already integrated into numerous sites and applications.
But faceted navigation requires facets for each document in the collection. These are often difficult, and sometimes impossible, to add manually, particularly for a preexisting collection of thousands or millions of facet-free documents. Still, hope is in sight: There are technologies that promise to automate this process. (See Categorizers below for some examples.)
Metadata and Taxonomies
Even without the addition of new technologies, existing data and tried-and-true techniques can be used to supplement and enhance search.
Metadata is becoming ubiquitous. Microsoft Office, Adobe Creative Suite, email systems, digital cameras, and other tools embed metadata directly in documents. Some vendors (such as Siderean) can improve its accuracy and extract additional metadata. Search and indexing systems read this metadata. Yet most applications embed less metadata than is available or can be indexed. Enterprises can take advantage of this by integrating more metadata into document creation and other workflows. This can be done manually (by document authors or other workers) as well as automatically via a categorizer.
Browse and search interfaces can take greater advantage of existing metadata. Every document has basic information such as modification date; many documents incorporate more, most notably the author. Some documents (digital photographs, for example) include a rich collection of information. Providing users with a way to browse these fields can assist them in finding information. And with indexing services built into all major operating systems, IT departments and integrators don't have to reinvent the wheel or build a complete custom solution.
Org charts, classification systems, and other preexisting taxonomies can also aid in enhancing browse-based information retrieval. Enterprises should encourage authors and administrators to classify documents within these taxonomies. And metadata can be combined with them to yield more powerful browse capabilities without significant effort. For example, combining the author information present in most Microsoft Office documents with a corporate org chart would allow intranet users to browse documents by department and would not require retrofitting or additional effort by individual document authors.
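As a rough sketch of the org chart example, the following Python joins author metadata to a department lookup. Everything here is hypothetical: the names, the file list, and the ORG_CHART mapping stand in for whatever an indexing service and an HR system would actually supply.

```python
from collections import defaultdict

# Hypothetical org chart: author name -> department.
ORG_CHART = {
    "A. Jensen": "Engineering",
    "B. Olsen": "Marketing",
    "C. Holm": "Engineering",
}

# Hypothetical (file name, author) pairs, as an indexer might report them.
DOCUMENTS = [
    ("roadmap.doc", "A. Jensen"),
    ("launch-plan.ppt", "B. Olsen"),
    ("api-notes.doc", "C. Holm"),
    ("untitled.xls", "D. Unknown"),
]


def browse_by_department(documents, org_chart):
    """Group documents under the department of their recorded author."""
    index = defaultdict(list)
    for filename, author in documents:
        # Authors missing from the org chart land in a catch-all bucket.
        index[org_chart.get(author, "Unassigned")].append(filename)
    return dict(index)


BY_DEPARTMENT = browse_by_department(DOCUMENTS, ORG_CHART)
for department, files in sorted(BY_DEPARTMENT.items()):
    print(department, files)
```

The point of the design is that authors do nothing new: the author field already exists in the documents, and the department comes from a structure the enterprise already maintains.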
Tags and Folksonomies
In the consumer space, user-driven tagging is gaining popularity. Sites like Flickr and Del.icio.us rely on users to categorize the content they submit by applying tags to it. For example, a user uploading vacation photos to Flickr (a photo-sharing site) might type tags like "vacation" or "Denmark." As thousands of users supply tags (often identical for similar content), it becomes possible to browse and search across users. For example, viewing the "Denmark" tag on Flickr will yield vacation photos from multiple users who visited Denmark. This ad hoc categorization scheme is often referred to as a "folksonomy," though it differs from a typical taxonomy in its nonhierarchical nature.
While browse capabilities are very simple in popular folksonomy applications (owing in part, no doubt, to their consumer focus), something more robust like a faceted navigation system could easily be built on top of them, providing filter-style narrowing of content by clicking on one tag (facet) at a time. In effect, the tagging facility then solves the problem of generating facets.
Many tagging sites have introduced an intriguing browse interface, the "tag cloud." This simple visualizer shows a collection of linked tags, each sized according to the number of documents that use it. It's interesting not only for its utility, but also for its simplicity. And again, it could serve as the front end for a multi-level, tag-based filtering system without undue development effort.
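The sizing rule behind a tag cloud is simple enough to sketch. The Python below assumes a linear scale between a minimum and maximum font size; that scaling choice, the point sizes, and the sample photo tags are assumptions made for illustration, and real sites may scale differently.

```python
from collections import Counter


def tag_cloud(tagged_documents, min_size=10, max_size=32):
    """Map each tag to a font size based on how many documents use it."""
    # Count each tag at most once per document.
    counts = Counter(tag for tags in tagged_documents for tag in set(tags))
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    spread = high - low
    sizes = {}
    for tag, count in counts.items():
        # Linear interpolation; if every tag is equally common, use max_size.
        scale = (count - low) / spread if spread else 1.0
        sizes[tag] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical tags on three uploaded photos.
PHOTOS = [
    ["vacation", "denmark"],
    ["vacation", "denmark", "beach"],
    ["vacation"],
]

SIZES = tag_cloud(PHOTOS)
print(SIZES)
```

With these three photos, "vacation" is used most and gets the largest size, "beach" is used once and gets the smallest, and "denmark" falls halfway between.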
Tagging has significant drawbacks, most notably its ad hoc nature. There is no guarantee that a distributed group of users will generate an effective categorization scheme. Users looking for information must rely on untrained users to supply sufficient, appropriate tags and to use consistent terminology. But enterprises enjoy a shared vocabulary of projects, products, and methods, and shared jargon that could make an intra-organizational folksonomy more effective.
An enterprise might also increase its folksonomy's power and accuracy by using it to supplement an existing corporate taxonomy. For example, workers might use the official taxonomy to begin locating information; where the taxonomy left off they could use the folksonomy to narrow their results further. The two schemes could be provided together for more free-form filtering. An enterprise could even relate the two schemes proactively, associating terms based on frequency of co-occurrence to clean up inconsistencies in tagging.
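One way to relate the two schemes by co-occurrence might look like the following Python sketch. It assumes each document carries one official taxonomy term plus free-form tags; the sample data, the function names, and the 60 percent threshold are all hypothetical choices, not an established method.

```python
from collections import Counter, defaultdict

# Hypothetical documents: (official taxonomy term, user-supplied tags).
DOCS = [
    ("Products/Widgets", ["widget", "specs"]),
    ("Products/Widgets", ["widgets", "specs"]),
    ("Products/Widgets", ["widget"]),
    ("HR/Benefits", ["benefits", "widget"]),
]


def cooccurrence(docs):
    """Count how often each tag appears on documents filed under each term."""
    table = defaultdict(Counter)
    for term, tags in docs:
        for tag in set(tags):
            table[tag][term] += 1
    return table


def strongest_term(table, tag, min_share=0.6):
    """Return the taxonomy term a tag most often accompanies, if dominant."""
    counts = table.get(tag, Counter())
    total = sum(counts.values())
    if not total:
        return None
    term, hits = counts.most_common(1)[0]
    # Only associate the tag when one term accounts for enough of its uses.
    return term if hits / total >= min_share else None


TABLE = cooccurrence(DOCS)
for tag in sorted(TABLE):
    print(tag, "->", strongest_term(TABLE, tag))
```

Here the inconsistent tags "widget" and "widgets" both resolve to the same official term, which is the kind of signal an administrator could use to merge variants or suggest a preferred tag.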
Categorizers
Providing enough content information to make browsing effective is a key challenge for all browse systems. Traditionally, it has required manual categorization according to a preexisting taxonomy. One reason for the relative obscurity of browse is probably the difficulty of providing the metadata to drive it, particularly in constantly changing, distributed, or user-controlled databases such as the Web, email, or a user's desktop.
Automated categorization and other text processing technologies provide some hope here. There are numerous approaches, but all provide some way for computers to discern patterns in document collections. While these systems are still evolving and are typically far from being perfectly accurate, they nonetheless allow automation of the categorization process. (And it is worth noting that manual categorization incurs human error and is itself far from 100% accurate.) The picture gets rosier in the enterprise: Most of these technologies work much better when restricted to a particular domain (i.e., the industry or jargon of a given company) or paired with an existing taxonomy.
For an example of a categorizer applied to Web search, try Vivisimo's Clusty (http://clusty.com). There are numerous other vendors of categorization software including Autonomy, ClearForest, Data Harmony, Factiva, Gammasite, Inxight, Lexalytics, Nstein, Recommind, Schemalogic, Semagix, Siderean, Stratify, Temis, and Teragram. Siderean is focused specifically on extracting facets for faceted navigation. A number of open source efforts exist as well. And many vendors incorporate categorization into a larger search product.
For a more detailed look at categorizers, see the 2004 IDC Insight, "Why Categorize?" (IDC document #31717). For an overview of text mining software, see the 2005 IDC Opinion, "Text mining: Mining for Gold in Unstructured Information" (IDC document #CA1503SWD, http://idc.com/getdoc.jsp?containerId=CA1503SWD).
Menus, Indexes, and Site Maps
Intranets and other Web sites have built-in navigational hierarchies. Enterprises should focus on making them better and exposing them effectively. Ideally, a user arriving at an intranet should see a clear browse path to her destination, and users frustrated by both navigation and search should have some recourse in the form of a comprehensive site map or index. All too often search becomes a de facto navigation tool when better navigational hierarchies would be more effective. And good navigation involves good categorization, which in turn aids other browse and search interfaces.
More adventurous (or overloaded) enterprises might look into browsable "visualizers," tools that provide a graphical representation of a document collection that allows users to see various aspects of it and drill down to find individual documents. Such tools—also referred to as "dynamic query" systems for the way they break down the query/result barrier—range from the simple tag cloud described above to complex three-dimensional models and incorporate research going back over 20 years. Many are approachable and potentially useful to workers with limited technical backgrounds. Some are commercial, while others are in the public domain.
Search and browse are complementary, and the ideal information retrieval system incorporates both. Some search vendors do not provide browse functionality, but many do. Search technology should be selected based in part on how well it provides or works with browse tools. It may be helpful to develop a list of common information retrieval questions specific to a workforce and to determine which are "search" and which are "browse." Bringing this list to potential search vendors, or using it as a guide, can help in selecting an appropriate technology. More generally, consider vendors whose technologies integrate well with other systems so that existing and future information can be leveraged effectively. And remember that a hybrid solution is an option: Sometimes the best strategy combines the offerings of multiple vendors.
Browse is too often overlooked. Enterprise workers need effective ways to answer key information questions; search can't always help. A better information retrieval strategy, one that combines the benefits of search and browse, will increase worker productivity and bolster the bottom line. The landscape of browse options is varied, ranging from tried-and-true techniques that any enterprise can put into practice to innovative solutions from trusted vendors. Browse requires greater attention to content indexing and categorization, but products and methods exist to automate the process, at least in part. And with research in these areas advancing, the options are continuing to improve. Enterprises that consider browse as part of an integrated information retrieval solution will see concrete benefits; enterprises that do not will see information, time, and money lost.
David Feldman is principal consultant for InterfaceThis, a software and user interface design firm. E-mail: email@example.com