Enterprise Search Center

RESOURCES FOR EVALUATING ENTERPRISE SEARCH TECHNOLOGIES

June 10, 2009

Table of Contents

SharePoint: the backbone of your information architecture

Google Announces Google Wave; Plans to Enter Ebook Business

Wolfram Alpha Officially Launched

Top of the fourth

One Box Extender

A legal marriage

Exalead and The vdR Group Release Search-Based Application

Gabriels Launches Product for The New York Times Using Endeca

Enterprise search: a key business enabler

ZyLAB deepens e-discovery

The meaning of the matter

MetaVis launches visual tool

EBSCO Publishing Introduces NoveList Select and Nonprofit Organization Reference Center

Kalido Integrates Netrics

SharePoint: the backbone of your information architecture

SharePoint buyers expect intuitive navigation, contextual search and easy administration out of the box—but such benefits depend on how content is structured, labeled and categorized, and they require a nuanced understanding of how different audiences will navigate and search for information.

The information architecture (IA) behind a SharePoint deployment has lasting consequences for the user experience and for Web site management. Information and knowledge management (I&KM) professionals should use their SharePoint implementations as an opportunity to set solid information architecture in place that turns today’s information overload into tomorrow’s valuable information assets.

The upshot?

Information workers will finally be able to find the critical information they need to do their jobs.

For the past 10 years, information architects have worked through how to organize and present information on corporate intranets. Common best practices and design guidelines have emerged, which include prioritizing directory lookups, news, and financial and human resource (HR) information on the home page, as well as offering task-driven or process-oriented navigation—such as how to orient a new employee or how to move offices—in addition to functional navigation. Organizing and controlling the information on an intranet has historically fallen to a small team of stakeholders who update the site map, scope the search engine and design the navigation. That manual approach does not scale well to large enterprises with diverse needs.

Many enterprises unveil SharePoint to facilitate developing their intranets—better employee communication and shared access to team information. But unlike a simple intranet or collaboration solution, SharePoint also includes portal, Web content management and business intelligence capabilities. A project plan focusing on quick deployment of SharePoint workspaces may overlook critical information classification tasks necessary to make SharePoint effective as an enterprise intranet and knowledge management vehicle.

In particular, SharePoint has some distinctive elements that affect an enterprise’s information architecture. For one, SharePoint content is stored in a SQL Server database, not in a hierarchical file server. SharePoint sites are managed in one or more "site collections." By default, the content in each Office SharePoint Server 2007 Web application lives in a single site collection and is stored together in the same database. Enterprises typically divide their content into multiple site collections due to performance, storage and management concerns. A single site collection cannot be stored in multiple databases. Thus, the absence of a treelike site structure defies traditional navigation of content from root to leaf.

Also, site collections can be thought of as secure containers that hold content of a similar stripe. A site collection administrator has full access to everything in the collection. Administrators can manage security, create elements such as libraries and calendars, and organize content how they see fit. That distributed model means that as SharePoint sites grow virally, IT and the business may struggle to balance control and chaos. As a result, large enterprises must decide what they should make mandatory and consistent across sites, and what they can delegate to project-, team- or department-level administrators.

The bottom line is that SharePoint is more than just a portal server. Its wide coverage of information management tools requires a dedicated, cross-functional approach to governance. Given that those capabilities are integrated, I&KM pros have an opportunity to manage content with greater rigor and with more user participation than has been possible before.

SharePoint IA decisions affect key capabilities, not just content findability

The primary information architecture mechanism for MOSS 2007 is the site collection framework. Microsoft describes site collections as native containers of Office SharePoint Server 2007 sites and "the unit of ownership, quota and security management."

Basically, site collections are the linchpin of SharePoint information architecture. The way information is structured and stored affects its governance, security enforcement, disposition, accessibility and more.

Site collections affect operations like usage tracking, backup/restore abilities, storage quotas and security boundaries. A site collection’s Web parts, master pages and layouts, workflows, content types and templates control the common "look and feel" and functionality of its subsites. And, SharePoint’s navigation or site browsing structure, as well as search scopes, keywords and search "best bets," are set within site collection boundaries. Site collections offer extensive opportunity to manage metadata at multiple layers.

Add time for information architecture tasks to your project plan

It’s common to organize the sites by department structure and then department function (e.g., Purchasing>Contract negotiation) because existing security groups are often modeled with that hierarchy and it’s familiar to users. But information architects should make the most of their intuition about human behavior and skills in interface design, content analysis and technical know-how to challenge that status quo as needed. Some companies create site collections based on product names, client names or project names to offset the tradeoffs of hosting each department’s content in a separate site collection.

Site collections are just one piece of SharePoint information architecture. After determining how to structure SharePoint, I&KM pros must decide how to distribute universal information to multiple roles and groups, how to harmonize local and global metadata properties, and how to implement search.

There are two ways to get started on that. The first is to ask your users. Determine the boundaries of your user base: Does it include clients, partners, vendors, the whole enterprise or a limited subset of knowledge workers? Do geographic or functional boundaries matter? Interview a sample of users to understand what content they need and how they access it today.

The other is to analyze your content. Audit existing content stores to understand where high-value content lies and how it is organized. What content will be migrated to SharePoint, and how will you integrate what is not? How much content is duplicated? Is it templated and carefully managed throughout its life cycle? The answers to these questions will inform your decisions around content types, information management policy and metadata fields.

A rigorous approach to information architecture in the design phase is critical to facilitating flexible information delivery and access. SharePoint administrators translate the output of the design stage (e.g., paper prototypes and wireframes) into URL namespaces via "managed paths." Depending on circumstances, they might allow a single site collection under a specific path or allow users to create multiple top-level sites under a specific path.

Other mechanisms for contextual information access and delivery include audience targeting and search configuration. Audience targeting enables I&KM pros to define a subset of users by certain common criteria, such as a shared project or interest in a topic. Administrators can hide or show Web parts or target any item in a SharePoint list—like a news item—to defined audiences. As for search configuration, MOSS 2007 search can look across site collections, crawl shared drives and Web sites outside of SharePoint, map co-workers by "social distance" and retrieve data in line-of-business applications.

Further, search administrators can pick "authoritative pages" and assign best bets to popular search terms to optimize relevance. And remember, audience targeting and advanced search need clean, coherent metadata to run properly. Without significant commitment to taxonomy oversight, those capabilities will not work.
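
As a rough illustration of the audience targeting idea described above, the following sketch filters list items against a user's attributes. It is not SharePoint code: the User and Item classes, the attribute strings and the visible_items function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class User:
    """A hypothetical user with attributes such as projects or roles."""
    name: str
    attributes: FrozenSet[str]


@dataclass(frozen=True)
class Item:
    """A hypothetical list item; an empty audience set means 'show to everyone'."""
    title: str
    audiences: FrozenSet[str] = frozenset()


def visible_items(user: User, items: List[Item]) -> List[str]:
    """Return titles of items that are untargeted or targeted at one of the user's attributes."""
    return [
        item.title
        for item in items
        if not item.audiences or item.audiences & user.attributes
    ]


items = [
    Item("Company news"),
    Item("Apollo status report", frozenset({"project:apollo"})),
    Item("Manager briefing", frozenset({"role:manager"})),
]
engineer = User("jdoe", frozenset({"project:apollo"}))
shown = visible_items(engineer, items)
```

The sketch also shows why clean metadata matters: if one team tags its content "project:apollo" and another uses "Apollo", the set intersection silently finds nothing.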

Rigorous IA is the silver bullet for business content

While a rigorous approach to information architecture benefits a structured portal architecture, the benefits can also be extended to user-generated business content. Just as users struggle to find information in a portal setting, they also struggle to find relevant business content. Spreadsheets, presentations, documents and a host of other content are generated and thrown into a sea of hard drives, file servers and e-mail folders, more often than not, never to be seen again.

The aggregate cost of lost content can be tremendous. Applying a structured taxonomy to business data has long been one of the keys to tapping into its value. Yet the burden metadata tagging puts on users has led to disappointing adoption because most are accustomed to very lightweight storage tools like file servers.

While users embrace simple file servers, finding information after the fact presents a challenge because file servers only store two pieces of descriptive metadata: file name and folder label. (See Figure 1, Page 9, KMWorld, June 2009.) Frustrated by the inability to find information on file servers, organizations invested in content and knowledge management systems. Those systems provided the ability for extended application of metadata to content. However, the user experience suffered. (See Figure 2, Page 9, KMWorld, June 2009.)

Automating the application of metadata when business content is created, rather than asking users to manually apply metadata after the fact, may be the silver bullet. By leveraging rigorous information architecture principles, users can create SharePoint sites directly in an existing portal architecture. For example, a user starts developing finance-related content by starting a workspace from within the Finance section of SharePoint. (See Figure 3, Page 9, KMWorld, June 2009.) The custom site can adhere to best-practice workflow and approvals, and it can inherit metadata related to finance or the specific author. Thus, users interact with the system much like a file system without additional metadata input into the workspace.
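
A minimal sketch of that inheritance idea follows, assuming a simple parent/child site model. The Site class, the create_workspace helper and the sample metadata values (including the retention period) are hypothetical and do not represent SharePoint's object model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Site:
    """A hypothetical node in a portal hierarchy (not a SharePoint API object)."""
    name: str
    parent: Optional["Site"] = None
    metadata: Dict[str, str] = field(default_factory=dict)

    def effective_metadata(self) -> Dict[str, str]:
        """Merge metadata from the root down, so local values override inherited ones."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


def create_workspace(section: Site, name: str, author: str) -> Site:
    """Create a workspace under a section; only the author is recorded locally."""
    return Site(name=name, parent=section, metadata={"author": author})


# Assumed sample hierarchy: portal -> Finance section -> user workspace.
portal = Site("Intranet", metadata={"retention": "3 years"})
finance = Site("Finance", parent=portal, metadata={"department": "Finance"})
workspace = create_workspace(finance, "Q3 budget", author="jdoe")
tags = workspace.effective_metadata()
```

The user supplies nothing beyond a workspace name, yet the workspace carries the department and retention tags of the section it was created in.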

When content and people using SharePoint are classified in multiple ways, there is unlimited potential for users to find dynamic connections between content and people that were not preconceived by content creators. For example, teams in different regions may generate sales collateral for the same product. If that content is tagged with controlled metadata values, then a new teammate can find all existing sales-related content and expertise regardless of regional boundaries. The database structure behind SharePoint offers a hint of a future world less burdened by file formats and content storage.

Make the most of an information architecture blank slate

Many organizations are looking at SharePoint as a foundation for better management of organizational unstructured data. SharePoint has technical capacity to organize data in compelling and usable ways. The key to success is to create a strategy that allows users to quickly access and create information that is broadly reusable within your organization. The strategy will begin with an intelligent information architecture that is reflected in your site collection plan.

Extend the benefits to user-generated business content. The same logic that applies to finding information in a portal environment can be extended to business content. Make your portal and information architecture a jumping off point for creating business workspaces that drive best practices and inherit key metadata.
Plot the life cycle of diverse content types. Some SharePoint content is ephemeral and ad hoc; some is long-lived and essential to key business transactions. Investigate the tradeoffs of using SharePoint to manage high-value content from its creation to disposition. In particular, assess the impact on existing records management, risk and compliance, and storage procedures.
Actively curate content. SharePoint is not a hands-off, self-service system. Enterprises that intend to start off slowly with straightforward collaborative information sharing often end up with anarchy if elements like storage quotas and search scopes are not vigorously monitored by a central team. Assign appropriate resources to managing SharePoint sites and workspaces.
Consider add-ons to achieve your goals. Microsoft has embraced a partner network to augment its out-of-the-box functionality. Some enterprises buy additional tools like Autonomy’s IDOL, FAST ESP (now a Microsoft subsidiary), Dow Jones’ Synaptica, Interse’s iBox, or SchemaLogic’s Enterprise Suite to compensate for SharePoint’s shortcomings in search, autoclassification and taxonomy management.

SharePoint is part of an emerging class of information management tools from diverse vendors that are structured to treat content in a way similar to how data is treated in a database. That architecture allows fundamentally more structure for managing content that is currently largely unmanaged. In the future, as content moves through the enterprise, semantic meaning will be added, like an envelope with many postmarks.

However, keep in mind that getting there will be anything but easy. Just because the tools exist doesn’t mean the structure will build itself. Careful planning is required, and plans will need to adapt as new lessons are learned. Don’t take lightly the opportunity a blank slate offers.

Google Announces Google Wave; Plans to Enter Ebook Business

Google has announced Google Wave, which is described as both a conversation and a document, where people can communicate and work together with formatted text, photos, videos, maps, and more. In Google Wave, users can create a wave and add people to it. Everyone on a user’s wave can use formatted text, photos, gadgets, and even feeds from other sources on the web. They can insert a reply or edit the wave directly. It’s concurrent rich-text editing, where users can see on their screen what fellow collaborators are typing in their wave. They can also use "playback" to rewind the wave and see how it evolved. Google Wave has three layers: the product, the platform, and the protocol. The Google Wave product (available as a developer preview) is the web application people will use to access and edit waves. It includes a rich text editor and other functions like desktop drag-and-drop. Google Wave is also a platform with a set of open APIs that allow developers to embed waves in other web services, and to build new extensions that work inside waves. The Google Wave protocol is the underlying format for storing and the means of sharing waves, and includes the "live" concurrency control, which allows edits to be reflected instantly across users and services.

Google also announced its plans to enter the commercial ebook business this year. The New York Times reports that publishers will be able to set their own prices and Google would retain the right to lower rates. Readers would gain online access to digital titles and would retain offline access through cached versions in browsers. Access would not be limited to certain devices but would require internet access.

(www.google.com)

Wolfram Alpha Officially Launched

Wolfram Alpha LLC has officially launched Wolfram|Alpha. Wolfram|Alpha is the world's first computational knowledge engine, a free online service. Wolfram|Alpha draws on scientist Stephen Wolfram's work on Mathematica, a technical computing software platform, and on the discoveries he published in his book, A New Kind of Science. The long-term goal of Wolfram|Alpha is to make all systematic knowledge immediately computable and accessible to everyone. Wolfram|Alpha draws on multiple terabytes of curated data and synthesizes it into new combinations and presentations. The service answers questions, solves equations, cross-references data types, projects future behaviors, and more.

(www.wolframalpha.com)

Top of the fourth

Kazeon has released the fourth generation of its e-discovery software, a modular offering with three products: Analysis & Review, Collection & Culling and Legal Hold Management. All three e-discovery products are integrated within a single, underlying software platform to ensure a smooth workflow while eliminating hidden costs that exist in a multivendor or disparate product strategy. Further, Kazeon has developed an entirely new distributed, collaborative legal application, the eDiscovery Case Manager, which manages legal cases from "womb-to-tomb" and is geared toward the legal counsel and litigation support staff.

Kazeon highlights the following components of the Fourth Generation:

Analysis & Review. A set of capabilities such as patented analytics, concept extraction and search, search results visualization, result and review filtering, query expansion, e-mail thread analytics, and interactive and fast review. It is also said to provide exhaustive, transparent, accurate auditing and data verification reports to enable complete defensibility of the e-discovery process. Kazeon’s Review workflow includes linear review, collaboration across multiple reviewers, support for multiple law firm review and review escalation workflow.

Legal Hold. Features such as identification of ESI based on case requirements, e.g., custodians, date ranges, concepts, etc., case-based legal hold management, enforcement of legal hold, reporting of legal hold and finally, releasing legal hold when required. All of these can be performed in place or in target repositories.

Collection & Culling. Elements include items such as fully indexed or index-less targeted collection, and a forensically sound and defensible collection from any source, including USB drives, thumb drives and laptops. Further, legal users can perform their own collection and culling for each case without IT assistance using the simplified case-based application.

One Box Extender

Perfect Search Corp. and Adhere Solutions have formed a partnership to deliver Perfect Search's database indexing solutions through the Google Search Appliance. The companies claim their One Box Extender (OBX) will extend the appliance to enable organizations to search their database content with searing query speeds seamlessly delivered through the Google Search Appliance's OneBox interface.

Perfect Search and Adhere Solutions explain that currently, Google Search Appliance users search their database content by sending the query through the OneBox Connector to retrieve results from different systems. They add that the OBX for the Google Search Appliance enables rapid search of Oracle, Microsoft SQL, DB2, MySQL, and any SQL-compliant database without placing any additional load on these systems. The OBX integrates within the same Search Engine Results Page for database searches through the Google Search Appliance's OneBox API.

The developers say One Box Extender can:

index millions or even billions of database records,
remove load on existing database systems,
provide better results than traditional SQL queries, and
comply with database security policies.

A legal marriage

Vivisimo has announced a strategic OEM agreement with LexisNexis whereby Vivisimo’s Velocity Search platform will be integrated into LexisNexis’ Concordance Enterprise, which is designed as a cost-efficient way to manage the high volume of documents—including e-mails and e-documents—generated during litigation.

LexisNexis will integrate Velocity for OEM, which leverages Velocity’s architecture and inherent flexibility in a modular design for easy deployment in a variety of markets and applications.

The two companies say the solution will be commercially available in the fourth quarter of 2009, further claiming the Velocity-powered Concordance Enterprise will enable customers to gain the most from their search experience by making relevant information easy to find, regardless of source, location or type. Additionally, using Vivisimo’s Discovery Module, users can take advantage of automated tagging and be able to export search results in native file formats for easy and fast sharing of search results and to leverage additional social search tools for collaboration and innovation in the enterprise.

Exalead and The vdR Group Release Search-Based Application

Exalead, a provider of search-based business application (SBA) technology, and The vdR Group, a developer of integration solutions for manufacturing and engineering applications, introduced Partrieve, an SBA built on the Exalead CloudView platform. Partrieve v5.0 helps users find and access part and product data that reside in multiple and disparate repositories, applications, and various content formats within an enterprise.

Partrieve unifies and aggregates part data independent of its source, format, language, or units of measure. Data can exist in computer-aided design (CAD), electronic content management (ECM), enterprise resource planning (ERP) and product lifecycle management (PLM) solutions, and more. Formats can include documents, spreadsheets, media files, and CAD drawings.

(www.vdR.com)

Gabriels Launches Product for The New York Times Using Endeca

Gabriels Technology Solutions, a private-label ecommerce provider, announced the recent re-launch of The New York Times Real Estate Portal along with its complementary site Great Homes and Destinations, which focuses on national and international real estate. The New York Times has worked with Gabriels since 2001 to offer its advertisers and consumers technology and real estate content. Gabriels’ portal search technology is built on the Endeca Information Access Platform, which is designed to offer advanced Guided Navigation and Content Spotlighting capabilities. The recently launched version of the site includes search functionality with the ability to query listings by street address, neighborhood, city, zip code, and web ID from a single field. Rich content such as neighborhood- and city-level information includes demographics, recent sales history, and school information. The site also tracks price changes so that consumers can monitor listings of interest. Advanced mapping allows users to view listings within neighborhood boundaries that are distinguished on the map. The Map Area Search enables users to search listings as they navigate up a popular avenue, along the coast or block by block.

(www.gabriels.net, www.endeca.com, www.greathomesanddestinations.com, www.realestate.nytimes.com)

Enterprise search: a key business enabler

Haley & Aldrich has been using Coveo Enterprise Search (CES) for two years now to manage search and access to corporate information and knowledge. In December 2008, the consulting firm, which focuses on environmental, engineering and management concerns, upgraded to the latest version of Coveo’s solution, CES 6.0.

According to Trent Parkhill, director of IT services and VP at Haley & Aldrich, the new release makes an invaluable search platform even better.

Parkhill says, "Initial searches are now five times faster than before and refinements of searches with contextual facets are 10 times faster. Those speeds draw more Haley & Aldrich users to further refine their search and thus find exactly what they are looking for.

"In over 50 years of advising clients on their most complex projects, we have accumulated a colossal volume of information. Accessing the right information at the right time without having to know where the document was previously stored translates into higher productivity for the Haley & Aldrich team, saving time and costs while boosting collaboration and knowledge sharing."

Haley & Aldrich chose the solution, according to Parkhill, because it offered a powerful, customizable and quick-to-implement enterprise search platform that is affordable to small and midsize businesses.

"Although the IT staff didn’t initially consider enterprise search as a key enabler to solve our business challenges," Parkhill says, "they now consider the implementation of CES as one of the three best IT additions to our company in the past 25 years."

ZyLAB deepens e-discovery

ZyLAB has announced three new modules for its ZyIMAGE E-discovery and Production platform. These three new modules enable organizations to audit and keep detailed records of their in-house e-discovery process to provide validation to the court that it was done properly.

ZyLAB elaborates:

The PST-to-XML Module is critical to the e-discovery process. Although e-mail is an excellent medium for transferring information, a PST file is not a sustainable, endurable and open archive. Further, searching through PST files can be slow, and the format does not allow searching on the contents of attachments. The new PST-to-XML Module allows a user to extract items from a Microsoft Outlook PST file and save them as either .msg or .rtf files with the help of XML wrappers, which store the properties of the Microsoft Outlook item. These files can be imported into a ZyIMAGE index, then searched and retrieved using several ZyLAB solutions including ZyFIND, the ZyIMAGE Webserver or the ZyIMAGE Legal Review Platform. Users can then easily search all e-mail files and attachments.

The ZyIMAGE Deduplication Module is used to find duplicate files that exist in a single ZyIMAGE Index, or across a series of ZyIMAGE Indexes. The duplicate files can be deleted, marked, moved or copied. This is important due to the growing amount of electronic data in corporations, particularly for those corporations that have had overly broad or non-existent record retention policies.

The ZyIMAGE Culling Module is used to filter chosen file types from source folders and copy or move them to specified target folders where they can be indexed with ZyINDEX (a solution that creates and manages all the searchable archives that contain scanned and electronic information). The ZyIMAGE Culling Module enables users to filter by data groups, add and modify data groups, create XML wrappers, add extra metadata, post-process image files, uncompress files and create reports.
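
To make the culling and deduplication steps above concrete, here is a generic sketch of the two techniques: filtering by file type, then grouping files by a content hash. ZyLAB does not describe how its modules work internally, so the hashing approach, the DATA_GROUPS table, the function names and the sample files are all assumptions for illustration.

```python
import hashlib
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Hypothetical "data groups" mapping a label to file extensions.
DATA_GROUPS: Dict[str, Set[str]] = {
    "email": {".msg", ".eml"},
    "office": {".doc", ".xls", ".ppt"},
}


def cull(source: Path, target: Path, groups: Iterable[str]) -> List[Path]:
    """Copy files whose extension belongs to the chosen groups into target."""
    wanted = set().union(*(DATA_GROUPS[g] for g in groups))
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            dest = target / path.relative_to(source)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
            copied.append(dest)
    return copied


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file's bytes in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root by content hash; keep groups with more than one file."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Demo on throwaway files: cull e-mail and office files, then look for duplicates.
with tempfile.TemporaryDirectory() as tmp:
    source, target = Path(tmp, "source"), Path(tmp, "target")
    (source / "sub").mkdir(parents=True)
    (source / "a.msg").write_text("status update")
    (source / "sub" / "b.MSG").write_text("status update")  # same content as a.msg
    (source / "report.xls").write_text("figures")
    (source / "photo.jpg").write_text("not in any chosen group")

    copied = cull(source, target, ["email", "office"])
    culled_names = sorted(p.relative_to(target).as_posix() for p in copied)
    duplicate_names = [
        sorted(p.relative_to(target).as_posix() for p in group)
        for group in find_duplicates(target)
    ]
```

Exact hashing only catches byte-identical files. Near-duplicates, such as the same e-mail saved in two formats, would need a different technique.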

The meaning of the matter

Autonomy Interwoven has introduced Social Media Analysis, a new offering said to allow businesses to convert the dynamic conversations taking place on social networks into actionable business opportunities. Autonomy Interwoven's Web content management solution employs Autonomy's Intelligent Data Operating Layer (IDOL) to exploit the full value of social media. Autonomy says Social Media Analysis automatically listens to social media content, analyzes the dialogue to understand sentiment and enables marketers to instantly act on the insights to protect their brand and drive revenue growth.

The company says traditional keyword spotting solutions fail to truly understand the rich meaning and conceptual patterns within user-generated content. The challenge of understanding this type of information is more pronounced, because the language is more conversational, rife with familiar expressions, slang and varying emotional undertones (e.g. sarcasm, excitement, disappointment), and stated so briefly that context is difficult to discern.

Autonomy's IDOL provides proven clustering, pattern matching techniques and probabilistic modeling that treat words as symbols of meaning rather than simple data points, yielding a much richer and contextual set of data for marketers to act upon. Additionally, Autonomy Interwoven's Web content management (WCM) solution provides powerful capabilities for engaging customers in a dialogue and enables marketers to instantly act upon the meaning and insights gathered by IDOL, to deliver compelling, targeted offers to consumers on the Web.

Autonomy Interwoven Social Media Analysis includes an extensive set of intelligent connectors into social networks, enabling a single point of search for all user-generated content, saving organizations time and money. Social media connectors include CNET Reviews, Epicurious, Facebook, IMDb, Kbb.com, LinkedIn, RSS, TripAdvisor, Twitter, WebMD, Yahoo Finance and Yelp. Additionally, says Autonomy, IDOL provides connectors to literally any form of information and any channel, including call center, audio, customer relationship management systems, and traditional media and video, to arm marketers with an even more comprehensive and holistic view of the customer.

IDOL analyzes the information from these sources to automatically form clusters of sentiment, both positive and negative, which marketers can use to identify meaningful trends upon which they can act.

MetaVis launches visual tool

MetaVis Technologies has released MetaVis ARCHITECT for SharePoint, which gives information architects, developers, consultants and administrators the capability to design, document and deploy SharePoint objects using a graphical tool. A fully functional trial version is available for download.

MetaVis CEO and founder of VitalPath Steve Pogrebivsky says ARCHITECT’s intuitive, graphical interface allows users to quickly create SharePoint metadata and hierarchies, allowing SharePoint sites to be easily understood and well documented. Features include:

visual taxonomy designer,

create/copy/move content types/lists and columns by dragging and dropping,

multisite compare and synchronization, and

load/deploy directly to SharePoint.

MetaVis plans to launch other tools that simplify organization and maintenance of this data inside SharePoint.

EBSCO Publishing Introduces NoveList Select and Nonprofit Organization Reference Center

EBSCO Publishing and the creators of the readers’ advisory service NoveList have released NoveList Select, which extends a library catalog by providing links to other books by leveraging NoveList content. NoveList Select helps avoid searches that occur when all copies of a book are checked out and provides readers with additional options for what to read next. With NoveList Select, each recommendation is a live link to the library catalog, allowing users to discover new books that are available. NoveList & NoveList Plus enable users to view a list of similar and recommended books from more than four million titles from within the catalog. NoveList Select also adds links to Recommended Reading Lists, Author Read-alikes, Book Discussion Guides, and additional feature content into the catalog results. NoveList Select also provides links from bibliographic records to the library’s newsletters and newsletter signup page.

EBSCO Publishing also announced it has designed a database to provide information to all types of nonprofit sector organizations. Nonprofit Organization Reference Center (NPORC) contains hundreds of thousands of records and coverage of more than 200 leading nonprofit and business-related publications, many of which are available in a searchable PDF format. The database is designed for nonprofit organizations, not-for-profit organizations, unincorporated nonprofit associations, and other related entities. The database is updated weekly and is available on the EBSCOhost platform. In addition, EBSCO offers the tools to integrate NPORC content into corporate intranets and portals.

(www.ebscohost.com)

Kalido Integrates Netrics

Netrics, a provider of data access and data quality solutions, announced that it has signed an OEM agreement with Kalido to integrate the Netrics Matching Platform with the Kalido Information Engine. These new automated data matching functions add to Kalido’s native data governance capabilities by improving rules management, candidate management, confidence calculation, process matching, and survivorship management within Kalido. The newly integrated Netrics matching technology enables Kalido users to: match records from different sources to build master data, use hierarchy relationships as part of a match and support match definitions across fields, and use any master data element or nickname as a thesaurus and use those elements as part of a matching criterion.

(www.kalido.com, www.netrics.com)
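
For readers unfamiliar with thesaurus-assisted record matching, the sketch below shows the general idea behind the Kalido and Netrics item above: names are normalized through a nickname table before being scored for similarity. It does not reflect the Netrics Matching Platform, whose algorithm is not described here. It uses Python's standard difflib as a stand-in scorer, and the NICKNAMES table, function names and 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical nickname thesaurus; a real deployment would draw on master data.
NICKNAMES: Dict[str, str] = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(name: str) -> str:
    """Lowercase a name, drop periods and expand known nicknames token by token."""
    tokens = name.lower().replace(".", " ").split()
    return " ".join(NICKNAMES.get(token, token) for token in tokens)


def confidence(a: str, b: str) -> float:
    """Score similarity between two normalized names on a 0.0 to 1.0 scale."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(
    record: str, master: List[str], threshold: float = 0.85
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring master record, or None if nothing clears the threshold."""
    if not master:
        return None
    candidate = max(master, key=lambda entry: confidence(record, entry))
    score = confidence(record, candidate)
    return (candidate, score) if score >= threshold else None


master_records = ["Robert Smith", "Elizabeth Jones"]
match = best_match("Bob Smith", master_records)
no_match = best_match("Zed Quux", master_records)
```

Records that fall below the threshold return None here. In practice such candidates are usually queued for human review, which is the role the item's "candidate management" appears to play.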
