September 15, 2010

Table of Contents

NewspaperARCHIVE: A Case of Broken News
Connecticut Innovations Makes Follow-on Investment in ExeCue Inc.
Managing SharePoint
Google Brews A Cup of Instant Search
Trend Setting Search Products

NewspaperARCHIVE: A Case of Broken News

Founded in 1999 by parent company Heritage Microfilm, NewspaperARCHIVE is the world's largest online archive of historical and contemporary newspapers. With an archive reaching back to 1700, the service provides a fully searchable, graphical, and textual database of more than 4,500 newspapers from 1,100 cities. Users can search by names, keywords, or dates, giving amateur and professional researchers alike instant access to news reports, obituaries, birth announcements, sports coverage, and other useful content.

With more than 100 million pages averaging about 6,000 words each, it goes without saying that NewspaperARCHIVE doesn't go easy on its database and search systems. The service is also constantly expanding and updating its database, which means that a solution that works today might easily prove inadequate tomorrow. When problems began cropping up with its old search solution, NewspaperARCHIVE realized it needed a platform that would be robust enough to handle the company's vast database of content while simultaneously ensuring that users had rapid access to search results and allowing the company to easily update its database to reflect new acquisitions and changing agreements with publishers.

Exalead is a developer of enterprise search solutions and web search technologies based in France. The company was originally founded in 2000 by two former AltaVista employees whose initial goal was to create a Google-like search engine for Europe. Its focus quickly turned to enterprise search. In addition to its flagship Exalead CloudView search platform, the company offers unified personal search in the form of Exalead Desktop, as well as SaaS enterprise search through Exalead On Demand. In June, Exalead announced that it had been acquired by product management developer and existing business partner Dassault Systèmes.

Unlike many enterprise databases that are used primarily for internal record keeping and data management, NewspaperARCHIVE's database serves customers as well as internal users. Since the service is targeted at individual researchers (such as genealogists and history buffs) as well as larger institutions, poor service and slow searches can lead directly to lost business. As NewspaperARCHIVE continued to grow, the company found that Autonomy, its previous search solution, was struggling to keep up with the volume of data and frequent updates that the service required. Derek Fiscus, NewspaperARCHIVE's director of technology, knew he had a real problem on his hands.

"Because we deal with small clients, they expect every document to be searchable. And it got to the point where I would present a document to our last product and it would say it made it searchable, but it really didn't," says Fiscus. "So we had to constantly audit it."

NewspaperARCHIVE's Autonomy-based implementation was also becoming difficult to update, which impacted the service's ability to grow. "Because we just OCR [optical character recognition] the text off microfilms and newspaper pages, sometimes [the result] can be really, really, really poor. So you get a lot of words that really aren't words," says Fiscus. To compensate for this, NewspaperARCHIVE had to create search dictionaries. Unfortunately, that gave rise to yet another problem. "If we added in content and needed to expand our dictionary, not only did we have to fix all these documents that we were presenting (120 million), we also had to recall them all," says Fiscus.

Increasing the size of the search dictionaries solved one problem, while it created another: reduced search speed. "The previous product couldn't ingest all these words and provide search that was reasonable to what the public expects, which is as fast as Google," explains Fiscus.

NewspaperARCHIVE's final problem was one that will sound familiar to anyone who has managed a major information project: cost. Tasked with handling the company's 1.5 terabytes of searchable data, the company's old platform was gobbling up resources. "It only allowed us to have about 5.7 million documents per server," says Fiscus. "It became an issue of how much rackspace and power you were going to use."

Confronted with all these issues, Fiscus decided that enough was enough. "Taking into consideration what my customers were saying, I made the decision that looking for a new search engine was the best option," he says.
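
To put the per-server limit in perspective, here is a back-of-the-envelope calculation in Python. It is only a sketch: it assumes each newspaper page counts as one document and that documents are spread evenly across servers, neither of which the article confirms. The function name and the two collection sizes (the 100 million and 120 million figures cited above) are used purely for illustration.

```python
# Rough capacity estimate for the old platform.
# Assumption: one page = one document, evenly distributed across servers.
DOCS_PER_SERVER = 5_700_000  # "about 5.7 million documents per server"


def servers_needed(total_docs: int, docs_per_server: int = DOCS_PER_SERVER) -> int:
    """Return the smallest number of servers that can hold total_docs."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_docs + docs_per_server - 1) // docs_per_server


if __name__ == "__main__":
    for total in (100_000_000, 120_000_000):
        print(f"{total:,} documents -> at least {servers_needed(total)} servers")
```

Under those assumptions, 100 million documents would need at least 18 servers and 120 million at least 22, which gives a sense of why rack space and power became a concern.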

Fiscus and NewspaperARCHIVE looked at a number of different solutions that could meet the company's search needs, both open source and proprietary. To get started, Fiscus traveled to Information Today, Inc.'s Enterprise Search Summit West conference, where he narrowed it down to three possible contenders: Dieselpoint, Exalead, and the open source Apache Solr server.

According to Fiscus, the company couldn't afford to take a risk on a completely custom solution that might take a year of development time. It needed something that could be implemented quickly and efficiently. After considering the strengths and weaknesses of each, NewspaperARCHIVE ultimately went with Exalead.

Not surprisingly, Eric Rogge, senior director of marketing at Exalead, feels that NewspaperARCHIVE made the right choice. Given Exalead's origins in the field of web search, its products were engineered with web content in mind. "The technology [NewspaperARCHIVE is] using is usually designed for content management behind the firewall," says Rogge. "[Exalead] was originally designed to handle web content."

Exalead's CloudView grew out of the company's earlier attempts at creating a traditional web-based search engine. The current iteration of the product maintains a straightforward web interface that lets users collect and search a wide variety of both structured and unstructured data. Results are automatically sorted by categories, and the engine can also make suggestions and corrections to search terms based on alternate, incorrect, or phonetic spellings. The platform is also capable of sentiment and semantic analysis for use in discovery and analytics applications.

According to Rogge, CloudView is fully functional and easy to implement as a vanilla implementation while also maintaining a degree of extensibility. "We designed the product to be useful out of the box. But we also designed the product to be highly customizable," he says. In addition to end-user search applications such as NewspaperARCHIVE's, CloudView has been used to organize phone directories, IT help desks, economic development data, automotive sales listings, and ecommerce sites.

Rogge notes that a large chunk of the company's business comes from companies such as NewspaperARCHIVE, which are transitioning from another product that no longer suits their needs. "For enterprise search, we do a lot of replacement business: FAST, Autonomy, Google Search Appliance," says Rogge. "The most common reason why we get replacements is because we're a more economical solution. If you compare us to some of the other competitors out there, those products are very cumbersome to implement."

According to Rogge, replacing an existing implementation is a two-step process: "It's pretty straightforward. Usually, what we do is we build our index alongside and then we migrate the user interface." He explains that most of the implementation time is spent mimicking the existing user interface and incorporating any improvements the client might want, although the indexing time can vary based on the database.

"I saw them index an IT service desk app in 15 minutes," says Rogge. "We've had other applications where it [took] a fair amount of time." According to Rogge, the duration and difficulty of a transition ultimately comes down to the complexity of the existing user interface, the variability of the content being indexed, and the extent to which semantic extraction must be performed on the data.

Since migrating to Exalead's CloudView, Fiscus and NewspaperARCHIVE have witnessed a number of improvements. For instance, some unexplained issues that the service had been experiencing were suddenly gone. "We always had this sort of ghost in our website," says Fiscus. "We weren't quite sure what was going on ... but the week after we released [with Exalead], we quickly noticed that all of our long request times and unfulfilled requests that were showing up in the logs before were no longer there."

Because CloudView uses system resources more efficiently than NewspaperARCHIVE's previous solution, Fiscus was also able to make better use of NewspaperARCHIVE's servers. "Before, because of the amount of servers we were running, I could only afford one search node," he says. "Now I have three."

There were also several improvements in the day-to-day administration of NewspaperARCHIVE's databases. "One is how easy it is to update my metadata," says Fiscus. NewspaperARCHIVE uses more than a dozen different metadata fields that correspond to the subject, content, location, and date of stories, as well as specific publisher agreements. While the old platform required custom coding to manage this content, Fiscus says that Exalead made the process much simpler.

"I had to turn off content because our relationship with the publisher was no longer in existence," says Fiscus. "With the other search engine, I had to write custom code. ... It actually made the search results less relevant as well. With Exalead, you can simply flag it for deletion, and you can either leave it in there or tell it to purge it later. We'd be in a legal bind if we put it out there."

According to Fiscus, it took about 100 days from signing a contract with Exalead to being able to take the old search engine offline and replace it with CloudView. There was also a last-second crisis that was averted thanks to a quick response from Exalead.

"We go to turn it on at 6 in the morning, and it wasn't returning results," says Fiscus. To make matters worse, that particular morning just so happened to be 2 days before Christmas-the day the new search engine was set to debut. According to Fiscus, Exalead put people from both sides of the Atlantic on the case, and after 13 hours of work, the engine was back up and running.

Although Fiscus notes that there have been some minor issues with custom database connectors, overall, he is very pleased with Exalead and has twice served as a reference for the company. "They have very professional and responsive search engineers that they assign to people," he says. "Sometimes they get a little overtaxed, but I suppose that's a good problem."


Connecticut Innovations Makes Follow-on Investment in ExeCue Inc.

Connecticut Innovations (CI), a quasi-public authority responsible for technology investing and innovation development operated by the state of Connecticut, announced a follow-on investment of up to $550,000 in ExeCue Inc. of Stamford, Conn., through its Eli Whitney Fund. The investment is part of a round of up to $1.6 million also involving individual investors.

ExeCue developed a knowledge-driven search platform that helps business and Internet users retrieve search results from both structured databases and unstructured repositories like email, file systems, and web content. The company operates a website to demonstrate the capabilities of its search platform, with a focus on building, sharing, and monetizing search applications. The site hosts a variety of search applications that offer search capabilities for SEC filings, government spending, economic metrics, and the U.S. Census.



Managing SharePoint

Open Text has announced a set of products and services designed to help enterprise IT groups centrally manage large numbers of Microsoft SharePoint 2010 sites from creation through archiving.

The consulting services will be led by Open Text’s recent acquisition of Burntsand, whose consultants focus on SharePoint 2010 and ECM implementations.

Using the recently released SharePoint 2010 version of Open Text Content Lifecycle Management (CLM) and Open Text Case Management Framework for SharePoint 2010, Open Text services can help IT departments deploy the infrastructure needed to take control over unmanaged SharePoint 2010 deployments. Moreover, they can help accelerate the rollout of critical projects by giving users simple tools to create and deploy SharePoint 2010 sites and applications in accordance with corporate governance policies and manage the entire life cycle of SharePoint 2010 sites.


Google Brews A Cup of Instant Search

Search giant Google announced the latest addition to its search engine. Dubbed Google Instant, the feature constantly updates a live window of search results as the user types in a query, refining the results in real time as the user updates the contents of the search bar, without the need to launch a new search with each new term. Google Instant can also predict likely searches based on current input and allows users to scroll through predictions to see additional results.

According to Google, the primary benefit of the new feature is faster searches, with the company claiming Google Instant saves the average searcher two to five seconds per search. The new feature requires Firefox version 3.0 or newer, Internet Explorer 8, Safari 5, or Google Chrome 5 to function. It is being rolled out gradually and may not be immediately available to all users.
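
The idea of predicting likely searches from partial input can be sketched in a few lines of Python. This is not Google's implementation, which is not described here. It is a simple prefix lookup over a sorted list of known queries, and the class name and sample queries are invented for illustration. A production system would presumably rank predictions by popularity rather than alphabetically.

```python
import bisect


class PrefixPredictor:
    """Suggest known queries that start with what the user has typed so far."""

    def __init__(self, queries):
        # Sorting lets us jump to the first candidate with a binary search.
        self._queries = sorted({q.lower() for q in queries})

    def predict(self, typed, limit=3):
        prefix = typed.lower()
        if not prefix:
            return []
        start = bisect.bisect_left(self._queries, prefix)
        results = []
        for query in self._queries[start:]:
            # Matches are contiguous in sorted order, so stop at the first miss.
            if not query.startswith(prefix) or len(results) == limit:
                break
            results.append(query)
        return results


if __name__ == "__main__":
    predictor = PrefixPredictor(
        ["weather boston", "weather", "web events", "white papers"]
    )
    # Simulate the user typing one character at a time.
    for typed in ("w", "we", "wea"):
        print(f"{typed!r} -> {predictor.predict(typed)}")
```

Each keystroke triggers a fresh lookup, which mirrors the described behavior of refining results as the contents of the search bar change.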



Trend Setting Search Products

Check out KMWorld's Trend-Setting Products review section in this month's issue.

According to the editors, "The solutions listed below were selected by the panel because each demonstrates thoughtful, well-reasoned innovation and execution for the most important constituency of them all: the customer."

A2iA: A2iA DocumentReader—classification of digitized documents into categories (letter, identity papers, contract, etc.) based on both their geometry and their content.

Abbyy USA: FlexiCapture 9.0—accurate and scalable data capture and document processing system.

Accusoft Pegasus: Prizm Viewer—a Web-based image viewer.

Aivea: Commerce Server—scalable, service-oriented architecture (SOA), Web services-based e-commerce software platform with built-in integration to Microsoft Dynamics CRM, GP, NAV and AX.

Alfresco: Version 3.3—a platform for building and delivering future-proof, content-rich applications.

Alterian: Content Manager—complete Web solution allowing organizations to learn about customer Web preferences and target them with the appropriate content.

AnyDoc Software: CAPTUREit—software can work as a standalone document capture application, or as part of an end-to-end OCR for AnyDoc automated document and data capture and processing solution.

Appian: BPM Suite—100 percent Web-based BPM platform, delivering the ease of use, comprehensive features and flexibility required to accelerate process improvement.

Applied Knowledge Group: KM Solutions built on SharePoint—collaboration and knowledge management solutions for government, commercial and non-profit clients on SharePoint.

Aquire: OrgPublisher Premier—succession planning, organizational planning and powerful organization charting.

ASG Software: ViewDirect Suite—a highly scalable, full-featured archiving platform.

Attensity: Attensity360 (formerly Biz360 Community)—continuous monitoring and analysis of social media conversations and their impact on businesses.

Attivio: Active Intelligence Engine—extends enterprise search capabilities across documents, data and media.

Autonomy: Virage—video and audio analysis technology for video specialists, enabling enterprises to leverage rich media content.

AvePoint: DocAve Software for SharePoint—platform for creating a centralized knowledge repository and collaborative workspace.

Basis Technology: Rosette Linguistics Platform—multilingual text management software for information retrieval, text mining, concept extraction, search engines, etc.

Enterprise—allows businesses to share, manage and access all their content online.

Brainware: Globalbrain Enterprise Edition—locates information on PCs, enterprise databases, file servers, the Internet, etc., supports more than 250 file formats.

Bridgeline Digital: iAPPS Product Suite—Web application management solution with deep, data-level integration within fully functional Web analytics.

Bridgeway: eDiscovery—integrated legal business technology solutions for corporate law departments and government agencies.

Caspio: Bridge 6.8—solution to visually present data on the Web with new charts feature to translate flat data into visual reports and dashboards.

Citrix Online: GoToMeeting—Web conferencing and meeting solutions.

Concept Searching: conceptClassifier for SharePoint—automatic document classification and taxonomy management to Microsoft SharePoint; works without the need to build another search index.

Connotate: Agent Community GEN2—allows non-technical users to create powerful, customized intelligent agents to access high-value content deep on the Web or in the enterprise.

Consona: Consona CRM Software Product Suite—knowledge-driven support, knowledge management, live assistance, subscriber assistance, dynamic agents and enterprise CRM.

Content Analyst: Content Analyst Analytical Technology (CAAT)–document analytics OEM software to find, organize, and discover relevant documents using conceptual search technologies.

Copyright Clearance Center: Rightsphere—Web-based rights advisory and management service that helps corporations promote collaboration and the free flow of published information while respecting copyright.

Coveo: Customer Information Access Solutions—a dashboard approach to provide users with a complete, 360-degree view of customers and call center operations.

CT Summation: CaseVault Services—a full-service consulting and ASP hosted offering providing clock project management, ESI forensics, litigation support consulting and technical support.

Darwin Ecosystem: Awareness Engine—an enterprise solution that fetches and correlates its own Web 2.0 information sources, as well as WWW selected sources, to provide timely awareness about what is happening in the enterprise and the Internet.

Datameer: Datameer Analytics Solution—designed to solve the challenges of accessing, analyzing and using massive amounts of data using Apache Hadoop as its foundation.

Digital Reef: DigitalInformation Governance—robust services to govern the life cycle of information and physical assets.

Discover Technologies: DiscoverPoint—automatic relevant content delivery and expertise matching, delivering access to experts, resources and published content automatically.

eGain: eGain Social—a solution to monitor social media sites for conversations about the organization, enabling companies to identify and respond to customer inquiries or complaints.

Ektron: CMS400.NET—a platform with complete functionality to create, deploy and manage world-class Web sites.

EMC: SourceOne Suite—family of products and solutions for archiving, e-discovery and compliance aimed at helping companies centrally manage multiple content types in order to apply consistent retention, disposition and overall life cycle management.

Endeca: Information Access Platform—foundation upon which configurable, search-based business applications can be built and deployed.

EPiServer: CMS 6—a site management platform where certain features and functionality appeal more to specific roles; business owners, editors, marketers and developers.

eTouch: SamePage V. 4.3—an enterprise wiki and Enterprise 2.0 solution for knowledge management, including content and document management.

EXSYS: CORVID—a robust knowledge automation expert system development tool.

Exterro: Genome—dynamic data mapping solution that allows corporate legal and IT teams to identify, manage and analyze the life cycle of data sources and electronically stored information.

Fabasoft: Mindbreeze Enterprise—finds distributed information in all data repositories across the corporation ... e-mail, notes, calendar entries, contracts and personal data.

FatWire: Digital Asset Management Solution—a platform that enables interactive marketers to maximize the value of rich media to drive brand consistency, marketing effectiveness and cost efficiencies.

FTI Technology: Ringtail Analytics—provides e-discovery teams with data analytics and visual review software to address a wide variety of e-discovery tasks.

Global360: analystView 3.0—offers business analysts process modeling and simulation functionality within Microsoft Visio 2010.

GlobalNet Services: Google Search Solutions—search optimization and clustering, security and personalization for structured and unstructured data.

Google: Commerce Search 2.0—hosted, complete search solution for retailers.

HP: HP TRIM 7—a records management system with SharePoint support that provides a scalable, policy-driven foundation for information governance strategy.

IBM: Content Manager—imaging, digital asset management, Web content management and content integration for multiple platforms, databases and applications.

iDatix: iSynergy—integrated content management designed to enable organizations to innovate their approach to managing content and simplify their daily workflow.

IGLOO Software: Online Community Solutions—integrated suite of content management, collaboration and knowledge sharing tools within a single secure social business platform.

Information Builders: iWay Enterprise Information Management (EIM) Suite—real-time management of any information from anywhere across your entire enterprise.

Inmagic: Presto—robust, complete social knowledge management platform.

Innodata Isogen: Content Services for Publishers—consulting, technology, editorial and production services to media, publishing and information services.

InQuira: Intelligent Search—a unified system combining advanced semantic search techniques and contextual understanding.

Integrify: Integrify 5.0—a lean BPM solution that helps organizations reduce costs and improve employee satisfaction by providing process definition, workflow automation and visibility for areas such as IT, human resources, finance, sales, marketing and other services.

IntelliResponse: Instant Answer AgentInstant Answer e-service suite—enhances consumer experience across a variety of interaction channels, including corporate web sites, agent desktops, social media platforms and mobile devices.

ISYS Search: ISYS Document Filters—an embeddable set of document filters for extracting text from a comprehensive library of file, container and e-mail formats.

JackBe: Presto Platform—allows application developers and power users to create, customize and share enterprise application mashups.

Jive Software: SBS 4.5—integrated collaboration, community and social networking software for all types of online communities: employee, public or both.

MarkLogic: MarkLogic Server—a powerful XML server to accelerate the development of information applications to meet all sharing and delivery requirements.

Metastorm: Enterprise—aligns business strategy with execution while optimizing the types and levels of business resources it uses.

Microsoft: SharePoint 2010—arguably the most versatile business platform ever.

MindTouch: MindTouch CLOUD—an affordable tool allowing users to collaborate with the ease of a wiki but with the capabilities of an enterprise platform.

MobilVox: IRIS—intelligent retrieval information system running in a browser, enabling search of hundreds of gigabytes of data on the network.

NetBase: ConsumerBase—helps to create consumer insight strategies by capitalizing on social media explosion and internal data.

Newsgator: Social Sites—integrates directly into SharePoint’s collaboration to add value to customers and unify technology infrastructure.

nGenera: Spaces—wikis, blogs, and messaging that allow users to connect and share using familiar tools.

Noetix: Noetix Analytics—a packaged analytics solution for the Oracle E-Business Suite designed to greatly speed the process of implementing a data warehouse.

Northern Light: MI Analyst—text analytics and meaning extraction application optimized for business research applications.

Omtool: AccuRoute—a document handling platform that captures, converts and distributes paper and electronic documents.

Open Text: ECM Suite—a robust platform addressing all management capabilities needed to handle each type of enterprise content—including business documents, records, Web content, digital, e-mail, forms, reports, etc.

Oracle: ECM 11g—a suite that tightly integrates its content management platform into the Fusion Middleware architecture and is built on a unified content repository.

PaperThin: CommonSpot—a user-friendly, intuitive Web content management system.

Parature: Customer Service—a SaaS offering integrating a customer portal, rich knowledgebase and full trouble ticket software.

Project Performance Corp.: KM Solutions—consulting and IT solutions with an emphasis in the areas of environment and energy.

Raytion: Enterprise Search Connectors—a family of vendor-independent enterprise search connectors that enables secure and easily retrievable business-critical information.

Recommind: Decisiv Search—automated categorization, filing, storage and retrieval of e-mail-based information.

Rivet Logic: ECM Solutions—enable organizations to transform traditional content repositories and static intranets into dynamic, collaborative work environments through open source functionality.

RSD: Enterprise Output Solution (EOS)—enables organizations to achieve concise and compliant information delivery in a heterogeneous and demanding IT environment.

SAS: SAS Text Analytics—a framework that enables organizations to maximize the value of information within large quantities of text that is generated, acquired or exists in repositories.

SDL: SDL Tridion—an enterprise Web content management solution facilitating delivery of personalized content to target audiences internationally.

Search Technologies: Search Application Assessment Process—comprehensive customer assessment program that provides a structured and pragmatic approach to search implementation.

Sinequa: Enterprise Search 7.0—combines linguistic and semantic technology to drive intelligent decision making.

Sitecore: Sitecore CMS—full Web content management suite to create and maintain dynamic sites of all types.

Siteworx: Enterprise Search Consulting—facilitates development of intuitive, customized search interfaces for a wide variety of business sectors.

Smartlogic: Semaphore—three components that analyze, classify and reveal content, including ontology management, content classification and search enhancement.

Sophia Search: SOPHIA—enterprise search solution that understands the different contextual meanings for a query and intelligently returns results organized in structured thematic folders.

SpringCM: Content Management Service—affordable, cloud-based content and document management.

Stored IQ: Intelligent Information Management (IIM) Platform—software-based appliance for managing the intersection of e-discovery, information governance, records management and storage management.

SumTotal Systems: ToolBook 10.5—e-learning solution that allows subject matter experts and learning professionals to create interactive learning content, quizzes, assessments and software simulations.

SwiftKnowledge: Version 9 for the Enterprise—business intelligence software with "mashboards" of structured and unstructured data, Web 2.0 content, search, new user interface and new dashboard/report layout designer.

Symantec: Norton 360 Version 4.0—threat protection to detect viruses, Trojan horses, spyware, etc.

TEMIS: Luxid—text-mining discovery and knowledge extraction tools for immediate access to internal and external data sources.

Traction Software: TeamPage—pioneering Enterprise 2.0 solution featuring security, threaded discussion, moderation, document management, social networking, search, etc.

Unify: SQLBase 11.5 SP3—database technology designed for ease of use, high performance and worldwide application deployment.

Vivisimo: Velocity Platform—designed to harness and optimize information regardless of the source and to drive innovation, real-time decisions and actionable insight.

Xerox: DocuShare—enables users to capture, manage, share and protect a wide range of paper and digital content in one secure, central and highly scalable repository.

ZyLAB: eDiscovery and Production System—engineered for legal departments to efficiently manage the most expensive and tedious elements of litigation in house with automated tools.
