June 22, 2011

Table of Contents

Semiotics for enterprise search
Recommind and LexisNexis partner
Managing SharePoint content
ZyLAB tackles sound files: E-discovery audio search
Enterprise search and object storage
Semantics for science
Construction company migrates content to SharePoint 2010

Semiotics for enterprise search

Readers of KMWorld are comfortable with the semantic processes that knowledge management systems employ to make sense of business information. There are knowledgebases, taxonomies, controlled vocabularies and access control tags. Knowledge management is more than key word search and retrieval, although finding information is part of the discipline.

Many of the flagship companies offering knowledge management systems have wrapped a basic string matching system with various enhancements. Some systems process e-mail and use data from those analyses to pinpoint an individual who is a "hub" for information exchange in an organization. Other vendors' systems generate relationship maps, sometimes described as social graphs. With those maps, a manager can learn who is an expert in a particular topic based on the content flowing through the system. There are many variations, and organizations have experienced major successes with systems from different vendors. At the same time, other licensees of the same vendors' systems report less helpful results.
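
To make the "hub" idea concrete, here is a minimal sketch in Python. It assumes the e-mail traffic has already been reduced to (sender, recipient) pairs, and it ranks people by the number of distinct correspondents each one has. The function name and the sample data are hypothetical, and no vendor mentioned here is known to work this way; commercial systems presumably weigh many more signals.

```python
from collections import Counter

def find_hubs(messages, top_n=1):
    """Rank people by how many distinct correspondents they exchange mail with."""
    contacts = {}
    for sender, recipient in messages:
        # Treat each message as an undirected link between two people.
        contacts.setdefault(sender, set()).add(recipient)
        contacts.setdefault(recipient, set()).add(sender)
    degree = Counter({person: len(peers) for person, peers in contacts.items()})
    return degree.most_common(top_n)

# Hypothetical sample traffic: ann corresponds with three different people.
messages = [("ann", "bob"), ("ann", "cy"), ("dee", "ann"), ("bob", "cy")]
hubs = find_hubs(messages)
print(hubs)
```

Counting distinct correspondents is only the simplest measure of centrality; a relationship map of the kind described above would typically also consider message volume and topic.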

I had an opportunity to learn about a new approach to accessing information within an organization. Unlike most of the systems we have tested, Sophia Search, headquartered in Belfast, Northern Ireland, has enlisted the art and science of semiotics to make information access and management more useful. Collaborating with colleagues at the State University of St. Petersburg, Russia, Dr. David Patterson from the University of Ulster in Northern Ireland developed a patented approach to information retrieval. He and his co-founder, Dr. Vladimir Dobrynin, established Sophia Search in 2007. The company is now competing in the enterprise software sector with the likes of Autonomy, Endeca, Exalead, Google and Microsoft, among others.

Semiotics focuses on signs and symbols as indicators of meaning. As implemented, the approach enables "Sophia to understand and interpret the meaning and context of information within documents," Patterson said when I interviewed him in February. Sophia has tuned its approach so that the "meaning and relevancy of a document depends on both the user's query and, importantly, all the other documents within the organization," he added.

Contextual discovery

Patterson explained, "Sophia is based on a model of linguistics, called semiotics, which is the science behind how we as humans understand the meaning of information in context. This is the power behind the technology that drives our discovery engine and ability to improve the findability of information."

On the surface, Sophia works like one of the big enterprise search solutions from Autonomy or Exalead. Sophia describes its approach as a "contextual discovery engine." The system processes the words in the source document and then automatically disambiguates the different meanings of words based on their context in the source document.

Autonomy's "meaning-based computing" and Recommind's approach appear to be similar to the Sophia system and method. However, Patterson pointed out that semiotics is the key to the firm's technology. Sophia searches by the meaning of what users are looking for, as opposed to just the key words they use in their query. Sophia is designed to enable users to discover contextually relevant information of which they were previously unaware, and it increases users' understanding of their content.

Most information processing systems require the system administrator or a subject matter expert to tune the system. Patterson said, "One of the benefits of our technical approach is that Sophia operates without human guidance or training, and it does not require taxonomies, ontologies or thesauri."

According to Patterson, Sophia's approach is novel, possibly unique, in the world of information retrieval and content processing. "What motivated me was solving the problems that faced the world of search from a research perspective. I was aware of the limitations of some commercial systems through my own research and experience," he said. "The more I thought about the problems of locating the information I needed, I began to question the basic assumptions that conventional search vendors make."

The knowledge management application of Sophia is that the system can work with other enterprise software systems. If an organization has licensed a major vendor's search system, Sophia can enhance that system. It also works with the Google Search Appliance. Patterson said, "Sophia was developed with an open architecture to enable ease of integration through our Java APIs and RESTful Web services. In this way, we have made it easy to augment other search tools with Sophia's contextual capabilities and to build additional applications based on third-party products."

Crafting query, not so easy

The core of Sophia's approach is that the developers worked to avoid the pitfalls that have plagued other information retrieval systems. Those included the assumption that the user knows what he or she is looking for before running a query. The Sophia team recognized that many users cannot express a specific information need. As a result, Sophia's developers wanted to provide a system that does not force the user to craft a query.

He said, "Experience and research revealed that users often find creating a quality query a challenge."

The Sophia system features a search box, but the system also displays links to other relevant content. It depends on its patented semiotics approach. According to Patterson, "The system method organizes and presents information contextually, users spend less time sifting through irrelevant information and can focus on information that they know is of value."

The benefits of the approach pivot on reducing the time a user needs to locate desired information. Patterson gave me this example: "Imagine that you are interested in queries like 'Where is Joe's office?' or 'What is his phone number?' In those instances, deploying Sophia is overkill. We don't add any further value over other tools such as Google Search Appliance or some of the other basic key word systems in this instance. But if you are interested in discovering what topics exist in your data, or unearthing new information related to your query that you didn't know existed, or deciphering how documents are semantically linked to one another within a particular context, or understanding the meaning of your information at a glance, then Sophia is a tool that is worth spending time evaluating."

The Sophia system can create customized search reports that can be exported as PDF or XML to enable ease of integration with other analytical tools within an organization.

Patterson said, "Sophia enables and encourages results sharing among employees to reduce the amount of time people spend re-executing queries already carried out by others, and it can automatically watch the corpus for new information indexed after a result set has been returned to the user."

The challenge facing Sophia Search is difficult. The knowledge management and content processing sector is characterized by fierce competition. The emergence of open source options for search such as Lucene/Solr, for business intelligence such as Pentaho, and for data management such as Cassandra puts pressure on commercial, proprietary enterprise software.

The opportunity Sophia has is to demonstrate a better way to tap into the unstructured content that an organization possesses. If Sophia can significantly reduce the time required for a user to locate the item of information needed to close a deal or resolve another business issue, Sophia could gain traction in both the enterprise knowledge management and the search-and-retrieval markets.

The present business climate remains unforgiving. A number of enterprise content processing firms have undergone some organizational shifts. Executives have rotated at Lucid Imagination, MarkLogic and Sinequa since the first of the year. Other firms have been repositioning themselves into a wide range of vertical markets, including financial services, customer support, healthcare and competitive intelligence. The payoff from those changes is not yet known.


Sophia's reliance on semiotics may be the breakthrough required in content processing and information retrieval. Vendors with more traditional methods face longer sales cycles and the shadow of the free, open source software option.

Patterson sees today's troubled marketplace as benefiting the buyer. Sophia's sales approach focuses on the customer's need, according to Patterson. "At Sophia, we discuss with the customer right at the start where the strengths of Sophia lie, because we don't want to waste their time if we don't believe it is the best tool for their needs," he said.

Sophia wants to make partnering a key component of the customer's business strategy. "We believe our technology is not just a standalone product," Patterson said, "but also it is complementary to many existing solutions and can be used to enhance their capabilities."

Reliance on key word indexing causes some knowledge management systems to crash at takeoff. To clear the hurdles to enterprise knowledge management and information retrieval license deals, Sophia will rely on the lift from the power of semiotics to take flight.

Recommind and LexisNexis partner

Recommind has formed a strategic hosting and sales alliance with LexisNexis to provide hosted e-discovery review and analysis. The new solution enables rapid deployment of Axcelerate On-Demand, Recommind’s end-to-end e-discovery review platform, with Predictive Coding, its technology for computer-expedited document review.

The alliance with Recommind provides LexisNexis clients with more options and greater flexibility in discovery. The companies further claim the US-based service will enable enterprises and law firms to dramatically reduce the costs and timelines associated with document review and analysis as part of litigation and regulatory investigations.

Managing SharePoint content

MetaVis Technologies has unveiled Information Manager for SharePoint 2010, which is said to enable users to bulk import, copy and classify content directly from the SharePoint user interface. MetaVis reports Information Manager is embedded directly into the SharePoint 2010 ribbon, allowing authorized users to classify, upload and copy content using familiar controls and interfaces.

Further, Information Manager allows authorized users to copy content between different SharePoint sites, site collections, lists and libraries. They can also bulk import from File Shares (while retaining file system metadata) and tag in a single process from the SharePoint ribbon. That gives users the flexibility to apply metadata in bulk during file import, during copy or just in place, to improve SharePoint search and findability.

MetaVis emphasizes that Information Manager allows users to:

  • import content in bulk from file systems, set metadata values or map from NTFS and Folder names; and maintain or remove existing folder structures;
  • select multiple documents or items and copy them to another list or library while applying or remapping existing metadata values;
  • change content types for multiple selected documents or items;
  • re-map fields to copy metadata values from one field to another;
  • enter values directly or select them from standard SharePoint columns including Managed Metadata Fields linked to Term Sets; and
  • perform metadata enrichment and content classification in SharePoint sites.

ZyLAB tackles sound files: E-discovery audio search

ZyLAB has unveiled its Audio Search Bundle, a desktop software product engineered to identify relevant audio clips from multimedia files and from business tools such as fixed-line telephone, VOIP, mobile and specialist platforms such as Skype or MSN Live. It is designed for technical and non-technical users involved in legal disputes, forensics, law enforcement and lawful data interception to search, review and analyze audio data with the same ease as more traditional forms of electronically stored information (ESI).

ZyLAB says Audio Search Bundle transforms audio recordings into a phonetic representation of the way in which words are pronounced, so that investigators can search for dictionary terms as well as proper names, company names or brands without the need to “re-ingest” the data.

With the ZyLAB Audio Search Bundle, forensic investigators and attorneys can identify and collect audio recordings from various sources with far greater efficiency and effectiveness than was ever possible with manual processing. The software supports multiple search techniques simultaneously, such as Boolean and wildcard, leading to greater accuracy and relevance of results. The fast, iterative search helps to reduce the size of the data set and the costs for review.

The ZyLAB Audio Search Bundle supports all industry-standard audio formats, including G711, GSM6.10, MP3 and WMA, as well as the audio component of video files. The bundle is available with the ZyLAB eDiscovery & Production System, which is fully aligned with the Electronic Discovery Reference Model (EDRM), or with any other ZyLAB system.

Enterprise search and object storage

Caringo, a provider of object storage software for digital content, and Perfect Search have joined forces to provide mutual customers with a solution for precise search of objects stored in Caringo object storage software (CAStor), the companies say. Using either metadata or the full-text content of the objects being stored, the combined solution is said to search petabytes of data, spanning billions of objects, and return relevant content in seconds instead of minutes or hours.

The Perfect Search Appliance (PSA) indexes the metadata and content of each object as it is stored, keeps track of the locators for each piece of data and provides precise hits as to where the query was found to enable immediate click through to the document, image, backup or file. When used in conjunction with CAStor, the PSA gives customers rapid access to files or objects within the object store, along with a comprehensive data management index and search solution.
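
For readers who want to picture what indexing "as it is stored" with locators can look like, the following Python sketch builds a small inverted index that maps each term to the locators of the objects containing it. It is only an illustration under stated assumptions: the class name, the locator strings and the sample objects are hypothetical, and nothing here reflects how the PSA or CAStor is actually implemented.

```python
import re
from collections import defaultdict

class ObjectIndex:
    """Toy inverted index: term -> set of locators of objects containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, locator, content, metadata=None):
        """Index an object's content and metadata values at store time."""
        fields = [content] + [str(value) for value in (metadata or {}).values()]
        for field in fields:
            for term in self._tokens(field):
                self._postings[term].add(locator)

    def search(self, query):
        """Return locators of objects that contain every term in the query."""
        terms = self._tokens(query)
        if not terms:
            return []
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return sorted(hits)

# Hypothetical objects and locators, for illustration only.
index = ObjectIndex()
index.add("obj-001", "Quarterly backup of finance records", {"owner": "finance"})
index.add("obj-002", "Site photo, north entrance", {"type": "image"})
print(index.search("finance backup"))
```

Because each term already points to its locators, a query never has to scan the stored objects themselves, which is the general reason index-based search can stay fast as a store grows.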

Semantics for science

Springer Science+Business Media and TEMIS, a semantic content enrichment company, have extended their strategic collaboration on semantic enrichment and linking of content for the SpringerLink platform.

Springer and TEMIS have long collaborated on facilitating information access on the SpringerLink portal by offering navigational tools to customers searching for the scientific content most relevant to their topics of interest. TEMIS’ Luxid Content Enrichment Platform leverages a combination of linguistic and statistical methods to calculate semantic relatedness among the millions of publications accessible on SpringerLink. The approach enables the automatic recommendation for each available article or book chapter of a selection of highly relevant, semantically related documents, without requiring specific editorial efforts, according to the companies.
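
The companies do not spell out the methods involved, but one common statistical way to score relatedness between documents is TF-IDF weighting with cosine similarity. The Python sketch below shows that general technique on three made-up titles. It is an assumption chosen for illustration, not a description of the Luxid platform, and every name in it is hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for c in counts.values() for term in c)
    return {
        doc_id: {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, tf in c.items()
        }
        for doc_id, c in counts.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def related(doc_id, docs, top_n=2):
    """Return the top_n other documents ranked by similarity to doc_id."""
    vectors = tfidf_vectors(docs)
    scores = [
        (other, cosine(vectors[doc_id], vec))
        for other, vec in vectors.items()
        if other != doc_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Made-up titles, for illustration only.
docs = {
    "a": "protein folding in yeast cells",
    "b": "yeast protein expression and folding",
    "c": "stellar spectra of distant galaxies",
}
recommendations = related("a", docs, top_n=1)
print(recommendations)
```

A purely statistical score like this one knows nothing about language; the linguistic side of such a system would add steps such as recognizing that different word forms or synonyms refer to the same concept.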

They add that the goal of the recent extension of the partnership is to link not only documents to each other, but also to concepts originating from structured domain-specific vocabularies and to key topics derived from the documents’ content.

Construction company migrates content to SharePoint 2010

Winter Park Construction, with employees and construction sites nationwide, uses a SharePoint portal to help manage its project documentation. WPC needed a migration strategy for moving to SharePoint 2010 and chose a third-party migration tool, MetaVis Migrator, for that purpose. The general contracting company migrated content in bulk from SharePoint 2007 and 2010 beta, preserved existing metadata and classified content as it moved to SharePoint 2010.

William Henderson, IT director for Winter Park Construction, says, “We were at a roadblock on getting our SharePoint sites migrated properly, and there were no easy strategies coming from Microsoft. MetaVis saved us months of migration work. We could not have completed the project without MetaVis Migrator and Classifier. More importantly, these products have become an instrumental part of our day-to-day SharePoint management.”

MetaVis says it provides Winter Park Construction with a cost-effective system for moving content in bulk to SharePoint 2010 with corresponding libraries. Because the solution preserves the integrity of data, according to MetaVis, the construction company is able to migrate without the need for much reconfiguration.

Winter Park Construction has also implemented MetaVis Information Manager for SharePoint 2010, providing users with the ability to bulk tag and classify content directly from the SharePoint interface. Consequently, users have greater flexibility to change or add new metadata, making documents easier to search across all sites, according to MetaVis.
