The definition of the word “semantics” sounds straightforward enough: According to Merriam-Webster, it is the study of meanings. However, if you ask a technologist, an editor, or an advertising rep, you may come up with widely differing interpretations. They might say it’s a practical application of artificial intelligence; a way to streamline back-office operations and make websites stickier; or a means of contextualizing advertisements for higher efficacy.
None of them is wrong.
Ironically, it is the very breadth, depth, and range of possibility inherent in semantic technology that can prevent content companies from experimenting with it, though it may be one of the most useful commercial innovations of the past decade. The murkiness of the word itself, not to mention the standards, acronyms, and jargon that can dominate the discussion of semantics, only adds to the confusion. David Siegel, author of Pull: The Power of the Semantic Web to Transform Your Business (Portfolio, 2009), says, “Semantics was a bad word from the get-go. A better word would have been unambiguous.”
As both understanding of the possibilities inherent in the semantic web and tools to harness it have matured, semantic technology has finally gained a foothold in practical business applications, in areas from search to back-office processing to advertising. Siegel believes that enterprise adoption has reached a critical growth point. “I’d say we’re solidly 1% of the way there, in terms of adoption for the enterprise,” says Siegel. “And that’s very good.”
Particularly for the content industry, semantic technology offers a compelling story. At the June 2010 SemTech conference in San Francisco, Bob DuCharme of TopQuadrant, a provider of semantic web technologies, pointed out that publishing and semantics share much in common. “The publishing industry has lots of real data and metadata already, and they have experience in developing vocabularies,” DuCharme noted.
Tom Tague, vice president of platform strategies at Thomson Reuters, who leads the OpenCalais initiative, a web service that automatically creates rich semantic metadata, agrees that within the content world, semantics has found its beachhead. “The future is here,” Tague says, citing a favorite aphorism. “It’s just not evenly distributed yet.”
A Brief History
Semantics first made a big splash with a 2001 article in Scientific American by Tim Berners-Lee, James Hendler, and Ora Lassila called “The Semantic Web.” The article laid out a future on the web in which common data formats and definitions would enable people (and machines) to share information at a much more granular level, moving through disparate databases to find interrelated information based on a shared “aboutness.” Instead of the document in which it resided, the lowest common denominator on the semantic web envisioned by the authors was the information itself.
To provide a simple example, consider the difference between typing “hotel Berlin” into a search engine in 2000 and doing so again in 2010. Back in 2000, the searcher might have received long lists of links to various hotels (and one movie) called “Hotel Berlin,” and each link would have to be opened and read individually. These days, a search engine utilizing semantic technologies, such as Bing, brings back a page with a map showing locations of various hotels, along with links to Berlin attractions and Berlin tours, and lists of hotels presorted into price categories. By understanding the meaning and intent behind the search term, semantics makes it easier to find the correct information quickly.
To achieve that vision, producers of content needed to add metadata to information resources. It was critical that data standards and ontologies (that is, formalized vocabularies of terms) be agreed upon so that anyone putting data onto the web could be sure that “revenue” in one database meant the same thing as “revenue” in another. There are a number of techniques for doing this, and the applicability of each depends on the complexity and flexibility required by a specific application.
For instance, the Resource Description Framework (RDF) is a standard, commonly written in eXtensible Markup Language (XML), for describing resources that exist on the web, intranets, and extranets, including metadata such as the title, author, and modification date of a webpage, as well as copyright and licensing information about a web document.
Underlying RDF are “triples” that link a subject to an object by way of a predicate, with each triple describing a particular aspect of the subject. One example of a triple might be “chicken” “is” “animal.” Another is “chicken” “is” “bird,” and still another is “chicken” “has” “feathers.” The number of triples that can be associated with a particular subject is practically unlimited, which provides the flexibility that is touted as a major advantage of semantics. As new categories emerge that may relate to “chicken,” such as “bird flu,” a new triple can be easily added.
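To make the idea of a triple concrete, here is a minimal sketch in Python. It is not RDF itself and uses no RDF library; it simply holds the chicken statements above in a set and looks them up by pattern. The names Triple, store, and match, and the predicate “susceptible_to,” are invented for this illustration.

```python
from collections import namedtuple

# A triple pairs a subject with a predicate and an object.
# (Illustrative only: real RDF identifies each part with a URI.)
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# The three chicken statements from the text, held in a plain set.
store = {
    Triple("chicken", "is", "animal"),
    Triple("chicken", "is", "bird"),
    Triple("chicken", "has", "feathers"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None, sorted."""
    return sorted(
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )

# A new relationship needs no schema change, only one more triple.
# The predicate name "susceptible_to" is a made-up example.
store.add(Triple("chicken", "susceptible_to", "bird flu"))

is_a = [t.obj for t in match(store, subject="chicken", predicate="is")]
print(is_a)  # ['animal', 'bird']
```

The point of the sketch is the last step: relating “chicken” to “bird flu” meant adding one statement, with no table or column to redefine first.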
Other data organization techniques, such as Web Ontology Language (OWL), Simple Knowledge Organization System (SKOS), and Rule Interchange Format (RIF), can be used alone or in combination to describe information parameters. Purpose-driven standards have also emerged to solve specific vertical information needs on the semantic web, such as eXtensible Business Reporting Language (XBRL), designed to facilitate the exchange of business and financial information on the web. Siegel says, “XBRL is a winning example of how to make things interoperable and shareable. It’s not overly flexible, but it’s well specified.”
Staying Focused on Answers
All these approaches and acronyms have had the unfortunate effect of keeping semantics shrouded in mystery. Steven Kludt, EVP of marketing at Cambridge Semantics, which provides semantic middleware and application development tools, says, “If semantics is going to live up to its promise, we need to stop talking technology and start talking business benefits.”
Tague says that OpenCalais had an eye-opening client meeting last year where this became apparent. “The client put a hat in the middle of the table and said that anyone who used the word ‘ontology’ would have to put a dollar into it,” Tague recalls. “He said, ‘I just want to understand how it solves my business problems.’” Tague believes that steering the conversation away from the technical aspects of semantics and toward the real business applications is what has helped OpenCalais start gaining significant ground in the enterprise world over the past year.
One of the most obvious “real” applications of semantic technologies is in search. Seth Grimes, analytics strategist at Alta Plana Corp., says, “Semantics enables search engines to extract information from sources and combine it on-the-fly to provide complete answers—not just hit lists—to the questions that are implicit in most searches.” Vendors such as Cognition Technologies have painstakingly mapped the English language to enable the use of natural language processing (NLP) so that users can enter searches in plain English to achieve meaningful search results.
However, it is important to note that semantic search isn’t just for unstructured searches; it can also help users get at data buried deep in structured applications. Semantifi is one vendor touting the benefits of semantics for getting at structured data sets. Shree Pragada, founder and CEO of Semantifi, says, “When it comes to government transparency, it’s not enough to just make it available. It has to be searchable as well.” The company creates vertical semantic search applications such as Government Spending, which allows users to search government spending with historical information dating from 1996.
Making Sites Stickier
According to 2009 figures from the Newspaper Association of America, only 10% of newspaper revenues come from online advertising, and that percentage had dropped from both 2008 and 2007 figures. In order to maximize ad revenues, publishers are doing whatever they can to keep readers engaged for longer. One means of doing so is through the display of “related content,” which points a reader who has already tipped his hand about a specific interest to other content that may engage him and keep him on the site longer.
The New Republic, one of the oldest magazines in the country, uses OpenPublish, an OpenCalais-enabled Drupal-powered content management system developed by Phase2 Technology, to drive reader engagement by offering faceted search and recommended reading sidebars. “When New Republic deployed OpenPublish they saw a 150% increase in unique visits, and a 300% increase in pages per visit,” according to Tague.
Semantics can clearly apply to much more than text-based media, of course. At SemTech 2010, Hannah Eaves, the director of new media at Link TV, a nonprofit media company that focuses on presenting news from outside the U.S., talked about how critical semantics has been in the creation of the company’s new video platform, ViewChange. The site, which was slated to go live in September, presents videos from and about the developing world with funding from the Bill & Melinda Gates Foundation. It utilizes several third-party application programming interfaces (APIs) to perform semantic content analysis and related content aggregation of video transcripts.
Creating Topic Hubs
“It allows us to create topic buckets dynamically and to bring in external articles as well,” Eaves said. With ViewChange developed in only 6 months, she said, “Without semantics we couldn’t have done it. The technology makes connections between videos that we wouldn’t even have thought of.” For publishers, semantics can be an automated means of making a deep content archive remain relevant for readers.
The “topic buckets” to which Eaves alludes are also known as microsites or topic hubs, which have become increasingly common in the publishing world. While Google pulled the plug on one of the better-known experiments of this type, its “Living Stories” microsites that ran from December 2009 through February 2010, other publishing companies are using the model successfully.
Sacramento Connect from The Sacramento Bee brings together content from local and state bloggers and news providers under topics such as Business, Education, and Entertainment. Seán McMahon, digital product development manager for The Sacramento Bee, says that the trigger for creating topic hubs was to give people a place to go online as stories developed over time. “It allows us to leverage our existing content and drives page views of related content,” says McMahon of the project, which utilizes semantic software solutions from Lingospot, Inc.
McMahon says instituting the automation of editorial work has required a culture shift and a level of trust from editors. “We’re seeing the shift occur over time as editors see that they can make adjustments easily,” he says. The aim is not to remove human intervention completely. “We find it’s more effective when we add some editorial control,” McMahon says, citing an instance when editors did some upfront work to set up an Olympic topic hub, but then let the process run automatically.
Assuming that the use of semantics is engaging readers on the site longer, the next logical use of the technology is to match the advertisement to the page and the reader. It’s a task for which semantics is particularly well-suited, because it can facilitate a match between an article and an advertisement based on context and concepts within the article.
Amiad Solomon is president and founder of Peer39, which uses semantic analysis technology to match publisher inventory to relevant advertisements. Solomon says, “Advertisers want to buy display ads around relevant content based on an understanding of the true meaning of the content.” Peer39 processes more than a billion articles a day to uncover that meaning and make the matches, either directly with large publishers or through yield optimization firms such as AdMeld.
The technology also provides a measure of brand protection for firms who want to make sure there are places where their ads won’t show up—for example, near stories about pornography, terrorism, or plane crashes.
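The matching and brand-protection ideas described above can be sketched in a few lines of Python. This is a toy illustration only, not a description of how Peer39 or any other vendor works: it assumes that each article and each ad has already been reduced to a set of concept labels, and it scores the overlap between those sets with Jaccard similarity. The function names, ad names, and concept labels are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two concept sets, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pick_ad(article_concepts, ads, blocked=frozenset()):
    """Return the name of the best-matching ad, or None if no ad is suitable."""
    # Brand protection: show nothing next to a story on a blocked topic.
    if article_concepts & blocked:
        return None
    best_name, best_score = None, 0.0
    for name, concepts in ads.items():
        score = jaccard(article_concepts, concepts)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical ad inventory and articles, already tagged with concepts.
ads = {
    "hotel_chain": {"travel", "hotel", "berlin"},
    "sports_drink": {"sports", "fitness", "football"},
}
travel_story = {"berlin", "travel", "museum"}
crash_story = {"travel", "plane crash", "airline"}

print(pick_ad(travel_story, ads))  # hotel_chain
print(pick_ad(crash_story, ads, blocked={"plane crash", "terrorism"}))  # None
```

In practice the hard part is the step this sketch takes for granted, namely deriving reliable concepts from the text of an article in the first place.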
Amram Shapiro, president and founder of Book of Odds (http://bookofodds.com), believes that without semantic technology in back-office processing, his 3-year-old company wouldn’t be here. “We’re a content site,” explains Shapiro, providing a dictionary of probability via “odds statements” such as, “What are the odds an adult considers football to be his or her favorite sport?” (One in 3.23, if you’re curious.) The key audience comprises companies that use the odds statements to help them assess risk, and the company currently has 400,000–500,000 such statements in its database.
Shapiro says, “We needed to organize a complex set of interrelationships about data regarding the odds of everyday life: diseases, accidents, weather, crime, war. To set that up in a relational database would simply have been impossible.” The company turned to the Anzo middleware platform from Cambridge Semantics for help solving the two main obstacles it faced in its early days: organizing data and keeping back-office costs low.
“It’s neither sexy nor sizzling,” says Shapiro, “but using semantic technology in this instance ensures that we have consistency across data types, flexibility, and efficiency. Our researchers don’t have to rethink things each time, they just refer to the ontology. And if the ontology changes, it’s much easier to address using semantics than it would be with an RDBMS.”
Customer care is another enterprise function finding use for semantics, as the economy is driving companies to do more with fewer resources. Chris Hall, VP of product strategy for InQuira, a provider of enterprise knowledge solutions, says, “Customers are saying, ‘How can I cross-train customer service employees, because I have fewer of them to work with?’”
One answer is to maximize the efficiency of web self-service. Apple’s support site is a good example of a company using semantics to, essentially, create “topic hubs” around products and problems so that customers can find what they need.
Another efficiency-boosting technique is to make it easier to resolve customer phone queries on the first call. The InQuira solution uses semantics to measure the intent of a query for web self-service, contact center support, and sales departments to increase the efficiency of those customer interactions. With InQuira embedded into the desktop of major CRM vendors such as Oracle and SAP, Hall says, “As the agent starts to fill in the case screen, we’re already searching in the background.” By bringing back related articles, discussion forums, and case studies and making it easier for agents to sort and drill through responses, companies can reduce the need for agent training and the time spent on individual calls.
Full (Incremental) Speed Ahead
The opportunities for semantic technology seem boundless for the content industry, not only in the areas above but also in media monitoring, harnessing user-generated content, and monitoring copyright and usage. But don’t let the wide-open, uncharted future scare you off.
“Start with small projects, pilot projects,” counsels Cambridge Semantics’ Kludt. “By making an incremental investment you can get your value and then move on to the next project.” He says that every Cambridge customer who has implemented a semantic solution is working on his or her next project already.
Having followed it himself, Book of Odds’ Shapiro also endorses the incremental approach. But he adds, “The only thing I wish I’d known before we started is that semantics is more powerful than we realized. We could have moved ahead even faster.”
Book of Odds
The New Republic
Siegel, David, Pull: The Power of the Semantic Web to Transform Your Business (Portfolio, 2009)