
The Rhyme and Reason: Shakespeare and the Semantic Web

Wednesday Feb 11, 2009

“Words, words, beautiful words masking a heart that’s breaking, breaking….” Can we really build computer intelligence through tagging, data mining, and dissecting words into stems, senses, and syntaxes of senses to make the semantic web a reality? I suppose the process is not dissimilar to trying to make sense of archaic Elizabethan language for a modern-day audience.

“Have you the heart? When your head did but ache, I knit my handkercher about your brow, the best I had…”

Arthur was the heir to the throne, but his uncle, King John, usurped it, sending young Arthur to the Tower and ordering his eyes burned out by his keeper and best friend, Hubert. One of the darkest of Shakespeare’s scenes in “King John” involves Arthur negotiating for his eyes to be spared. Playing this role was a challenge. Much of the language was difficult. By skipping on a rope and reciting the lines, I discovered that reason was found in the offbeat of the rhyme and in the haunting repetition of the letter “h”. The brilliance of the horrific imagery needed only to be placed, not played. Shakespeare was brilliant in providing everything an actor needed to bring his work to life. The meaning was there in the rhythm of the language, the syntax, and the stem.

So how does computational linguistics decipher modern-day language and web search to create valid meaning in our lives?

The semantic web, where web content is presented not as documents but as items of data linked by both meaning and relationship, was envisioned nearly 15 years ago by Sir Tim Berners-Lee, the inventor of the World Wide Web. Since 2000, 23 billion data relationships have been coded using a standard called RDF, or Resource Description Framework. RDF defines the relationship of each data item to others, not just within a document but wherever that data may be on the web. The semantic web attempts to gather search results from thousands of documents into one convenient collection, and to do so in a personalized way. Think of the full functionality of LinkedIn or Facebook using the RDF approach and you can imagine where we are headed.
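To make the idea of data linked by relationship a little more concrete, here is a minimal sketch in plain Python. It is not a real RDF library: it only borrows the subject, predicate, object shape of an RDF statement, and every URI, predicate name, and the `match` helper are made up for illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple,
# mirroring the shape of an RDF statement. All URIs below are hypothetical.
EX = "http://example.org/"

triples = {
    (EX + "KingJohn", EX + "author", EX + "Shakespeare"),
    (EX + "KingJohn", EX + "hasCharacter", EX + "Arthur"),
    (EX + "KingJohn", EX + "hasCharacter", EX + "Hubert"),
    (EX + "Hubert", EX + "keeperOf", EX + "Arthur"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


# Which characters does the play link to?
characters = [o for (_, _, o) in match(triples, EX + "KingJohn", EX + "hasCharacter")]

# Which statements point at Arthur, wherever they come from?
about_arthur = match(triples, obj=EX + "Arthur")
```

Because each item is named by a URI rather than by its position in a document, the same query works no matter which page or site contributed a given statement, which is the property the semantic web relies on.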

BooRah, a semantic and natural language processing aggregator based in Mountain View, California, is attempting to mine user reviews and blogs to summarize and rate local businesses. It scores and analyzes the content and packages it in a usable form. For example, a search for “hotels” in San Diego can return results that are more category-specific, delivering on qualifiers such as type of service and facilities that are more personalized for the searcher. It also uses location qualifiers to deliver razor-sharp results to searchers looking for the “best calamari in Chicago”.
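As a rough illustration of what a location qualifier means in practice, the sketch below splits a query into a qualifier, an item, and a place. This is not BooRah's method, which the post does not describe; the `parse_query` function, the qualifier list, and the "item in place" pattern are all assumptions chosen to keep the example small.

```python
import re

# Illustrative qualifier words only; a real system would need far more.
QUALIFIERS = {"best", "cheapest", "closest"}


def parse_query(query):
    """Split a local-search query into qualifier, item, and location.

    Assumes the hypothetical shape "<qualifier?> <item> in <location>".
    """
    text = query.strip()
    m = re.match(r"^(?P<what>.+?)\s+in\s+(?P<where>.+)$", text, re.IGNORECASE)
    what, where = (m.group("what"), m.group("where")) if m else (text, None)

    words = what.split()
    qualifier = None
    if words and words[0].lower() in QUALIFIERS:
        # Peel off the leading qualifier so the item stands alone.
        qualifier, words = words[0].lower(), words[1:]

    return {"qualifier": qualifier, "item": " ".join(words), "location": where}


calamari = parse_query("best calamari in Chicago")
hotels = parse_query("hotels in San Diego")
```

Even this toy version shows why the problem is hard: a simple pattern handles "calamari in Chicago" but says nothing about what makes a review of calamari good or bad, which is where the scoring and analysis of review text comes in.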

There is no question that computational linguistics professionals can have a huge impact on our lives. In fact, these changes in search will affect every search engine, online retailer, media publisher, and most web sites.

But how fast will the semantic web be here? Well, just as not every line of Shakespearean text is decipherable, I do not believe semantic technologies can be definitive in understanding human language and all its nuances. It's going to take time. Computers need a broad supply of metadata, annotation data, and relationship data in order to learn, and huge numbers of people must dedicate their intelligence and judgement to building and maintaining it.
