Monday, 2 February 2009

Semantic web - The foundations

Following on from my earlier post, "A semantic web future", I felt the need to investigate some of the technicalities of delivering semantic web solutions.

I thought I'd share my findings in the hope of assisting anyone else looking to learn more about the application of semantics to internet data. Firstly, I am no semantic internet expert. These notes purely represent my understanding of the process, and I'd welcome any further feedback or information on the subject.

I am planning to make this the second in a series of investigations and posts on the semantic internet.

Top-down or Bottom-up?


As I mentioned briefly in my earlier post, the semantic web is intended to facilitate greater access to the reams of information available on the internet about a specific subject, making it easier to extract information from multiple sources in one process.

There are currently two approaches to the semantic internet: "top-down" and "bottom-up". You can read further details on these approaches in this article from ReadWriteWeb, and there's a wealth of information on the Semantic Web area of the W3C site (other relevant links are listed at the bottom of this post).

Briefly, the "top-down" approach utilises technology to extract the required information from existing content. The "bottom-up" approach, by comparison, requires annotation to be applied retrospectively to data so that computers can perform intelligent searches.

The "bottom-up" approach has been around for several years, but because of the overhead of adding annotation to existing data it has never really been commercially embraced. All data would need to be annotated and encoded with RDF (Resource Description Framework) and OWL (Web Ontology Language).
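To give a feel for what that annotation involves, here is a minimal sketch of RDF-style statements held as plain subject, predicate, object tuples. It is a toy only: the "ex:" names are placeholders I have made up, and real RDF would use full URIs and a serialisation such as Turtle or RDF/XML.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical annotations for a page about a novel. The "ex:" prefix and
# property names are invented for illustration, not a real vocabulary.
TRIPLES: List[Triple] = [
    ("ex:Dracula", "ex:author", "ex:BramStoker"),
    ("ex:Dracula", "ex:firstPublished", "1897"),
    ("ex:BramStoker", "ex:bornIn", "ex:Dublin"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every given field (None = wildcard)."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def author_birthplaces(triples: List[Triple], work: str) -> List[str]:
    """Chain two annotations together: work -> author -> birthplace."""
    places = []
    for _, _, author in match(triples, subject=work, predicate="ex:author"):
        for _, _, place in match(triples, subject=author, predicate="ex:bornIn"):
            places.append(place)
    return places


if __name__ == "__main__":
    print(author_birthplaces(TRIPLES, "ex:Dracula"))  # ['ex:Dublin']
```

The query only works because somebody has already written every one of those statements by hand, which is exactly the overhead that has held the approach back.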

More recently the "top-down" approach has begun to be explored. Without the requirement to retrospectively annotate data, the approach presents a more practical and more commercially viable solution to the semantic web's technical conundrum.

This approach takes existing information from websites and, with a degree of intelligence, formulates relationships between it and other sources of similar data. Obviously the "top-down" approach depends more heavily on the quality of its queries and algorithms, and so is more exposed to mistakes and invalid results.
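As a rough illustration of both the appeal and the risk, the sketch below pulls an "author" relationship out of unannotated sentences with a single hand-written pattern. The pattern, the function name and the sample sentences are all my own assumptions; real "top-down" systems use far richer language processing than one regular expression.

```python
import re
from typing import List, Tuple

# A capitalised phrase, e.g. "Bram Stoker" or "This Review".
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"

# One naive rule standing in for the "degree of intelligence".
AUTHORSHIP = re.compile(rf"(?P<work>{NAME}) was written by (?P<author>{NAME})")


def extract_authorship(text: str) -> List[Tuple[str, str, str]]:
    """Return (work, "author", person) for every match of the naive rule."""
    return [
        (m.group("work"), "author", m.group("author"))
        for m in AUTHORSHIP.finditer(text)
    ]


if __name__ == "__main__":
    samples = [
        "Dracula was written by Bram Stoker in 1897.",  # correct extraction
        "The note was written by hand.",                # correctly ignored
        "This Review was written by Friday.",           # false positive
    ]
    for sentence in samples:
        print(sentence, "->", extract_authorship(sentence))
```

The third sentence yields ("This Review", "author", "Friday"), which is nonsense: no annotation was needed, but the result is only as good as the rule that produced it.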

That said, because it works with data as is, the "top-down" approach offers greater potential for immediate commercial application. I would also assume that, were semantic annotation to become more commonplace on websites and in data structures, the existing "top-down" applications could be adapted to incorporate the new, more accurate data and in turn improve their own quality.

Application Platforms


Neither of these development areas is simple to approach. The market has reacted to this (as it has with previous evolutions of the internet) by creating starting platforms for developers.

These semantic platforms allow developers that want to get into the application market to avoid having to reinvent the wheel each time. They represent years of work undertaken by other development companies that have created semantic frameworks onto which other applications may be integrated.

The first application platform I've spent time investigating is Zemanta, which provides the foundations applications need to enable semantic searching of data. It analyses the data using their "proprietary natural language processing and semantic algorithms", then statistically compares its contextual framework to their pre-indexed database of content.

This process is improved and developed via a program of machine learning techniques and end-user input. This enables them to teach the engine and constantly improve the service the engine can provide.
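Zemanta's algorithms are proprietary, so the following is emphatically not their method. It is only a toy sketch of the general idea of statistically comparing a piece of text against a pre-indexed collection, here using cosine similarity over raw word counts. The index contents and function names are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, Tuple


def term_counts(text: str) -> Counter:
    """Lower-case the text and count each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical pre-indexed content, keyed by an invented document name.
INDEX: Dict[str, str] = {
    "rdf-primer": "rdf describes resources using subject predicate object triples",
    "geo-apps": "location services share a user position with mobile applications",
}


def best_match(text: str, index: Dict[str, str]) -> Tuple[str, Dict[str, float]]:
    """Score the text against every indexed entry and return the closest one."""
    query = term_counts(text)
    scores = {name: cosine(query, term_counts(body)) for name, body in index.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    name, scores = best_match(
        "triples link a subject to an object with a predicate", INDEX
    )
    print(name, scores)
```

A production engine would at the very least discount common words such as "a" and "with", and, as described above, would keep adjusting its scoring from machine learning and end-user feedback.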

Talis is another application platform that allows faster release of semantic applications. They store databases of RDF metadata and content, then share the facility to increase the accessibility of the data. Delivering the services via stores enables them to apply security to specific data segments, and with this security framework they can open the door to corporate as well as public use.

Obviously these two represent just the tip of the iceberg when it comes to application development in this area. Other projects of interest are the WordNet programme at Princeton and the KIM platform that Ontotext are developing.

Where now, the future . . .


So where do both of these approaches leave us and what difference is it going to make to my browsing, I hear you ask.

The above platforms and other similar products are going to enable a number of smaller software houses to approach the semantic app market. I am hoping these platforms play a similar role to Fire Eagle in the geo-location market and bring a number of apps to market this year.

I'll review these in greater detail soon; I am hoping to have some discussions with their producers that I can publish.

Properly implemented, I believe the difference is going to be vast. Sure, there are going to be some major issues along the way, and we are not talking about the odd dodgy blog post or libellous tweet of web 2.0. Storing data and then redistributing it via numerous platforms is not going to be without some human error.

The semantic web should dramatically improve the efficiency of data access, cutting the time we spend hunting for answers on the internet and creating a far better service than we currently receive.

The future will see a semantic internet, but it's not going to happen overnight.


Related links:

Headup - Semantic browser integration - Digital Signals http://www.digital-constructions.com/blog/2009/03/headup-article-semantic-browser.html

Interview with Mike Darnell of Headup - Digital Signals http://www.digital-constructions.com/blog/2009/02/interview-with-mike-darnell-of-headup.html

Semantic internet - Intelligent web future - Digital Signals http://www.digital-constructions.com/blog/2009/01/semantic-web-future-intelligent.html



Reading Sources:

W3C Semantic Web Activity - http://www.w3.org/2001/sw/
WordNet - Princeton University - http://wordnet.princeton.edu/
Zemanta API - http://www.zemanta.com/api/
headup blog - http://blog.headup.com/
Conversation Agent - http://www.conversationagent.com/2007/11/web-30-artifici.html
Kevin Kelly - Technium - http://www.kk.org/thetechnium/archives/2009/01/two_strands_of.php#
Ontotext - The KIM Platform - http://www.ontotext.com/kim/semanticannotation.html
Web3Beat - Top-down vs Bottom-up - http://www.web3beat.com/2008/10/top-down-v-bottomup-in-the-sem.html
