Wikipedia and the Semantic Web – Part 2
About a month ago I posted (here) my thoughts about how Wikipedia can improve the Semantic Web. My take is that Wikipedia can provide a global and ever-improving vocabulary that bloggers and other content creators can use to provide richer context around what they write.
Several people contacted me after reading the post to ask about the best way to annotate their content, and to find out what else I think Wikipedia needs to do to make it easier to create Semantic Web pages. The big question seemed to be: What context can bloggers add so that search engines and others understand their posts?
I’ll use a simple scenario to illustrate my answer to this question. Let’s say I am about to write a blog post on the healthcare debate. Obviously, I want to tell search engines and other readers of my markup that I am talking mainly about the http://en.wikipedia.org/wiki/2009_US_healthcare_debate. And within that context I want to discuss the http://en.wikipedia.org/wiki/United_States_Democratic_Party and the http://en.wikipedia.org/wiki/United_States_Republican_Party. As you can see, Wikipedia provides me with a clear vocabulary to uniquely identify the different “Entities” that I want to talk about in my post. There is a unique URL for every entity. This will not work for entities that are not popular enough to have Wikipedia pages, but it is a good start. It is also only a small step beyond “Tagging”, a common way to annotate content today.
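To make this concrete, here is a minimal sketch of what entity annotation could look like in practice. It assumes the annotation is expressed as an ordinary HTML link with rel="tag" (a microformat-style convention, not something Wikipedia itself prescribes), and the function name and the phrase-to-URL table are my own illustrative choices.

```python
import re
from html import escape

# Hypothetical table mapping phrases in a post to Wikipedia entity URLs.
ENTITIES = {
    "healthcare debate": "http://en.wikipedia.org/wiki/2009_US_healthcare_debate",
    "Democratic Party": "http://en.wikipedia.org/wiki/United_States_Democratic_Party",
    "Republican Party": "http://en.wikipedia.org/wiki/United_States_Republican_Party",
}

def annotate(text, entities=ENTITIES):
    """Link the first mention of each known phrase to its Wikipedia URL.

    Input is treated as plain text and HTML-escaped; matching is exact
    and case-sensitive, which is an assumption made to keep the sketch short.
    """
    if not entities:
        return escape(text)
    # Try longer phrases first so overlapping phrases resolve predictably.
    phrases = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    pieces, pos, seen = [], 0, set()
    for match in pattern.finditer(text):
        phrase = match.group(0)
        pieces.append(escape(text[pos:match.start()]))
        if phrase in seen:
            # Later mentions stay as plain text.
            pieces.append(escape(phrase))
        else:
            seen.add(phrase)
            pieces.append('<a rel="tag" href="{}">{}</a>'.format(
                escape(entities[phrase]), escape(phrase)))
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)

if __name__ == "__main__":
    print(annotate("The Democratic Party and the Republican Party disagree."))
```

Only the first mention of each phrase is linked, so a long post is not littered with repeated links to the same entity.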
Next, as I talk about different entities, I may want to explicitly state the connections I am making. For example, I’ll mention http://en.wikipedia.org/wiki/Michelle_Obama and want to add the fact that I am commenting on her impact as the “Wife” of the President in their personal relationship, and not as the First Lady. Wikipedia does give us a lot of information on how different entities are related to each other. However, the vocabulary is far less organized and many of these relationships do not have unique names. Some of the fact boxes at the bottom of Wikipedia pages, called “Templates”, like the one at http://en.wikipedia.org/wiki/Template:United_States_topics, are even less structured and uniform. Wikipedia needs to evolve toward a more structured hierarchy and schema for relationships. Without it, it will remain hard for content creators to add richer information and make new relationships evident.
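A relationship like this is commonly written as a subject, predicate, object triple. The sketch below is only an illustration under stated assumptions: the Barack Obama URL is my addition, and the predicate URL under example.org is entirely made up, precisely because Wikipedia does not offer a unique name for “Wife”. The output follows the basic N-Triples line shape of three URLs in angle brackets followed by a period.

```python
from collections import namedtuple

# A relationship between two entities, each part identified by a URL.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

def to_ntriples(triple):
    """Serialize one triple as an N-Triples style line.

    Assumes all three parts are URLs that need no further escaping.
    """
    return "<{}> <{}> <{}> .".format(
        triple.subject, triple.predicate, triple.object)

wife_of = Triple(
    subject="http://en.wikipedia.org/wiki/Michelle_Obama",
    # Hypothetical predicate: Wikipedia has no unique URL for this relationship.
    predicate="http://example.org/relations/wife_of",
    # Assumed entity URL for the President; not given in the post itself.
    object="http://en.wikipedia.org/wiki/Barack_Obama",
)

if __name__ == "__main__":
    print(to_ntriples(wife_of))
```

The subject and object come straight from Wikipedia, while the predicate has to be invented, which is exactly the gap described above.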
Lastly, I may want to annotate my post with information on what kind of content it is. Am I talking about some great “Videos” or “Documentaries”? Am I writing with a “Liberal” view? Am I discussing some recent “News”? Is this a “Review” of the administration’s efforts? The last piece of Semantic Web annotation is specifying the kind of content I am creating, rather than the topics of my content. Obviously, Wikipedia does not have the vocabulary that allows me to specify this and I must look elsewhere.
In the end, we have to take baby steps toward our goal of rich semantic annotation of Web content. Automated tools are already attempting to do this for content that has already been created. Will the automated methods improve fast enough that there will never be a need for content creators to annotate? Or will having a vocabulary and an easy method of annotation give enough advantage to the content creators that we will see widespread adoption? My guess is that the answer lies somewhere in between.
More accurate annotation already allows better cross-linking and makes it easier for users to find your content, both from search engines and from other sources. It also allows innovative startups to use your content in rich ways and drive traffic to you. At the same time, automated annotation techniques are improving. In the end, a “Semi-Automated” solution that lets you influence how your content is annotated and, as the technology improves, reduces the effort it takes will be the winner.
Tags: Categorization, Content, Innovation, Search Engine, Semantic Web, Wikipedia


