Why Wikipedia Can Make a Giant Leap Ahead for the Semantic Web

Every time I want to look up facts, read about a topic, or satisfy my curiosity, I go to Wikipedia. So does everyone else. Wikipedia is a brilliant idea: the greatest compendium of knowledge ever created. Unfortunately, to a machine it is a blob of data with very little meaning; all that information is too hard for it to understand.
The Semantic Web is an idea that has been around for a while. The Internet has revolutionized our access to information; if computers could access that information and process it, imagine how much more we could do with it. But computers are not as smart as humans, and we need to talk to them in a different language. We need Web pages, blogs, news, and email to be written in a language that computers can understand, one far simpler and more structured than English. The Semantic Web’s goal is to create this language and make it possible for people who create content to use it every day.
When did you last learn a new language? Can your neighborhood blogger read French? Or Chinese? New languages are hard to learn, and content creators need a BIG incentive to master and use them. Google, Bing, or Kosmix could, for example, provide that incentive by ranking Semantic Web pages higher, since a search engine can understand such pages better.
All right, so we know why this language is needed, and we have some idea of the incentives that will push people to learn it. But where is this language? Who defines it? The World Wide Web Consortium (W3C) has been defining standards like “RDF” and “OWL” that it hopes will lead to this global language. But these standards are really an “Alphabet” that tells us how to write this language. What they don’t give us is a dictionary: a vocabulary, called a “Schema,” that will help computers understand this language.
What is needed, and what the proponents of the Semantic Web have failed to create, is this global “Schema” of types of things and their relationships: a vocabulary that works for all the content on the Web. Something that tells a program that an iPhone is a “Mobile Phone,” which is a “Phone,” which is a “Communication Device,” and that it is also a “Personal Electronics Gadget.” Such a “Schema” must be easy to extend, and its extensions easy to standardize. There were no “eBook Readers” a few years ago; today the Kindle is an “eBook Reader,” and both my computer and yours should call it an “eBook Reader.”
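To make the idea of a “Schema” concrete, here is a minimal sketch of a type hierarchy in Python. It is only an illustration, not how RDF, OWL, or Wikipedia actually store anything: the dictionary layout and the helper names `ancestors`, `is_a`, and `add_type` are hypothetical, and the choice to make “Mobile Phone” a child of both “Phone” and “Personal Electronics Gadget” (and “eBook Reader” a child of the gadget type) is an assumption. It shows the two properties the paragraph asks for: a program can follow “is a” links transitively, and a new type such as “eBook Reader” can be added later. In W3C terms, the parent links play roughly the role of `rdfs:subClassOf`.

```python
# A minimal, hypothetical "Schema": each type lists its parent types.
# A type may have more than one parent (assumed placements, for illustration).
SCHEMA: dict[str, list[str]] = {
    "Communication Device": [],
    "Personal Electronics Gadget": [],
    "Phone": ["Communication Device"],
    "Mobile Phone": ["Phone", "Personal Electronics Gadget"],
}

# Things mapped to their most specific type.
INSTANCES: dict[str, str] = {
    "iPhone": "Mobile Phone",
}


def ancestors(type_name: str, schema: dict[str, list[str]]) -> set[str]:
    """Return type_name plus every type reachable by following parent links."""
    seen: set[str] = set()
    stack = [type_name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(schema.get(current, []))
    return seen


def is_a(thing: str, type_name: str) -> bool:
    """True if the thing's type is type_name or a descendant of it."""
    return type_name in ancestors(INSTANCES[thing], SCHEMA)


def add_type(name: str, parents: list[str]) -> None:
    """Extend the schema with a new type; every parent must already exist."""
    missing = [p for p in parents if p not in SCHEMA]
    if missing:
        raise ValueError(f"unknown parent types: {missing}")
    SCHEMA[name] = list(parents)


# Extending the schema: "eBook Reader" did not exist a few years ago.
add_type("eBook Reader", ["Personal Electronics Gadget"])
INSTANCES["Kindle"] = "eBook Reader"

if __name__ == "__main__":
    print(is_a("iPhone", "Communication Device"))  # True
    print(is_a("Kindle", "Phone"))  # False
```

The point of the sketch is that once my program and yours share the same `SCHEMA`, both reach the same answer about what a Kindle or an iPhone is, without either having to understand English.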
Who can create such a language? They would need the largest compendium of information ever created. They would need an easy way for the world to edit and change this compendium over time. They would need a process by which every piece of information in the compendium can be “defined” by a common schema. You see where I’m going here: They would need to be Wikipedia.
Frankly, this is an opportunity that Wikipedia has missed. Don’t get me wrong: Wikipedia has a lot of structure for humans, and many companies and researchers are writing sophisticated programs to understand that structure. But it isn’t yet complete or visible enough to be used by other content creators. If Wikipedia can evolve into a compendium of information that also creates and maintains this vocabulary, we can have another revolution with Wikipedia at the center. The amazing thing is, this is only a small step from where Wikipedia is today.
At Kosmix, we write sophisticated programs that understand pages on the Web, including Wikipedia. We want our programs to understand what people are writing so we can connect that information to the people looking for it. But it would take decades for computing power and technology to advance far enough for programs to truly understand English. Another revolution in Wikipedia could skip the world ahead by a few decades.



September 28th, 2009 at 4:44 pm
Wikipedia is trying to structure data. For example, the infoboxes for actors, films, etc. are standardized. Also, the categories for many pages are really useful for classifying things. For example, the page for Kindle links to the category for eBook readers. Finally, you might want to check out Freebase. I think they’re creating the kind of “Wikipedia for data” that you’re suggesting, with a clearer and more structured vocabulary.
October 28th, 2009 at 3:13 pm
[...] a month ago I posted (here) my thoughts about how Wikipedia can improve the Semantic Web. My take is that Wikipedia can [...]