November 6, 2009

And the Startup Smackdown Winner Is…er, Us! »

SmackDown Logo

Chalk it up to home court advantage.

The Kosmix doubles team won last night’s Startup Smackdown tournament, after four rounds of some of the most intense ping pong the Valley has ever seen.

Nine teams competed for the Smackdown title, and the competitive spirit was palpable right from the beginning.  After cruising through the first two rounds, TheFind’s formidable pair, Ranjith “Ranji” Subramanian and Krishna “DaKriz” Ganti, narrowly defeated Skyfire in the semi-finals to face off against Kosmix in the championship game.

The Kosmix team almost didn’t make it to the final round, and had to battle hard to beat Meebo’s awesome doubles team in the semi-finals.  In the final round, TheFind and Kosmix waged a heated match to the finish.  Ankur “Neo” Jain and Nikesh “The Wall” Garera unleashed all they had, and triumphed by going (staying??) home with the coveted Kosmix Kup.

The fact that Kosmix won our own Kup means only one thing:  Next year, we’re toast.  While we were finishing up the leftover beer and pizza last night, the other teams had already started training for Startup Smackdown II.

jodi
November 5, 2009

Kosmix Startup Smackdown »

SmackDown Logo

Who plays the meanest game of ping pong in Silicon Valley??

Tonight eight of the hottest startups in Mountain View will gather at Kosmix for the first annual Startup Smackdown ping pong tournament.  The competition looks fierce, and we hear that teams have been practicing all week in preparation for the big event.

Appearing in the Kosmix Arena tonight will be:

  • Evernote: Ron “The Octopus” Toledo & Andy “Black Widow” Kill
  • Kosmix: Nikesh “The Wall” Garera & Ankur “Neo” Jain
  • Meebo: Simon “The Smasher” Yeo & Greg “Marco Polo” Fair
  • Polyvore: Guangwei Yuan & Jianing Hu
  • Rhythm NewMedia: Khoi “Grasshopper” Dinh & Sundar “Semiconductor” Vedula
  • Skyfire: Brad “Defense” Landthorn & Sunil “Offense” Kaki
  • Talenthouse: Byron “Shrimp” Louie & Frederik “Pee-wee” Hermann
  • TheFind: Ranjith “Ranji” Subramanian & Krishna “DaKriz” Ganti

We also have an awesome press team competing in the tourney:  Jennifer “Mediaphyter” Leggio and Julie “Julie B” Blaustein.

Who will be the ultimate champion? Place your bets now!

jodi
November 3, 2009

The Real Time Web and You »

Here’s a repost of an article I wrote for Inc., on the emergence of the real time web and how your business can benefit from this trend:

One of the biggest technology trends in 2009 has been the emergence of the “Real-Time Web.” The real-time Web is made up of technologies and practices that can inform users as soon as information is published, instead of requiring users to check for updates. The real-time Web discards the traditional notion of the more static “webpages,” and instead adopts the notion of dynamic “streams” of information. The real-time Web is also very conversational because it makes it possible to get instant responses across very large networks of people.
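The difference between checking for updates and being told about them can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the Stream class and its subscribe and publish methods are hypothetical names, not any real service's API, and everything runs inside a single program rather than across the Web.

```python
from typing import Callable, List


class Stream:
    """A toy in-memory stream that pushes each new item to its subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Register a callback to be invoked for every future item.
        self._subscribers.append(callback)

    def publish(self, item: str) -> None:
        # Push the item to every subscriber right away; nobody has to poll.
        for callback in self._subscribers:
            callback(item)


# Usage: the subscriber's list fills up as soon as items are published.
received: List[str] = []
stream = Stream()
stream.subscribe(received.append)
stream.publish("first update")
stream.publish("second update")
```

The subscriber never asks whether anything is new. It registers once, and each update arrives at the moment it is published.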

Action in the real-time Web started with companies like Twitter and Friendfeed, which built their own infrastructure for large scale delivery of real-time messages. By providing Web service application programming interfaces (APIs), these companies enabled many other developers to create applications based on the real-time Web. However, Anil Dash, a prominent blogger, points out that real time services need not be built on the back of Twitter and Facebook anymore. Due to emerging technologies, the pieces are falling together for creating a free, open and decentralized “pushbutton platform,” which makes it easy for websites to add real-time messaging services. With these developments, we can expect many more websites to jump onto the real-time bandwagon.

Growing importance to business

The real-time Web is becoming increasingly important to businesses in multiple ways. Firstly, as many webmasters and Web analytics companies have pointed out, the real-time Web is starting to rival search engines like Google as a source of website traffic. For example, Mark Cuban talked a few months ago about how his blog receives more visits from Twitter and Facebook than from Google. Secondly, the real-time Web opens up communication opportunities that the traditional Web could not have provided. For instance, if an airline wants to sell off its last minute tickets, the real-time Web provides a great outlet for advertising this very time-sensitive deal.  Thirdly, by making information instantaneously accessible, the real-time Web can create, or erase, instances of information arbitrage. As an example, take a look at Skygrid, a service that provides high quality financial news in real time, giving its users an edge, but at the same time leveling the playing field between professional investors and amateurs in terms of the speed of access to reliable information. Finally, because the real-time Web is very conversational, it becomes a repository of people’s sentiment, and mining this sentiment can be very useful to marketers and others.

Taking advantage of real-time Web

Beyond creating an account on Twitter, how can you take advantage of the real-time Web?  Here are some thoughts to get you started:

  • Engage with the real-time Web with tailored offers and content. Several companies are seeing success with time-sensitive programs that could not have been conceived without the real-time Web. Jet Blue’s “cheeps” and United Airlines’ twares are exclusive Twitter promotions for last minute fare deals. Another company that has encountered great success with offering exclusive deals on Twitter is Dell. A Dell blog post from June mentioned that Dell had surpassed $2 million in Twitter sales from Dell Outlet, which sells refurbished items, scratch and dent items, and previously ordered new laptops. The real-time Web also acts as a place where people express their intent to shop (e.g. someone may tweet “thinking of buying an ipod touch.”) Selectively targeting such users, without spamming them, might also be a great way to help your customers make real time buying decisions. A service like Twitterhawk can be used to automate this kind of marketing.
  • Make use of real-time Web tools for business intelligence. The real-time Web is a great source of knowledge and sentiment about your customers, your competitors and your industry. You can use services like Firstrain to research the real-time Web for the news that matters to you. You could also use Twitter’s search functionality in simple ways to keep track of some of this information, or go to one of the many real time search engines. A recent article in Mashable talks about the many tools that help analyze Twitter content.
  • Join in the conversation about your company. In one of my previous articles, I talked about how companies like Comcast are using Twitter to understand their customers’ concerns and address them. The conversational nature of the real-time Web can be very powerful in building relationships with your customers.
  • Create the infrastructure that allows your company to respond in real time. Real-time enterprise data integration has been around for a long time. However, with the emergence of the real-time Web and the opportunities it creates, it is becoming increasingly critical for companies to be able to access all their internal data in real time. In other words, “real-time data integration is no longer a luxury.”
vijay
October 30, 2009

We’re So Geeky at Halloween, It’s Scary. »

At some companies, Halloween is just another day.  The guy in the corner cube might wear a t-shirt with a pumpkin on it, the  lady on the other side of the office puts out a bowl of Fun Size chocolate bars, and that’s about it.

At Kosmix, we take our Halloween seriously.   Here are a few pics from today’s costume competition:

jodi
October 28, 2009

Wikipedia and the Semantic Web – Part 2 »

About a month ago I posted (here) my thoughts about how Wikipedia can improve the Semantic Web. My take is that Wikipedia can provide a global and ever improving vocabulary for bloggers and other content creators to provide richer context around what they write.

Several people contacted me after reading the post to ask about the best way to annotate their content, and to find out what else I think Wikipedia needs to do to make it easier to create Semantic Web pages. The big question seemed to be:  What context can bloggers add so that search engines and others understand their posts?

I’ll use a simple scenario to illustrate my answer to this question.  Let’s say I am about to write a blog post on the healthcare debate. Obviously, I want to tell search engines that I am talking mainly about the http://en.wikipedia.org/wiki/2009_US_healthcare_debate. And within that context I want to discuss the http://en.wikipedia.org/wiki/United_States_Democratic_Party and the http://en.wikipedia.org/wiki/United_States_Republican_Party. As you can see, Wikipedia provides me with a clear vocabulary to uniquely identify the different “Entities” that I want to talk about in my post. There is a unique URL for every entity. This will not work for entities that are not popular enough to have Wikipedia pages, but it is a good start. It is also only a small step beyond “Tagging”, a common way to annotate today.

Next, as I talk about different entities, I may want to explicitly state the connections I am making. For example, I’ll mention http://en.wikipedia.org/wiki/Michelle_Obama and want to add the fact that I am commenting on her impact as the “Wife” of the President in their personal relationship, and not as the First Lady. Wikipedia does give us a lot of information on how different entities are related to each other. However, the vocabulary is far less organized and many of these relationships do not have unique names. Some of the fact boxes at the bottom of Wikipedia pages called “Templates”, like the one at http://en.wikipedia.org/wiki/Template:United_States_topics, are even less structured and uniform. Wikipedia needs to evolve to a more structured hierarchy and schema for relationships. Without it, it will remain hard for content creators to add more rich information and make new relationships evident.

Lastly, I may want to annotate with information on what kind of content it is. Am I talking about some great “Videos” or “Documentaries”? Am I writing with a “Liberal” view? Am I discussing some recent “News”? Is this a “Review” of the administration’s efforts? The last piece of the Semantic Web puzzle is specifying the kind of content I am creating, instead of the topics of my content. Obviously, Wikipedia does not have the vocabulary that allows me to specify this and I must look elsewhere.
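To make these three kinds of context concrete, here is a rough Python sketch of what an annotation for my healthcare post might hold. It is not a standard format: the Annotation class and its field names are hypothetical, the relationship label is free text because Wikipedia has no unique names for most relationships, and the Barack_Obama URL is my assumption about where the President's page lives.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

WIKI = "http://en.wikipedia.org/wiki/"


@dataclass
class Annotation:
    """Hypothetical container for the three kinds of context in a post."""

    # URL of the entity the post is mainly about.
    main_topic: str
    # URLs of the other entities the post discusses.
    entities: List[str] = field(default_factory=list)
    # (subject URL, relationship label, object URL) triples.
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)
    # Kind of content, which needs a vocabulary from outside Wikipedia.
    content_types: List[str] = field(default_factory=list)


post = Annotation(
    main_topic=WIKI + "2009_US_healthcare_debate",
    entities=[
        WIKI + "United_States_Democratic_Party",
        WIKI + "United_States_Republican_Party",
        WIKI + "Michelle_Obama",
    ],
    # The object URL below is an assumption, not taken from the post.
    relationships=[(WIKI + "Michelle_Obama", "Wife", WIKI + "Barack_Obama")],
    content_types=["News", "Review"],
)
```

The entity URLs are the easy part, since Wikipedia already supplies them. The relationship labels and content types are exactly the places where a shared vocabulary is still missing.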

In the end, we have to take baby steps in our goal for rich semantic annotation of Web content. Automated tools are already attempting to do this for content that has already been created. Will the automated methods improve fast enough that there will never be a need for content creators to annotate? Or will having a vocabulary and an easy method of annotation give enough advantage to the content creators that we will see widespread adoption? My guess is that the answer lies somewhere in between.

More accurate annotation already allows better cross linking and makes it easier for users to find your content, both from search engines and other sources. It also allows innovative startups to use your content in rich ways and drive traffic to you. At the same time automated annotation techniques are improving. In the end a “Semi-Automated” solution that allows you to influence how your content is annotated and, with improving technology, reduces the effort it takes will be the winner.

digvijay
October 22, 2009

Web 2.0 Summit: Sean Parker’s Take on the Rise of the Network Company »

Web 2.0 Summit

OK, I admit it:  When I first saw the title of Sean Parker’s session at Web 2.0 today, I immediately thought HARDWARE.  After all, “The Rise of the Network Company” initially seemed to suggest IT infrastructure, switches and routers, and all the geekiness that lurks inside those stuffy data centers.

But this is Web 2.0–and Parker is the guy who co-founded Napster and Facebook before his 28th birthday.

Parker’s definition of a network company isn’t Cisco.  It’s Twitter, Apple, eBay, Facebook, and other organizations that understand the value of networking people.

Parker began his 10-minute talk with the assertion that Google won’t determine the future of the world, because collecting data is less important than connecting people.  He cited Metcalfe’s Law, which states that the value of a communications network is proportional to the square of the number of connected users in the system.  By this measure, a network company is only as good as the number of connections it facilitates.  Sound familiar, Facebook addicts?
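For the geeks, Metcalfe’s Law is easy to try out in a few lines of Python. This is just a sketch: the function names are made up for illustration, and the constant k is an assumed stand-in for whatever a single connection is worth.

```python
def metcalfe_value(users: int, k: float = 1.0) -> float:
    """Network value under Metcalfe's Law: proportional to users squared."""
    return k * users ** 2


def possible_connections(users: int) -> int:
    """Number of distinct pairs of users, which grows roughly as users squared."""
    return users * (users - 1) // 2
```

Doubling the number of users quadruples the value: metcalfe_value(20) is four times metcalfe_value(10). The pair count shows where the square comes from, since 10 users can form 45 distinct connections while 20 users can form 190.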

While the first phase of the Internet was all about data–compiling it, searching it, organizing it, and analyzing it–Parker argues that the next phase is about building connections between people and things.

In the course of his discussion, Parker called out eBay, AOL and Craigslist as examples of his belief that groundswell makes big network companies even bigger, until they dominate the market completely.  And the best products don’t always win.  As the larger players drive out competition, organizations get too comfortable and stop taking risks.  Though Parker didn’t explicitly say what this dynamic means for innovation, his message was clear that a lack of competition doesn’t exactly foster excellence.

Parker’s real point today was that the new economic value on the Web isn’t Search–it’s establishing connections.  Here at Kosmix, we take a similar view that connecting ideas and putting things in context is inherently valuable.  We give people an easier, more visual way to explore topical information on the Web, and we also help you understand how different people, places and things relate to each other.  Our acquisition of Cruxlux this week gives some indication about the role connectedness will play in our overall Kosmix roadmap.

As Parker concluded his comments today, I suddenly found the words of British novelist E.M. Forster ringing in my ears.  “Only connect!,” he wrote in Howards End nearly 100 years ago.  “Live in fragments no longer.”

jodi
October 20, 2009

Kosmix Acquires Cruxlux »

Cruxlux and Kosmix

Big news around Kosmix HQ today:  we’ve just acquired Cruxlux.

Cruxlux is an awesome startup that specializes in determining the relationships between people, places or things.  TechCrunch equates Cruxlux with the party game ‘Six Degrees of Kevin Bacon’, and that’s an apt comparison.  Cruxlux lets you chart each connection that links one person or thing to another.  How, for example, is Bono connected to Larry Ellison?  Cruxlux tells us that 1) Bono has collaborated with Bob Dylan; 2) Bob Dylan used to go out with Joan Baez; 3) Joan Baez lives in Woodside, California; and 4) Larry Ellison is Joan’s neighbor in Woodside.  Pretty cool, eh?
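If you’re curious how a chain like this could be found, here is a minimal Python sketch using a plain breadth-first search over a tiny hand-built graph. To be clear, this is not how Cruxlux works under the hood (their algorithms aren’t described here); the graph, the find_path function and the choice of breadth-first search are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Optional

# Toy undirected graph built from the example above. A real system would
# hold far more entities and would label each relationship.
GRAPH: Dict[str, List[str]] = {
    "Bono": ["Bob Dylan"],
    "Bob Dylan": ["Bono", "Joan Baez"],
    "Joan Baez": ["Bob Dylan", "Woodside, California"],
    "Woodside, California": ["Joan Baez", "Larry Ellison"],
    "Larry Ellison": ["Woodside, California"],
}


def find_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return a shortest chain of entities from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = find_path(GRAPH, "Bono", "Larry Ellison")
```

On this toy graph the search walks from Bono to Bob Dylan to Joan Baez to Woodside and finally to Larry Ellison, and it returns None when no chain exists.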

Cruxlux

While we could spend hours playing the Kevin Bacon game with Cruxlux (and we have!), the real value of the Cruxlux technology for Kosmix is the advanced algorithms it uses to understand relationships.  The Cruxlux team and technology fit perfectly with the work we’ve been doing in our “Related in the Kosmos” section on each topic page, and we expect to integrate Cruxlux tightly into Kosmix.  Watch this space.

This is the first acquisition for Kosmix, and we’re excited to have the Cruxlux team on board!

jodi
October 15, 2009

Two Thumbs Up for SF New Tech »

Tracy and Saumil

We had a GREAT time at SF New Tech last night–thanks to Myles and the crew for hosting such an awesome evening.

The club was packed with more than 200 movers and shakers in the local startup community, as well as a few VCs and journalists–including the New York Times’ Brad Stone, who had some hard-hitting questions for us about the future of media and the role sites like MeeHive play in a well-informed society.

If you haven’t had the chance to check out some of the other startups who presented last night, several of them are definitely worth a look.  One of our favorites was Famililink, which has created a site that makes it easier for elderly people and those who aren’t technically inclined to keep in touch with family through email messages, videos, pictures and calendars.  One person in the audience suggested that they should also integrate MeeHive into their offering, to add personal news to the mix.  (And, no, we didn’t plant someone to say that.)

We finished the night with a Kosmix Lucky Twit giveaway.  In the true spirit of “Wheel of Fortune,” some folks in the audience even chanted, “No whammy! No Whammy! No Whammy!”  when we’d spin the tweets to pick a winner.  Congrats to the lucky tweeter who went home with the FlipCam–enjoy!

Jennifer Simpson took some terrific photos of the event–check them out here.    Here’s a sample of some of her cool pics:

Tracy Presents

Saumil Presents

Tracy Picks a Lucky Twit

Kartik and YourVersion

Tracy and Tilo

jodi
October 14, 2009

We’re Presenting at SF New Tech Tonight! »

Our very own Saumil Mehta and Tracy Lou will be on the big stage at SF New Tech this evening, sharing a live demo of MeeHive.

We’ll have five minutes to give a quick overview of the site, show how you can use MeeHive to track news about all your interests and share articles with friends through Facebook and Twitter, and give a quick plug for our newest iPhone app, MeeTV.  Saumil, how fast can you talk??

It’s going to be a great night–tacos, beer, and cool presentations from other startups like Bodukai, ZoomPool and Famililink.  We’re also planning to use Lucky Twit to give away an awesome Flip video camera, so be sure to tweet from the event using the hashtag #sfnewtech.

Here are the event details and the link to buy tickets.  Come on out–we’d love to see you!

When:
Wednesday, October 14, 2009 from 5:30 PM – 10:00 PM (PT)

Where:
Mighty
119 Utah Street
(Cross street is 15th. Look for the big black doors!)

Tickets now on sale at http://october14sfnewtech.eventbrite.com


jodi
September 30, 2009

Organizing the Web around Concepts »

During the initial days of the web, directories like Yahoo manually organized the web to help people find relevant information. As the web grew in size and search engine technology evolved, search engines like Google became the main way to query the web. Today, the next wave is making web navigation easier by reorganizing the Internet by topic or concept. An increasingly meaningful web (which may lead to the Semantic Web) is being built around concepts by efforts such as Freebase, Google Squared, DBLife, and Kosmix topic pages.  At Kosmix, we’re often asked about the technical philosophy driving this change.  Here is a brief overview for the geeks among us.

To start with, what do we mean by concepts? A concept is loosely defined as a set of keywords of interest, for example, the name of a restaurant, cuisine, event, name of a movie, etc. There are various websites tailored to a particular kind of concept such as Yelp for restaurants (e.g., Amarin Thai), IMDB for movies (e.g., The Shawshank Redemption), LinkedIn for professional people, Last.fm for music (e.g., U2), etc.

Why should one care about organizing the web around concepts? There are three main kinds of web pages: search pages, topic/concept pages, and articles. Organizing the web around concepts can benefit each one of them.

Search pages. A search results page for a given query consists of various relevant links with snippets, for example, Google search results pages on “Erykah Badu”. Web data around concepts can improve search results in two ways. First, a search page can show a bunch of concepts related to the query, and their relationships to the query. This will help in further refining the query, and enable exploration of concepts related to the query.  Second, a search page can promote the concept page result for a concept closely matching the query.

Concept/Topic pages. A topic page or concept page organizes information around a concept; for example, consider this music artist page on “Erykah Badu”. Such pages can utilize attributes of concepts and show content related to the concept and its attributes, such as albums, music videos, song listings, album reviews, concerts, etc.

Articles. Articles can put semantic links to the concepts present in the article, and promote exploration of concepts present in the article, for example, this page on oil prices.

Given so many benefits of arranging the web around concepts, how can we achieve that? Some of the ways to arrange the web around concepts are as follows.

1. Editorial: An editor can pick a set of concepts of interest, create attributes of the concepts, and organize the data around the concepts. Many sites like IMDB (for movies) have taken this approach. This approach gives high quality content but it’s not scalable in terms of the number of concepts.

2. Community: Many sites such as Wikipedia and Yelp have taken this approach, in which a community of users picks concepts, creates the attributes of the concepts, and organizes the data around the concepts. This process scales as the user community grows, but such a community is hard to build, the approach is susceptible to spam, and its scale is limited. For example, Wikipedia has grown to millions of concepts thanks to its large user base, but its size is still far from the scale of the web.

3. Algorithmic approach:  One way to organize the web around concepts is to mine the web for concepts and their attributes, and link data with concepts. This approach is the most promising in terms of scaling to the size of the web. Various steps in this approach are (a) Concept Extraction, (b) Relationship mining, and (c) Linking data with concepts.

(a) Concept Extraction. There are two main methods for concept extraction from web pages, site-specific and category-specific.

In the site specific method, the structure or semantics of a site is used to extract concepts. Many web sites generate HTML pages from databases through a program, and such pages have a similar structure. One can write site specific rules or wrappers to extract interesting data from such web pages, but writing such wrappers is a labor intensive task. Kushmerick et al. [1] have proposed a wrapper induction technique to automatically learn wrapper procedures based upon samples of such web pages. A recent work by Dalvi et al. [2] extends the wrapper induction technique to dynamic web pages. Another site specific method is to use natural language processing to understand the semantics of web pages and to mine concepts from them.
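As a rough illustration of what a hand-written wrapper looks like, the Python sketch below extracts attributes from a template-generated page using left and right delimiter strings. It is a minimal sketch and not the induction algorithm from the papers cited above: the delimiters, the page markup and the apply_wrapper function are all hypothetical, and wrapper induction would learn such delimiters from sample pages instead of having a person write them.

```python
from typing import Dict, Tuple

# Hypothetical delimiters for one imaginary restaurant site whose pages all
# come from the same template. Each attribute maps to (left, right) strings.
RESTAURANT_WRAPPER: Dict[str, Tuple[str, str]] = {
    "name": ('<h1 class="name">', "</h1>"),
    "cuisine": ('<span class="cuisine">', "</span>"),
}


def apply_wrapper(html: str, wrapper: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Extract the text between each attribute's left and right delimiter."""
    record: Dict[str, str] = {}
    for attribute, (left, right) in wrapper.items():
        start = html.find(left)
        if start == -1:
            continue  # Left delimiter missing: the page does not fit the template.
        start += len(left)
        end = html.find(right, start)
        if end == -1:
            continue  # Right delimiter missing: skip this attribute.
        record[attribute] = html[start:end].strip()
    return record


page = '<html><h1 class="name">Amarin Thai</h1><span class="cuisine">Thai</span></html>'
record = apply_wrapper(page, RESTAURANT_WRAPPER)
```

The sketch also shows why this method is labor intensive: every site needs its own set of delimiters, and they silently stop matching when the site changes its template.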

In the category specific method, web pages are classified into categories, such as, restaurants, shopping, movies, etc., and category specific extraction rules are applied. For example, extract menu, reviews, cuisine, location for restaurants; extract price, reviews, ratings for shopping category; and extract actors, director, ratings for movies. This method is more scalable in terms of the number of web pages compared to the site specific method, but slightly more error prone since classification and extraction errors accumulate.
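The two stages of the category specific method can be sketched as follows in Python. Everything here is an assumption made for illustration: the keyword lists, the regular expressions and the function names are hypothetical, and a real system would use a trained classifier and much richer extraction rules.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists used to guess a page's category.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "restaurants": ["menu", "cuisine", "reservation"],
    "movies": ["director", "actors", "runtime"],
}

# Hypothetical extraction rules: one regular expression per attribute.
CATEGORY_RULES: Dict[str, Dict[str, str]] = {
    "restaurants": {"cuisine": r"Cuisine:\s*(\w+)"},
    "movies": {"director": r"Director:\s*([\w ]+)"},
}


def classify(text: str) -> str:
    """Pick the category whose keywords appear most often in the text."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(word) for word in words)
        for category, words in CATEGORY_KEYWORDS.items()
    }
    return max(scores, key=scores.get)


def extract(text: str) -> Dict[str, str]:
    """Classify the page, then apply only that category's rules."""
    category = classify(text)
    record = {"category": category}
    # If classify() guessed wrong, the wrong rules run here, which is how
    # classification and extraction errors accumulate.
    for attribute, pattern in CATEGORY_RULES[category].items():
        match = re.search(pattern, text)
        if match:
            record[attribute] = match.group(1).strip()
    return record


record = extract("Amarin Thai. Cuisine: Thai. See our menu and make a reservation.")
```

Because the rules are written once per category and not once per site, the same code applies to any restaurant page, at the cost of depending on the classifier's guess.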

(b) Relationship mining. After extracting interesting concepts, one needs to match them with concepts in the database to create attributes, to grow concepts, and to find relationships between concepts. Some web databases like Freebase provide a substantial number of relationships between Wikipedia concepts.
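The matching step might look something like the Python sketch below, which matches an extracted concept against a small database by normalized name and merges in any new attributes. This is a deliberately naive sketch: the database contents and function names are hypothetical, and exact matching on a cleaned-up name ignores the ambiguity that real concept matching has to handle.

```python
from typing import Dict


def normalize(name: str) -> str:
    """Lowercase the name and keep only letters, digits and single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


# Hypothetical concept database keyed by normalized name.
database: Dict[str, Dict[str, str]] = {
    "amarin thai": {"type": "restaurant"},
}


def merge_concept(name: str, attributes: Dict[str, str]) -> str:
    """Merge attributes into a matching concept, or add a new concept."""
    key = normalize(name)
    existing = database.setdefault(key, {})
    for attribute, value in attributes.items():
        # Keep values already stored; only fill in attributes that are new.
        existing.setdefault(attribute, value)
    return key


key = merge_concept("Amarin  THAI!", {"cuisine": "Thai", "type": "eatery"})
```

Here the messy extracted name matches the existing entry, the new cuisine attribute is added, and the stored type is left alone. A name with no match simply becomes a new concept, which is how the database grows.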

(c) Linking data with concepts. As mentioned earlier, organizing the web around concepts can improve the experience with search pages, topic pages, and article pages by linking them with concepts.

The algorithmic approach to organizing the web around concepts is somewhat error prone, though it improves as the algorithms for each step improve. However, it is the most promising approach in terms of scaling to the enormous web that exists.

In short, organizing the web around concepts is a promising area and a stepping stone toward bringing meaning to web data.
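For article pages, linking can be as simple as wrapping the first mention of each known concept in a link to its topic page. The Python sketch below shows one way to do that. It is an illustration only: the example.com URLs and the link_concepts function are made up, and plain name matching would need disambiguation in practice.

```python
import re
from typing import Dict, Set

# Hypothetical mapping from concept names to their topic-page URLs.
CONCEPT_PAGES: Dict[str, str] = {
    "Erykah Badu": "http://example.com/topic/Erykah_Badu",
    "U2": "http://example.com/topic/U2",
}


def link_concepts(text: str, pages: Dict[str, str]) -> str:
    """Wrap the first mention of each known concept in an HTML link."""
    if not pages:
        return text
    # Longer names first, so a short name never splits a longer one.
    names = sorted(pages, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    seen: Set[str] = set()

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in seen:
            return name  # Later mentions stay as plain text.
        seen.add(name)
        return '<a href="{}">{}</a>'.format(pages[name], name)

    # A single pass over the text, so inserted links are never re-scanned.
    return pattern.sub(replace, text)


linked = link_concepts("Playlist: Erykah Badu, U2, then U2 again.", CONCEPT_PAGES)
```

In the sample text, the first mention of each artist becomes a link and the second mention of U2 is left alone, which keeps the article readable while still promoting exploration of its concepts.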

References

[1] Nicholas Kushmerick, Daniel S. Weld, Robert B. Doorenbos: Wrapper Induction for Information Extraction. IJCAI (1) 1997.

[2] Nilesh N. Dalvi, Philip Bohannon, Fei Sha: Robust web extraction: an approach based on a probabilistic tree-edit model. SIGMOD Conference 2009.

mitul