February 8, 2010

Web 3.0 and Semantic Search »

I recently attended the Web 3.0 conference held in Santa Clara (January 26-27). While there, I had the chance to hear from Google’s Johanna Wright (Director of Product Management in Search) and Microsoft’s Scott Prevost (Principal Development Manager at Bing) about how they are using semantic technologies to drive innovation in search.

The conference focused on the Semantic Web which is something that we at Kosmix have been innovating in for the last three years.  Our goal has been simple: to provide consumers with the best experience for exploring a topic and following topics that they care about.

Here is my take on the evolution of search and the semantic web:

If web 1.0 was about linking web pages and web 2.0 about linking people, then web 3.0 is about linking data. Tim Berners-Lee, father of the Web, has made it amply clear that linking data is where the future of the Web is. The semantic web is about annotating the facets and attributes associated with web content and linking data. In other words, the semantic web is about teaching machines to read web pages, which are designed to be read by humans. So how can semantics improve search?

Search so far has been about finding the best web pages for a given query. However, the purpose of searching is to complete a task. Say you want to find the lowest price for a camera, pick a romantic restaurant, or research pollution as a function of GDP. The information needed to complete such tasks resides in different web pages, so it is often not possible to find one page that will complete the task. For instance, the pollution levels by country and country GDP are in separate places on the Web. However, by using semantic understanding, search engines can connect these web pages and fuse the two datasets to complete the desired task.
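As a rough illustration of the fusion step, the sketch below joins two small per-country tables on the country name. The country labels and numbers are placeholders invented for the example, not real statistics, and the function name `fuse_by_country` is hypothetical.

```python
# Placeholder data: labels and values are invented for illustration only.
pollution_by_country = {"Country A": 40.0, "Country B": 12.5, "Country C": 27.0}
gdp_by_country = {"Country A": 500.0, "Country B": 1500.0, "Country D": 900.0}


def fuse_by_country(pollution, gdp):
    """Join two per-country tables, keeping countries present in both."""
    fused = []
    for country in sorted(pollution.keys() & gdp.keys()):
        fused.append((country, gdp[country], pollution[country]))
    # Sort by GDP so pollution can be read as a function of GDP.
    fused.sort(key=lambda row: row[1])
    return fused


rows = fuse_by_country(pollution_by_country, gdp_by_country)
for country, gdp, pollution in rows:
    print(f"{country}: GDP={gdp}, pollution={pollution}")
```

The hard part in practice is not the join itself but recognizing that two pages describe the same entities in the first place, which is where the semantic annotation comes in.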

Another instructive example is figuring out what to cook. If search engines understood the structure of recipes, then you could narrow down your search to recipes based on course, ingredients, occasion, cuisine, convenience, and user ratings. A variant of the idea is semantic snippets, where search snippets present the structure behind the linked page. For instance, for events we list the date, time, event snippet, and even ticket prices, which let you decide whether to click through and book a ticket or whether the event does not fit your schedule and budget.
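To make the recipe idea concrete, here is a minimal sketch of narrowing by facets once the structure has been extracted. The `Recipe` fields, the sample records, and the `narrow` function are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    title: str
    course: str
    cuisine: str
    ingredients: frozenset
    rating: float


# Hypothetical records standing in for recipes whose structure has been extracted.
RECIPES = [
    Recipe("Pumpkin Pie", "dessert", "american", frozenset({"pumpkin", "eggs", "sugar"}), 4.5),
    Recipe("Pumpkin Soup", "starter", "american", frozenset({"pumpkin", "cream"}), 4.0),
    Recipe("Tiramisu", "dessert", "italian", frozenset({"eggs", "coffee", "sugar"}), 4.8),
]


def narrow(recipes, course=None, cuisine=None, must_have=(), min_rating=0.0):
    """Return recipes matching every facet that was specified."""
    wanted = set(must_have)
    return [
        r for r in recipes
        if (course is None or r.course == course)
        and (cuisine is None or r.cuisine == cuisine)
        and wanted <= r.ingredients
        and r.rating >= min_rating
    ]


desserts_with_eggs = narrow(RECIPES, course="dessert", must_have=["eggs"])
print([r.title for r in desserts_with_eggs])
```

None of this filtering is possible with keyword matching alone; it depends on the engine knowing which part of the page is the course, the ingredient list, or the rating.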

Events Module

At Kosmix we strive to provide rich snippets for each result we surface, and now all the big search engines – Yahoo! with SearchMonkey; Google with rich snippets; and Bing with Smart Captions – have started to do the same. The benefit is that the user has more information before clicking, which increases the quality of traffic that a publisher gets. Nick Cox from Yahoo! reported up to 15 percent greater click-through rate because of richer presentation of results using semantic techniques.

Semantic techniques can also be used to rank web pages. Today, the rankings are largely a function of keyword matches and the popularity of the page. However, if we searched for “drop in currency value”, what we really mean is “inflation”. If search engines understood the meaning of documents and used it in ranking, then higher quality documents about “inflation” would surface, which need not even contain the search terms!
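A toy version of this idea is sketched below: both the query and the documents are mapped to concepts, and ranking compares concepts instead of words. The phrase table, the documents, and the function names are invented for illustration; a real engine would need far richer language understanding than substring matching.

```python
# Hypothetical phrase-to-concept table; a real system would learn or curate this.
CONCEPTS = {
    "drop in currency value": "inflation",
    "rising prices": "inflation",
    "inflation": "inflation",
}

DOCUMENTS = {
    "doc1": "Central banks raise rates to fight inflation",
    "doc2": "Currency exchange kiosks at the airport",
    "doc3": "How to drop in on a yoga class",
}


def to_concepts(text):
    """Return the set of concepts whose trigger phrases appear in the text."""
    lowered = text.lower()
    return {concept for phrase, concept in CONCEPTS.items() if phrase in lowered}


def rank(query, documents):
    """Rank documents by number of shared concepts; drop those sharing none."""
    query_concepts = to_concepts(query)
    scored = [
        (len(query_concepts & to_concepts(body)), doc_id)
        for doc_id, body in documents.items()
    ]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


results = rank("drop in currency value", DOCUMENTS)
print(results)
```

Here `doc1` is returned even though it shares none of the query's words, while `doc2` and `doc3`, which each share a keyword with the query, are left out.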

As you can see, semantic techniques have already made inroads into search and have started benefiting users. However, there is still a long way to go before the promise is fully realized. After all, the Web’s content was designed for our consumption, and machines need much of our help in understanding it.

abhishek
January 25, 2010

Attention All Bookworms »

This is a first for us here at Kosmix.  Not only were we included in one book but we were included in two.

Founder Venky Harinarayan was interviewed by Sramana Mitra in her new book, “Positioning: How to Test, Validate, and Bring Your Idea to Market.”  As part of the interview, Venky shares how he got started from his early days in Bombay to the founding of Kosmix. Sramana’s new book can be found on Amazon.com.

Our other book mention came from “Trade-Off: Why Some Things Catch On, and Others Don’t” by Kevin Maney.  In Maney’s book, Kosmix is noted as one company that is providing consumers with a higher “fidelity” search experience versus the results “the famously bare-boned Google” is surfacing. “Trade-Off” can also be found on Amazon.com and you can find the mention by typing in “Kosmix” when you search inside the book.

jodi
January 6, 2010

Some predictions for 2010 »

The last post provided a recap of  search in 2009 and expanded on some of the more important themes.  What do you think will be the important themes that will emerge in 2010?  Here is a quick glimpse into the looking glass:

  • Media as Search Results:  Further expanding on the “no more 10 blue links” theme, many of this year’s search queries will be answered by media such as videos, images, slideshows, podcasts, etc.  We have already started seeing this trend in the increasing number of “how to” queries for which there are full videos showing exactly how!
  • Apps Everywhere:  Given the success of apps on the mobile medium, it seems likely that similar small apps will also become ubiquitous in our browsers.  There are already many widgets and gadgets on pages like iGoogle and My Yahoo!, but going further, you can think of browsers like Chrome becoming a mini OS where more complex plug-ins and apps enable you to do common activities like booking flights or buying movie tickets.
  • Search with a Social Flavor:  This has been a few years in the making, but I think this year is when we will see a close integration between social networks and search.  Search results will be flavored by actions performed by people within your social network and in turn, you will be able to share the information you learn more easily with those in your network.
  • Demand Response: We now have enough real-time publishing and searching tools that we can post somewhat complicated questions and get near-instantaneous responses.  Some of the more popular folks on Twitter are already doing this when they ask for restaurant recommendations or for help with something.

There are already companies working on many of these themes, and the stage is now set for the process of seeking and finding information to be forever changed.  One other prediction that I think is likely given these themes is that while last year the number of big search engines shrank with the Yahoo-Bing deal, this year we will see the emergence of a new search giant (or two!).  Any bets on who that will be?

sailesh
December 28, 2009

2009 – capping a decade of web search innovation »

This past year capped a decade of innovation in web search, propelling it to new heights.  Today search engines handle 100 billion queries a month worldwide, and advertisers have spent a whopping $13 billion on search marketing this year in the US alone. As 2009 comes to a close, let’s take a look back at the big themes of innovation this year in search:

  • The biggest theme of the year has been the increasing realization that SERPs limited to “ten blue links” aren’t going to be the future.  As Todd at the Bing team puts it: “Bing is moving beyond the ’10 blue links’ with richer and more organized pages that are designed to help consumers complete tasks and make better decisions”. Here is Yahoo’s game plan. The web has moved far beyond being a collection of HTML documents to a rich mosaic of video, audio, news, real-time opinions, social graphs, and web applications.  Naturally, users are demanding more from their search engines, and companies big and small are responding by making search result pages richer with a wide variety of information.
  • The next big theme of the year is the arrival of real-time search, with Twitter operating the biggest fire hose. Several companies big and small are racing to make sense of billions of those tweets and bring them in front of search users.  We can expect a few clear winners to emerge in 2010.
  • The big handshake deal between Bing and Yahoo elicited a wide range of opinions. The agreement has sparked a new wave of innovation and prompted Google to up its game. Be it maps, image search, local listings, or search assists, every kind of media search is getting a makeover, and that’s good news for all of us.
  • Despite the deepest recession since the Great Depression, online advertising revenues proved resilient and are showing signs of growth. As more money continues to come online, expect new waves of innovation.
  • SearchMe, the company that brought us a visual search engine, closed its doors, but a host of others continue to introduce new paradigms for finding information and improving the web search experience. Charles Knight @ AltSearchEngines takes us through 100 such alternatives to look to for ideas and inspiration.

Who says search is done? Stay tuned.

manyam
December 21, 2009

Introducing the MeeHive News Ticker »

Keep track of your MeeHive stories whenever you’re online!

Today we’re introducing the MeeHive News Ticker, a cool new way to make sure you never miss a story about your interests. Get all the news from your MeeHive account, right on your Firefox toolbar. Stories update every 15 minutes, so you’ll always know about breaking news. It’s just like having your own personal CNN.

Give it a whirl and let us know what you think! Send any and all comments to feedback@meehive.com.

Click here to get the MeeHive News Ticker directly from Firefox:
https://addons.mozilla.org/en-US/firefox/addon/51433

jodi
December 17, 2009

Kosmix Holiday Extravaganza »

What do you get when you combine a Palo Alto hotspot, precariously named drinks like “The Kosmopolitan”  and “MeeHito”, some really baaaaaaaad karaoke, and a roomful of Kosmix friends and family?  It’s the annual Kosmix holiday party!

Here are a few of our favorite pics from the big night:

jodi
December 7, 2009

The Head-to-Head: Comparing Kosmix to Bing »

Microsoft announced the Bing fall release on November 11 along with new ideas for search such as Entity Cards and Task Pages. Bing’s new version is very similar to the topic page approach we’ve taken here at Kosmix–though, unlike Bing, we’ve completely moved away from the “10 blue links” search paradigm.

In a demonstration for the press, the Bing team used the query “John Mayer” as an example of what Bing has to offer. It’s interesting to compare this Bing example with the Kosmix topic page for the same query.

John Mayer - Bing

On the top of the Bing page is the entity card for John Mayer. Entity Cards bring together content from third party sources which are considered relevant to the query. So here we see a link to his official site, upcoming events, songs, and twitter posts. The rest of the page organizes search results around categories such as songs, albums, biography, and tours. Now check the John Mayer topic page on Kosmix.

John Mayer - Kosmix - Reference, Videos, Images, News, Shopping and more...

On this page is a quick link to his official site, several biographies, facts from Freebase, concerts and tickets, list of his albums, album reviews, videos, images, news, blogs, Facebook fan pages, song lyrics, playable mp3 clips, twitter buzz about him, a high quality art portrait you can buy, and even a quiz about who he is dating!

The Kosmix Categorization Engine ensures that our John Mayer topic page specifically draws from sources related to music and entertainment. The system works algorithmically, drawing from a taxonomy of millions of categories to find content most closely related to John Mayer. For instance, here you see artist profiles from last.fm as opposed to Wikipedia–which is otherwise a great reference for most topics. Once we identify the best categories, our mashup engine assembles each page by pulling content from the best sources associated with those categories using a combination of our crawl index as well as real-time API calls to partners.

In contrast to Bing and Google, Kosmix has also moved beyond the SERP approach. The page has a two-dimensional layout, and each information category gets its own rendering that displays the rich structured data it provides. For events we list the date, time, event snippet, and even ticket prices, which let you decide whether to click through and book a ticket or whether the event does not fit your schedule and budget.
John Mayer - Kosmix - Events

Bing’s Entity Cards are designed to reduce the number of follow-on searches you do for a topic, a goal we share at Kosmix. Try a simple test for yourself: Do a head-to-head comparison of two of the other queries that Bing demonstrated at their press event last week, and see which page lets you explore content without clicking through to multiple other sites.

Miami, Florida:

http://www.kosmix.com/topic/miami%2C_florida

http://www.bing.com/search?q=miami%2C+florida&go=&form=QBRE&qs=n

Apple, Inc:

http://www.kosmix.com/topic/Apple_Inc.

http://www.bing.com/search?q=apple%2C+inc&go=&form=QBRE&qs=n

The Web is exploding with content of different types: the Deep Web, the Real-time Web, the Semantic Web, maps, videos, images, and so much more. Simply seeing the top 10 links is no longer enough because you’re still lost on the information superhighway. Here at Kosmix, our job is to take you a step further: To help you understand the best of the Web’s content, and to present it to you in a way that lets you search less and learn more.

abhishek
November 23, 2009

Real-Time CrunchUp: Tag, RT, Discuss, Repeat »

crunchup logo

Remember when conferences used to be an annual thing? These days, when the Web moves fast and startups move even faster, holding an event once or twice a year is no longer sufficient.

Case in point: After hosting the first RealTime CrunchUp just three months ago, TechCrunch brought everyone together again for a second CrunchUp last week. And the show has grown so fast it’s moved from suburban Redwood City to the InterContinental Hotel in San Francisco.

Three key trends have emerged in the real-time space in the past three months:

Geostreams

With the recent launch of Twitter’s Geo API and the advent of applications like Google Latitude and FourSquare, location-based discussions kept popping up throughout the day. One specific discussion was whether tracking your location should be persistent or opt-in. Elad Gil, CEO of Mixer Labs, said that 90% of the applications using their GeoAPI choose the opt-in model. However, Steve Lee, Group Product Manager for Google Latitude, responded that the interesting cases are when people deviate from their normal patterns, and those can only be detected if your location is tracked continually.

In the last discussion of the day, there was general consensus among the panel that location-based coupons and commerce would be one of the three major areas where the money will be found in real-time. (The other two are enterprises and search.)

The Power of Retweets

Even though Twitter has only recently added a retweet button to their site, the acronym RT has been around for quite some time now, and companies are clearly taking advantage. Various Twitter search engines such as Mozzler and OneRiot have stated that the number of times an article has been retweeted is an important signal in determining the relevance of an article or tweet. Tweetmeme, another company that demoed during the conference, has built an ad platform based on retweets. Not only can you retweet an article, but they are also working with Federated Media to retweet ads.

Real-Time Discussions

Various demos at the conference highlighted the focus on real-time contextual discussions. A few of the standouts:

Hot Potato – They launched and demoed their iPhone application that connects people around live events. They get a bonus point because their application takes advantage of the location of their users participating in these events…tapping into the power of Geostreams.

Qwisk – Your social network in your browser, without needing a plug-in. As you browse, you can share links in a more visual way, and chat with your friends about anything you are viewing on the Web.

Video Lobby – A blogging platform for live video webcasts. The moderator can take comments and questions, all live.

Qlipso – Share videos, games, slideshows and music with your friends and interact with them in real-time about what you’ve shared. It’s like watching a movie together from your own living room.

Overall, the conference was a great way to see what’s happening right now in the real-time space. It’s clear we still do not know exactly where we are going with all of this, but we are gaining clarity with every passing month (and conference). There are already rumors of the next CrunchUp happening in the spring.

tracy
November 18, 2009

Kosmix Hosts Girls In Tech: Resume Best Practices Workshop »

Tomorrow night Silicon Valley’s Girls In Tech group will head over to Kosmix HQ for an evening of networking and career advice from some of the area’s coolest companies.

If you are looking for a job, or want tips on how to sharpen up your resume, this is the place to be.  Here are all the details from the Girls In Tech crew:

Have you ever wondered what it is, exactly, about your resume that will turn a recruiter on/off to your potential as a candidate for employment? Find out with Girls in Tech Silicon Valley, on November 19, as we invite a variety of  managers / recruiters to share resume best practices and knowledge around how to make yourself an attractive professional in today’s competitive market.

During the evening, participants will be breaking out into intimate groups to do peer resume reviews while Hiring Professionals provide valuable insight and feedback.

Professionals from the following industries and others will be joining us:

–Lizelle Baylon, Principal Recruiter at Boston Scientific (Medical Devices)

–Debbie Donovan, Sr. Manager, Practice & Hospital Market Development at Intuitive Surgical (Medical Devices)

–Kiran Prasad, CTO at Sliced Simple, Inc. Former Senior Director WebOS Emerging Technologies at Palm (Mobile)

–Cindy Wang, Product Manager at Tiny Prints Inc. (Online Retailer)

–Stephanie Lonn, Technical Recruiter at Zynga (Social Media / Games)

–Isabelle Mitura, Recruiter at Zynga (Social Media / Games)

Spots are first come, first served. Refreshments and snacks will be provided!

Doors open at 6:45, breakout sessions begin at 7:00pm
$10 in advance (register at Eventbrite)
$15 at the door

Kosmix
444 Castro St
(Entrance on Mercy St)
Mountain View, CA 94041


jodi
November 13, 2009

Google, Kosmix, and The Deep Web – A Love Triangle »

Alon Halevy, Anand Rajaraman

Alon Halevy of Google Labs and Anand Rajaraman of Kosmix went after the Deep Web in their own separate ways last night, at the SDForum Search SIG in Palo Alto.

Alon and Anand are long-time collaborators in solving the Deep Web problem, and their joint presentation last night had all the easy familiarity and good-natured competition you find with friends who go way back. Years ago, Anand’s VC firm, Cambrian Ventures, funded a company that Alon founded called Transformic Inc. Transformic, which built technology to crawl HTML forms, was later acquired by Google. Alon joined Google Labs, and Anand went on to found Kosmix with his business partner, Venky Harinarayan.

The Deep Web is simply the Web behind HTML forms. If you want to buy a car, for example, you might visit Cars.com and search for a used Toyota Prius, priced at less than $15,000 and located near Palo Alto, California. Cars.com will turn your query into an HTML page to present the results to you. A search engine won’t be able to see the page, however, because it was created just for you from a series of databases. The page becomes “lost” in the Deep Web. Tim Berners-Lee also explains in this TED video how leveraging such hidden data will drive the next innovation on the web.

According to one study, the Deep Web is estimated to be 500 times larger than the surface Web. As the number of dynamic websites and applications increases, this number will only go up. Imagine…all that data is not available to search engines!

Google’s Approach to the Deep Web

Google’s approach to the Deep Web is to find HTML forms, send input to these forms, and index the resulting HTML pages. Simple? Not quite. How do you discover these forms? Which forms do you pick? What inputs do you send to these forms? How do you parse the structured data in the result pages?

Google takes the “Less is More” approach. They drop forms used for transactions such as credit-card purchases, interactions that the computer science community calls “POST”. To send inputs to a form, Google first tries well-defined lists such as zip codes, if present. Otherwise, they compile inputs using iterative probing to discover what to send to a form. In Alon’s experience, only a small percentage of the Deep Web qualifies for indexing. This slice, however, is hugely valuable, as it is helping to answer 1000 queries a second! Google’s approach to the Deep Web is language independent, is fully automated to scale easily, answers body and tail searches, and fits nicely with the crawl infrastructure. For further insights, read Alon’s VLDB paper published in 2008.
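One plausible reading of iterative probing is sketched below: start from a few seed inputs, submit them to the form, and harvest words from the result pages as inputs for the next round. This is not Google’s implementation. The form is simulated by a local function, and the records, `submit_form`, and `iterative_probe` are all invented for the example.

```python
# Stand-in for a site's search form backed by a database the crawler cannot see.
# The records are invented for illustration.
HIDDEN_RECORDS = [
    "used toyota prius hybrid",
    "toyota camry sedan",
    "honda civic sedan",
    "honda insight hybrid",
]


def submit_form(keyword):
    """Simulate a GET form submission: return records containing the keyword."""
    return [record for record in HIDDEN_RECORDS if keyword in record.split()]


def iterative_probe(seed_keywords, max_rounds=5):
    """Probe the form, harvesting new candidate inputs from each result page."""
    tried, discovered = set(), set()
    frontier = list(seed_keywords)
    for _ in range(max_rounds):
        next_frontier = []
        for keyword in frontier:
            if keyword in tried:
                continue
            tried.add(keyword)
            for record in submit_form(keyword):
                discovered.add(record)
                # Words seen in results become inputs for the next round.
                next_frontier.extend(w for w in record.split() if w not in tried)
        if not next_frontier:
            break
        frontier = next_frontier
    return discovered


found = iterative_probe(["prius"])
print(sorted(found))
```

Starting from the single seed “prius”, the probe reaches every record in this toy database within a few rounds, because each result page exposes words that unlock further records.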

Kosmix’s Approach to the Deep Web

After Alon shared Google’s perspective, Anand explained that Kosmix has taken a very different approach to the Deep Web: the federated way.

Unlike Google, Kosmix does not crawl HTML forms. Instead, for any given search query, Kosmix taps into these forms in real-time through API calls, evaluates the results, and organizes them into a topic page. If you wanted to look up “Pumpkin Pie” on Kosmix, for example, the system would bring you fresh content from recipe sites like the Food Network, “How To” baking videos, real-time tweets about pumpkin pie from Twitter, and information about the caloric profile of pumpkin pie from diet sites like FatSecret. A query for “AdMob,” on the other hand, will call services like CrunchBase for a company profile and Fool.com for up-to-date investor information. To provide the most relevant topic page and also avoid overwhelming these different services with too many API calls, the Kosmix system is smart enough to know which type of services to call for which query. Thus, the query for “Pumpkin Pie” would never be routed to CrunchBase. This selective routing is an important enabling factor for the federated approach.

So how does Kosmix decide which Web service to route a query to? The answer lies with Kosmix’s categorization technology. Over the past three years, Kosmix has created a taxonomy of several million nodes, which we organized into a graph, using a combination of humans and algorithms. Editors discover, integrate, and tag Web services to taxonomy nodes in a semi-automated fashion. Algorithms route the user’s query through the set of taxonomy nodes, which enables the engine to decide which Web service to call.
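The routing idea can be sketched in a few lines. This is a simplified illustration, not the Kosmix engine: the taxonomy is reduced to a tiny tree instead of a graph of millions of nodes, query-to-node matching is an exact lookup, and the node names, service names, and `route` function are all placeholders.

```python
# Hypothetical slice of a taxonomy: each node points to its parent.
PARENT = {
    "pumpkin pie": "desserts",
    "desserts": "food",
    "admob": "startups",
    "startups": "companies",
}

# Hypothetical tags attaching services to taxonomy nodes.
SERVICES_BY_NODE = {
    "food": ["recipe_site", "diet_site"],
    "desserts": ["baking_videos"],
    "companies": ["company_profiles", "investor_news"],
}


def route(query):
    """Walk from the query's node up to the root, collecting tagged services."""
    node = query.lower()
    services = []
    while node is not None:
        services.extend(SERVICES_BY_NODE.get(node, []))
        node = PARENT.get(node)
    return services


print(route("Pumpkin Pie"))
print(route("AdMob"))
```

Because services are attached to categories and not to individual queries, a food query never reaches the company-profile services, which keeps the number of real-time API calls per query small.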

After outlining the benefits of this approach, Anand dived deeper into the need to select the right sources, and touched on the challenges of discovering and integrating data sources, layout, rankings, etc. Details can be found in this year’s VLDB paper. Anand also explained how the federated approach is keeping pace with emerging Web trends like real-time, the explosion of Web APIs, and different content types such as videos and maps.

Digging Even Deeper

Last night’s audience (about 50 specialists in the search space from some of the Valley’s leading companies and startups) was one of the most engaged groups I have ever seen. Questions ranged from business models to how to do a multi-way join between HTML tables. Some people even contributed ideas. If the Deep Web is important to you, then this was the place to be.

Both Google and Kosmix have compelling yet contrasting approaches to the Deep Web. It will be interesting to see if there is a winner or simply a combination of the two.

abhishek