Archive for the ‘Uncategorized’ Category

November 23, 2009

Real-Time CrunchUp: Tag, RT, Discuss, Repeat »


Remember when conferences used to be an annual thing? These days, when the Web moves fast and startups move even faster, holding an event once or twice a year is no longer sufficient.

Case in point: After hosting the first RealTime CrunchUp just three months ago, TechCrunch brought everyone together again for a second CrunchUp last week. And the show has grown so fast it’s moved from suburban Redwood City to the InterContinental Hotel in San Francisco.

Three key trends have emerged in the real-time space in the past three months:

Geostreams

With Twitter’s recent Geo API launch and the advent of applications like Google Latitude and FourSquare, location-based discussions kept popping up throughout the day. One specific discussion was whether tracking your location should be persistent or opt-in. Elad Gil, CEO of Mixer Labs, said that 90% of the applications using their GeoAPI choose the opt-in model. However, Steve Lee, Group Product Manager for Google Latitude, responded that the interesting moments come when people deviate from their normal patterns, and those can only be detected if your location is tracked continually.

In the last discussion of the day, the panel generally agreed that location-based coupons and commerce would be one of the three major areas where money will be made in real-time. (The other two are enterprise and search.)

The Power of Retweets

Even though Twitter has only recently added a retweet button to their site, the acronym RT has been around for quite some time now and companies are clearly taking advantage. Various Twitter search engines such as Mozzler and OneRiot have stated that the number of times an article has been retweeted is an important signal in determining the relevance of an article or tweet. Tweetmeme, another company that demoed during the conference, has built an ad platform based on retweets. Not only can you retweet an article, but they are also working with Federated Media to retweet ads.

Real-Time Discussions

Various demos at the conference highlighted the focus on real-time contextual discussions. A few of the standouts:

Hot Potato – They launched and demoed their iPhone application that connects people around live events. They get a bonus point because their application takes advantage of the location of their users participating in these events…tapping into the power of Geostreams.

Qwisk – Your social network in your browser, without needing a plug-in. As you browse, you can share links in a more visual way, and chat with your friends about anything you are viewing on the Web.

Video Lobby – A blogging platform for live video webcasts. The moderator can take comments and questions, all live.

Qlipso – Share videos, games, slideshows and music with your friends and interact with them in real-time about what you’ve shared. It’s like watching a movie together from your own living room.

Overall, the conference was a great way to see what’s happening right now in the real-time space. It’s clear we still do not know exactly where we are going with all of this, but we are gaining clarity with every passing month (and conference). There are already rumors of the next CrunchUp happening in the spring.

tracy
November 13, 2009

Google, Kosmix, and The Deep Web – A Love Triangle »


Alon Halevy of Google Labs and Anand Rajaraman of Kosmix went after the Deep Web in their own separate ways last night, at the SDForum Search SIG in Palo Alto.

Alon and Anand are long-time collaborators in solving the Deep Web problem, and their joint presentation last night had all the easy familiarity and good-natured competition you find with friends who go way back. Years ago, Anand’s VC firm, Cambrian Ventures, funded a company that Alon founded called Transformic Inc. Transformic, which built technology to crawl HTML forms, was later acquired by Google. Alon joined Google Labs, and Anand went on to found Kosmix with his business partner, Venky Harinarayan.

The Deep Web is simply the Web behind HTML forms. If you want to buy a car, for example, you might visit Cars.com and search for a used Toyota Prius, priced at less than $15,000 and located near Palo Alto, California. Cars.com will turn your query into an HTML page to present the results to you. A search engine won’t be able to see the page, however, because it was created just for you from a series of databases. The page becomes “lost” in the Deep Web. Tim Berners-Lee also explains in this TED video how leveraging such hidden data will drive the next innovation on the web.

According to one study, the Deep Web is estimated to be 500 times larger than the surface Web. As the number of dynamic websites and applications increases, this number will only go up. Imagine…all that data is not available to search engines!

Google’s Approach to the Deep Web

Google’s approach to the Deep Web is to find HTML forms, send input to these forms, and index the resulting HTML pages. Simple? Not quite. How do you discover these forms? Which forms do you pick? What inputs do you send to these forms? How do you parse the structured data in the result pages?

Google takes the “Less is More” approach. They drop forms used for transactions such as credit-card purchases, which are typically submitted with the HTTP “POST” method. To send inputs to a form, Google first tries well-defined lists such as zip codes, if present. Otherwise, they compile inputs using iterative probing to discover what to send to a form. In Alon’s experience, only a small percentage of the Deep Web qualifies for indexing. This slice, however, is hugely valuable, as it is helping to answer 1,000 queries a second! Google’s approach to the Deep Web is language independent, is fully automated to scale easily, answers body and tail searches, and fits nicely with the crawl infrastructure. For further insights, read Alon’s VLDB paper published in 2008.
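To make the two input strategies above a little more concrete, here is a minimal Python sketch. It is not Google's implementation: the function names, the example.com URL, and the "harvest new words from the result page" heuristic are all assumptions made for illustration. The first function enumerates GET URLs from well-defined value lists (such as zip codes); the second grows a keyword list for a free-text box by probing iteratively.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_urls(action, inputs, max_urls=50):
    """Enumerate GET URLs for a form from per-field candidate values.

    `inputs` maps a field name to a list of values to try, for example a
    well-defined list such as zip codes. `max_urls` caps the output so a
    form with many fields does not explode combinatorially.
    """
    names = sorted(inputs)
    urls = []
    for combo in product(*(inputs[name] for name in names)):
        urls.append(action + "?" + urlencode(dict(zip(names, combo))))
        if len(urls) >= max_urls:
            break
    return urls


def iterative_probe(submit, seeds, rounds=2, max_terms=20):
    """Grow a keyword list for a free-text input by probing the form.

    `submit` is a hypothetical callable that sends one term to the form
    and returns the text of the result page. Each round submits the terms
    found in the previous round and harvests unseen words from the results.
    """
    known = list(seeds)
    frontier = list(seeds)
    for _ in range(rounds):
        discovered = []
        for term in frontier:
            for word in submit(term).lower().split():
                if word.isalpha() and word not in known and word not in discovered:
                    discovered.append(word)
        # Respect the overall budget of terms we are willing to submit.
        frontier = discovered[: max_terms - len(known)]
        if not frontier:
            break
        known.extend(frontier)
    return known
```

In practice `submit` would fetch a URL produced by `candidate_urls` and strip the HTML; a real crawler would also have to decide which of the resulting pages are worth indexing, which this sketch leaves out.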

Kosmix’s Approach to the Deep Web

After Alon shared Google’s perspective, Anand explained that Kosmix has taken a very different approach to the Deep Web: the federated way.

Unlike Google, Kosmix does not crawl HTML forms. Instead, for any given search query, Kosmix taps into these forms in real-time through API calls, evaluates the results and organizes them into a topic page. If you wanted to look up “Pumpkin Pie” on Kosmix, for example, the system would bring you fresh content from recipe sites like the Food Network, “How To” baking videos, real-time tweets about pumpkin pie from Twitter, and information about the caloric profile of pumpkin pie from diet sites like FatSecret. A query for “AdMob,” on the other hand, will call services like CrunchBase for a company profile and Fool.com for up-to-date investor information. To provide the most relevant topic page and also avoid overwhelming these different services with too many API calls, the Kosmix system is smart enough to know which type of services to call for which query. Thus, the query for “Pumpkin Pie” would never be routed to CrunchBase. This selective routing is an important enabling factor for the federated approach.

So how does Kosmix decide which Web service to route a query to? The answer lies with Kosmix’s categorization technology. Over the past three years, Kosmix has created a taxonomy of several million nodes, which we organized into a graph, using a combination of humans and algorithms. Editors discover, integrate, and tag Web services to taxonomy nodes in a semi-automated fashion. Algorithms route the user’s query through the set of taxonomy nodes, which enables the engine to decide which Web service to call.
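The routing idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the taxonomy below is a tiny hand-written tree rather than a multi-million-node graph, and the node names, service names, and the `route` function are hypothetical, not Kosmix's actual data or code.

```python
# Hypothetical toy taxonomy: each node points to its parent.
PARENT = {
    "pumpkin pie": "desserts",
    "desserts": "food",
    "admob": "startups",
    "startups": "companies",
}

# Hypothetical editor tags: which Web services are attached to which node.
SERVICES_BY_NODE = {
    "desserts": ["baking_videos"],
    "food": ["recipes", "nutrition"],
    "companies": ["company_profiles", "investor_news"],
}


def taxonomy_path(query):
    """Return the nodes from the query's node up to the root, or [] if unknown."""
    node = query.strip().lower()
    if node not in PARENT:
        return []
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def route(query):
    """Collect the services tagged on any node along the query's taxonomy path."""
    services = []
    for node in taxonomy_path(query):
        for service in SERVICES_BY_NODE.get(node, []):
            if service not in services:
                services.append(service)
    return services
```

With this toy data, `route("Pumpkin Pie")` only reaches food-related services and `route("AdMob")` only reaches company-related ones, which is the property the post describes: a pie query never triggers a call to a company-profile service.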

After outlining the benefits of this approach, Anand dived deeper into the need to select the right sources, and touched on the challenges of discovering and integrating data sources, layout, and ranking. Details can be found in this year’s VLDB paper. Anand also explained how the federated approach is keeping pace with emerging Web trends like real-time, the explosion of Web APIs, and different content types such as videos and maps.

Digging Even Deeper

Last night’s audience, about 50 specialists in the search space from some of the Valley’s leading companies and startups, was one of the most engaged groups I have ever seen. Questions ranged from business models to how to do a multi-way join between HTML tables. Some people even contributed ideas. If the Deep Web is important to you, then this was the place to be.

Both Google and Kosmix have compelling yet contrasting approaches to the Deep Web. It will be interesting to see if there is a winner or simply a combination of the two.

abhishek
November 12, 2009

Deep Web: Google & Kosmix Debate Two Very Different Approaches »

Tonight at SDForum’s SearchSIG, Kosmix Co-Founder Dr. Anand Rajaraman and Dr. Alon Halevy of Google Labs will debate two very different approaches to the Deep Web.

The Deep Web is the portion of the Internet not accessible to traditional search engines. Social networks, media-sharing sites for photos and videos, library catalogs, airline reservation systems, phone books, and all kinds of scientific databases lurk inside the Web, practically invisible to today’s search tools.

The volume of this hidden content is enormous: some estimates have pegged the size of the Deep Web at up to 500 times larger than the slice of the Web we see on search engines today.

Ironically, the Deep Web hides some of the richest content on the Internet. The Web 2.0 revolution has enabled an explosion of sites dedicated to user-generated content, including Wikipedia, YouTube, Flickr, TripAdvisor and Yelp. Content on these sites grows so rapidly that it’s nearly impossible for Web crawlers to keep up. Social media sites like Facebook, MySpace, and Twitter pose the same challenge, and often add privacy protections that permit only “friends” to view certain information.

Anand and Alon are among the world’s leading experts in the field of the Deep Web, and their two companies take very different approaches to mining this data.  In this session, Alon and Anand will examine how the Deep Web will change the business of Search, offer insight into what content creators should expect as the Deep Web becomes more accessible, and comment on what this shift might mean for the end user. They will also address the question of whether the Deep Web will someday be fully exposed, and if not, why not.

Agenda:
6:30-7:00pm – Registration / Food & Drink
7:00–7:05pm – A few words about the Search SIG
7:00-9:00 pm – Program

Price
$15 at the door for non-SDForum members
No charge for SDForum members

Date:  Tonight! November 12, 2009

Location:

Cubberley Community Center
4000 Middlefield Rd., RM H-1
Palo Alto, CA
jodi
November 9, 2009

Startup Smackdown: The Movie »

Here’s your chance to re-live all those incredible Smackdown moments. We hear that ESPN wants to air this piece nationally, right after Monday Night Football. You saw it here first.

Click here to see the Smackdown highlights vid: Kosmix Startup Smackdown

jodi
November 6, 2009

And the Startup Smackdown Winner Is…er, Us! »


Chalk it up to home court advantage.

The Kosmix doubles team won last night’s Startup Smackdown tournament, after four rounds of some of the most intense ping pong the Valley has ever seen.

Nine teams competed for the Smackdown title, and the competitive spirit was palpable right from the beginning. After cruising through the first two rounds, TheFind’s formidable pair, Ranjith “Ranji” Subramanian and Krishna “DaKriz” Ganti, narrowly defeated Skyfire in the semi-finals to face off against Kosmix in the championship game.

The Kosmix team almost didn’t make it to the final round, and had to battle hard to beat Meebo’s awesome doubles team in the semi-finals.  In the final round, TheFind and Kosmix waged a heated match to the finish.  Ankur “Neo” Jain and Nikesh “The Wall” Garera unleashed all they had, and triumphed by going (staying??) home with the coveted Kosmix Kup.

The fact that Kosmix won our own Kup means only one thing: Next year, we’re toast. While we were finishing up the leftover beer and pizza last night, the other teams had already started training for Startup Smackdown II.

jodi
November 5, 2009

Kosmix Startup Smackdown »


Who plays the meanest game of ping pong in Silicon Valley??

Tonight eight of the hottest startups in Mountain View will gather at Kosmix for the first annual Startup Smackdown ping pong tournament. The competition looks fierce, and we hear that teams have been practicing all week in preparation for the big event.

Appearing in the Kosmix Arena tonight will be:

  • Evernote: Ron “The Octopus” Toledo & Andy “Black Widow” Kill
  • Kosmix: Nikesh “The Wall” Garera & Ankur “Neo” Jain
  • Meebo: Simon “The Smasher” Yeo & Greg “Marco Polo” Fair
  • Polyvore: Guangwei Yuan & Jianing Hu
  • Rhythm NewMedia: Khoi “Grasshopper” Dinh & Sundar “Semiconductor” Vedula
  • Skyfire: Brad “Defense” Landthorn & Sunil “Offense” Kaki
  • Talenthouse: Byron “Shrimp” Louie & Frederik “Pee-wee” Hermann
  • TheFind: Ranjith “Ranji” Subramanian & Krishna “DaKriz” Ganti

We also have an awesome press team competing in the tourney:  Jennifer “Mediaphyter” Leggio and Julie “Julie B” Blaustein.

Who will be the ultimate champion? Place your bets now!

jodi
November 3, 2009

The Real Time Web and You »

Here’s a repost of an article I wrote for Inc., on the emergence of the real time web and how your business can benefit from this trend:

One of the biggest technology trends in 2009 has been the emergence of the “Real-Time Web.” The real-time Web is made up of technologies and practices that can inform users as soon as information is published, instead of requiring users to check for updates. The real-time Web discards the traditional notion of the more static “webpages,” and instead adopts the notion of dynamic “streams” of information. The real-time Web is also very conversational because it makes it possible to get instant responses across very large networks of people.

Action in the real-time Web started with companies like Twitter and Friendfeed, which built their own infrastructure for large scale delivery of real-time messages. By providing Web service application programming interfaces (APIs), these companies enabled many other developers to create applications based on the real-time Web. However, Anil Dash, a prominent blogger, points out that real time services need not be built on the back of Twitter and Facebook anymore. Due to emerging technologies, the pieces are falling together for creating a free, open and decentralized “pushbutton platform,” which makes it easy for websites to add real-time messaging services. With these developments, we can expect many more websites to jump onto the real-time bandwagon.
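For readers who think in code, the difference between a page you have to re-check and a stream that notifies you can be sketched as follows. This is a minimal in-process illustration with hypothetical names (`Stream`, `subscribe`, `publish`); it is not how Twitter, Friendfeed, or any particular real-time service is built.

```python
class Stream:
    """A toy stream: subscribers are told about each item as it is published."""

    def __init__(self):
        self.items = []          # everything published so far (what a poller would re-read)
        self._subscribers = []   # callbacks to invoke on every new item

    def subscribe(self, callback):
        """Register a callable that receives each newly published item."""
        self._subscribers.append(callback)

    def publish(self, item):
        """Store the item and push it to every subscriber immediately."""
        self.items.append(item)
        for callback in self._subscribers:
            callback(item)
```

A polling client would have to read `items` again and again to notice changes, while a subscriber's callback runs at the moment `publish` is called.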

Growing importance to business

The real-time Web is becoming increasingly important to businesses in multiple ways. Firstly, as many webmasters and Web analytics companies have pointed out, the real-time Web is starting to rival search engines like Google as a source of website traffic. For example, Mark Cuban talked a few months ago about how his blog receives more visits from Twitter and Facebook than from Google. Secondly, the real-time Web opens up communication opportunities that the traditional Web could not have provided. For instance, if an airline wants to sell off its last minute tickets, the real-time Web provides a great outlet for advertising this very time-sensitive deal.  Thirdly, by making information instantaneously accessible, the real-time Web can create, or erase, instances of information arbitrage. As an example, take a look at Skygrid, a service that provides high quality financial news in real time, giving its users an edge, but at the same time leveling the playing field between professional investors and amateurs in terms of the speed of access to reliable information. Finally, because the real-time Web is very conversational, it becomes a repository of people’s sentiment, and mining this sentiment can be very useful to marketers and others.

Taking advantage of real-time Web

Beyond creating an account on Twitter, how can you take advantage of the real-time Web?  Here are some thoughts to get you started:

  • Engage with the real-time Web with tailored offers and content. Several companies are seeing success with time-sensitive programs that could not have been conceived without the real-time Web. JetBlue’s “cheeps” and United Airlines’ twares are exclusive Twitter promotions for last minute fare deals. Another company that has encountered great success with offering exclusive deals on Twitter is Dell. A Dell blog post from June mentioned that Dell had surpassed $2 million in Twitter sales from Dell Outlet, which sells refurbished items, scratch and dent items, and previously ordered new laptops. The real-time Web also acts as a place where people express their intent to shop (e.g. someone may tweet “thinking of buying an ipod touch.”) Selectively targeting such users, without spamming them, might also be a great way to help your customers make real time buying decisions. A service like Twitterhawk can be used to automate this kind of marketing.
  • Make use of real-time Web tools for business intelligence. The real-time Web is a great source of knowledge and sentiment about your customers, your competitors and your industry. You can use services like Firstrain to research the real-time Web for the news that matters to you. You could also use Twitter’s search functionality in simple ways to keep track of some of this information, or go to one of the many real time search engines. A recent article in Mashable talks about the many tools that help analyze Twitter content.
  • Join in the conversation about your company. In one of my previous articles, I talked about how companies like Comcast are using Twitter to understand their customers’ concerns and address them. The conversational nature of the real-time Web can be very powerful in building relationships with your customers.
  • Create the infrastructure that allows your company to respond in real time. Real-time enterprise data integration has been around for a long time. However, with the emergence of the real-time Web and the opportunities it creates, it is becoming increasingly critical for companies to be able to access all their internal data in real time. In other words, “real-time data integration is no longer a luxury.”
vijay
October 14, 2009

We’re Presenting at SF New Tech Tonight! »

Our very own Saumil Mehta and Tracy Lou will be on the big stage at SF New Tech this evening, sharing a live demo of MeeHive.

We’ll have five minutes to give a quick overview of the site, show how you can use MeeHive to track news about all your interests and share articles with friends through Facebook and Twitter, and give a quick plug for our newest iPhone app, MeeTV.  Saumil, how fast can you talk??

It’s going to be a great night–tacos, beer, and cool presentations from other startups like Bodukai, ZoomPool and Famililink.  We’re also planning to use Lucky Twit to give away an awesome Flip video camera, so be sure to tweet from the event using the hashtag #sfnewtech.

Here are the event details and the link to buy tickets.  Come on out–we’d love to see you!

When:
Wednesday, October 14, 2009 from 5:30 PM – 10:00 PM (PT)

Where:
Mighty
119 Utah Street
(Cross street is 15th. Look for the big black doors!)

Tickets now on sale at http://october14sfnewtech.eventbrite.com


jodi
September 28, 2009

Changing face of Web Search »

Last week Yahoo unveiled a new search interface after bucket testing it for a while. On the surface the changes might seem minimal, and for the vast majority of search queries they will be. But for a significant volume of queries, especially the ones we call “topical” in search parlance, the interface offers something wholly new and refreshing.

Danny Sullivan at SearchEngineLand does a yeoman’s job of listing all the new features. While he likes the new interface, he doesn’t think it will translate into higher market share for Yahoo.

Here at Kosmix, we see real value in offering users vast breadth and depth of information for topical searches. Take a popular query like how to make sushi, which was Danny’s example. We offer videos, images, guides, how-tos, cookbooks, a link to the history of sushi, Martha Stewart’s adventures with sushi, news and blogs on sushi, celebrity takes on sushi, topics related to sushi and much more … all on one page and in under half a second. We do this by intelligently searching the Web in real time for the best content on a topic and offering it to you in an easily digestible magazine format.

With Yahoo’s makeover and their bold rebranding effort, more users will be exposed to the new interface. Time will tell if they like what they see. Regardless of who wins and loses in the search marketplace, this continuing trend of richer search interfaces is a big win for consumers. What do you think?

manyam
September 16, 2009

Kosmix is hiring! »

Yes, I think that statement is absolutely exclamation-point worthy. After the year everyone’s had, it’s nice to feel like the skies are parting and there are jobs to be found.

We have a number of positions available in our engineering department, from entry level for those just out of school to more senior positions requiring 10+ years of experience. We’re looking for superstars in the world of Categorization, Release, Information Retrieval, Systems Engineering and Relevance Architecture. If you love an intellectual challenge and have what it takes to thrive in an energetic, fast-paced environment, contact us at http://www.kosmix.com/corp/jobs.

Current Openings:

Product Analyst
Assist the Kosmix ContextLinks team in collecting and analyzing data to improve the relevance and user experience of the product. You will work closely with developers and product managers to identify product problems and areas for product improvement. Great position for recent college graduates.

Member of Technical Staff- Systems
Design, implement, and deploy high-performance, scalable systems and algorithms for massive data storage and distributed processing.

Member of Technical Staff- Categorization
Be a key part of building the world’s best Semantic Categorization platform. Design and implement data pipeline and tools to extract structured information from semi-structured and unstructured sources. The job requires a unique combination of Systems, Data Semantics, and Web Tools.

Sr Support/ Release Engineer
We are looking for an awesome engineer to manage and support Kosmix’s production sites. You will be responsible for the availability and performance of our high traffic sites. A key element of the role is diagnosing and resolving production software issues, requiring you to develop an in-depth understanding of Kosmix’s application architecture and work closely with our developers.

Member of Technical Staff- Information Retrieval/ Categorization
Apply a strong combination of interest and experience in consumer applications, algorithms, and systems to analyze, design and build the core of Kosmix’s Categorization and Topic Engines. The position offers a breadth of challenges involving consumer product and scalable systems. This is not just a classic algorithms position; it requires a passion for consumer experience, a willingness to go the last mile, and an attitude of doing what it takes!

Relevance Architect
This is a senior position with similar requirements to the Information Retrieval/ Categorization position above. Prior experience in Search/Relevance/Machine Learning and in designing large-scale architecture is a must.

Life at Kosmix
We love what we do here at Kosmix so we work hard, but also find time for fun. Ping pong tournaments, scooter races, trivia contests and laser tag are the norm. We eat lunch together every Friday and at least once a month we have cocktails together.

Benefits include medical, dental and vision with no premium for employees, spouses/ domestic partners, and dependents. We also provide life insurance for employees and the option to participate in a 401(k) plan managed by Fidelity. Kosmix offers subsidized commuter passes for those who take the train to work, or we’ll pay you to ride your bike. Employees who have been with the company for three years or less get 15 vacation days. We also have 11 observed holidays and one floating holiday (sick days taken when needed). We are headquartered in Mountain View and have a small office in San Francisco.

barbara