Archive for November 2009

November 18, 2009

Kosmix Hosts Girls In Tech: Resume Best Practices Workshop »

Tomorrow night Silicon Valley’s Girls In Tech group will head over to Kosmix HQ for an evening of networking and career advice from some of the area’s coolest companies.

If you are looking for a job, or want tips on how to sharpen up your resume, this is the place to be.  Here are all the details from the Girls In Tech crew:

Have you ever wondered what it is, exactly, about your resume that will turn a recruiter on/off to your potential as a candidate for employment? Find out with Girls in Tech Silicon Valley, on November 19, as we invite a variety of  managers / recruiters to share resume best practices and knowledge around how to make yourself an attractive professional in today’s competitive market.

During the evening, participants will be breaking out into intimate groups to do peer resume reviews while Hiring Professionals provide valuable insight and feedback.

Professionals from the following industries and others will be joining us:

–Lizelle Baylon, Principal Recruiter at Boston Scientific (Medical Devices)

–Debbie Donovan, Sr. Manager, Practice & Hospital Market Development at Intuitive Surgical (Medical Devices)

–Kiran Prasad, CTO at Sliced Simple, Inc. Former Senior Director WebOS Emerging Technologies at Palm (Mobile)

–Cindy Wang, Product Manager at Tiny Prints Inc. (Online Retailer)

–Stephanie Lonn, Technical Recruiter at Zynga (Social Media / Games)

–Isabelle Mitura, Recruiter at Zynga (Social Media / Games)

Spots are first come, first served. Refreshments and snacks will be provided!

Doors open at 6:45pm; breakout sessions begin at 7:00pm
$10 in advance (register at Eventbrite)
$15 at the door

Kosmix
444 Castro St
(Entrance on Mercy St)
Mountain View, CA 94041


jodi
November 13, 2009

Google, Kosmix, and The Deep Web – A Love Triangle »

Alon Halevy of Google Labs and Anand Rajaraman of Kosmix went after the Deep Web in their own separate ways last night, at the SDForum Search SIG in Palo Alto.

Alon and Anand are long-time collaborators in solving the Deep Web problem, and their joint presentation last night had all the easy familiarity and good-natured competition you find with friends who go way back. Years ago, Anand’s VC firm, Cambrian Ventures, funded a company that Alon founded called Transformic Inc. Transformic, which built technology to crawl HTML forms, was later acquired by Google. Alon joined Google Labs, and Anand went on to found Kosmix with his business partner, Venky Harinarayan.

The Deep Web is simply the Web behind HTML forms. If you want to buy a car, for example, you might visit Cars.com and search for a used Toyota Prius, priced at less than $15,000 and located near Palo Alto, California. Cars.com will turn your query into an HTML page to present the results to you. A search engine won’t be able to see the page, however, because it was created just for you from a series of databases. The page becomes “lost” in the Deep Web. Tim Berners-Lee also explains in this TED video how leveraging such hidden data will drive the next innovation on the web.

According to one study, the Deep Web is estimated to be 500 times larger than the surface Web. As the number of dynamic websites and applications increases, this number will only go up. Imagine…all that data is not available to search engines!

Google’s Approach to the Deep Web

Google’s approach to the Deep Web is to find HTML forms, send input to these forms, and index the resulting HTML pages. Simple? Not quite. How do you discover these forms? Which forms do you pick? What inputs do you send to these forms? How do you parse the structured data in the result pages?

Google takes the “Less is More” approach. They drop forms used for transactions such as credit-card purchases, which typically submit data with the HTTP “POST” method. To send inputs to a form, Google first tries well-defined lists such as zip codes, if present. Otherwise, they compile inputs using iterative probing to discover what to send to a form. In Alon’s experience, only a small percentage of the Deep Web qualifies for indexing. This slice, however, is hugely valuable, as it is helping to answer 1,000 queries a second! Google’s approach to the Deep Web is language independent, is fully automated to scale easily, answers body and tail searches, and fits nicely with the crawl infrastructure. For further insights, read Alon’s VLDB paper published in 2008.
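
To make the idea concrete, here is a minimal Python sketch of the input-selection step described above. It is not Google's code: the Form class, the KNOWN_VALUES table, the probing heuristic (harvest the most frequent unseen words from result pages), and the in-memory fake site are all assumptions made for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Form:
    """Hypothetical description of a form with a single text input."""
    url: str
    method: str                       # "GET" or "POST"
    input_name: str
    input_type: Optional[str] = None  # e.g. "zipcode" when the input is recognizably typed


# Assumed lookup of well-defined value lists for typed inputs.
KNOWN_VALUES = {"zipcode": ["94041", "94301", "94306"]}


def should_surface(form: Form) -> bool:
    """Skip POST forms, which tend to carry transactions rather than lookups."""
    return form.method.upper() == "GET"


def iterative_probe(submit: Callable[[str], str], seeds: List[str],
                    rounds: int = 3, per_round: int = 5) -> List[str]:
    """Submit candidate words, keep those that return results, and harvest
    frequent unseen words from the result pages as the next candidates."""
    tried = set()
    useful: List[str] = []
    candidates = list(seeds)
    for _ in range(rounds):
        harvested: Counter = Counter()
        for word in candidates:
            if word in tried:
                continue
            tried.add(word)
            page = submit(word)
            if not page:
                continue  # empty result page: this input is not useful
            useful.append(word)
            harvested.update(
                w for w in re.findall(r"[a-z]+", page.lower()) if w not in tried
            )
        candidates = [w for w, _ in harvested.most_common(per_round)]
        if not candidates:
            break
    return useful


def choose_inputs(form: Form, submit: Callable[[str], str],
                  seeds: List[str]) -> List[str]:
    """Pick the values to send to a form, or nothing if the form is skipped."""
    if not should_surface(form):
        return []
    if form.input_type in KNOWN_VALUES:
        return KNOWN_VALUES[form.input_type]
    return iterative_probe(submit, seeds)


# A fake site standing in for a real form backend (illustrative data only).
FAKE_SITE = {
    "prius": "toyota prius hybrid",
    "toyota": "toyota camry prius",
    "hybrid": "honda hybrid",
}


def fake_submit(word: str) -> str:
    return FAKE_SITE.get(word, "")


if __name__ == "__main__":
    search_form = Form("https://example.com/search", "GET", "q")
    print(choose_inputs(search_form, fake_submit, ["prius"]))
```

A real system would also have to decide which result pages are distinct enough to be worth indexing; this sketch only covers choosing what to type into the form.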

Kosmix’s Approach to the Deep Web

After Alon shared Google’s perspective, Anand explained that Kosmix has taken a very different approach to the Deep Web: the federated way.

Unlike Google, Kosmix does not crawl HTML forms. Instead, for any given search query, Kosmix taps into these forms in real-time through API calls, evaluates the results and organizes them into a topic page. If you wanted to look up “Pumpkin Pie” on Kosmix, for example, the system would bring you fresh content from recipe sites like the Food Network, “How To” baking videos, real-time tweets about pumpkin pie from Twitter, and information about the caloric profile of pumpkin pie from diet sites like FatSecret. A query for “AdMob,” on the other hand, will call services like CrunchBase for a company profile and Fool.com for up-to-date investor information. To provide the most relevant topic page and also avoid overwhelming these different services with too many API calls, the Kosmix system is smart enough to know which type of services to call for which query. Thus, the query for “Pumpkin Pie” would never be routed to CrunchBase. This selective routing is an important enabling factor for the federated approach.

So how does Kosmix decide which Web service to route a query to? The answer lies with Kosmix’s categorization technology. Over the past three years, Kosmix has created a taxonomy of several million nodes, which we organized into a graph, using a combination of humans and algorithms. Editors discover, integrate, and tag Web services to taxonomy nodes in a semi-automated fashion. Algorithms route the user’s query through the set of taxonomy nodes, which enables the engine to decide which Web service to call.
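
As a rough illustration of taxonomy-based routing, here is a toy Python sketch. It is not the Kosmix implementation: the real taxonomy is a graph of several million nodes and queries are mapped to nodes by categorization algorithms, whereas this sketch assumes a tiny parent-pointer tree, exact-match lookup of the query, and made-up service names.

```python
from typing import Dict, Iterator, List

# Hypothetical toy taxonomy: each node points to its parent.
PARENT: Dict[str, str] = {
    "pumpkin pie": "desserts",
    "desserts": "food",
    "admob": "startups",
    "startups": "companies",
}

# Services that editors have tagged to taxonomy nodes (names are illustrative).
SERVICES_BY_NODE: Dict[str, List[str]] = {
    "food": ["recipes", "nutrition"],
    "desserts": ["baking_videos"],
    "companies": ["company_profiles", "investor_news"],
}


def ancestors(node: str) -> Iterator[str]:
    """Yield the node itself, then each ancestor up to the root."""
    seen = set()
    current = node
    while current is not None and current not in seen:
        seen.add(current)
        yield current
        current = PARENT.get(current)


def route(query: str) -> List[str]:
    """Return the services tagged to the query's node or any of its ancestors."""
    node = query.strip().lower()  # assumption: the query text names a node exactly
    services: List[str] = []
    for n in ancestors(node):
        for service in SERVICES_BY_NODE.get(n, []):
            if service not in services:
                services.append(service)
    return services


if __name__ == "__main__":
    print(route("Pumpkin Pie"))  # food-related services only
    print(route("AdMob"))        # company-related services only
```

Because a query only reaches services attached to its own branch of the taxonomy, "Pumpkin Pie" never triggers a call to the company-profile service, which keeps the number of API calls per query small.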

After outlining the benefits of this approach, Anand dived deeper into the need to select the right sources, and touched on the challenges of discovering and integrating data sources, layout, ranking, and so on; details can be found in this year’s VLDB paper. Anand also explained how the federated approach is keeping pace with emerging Web trends such as real-time content, the explosion of Web APIs, and different content types such as videos and maps.

Digging Even Deeper
Last night’s audience, about 50 specialists in the search space from some of the Valley’s leading companies and startups, was one of the most engaged groups I have ever seen. Questions ranged from business models to how to do a multi-way join between HTML tables. Some people even contributed ideas. If the Deep Web is important to you, then this was the place to be.

Both Google and Kosmix have compelling yet contrasting approaches to the Deep Web. It will be interesting to see if there is a winner or simply a combination of the two.

abhishek
November 12, 2009

Deep Web: Google & Kosmix Debate Two Very Different Approaches »

Tonight at SDForum’s SearchSIG, Kosmix Co-Founder Dr. Anand Rajaraman and Dr. Alon Halevy of Google Labs will debate two very different approaches to the Deep Web.

The Deep Web is the portion of the Internet not accessible to traditional search engines. Social networks, media-sharing sites for photos and videos, library catalogs, airline reservation systems, phone books, and all kinds of scientific databases lurk inside the Web, practically invisible to today’s search tools.

The volume of this hidden content is enormous: some estimates have pegged the size of the Deep Web at up to 500 times larger than the slice of the Web we see on search engines today.

Ironically, the Deep Web hides some of the richest content on the Internet. The Web 2.0 revolution has enabled an explosion of sites dedicated to user-generated content, including Wikipedia, YouTube, Flickr, TripAdvisor and Yelp. Content on these sites grows so rapidly that it’s nearly impossible for Web crawlers to keep up. Social media sites like Facebook, MySpace, and Twitter pose the same challenge, and often add privacy protections that permit only “friends” to view certain information.

Anand and Alon are among the world’s leading experts in the field of the Deep Web, and their two companies take very different approaches to mining this data.  In this session, Alon and Anand will examine how the Deep Web will change the business of Search, offer insight into what content creators should expect as the Deep Web becomes more accessible, and comment on what this shift might mean for the end user. They will also address the question of whether the Deep Web will someday be fully exposed, and if not, why not.

Agenda:
6:30-7:00pm – Registration / Food & Drink
7:00–7:05pm – A few words about the Search SIG
7:00-9:00 pm – Program

Price
$15 at the door for non-SDForum members
No charge for SDForum members

Date:  Tonight! November 12, 2009

Location:

Cubberley Community Center
4000 Middlefield Rd., RM H-1
Palo Alto, CA
jodi
November 9, 2009

Startup Smackdown: The Movie »

Here’s your chance to re-live all those incredible Smackdown moments. We hear that ESPN wants to air this piece nationally, right after Monday Night Football. You saw it here first.

Click here to see the Smackdown highlights vid: Kosmix Startup Smackdown

jodi
November 6, 2009

And the Startup Smackdown Winner Is…er, Us! »

Chalk it up to home court advantage.

The Kosmix doubles team won last night’s Startup Smackdown tournament, after four rounds of some of the most intense ping pong the Valley has ever seen.

Nine teams competed for the Smackdown title, and the competitive spirit was palpable right from the beginning. After cruising through the first two rounds, TheFind’s formidable pair, Ranjith “Ranji” Subramanian and Krishna “DaKriz” Ganti, narrowly defeated Skyfire in the semi-finals to face off against Kosmix in the championship game.

The Kosmix team almost didn’t make it to the final round, and had to battle hard to beat Meebo’s awesome doubles team in the semi-finals.  In the final round, TheFind and Kosmix waged a heated match to the finish.  Ankur “Neo” Jain and Nikesh “The Wall” Garera unleashed all they had, and triumphed by going (staying??) home with the coveted Kosmix Kup.

The fact that Kosmix won our own Kup means only one thing: Next year, we’re toast. While we were finishing up the leftover beer and pizza last night, the other teams had already started training for Startup Smackdown II.

jodi
November 5, 2009

Kosmix Startup Smackdown »

Who plays the meanest game of ping pong in Silicon Valley??

Tonight eight of the hottest startups in Mountain View will gather at Kosmix for the first annual Startup Smackdown ping pong tournament. The competition looks fierce, and we hear that teams have been practicing all week in preparation for the big event.

Appearing in the Kosmix Arena tonight will be:

  • Evernote: Ron “The Octopus” Toledo & Andy “Black Widow” Kill
  • Kosmix: Nikesh “The Wall” Garera & Ankur “Neo” Jain
  • Meebo: Simon “The Smasher” Yeo & Greg “Marco Polo” Fair
  • Polyvore: Guangwei Yuan & Jianing Hu
  • Rhythm NewMedia: Khoi “Grasshopper” Dinh & Sundar “Semiconductor” Vedula
  • Skyfire: Brad “Defense” Landthorn & Sunil “Offense” Kaki
  • Talenthouse: Byron “Shrimp” Louie & Frederik “Pee-wee” Hermann
  • TheFind: Ranjith “Ranji” Subramanian & Krishna “DaKriz” Ganti

We also have an awesome press team competing in the tourney:  Jennifer “Mediaphyter” Leggio and Julie “Julie B” Blaustein.

Who will be the ultimate champion? Place your bets now!

jodi
November 3, 2009

The Real Time Web and You »

Here’s a repost of an article I wrote for Inc., on the emergence of the real time web and how your business can benefit from this trend:

One of the biggest technology trends in 2009 has been the emergence of the “Real-Time Web.” The real-time Web is made up of technologies and practices that can inform users as soon as information is published, instead of requiring users to check for updates. The real-time Web discards the traditional notion of the more static “webpages,” and instead adopts the notion of dynamic “streams” of information. The real-time Web is also very conversational because it makes it possible to get instant responses across very large networks of people.

Action in the real-time Web started with companies like Twitter and Friendfeed, which built their own infrastructure for large scale delivery of real-time messages. By providing Web service application programming interfaces (APIs), these companies enabled many other developers to create applications based on the real-time Web. However, Anil Dash, a prominent blogger, points out that real time services need not be built on the back of Twitter and Facebook anymore. Due to emerging technologies, the pieces are falling together for creating a free, open and decentralized “pushbutton platform,” which makes it easy for websites to add real-time messaging services. With these developments, we can expect many more websites to jump onto the real-time bandwagon.

Growing importance to business

The real-time Web is becoming increasingly important to businesses in multiple ways. Firstly, as many webmasters and Web analytics companies have pointed out, the real-time Web is starting to rival search engines like Google as a source of website traffic. For example, Mark Cuban talked a few months ago about how his blog receives more visits from Twitter and Facebook than from Google. Secondly, the real-time Web opens up communication opportunities that the traditional Web could not have provided. For instance, if an airline wants to sell off its last minute tickets, the real-time Web provides a great outlet for advertising this very time-sensitive deal.  Thirdly, by making information instantaneously accessible, the real-time Web can create, or erase, instances of information arbitrage. As an example, take a look at Skygrid, a service that provides high quality financial news in real time, giving its users an edge, but at the same time leveling the playing field between professional investors and amateurs in terms of the speed of access to reliable information. Finally, because the real-time Web is very conversational, it becomes a repository of people’s sentiment, and mining this sentiment can be very useful to marketers and others.

Taking advantage of real-time Web

Beyond creating an account on Twitter, how can you take advantage of the real-time Web?  Here are some thoughts to get you started:

  • Engage with the real-time Web with tailored offers and content. Several companies are seeing success with time-sensitive programs that could not have been conceived without the real-time Web. Jet Blue’s “cheeps” and United Airlines’ twares are exclusive Twitter promotions for last minute fare deals. Another company that has encountered great success with offering exclusive deals on Twitter is Dell. A Dell blog post from June mentioned that Dell had surpassed $2 million in Twitter sales for Dell Outlet, which sells refurbished items, scratch and dent items, and previously ordered new laptops. The real-time Web also acts as a place where people express their intent to shop (e.g. someone may tweet “thinking of buying an ipod touch.”) Selectively targeting such users, without spamming them, might also be a great way to help your customers make real-time buying decisions. A service like Twitterhawk can be used to automate this kind of marketing.
  • Make use of real-time Web tools for business intelligence. The real-time Web is a great source of knowledge and sentiment about your customers, your competitors and your industry. You can use services like Firstrain to research the real-time Web for the news that matters to you. You could also use Twitter’s search functionality in simple ways to keep track of some of this information, or go to one of the many real-time search engines. A recent article in Mashable talks about the many tools that help analyze Twitter content.
  • Join in the conversation about your company. In one of my previous articles, I talked about how companies like Comcast are using Twitter to understand their customers’ concerns and address them. The conversational nature of the real-time Web can be very powerful in building relationships with your customers.
  • Create the infrastructure that allows your company to respond in real time. Real-time enterprise data integration has been around for a long time. However, with the emergence of the real-time Web and the opportunities it creates, it is becoming increasingly critical for companies to be able to access all their internal data in real time. In other words, “real-time data integration is no longer a luxury.”
vijay