Interview with Scott Jones, creator of the ChaCha Search Engine

When I first heard about ChaCha Search, I was rather skeptical about the concept, but Scott Jones, the creator of this new human-driven search engine, answered some of my criticism on WebMaster World via private message (which was pretty impressive and let me know they are definitely listening to feedback). I had some questions about the service and asked him for an interview.

1. What was the inspiration for this service and how long did it take you to put it together?

The inspiration was that it is widely recognized that “search” is important and yet we are really dealing with “first inning” technology. This first generation of search engines seems to have hit a plateau that is limited by the ceiling of today’s machine-intelligence solutions for things like indexing and page ranking. It took us about 6 months to build this alpha version. I came up with the “big idea” last October (2005). We then built a prototype in Jan/Feb which we used for usability and concept testing. We learned a ton! Our production version (now in experimental alpha stage) is architected and built to scale up to Google levels of traffic.

2. Looking at the historical issues with bias at DMOZ and recently Digg, namely the problem with those at the top of the pyramid wielding too much control, how do you plan to deal with bias issues and make sure the top ChaCha guides are not swayed by personal gain and outside money/influence? (And on a related note: who is going to answer questions about “mesothelioma” and “viagra”?)

Our system is built to be both “Darwinian” and self-healing. There are several levels of quality assurance starting with the guide invitation process which involves guides having a “limited resource” of slots that they can allocate to those that they sign up. They are then motivated to sign up guides who will put in the most time and who will perform well enough to get the higher rate of pay ($10 instead of the normal $5 per search hour, which goes to our top 20% of performers, as ranked by users and trainer guides). Therefore, there is a self-policing mechanism that prevents highly biased guides from being successful in our system and prevents them from being exposed to users (at least it would be the exception). Furthermore, the PSRs (previous search results) have a voting mechanism that causes more useful ones to float to the top and the highly biased and/or less useful ones to fall off.

3. With the recent “Digg effect” you’ve experienced, it seems that a mass of early adopters tried out the service and were disappointed with the speed and results. Do you think they will give you another shot, and what do you plan to do to get back to their good graces?

On the contrary, because we launched as an “experimental” version to more properly set expectations, we’ve had mostly positive feedback with only occasional negative feedback. Those who take the time to understand what we are about (as opposed to focusing on the implementation flaws of the alpha version) are the ones who have been the most positive. We have found that those who are the most negative have a severe misunderstanding of what we are really doing. We consider that to be good news. We think we will be back in good graces with them if they revisit our site as the system improves, which is already beginning to happen. Those who want a fully baked experience should definitely wait a couple of months before trying us out. In the meantime, there appear to be plenty of early adopters who are willing to see the vision and give us the benefit of the doubt.

4. What is the specific market and demographic that you are going after? What strategies do you have to reach the different segments?

We are approaching the same broad market that Google, Yahoo, MSN, and Ask are approaching. We have a top-down strategy of “getting the word out” via top publications as well as a bottom-up strategy using a grassroots approach, working with guides, users, bloggers, etc…

5. With the recent AOL data leak, and general concern over privacy for users, what is your privacy policy? Will you save people’s searches and use them to refine your algorithms? Or will you use the “we don’t store your information” as a differentiator?

We do not retain user-identifying information associated with queries (as was the case for AOL’s recent problematic situation). We do store user information IF a guide reports abuse from an outside user, in which case we store the IP address for further investigation, which might involve shutting off that IP address so an anonymous user could not continue to threaten our guides and our system.

In general, we do retain the queries themselves (and the results that are returned) so that we can serve them up for other users. This is a model that is being used very successfully by which has over 40% of the market in Korea. Several US companies such as Google, Yahoo, MSN, and Ask are also storing search queries (without associating those queries with user-identifying information).

6. It looks like video and contextual ads are your primary sources of monetization. It seems SpiralFrog and ChaCha are both banking on people wanting to view video ads while they download or search. Has there been research into how much of this people will want or bear online?

Only a small fraction of our advertising is through the video medium, although we think we do have a unique angle for offering inventory to advertisers who want to try this relatively new form of targeted video advertising. We have a unique pre-roll opportunity that is not possible with other search engines. And yet, this is only a small fraction of the way we perform search. The vast majority of our searches involve real-time delivery of highly relevant results (i.e. zero delay).

7. What are the biggest weaknesses you see with these websites/services:

a. Google Search

It’s hard to fault a company that has enjoyed the massive success of Google (or any of the top search engines). Having said that, 2 million results in a split second isn’t necessarily useful unless the first page of results is highly relevant to my query. It is interesting to note that some estimates suggest that 50% to 70% of searches are for simple navigation (as opposed to really searching for information)… i.e. people use Google to get to Amazon, etc… Of the remainder of queries, about half the time, people never get what they want. And the other half the time, it takes, on average, 11 minutes to get a relevant result (a Microsoft statistic, I think). Those are sobering statistics that leave a lot of room for improvement in the search industry, especially if human intelligence and the “deep web” (the >90% of the web that the big search engines don’t even “see”) can be better leveraged.

b. Yahoo Answers

Again, hard to fault one of the world’s best search engines. Yahoo claims that “Yahoo Answers” is complementary to its search product. Unfortunately, the quality of the answers leaves much to be desired. However, if asking about boyfriend/girlfriend problems or the meaning of life, this is an interesting place to go. It is a social experience as opposed to a credible place to go to get information. There is a severe problem with quality control using their methodology. And yet, just as with MySpace, there is certainly a place for the “Yahoo Answers” type of system.

c. Google Answers

Failure to launch. People (in general) won’t pay for results. Experts won’t play (in general) when the money isn’t more dependable. It was an interesting experiment. From those results, I knew I couldn’t go in that direction.

d. Wikipedia

Love the concept… not sure about the experience always. I go there a lot to get information, but I feel like I always have to take that information with a grain of salt. From my own experience, there is an astounding level of bias in the Wikipedia entries. The “history” mechanism at Wikipedia helps achieve better objectivity, but few people actually use that feature.

e. MySpace

Astounding viral growth model. Difficult to monetize optimally because of content control problems.

8. After launching the alpha and seeing the results, what do you see as your biggest challenges in the next 6 months?

Execution of a business that has more than the usual number of moving parts. I feel very good about our first week though. We rocketed to a level of traffic that is quite unusual for a start-up. We attracted more guides than I imagined (we got to our early November target just in our first week). We earned respectable advertising revenue even while just coming out of the gate. And the energy and enthusiasm of users, guides, and onlookers has been extremely encouraging. We know we are onto something good!

9. How will you tell apart people who are generally stupid from bots designed to spam the system with randomly generated questions? How do you determine which IP to ban?

We don’t discuss our ranking and protection mechanisms (for obvious reasons) other than what I already mentioned above.

10. Danny Sullivan was quoted saying “Any new search engine is going to have an extremely tough time against the major search engines, simply because those services are still working well for many people. Consider that Microsoft has spent millions in promoting their new search engine, (MSN) yet has failed to pull any significant traffic from Google or Yahoo!” How do you plan to address this?

He’s right! We think that when we exit our alpha phase in two months, there will be a demonstrably different search experience that differs significantly from the existing top search engines. This is not the usual “me too” approach. Ours is a radically different approach. We think it opens up a whole new world of possibilities… just wait for “Act Two”… it’s much bigger!

Thanks for the interview, Scott. (This was our first interview on this blog; we hope to have more in the future and get better at it. Thanks for bearing with our verbose questions.)