Sunday, June 7, 2009

Baker's dozen: Methodology? What methodology?

Before I go much further in subjecting search engines to my questioning, two overall disclaimers:
  • Again, there is absolutely no research or experimental protocol behind the questions. I wrote them up in about five minutes off the top of my head.
  • I aimed for the kind of questions Google would not do particularly well on but the latest generation might. Being a habitual Google user, I may well have missed the mark.
And some finer points:

It's become very clear that it's not very clear how to measure partial success. For most of these particular closed-ended questions, full success is reasonably easy to judge: Look for a concise, correct answer presented in direct response to the question. But what about the typical Google answer of links and snippets that may well point you directly at the real answer? It's clearly not full success, but it's still quite useful in practice. What score should it get?

What about a question that's easily answered by a slightly different search question and a little link chasing? This is the status quo, and to some extent people are attuned to that. Is there anything wrong with having to learn how to use a tool? We have to learn how to use cell phones, music players, video web sites and so forth. Care and feeding of search engines is now taught somewhere around grade school or middle school (and often learned even earlier).

What would an actual experimental protocol look like? Where would you get your search questions? From a sampling of the population at large, likely biased by years of Google use? From people who don't routinely use Google (or other current-generation search engines)?

What are we trying to model? I can think of at least three possibilities, each with its own arguments for and against:
  • The current population
  • A population of people coming to the whole "search engine" thing completely cold
  • The "steady-state" condition in which everyone has had a chance to learn and get used to whatever search engine is being tested.
All of this is a long-winded way of saying "Hey, this is just tire-kicking. The results are meant to be grist for discussion and nothing more."
