In last week’s post about Best Bets I commented that search software is “certainly not good enough without a lot of work. A lot of expensive work. If your supplier says ‘the search is really good, you don’t need to worry about it’ then you definitely need to worry about it.”
Worrying about and testing search systems has been a common theme in my working life, whether that means benchmarking the performance of an existing system, testing a new one before launch, or comparing vendors when choosing a new system.
I’ve had varying levels of exposure to APR Smartlogik, Google, Inktomi/Yahoo, Fast, Verity, Autonomy and SharePoint. At the moment I’m in the middle of testing and tweaking the search for a SharePoint-powered website. The challenges are surprisingly similar to those I encountered when working with Muscat in 2001.
Having gone through much the same process so many times, I thought now was a good time to write it all down. I’ve divided my process into three stages: preparation, running the tests, and making changes.
1. Ask the suppliers lots and lots of questions. You are after actual answers: you are testing their level of knowledge and letting them know that the quality of the search matters to you. Don’t rely wholly on the suppliers’ answers. Find other users and do your own reading to validate what the suppliers tell you.
Most important to find out:
- Ranking criteria
- What is configurable? Of those settings, which have a graphical interface, and of those, which have a user-friendly graphical interface?
Other useful things to find out:
- What query syntax is supported? What is the default syntax?
- What are the stemming rules, and which words are stop words? Ask for copies.
- Is there a default thesaurus? Ask for a copy.
- What will the crawl timescales be during testing?
- How do you construct queries using URL query strings?
2. Build a list of test queries. You really need hundreds. Good sources are:
- Names of pages/articles on your current site or items in your catalogue
- Real queries from your search logs or from a similar site if you can find someone willing to share
- Obvious variants of these terms – thesaurus, misspellings, abbreviations
- Known problems – ask for feedback from users
- Include a range of specific items, broad topics and ambiguous queries
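If you have a little coding help, some of the "obvious variants" can be generated rather than typed. The sketch below is one simple approach, not a feature of any particular search product: it produces single-edit misspellings (a dropped letter, a doubled letter, two adjacent letters swapped) and merges them with a hand-maintained list of synonyms and abbreviations. The `alternatives` dictionary and the `max_misspellings` cap are hypothetical; you would fill them in from your own thesaurus work and pick a cap that keeps the list manageable.

```python
def misspelling_variants(term: str) -> set[str]:
    """Return simple single-edit misspellings of a term."""
    variants = set()
    for i in range(len(term)):
        # Drop the letter at position i ("search" -> "serch")
        variants.add(term[:i] + term[i + 1:])
        # Double the letter at position i ("search" -> "seearch")
        variants.add(term[:i] + term[i] + term[i:])
    for i in range(len(term) - 1):
        # Swap adjacent letters ("search" -> "saerch")
        variants.add(term[:i] + term[i + 1] + term[i] + term[i + 2:])
    # Swapping two identical letters reproduces the original; remove it
    variants.discard(term)
    return variants


def expand_queries(queries: list[str],
                   alternatives: dict[str, list[str]],
                   max_misspellings: int = 3) -> list[str]:
    """Return each query followed by its hand-listed alternatives and a few
    generated misspellings, with duplicates removed and order preserved."""
    expanded: list[str] = []
    for query in queries:
        misspellings = sorted(misspelling_variants(query))[:max_misspellings]
        for candidate in [query, *alternatives.get(query, []), *misspellings]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

Even a short word yields over a dozen misspellings, so it is usually better to sample a few per query (or pick the ones that look like real typing errors) than to test them all.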
Your list could be a simple list of terms, but you’ll find it easier to run many rounds of tests if you set your list up as HTTP links that run each query in your test search engine.
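Turning a plain list of terms into clickable links is a small scripting job. Here is a minimal sketch, assuming your test engine accepts the query as a single URL parameter. The base URL `https://search-test.example.com/results` and the parameter name `q` are placeholders: every engine names these differently, which is one reason to ask the supplier how to construct queries using URL query strings.

```python
import html
from urllib.parse import urlencode

# Hypothetical test endpoint and query parameter name; substitute the
# values your own search engine uses.
BASE_URL = "https://search-test.example.com/results"
QUERY_PARAM = "q"


def query_url(query: str) -> str:
    """Build the URL that runs one query against the test search engine.
    urlencode handles spaces, ampersands and non-ASCII characters."""
    return f"{BASE_URL}?{urlencode({QUERY_PARAM: query})}"


def links_page(queries: list[str]) -> str:
    """Return a simple HTML page with one clickable link per test query."""
    items = "\n".join(
        f'<li><a href="{html.escape(query_url(q))}">{html.escape(q)}</a></li>'
        for q in queries
    )
    return f"<html><body><ol>\n{items}\n</ol></body></html>"
```

Saving the output of `links_page(...)` to an `.html` file and opening it in a browser gives you a numbered checklist you can click through in each round.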
If you are testing multiple search engines and have access to coding skills, you can set up the list to run automatically across the range of search engines and display the results back to you, saving lots of time. If you are running multiple rounds of testing on the same search system, an interface that checks whether the results have changed since last time is invaluable.
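The "has anything changed since last time?" check can be quite small once each round's results are saved. This sketch assumes you have already captured, for each query, the ordered list of result URLs (how you fetch them depends entirely on the engine, so that step is left out) and stored them as JSON. It reports only the queries whose top results differ, which is where your attention should go.

```python
import json


def save_round(path: str, results: dict[str, list[str]]) -> None:
    """Store one round of results: query -> ordered list of result URLs."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)


def load_round(path: str) -> dict[str, list[str]]:
    """Load a round of results saved by save_round."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def changed_queries(previous: dict[str, list[str]],
                    current: dict[str, list[str]],
                    top_n: int = 10) -> dict[str, tuple[list[str], list[str]]]:
    """Return queries whose top-N results differ between two rounds,
    mapped to (old top-N, new top-N). A query missing from the previous
    round counts as changed; queries dropped from the current round are ignored."""
    changes = {}
    for query, results in current.items():
        old = previous.get(query, [])[:top_n]
        new = results[:top_n]
        if old != new:
            changes[query] = (old, new)
    return changes
```

The comparison is order-sensitive, so a result moving from first to second place is flagged. The same function will also compare two different engines on the same query list if you pass one engine's results as `previous` and the other's as `current`.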
But for most of us, we’ll be working from a list of queries and running them one by one.
Next: Running the tests