The bane of my life when trying to work out what’s gone wrong with a search engine is the hidden thesaurus.
Lots of search software comes with a thesaurus that is consulted ‘behind the scenes’ to expand a query to include other terms that are *known to be equivalent*. Anyone who has spent even a short amount of time thinking about language can see why these things might become a problem.
(these files are doubly irritating because they’re usually set up without any kind of admin interface… the assumption being that only the system administrator would or should edit them… and that they will, of course, be technical)
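To make the mechanism concrete, here is a minimal Python sketch of what a hidden thesaurus does. It assumes a hypothetical file format in which each line lists terms the engine treats as equivalent. `load_thesaurus`, `expand_silently` and the file contents are illustrative names, not any particular search engine's API.

```python
from __future__ import annotations

# Hypothetical thesaurus file: each line lists terms assumed to be equivalent.
THESAURUS_TEXT = """\
aubergine, eggplant
courgette, zucchini
"""


def load_thesaurus(text: str) -> dict[str, set[str]]:
    """Map each term to every term on its line (including itself)."""
    thesaurus: dict[str, set[str]] = {}
    for line in text.splitlines():
        terms = {t.strip().lower() for t in line.split(",") if t.strip()}
        for term in terms:
            thesaurus.setdefault(term, set()).update(terms)
    return thesaurus


def expand_silently(query: str, thesaurus: dict[str, set[str]]) -> list[list[str]]:
    """Return, for each query word, the terms the engine will actually search for.

    Nothing here records or reports that an expansion happened,
    which is exactly what makes a hidden thesaurus hard to debug.
    """
    return [sorted(thesaurus.get(word, {word})) for word in query.lower().split()]
```

With this sketch, `expand_silently("Aubergine", load_thesaurus(THESAURUS_TEXT))` returns `[['aubergine', 'eggplant']]`, and the user is never told that "eggplant" was added.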
The expansion happens behind the scenes and the user isn’t necessarily told it has happened. This is usually bad. You need to be really, really wary of expanding users’ search queries without telling them. Don’t just give them results for aubergine and results for eggplant when they only searched for aubergine. You think you are being clever and helpful. If you’re wrong about the expansion, you are just being extremely irritating.
Or possibly worse than irritating.
I read a comment on the Guardian recently that suggested hor = mum in Danish. I thought that was wrong and searched for “hor mum” in Google. It wasn’t my most thought-through search query, but I didn’t expect Google to automatically convert it into “hot mum”. That was a bit of a surprising set of results.
(the word the commenter had misheard was mor)
This Google example demonstrates how you can end up with a worse situation than the user simply not getting the results they were looking for. But it is also different from the thesaurus examples I started this post with. Google do at least tell you what they’ve done and allow you to correct them. Given how uncertain query expansion is, best practice must be to tell users what you’ve done.
If you do tell users, you have two choices about how to tell them:
a) Suggest the expansion but don’t run it for them. This risks them missing it as an option.
b) Run the expansion but tell them you’ve done it. This still risks them missing the option to undo it.
Google has experimented with both approaches over the years, and currently uses a bit of a mix of the two. Don’t assume their approach has “cracked” the problem.
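The two options above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the thesaurus, the three-document collection, the single-word queries and the names `suggest_expansion`, `run_and_tell` and `SearchResponse` are all hypothetical, chosen only to show what each option puts in front of the user.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thesaurus and documents, for illustration only.
THESAURUS = {
    "aubergine": {"aubergine", "eggplant"},
    "eggplant": {"aubergine", "eggplant"},
}
DOCS = ["aubergine parmigiana", "grilled eggplant", "courgette soup"]


@dataclass
class SearchResponse:
    results: list[str]
    notice: str | None = None  # message shown to the user about the expansion
    alternative_query: list[str] | None = None  # terms behind the offered link


def search(terms: set[str]) -> list[str]:
    """Return documents containing any of the given terms (simple OR match)."""
    return [doc for doc in DOCS if terms & set(doc.split())]


def suggest_expansion(word: str) -> SearchResponse:
    """Option a): run only the user's query and offer the expansion as a link."""
    word = word.lower()
    synonyms = THESAURUS.get(word, {word})
    response = SearchResponse(results=search({word}))
    if synonyms != {word}:
        response.notice = "Also search for: " + ", ".join(sorted(synonyms - {word}))
        response.alternative_query = sorted(synonyms)
    return response


def run_and_tell(word: str) -> SearchResponse:
    """Option b): run the expansion, say so, and offer a way to undo it."""
    word = word.lower()
    synonyms = THESAURUS.get(word, {word})
    response = SearchResponse(results=search(synonyms))
    if synonyms != {word}:
        response.notice = "Including results for: " + ", ".join(sorted(synonyms - {word}))
        response.alternative_query = [word]  # "search only for" the original term
    return response
```

In this sketch both options carry the same `notice`/`alternative_query` pair. The difference is only which result set the user sees first, which is why either one still depends on the user noticing the message.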