Archive for the ‘categorisation’ Category
I’ve been doing quite a bit of surveying in recent weeks and I’ve been challenged over my liking for free-text fields.
My colleague/partner-in-crime was worried that the data would be too time-consuming to analyse if we didn’t turn every field into a tick box of some form. I’ve always found the free-text fields to be the ones that contain the most interesting responses so I’m willing to wade through the data.
But it wasn’t just on our side of the fence that concerns were raised. In the latest batch of responses to the survey, a couple of people wrote things along the lines of “why not checkboxes?” in the field in question.
(of course, if it had been checkboxes, the people who’d wanted free text wouldn’t have been able to complain to me)
An unexpected benefit of the free-text field was that I could spot the spam, because the spammers faithfully quoted our navigation back at us when asked what parts of our website they read. The humans were mostly more varied than that.
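That spam-spotting trick is simple enough to automate. A minimal sketch, assuming you just compare each response against your navigation labels (the labels, responses and threshold below are all invented for illustration):

```python
# Flag survey responses that mostly parrot the site's navigation labels
# back at us -- a tell-tale sign of non-human answers.
# NAV labels, sample responses and the 0.8 threshold are invented.
NAV = {"news", "sport", "comment", "culture", "business", "money", "travel"}

def looks_like_spam(response: str, threshold: float = 0.8) -> bool:
    """True if nearly every word in the response is a navigation label."""
    words = [w.strip(".,").lower() for w in response.split()]
    if not words:
        return False
    nav_hits = sum(1 for w in words if w in NAV)
    return nav_hits / len(words) >= threshold

print(looks_like_spam("News Sport Comment Culture Business"))  # pure nav echo
print(looks_like_spam("Mostly the football coverage and obituaries"))
```

A real version would pull the label set from the site’s actual navigation rather than a hand-typed list.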
The question was about what Guardian content they read. It was deliberately vague, but most people interpreted it as a request for genres and listed out four or five of them. It wouldn’t have been a huge problem to have offered them the main genres and asked them to tick. It would probably have involved a little less thinking for the respondents.
I suspect people would have ticked more things if offered a list.
What I wouldn’t have got was the things that people thought were important but we hadn’t thought important enough to put on the list. A lot of people chose something surprising as one of the four or five things they specifically chose to tell us they read.
As well as lots of the expected genres, the responses also included:
- specific topics and countries
- things they don’t read
- how they choose what to read
- who they read
- which supplements they like
They used language we don’t use e.g. Current affairs, Entertainment, International, Global, Arts, Finance, Opinion, Economics, Sports, IT.
I was also interested to see people using Guardian specific acronyms e.g. OBO, MBM, CiF.
Most people responded with a comma-separated list, which was pretty easy to turn into structured data; I then just mopped up the stuff that didn’t fit nicely by hand. And that mopping up gave me an opportunity to learn the data and to begin to understand it.
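That turn-it-into-structured-data step really is as simple as it sounds: split on commas, normalise case and whitespace, and count. A minimal sketch (the sample responses and the frequency cut-off are invented), with the rare labels left over for hand-sorting:

```python
from collections import Counter

# Invented sample responses for illustration.
responses = [
    "News, Sport, Comment, Obituaries",
    "sport, culture,  news",
    "News, Money, the crossword",
]

counts = Counter()
for response in responses:
    for item in response.split(","):
        label = item.strip().lower()
        if label:
            counts[label] += 1

# Frequent labels become structured data; one-offs go in the
# hand-sorting pile -- which is where you learn the data.
structured = {label: n for label, n in counts.items() if n > 1}
by_hand = [label for label, n in counts.items() if n == 1]
print(structured)  # e.g. {'news': 3, 'sport': 2}
print(by_hand)
```

The one-off pile is exactly where the surprising answers (specific topics, supplements, Guardian acronyms) would surface.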
This wasn’t a big scientific piece of market research, just the beginning of a conversation. And that’s best done without checkboxes.
As a classification geek married to a tree geek, I was delighted to discover treen, which Wikipedia says is “a generic name for small handmade functional household objects made of wood”.
I may start collecting miscellaneous categories from different domains…
This article is part of a series about our e-commerce redesign.
The browse structure of any website is always controversial within the organisation. I’m always struck by the discrepancy between how interested the organisation is in browse (as opposed to search) and how interested the users are. I’m not saying users don’t want a sensible, intuitive navigation scheme, but they also want a really effective search engine. Most web design projects involve huge amounts of effort invested in agreeing the navigation and very few discussions of how search will work.
Partly this is because navigation is easy for stakeholders to visualise. We can show them a sitemap and they can instantly see where their content is going to sit. And they know the project team is perfectly capable of changing it if they can twist their arm. With search on the other hand, stakeholders often aren’t sure how they want it to work (until they use it) and they’re not sure if it is possible to change anyway (search being a mysterious technical thing).
Even forgetting search, the focus on navigation is almost always about primary navigation, with most stakeholders having very little interest in the cross-links or related journeys. The unspoken assumption is still that the important journey is arriving at the homepage and drilling down the hierarchy.
So I went into the e-commerce project assuming we’d need to spend a lot of time consulting around the navigation structure (but knowing that I’d need to make sure I put equal energy into site search, SEO and cross-linking, regardless of whether I was getting nagged about it).
A quick glance also showed that the navigation wasn’t going to be simple to put together. Some of my colleagues thought I wasn’t sufficiently worried but I’m used to the pain of categorising big diverse websites or herding cats as Martin puts it. I participated in at least three redesigns of the BBC’s category structure, which endeavours to provide a top-down view of the BBC’s several million pages on topics as diverse as Clifford the Big Red Dog, the War on Terror and Egg Fried Rice.
My new challenge was a simple, user friendly browse structure that would cover a huge book catalogue, RNIB publications, subscriptions to various services, magazines, and a very diverse product catalogue of mobility aids, cookware, electronics and stationery. And those bumpons, of course.
Card-sorting is usually the IA’s weapon of choice in these circumstances. Now, I’ve got my doubts about card-sorting anyway, particularly where you are asking users to sort a large, diverse set of content of which they are only interested in a little bit. Card-sorting for bbc.co.uk always came up with a very fair, balanced set of categories, but one that didn’t really seem to match what the site was all about. It was too generous to the more obscure and less trafficked bits of the site and didn’t show due respect to the big guns. Users didn’t really use it, probably including even the users who’d sorted it that way in the testing. My favourite card-sorting anecdote is the guy who sorted everything into two piles: “stuff I like” and “stuff I don’t like”. Which I think also alludes to why card-sorting isn’t always successful.
In any case, card-sorting isn’t going to be half as simple and cheap when your users can’t see.
We decided to put together our best stab at a structure and create a way for users to browse on screen. Again, not just any old prototyping method is going to work here – however the browse structure was created, it would need to be readable with a screen reader. So coded properly.
I wrote some principles for categories and circulated them to the stakeholders. Nothing controversial but it is helpful to agree the ground rules so you can refer back to them when disagreements occur later.
I reviewed the existing structure, which had been shaped over the years by technical constraints and the usual org-structure influence. I also looked at lots of proposed re-categorisations that various teams had worked on, and at which items and categories currently performed well. And I reviewed other sites’ categorisation structures as part of the competitive review.
I basically gathered lots of information. And then stopped. And looked at it for a bit. And wondered what to do next. Which is also pretty normal for this sort of problem.
(actually one of the things I did at this point was write up the bulk of this blog post – I find it really, really helpful to reset my thinking by writing up what I’m doing)
Somewhat inevitably, I got the post-it notes out. I wrote out a post-it for each type of product and laid them out in groups based on similarity (close together for very similar products, and further away as the relationship gets weaker). This is inevitably my own sense of similarity, but remember this is a first stab to test with users.
Where obvious groups developed I labelled them with a simple word, something like ‘books’ or ‘toys’. If a group needed a more complex label then I broke it up or combined it until I felt I had very simple, easily understood labels (essentially a stab at “basic categories”).
There were too many groupings and there were also a scattering of items that didn’t fit any group (the inevitable miscellaneous group). I dug out the analytics for the shop to see how my grouping compared in terms of traffic. I made sure the busiest groups were kept and the less popular sections got grouped up or subsumed.
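That traffic check is easy to express: rank the candidate groups by pageviews, keep the busiest few as top-level categories, and fold the tail into existing groups. A sketch with invented product groups and pageview figures:

```python
# Invented pageview figures for candidate groups.
traffic = {
    "books": 54000, "magazines": 21000, "mobility aids": 18000,
    "cookware": 9000, "electronics": 7500, "stationery": 2200,
    "bumpons": 800,
}

KEEP = 5  # how many top-level categories the navigation can bear (assumed)

ranked = sorted(traffic, key=traffic.get, reverse=True)
top_level = ranked[:KEEP]
# The long tail gets grouped up or subsumed into an existing
# category rather than shown at the top level.
to_merge = ranked[KEEP:]

print(top_level)  # ['books', 'magazines', 'mobility aids', 'cookware', 'electronics']
print(to_merge)   # ['stationery', 'bumpons']
```

The judgement call, of course, is where each tail item gets subsumed – the analytics only tell you which groups can’t justify a top-level slot.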
This gave me a first draft to share with the business units. Which we argued about. A lot.
I referred everyone back to the principles we’d agreed and the analytics used to make the decisions. Everyone smiled sweetly at me and carried on with the debate.
After some advice from my eminently sensible project manager, I conceded one of the major sticking points, as I reported on Twitter at the time.
Luckily at this stage we were finally able to do some usability testing with some real users. Only four mind, but they all managed to navigate the site fine and actually said some nice stuff about the categories. One tester even thought there must be more products on the new site, in spite of us cutting the categories by two-thirds.
So if someone attempts to re-open the browse debate, hopefully we can let usability tester #2 have the last word as in her opinion the new shop is…
“very, very clearly divided up”
Enough navigation, time to concentrate on search….
Drop-down menus aren’t inherently evil but they do seem to encourage all sorts of terrible behaviour.
HMCS CourtFinder includes a menu that is certainly the worst I’ve had to interact with this year, and probably for quite a long time before that.
The list is incredibly long. But, more damagingly, it isn’t in *any* order that I can see. Nor is this a list where you or I are likely to be sure exactly what term we’re looking for. After all, ‘types of court work’ isn’t a classification that most of us know off by heart.
I was digging around in my files this weekend and found this table I made once of different approaches to applying metadata to content. At first glance the volunteers example looks like it is only relevant to charities, but in a lot of scenarios that get described as users tagging, it is actually volunteers tagging. The difference is between doing something for your own benefit (users) and contributing something to a greater cause (volunteers).
Users apply metadata to their own content, or to content they have gathered for their own use.
- Strengths: cheap; real user language; subjective value judgements; highly reactive; latest trend vocabulary.
- Weaknesses: no guarantee of contributions; the same tag used to mean different things, and different tags used to mean the same thing; cryptic personal tags; smaller interpretations drowned out; hardly anyone goes back and changes out-of-date tagging.
- Recommended environment: a large user base, with a *selfish* motivation for users – often gathering/collecting – and a reasonably shared vocabulary. Rarely works on a single site where the user could instead aggregate links or content on a generic site like delicious.

Unpaid volunteers apply metadata to content produced by others, e.g. Freebase.
- Strengths: depending on how it is handled, can be more predictable and reliable than users; may be close to user language; can be guided more like staff, and asked to go back and make changes.
- Weaknesses: can require more management and attention than users; a smaller number of people, who may not make up enough hours; probably not viable in most commercial enterprises – although it can still be done if the company offers a free-at-consumption service that may be perceived as a public good.
- Recommended environment: where you can rely on lots of good will. Probably in combination with another approach, unless a large number of volunteers are likely.

The paid author applies metadata to their own content.
- Strengths: only a small commitment required from each staff member; expert knowledge of the content.
- Weaknesses: low motivation and interest; may be too close to the content to understand user needs; more likely to be formal/objective.
- Recommended environment: where you have good historical examples of imposing new activities on the authors and getting them to follow them – probably quite a process- and guideline-driven organisation. Bad where your authors think of themselves as creatives… they’ll think metadata is beneath them.

Paid metadata specialists apply metadata to content produced by others.
- Strengths: highly motivated; their objectives are likely to be tied to the quality of this work.
- Weaknesses: cost; they need to read the content first; not necessarily user-focused; more likely to be formal/objective.
- Recommended environment: strong information management skills in the organisation; the project needs to be resourced on an ongoing basis; the business probably needs to see a very close correlation between the quality of the metadata and profit.

Software applies metadata to content based on rules defined by specialists.
- Strengths: more efficient than the staff options.
- Weaknesses: needs operational staffing.
- Recommended environment: as for specialist staff.

Software applies metadata to content based on training sets chosen by specialists.
- Strengths: more efficient than the staff options.
- Weaknesses: hard to control; can be a ‘black box’; needs a mechanism for addressing errors.
- Recommended environment: strong technical and information management skills in the organisation; an understanding from management of the ongoing need for operational staffing; management who do not believe the vendors’ promises.
Really good stuff in this month’s FUMSI article by Ian Davis:
“Image indexing gets especially tricky, and really parts company from the world of document indexing, with the ‘aboutness’ access to images. By their nature images convey a myriad of messages to any number of people. Few images are not ‘about’ some type of abstract concept and few images users make no use of this important access point to image content”
I really like the fact that Ian both addresses the genuine challenges in describing ‘aboutness’ but also highlights that this is exactly what the users of image retrieval systems want.
A lot of commentators, mentioning no names, often present cataloguing and classification as librarians imposing their view of the world on the rest of us, conveniently glossing over both the usual librarian motivation of just wanting to help and the existence of a mass of users who want help, not an ontological debate.
I’ve been looking at lots of alternative format bookstores, as part of the e-commerce project. One of these was the Large Print Bookshop which has a category of ‘uncategorized’.
I’m trying to imagine the scenario in which a user would think “I know… it’ll be in uncategorized”. Particularly given that the choices above it are ‘fiction’ and ‘non-fiction’, surely one of the better examples of exhaustive options?
If Guy is still reading, I’d love to know the thinking…
I remember when I was first working with the UNESCO thesaurus I was amused to see that ‘home-makers’ was a sub-category of women. I just thought that reflected the age of the thesaurus (it has some particularly lovely terminology around disability too).
Now, I don’t expect the Daily Mail to demonstrate cutting-edge social attitudes or, to be honest, to have particularly great information architecture. So I really shouldn’t have spent quite so long trying to figure out where their recipes section was buried. There is a shortcut on the homepage, but I’d come in via a search engine and foolishly thought I could work out the main nav to get me to my destination.
The penny dropped eventually. It is nestled in the ‘Femail’ section, of course!
On Nov 3rd I’ll be taking part in a panel (with Silver Oliver and Helen Lippell) as part of ISKO UK’s Semantic Analysis Technology: in search of categories, concepts & content. The seminar “aims to examine the real issues and technical challenges presented by automating semantic analysis for whatever purpose”.
Presentations by Expert Systems, Rattle Research and SmartLogic will be followed by the three of us sharing our auto-categorisation (or should that now be semantic analysis) war stories.
Following on from the controlled vocabulary resources, I dug out what I have on automatic classification.
Strangely, most of the information available on automatic indexing/classification/tagging is pretty dated (although it has been a couple of years since I was immersed in this stuff daily). The most detailed material seems to pre-date the arrival of folksonomies and user tagging; perhaps the buzz around tagging sucked up all the available energy in the metadata space?
DM Review’s 2003 article on Automatic Classification is a good intro to the various types of auto-classification: rules-based, supervised learning and unsupervised learning.
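Of the three, rules-based classification is the easiest to illustrate: specialists write the rules, and the software mechanically applies them to each document. A toy sketch (the rules and test sentence are invented; real systems use far richer rule languages than keyword matching):

```python
# Toy rules-based auto-classifier: specialists define keyword rules,
# software applies them to each document. Rules invented for illustration.
RULES = {
    "sport": ["football", "cricket", "olympics"],
    "business": ["shares", "market", "profits"],
    "politics": ["parliament", "election", "minister"],
}

def classify(text: str) -> list:
    """Return every category whose rule keywords appear in the text."""
    lowered = text.lower()
    return sorted(cat for cat, keywords in RULES.items()
                  if any(kw in lowered for kw in keywords))

print(classify("Profits fell as the football club's shares slid"))
```

Supervised learning replaces those hand-written rules with a model trained on example documents, and unsupervised learning discovers the groupings itself – which is where the ‘black box’ weakness comes in.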
CMS Review has a good list of Metadata Tagging Tools and a list of other resources at the end.
Taxonomy Strategies provide a bibliography on info-retrieval that includes automatic classification articles.
From 2004 there’s the AMeGA project and Delphi’s white paper ‘Information Intelligence: Intelligent Classification and the Enterprise Taxonomy Practice’. Download from Delphi’s whitepaper request form.
There must be more recent stuff than this. I’ll start gathering stuff on the automating metadata page.