Archive for the ‘metadata’ Category
As a general principle it is best not to go overboard on defining SharePoint content types. They add power to information retrieval, but they also add content-creation overheads. Keep the number of types reasonable, and likewise the number of metadata fields. (Obviously the art lies in defining what ‘reasonable’ means.)
A list of reasons to define a specific content type:
- you want to attach a document template for that content type
- there’s a standard workflow for that content type
- there’s a standard information management policy for that content type
- you want properties of the content type to be searchable through advanced search
- you want to restrict a search to that content type
- you want to be able to sort a list or library by a specific metadata field of the content type
- you want to categorise a list or library by a specific metadata field of the content type
See also Microsoft’s Managing enterprise metadata with content types
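The last two reasons in the list are worth spelling out. Here is a minimal, hypothetical sketch (plain Python, not SharePoint’s actual API — the ‘Contract’ type and its fields are invented for illustration) of why giving a content type its own metadata fields pays off: a library can then be sorted and categorised by those fields.

```python
# Invented example: a 'Contract' content type with two metadata fields.
# The payoff of per-type fields is that the library can be sorted and
# categorised by them, as in the last two bullets above.
from dataclasses import dataclass
from itertools import groupby

@dataclass
class Contract:
    title: str
    region: str        # a categorisation field
    expiry_year: int   # a sortable field

library = [
    Contract("Supplier A", "EMEA", 2011),
    Contract("Supplier B", "APAC", 2009),
    Contract("Supplier C", "EMEA", 2010),
]

# Sort the library by a specific metadata field.
by_expiry = sorted(library, key=lambda c: c.expiry_year)

# Categorise the library by a specific metadata field.
by_region = {region: [c.title for c in docs]
             for region, docs in groupby(sorted(library, key=lambda c: c.region),
                                         key=lambda c: c.region)}

print([c.title for c in by_expiry])  # ['Supplier B', 'Supplier C', 'Supplier A']
print(by_region)                     # {'APAC': ['Supplier B'], 'EMEA': ['Supplier A', 'Supplier C']}
```

If the documents were all lumped under one generic type with no shared fields, neither operation would be possible.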
I was digging around in my files this weekend and found this table I made once of different approaches to applying metadata to content. At first glance the volunteers example looks like it is only relevant to charities, but in a lot of the scenarios that appear to be about users tagging, it is actually volunteers tagging. The difference is between doing something for your own benefit (users) and contributing something to a greater cause (volunteers).
**1. Users apply metadata to their own content, or content they have gathered for their own use**
- Strengths: cheap; real user language; subjective value judgements; highly reactive; latest trend vocabulary
- Weaknesses: no guarantee of contributions; the same tag means different things; different tags mean the same thing; cryptic personal tags; smaller interpretations drowned out; hardly anyone goes back and changes out-of-date tagging
- Recommended environment: large user base, with a *selfish* motivation for users (often gathering/collecting); reasonably shared vocabulary; rarely works on a single site where the user could instead aggregate links or content on a generic site like delicious

**2. Unpaid volunteers apply metadata to content produced by others (e.g. Freebase)**
- Strengths: depending on how it is handled, can be more predictable and reliable than users; may be close to user language; can be guided more like staff and asked to go back and make changes
- Weaknesses: can require more management/attention than users; smaller in number, so may not make up enough hours; probably not viable in most commercial enterprises, although it can still work where the company offers a free-at-consumption service that may be perceived as a public good
- Recommended environment: where you can rely on lots of goodwill; probably in combination with another approach, unless a large number of volunteers is likely

**3. The paid author applies metadata to their own content**
- Strengths: small commitment required from each staff member; expert knowledge of the content
- Weaknesses: low motivation and interest; may be too close to the content to understand user needs; more likely to be formal/objective
- Recommended environment: you have good historical examples of imposing new activities on the authors and getting them to follow them; probably quite a process- and guideline-driven organisation; bad where your authors think of themselves as creatives… they’ll think metadata is beneath them

**4. Paid metadata specialists apply metadata to content produced by others**
- Strengths: highly motivated; objectives likely to be tied to the quality of this work
- Weaknesses: cost; they need to read the content first; not necessarily user-focused; more likely to be formal/objective
- Recommended environment: strong information management skills in the organisation; the project needs to be resourced on an ongoing basis; the business probably needs to see a very close correlation between the quality of the metadata and profit

**5. Software applies metadata to content based on rules defined by specialists**
- Strengths: more efficient than the staff options
- Weaknesses: needs operational staffing
- Recommended environment: as for specialist staff

**6. Software applies metadata to content based on training sets chosen by specialists**
- Strengths: more efficient than the staff options
- Weaknesses: hard to control; can be a ‘black box’; needs a mechanism for addressing errors
- Recommended environment: strong technical and information management skills in the organisation; an understanding from management of the ongoing need for operational staffing; management do not believe the vendors’ promises
There’s a post on the CMS Watch Blog about the challenges of achieving a metadata-driven publishing model:
“The content needs metadata for this to work. Many will tell you that “people won’t tag.” No, seriously, they won’t tag content with the right labels, add the right metadata, or correctly categorize, “even if threatened with being fired.” And even if they do tag, it will be haphazard and inconsistent.
This is a very real problem. But at the same time it’s complete nonsense. Because if this were the case, why would people meticulously tag and file their holiday snapshots on Flickr and Facebook? Somehow, in their spare time, they do identify the people in a picture, add keywords to a shot, give it a meaningful title, and actually describe it. Without having to be threatened with being fired, or even having to be beaten with a stick.
Partly this is because they get the feedback that makes it worth their while to do so. If you identify your friends in a picture on Facebook, they (and then their friends) will immediately find it and start commenting, which creates a positive feedback loop to tag some more. More importantly though, it’s really easy.”
Really good stuff in this month’s FUMSI article by Ian Davis:
“Image indexing gets especially tricky, and really parts company from the world of document indexing, with the ‘aboutness’ access to images. By their nature images convey a myriad of messages to any number of people. Few images are not ‘about’ some type of abstract concept and few images users make no use of this important access point to image content”
I really like the fact that Ian both addresses the genuine challenges in describing ‘aboutness’ but also highlights that this is exactly what the users of image retrieval systems want.
A lot of commentators, mentioning no names, often present cataloguing and classification as librarians imposing their view of the world on the rest of us. This conveniently glosses over both the usual librarian motivation of simply wanting to help, and the existence of a mass of users who want help, not an ontological debate.
James Robertson of Step Two has published Metadata fundamentals for intranets and websites
The article is a great intro and neatly captures several of my metadata hobby horses.
Capture what you need:
“As discussed in the previous section, metadata is a burden on the authors of the content, and one that they may not fully understand or support. For all these reasons, only metadata that has a concrete and immediate need should be captured. Don’t set up metadata fields to support potential future uses.” … “authors may not have the skill, time or inclination to enter consistent and high-quality metadata.”
“It takes several person-years of work to develop a taxonomy, making it hard to justify, even though the return on investment will be several times the initial cost.
In the shorter term, organisations should therefore look to simpler approaches to metadata, pending the development of a more extensive taxonomy.”
Users must be motivated to tag:
“While tagging has proven to be successful on sites such as these, its use on corporate websites and intranets is much less clear. The motivation and purpose for end users to tag our content is not obvious, and this is key to the tagging approach.”
Following on from the controlled vocabulary resources, I dug out what I have on automatic classification.
Strangely, most of the information available on automatic indexing/classification/tagging is pretty dated (although it has been a couple of years since I was immersed in this stuff daily). The most detailed material seems to predate the arrival of folksonomies and user tagging; perhaps the buzz around tagging sucked up all the available energy in the metadata space?
DM Review’s 2003 article on Automatic Classification is a good intro to the various types of auto-classification: rules-based, supervised learning and unsupervised learning.
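To make the rules-based flavour concrete, here is a minimal sketch of that approach: specialists write keyword rules, and software applies the matching categories to each piece of content. The category names and keywords below are invented for illustration, not taken from any real product.

```python
# Rules-based auto-classification sketch: specialists define keyword
# patterns per category; the software tags content with every category
# whose rules match. Categories and keywords here are made up.
import re

RULES = {
    "finance": [r"\binvoice\b", r"\bbudget\b", r"\bpayroll\b"],
    "hr":      [r"\bholiday\b", r"\brecruitment\b", r"\bappraisal\b"],
}

def classify(text: str) -> list[str]:
    """Return every category whose rules match the text."""
    text = text.lower()
    return [category for category, patterns in RULES.items()
            if any(re.search(pattern, text) for pattern in patterns)]

print(classify("Please approve the budget for the recruitment drive"))
# -> ['finance', 'hr']
```

The supervised-learning variant replaces the hand-written rules with a statistical model trained on example documents that specialists have already categorised; unsupervised learning dispenses with the examples too and clusters documents by similarity.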
CMS Review has a good list of Metadata Tagging Tools and a list of other resources at the end.
Taxonomy Strategies provide a bibliography on info-retrieval that includes automatic classification articles.
From 2004 there’s the AMeGA project and Delphi’s white paper ‘Information Intelligence: Intelligent Classification and the Enterprise Taxonomy Practice’. Download from Delphi’s whitepaper request form.
There must be more recent stuff than this. I’ll start gathering material on the automating metadata page.
Michelle asked for resources for starting out with taxonomies, and I’d coincidentally been compiling some material for other nefarious purposes, after having had to search for the same old set of resources for the umpteenth time.
So I’ve made a page of CV resources. There’s the basics now but I’ll be adding more.
Only just realised our programme model is publicly available: BBC programmes ontology
We now have a top-level directory of /ontologies/. I’m not quite sure what I think of that. The metadata-geek in me is tickled. The bit that spent hours going through the list of all the random top-level directories is uneasy.