September 2009

SharePoint search: more insights

Surprisingly, this white paper on building multilingual solutions in SharePoint provides a good overview of how search works, whether or not you are interested in the multilingual aspect.

White paper: Plan for building multilingual solutions.

Read page 15, titled “overview of the language features in search”, for a description of content crawling and search query extraction. Pages 16-18 then give a good overview of the individual features and what they do.

Word breakers: A word breaker is a component used by the query and index engines to break compound words and phrases into individual words or tokens. If there is no word breaker for a specific language, the neutral word breaker is used, in which case word breaking occurs where there are white spaces between the words and phrases. At indexing time, if there is any locale information associated with the document (for example, a Word document contains locale information for each text chunk), the index engine will try to use the word breaker for that locale. If the document does not contain any locale information, the user locale of the computer the indexer is installed on is used instead. At query time, the locale (HTTP_ACCEPT_LANGUAGE) of the browser from which the query was sent is used to perform word breaking on the query. Additional information about the language availability of the word breaker component is available in Appendix B: Search Language Considerations.
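To make the fallback rule concrete for myself, here’s a rough Python sketch. It is not how SharePoint implements it: the `BREAKERS` registry, the toy `english_breaker` and the function names are all made up for illustration. Only the rules come from the white paper: use the breaker for the locale if there is one, otherwise split on white space, and take the query locale from the browser’s Accept-Language header. As a simplification I take the first language listed in the header and ignore any q weights.

```python
import re

def neutral_breaker(text):
    """Neutral word breaker: split only where there is white space."""
    return text.split()

def english_breaker(text):
    """Toy stand-in for a language-specific breaker: also splits on hyphens and slashes."""
    return [t for t in re.split(r"[\s\-/]+", text) if t]

# Hypothetical registry of language-specific breakers, keyed by primary language tag.
BREAKERS = {"en": english_breaker}

def primary_language(accept_language):
    """Return the first language in an Accept-Language value, e.g. 'en' from 'en-GB,fr;q=0.8'."""
    first = accept_language.split(",")[0].split(";")[0].strip()
    return first.split("-")[0].lower()

def break_words(text, accept_language):
    """Break text with the locale's breaker, falling back to the neutral one."""
    breaker = BREAKERS.get(primary_language(accept_language), neutral_breaker)
    return breaker(text)
```

With this sketch, a query for "full-text search" from an English browser is broken into three tokens, while the same query from a locale with no registered breaker keeps "full-text" whole.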

Stemming: Stemming is a feature of the word breaker component used only by the query engine to determine where the word boundaries are in the stream of characters in the query. A stemmer extracts the root form of a given word. For example, “running,” “ran,” and “runner” are all variants of the verb “to run.” In some languages, a stemmer expands the root form of a word to alternate forms. Stemming is turned off by default. Stemmers are available only for languages that have morphological expansion; this means that, for languages where stemmers are not available, turning on this feature in the Search Result Page (CoreResult Web Part) will not have any effect. Additional information about language availability for the Stemmer feature is available in Appendix B: Search Language Considerations.

Noise words dictionary: Noise words are words that do not add value to a query, such as “and,” “the,” and “a.” The indexing engine filters them to save index space and to increase performance. Noise word files are customizable, language-specific text files. These files are a simple list of words, one per line. If a noise word file is changed, you must perform a full update of the index to incorporate the changes. Additional information about the noise words dictionary and how to customize it is available at www.microsoft.com.
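A minimal sketch of what noise word filtering amounts to, assuming only what the white paper says about the file format (plain text, one word per line). The function names are mine, and the lower-casing is my assumption rather than documented behaviour.

```python
def load_word_list(lines):
    """Parse a word-list file: one word per line, blank lines ignored."""
    return {line.strip().lower() for line in lines if line.strip()}

def drop_noise(tokens, noise_words):
    """Remove noise words from a list of tokens before indexing."""
    return [t for t in tokens if t.lower() not in noise_words]

# In practice the lines would come from open(path), which also yields one line at a time.
noise = load_word_list(["and\n", "the\n", "a\n"])
print(drop_noise(["the", "cat", "and", "a", "dog"], noise))  # ['cat', 'dog']
```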

Custom dictionary: The custom dictionary file contains values that the search server must include at index and query times. Custom dictionary lists are customizable, language-specific text files. These files are used by Search in both the index and query processes to identify exceptions to the noise word dictionaries. A word such as “AT&T,” for example, will never be indexed by default because the word breaker breaks it into single noise words. To avoid this, the user can add “AT&T” to the custom dictionary file; as a result, this word will be treated as an exception by the word breaker and will be indexed and queried. These files contain a simple list of words, one per line. If the custom dictionary file is changed, you must perform a full update of the index to incorporate the changes. By default, no custom dictionary file is installed during Office SharePoint Server 2007 Setup. Additional information about the custom dictionary file and how to customize it is available at www.microsoft.com.
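The AT&T example is easier to follow with a toy version. This is a sketch under my own assumptions, not the real word breaker: `tokenize`, `index_terms` and the `NOISE` set are hypothetical, and I assume the default breaker splits on anything that is not a letter or digit.

```python
import re

def tokenize(text, custom_terms=()):
    """Split text into tokens, keeping any custom dictionary term whole."""
    protected = sorted(custom_terms, key=len, reverse=True)
    # Custom terms are tried first; otherwise take runs of letters and digits.
    alternatives = [re.escape(t) for t in protected] + [r"[A-Za-z0-9]+"]
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return pattern.findall(text)

def index_terms(text, noise_words, custom_terms=()):
    """Return the tokens that would be indexed: custom terms always, noise words never."""
    custom = {t.lower() for t in custom_terms}
    return [
        tok for tok in tokenize(text, custom_terms)
        if tok.lower() in custom or tok.lower() not in noise_words
    ]

# Illustrative noise list; I am assuming "at" and "t" are both on it.
NOISE = {"at", "t", "the", "a", "and"}
```

Without a custom dictionary, "Call AT&T today" is indexed as just "Call" and "today"; once "AT&T" is passed in as a custom term it survives as a single token.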

Thesaurus: There is a configurable thesaurus file for each language that Search supports. Using the thesaurus, you can specify synonyms for words and also automatically replace words in a query with other words that you specify. The thesaurus used will always be in the language of the query, not necessarily the server’s user locale. If a language-specific thesaurus is not available, a neutral thesaurus (tseneu.xml) is used. Additional information about the thesaurus file and how to customize it is available at www.microsoft.com.
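Here’s how I picture the two thesaurus behaviours (synonym expansion and replacement) plus the fallback to the neutral thesaurus. It is a sketch only: the real thesaurus is an XML file, whereas the `THESAURI` dictionary, its entries and `expand_term` are invented for illustration.

```python
# Hypothetical in-memory thesauri keyed by query language; "neutral" stands in for tseneu.xml.
THESAURI = {
    "en": {
        "expansions": [{"car", "automobile"}],
        "replacements": {"w2k": "windows 2000"},
    },
    "neutral": {"expansions": [], "replacements": {}},
}

def expand_term(term, query_language):
    """Return the set of terms to search for, using the thesaurus for the query's language."""
    thesaurus = THESAURI.get(query_language, THESAURI["neutral"])
    key = term.lower()
    # A replacement swaps the term out entirely.
    if key in thesaurus["replacements"]:
        return {thesaurus["replacements"][key]}
    # An expansion adds every synonym in the same group.
    result = {key}
    for group in thesaurus["expansions"]:
        if key in group:
            result |= group
    return result
```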

Language Auto Detection: The Language Auto Detection (LAD) feature generates a best guess about the language of a text chunk based on the Unicode range and other language patterns. Basically, it’s used for relevance calculation by the index engine and in queries sent from the Advanced Search Web Part, where the user is able to specify constraints on the language of the documents returned by a query.
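The “Unicode range” part of that guess can be approximated in a few lines. This sketch is my own and only identifies the dominant script, which is a much weaker signal than the language (it cannot tell English from French, for instance). It also leans on Unicode character names rather than raw code point ranges; `guess_script` is a hypothetical name.

```python
import unicodedata
from collections import Counter

def guess_script(text):
    """Best guess at the dominant script of a text chunk, from Unicode character names."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # The first word of a character's Unicode name is usually its script, e.g. 'CYRILLIC'.
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None
```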

Did You Mean? The Did You Mean? feature is used by the query engine to catch possible spelling errors and to provide suggestions for queries. The Did You Mean? feature builds suggestions by using three components:

· Query log: Information tracked in the query log includes the query terms used, when the search results were returned for search queries, and the pages that were viewed from search results. This search usage data helps you understand how people are using search and what information they are seeking. You can use this data to help determine how to improve the search experience for users.

· Dictionary lexicon: A dictionary of most-used lexicons provided at installation time.

· Custom lexicon: A collection of the most frequently occurring words in the corpus, built at query time by the query engine from indexed information.

The Did You Mean? suggestions are available only for English, French, German, and Spanish.
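A rough sketch of how the three components could combine, using Python’s standard `difflib` for the fuzzy matching. The white paper does not say how suggestions are actually ranked, so the approach here (closest lexicon words, tie-broken by how often each appears in the query log) is purely my assumption, as are the names and sample data.

```python
import difflib

def suggest(term, lexicon, query_counts):
    """Suggest a correction for one term, or None if it looks fine or nothing is close."""
    if term in lexicon:
        return None
    candidates = difflib.get_close_matches(term, sorted(lexicon), n=5, cutoff=0.75)
    if not candidates:
        return None
    # Prefer the candidate people have searched for most often.
    return max(candidates, key=lambda w: query_counts.get(w, 0))

# Illustrative data: a merged dictionary/custom lexicon and counts from a query log.
lexicon = {"sharepoint", "search", "thesaurus", "stemming"}
query_counts = {"sharepoint": 120, "search": 300}
```

Here `suggest("sharepiont", lexicon, query_counts)` offers "sharepoint", while a correctly spelled term or one with no near neighbour returns nothing.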

Definition Extraction: The Definition Extraction feature finds definitions for candidate terms and identifies acronyms and their expansions by examining the grammatical structure of sentences that have been indexed (for example, NASA, radar, modem, and so on). It is only available for English.
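The acronym half of this is easy to illustrate with a much cruder technique than the grammatical analysis the white paper describes. This sketch simply looks for an upper-case acronym in brackets and checks whether the words just before it share its initials; `find_acronyms` is a hypothetical name and the heuristic is mine.

```python
import re

def find_acronyms(text):
    """Return {acronym: expansion} for patterns like 'Language Auto Detection (LAD)'."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        # Take as many preceding words as the acronym has letters.
        words = text[:match.start()].split()[-len(acronym):]
        if len(words) == len(acronym) and all(
            w[0].upper() == c for w, c in zip(words, acronym)
        ):
            found[acronym] = " ".join(words)
    return found
```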

search
sharepoint

content strategy, cont.

I laughed out loud at the accusation in the comments that content strategy, as defined in Rachel’s article, was an ‘expansive transposition’ followed by the explanation that we already have “user experience”, a term not unfamiliar with land grabs.

As soon as people don’t want to be constrained by their job titles, they start redefining those titles to match their interests. I realise I’m running the risk of falling foul of my own critique.

Uncategorized

content strategy: another job title in the mix

A good while back, Chris Sizemore pointed me at Rachel Lovinger’s article Content Strategy: The Philosophy of Data.

He thought I might find it interesting as Rachel’s take on content strategy overlapped a lot with how my IA team at the BBC had formed, as opposed to IA out in the digital agency world. Our heartland was content management and search: we did content audits and metadata models, and were entranced by visions of linked data. We were part of a wider UX team, but it was an often difficult relationship, with many designers (UX or otherwise) seeing us as cuckoos in the nest.

You’ll have probably noticed that I’m uncomfortable with the growing voices within the IA community calling to re-brand us all as UX designers. And I was struck by Rachel’s comments on her blog that a discussion of URIs didn’t seem at home at the IA Summit. That’s led me to take a better look at what is going on in the content strategy field than I did when Chris first told me I ought to.

So as well as Rachel’s article, here’s some content strategy reading material

Apologies to those of you who still find IA a new-fangled job title.

Uncategorized

book: Shop Class as Soulcraft by Matthew B. Crawford

I’ve been reading extracts of Shop Class as Soulcraft: An Inquiry Into the Value of Work by Matthew B. Crawford. Crawford has a PhD in Political Philosophy, once worked writing abstracts for an academic journal service and now runs a motorcycle repair shop. His book, which began as an article in the New Atlantis, champions the virtues of using your hands to make and repair things.

He tells some fairly depressing tales of cubicle life:

“The quota demanded, then, not just dumbing down but also a bit of moral re-education, the opposite of the kind that occurs in the heedful absorption of mechanical work. I had to suppress my sense of responsibility to the article itself, and to others — to the author, to begin with, as well as to the hapless users of the database, who might naïvely suppose that my abstract reflected the author’s work. Such detachment was made easy by the fact there was no immediate consequence for me; I could write any nonsense whatever….

A good job requires a field of action where you can put your best capacities to work and see an effect in the world. Academic credentials do not guarantee this…

The good life comes in a variety of forms.”

via The Case for Working With Your Hands – NYTimes.com.

craft
happiness
work

document accessibility

Web accessibility is a reasonably familiar topic for IAs but document accessibility is also important. Here are some considerations for your typical Word documents.

To support screen magnification and other adjustments:

  • don’t set the text to black; choose automatic (if you set the text to black and the person reading has the colours reversed for ease of reading then all of your text will disappear)
  • use a simple clear font e.g. Arial
  • avoid italics
  • use left aligned text including headings (screen magnification users often don’t realise there is content that is centred or right aligned)
  • don’t use other colours for fonts (the RNIB training specifically asks us not to use fancy colours like purple. I don’t think it was particularly aimed at me)
  • use 14 point text as the standard font size (this seems huge to me, but this is our recommended standard as meeting the needs of most readers)

To support screen readers with speech output:

  • use the correct Word styles
  • use heading hierarchies to communicate the structure of the document

You’ll note the advice is less detailed for screen readers. This mirrors my experience with web design at the RNIB. Outside the RNIB most accessibility conversations I heard focused on the challenge of designing for screen readers, but the challenges are much greater in designing for both magnification users and fully sighted users at the same time.

accessibility

BCS IRSG – Search Solutions 2009

I’m going to “Innovations in Web and Enterprise Search” at BCS next week.

Search Solutions is a special one-day event dedicated to the latest innovations in web and enterprise search. In contrast to other major industry events, Search Solutions aims to be highly interactive and collegial, with attendance limited to 60-80 delegates.

Provisional programme

09:30 – 10:00 Registration and coffee

Session 1: (Chair: Tony Russell-Rose)

* 10:00 Introduction – Alan Pollard, BCS President

* 10:10 “Enterprising Search” – Mike Taylor, Microsoft

* 10:35 Accessing Digital Memory: Yahoo! Search Pad – Vivian Lin Dufour, Yahoo

* 11:00 “How Google Ads Work” – Richard Russell, Google

11:25 – 11:45 COFFEE BREAK

Session 2: (Chair: Andy MacFarlane)

* 11:45 “Location-based services: Positioning, Geocontent and Location-aware Applications” – Dave Mountain, Placr

* 12:10 “Librarians, metadata, and search” – Alan Oliver, Ex Libris

* 12:35 “UI Design Patterns for Search & Information Discovery” – Tony Russell-Rose, Endeca

13:00 – 14:15 LUNCH

Session 3: (Chair: Leif Azzopardi)

* 14:15 “Search-Based Applications: the Maturation of Search” – Greg Grefenstette, Exalead

* 14:40 “How and why you need to calculate the true value of page 1 natural search engine positions” – Gary Jennings, WebOptimiser

* 15:05 “Search as a service with Xapian” – Richard Boulton, Lemur Consulting

15:30 – 16:00 TEA BREAK

Session 4: (Chair: Alex Bailey)

* 16:00 “The Benefits of Taxonomy in Content Management” – Andrew Maisey, Unified Solutions

* 16:25 Panel: “Interactive Information Retrieval” – details to follow

17:00 – 19:00 DRINKS RECEPTION

via BCS IRSG – Search Solutions 2009.

events
search

ia deliverables

A recent conversation with a friend generated shock (and even a little scorn) that I’d been producing wireframes. I was firmly entreated to sketch instead. Around the same time a recruiter approached me with information on a job that would require detailed annotated UI specs of around 40 pages every fortnight.

The profession is still judged, by and large, by the quality of our documentation. Most recruiters and hiring managers seem more interested in the quality of annotation than the quality of thinking.

I’m rather inconsistent in my approach to documentation. Mostly the medium is picked for the context. Is the project agile? How good are the developers? Is there a remote team? Do lots of people need to be consulted? What are their reading preferences?

Whilst I’m happier with pen and paper than computer, I think it is fair to say that I doodle a good deal more than I sketch. Now there’s always a way to get chickens into a blog post… this little trio were sketched during a conference presentation, presumably a scintillating one and probably about something 2.0 related given the labelling of the fowl.

Chicken conference doodles

In fact, it appears I doodle most when irritated by the speaker. In this case, rather than asking an insightful question to highlight the clichéd and superficial nature of the argument, I wrote “blog, wisdom of the crowds, whatever”. That told him, I’m sure. I do still want this mug though:

Angry (?) conference doodles

None of this is what my friend had in mind though. She’d like this more: part user journeys, part concept map, but mostly not very pretty. Not really for sharing (apart from with you lot, of course) but it could be re-jigged into something more respectable.

Book discovery sketch

I do these little pages all the time but again they aren’t for collaborative purposes. This one was so I could sanity check we had all the functionality we’d need on the product backlog before the supplier drew up the drawbridge.

Homepage sketch

Then of course, there’s cheating. Those search forms I shared recently were created in Visio but with the sketchy stencil:

E-commerce search forms: scope drop-downs


I very rarely do this kind of documentation anymore. My business stakeholders are bored by them and the developers are best told what to do by pointing over their shoulders.

Wireframe and sitemap

I do still do content models. This kind of specification still gets traction with the developers:


Book content model

But, horror of horrors, a lot of my documentation these days is actually reasonably high-fidelity mock-ups. These are really aimed at the business stakeholders. Colours and fonts are pretty much fixed by our visibility requirements, so the business units know better than to ask for their favourite shade of puce.  And they worry less if they don’t have to try and visualise from wireframes. It doesn’t take me any longer as I’ve got a colour stencil and the choices are pretty limited.

Page mock-up

Is this ironic? I’m working for an organisation of and for blind people and I’m producing the most colourful deliverables ever.  But then you should see the colour of the office floors.

deliverables
drawing
ucd

refreshable braille displays

Some of my colleagues use screen readers with braille display output. I’d never come across this particular form of access tech and it wasn’t immediately obvious why the braille displays are necessary… or indeed how they worked. They look a bit space-age, or rather like a 1960s sci-fi movie’s idea of space age.

According to Wikipedia:

“The mechanism which raises the dots uses the piezo effect of some crystals, where they expand when a voltage is applied to them. Such a crystal is connected to a lever, which in turn raises the dot. There has to be a crystal for each dot of the display, i.e. eight per character.”
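To get a feel for “eight per character”, here is a small sketch of my own that builds braille cells from dot numbers using the Unicode braille patterns block (which starts at U+2800, with dot n stored in bit n-1) and counts the crystals a display would need. The three letters shown are standard braille; the 40-cell display size is just an assumed example, not a figure from the Wikipedia article or the RNIB.

```python
# Dots raised for a few letters in standard six-dot braille (dots 1-6); dots 7 and 8
# are the extra bottom row found on eight-dot computer braille displays.
LETTER_DOTS = {"a": (1,), "b": (1, 2), "c": (1, 4)}

def cell(dots):
    """Unicode braille pattern for a set of raised dots: dot n sets bit n-1 above U+2800."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))

def crystals_needed(cells, dots_per_cell=8):
    """One piezo crystal per dot, so a display needs cells x dots of them."""
    return cells * dots_per_cell

print("".join(cell(LETTER_DOTS[ch]) for ch in "abc"))  # ⠁⠃⠉
print(crystals_needed(40))  # 320, for an assumed 40-cell display
```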

An RNIB training video made the why clear. The braille output is often used in combination with speech output and it is particularly useful for punctuation, spelling and codes. These can’t be easily heard in the speech output, at least not without seriously compromising your ability to listen to the speech comfortably. You can ask the screen reader to speak all the punctuation and spell out words but you wouldn’t always want it to be doing that. And the braille display is much more like reading, as opposed to listening, which could make it easier for precision work and for remembering. The video featured a computer programmer explaining how valuable the braille display is for her when reading computer code.

They’re not cheap though. The Braille display available from the RNIB shop is £1,195.00 (Ex. VAT).

accessibility

might finally get to London IA in the Pub

As Martin has picked my local, The Harrison, for the next London IA gathering, and moved it so it doesn’t clash with the Linked Data meeting… I might actually get to an IA in the Pub meeting.

In fact, I’ll probably be the first one there.

London IA in the Pub – London IA.

events

search forms on online shops

I’ve been thinking about the search functionality for our online shop this week. I’ll write up our approach to search properly at a later date but for now I thought I’d share the variety of search forms I’ve seen on other online shops.

E-commerce search forms: simple boxes

E-commerce search forms: labelled boxes

E-commerce search forms: scope drop-downs

E-commerce search forms: guidance text

Some things of note:

  • The longer search boxes were mostly on book sites.
  • 3 sites also offered “suggestions as you type” (Amazon, Borders, Ocado)
  • Only 1 site had an obvious link to an advanced search
  • All sites handled scopes with a dropdown

(Visio stencil is from GUUUI)

e-commerce
search
