Honestly, metadata is really, really boring. But hey, metadata is extremely useful. Without good metadata, we can’t really solve the findability problems we have on our intranet. In order to give the right person the right information, at the right time, at the right place and in the right way, we must use metadata extensively. Or to be more precise, we must use master metadata. In this post I’ll try to explain how we mandate the use of metadata without making it a barrier to publishing information.
Important note: I’m foremost a practitioner and I prefer systems that solve problems in a pragmatic way. I’m sure there are more elegant and correct ways of solving the problem with metadata if you ask information architects/managers and taxonomists.
There are some problems…
If it’s hard to add metadata to the information and it is mandatory to add it, then all publishers will simply use the first available metadata at hand in order to get past the mandatory metadata. This is actually even worse than no metadata. Usually we have relied on web editors to manually add keywords, but the problem is that the web editor needs (very) good domain knowledge in order to add the right metadata (keywords, subject). There should be a system that can help content editors add relevant metadata. Also, more people than web editors should be able to add metadata.
But to any problem there is a solution…
The most important thing is that it should be very easy to add metadata to any information. It needs to be designed so that it only takes a few seconds, so that web editors with good domain knowledge can add metadata fast. The web editor should also be able to get help from the system by picking metadata from a list of suggested keywords. Users should be able to help as well, by tagging the information.
We have decided to have three separate types of keyword metadata:
- Keywords that belong to a taxonomy like MeSH or SnoMed CT
- Keywords that are manually added by the content editor (the old standard way)
- Keywords that are added by users, i.e. tagging
Because the three types of keywords are kept separate, the search engine’s relevancy model can use them in different ways, depending on the content type, content usage etc.
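To make the idea of treating the three keyword types differently a bit more concrete, here is a minimal sketch in Python. It is not how our search engine is actually configured: the class `DocumentKeywords`, the content types and all the weights are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentKeywords:
    """The three keyword types, stored separately (hypothetical structure)."""
    taxonomy: set[str] = field(default_factory=set)   # e.g. terms from MeSH
    editor: set[str] = field(default_factory=set)     # added manually by the editor
    user_tags: set[str] = field(default_factory=set)  # added by users through tagging


# Assumed weights per content type; the numbers are invented for this sketch.
WEIGHTS = {
    "clinical_guideline": {"taxonomy": 3.0, "editor": 2.0, "user_tags": 0.5},
    "news": {"taxonomy": 1.0, "editor": 2.0, "user_tags": 1.5},
}


def keyword_score(query_terms, keywords, content_type):
    """Sum the weight of every keyword type in which a query term matches."""
    terms = {t.lower() for t in query_terms}
    score = 0.0
    for source, weight in WEIGHTS[content_type].items():
        source_keywords = {k.lower() for k in getattr(keywords, source)}
        score += weight * len(terms & source_keywords)
    return score


doc = DocumentKeywords(
    taxonomy={"Heat Stroke"},
    editor={"sunstroke"},
    user_tags={"summer"},
)
print(keyword_score(["sunstroke", "summer"], doc, "news"))
print(keyword_score(["sunstroke", "summer"], doc, "clinical_guideline"))
```

The same document scores differently for the same query depending on the content type, which is the whole point of keeping the keyword types apart.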
We have tried to design a system that is as non-intrusive as possible in order to get all content editors/contributors, users etc. to add metadata. We have developed (and open sourced) a few metadata-services (documentation now in English!) that we think are useful:
- Content analysis
- Keyword service
- Controlled lists
- Tagging service
Content analysis and the keyword service
With the press of a button, the content is sent to the metadata service, where the Content analysis strips away all formatting so that the keyword service can analyze the text and identify good keyword candidates. This is done by comparing the content to a larger corpus/model of information: words that appear in the document but are rare in the larger corpus are, according to the statistical models, very likely to be good (unique) keywords for the document being analyzed. The resulting keywords can then be mapped against a taxonomy, e.g. MeSH, and if a corresponding term for the keyword is found, we can add that as metadata as well. This is very useful because it opens up the possibility of semantic data, or linked data if you prefer. The keywords are then returned to the system that asked for the document-specific keywords. Example below with keyword suggestions for a short text about sunstroke and heatstroke from the implementation in our CMS.
What usually happens is that the keywords are presented to the content editor, who can then choose the relevant keywords from the suggested list. This further improves the quality of the keywords.
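The principle of "rare in the larger corpus, frequent in this document" can be sketched in a few lines of Python. This is a simplified stand-in and not the statistical model the keyword service actually uses: the scoring formula, the function names and the tiny background corpus are all assumptions, and the taxonomy identifiers are placeholders rather than real MeSH IDs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters (formatting already stripped)."""
    return re.findall(r"[a-zåäö]+", text.lower())


def suggest_keywords(document, background_counts, background_total, top_n=5):
    """Rank words that are frequent in the document but rare in the background corpus."""
    doc_counts = Counter(tokenize(document))
    doc_total = sum(doc_counts.values())
    scores = {}
    for word, count in doc_counts.items():
        doc_freq = count / doc_total
        # Add-one smoothing so words unseen in the background do not divide by zero.
        bg_freq = (background_counts.get(word, 0) + 1) / (background_total + 1)
        scores[word] = doc_freq * math.log(doc_freq / bg_freq)
    # Highest score first; ties are broken alphabetically to keep the output stable.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


def map_to_taxonomy(keywords, taxonomy):
    """Return the taxonomy term for every suggested keyword that has one."""
    return {k: taxonomy[k] for k in keywords if k in taxonomy}


# Toy background corpus (invented counts) and placeholder taxonomy entries.
background = Counter({"the": 1000, "is": 800, "a": 900, "of": 700, "and": 600, "heat": 5})
taxonomy = {"heatstroke": "PLACEHOLDER-ID-1"}

text = "Heatstroke is a severe illness. Heatstroke is caused by the heat."
suggestions = suggest_keywords(text, background, sum(background.values()))
print(suggestions)
print(map_to_taxonomy(suggestions, taxonomy))
```

Common words such as "the" and "is" end up with low or negative scores because they are just as frequent in the background corpus, while a word that is repeated in the document and absent from the background rises to the top.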
Controlled lists, one type of master-metadata
Another way we use the metadata-service is to provide us with controlled lists of metadata, e.g. target groups, cities, subjects or document types. We have a lot of these lists and they are basically all governed by our information management people. The lists and taxonomies are all stored in our terminology/taxonomy server (Apelon DTS), an open source product. This gives us the opportunity to use the same master-metadata in many different information systems. In practice, this means that the content editor can choose metadata from e.g. a drop-down list. Example below where the metadata element “Use for” is highlighted:
This is necessary if we want to give the right information to the right people. This way we can use our search engine to find all documents related to a specific document type, for a specific subject, at a given geo-location that relates to a specified target group. For example we could ask the search engine to
“Give me all documents regarding personnel benefits that apply to everybody working at the HQ in Gothenburg as IT strategists”.
Of course the actual (programmatic) search query is formatted in a different way, but still it’s the same query.
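As a rough illustration of what such a query boils down to, here is a small Python sketch that filters documents on controlled-list metadata. It is not our search engine's query syntax: the `Document` fields, the sample documents and the `search` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A document described only by controlled-list metadata (assumed fields)."""
    title: str
    document_type: str
    subject: str
    location: str
    target_group: str


def search(documents, **filters):
    """Return the documents whose metadata equals every given filter value."""
    return [
        doc for doc in documents
        if all(getattr(doc, name) == value for name, value in filters.items())
    ]


# Invented sample data; the values would normally come from the controlled lists.
documents = [
    Document("Wellness allowance", "policy", "personnel benefits", "Gothenburg HQ", "IT strategists"),
    Document("Wellness allowance", "policy", "personnel benefits", "Gothenburg HQ", "nurses"),
    Document("Travel expenses", "guideline", "travel", "Gothenburg HQ", "IT strategists"),
]

hits = search(
    documents,
    subject="personnel benefits",
    location="Gothenburg HQ",
    target_group="IT strategists",
)
for doc in hits:
    print(doc.title, "/", doc.document_type)
```

Because every system picks its values from the same master lists, an exact match on each field is enough; there is no need to guess whether “Gothenburg” and “Göteborg” mean the same thing.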
Tagging
Users can also add valuable and usable metadata if they are allowed to tag the information. Instead of allowing any keyword to be added by an anonymous user (we wanted to avoid swearwords, dirty words etc.), we wanted to automate which tags are added and which are not by using a tagging service, described in the illustration below:
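In code, the accept/reject decision could be sketched roughly like this. It is only an illustration under assumptions, not the actual tagging service: the blocked-word list, the optional approved list and the function names are all hypothetical.

```python
def normalize(tag):
    """Lowercase the tag and collapse any extra whitespace."""
    return " ".join(tag.lower().split())


def filter_tags(candidate_tags, blocked, approved=None):
    """Split user-suggested tags into accepted and rejected ones.

    blocked:  words that must never be stored as tags (assumed list).
    approved: optional controlled vocabulary; when given, only tags
              found in it are accepted.
    """
    accepted, rejected = [], []
    for raw in candidate_tags:
        tag = normalize(raw)
        not_allowed = approved is not None and tag not in approved
        if not tag or tag in blocked or not_allowed:
            rejected.append(raw)
        elif tag not in accepted:
            accepted.append(tag)
        # Duplicates of an already accepted tag are dropped silently.
    return accepted, rejected


accepted, rejected = filter_tags(
    ["Sunstroke", "  heat   stroke ", "damn", "sunstroke"],
    blocked={"damn"},
)
print(accepted)
print(rejected)
```

Passing an `approved` set turns the same function into a stricter filter that only lets through tags from a controlled list.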
This is how we work with adding metadata to our information. All of the metadata-services described here are open sourced by us or others.
Any comments are very welcome.

The Information flow part 2: Information and metadata by sys 64738, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.




Great post Kristian, very useful. Thanks for sharing!
I am very impressed to see how seriously you take metadata, and how you use it and have implemented it. Metadata is the secret ingredient for a successful intranet or any information system; it is like the oil for a smoothly running engine. Metadata makes content relevant, findable, it can link content to content, content to people and vice versa and eventually people to people.
I recognise the issue that people sometimes use the wrong metadata, indeed that is worse than no metadata (and then complain their article can’t be found, duh).
Maybe metadata is boring for some people, I find it fascinating…. An inspiring article!
Cheers.
Bas, thanks for your positive feedback (and links and mentions). You really made my day. Also motivated me to write the rest of the posts in this series.
This quote I would very much like to use:
“Metadata makes content relevant, findable, it can link content to content, content to people and vice versa and eventually people to people.” – Bas Zurburg
Do you think the semantic web in practice would be interesting to read about?
The only boring metadata is boring metadata. It’s not all that boring when it’s active metadata, e.g. used to define active states, replacing fixed processes.
You’re right, and I guess I wasn’t really clear about where I stand re: metadata.
I still think that metadata in itself is as interesting as a SQL table. It’s the use cases of the data/metadata that are useful, interesting, fun and even cool.
Indeed there is nothing more boring than an SQL table. But such clues lead me to believe that the context that you’ve jailed your metadata in is one where SQL lives: relational databases.
Metadata is free-spirited, full of power and potential when freed from the structure of relational databases. If you bypass the database and go straight to the logfile, that’s where metadata can perform its real magic. Don’t know of such a tool? Start with Traction TeamPage.
Great post Kristian, and obviously I like the overview and the strategy going forward. I mentioned this in my recent (Swedish) eHealth 2.0 blog post: http://bloggar.itivarden.idg.se/emergentmeccano/2010/10/25/byggande-av-tornet-i-babylon-och-martin-timell-effekten-automatiska-simultantolkar/
More soon in my research blog as well.
Thanks Fredric.
I hope we can spread the word and show others this metadata-stuff and get them to use it, especially in eGov and eHealth.