The Art of the Database: Why Content Modeling Matters

Virtually every content site on the Internet runs on a database, but, as with many attempts at user-friendly design, that central fact is hidden from the person creating the content.

As I write this story, I have a field at the top of my page for the headline I have written. Below that is my byline, and then the box into which I am typing the body text. Invisible to me is the metadata that describes these content entities and their relationships to one another.

But if it is my job to set up this system in the first place, the database aspect of what I am doing comes first. In the world of content strategy, defining the basic elements of an editorial system is called content modeling. For those with a background in information architecture, it is simply a variant of the more established practice of data modeling.

Mike Atherton, head of user experience at Huddle in London and a former information architect for the BBC, uses the term “domain modeling” to describe how expert knowledge about a certain subject matter is translated into the basic components of a content site.

In his talk, “Domain Modeling at the BBC,” Atherton shows how scientific taxonomy, an idea dating back to the 18th century, has become the basic armature on which we hang what we know about a subject: for instance, the way teams in a sports league are organized into divisions.

In other words, there is both an art and a science to creating content models. Create one that is too complex, full of distinctions without meaningful differences, and you wind up with an unwieldy content management system (CMS) and unhappy content creators. But if you don’t break the subject matter down into its component parts, you wind up with an unstructured blob of content that is hard to reshape and reuse.

In the API-driven future of publishing, structures that are independent of any one format make it easier to develop new products and easier for third parties to use your content.

A slide from Mike Atherton’s “Domain Modeling at the BBC”

As content strategist Rachel Lovinger at Razorfish in New York describes it in an article on A List Apart, a content model is composed of an assembly model (how individual content items are put together to make webpages), content types (collections of components “that are distinct enough to be unique types in the system”) and content attributes (the actual elements of content and metadata that constitute each type, and their interrelations).
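To make those three layers concrete, here is a minimal Python sketch of how they might relate. The class names (ContentAttribute, ContentType, AssemblyModel) and the example types are hypothetical illustrations, not taken from Lovinger’s article or from any particular CMS. The only point is the shape: attributes belong to types, and an assembly model places types into the slots of a page.

```python
from dataclasses import dataclass, field


@dataclass
class ContentAttribute:
    """One element of content or metadata within a type, such as a headline."""
    name: str
    kind: str  # hypothetical kind label, e.g. "text" or "image"
    required: bool = True


@dataclass
class ContentType:
    """A collection of attributes distinct enough to be its own type."""
    name: str
    attributes: list[ContentAttribute] = field(default_factory=list)

    def attribute_names(self) -> list[str]:
        return [a.name for a in self.attributes]


@dataclass
class AssemblyModel:
    """How items of various content types are arranged on one kind of page."""
    page: str
    slots: dict[str, str]  # slot name -> content type name

    def undefined_slots(self, types: dict[str, ContentType]) -> list[str]:
        """Return the slots that point at content types the model never defined."""
        return [slot for slot, type_name in self.slots.items() if type_name not in types]


# Hypothetical example types.
article = ContentType("Article", [
    ContentAttribute("headline", "text"),
    ContentAttribute("byline", "text"),
    ContentAttribute("body", "rich_text"),
])
author = ContentType("Author", [
    ContentAttribute("name", "text"),
    ContentAttribute("photo", "image", required=False),
])
types = {t.name: t for t in (article, author)}

story_page = AssemblyModel("story page", {"main": "Article", "sidebar": "Author", "footer": "Promo"})
print(story_page.undefined_slots(types))  # ['footer']
```

Keeping the layers separate means a new page layout needs only a new assembly model; the types and attributes underneath stay put.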

On NPR’s site, for instance, each broadcast story has an inventory of components: long and short headlines and link text, full-text transcripts along with summaries of different lengths, and multimedia elements like audio files, photographs and videos. Different parts of the site use different “recipes” to mix and match these elements when assembling pages, and so do many other organizations via NPR’s robust APIs.
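The “inventory plus recipes” idea can be sketched in a few lines of Python. This is a hypothetical illustration only: the field names and recipe names below are invented for the example and are not NPR’s actual API schema. A story is a set of named components, and each recipe lists the components one view needs, in display order.

```python
# Hypothetical story record; field names are illustrative, not NPR's API schema.
story = {
    "headline_long": "A longer, descriptive headline for the story page",
    "headline_short": "Short headline",
    "link_text": "Listen to the story",
    "transcript": "Full transcript text...",
    "summary_short": "One-line summary.",
    "summary_long": "A paragraph-length summary of the broadcast.",
    "audio": "story.mp3",
    "photo": "lead.jpg",
}

# Hypothetical recipes: each names the components one view needs, in display order.
RECIPES = {
    "homepage_teaser": ["headline_short", "summary_short", "photo"],
    "story_page": ["headline_long", "photo", "audio", "transcript"],
    "mobile_alert": ["link_text"],
    "podcast_feed": ["headline_long", "summary_long", "audio", "video"],
}


def assemble(story: dict[str, str], recipe_name: str) -> dict[str, str]:
    """Pick the recipe's components from the story, in recipe order.

    Components the story lacks (this story has no video) are skipped, so one
    recipe can serve stories with different inventories.
    """
    return {name: story[name] for name in RECIPES[recipe_name] if name in story}


teaser = assemble(story, "homepage_teaser")
print(list(teaser))  # ['headline_short', 'summary_short', 'photo']
```

Because the story itself carries no layout, adding a new outlet, whether an internal page or a third party’s app, means writing a new recipe rather than re-editing stories.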

By focusing first on how the content will come together in a particular scenario, content modeling is more user-centric than traditional information architecture, which tends to work from the elements up. Your model needs to capture the mental model your users rely on to map the subject matter. And remember, your users are both the consumers and the creators of the content.

“How tolerant are your content creators of laborious processes?” Lovinger asks. Try to avoid asking editors to break unstructured content into lots of separate data fields unless doing so serves a functional purpose. The key is metadata: if you will need to reuse an element in a different context, it should be its own field; otherwise, it probably should not.

Lovinger identifies three distinct types of metadata used in content management systems: structural, administrative and descriptive. A content model is primarily concerned with structural metadata, and getting those distinctions right makes the most difference to your users. Administrative details, like the status of a given post, are housekeeping that we expect our CMSs to handle. Descriptive metadata is more the province of the domain model: tags and keywords are tended by the content creators themselves, and that work can often be at least partially automated with machine learning tools that extract semantic information from text.
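The division of labor among the three kinds of metadata can be sketched as a single Python record. Everything here is a hypothetical illustration: the Post fields, the publish function and the stopword list are invented for the example, and suggest_tags is a deliberately naive word-frequency count standing in for the machine learning tools mentioned above, not an implementation of them.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Post:
    # Structural metadata: the fields the content model defines.
    headline: str
    byline: str
    body: str
    # Administrative metadata: housekeeping the CMS maintains.
    status: str = "draft"
    # Descriptive metadata: tags that writers tend, possibly seeded automatically.
    tags: list[str] = field(default_factory=list)


# Hypothetical, minimal stopword list for the naive tagger below.
STOPWORDS = {"the", "and", "that", "with", "this", "from", "have", "they"}


def suggest_tags(text: str, n: int = 3) -> list[str]:
    """Naive stand-in for ML tagging: the n most frequent longer, non-stopword words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def publish(post: Post) -> None:
    """The CMS, not the writer, updates administrative metadata."""
    post.status = "published"


post = Post(
    headline="Why content modeling matters",
    byline="Staff",
    body="Content models shape content. A content model defines content types; models evolve.",
)
post.tags = suggest_tags(post.body, n=2)
publish(post)
print(post.tags, post.status)  # ['content', 'models'] published
```

Only the first group of fields is something the content model has to design; the other two are filled in by the system or by a tagging step layered on top.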

The content model emerges from our engagement with an organization’s subject matter and publishing process. It is a higher-level abstraction that informs the structure of the underlying database: in fact, a drawing, a diagram, a design that supports visual as well as informational decision-making. The C-suite will most likely never ask to see the actual model, but they will miss it if it’s not there.

Top image courtesy of adactio/flickr
