This Data Scientist Spent A Year Deep Inside The New York Times. Here’s What He Discovered.

By Sam Petulla

“If, as I’ve been led to believe, this is a post-pageview world, then we must be living in a zombie apocalypse, as I’m relentlessly haunted by the metric’s lifeless corpse.”

Those are the words of Brian Abelson, a data scientist and the 2013 OpenNews Fellow at The New York Times, who spent the last year at The Times using data and analytics to understand Times content.

In particular, he focused on the pageview’s value to the organization and looked into other metrics it might use. Abelson had access to one of the most coveted datasets in publishing, The New York Times’ web and social traffic, to draw his conclusions.

In November, Abelson blogged about some of his research, setting off reactions from popular media commentators that ranged from thankful to wildly speculative. Sparking the discussion was his use of sabermetrics, a popular sports statistics technique seen in the film Moneyball, to understand how promotion affects Times content performance.

The post also put him on the spot, with reactions coming from all sides of the social web, some good, others a lot less so. The Strategist wanted to talk to Abelson further about his year at The Times, during which he attempted to create a better set of metrics focused on measuring human responses to media, like impact and behavioral change.

Abelson keeps a blog where more of his analyses can be found.


How should traditional journalistic outlets utilize data differently than other publishers, such as digitally-native or brand publishers?

A journalism publisher usually says: We have an idea of what’s responsible journalistically and we want to hold ourselves to that. It doesn’t necessarily open up space for the type of randomized experiments that really make data useful. There’s no randomization of headlines like at Upworthy.

At The Times, there’s no sense that it’s worth their while to use data to make decisions on editorial content. If there were, it would be in the context of putting data in front of the people who can make decisions rather than the data making decisions by itself. There will always be this line between the editor and the information and the data. The traditional editor is central.

But I think the quickest way to learn what works and what doesn’t on a site is to run a well-designed experiment, which comes down to selecting the number you optimize for. Still, I’m not convinced The Times is wrong in thinking that they don’t need to use data the way sites like Upworthy or BuzzFeed do.

Then what are some of the metrics that might be optimized for? What does “engagement” even mean?

If you’re going to create a data-driven business model, you have to be very, very clear about the numbers you’re going to push up. And those have to be very measurable. If you don’t have that, it just adds complication. There’s no way you can apply the same metric to every piece of content, because different people have different things they have to do.

So, you need to pick data for every single task in the newsroom and decide what metric you’re actually trying to maximize. The homepage editor follows one metric, a social media editor follows something else. Then you can make decisions about emotional reactions or creating a conversation.

But, as it is, there’s a lack of connection between what people do and the outcomes from those decisions. If people knew how they could push content up and down along a metric, and use their intuitions, they could try that out. But today, they don’t know the relationship between those efforts and the outcomes.

What is “engagement” to you? Time on page seems like a start.

As Nieman Lab just wrote, serious journalism provokes a higher rate of commentary than cats wearing piñatas, which produce more sharing. Metrics need to be relevant. What about dynamically changing measurements based on the anticipated emotional or psychological responses from the articles?

I think most of the metrics we have are around first-order things, like drawing people in, attention, and eyeballs, and not so much around other responses. Second-order measurements, such as cognitive ability to make decisions or process information differently, are rarely measured. But time has to play a larger role.

The difference is whatever number is in the equation. If you change that number, then your outcome looks completely different. You have to figure out the number or set of numbers you want to push your website to maximize.

But, unfortunately, no one knows what to optimize for everyone. When people think about optimizing with data, they have a specific vision of what that entails: this is BuzzFeed optimization or The Huffington Post or Upworthy.

What needs to improve in the ways we use data and have conversations about using data in newsrooms and publishing?

They should think: Wow, how can we better provide people with content that is important for them to see? They should think about their readers and context, because people go to different websites for different things. It’s not through one portal.

It’s kind of like asking broad questions about how humans consume information, and how they engage with content online. It varies from site to site.

A scientific approach is needed to draw conclusions. In one sense, people are asking petty questions about the relationship between engagement and impact, and between promotion and impact. But they can only answer those questions in their own context, not broadly across the Internet.

Relatedly, if everyone is going to move forward as an industry, past the pageview, the relationship between publishers and advertisers is going to need to change. And it will have to happen across the board, in an organized fashion.

A single publisher today can’t just be like, “We report these numbers because this is a better representation of engagement.” That doesn’t work for advertisers because they want to make a decision by comparing lots of different sites. They need a way of optimizing from their spreadsheets and their ad buys, too.


Image by Army Medicine / Flickr.com