What Publishers Need to Know About Google’s Patented News Article Rankings
Google is the publisher’s angel — and its devil. The search engine claims to connect “1 billion unique users a week to news content,” from 50,000 sources in 30 languages. (By comparison, a large publisher like The New York Times averages about 12 million unique visitors a week.)
On the other hand, it also takes a share of the advertising revenue generated by all of that traffic. But no publisher or content producer can afford to ignore how Google ranks the news.
A recently published patent application by the company sheds light on the factors it uses to decide which news stories to promote in its rankings. Notably, the newly updated metrics explicitly reward larger publishers. It is unclear to what extent they are genuine proxies for editorial quality rather than a concerted effort to appease publishers who have sued Google in the past for profiting from content it does not own or license.
Intentions aside, it is most helpful for content producers to familiarize themselves with the story-level signals that Google monitors and uses to calculate its adjusted news rankings, which are the subject of this patent. For large publishers, the lesson is to try to maintain their scale despite the encroachment of the Internet, and of Google itself.
Frédéric Filloux, general manager for digital operations at Les Echos Groupe, wrote an analysis of these metrics on Monday Note, the blog he writes with former Apple executive Jean-Louis Gassée. Filloux speculates that Google is giving print and broadcast news priority over “pure players, aggregators or digital native organizations” because “legacy media are less prone to tricking the algorithm. For once, a known technological weakness becomes an advantage.”
Behind this cynicism is the very real fact that even in their attenuated state, large publishers do exert more quality control than the stereotypical blogger in their pajamas. And Google is indeed giving them a very clear reason to try to maintain (and even improve) those standards.
The first three metrics refer to “the number of articles produced … during a given time period, an average length of an article [and] the importance of coverage from the news source.” These favor large publishers, but they select for originality and focus as well. Google is actually “counting the number of original sentences” and comparing the output about the given subject to the news source’s competitors.
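The patent application, as described here, does not spell out how original sentences are counted. As a rough illustration only, the Python sketch below counts the sentences in one article that do not appear in competitors' articles on the same subject. The function names, the naive sentence splitter and the exact-match comparison are all assumptions made for this example, not Google's method.

```python
import re

def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def normalize(sentence):
    """Lowercase and drop punctuation so trivial edits do not count as original."""
    return re.sub(r"[^a-z0-9 ]+", "", sentence.lower()).strip()

def count_original_sentences(article, competitor_articles):
    """Count sentences in `article` that appear in none of the competitor texts."""
    seen = {
        normalize(s)
        for other in competitor_articles
        for s in split_sentences(other)
    }
    return sum(1 for s in split_sentences(article) if normalize(s) not in seen)

# Hypothetical example texts
article = (
    "The mayor resigned on Monday. Officials declined to comment. "
    "A source said the budget was the cause."
)
competitors = ["The mayor resigned on Monday. Officials declined to comment."]
print(count_original_sentences(article, competitors))
```

A real system would presumably use fuzzier matching, since a lightly reworded sentence would pass this exact-match test as original.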
The “breaking news score” is a metric of how quickly a publisher jumps on a story. A high score here, however, can be undermined if it is not subsequently supported by the in-depth coverage rewarded above.
The next three factors (usage patterns, human opinion and circulation statistics) are different ways of assessing popularity. These include Google’s own PageRank algorithms as well as unspecified surveys of reader preference and stats like Nielsen NetRatings. Needless to say, there is a large fudge factor in this area.
In a further nod to “legacy media,” metrics eight and nine measure the number of individual bylines associated with a news source as well as the number of news bureaus. It is unclear how much this will affect head counts at large publishers.
One of the most interesting metrics is the “number of original named entities the source news produces within a cluster of articles.” This is a way of rewarding reporting that goes beyond the herd and actually breaks news or provides additional salient details. It also reinforces the overarching lesson of search engine optimization that it is always beneficial to name specific search terms.
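To make the idea concrete, here is a minimal Python sketch that assumes the named entities have already been extracted for each source in a cluster of articles about the same story. It treats an entity as “original” when no other source in the cluster mentions it. That definition, the function name and the sample data are assumptions for illustration; the article does not say how Google defines or detects original entities.

```python
def original_entities(cluster):
    """Map each source to the entities that no other source in the cluster mentions.

    `cluster` is a dict of source name -> set of entity strings (hypothetical input).
    """
    result = {}
    for source, entities in cluster.items():
        # Collect every entity mentioned by the other sources
        others = set()
        for other_source, other_entities in cluster.items():
            if other_source != source:
                others |= other_entities
        result[source] = entities - others
    return result

# Hypothetical cluster of three sources covering the same story
cluster = {
    "Source A": {"Jane Doe", "City Hall"},
    "Source B": {"City Hall"},
    "Source C": {"City Hall", "John Roe"},
}
scores = {source: len(found) for source, found in original_entities(cluster).items()}
print(scores)
```

Under this assumption, a source that only repeats names everyone else already has scores zero, which matches the article's point about reporting that goes beyond the herd.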
Publishers are also promoted for their “breadth of coverage,” as well as the “international diversity” of their audience. Again, these metrics favor large publishers.
The biggest surprise, perhaps, and the metric that is most actionable by every content producer irrespective of scale, is what Google characterizes as “writing style.” Simply put, this measures consistency of grammar, accuracy of spelling and other indicators of editorial craft.
Filloux writes that, “In the Google world, this means statistical analysis of contents against a huge language model.” In the world of writers and editors it means slowing down so that too many typos do not compromise the first-mover advantage of breaking the news.