
Inaccurate News Analytics: When Robots Get Things Wrong

A new study prepared for the Federal Reserve Board looks at the use of algorithms to read and interpret financial news. Many studies have examined this topic, but one unique feature of this new paper, “First to ‘Read’ the News,” is that it discusses what happens when the algorithms get something wrong.

The authors look specifically at RavenPack (RP), the leading Big Data analytics provider. Among many other tasks, for every article in the Dow Jones Newswire, RP’s algorithms determine how relevant the story is to each company mentioned in it, and whether it is favorable or unfavorable to that company.

This scoring is very predictive. If RP rates an article as highly relevant and positive for a certain publicly traded company, the price of that company’s stock is very likely to rise, and vice versa when the article is highly relevant and negative. Further, the market displays “almost no reaction to articles with a low relevance score.”
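To make that pattern concrete, here is a minimal Python sketch of the relationship the study documents. The field names, score scales, and relevance cutoff below are illustrative assumptions, not RavenPack’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class ArticleScore:
    """One (article, company) score pair, loosely modeled on the
    RavenPack fields described above; names and scales are illustrative."""
    company: str
    relevance: float   # assumed 0-100: how central the company is to the story
    sentiment: float   # assumed -1.0 (very negative) to +1.0 (very positive)

def expected_reaction(score: ArticleScore,
                      relevance_cutoff: float = 75.0) -> str:
    """Mirror the pattern the study reports: prices move with sentiment
    only when relevance is high, and barely react when it is low."""
    if score.relevance < relevance_cutoff:
        return "little or no price reaction"
    return "price likely rises" if score.sentiment > 0 else "price likely falls"

print(expected_reaction(ArticleScore("XYZ", relevance=90.0, sentiment=0.8)))
# -> price likely rises
print(expected_reaction(ArticleScore("XYZ", relevance=20.0, sentiment=0.8)))
# -> little or no price reaction
```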

This raises the question: is this a causal link or merely a predictive one? Are market buyers responding to the underlying good news, or to RP’s ranking of that news as relevant?

Imperfect Algos

Algorithms aren’t perfect, and RavenPack has improved its assessment of relevance over time. It has also done the back-filling: the world now knows how the new and improved algos would have rated the relevance of stories that broke while the older algos were in use. For simplicity’s sake, let us say that we know when the old algos were “wrong,” either overstating or understating relevance.

This gives the authors of the FRB study their chance to examine when markets are reacting not to the actual news, but to the algorithmic mis-estimation of it. Suppose the old algo said that a certain datum about XYZ was highly relevant and negative, but the new algo, in retrospect, says it should have been rated as of low relevance (even if still negative). Did the price of XYZ head downward more than it “should” have?

The authors of the study are Bastian von Beschwitz, of the FRB’s International Finance Division; Donald B. Keim, of the Wharton School, University of Pennsylvania; and Massimo Massa, of INSEAD, France. They describe the sort of algo mistake illustrated in the paragraph above as an LRH: one in which an actually Low-relevance article was Released as of High relevance. They find that LRH mistakes do push prices above or below what the actual news merits, in the direction of the article’s sentiment, but that the impact is fleeting.
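In code, the comparison that the back-filled scores make possible might look like the following sketch. The numeric threshold, the function itself, and the converse category (call it HRL) are assumptions of this illustration, not labels taken from the paper.

```python
def classify_mistake(released_relevance: float,
                     backfilled_relevance: float,
                     cutoff: float = 75.0) -> str:
    """Compare the relevance score published in real time with the score
    the improved algorithm assigns in hindsight. The cutoff is an assumed
    threshold separating 'high' from 'low' relevance."""
    released_high = released_relevance >= cutoff
    actually_high = backfilled_relevance >= cutoff
    if released_high and not actually_high:
        return "LRH: low-relevance article released as high relevance"
    if not released_high and actually_high:
        return "HRL: high-relevance article released as low relevance"
    return "no classification error"

print(classify_mistake(released_relevance=85.0, backfilled_relevance=30.0))
# -> LRH: low-relevance article released as high relevance
```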

The market, these authors say, “[I]nitially over-reacts to the incorrect information, realizes the inaccuracy, and quickly corrects after 30 seconds.” Those 30 seconds are plenty of time for a “smarter” HFT algo, one that estimated relevance better than RP did in the first place, to take advantage of the mistake and make a lot of money for its owners. Still, as humans measure time, this 30-second finding is an impressive rate of self-correction for markets, and more of a vindication than not for underlying efficient capital market theories. Or, as von Beschwitz et al. put it, their data “suggests that the market is quite resilient against disturbances from inaccurate news analytics.”
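A stylized event-window calculation shows what this over-reaction-and-correction pattern looks like in returns. The price path below is invented for illustration; only the 30-second horizon comes from the study.

```python
import numpy as np

def cumulative_return(prices: np.ndarray, t0: int, horizon: int) -> float:
    """Simple event-window return: price drift from the article's arrival
    (index t0, one observation per second) to t0 + horizon seconds."""
    return prices[t0 + horizon] / prices[t0] - 1.0

# Stylized LRH example: a negative article wrongly scored as highly
# relevant knocks the price down, then the move reverses within ~30s.
prices = np.concatenate([
    np.full(10, 100.0),           # quiet before the article
    np.linspace(100.0, 99.5, 5),  # over-reaction in the first seconds
    np.linspace(99.5, 99.98, 26), # correction over the next ~25 seconds
])
t0 = 9  # the article arrives here
print(f"5-second return:  {cumulative_return(prices, t0, 5):+.3%}")
print(f"30-second return: {cumulative_return(prices, t0, 30):+.3%}")
# -> roughly -0.5% at 5 seconds, nearly back to zero by 30 seconds
```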

Further Questions

Having established that news analytics, right or wrong, have an impact on the stock market, these authors also ask which article types, fed through such analytics, have the largest impact. On the one hand, they look at press releases issued directly by the company and carried verbatim on the Dow Jones wire. On the other, they look at the work product of Dow Jones’ own reporters. Their conclusion: “The effect of RavenPack is mainly concentrated in press releases.” That is, traders and their algos care more about what RP has to say about press releases than about how it scores reporter-written articles.

Another observation, intriguing and perhaps helpful for those who might want to try to create alpha by front-running RP and its kin: algorithmic traders dynamically re-assess the significance of RP scores over time.

The “causal effect of RavenPack on the 5-second return is stronger if RavenPack has been more informative in the past 6 months [suggesting] that algorithmic traders learn dynamically about the precision of RavenPack, and that they rely more heavily on RavenPack’s sentiment scores if these scores have been more informative in the past. Such learning could be programmed into their algorithms (machine learning) or can come from [humans who are] manually updating their algorithms over time.”
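In regression form, the learning test the authors describe amounts to an interaction term between RP’s sentiment score and a measure of its recent informativeness. The sketch below runs that regression on simulated data; every variable name, magnitude, and the data itself are invented for illustration.

```python
import numpy as np

# Hypothetical per-article data: ret5 is the 5-second return, sentiment is
# RP's score, and informativeness proxies how predictive RP's scores were
# for the same stock over the prior 6 months. All names are assumptions.
rng = np.random.default_rng(0)
n = 1_000
sentiment = rng.normal(size=n)
informativeness = rng.uniform(size=n)
ret5 = (0.001 * sentiment
        + 0.004 * sentiment * informativeness
        + rng.normal(scale=0.01, size=n))

# Regress the 5-second return on sentiment and its interaction with past
# informativeness; a positive interaction coefficient is the "dynamic
# learning" pattern the quote describes.
X = np.column_stack([np.ones(n), sentiment, sentiment * informativeness])
beta, *_ = np.linalg.lstsq(X, ret5, rcond=None)
print(f"sentiment:                     {beta[1]:+.4f}")
print(f"sentiment x informativeness:   {beta[2]:+.4f}")
```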