Yes, it sounds a bit like the phrase “jumbo shrimp,” a contradiction in terms, but the “long short-term memory neural network” is an important recent advance in artificial intelligence research.

The term refers to a neural network built from memory cells with attached “forget gates,” originally devised to model human short-term memory. But the forget gate can be set to retain information for any length of time, so researchers can make this “short-term memory” last as long as they want it to, hence the oxymoronic name. The idea was first proposed in 1997 in an article by Sepp Hochreiter and Jürgen Schmidhuber.
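To make the mechanics concrete, here is a minimal NumPy sketch of a single LSTM time step (the function name, weight layout, and dimensions are assumptions for illustration, not anyone’s production code). The forget gate emits a value between 0 and 1 for each memory cell; values near 1 carry the old cell state forward almost untouched, which is what lets the “short-term” memory persist for as long as the network finds useful.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One illustrative LSTM step.

    W (4H x D), U (4H x H) and b (4H,) stack the parameters for the
    forget (f), input (i), output (o) gates and the candidate update (g).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # stacked pre-activations
    f = sigmoid(z[0:H])                    # forget gate: how much old memory to keep
    i = sigmoid(z[H:2*H])                  # input gate: how much new information to write
    o = sigmoid(z[2*H:3*H])                # output gate: how much of the cell to expose
    g = np.tanh(z[3*H:4*H])                # candidate cell contents
    c_t = f * c_prev + i * g               # the forget gate scales the carried-over memory
    h_t = o * np.tanh(c_t)                 # hidden state passed to the next time step
    return h_t, c_t
```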

The Vanishing Gradient

Networks built from LSTM units have certain advantages over the “vanilla” version of recurrent neural networks (RNNs). In particular, they can overcome what is known as the “vanishing gradient” problem. Vanilla RNNs were built to learn from new data, but to give that data a weight that depended on how much related data the network already possessed; in other words, to model the human process of inductive learning. Once a network has absorbed a good deal of data, however, the marginal significance of each new datum can become negligible, “vanishing.” Such machines could thereby become incapable of learning further at all. Networks had to be allowed to forget in order to preserve the salience of new data, and that is the problem the newer LSTM models resolve.
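A toy NumPy experiment may help show why the gradient “vanishes” in a plain recurrent network (the hidden size, weight scale, and sequence length below are arbitrary choices made for illustration). Backpropagating through a tanh recurrence multiplies the gradient by one Jacobian per time step, and because each factor tends to shrink it, the product collapses toward zero over long sequences:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 50                                                   # hidden size (illustrative)
W_hh = rng.normal(scale=0.3 / np.sqrt(H), size=(H, H))   # small recurrent weights

grad = np.eye(H)                  # gradient w.r.t. the most recent hidden state
for t in range(1, 101):           # walk the gradient back through 100 time steps
    h = np.tanh(rng.normal(size=H))       # stand-in hidden activation at this step
    J = W_hh.T * (1.0 - h ** 2)           # Jacobian of one tanh recurrence step
    grad = grad @ J                       # chain rule: multiply the Jacobians together
    if t % 25 == 0:
        print(f"after {t:3d} steps, gradient norm: {np.linalg.norm(grad):.2e}")
```

An LSTM’s cell state, by contrast, is carried forward additively and gated by the forget gate, which is what keeps the gradient from collapsing in this way.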

Four researchers at UC Berkeley’s Haas School of Business, advised by Ronald Kahn, the global head of scientific active equity research at BlackRock, have found that an LSTM neural network is well suited to predicting one-year earnings per share and other “fundamental valuation factors that tend to drive long-term stock results.”

The researchers were: Tao Tong, Manas Shah, Manoj Cherukumalli, and Yasmine Moulehiwy, all students of financial engineering at Haas.

They chose LSTM networks for the job because LSTM is “a flexible universal function approximator suited for time-series forecasting,” because it is free of the above-described vanishing gradient problem, and because it is well-suited to discovering long-range characteristics of a data set.
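As a rough picture of what a time-series forecaster of this kind looks like in code (a generic Keras sketch, not the Berkeley team’s actual model; the layer size, window length, and synthetic data are assumptions), an LSTM layer reads a window of past observations and a dense layer emits the next value:

```python
import numpy as np
import tensorflow as tf

LOOKBACK, N_FEATURES = 20, 30     # assumed window length and feature count

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(LOOKBACK, N_FEATURES)),  # summarizes the window
    tf.keras.layers.Dense(1),                                      # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")

# Toy data: each sample is a sliding window of features with a single target value.
X = np.random.rand(256, LOOKBACK, N_FEATURES).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```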

A Fundamentalist Thesis

The Berkeley scholars were inspired by the recent work of Zachary Lipton and John Alberg on “improving factor-based quantitative investing by forecasting company fundamentals.” Lipton and Alberg looked particularly at earnings before interest and taxes, EBIT over enterprise value, and book value, all over a forward period of 12 months.
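For readers who have not met these factors before, the arithmetic is simple; the sketch below uses made-up figures purely to show how EBIT over enterprise value is computed.

```python
# Illustrative figures, in millions; none of these numbers come from the studies.
ebit = 250.0            # earnings before interest and taxes
market_cap = 3_000.0
total_debt = 800.0
cash = 300.0

enterprise_value = market_cap + total_debt - cash    # debt added in, cash netted out
ebit_over_ev = ebit / enterprise_value                # the valuation factor being forecast
print(f"EBIT/EV = {ebit_over_ev:.2%}")
```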

Tong et al. agree with the fundamental thesis (one might also say the “fundamentalist” thesis) of Lipton and Alberg.

For their own study, though, Tong et al. used a more focused data set than did Lipton and Alberg: a selection of stocks from the S&P 500, representing all the main sectors, and their performance over the period 1961-2017.

Their network used 30 items of publicly available data from each quarter (including, for example, net sales, cost of goods sold, balance-sheet receivables and inventories, and depreciation and amortization) to predict both one-quarter-ahead and one-year-ahead company earnings.
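One way to picture that setup (a pandas sketch with invented numbers and only three of the 30 features; the paper’s exact preprocessing is not reproduced here) is as a table in which each quarterly row carries two targets, earnings one quarter ahead and four quarters ahead:

```python
import pandas as pd

# Toy quarterly fundamentals for a single company; the values are illustrative.
df = pd.DataFrame({
    "net_sales": [100, 104, 110, 115, 121, 128, 133, 140],
    "cogs":      [60, 62, 66, 68, 72, 75, 78, 82],
    "eps":       [1.0, 1.1, 1.2, 1.1, 1.3, 1.4, 1.5, 1.6],
})

df["eps_next_q"] = df["eps"].shift(-1)   # one-quarter-ahead target
df["eps_next_y"] = df["eps"].shift(-4)   # one-year-ahead target (four quarters)

supervised = df.dropna()                 # drop rows whose targets fall outside the sample
print(supervised)
```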

Picking Your Battles

The Berkeley scholars found that their model performed much better at the one-year horizon than did the other predictive models to which they compared it: support vector regression, linear regression, and deep neural networks. Intriguingly, it did not outperform at the one-quarter horizon. In that nearer-term test, their network “failed to add any additivity when compared to the other machine learning models.” Like the models it was compared against, it performed poorly just one quarter out. They attribute this to the efficiency of the markets: the likely earnings one quarter down the line are already priced into the stocks.
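A hedged sketch of how such a horse race can be scored (scikit-learn baselines on synthetic data; the models, sizes, and data here are placeholders, and the LSTM itself is omitted for brevity): fit each candidate on the same training window and compare out-of-sample error, once per horizon.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X_train, X_test = rng.normal(size=(400, 30)), rng.normal(size=(100, 30))
y_train, y_test = rng.normal(size=400), rng.normal(size=100)

baselines = {
    "linear regression":         LinearRegression(),
    "support vector regression": SVR(),
    "deep neural network":       MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500),
}
for name, model in baselines.items():
    model.fit(X_train, y_train)               # same training window for every candidate
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:28s} test MSE: {mse:.3f}")  # repeat once per forecast horizon
```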

The lesson the scholars draw from this is that neural networks, like humans, have to choose their fights carefully. The network they have created does some things (some of them quite important to a portfolio manager) quite well, and they are confident they have laid “a solid technical foundation for further expansion” of work on LSTM networks and the markets.