Harvesting Predictability

Predicting things is, like many human endeavors, intuitively judged on a competitive model. We only really give people credit for forecasting when their abilities exceed our own, and at higher levels when they exceed the best that humanity can muster. Few are impressed by 100% accurate predictions of eclipses now that orbital mechanics is well understood, but this was once thought to be wisdom bordering on magic. The edge of predictability, which is within reach of some people but not others, is where it can be harvested for value. This often takes the form of investing or of gambling.

The Efficient Market Hypothesis

The E.M.H. states that asset prices fully reflect all available information. A direct implication is that it is impossible to “beat the market” consistently on a risk-adjusted basis since market prices should only react to new information or changes in discount rates. --Wikipedia

The EMH comes in a few forms, the "weak" form being the most reasonable, largely because it suggests that the EMH is only true "in the long run". I propose an even weaker form of the EMH, which states:

prices fully reflect all available information and predictive techniques. A direct implication is that it is impossible to “beat the market” consistently on a risk-adjusted basis since market prices should only react to new information, new techniques or changes in discount rates.

It doesn't seem like a big change, but it is. It suggests that if you have developed a new, cutting-edge, better-than-everybody-else-playing-the-market technique, it is absolutely possible to beat the market. Furthermore, you can keep beating the market until everyone else catches on to what you're doing and starts doing it themselves. The gotcha is that you need to be better than a bunch of very smart, very motivated, very well-funded people. Beating the market is a little like having Olympic potential in track and field. It's possible, but if you think you're going to do it easily, you're probably deluding yourself.

A Simple Stock Market Game

Imagine a game where a single six-sided die is rolled. The player who rolls the die gets $1,000 for every pip that shows on the die. Each round, players bid for the right to roll the die.

The players are going to consistently bid the price up to about $3,500 per roll. Any players who deviate from that strategy are either going to go broke (and hence be eliminated from the game) or never end up with a winning bid that allows them to roll the die (which is a lot like not playing at all). Those who are paying ~$3,500 a roll are about breaking even.
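The break-even price is just the expected payout of one roll. Here is a minimal sketch of that arithmetic; the function names are my own, chosen for illustration.

```python
from fractions import Fraction

PAYOUT_PER_PIP = 1_000  # dollars per pip, as in the game described above
FACES = range(1, 7)     # a fair six-sided die


def fair_price(payout_per_pip=PAYOUT_PER_PIP):
    """Expected payout of a single roll of a fair die."""
    expected_pips = Fraction(sum(FACES), len(FACES))  # 21/6 = 7/2
    return expected_pips * payout_per_pip


def expected_profit(bid):
    """Average profit per roll for a player who wins the roll at this bid."""
    return fair_price() - bid


print(fair_price())           # 3500
print(expected_profit(4000))  # -500: overbidders bleed money on average
print(expected_profit(3000))  # 500: but a bid this low rarely wins the auction
```

Anyone bidding above $3,500 loses money on average, and anyone bidding below it gets outbid, which is why the price settles there.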

Now imagine that Nostradamus shows up at the table… he's not really playing, but just occasionally murmuring predictions that are pretty darned good. Imagine that he is right about 50% of the time.

The folks standing next to him can hear him, and notice how good he is. If he murmurs “six”, they can happily bid up to $4,500. If he murmurs “one”, they will skip that round. Those players quickly start making a tremendous amount of money. Other players are bound to notice pretty fast that they are getting fleeced, and try to figure out what the winners are doing. Soon everyone is trying to stand next to Nostradamus and hear his predictions. Those who can hear his predictions follow the new Nostradamus-aware strategy. They bid up to about $4,500 when he says “six”, and down to around $2,500 when he says “one” (assuming his misses are spread evenly over the other five faces). Even the people on the other end of the table who can't hear him know that something is up, and when the other side of the table starts bidding up past $3,500 they try to cluster in with the “smart money”. Those who repeatedly play against the “smart money” either end up making themselves too poor to play… or are too timid to make an appreciable number of bets.
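The Nostradamus-aware bids can be checked the same way. This sketch rests on an assumption the story doesn't spell out: when Nostradamus is wrong, the die is equally likely to land on any of the other five faces. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def value_given_call(called_face, accuracy=Fraction(1, 2), payout_per_pip=1_000):
    """Expected payout of a roll, given the face Nostradamus called.

    Assumption (mine, not stated in the story): when he misses, the other
    five faces are equally likely.
    """
    others = [face for face in range(1, 7) if face != called_face]
    miss_each = (1 - accuracy) / len(others)  # probability of each other face
    expected_pips = accuracy * called_face + sum(miss_each * f for f in others)
    return expected_pips * payout_per_pip


for face in range(1, 7):
    print(face, value_given_call(face))

# With no real skill (right 1 time in 6) every call is worth the old $3,500.
print(value_given_call(6, accuracy=Fraction(1, 6)))  # 3500
```

Under that assumption a call of “six” is worth $4,500 and a call of “one” is worth $2,500, so an informed player bids up toward the first and refuses to pay the old $3,500 on the second.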

Competitive Prediction

One instructive trait of this game is that Nostradamus' predictions are neither self-fulfilling nor self-defeating, but the bidding process is competitive, and so they quickly lose their edge. For a player to do noticeably better, they need to have access to Nostradamus before everyone else does. Nostradamus generates predictability, and those who use it on the market are harvesting it.

Once the predictability has been completely assimilated, it stops being valuable because it is no longer a competitive advantage. It is “priced in” to the bid for the die roll. A lot of people think of the stock market as being structured to create “self-defeating prophecies”, but being "self-defeating" [1] isn't what makes it hard to beat the market. Rather, it is hard to beat because it is competitive: there is a mad rush to harvest any predictability that can be found. Both of these patterns feel frustratingly unbeatable, but they are actually quite different.

Your Own, Personal, Nostradamus

As part of a Machine Learning Nanodegree I recently implemented my own personal Nostradamus in the form of a support vector regressor. An SVR is a pretty powerful machine learning tool, the absolute bleeding edge of the field in about 1992. [2] SVRs are so powerful that I suspected one would be able to predict stock prices based on previous stock prices (using a variety of stocks in the market) in violation of the Efficient Market Hypothesis. That is… I thought it would be able to do this *until* enough people on Wall Street were able to hear its “murmured” predictions. Once the investors caught on, just like with Nostradamus in the stock market dice game, it would be "priced in".

I ran my SVR against historical stock market data. Starting in 1970, for every day of every year I used the “current” prices of a few stocks to try to predict the next day's price of IBM. As I had suspected, until the late 90s it worked really, really well. I had essentially simulated traveling back in time with an SVR to see how well it would do. When only I had access to "Nostradamus", I was able to generate market-dominating predictions. The SVR stops dominating the market almost exactly when Wall Street investors began to master the technique, and this seems like no coincidence. I'm sure there are a number of firms that made a fortune using SVRs to harvest predictability when SVRs were still cutting edge.

This diagram reflects the findings from my SVR. For each year I trained it on the previous year's stock prices, and then called on it using "current" data to make a prediction of the stock price of IBM for the "next day". The blue dots are the mean squared error of the SVR's predicted stock price. If you look closely you can see them rise dramatically in 1987, as a result of Black Monday, and then again in the late 90s. The best-fit line for these dots is shown in blue.
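For readers who want to see the shape of that train-on-last-year, predict-tomorrow loop, here is a rough sketch using scikit-learn. It is not the code from my notebook: the function name, the scaling step, the kernel and the hyperparameters (`C`, `epsilon`) are all assumptions made for illustration, and the prices are a synthetic random walk rather than real market data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def walk_forward_mse(prices, target_col, year_of_day):
    """For each year, fit on the previous year and score next-day predictions.

    prices:      array (n_days, n_stocks) of daily prices
    target_col:  column of the stock being predicted (IBM in the post)
    year_of_day: array (n_days,) with the calendar year of each row
    """
    X = prices[:-1]             # "current" prices of every stock
    y = prices[1:, target_col]  # the target stock's price on the next day
    years = year_of_day[:-1]
    unique_years = np.unique(years)
    mse_by_year = {}
    for prev, cur in zip(unique_years[:-1], unique_years[1:]):
        train, test = years == prev, years == cur
        # Hyperparameters are illustrative guesses, not tuned values.
        model = make_pipeline(StandardScaler(),
                              SVR(kernel="rbf", C=100.0, epsilon=0.1))
        model.fit(X[train], y[train])
        predicted = model.predict(X[test])
        mse_by_year[int(cur)] = float(np.mean((predicted - y[test]) ** 2))
    return mse_by_year


# Synthetic stand-in data: three "stocks" following random walks for 3 years.
rng = np.random.default_rng(0)
n_years, days_per_year = 3, 250
steps = rng.normal(0.0, 1.0, size=(n_years * days_per_year, 3))
prices = 100.0 + np.cumsum(steps, axis=0)
year_of_day = np.repeat(np.arange(1970, 1970 + n_years), days_per_year)

mse_by_year = walk_forward_mse(prices, target_col=0, year_of_day=year_of_day)
print(mse_by_year)
```

On a pure random walk there is nothing to harvest, so the errors here only show that the loop runs. The interesting part is what happens when you feed it real prices.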

The red dots represent the multiplier on their investment that someone would get if they simply bought IBM at the beginning of the year and sold it at the end. This can be seen as a baseline to beat. Its best-fit line is shown in red, and is fractionally higher than 1, about 1.07 (as we would expect). The green dots represent the multiplier on investment that one would get if they bought on any day the SVR said that the stock would go up, and shorted the stock on any day the SVR said it would go down. We can see some absurdly successful years for this strategy, including one year where it would have produced returns of 15x on any money invested. The green best-fit line clearly demonstrates how this predictability was harvested out of the market in the 90s and was entirely gone by the mid 2000s.
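The two multipliers can be sketched in a few lines. This is a simplified illustration, not the notebook's code: the function names are hypothetical, and I am assuming the whole stake is reinvested every day, that a short position earns exactly the negative of the day's return, and that there are no trading costs.

```python
def buy_and_hold_multiplier(prices):
    """Multiplier from buying on the first day and selling on the last."""
    return prices[-1] / prices[0]


def long_short_multiplier(prices, predicted_next):
    """Multiplier from going long on predicted-up days, short on predicted-down.

    predicted_next[i] is the prediction, made on day i, of prices[i + 1].
    Assumptions: full stake reinvested daily, no costs, and a short position
    earns the negative of the day's return.
    """
    wealth = 1.0
    for today, tomorrow, predicted in zip(prices, prices[1:], predicted_next):
        daily_return = (tomorrow - today) / today
        if predicted > today:
            wealth *= 1 + daily_return  # long: gains when the price rises
        elif predicted < today:
            wealth *= 1 - daily_return  # short: gains when the price falls
        # predicted == today: stay out of the market for the day
    return wealth


prices = [100.0, 110.0, 99.0, 108.9]  # toy data: up 10%, down 10%, up 10%
perfect = prices[1:]                  # a flawless Nostradamus

print(buy_and_hold_multiplier(prices))         # about 1.089
print(long_short_multiplier(prices, perfect))  # about 1.331
```

Even in this tiny example the predictor turns a losing day into a winning one, which is how a good enough model can compound into the absurd multipliers seen in the best years.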

For those interested in reading more about this particular experiment, you can read my Machine Learning Nanodegree Capstone Paper. It's about 14 pages long, with lots and lots of pretty graphs and sort of dry and awkward writing. [3]

If you are interested in playing with the actual code, you can download it. It's fairly easy to understand and takes the form of a Python-based Jupyter notebook. It runs quite quickly on a modern laptop.

  • 1. A market is fascinatingly complicated, and it definitely creates "self-defeating prophecies" in some cases, as well as "self-fulfilling prophecies" in others.
  • 2. SVMs close to their current form were first introduced with a paper at the COLT 1992 conference (Boser, Guyon and Vapnik 1992), http://www.svms.org/history.html
  • 3. It's dry and awkward because it was written to a very specific format that didn't exactly fit this research. Such is life.