Using Artificial Intelligence in portfolio allocation

We use Reinforcement Learning (Q-learning is one of its best-known techniques) to train our agent, Meeta, to make decisions about portfolio allocation between equities and fixed income. The results are surprising!

The term Artificial Intelligence is often misused and abused. What exactly artificial intelligence is could fill an entire PhD thesis, and you would still not reach a conclusion. For our purposes, however, let us define Artificial Intelligence as a machine-driven decision-making system. An important distinction must be made at this point between making decisions and making forecasts.

“It might rain tomorrow” is a forecast.

“I will take an umbrella to work tomorrow” is a decision.

In finance, time-series-based forecasting models are used extensively to predict stock returns and option values, amongst other things. These models are extremely difficult to develop and train because of a major statistical problem called non-stationarity. A data series such as the returns of a stock index is often non-stationary because the mean (return) and standard deviation (of the return) are not constant over time. This is a rather frustrating problem for model developers because every model successfully developed and launched comes with an unknown expiry date.

Decision-making models, however, can adapt to the problem of non-stationarity. One such approach is reinforcement learning, of which Q-learning is a popular technique. Reinforcement learning is essentially training an agent to make decisions in different ‘states’ based on the available action-set. The agent is given a reward for making a good decision or a penalty for making a bad decision. As the agent is trained over and over on all the possible states, confronted with all the available actions, its decision-making ability improves with the acquired memory of the rewards. This is much the same way as we humans learn from our mistakes and become wiser as we get older (at least that’s what I am told :)).
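
To make the "memory of the rewards" idea concrete, here is a minimal sketch of a tabular Q-learning update. It is the generic textbook form, not necessarily the code behind Meeta. The names `q_update`, `alpha` (learning rate) and `gamma` (discount factor), and the three-value action set, are assumptions made for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=1.0):
    """Nudge the remembered value of (state, action) towards
    the reward just received plus the best value seen from next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical setup: a state is (wealth, years left), an action is the fraction in stocks.
q = defaultdict(float)            # every unseen (state, action) starts at 0 points
actions = [0.0, 0.5, 1.0]

# One remembered experience: from Rs.55 with 5 years left, a 50% stock allocation
# eventually reached the goal and earned +50 points.
q_update(q, state=(55, 5), action=0.5, reward=50.0, next_state=(100, 0), actions=actions)
print(q[((55, 5), 0.5)])          # 5.0, one tenth of the way from 0 towards 50
```

Repeating such updates over many games is what lets the stored values settle on which action tends to pay off in each state.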

We developed a learning algorithm to train our agent, Meeta, and to get her to answer the popular question of how much to invest in stocks. We introduced Meeta in one of our earlier posts. If you haven’t read that, check it out here. Here is what we told her.

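  • Equity market returns follow the Cauchy distribution, centred around 12% (which we loosely call the mean) with a spread of 20% (which we loosely call the standard deviation). Note: Don’t worry if you don’t know the properties of the Cauchy distribution. Strictly speaking, it has no finite mean or standard deviation, so the 12% and 20% are really its location and scale parameters. The important thing to remember is that we are telling Meeta that stock returns are typically around 12% per annum AND that about 80% of the time, the returns are between -48% and +72% per annum.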

  • Fixed deposit returns are 5% per annum.

  • Our goal is to have Rs.100 in X number of years. We will vary the value of X between 1 year and 15 years. Our baseline, the minimum must-have in our account at the end of this period, is Rs.80.

  • Our starting capital is between Rs.1 and Rs.99. We will let this vary as well.

  • Meeta’s job now is to tell us, in each of these ‘states’ denoted by (Amount of capital already available, Number of years available), what % of our wealth should go into stocks.

  • An example question is – Meeta, I have Rs.55 in wealth now. I would like to have Rs.100 in 5 years. If fixed deposits return 5% per annum and if stock returns have a mean of 12% per annum and a standard deviation of 20%, what % of my wealth should I invest in stocks? And what % in fixed deposits?

  • Meeta gives us an answer. We follow that investment strategy through and check how much wealth we have at the end of X years (or 5 years in the above example).

    • If at the end of it, we have Rs.100 or more, we give Meeta +50 points. Good job!

    • If we have between Rs.80 (our baseline) and Rs.100, we give Meeta 0 points.

    • If we have anything less than Rs.80, we penalise her. The penalty starts at -0.125 points for an ending wealth of Rs.79 and goes linearly down to -50 points for a final wealth of Rs.0.
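
The scoring rules above can be written as one small function. This is a sketch under stated assumptions: the name `reward` is made up for illustration, the penalty is taken as the straight line through the two points given in the text (-0.125 at Rs.79 and -50 at Rs.0), and wealth between Rs.79 and Rs.80 or below Rs.0 is clamped, since the text does not say how those cases are scored.

```python
def reward(final_wealth, goal=100.0, baseline=80.0):
    """Points Meeta earns for a given ending wealth (in Rs.)."""
    if final_wealth >= goal:
        return 50.0                      # goal reached
    if final_wealth >= baseline:
        return 0.0                       # baseline kept, goal missed
    # Assumption: clamp to [0, 79] so the line is only used where the text defines it.
    w = min(max(final_wealth, 0.0), 79.0)
    # Straight line from -50 points at Rs.0 up to -0.125 points at Rs.79.
    return -50.0 + (50.0 - 0.125) * w / 79.0

for wealth in (0, 40, 79, 80, 100):
    print(wealth, round(reward(wealth), 3))
```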

We play this game over and over with her to let her figure out what the best strategy is. And the results are super interesting.
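
To give a feel for what "playing the game over and over" means, here is a simplified sketch for the case where only 1 year is left. It is not the learning algorithm used for Meeta: it simply replays one year many times for each candidate allocation and keeps the allocation with the best average score. The function names, the 10% allocation grid, the number of games and the clamping inside `reward` are all assumptions, and the output will not necessarily reproduce the charts.

```python
import math
import random

def cauchy_return(rng, loc=0.12, scale=0.20):
    """One annual stock return drawn from a Cauchy(loc, scale) via its inverse CDF."""
    return loc + scale * math.tan(math.pi * (rng.random() - 0.5))

def reward(final_wealth):
    """Scoring rule from the list above (clamping to [0, 79] is an assumption)."""
    if final_wealth >= 100.0:
        return 50.0
    if final_wealth >= 80.0:
        return 0.0
    w = min(max(final_wealth, 0.0), 79.0)
    return -50.0 + (50.0 - 0.125) * w / 79.0

def average_reward(wealth, stock_fraction, rng, n_games=2000, fd_rate=0.05):
    """Average points over many simulated one-year games."""
    total = 0.0
    for _ in range(n_games):
        growth = stock_fraction * cauchy_return(rng) + (1.0 - stock_fraction) * fd_rate
        total += reward(wealth * (1.0 + growth))
    return total / n_games

def best_allocation(wealth, seed=0):
    """Stock fraction (on a 10% grid) with the highest average score."""
    rng = random.Random(seed)
    fractions = [i / 10 for i in range(11)]
    return max(fractions, key=lambda f: average_reward(wealth, f, rng))

print(best_allocation(96))   # 0.0: fixed deposits alone already reach Rs.100
```

For horizons longer than 1 year, the decision in each year depends on the decisions that follow it, which is where the full learning approach comes in.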

Meeta, I have 1 year left - Tell me what to do?

Say we tell Meeta that we only have 1 year left to achieve our goal.

Observe the chart on the left side first – That chart shows Meeta’s recommendation of the optimum % allocation to equities for different starting wealth. There are a few very interesting things it is telling us.

  • Number one – You see it flatlining at 0 at the end of the curve? It is Meeta’s way of telling us that if our starting wealth is over Rs.94, we should just get fixed deposits: “you’ll have your 100 bucks and I will have my 50 points”! But I suspect you already knew that!

  • Number two – You see that same curve flatlining at 0 around that Rs.80 mark? That is Meeta saying, “If your starting wealth is around Rs.80, don’t bother with investing in stocks. It is too difficult to get to Rs.100 by the end of the year. No point in taking any risk and going below your baseline. So, stick to fixed deposits”. Interesting, isn’t it?

  • Number three – Notice how the curve ramps up from 0 to 100% at the beginning and stays there? If our starting wealth is anything less than, say, Rs.50, Meeta says we should put everything in the stock market. That sounds nuts. If we only have 1 year left, and anything can happen in the stock market, why put 100% into stocks when our starting wealth is already a measly 50 bucks? The answer lies in the chart on the right, which shows the reward function for Meeta. Meeta’s reward in points is negative for most values less than Rs.80. Meeta’s desired state is to be at the goal of Rs.100, where she gets the most points. That’s her plan A. She is, however, OK to take 0 points by getting us to Rs.80. That is plan B. Bear in mind, at a starting wealth just above Rs.76 (80 / 1.05 ≈ 76.2) she is assured of non-negative points because she can tell us to invest it all in fixed deposits and walk away with our baseline of Rs.80. But she is deeply unhappy about being on the lower left-hand side of the curve. Consequently, she is trying to rush to the top right of the curve as quickly as possible. And the only way to get there is to invest aggressively in stocks.

  • If you think about it, this is quite intuitive. We want to have Rs.100 at the end of the year. If not Rs.100, then at the very least Rs.80. Anything else is undesirable. Given that, and a starting wealth of Rs.50, our best chance of getting into one of these regions is to aggressively pump money into the stock market. Capisce? Moving on.
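
The two flat spots at 0% follow from fixed-deposit arithmetic alone. As a quick check (the helper name `fd_start_needed` is made up for illustration), this computes the starting wealth that reaches a target through fixed deposits at 5% per annum:

```python
def fd_start_needed(target, years, fd_rate=0.05):
    """Starting wealth that grows to `target` in `years` using fixed deposits only."""
    return target / (1.0 + fd_rate) ** years

print(round(fd_start_needed(100, 1), 2))   # 95.24 -> enough to lock in the goal
print(round(fd_start_needed(80, 1), 2))    # 76.19 -> enough to lock in the baseline
```

So with 1 year left, roughly Rs.95 secures the goal and roughly Rs.76 secures the baseline without touching stocks.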

Meeta, I have 15 years left - Tell me what to do?

Once again, let us focus on what is going on in the left chart, which is Meeta’s recommendation of the optimum % allocation to equities for different starting wealth. It looks rather curious. Can you decipher the message?

  • There are 3 kinks on the curve. The first one is a drop from 100% allocation to equities to about 40% at a starting wealth of about Rs.30 (that steep drop). This happens because, at a starting wealth of Rs.30, the impact of losing money and going bust (to Rs.0) by putting it all in stocks is too high compared to the opportunity of making money and moving right to higher levels of wealth. Imagine you are climbing a steep mountain. In the early stages, the impact of a fall is going to be small. So, you can be aggressive. But as you climb and get to somewhere near the midpoint, the impact of a fall will be much greater. So, you’ll have to be extra careful. That is what is happening here: Meeta becomes more conservative.

  • Curiously, however, it climbs up again. This is because Meeta starts getting greedy looking at the possibility of getting to basecamp 1 (Rs.80) which becomes more and more likely with higher & higher starting wealth.

  • The second kink is the drop from about 50% allocation to close to 0% allocation at a starting wealth of Rs.40. This is straightforward. With a starting wealth of Rs.40, it is quite easy to get to Rs.80 by simply putting it all in fixed deposits and leaving it be for 15 years. Remember, Meeta likes 0 points more than negative points. So, our allocation goes all the way down.

  • The third and final kink, the smallest of them all, is the increase in equity allocation from nearly zero to about 10% at a starting wealth of Rs.45. Can you think about why this is? This is because, with Rs.45 as starting wealth, the probability of getting to Rs.100 by investing some money in equities is higher than the probability of losing money and going below Rs.80. All of Rs.45 invested in fixed deposits for 15 years will get us to about Rs.94. So, we only need a little bit extra to get to Rs.100 and Meeta says, “just about 10% into stocks should do the trick”.
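
The fixed-deposit figures quoted in these bullets are easy to verify with compound interest. A small check, with the helper name `fd_future_value` made up for illustration:

```python
def fd_future_value(capital, years, fd_rate=0.05):
    """Wealth after leaving `capital` in fixed deposits for `years`, compounded annually."""
    return capital * (1.0 + fd_rate) ** years

for start in (30, 40, 45):
    print(start, round(fd_future_value(start, 15), 2))
# 30 62.37  -> still below the Rs.80 baseline
# 40 83.16  -> baseline secured by fixed deposits alone
# 45 93.55  -> baseline secured, only a little short of Rs.100
```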

Every other scenario

Any other scenario in terms of time frame from 2 years to 14 years is in between these two boundary cases. Here are some of the other charts.

There are other subtleties in this. Here are some questions to ponder about.

  • Notice how the starting wealth at which we go from zero to all-in to stocks keeps increasing as the time frame increases. Why do you think that is?
  • Compare Meeta’s Reward Function when there is 1 year left with the one when there are 15 years left. Notice its transition to a rather smooth curve? What do you think this implies about how she makes decisions?

What does this mean for your investment strategy? Should you go all into equities and forget about fixed deposits? Should you go all into fixed deposits and forget about equities? Well, the answer is – it depends! How badly do you want to get to Rs.100 by the end of your time period? What is your absolute must-have minimum amount? Are you the kind of person who says some wealth is better than nothing? Or are you the kind of risk-taker who wants all or nothing? The answers to these questions decide whether or not you take the recommended allocations from these charts above.

But of one thing you can be sure: the application of reinforcement learning to the problems of traditional finance is an exciting field. We will cover more of it in future cover stories. Stay tuned!

P.S: The code to run this algorithm is on Github. You can access it here.

P.P.S: To a very large extent, I have hidden the mathematics behind the approach. It requires an understanding of the Bellman equation and dynamic programming, neither of which is essential to interpret & reflect on the results. If you are keen to know more, drop me a note.
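
For readers who want a pointer, one way the Bellman recursion for this kind of setup could be written is sketched below. The notation is assumed, not taken from the code: V_t(w) is the best expected score with wealth w and t years left, a is the fraction in stocks, r is the random annual stock return, r_f = 5% is the fixed-deposit rate, and R is the points rule described earlier. It also assumes the allocation is revisited once a year and points are awarded only at the end.

```latex
V_0(w) = R(w), \qquad
V_t(w) = \max_{a \in [0,1]} \; \mathbb{E}_{r}\!\left[ V_{t-1}\big( w \, (1 + a\,r + (1-a)\,r_f) \big) \right], \quad t \ge 1 .
```

Dynamic programming solves this backwards, starting from t = 0 and working up to the full time frame.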
