Seasons of Code

R(ea)L Trader • Pratyush Agarwal • Raaghav Raaj, Yash Gupta

WnCC - Seasons of Code

Seasons of Code is a programme launched by WnCC along the lines of Google Summer of Code. It gives participants the opportunity to learn and work on a variety of interesting projects under the mentorship of some of the very best in our institute.


R(ea)L Trader

The end goal is to deploy an RL-based agent for automated stock trading.

No. of mentees: 6


Reinforcement learning might sound exotic and advanced, but the concept underlying it is quite simple. In fact, everyone has known it since childhood!

As a kid, you were rewarded for excelling in sports or studies, and reprimanded for mischief such as breaking a vase. Both were ways of shaping your behaviour. If coming first earned you a bicycle or a PlayStation, you practised hard to come first; and since you knew that a broken vase meant trouble, you were careful around it. This, in essence, is reinforcement learning.

The reward acted as positive reinforcement and the punishment as negative feedback; in this way, your elders shaped your learning. Similarly, an RL algorithm can learn to trade in financial markets on its own, guided by the rewards or punishments it receives for its actions.
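
The reward-and-punishment loop above can be sketched in a few lines. The example below is a minimal epsilon-greedy action-value learner (the multi-armed bandit setting, one of the simplest forms of RL), offered as an illustration rather than anything prescribed by the project; the function name, action names and reward values are hypothetical stand-ins for the childhood example.

```python
import random
from typing import Callable, Dict, List


def learn_from_feedback(
    reward_fn: Callable[[str], float],
    actions: List[str],
    trials: int = 1000,
    epsilon: float = 0.1,
    alpha: float = 0.1,
    seed: int = 0,
) -> Dict[str, float]:
    """Epsilon-greedy action-value learning: estimate how good each action is
    purely from the rewards (positive) and punishments (negative) it earns."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # initial estimate: no opinion about any action
    for _ in range(trials):
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore: occasionally try something at random
        else:
            action = max(value, key=value.get)  # exploit: pick the best-looking action
        reward = reward_fn(action)
        value[action] += alpha * (reward - value[action])  # nudge the estimate toward the feedback
    return value


# Hypothetical feedback mirroring the example: practising is rewarded, breaking a vase is punished.
feedback = {"practise": 1.0, "break_vase": -1.0}
estimates = learn_from_feedback(lambda a: feedback[a], list(feedback))
best = max(estimates, key=estimates.get)
```

The agent is never told which action is "good"; it ends up preferring `practise` only because that action keeps earning positive feedback.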

In trading, the problem can be framed in several ways: maximising profit, reducing drawdowns, or optimising portfolio allocation. In each case, the RL algorithm learns a strategy that maximises its long-term cumulative reward.
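
Each framing corresponds to a different reward signal, and "long-term reward" usually means the discounted return G = r_0 + γr_1 + γ²r_2 + ... . A minimal sketch, assuming a list of portfolio values sampled once per step; the helper names and the drawdown weight `lam` are hypothetical choices for illustration, not the project's definitions.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ..., the long-term reward an RL agent maximises."""
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g


def profit_rewards(values: Sequence[float]) -> List[float]:
    """'Maximise profit' framing: reward each step with the change in portfolio value."""
    return [b - a for a, b in zip(values, values[1:])]


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough fall, as a fraction of the running peak."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def drawdown_penalised_reward(prev_value: float, value: float, peak: float, lam: float = 0.5) -> float:
    """'Reduce drawdowns' framing: step profit minus lam times the distance below the
    running peak. lam is a hypothetical weight trading off return against drawdown."""
    return (value - prev_value) - lam * max(0.0, peak - value)


values = [100.0, 110.0, 105.0, 120.0]  # toy portfolio values
rewards = profit_rewards(values)  # [10.0, -5.0, 15.0]
g = discounted_return(rewards, gamma=0.9)  # 10 - 0.9*5 + 0.81*15, about 17.65
dd = max_drawdown(values)  # 5 / 110, about 0.045
```

A smaller `gamma` makes the agent more short-sighted; a larger `lam` makes it more willing to give up profit to avoid falling below its previous peak.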

A quote sums it up perfectly, “AlphaZero, a reinforcement learning algorithm developed by Google’s DeepMind AI, taught us that we were playing chess wrong!”

The project involves reading up on RL, completing some simple assignments on it, learning about stocks and markets, and then applying RL techniques to trading. Enthusiasm for stock markets and AI/RL, along with a willingness to put in time and effort, is the only prerequisite. That said, any knowledge of or experience with these topics may help, so mention it in your proposal. If you have no prior background in these fields, write about your general skills, the programming languages you are comfortable with, and anything else you find relevant.

Keep the proposal short and crisp. Also include in your proposal a brief account of your understanding of the article below (1 page max).

Reading resources:

Sample websites to test out trading strategies:

Tentative Project Timeline

Week: Tasks to be completed
Week 1: Read up on MDPs, reinforcement learning, and stock markets and trading in general.
Week 2: Develop RL agents for sample problems (like maze solving); read up on different trading strategies and on alpha discovery and generation; try out and test different alphas. Start participating in online quant trading tournaments (and continue in the coming weeks).
Week 3: Experiment with simple AI models for trading, find datasets, and develop metrics to evaluate strategies. Explore RL algorithms such as Q-Learning and Deep Q-Learning, and apply them to simple games like tic-tac-toe.
Week 4: Develop an RL agent for automated stock trading. Start with 1-2 stocks and a small dataset, compare how different algorithms perform on it, backtest and optimise the agent, then gradually expand the dataset to more stocks and longer time periods.
Week 5: Finish coding the agent, use it to simulate trading on unseen data, and check for errors and possible optimisations. Test the agent on various quant-trading websites, participate in tournaments, and win prizes!
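
Weeks 1 to 3 build up from MDPs to Q-learning on toy problems such as maze solving. Below is a minimal tabular Q-learning sketch on a hypothetical 3x4 grid maze; the layout, the rewards (+10 at the goal, -0.1 per step, -1 for bumping into a wall) and the hyperparameters are illustrative assumptions, not part of the project brief.

```python
import random
from typing import Dict, List, Tuple

State = Tuple[int, int]

# Hypothetical maze: 'S' start, 'G' goal, '#' wall, '.' free cell.
MAZE = [
    "S.#.",
    "..#.",
    "...G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch: str) -> State:
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in maze")


def step(state: State, action: Tuple[int, int]) -> Tuple[State, float, bool]:
    """MDP transition: returns (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        return state, -1.0, False  # hit a wall or the edge: stay put, small punishment
    if MAZE[r][c] == "G":
        return (r, c), 10.0, True  # reached the goal
    return (r, c), -0.1, False  # step cost encourages short paths


def greedy(q: Dict[Tuple[State, int], float], state: State) -> int:
    return max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> Dict[Tuple[State, int], float]:
    rng = random.Random(seed)
    q: Dict[Tuple[State, int], float] = {}  # (state, action index) -> value, default 0
    for _ in range(episodes):
        state = find("S")
        for _ in range(100):  # cap the episode length
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, state)
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q: Dict[Tuple[State, int], float], max_steps: int = 20) -> List[State]:
    state = find("S")
    path = [state]
    for _ in range(max_steps):
        state, _, done = step(state, ACTIONS[greedy(q, state)])
        path.append(state)
        if done:
            break
    return path


path = greedy_path(q_learning())
```

The same loop carries over to tic-tac-toe or a trading environment by changing `step` (the environment) and the state encoding; Deep Q-Learning replaces the dictionary `q` with a neural network once the state space becomes too large to enumerate.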


Checkpoint: Progress
Checkpoint 1: Developing an RL agent to solve a maze
Checkpoint 2: Learning to optimise different alphas and participating in online quant trading tournaments
Checkpoint 3: Understanding the Deep Q-Learning algorithm and experience replay, and developing intuition for alpha generation in trading
Checkpoint 4: Building a simple RL agent for trading 2-3 stocks over a short time period, and backtesting it
Checkpoint 5: Extending the agent to a large number of stocks over longer time periods, deploying it, and evaluating its performance
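
Checkpoints 4 and 5 depend on evaluating a trading policy on historical prices without look-ahead. A minimal backtesting sketch, assuming a single stock, daily closing prices, long-or-flat positions, no transaction costs and a zero risk-free rate. `backtest`, `momentum_policy` and the toy price series are hypothetical; a trained RL agent's greedy policy would take the place of `momentum_policy`.

```python
import math
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], int]


def backtest(prices: Sequence[float], policy: Policy) -> List[float]:
    """Walk forward through prices. On day t the policy sees only prices[:t+1]
    and returns a position (1 = long, 0 = flat) held until day t+1."""
    returns = []
    for t in range(len(prices) - 1):
        position = policy(prices[: t + 1])  # no look-ahead: future prices stay hidden
        daily_return = prices[t + 1] / prices[t] - 1.0
        returns.append(position * daily_return)
    return returns


def total_return(returns: Sequence[float]) -> float:
    """Compounded return over the whole backtest."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualised Sharpe ratio of daily returns, assuming a zero risk-free rate."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))
    return 0.0 if std == 0 else mean / std * math.sqrt(periods_per_year)


def momentum_policy(history: Sequence[float]) -> int:
    """Hypothetical baseline: go long if the latest price rose, otherwise stay flat."""
    return 1 if len(history) >= 2 and history[-1] > history[-2] else 0


prices = [100.0, 101.0, 102.0, 101.0, 103.0]  # toy data, not real market prices
strategy = backtest(prices, momentum_policy)
buy_and_hold = backtest(prices, lambda history: 1)
```

To simulate trading on unseen data, train on an earlier slice of the price series and call `backtest` only on a later, held-out slice; splitting chronologically rather than randomly keeps future prices from leaking into training.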