3H Project: Introduction to Reinforcement Learning

Reinforcement learning (RL) is a mathematical and computational framework for studying how an agent can learn to make decisions through interaction with an environment. Unlike supervised learning, where the correct output for each training example is provided in advance, reinforcement learning relies on trial and error: the agent receives rewards for its actions and must account for their long-term consequences. This project will introduce the basic ideas of reinforcement learning, including Markov decision processes, policies, value functions, Q-learning, policy gradients, and simple deep reinforcement learning methods.

Group project

The group project will provide an introduction to the main ideas and methods of reinforcement learning. We will work through selected parts of the Hugging Face Deep Reinforcement Learning Course, supported by standard textbook material.

By the end of the group project, students will have learned the basic language and concepts of reinforcement learning, practised explaining these concepts to each other, and implemented small computational examples where appropriate. Topics may include agents and environments, reward functions, exploration versus exploitation, value functions, Bellman equations, Q-learning, policy gradients, and simple neural-network-based RL methods.

Mode of Operation and Evidence of Learning

The group will work through selected readings, online course material, and computational exercises. Students will meet regularly to discuss the material, divide preparation tasks, explain concepts to each other, and develop short Python demonstrations of key algorithms where appropriate. Supervisor meetings will be used to clarify difficult points, review progress, discuss code and model behaviour, and guide the group towards the final presentation and oral examination.
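As an indication of the scale of a "short Python demonstration", the sketch below applies tabular Q-learning to a toy five-cell corridor. The environment, the function names (step, train) and all hyperparameter values are illustrative assumptions rather than prescribed course material; a real demonstration might instead use an environment from the course exercises.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic toy dynamics: reward 1 on reaching the goal, else 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # exploit, breaking ties between equal estimates at random
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action in each non-terminal cell according to the learned table
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

In this toy setting the learned greedy policy should move right in every cell, and the learned values fall off with distance from the goal because of the discount factor gamma. Explaining why that happens is the kind of link between the mathematics and the implementation that the group is expected to make.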

Evidence of learning will include participation in group discussions, weekly diary entries recording progress and contributions, short student-led explanations of the material, Python code demonstrations or computational experiments, the final group presentation, and the oral examination. Students will be expected to explain both the mathematical idea behind an algorithm and the behaviour of their implementation. For example, they should be able to explain what an agent is learning, how rewards affect behaviour, why exploration matters, and what can go wrong when the reward or environment is poorly chosen.
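To illustrate the kind of explanation expected for "why exploration matters", the following minimal sketch compares a purely greedy agent with an epsilon-greedy agent on a hypothetical two-armed bandit. The payout values, the function name run_bandit and the tie-breaking rule are assumptions chosen to make the effect easy to see; they are not taken from the course material.

```python
import random


def run_bandit(epsilon, steps=2000, seed=0):
    """Average reward of an epsilon-greedy agent on a toy two-armed bandit."""
    payouts = (0.2, 1.0)        # fixed reward of each arm (illustrative values)
    rng = random.Random(seed)
    estimates = [0.0, 0.0]      # running mean reward observed for each arm
    counts = [0, 0]             # number of pulls of each arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                          # explore
        else:
            arm = 0 if estimates[0] >= estimates[1] else 1  # exploit, ties go to arm 0
        reward = payouts[arm]
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


greedy_avg = run_bandit(epsilon=0.0)     # never explores
exploring_avg = run_bandit(epsilon=0.1)  # explores on about 10% of steps
print(greedy_avg, exploring_avg)
```

Under these assumptions the greedy agent pulls arm 0 first, sees a positive reward, and never tries the better arm, so its average reward stays at the lower payout. The epsilon-greedy agent occasionally samples arm 1, discovers that it pays more, and exploits it from then on.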

Individual project

The individual project will allow students to explore a more specialised reinforcement learning topic or application. Possible topics include reinforcement learning for simple games, trading or portfolio strategies, exploration versus exploitation, comparison of value-based and policy-based methods, reinforcement learning in control problems, or the limitations of RL methods such as instability, reward misspecification, and poor generalisation.

Mode of Operation and Evidence of Learning

Students will choose an individual direction, in consultation with the supervisor, and investigate it through a mixture of reading, computational experiments, and written explanation. Evidence of learning will be provided through the final written project, which should demonstrate understanding of the chosen reinforcement learning method or application, explain the relevant mathematical or computational ideas, and, where appropriate, include working Python code and analysis of the results.

Pre-/co-requisites

Students should be comfortable with basic probability, linear algebra, and Python programming. Some familiarity with machine learning would be useful but is not essential.

Resources

Sutton and Barto, Reinforcement Learning: An Introduction; Hugging Face Deep Reinforcement Learning Course; David Silver’s Reinforcement Learning lectures; selected papers, notes, or documentation depending on the chosen individual-project topic.