Principles of online decision-making
Summary
This course provides a mathematical treatment of online decision-making. It covers bandits (multi-armed, contextual, structured), Markov Decision Processes (MDPs), and related topics. Key concepts include exploration-exploitation, UCB, Thompson sampling, and tools to derive regret bounds.
Content
- Decision-making problems in active and reinforcement learning
- Online learning and prediction
- Multi-armed bandit model
- Upper confidence bound (UCB) and Thompson sampling
- Contextual bandits
- Reinforcement learning model
- Markov Decision Processes (MDPs)
- General decision-making and regret bounds
- Large state space and function approximation
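As an illustration of the exploration-exploitation trade-off named in the topics above, here is a minimal sketch of the classical UCB1 index policy on a simulated Bernoulli bandit. It is not taken from the course material: the function name `ucb1`, the arm means, the horizon, and the exploration bonus sqrt(2 ln t / n) are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Simulate UCB1 on a Bernoulli bandit (illustrative sketch).

    arm_means: hypothetical success probability of each arm (unknown to the learner).
    Returns the number of pulls per arm and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls of each arm
    sums = [0.0] * n_arms      # total reward collected from each arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once so each index is defined.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best arm's mean and the chosen arm's mean.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    pulls, total_regret = ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("cumulative pseudo-regret:", round(total_regret, 1))
```

Thompson sampling would replace the index computation with a draw from each arm's posterior (for example a Beta posterior for Bernoulli rewards); the surrounding simulation loop would stay the same.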
Learning Prerequisites
Required courses
CS-233 Introduction to machine learning
Recommended courses
COM-102 Advanced information, computation, communication II
COM-300 Modèles stochastiques pour les communications or equivalent
Important concepts to start the course
- Probability and random processes
- Data structures and algorithms
- Information theory
Learning Outcomes
By the end of the course, the student must be able to:
- Create mathematical models of real-world sequential decision-making scenarios
- Develop algorithms and reason about their efficiency and performance
Teaching methods
- Ex-cathedra lectures
- Homework series
- Labs to develop, simulate, and analyze algorithms
Assessment methods
- Lab reports
- Final exam
Supervision
| Office hours | No |
| Assistants | Yes |
In the study plans
- Semester: Fall
- Exam form: Written (winter session)
- Subject examined: Principles of online decision-making
- Lectures: 2 hour(s) per week x 14 weeks
- Exercises: 1 hour(s) per week x 14 weeks
- Lab: 1 hour(s) per week x 14 weeks
- Type: optional