Principles of online decision-making
Summary
This course provides a mathematical treatment of online decision-making. It covers bandits (multi-armed, contextual, structured), Markov Decision Processes (MDPs), and related topics. Key concepts include exploration-exploitation, UCB, Thompson sampling, and tools to derive regret bounds.
Content
- Decision-making problems in active and reinforcement learning
- Online learning and prediction
- The multi-armed bandit model
- Upper-confidence bound (UCB) and Thompson sampling
- Contextual bandits
- The reinforcement learning model
- Markov Decision Processes (MDPs)
- General decision-making and regret bounds
- Large state space and function approximation
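To give a flavor of the bandit topics listed above, here is a minimal sketch of the UCB1 index policy on a simulated Bernoulli bandit. It is an illustration only, not material taken from the course: the function name `ucb1`, the arm means, the horizon, and the exploration bonus sqrt(2 ln t / n) are all assumptions chosen for the example. Regret is measured here as the cumulative gap between the best arm's mean and the mean of the arm actually played.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit (hypothetical helper).

    arm_means: true success probabilities, used only by the simulator
               and to compute the pseudo-regret; the policy never reads them.
    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # total reward collected from each arm
    best = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            # Optimism: empirical mean plus a confidence bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.1, 0.9], horizon=2000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", regret)
```

The bonus term shrinks as an arm is pulled more often, so a poorly performing arm is revisited only occasionally. This is one simple way the exploration-exploitation trade-off can be handled.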
Learning Prerequisites
Required courses
CS-233 Introduction to machine learning
Recommended courses
COM-102 Advanced information, computation, communication II
COM-300 Modèles stochastiques pour les communications or equivalent
Important concepts to start the course
- Probability and random processes
- Data structures and algorithms
- Information theory
Learning Outcomes
By the end of the course, the student must be able to:
- Create mathematical models of real-world sequential decision-making scenarios
- Develop algorithms and reason about their efficiency and performance
Teaching methods
- Ex-cathedra lectures
- Homework series
- Labs to develop, simulate, and analyze algorithms
Assessment methods
- Lab reports
- Final exam
Supervision
- Office hours: No
- Assistants: Yes
In the study plans
- Semester: Fall
- Exam form: Written (winter session)
- Subject examined: Principles of online decision-making
- Lecture: 2 hour(s) per week x 14 weeks
- Exercises: 1 hour(s) per week x 14 weeks
- Lab: 1 hour(s) per week x 14 weeks
- Type: optional