CS-303 / 6 credits

Instructor: Grossglauser Matthias

Language: English


Summary

This course provides a mathematical treatment of online decision-making. It covers bandits (multi-armed, contextual, structured), Markov Decision Processes (MDPs), and related topics. Key concepts include exploration-exploitation, UCB, Thompson sampling, and tools to derive regret bounds.

Content

  • Decision-making problems in active and reinforcement learning
  • Online learning and prediction
  • Multi-armed bandit model
  • Upper confidence bound (UCB) and Thompson sampling
  • Contextual bandits
  • Reinforcement learning model
  • Markov Decision Processes (MDP)
  • General decision-making and regret bounds
  • Large state space and function approximation
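
As an illustration of the exploration-exploitation and UCB topics listed above, here is a minimal sketch of the standard UCB1 index policy on a simulated Bernoulli bandit. It is not taken from the course material: the function name `ucb1`, the arm means, the horizon, and the seed are all hypothetical choices made for this example.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit (illustrative sketch).

    means   -- true success probability of each arm (hypothetical values,
               unknown to the algorithm, used only to simulate rewards
               and to measure regret)
    horizon -- number of rounds to play
    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # total reward collected from each arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Choose the arm with the largest empirical mean
            # plus exploration bonus sqrt(2 ln t / n_a).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: expected loss relative to always playing the best arm.
        regret += best - means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", round(regret, 2))
```

The exploration bonus shrinks as an arm is pulled more often, so under-sampled arms keep being tried while the empirically best arm receives most of the pulls. Regret here means the gap between the expected reward of the best arm and that of the arm actually played, summed over rounds.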

Learning Prerequisites

Required courses

CS-233 Introduction to machine learning

Recommended courses

COM-102 Advanced information, computation, communication II

COM-300 Modèles stochastiques pour les communications or equivalent

Important concepts to start the course

  • Probability and random processes
  • Data structures and algorithms
  • Information theory

Learning Outcomes

By the end of the course, the student must be able to:

  • Create mathematical models of real-world sequential decision-making scenarios
  • Develop algorithms and reason about their efficiency and performance

Teaching methods

  • Ex-cathedra lectures
  • Homework series
  • Labs to develop, simulate, and analyze algorithms

Assessment methods

  • Lab reports
  • Final exam

Supervision

Office hours: No
Assistants: Yes

Resources

Moodle Link

In the study plans

  • Semester: Fall
  • Exam form: Written (winter session)
  • Subject examined: Principles of online decision-making
  • Lecture: 2 hours per week x 14 weeks
  • Exercises: 1 hour per week x 14 weeks
  • Lab: 1 hour per week x 14 weeks
  • Type: optional

Reference week

Tuesday, 15h - 17h: Lecture CM1

Wednesday, 16h - 17h: Exercise, practical work ELA1

Wednesday, 17h - 18h: Project, lab, other ELA1
