EE-623 / 4 credits
Lecturer: Odobez Jean-Marc
Remark: Next time: Spring 2024
Every 2 years
The course will cover different aspects of multimodal processing (complementarity vs redundancy; alignment and synchrony; fusion), with an emphasis on the analysis of people, behaviors and interactions from multimodal sensors, using statistical models and deep learning as the main modeling tools.
Multimodal processing is at the core of how we perceive the world: we see, we hear, we touch, we smell, we taste, and we move. Analyzing and combining multimodal streams of data is therefore an ability that artificial intelligence systems should have, and it raises several core challenges: how to represent multimodal information, how to deal with asynchrony between modalities, how to fuse complementary or redundant information, and how to co-train models.
The course will cover this topic, with a particular emphasis on the analysis of people and of their behaviors, including interactions, from multimodal sensor streams (with a bias towards vision and audio). We will rely on Bayesian statistical models and deep learning as the main modeling tools. Within this domain, the course will comprise lectures, labs, and further reading on the following topics:
1. Introduction to multimodal processing - applications and main issues
2. Fundamentals of Bayesian models and deep learning
3. Audio processing: voice analysis, speaker diarization, sound localization
4. Video processing: fundamental tasks - person detection; detection and representation learning for body, face and hands; single- and multi-person audio-visual tracking.
5. Multimodal processing: joint representation learning, alignment, fusion
6. Multimodal activity recognition: gaze and attention models; facial expression and emotion recognition; gesture recognition; social analysis; multi-person and multimodal action and interaction recognition.
Multimodality, human activity analysis, interactions, machine learning, computer vision, audio processing.
Undergraduate-level knowledge of linear algebra, statistics, image and signal processing.
Introduction to machine learning course.
Written and oral.
Pattern Recognition and Machine Learning, C. Bishop, Springer, 2006.
In the study plans
- Number of places: 20
- Exam form: Written & oral (free session)
- Subject examined: Perception and learning from multimodal sensors
- Lectures: 28 hour(s)
- Exercises: 10 hour(s)
- Practical work: 18 hour(s)