Perception and learning from multimodal sensors


Lecturer(s):

Odobez Jean-Marc




Every 2 years


Next time: Spring 2021


The course will cover different aspects of multimodal processing (complementarity vs. redundancy; alignment and synchrony; fusion), with an emphasis on the analysis of people, behaviors and interactions from multimodal sensors, using statistical models and deep learning as the main modeling tools.


Multimodal processing is at the core of how we perceive the world: we see, we hear, we touch, we smell, we taste, and we move. Being able to analyze and combine multimodal streams of data is therefore an inherent ability that artificial intelligence systems should address. It raises several core challenges: how to represent multimodal information, how to deal with asynchrony between modalities, how to fuse complementary or redundant information, and how to co-train models.

The course will cover this topic, with a particular emphasis on the analysis of people and their behaviors, including interactions, from multimodal sensor streams (with a bias towards vision and audio). We will rely on Bayesian statistical models and deep learning as the main modeling tools. Within this domain, the course will comprise lectures, labs, and further reading on the following topics:


1. Introduction to multimodal processing - application and main issues


2. Fundamentals in Bayesian models and deep learning


3. Audio processing: voice analysis, speaker diarization, sound localization


4. Video processing: fundamental tasks - person detection; body, face and hand representation, detection and learning; single- and multiple-person audio-visual tracking.


5. Multimodal processing: joint representation learning, alignment, fusion


6. Multimodal activity recognition: gaze and attention models; facial expression and emotion recognition; gesture recognition; social analysis; multi-person and multimodal action and interaction recognition.


Multimodality, human activity analysis, interactions, machine learning, computer vision, audio processing.

Learning Prerequisites

Required courses

Undergraduate-level knowledge of linear algebra, statistics, image and signal processing.

Recommended courses

Introduction to machine learning course.

Assessment methods

Written and oral.



Pattern Recognition and Machine Learning, C. Bishop, Springer, 2008.



In the programs

    • Semester
    • Exam form
      Written & Oral
    • Credits
    • Subject examined
      Perception and learning from multimodal sensors
    • Number of places
    • Lecture
      28 hours
    • Exercises
      10 hours
    • Practical work
      18 hours
