Computational perception using multimodal sensors


Lecturer(s):

Aran Oya
Odobez Jean-Marc




The course covers perceptual modalities in computers and models for analyzing people (representation, detection and localization, segmentation, tracking, recognition).


1. Perceptual modalities in computers. Vision, hearing, touch, smell. Basic fusion principles.
2. Models for analyzing people. Introduction to probabilistic graphical models. Basic concepts. Bayesian Networks (BNs). Learning and inference in BNs. Dynamic Bayesian Networks (DBNs). Exact and approximate inference. Examples.
3. Analyzing people. Fundamental tasks.
a. Representation. The problem of representation in computational perception. Global vs. local representations. Visual models for faces, heads, hands, and full bodies (shape/appearance, exemplars, geometric models). Models and features for speech and audio processing.
b. Detection and localization. Basic concepts. Detection as binary classification and as random sampling. Visual localization: skin color modeling, face localization. Audio localization: microphone arrays. Audio-visual fusion for speaker detection.
c. Segmentation. Basic concepts. Visual segmentation: background subtraction. Audio segmentation: source separation, speaker turn segmentation, speaker clustering.
d. Tracking. State space representation. Dynamic modeling. Human motion modeling. Multi-person tracking. Visual, audio and multimodal tracking of people.
e. Recognition. Recognition tasks. Visual recognition: facial expressions, gestures, actions, interaction. Audio recognition: speech, emotion, multi-speaker events. Audio classification. Multimodal recognition: actions.


Artificial perception, human representation, multimodality, audio, video, probabilistic models, graphical models.

Learning Prerequisites

Recommended courses

Undergraduate-level knowledge of linear algebra, statistics, image and signal processing.

