CS-503 / 6 credits

Teacher: Roshan Zamir Amir

Language: English


Summary

This course covers both classic concepts and recent advances in computer vision and machine learning for processing visual data -- with a primary focus on embodied intelligence and multimodal learning.

Content

Visual perception is the capability of inferring the properties of the external world merely from the light reflected off the objects therein. Both simple (e.g., mosquitoes) and complex (e.g., humans) biological organisms do this remarkably well. They can see and understand the complex environment around them and act accordingly -- all in an efficient and astonishingly robust way. Computer vision is the discipline of replicating this capability in machines. Despite remarkable progress in the past few years, a large gap remains to sophisticated perceptual capabilities such as those exhibited by animals.

 

The goal of this course is to discuss what is possible in computer vision today and what is not. We will survey the basic concepts in computer vision and recent advances in machine learning relevant to processing visual data, multimodal learning, and active perception. For inspiration on the missing capabilities and how to approach them, we will turn to visual perception in biological organisms.

 

The course includes lectures, homework, and projects, with a heavy emphasis on the projects and hands-on experience. The homework tasks will focus on key tools and concepts in ML, including Transformers, LLMs, multimodal foundation models, and perceptual simulation environments.

 

The course project involves designing, implementing, and testing a solution to a (preferably open) problem pertinent to visual perception. Students are encouraged to work in groups, propose a project that interests them, and pursue ambitious yet feasible goals. The course staff will support the projects throughout the semester. In the lectures, students will learn about the principles of computer vision and multimodal learning, their current limits, and visual perception in humans and animals, which will help them formulate their course projects. In particular, the lectures will discuss the following:

 

  1. An overview of basic computer vision concepts: classification, detection, grouping, image transformations, optical flow, 3D from X, etc., and recent neural network architectures, such as Transformers.
  2. Psychology/physiology of the visual system.
  3. Multimodal learning and multimodal foundation models.
  4. Perception-action loop: active perception and embodied vision.

 

The course is intended for master's/PhD students interested in research in computer vision, machine learning, and perceptual robotics, as well as senior undergraduate students who want to understand state-of-the-art computer vision.

Keywords

Computer vision, Machine learning, Embodied intelligence, Multimodal learning, Robotics, Neural networks, AI.

Learning Prerequisites

Required courses

  • Machine Learning (CS-433) or Introduction to Machine Learning (CS-233) or equivalent course on the basics of machine learning.
  • Deep Learning (EE-559) or Artificial Neural Networks (CS-456) or equivalent course on the basics of deep learning.

Recommended courses

  • Computer vision (CS-442) or equivalent undergraduate/master's course on the basics of computer vision.

Important concepts to start the course

  • Deep learning and machine learning.
  • Python programming.
  • Basics of probability and statistics.
  • Familiarity with RL, for the students who pick projects that involve RL.

Learning Outcomes

By the end of the course, the student must be able to:

  • Define basic concepts in computer vision, such as detection, segmentation, 3D from X, as covered in the lectures.
  • Explain the range of psychological theories of visual perception covered in the lectures.
  • Design and implement computer vision/machine learning algorithms and foundation models to address problems with real-world complexity.
  • Design and implement proper evaluation pipelines for computer vision/machine learning algorithms to assess their performance in the real world.
  • Assess the limits and performance pitfalls of a given computer vision/machine learning algorithm, especially when facing real-world complexity.

Transversal skills

  • Write a scientific or technical report.
  • Make an oral presentation.
  • Assess progress against the plan, and adapt the plan as appropriate.
  • Demonstrate the capacity for critical thinking.

Teaching methods

Lectures. Programming notebooks. Lab sessions. Project Tutoring. Course Project.

Expected student activities

  • For the lecture material, students are expected to study the provided reading material, actively participate in class, engage in the discussions, and answer homework questions.
  • For the programming homework, students are expected to complete the provided Python notebook assignments.
  • For the course project, students are expected to formulate and implement an in-depth project, demonstrate continuous progress throughout the semester, and provide a final written report and presentation.

Assessment methods

  • Project (60%) [distributed over the project proposal, milestone reports, final report and presentation]
  • Homework (40%)

Supervision

Office hours: Yes
Assistants: Yes
Forum: Yes

Resources

Bibliography

  • Vision Science: Photons to Phenomenology, Stephen E. Palmer, 1999.
  • The Ecological Approach to Visual Perception, James J. Gibson, 1979.
  • Computer Vision: Algorithms and Applications, Richard Szeliski, 2020.
  • Animal Eyes, Michael Land and Dan-Eric Nilsson, 2012.

Library resources

Notes/Handbook

The reference reading for each lecture will be drawn from different books (the main ones are listed above) and occasionally from papers. Resources will be provided in class. Obtaining the full text of the books is not mandatory.

Moodle Link

In the programs

  • Semester: Spring
  • Exam form: During the semester (summer session)
  • Subject examined: Visual intelligence: machines and minds
  • Courses: 2 Hour(s) per week x 14 weeks
  • Exercises: 1 Hour(s) per week x 14 weeks
  • Project: 1 Hour(s) per week x 14 weeks
  • Type: optional
  • Exam form: During the semester (summer session)
  • Subject examined: Visual intelligence: machines and minds
  • Courses: 2 Hour(s) per week x 14 weeks
  • Exercises: 1 Hour(s) per week x 14 weeks
  • Project: 1 Hour(s) per week x 14 weeks
  • Type: mandatory

Reference week

Tuesday, 17h - 18h: Exercise, TP INM200

Tuesday, 18h - 19h: Project, labs, other INM200

Thursday, 16h - 18h: Lecture INM202
