COM-490 / 6 credits

Teacher(s): Bouillet Eric Pierre, Delgado Borda Pamela Isabel, Sarni Sofiane, Verscheure Olivier

Language: English

Withdrawal: It is not allowed to withdraw from this subject after the registration deadline.


Summary

This hands-on course teaches the tools & methods used by data scientists, from researching solutions to scaling up prototypes to Spark clusters. It exposes the students to the entire data science pipeline, from data acquisition to extracting valuable insights applied to real-world problems.

Content

Keywords

Data Science, IoT, Machine Learning, Predictive Modeling, Big Data, Stream Processing, Apache Spark, Hadoop, Large-Scale Data Analysis

Learning Prerequisites

Required courses

Students must have prior experience with Python.

Recommended courses

Students must have prior experience with at least one general-purpose programming language.

Important concepts to start the course

It is recommended that students familiarize themselves with concepts in statistics and standard methods in machine learning.

Learning Outcomes

By the end of the course, the student must be able to:

  • Use standard Big Data tools and Data Science libraries
  • Carry out real-world projects with a variety of real datasets, both at rest and in motion
  • Design large-scale data science and engineering problems
  • Present a tangible solution to a real-world Data Science problem

Transversal skills

  • Demonstrate a capacity for creativity.
  • Plan and carry out activities in a way which makes optimal use of available time and other resources.
  • Write a scientific or technical report.

Teaching methods

  • Hands-on lab sessions
  • Homework assignments
  • Final project

All of the above use real-world datasets and Cloud Compute & Storage Services.

Expected student activities

  • STUDY: Attend the lab sessions
  • WORK: Complete homework assignments
  • ENGAGE: Contribute to the interactive nature of the class
  • COLLABORATE: Work in small groups to provide solutions to real-world problems
  • EXPLAIN: Present ideas and results to the class

Assessment methods

  • 60% continuous assessment during the semester
  • 40% final project, done in small groups

Supervision

Office hours: Yes
Assistants: Yes
Forum: Yes

Resources

Virtual desktop infrastructure (VDI): No

Bibliography

  • Python Data Science Handbook: Essential Tools for Working with Data by Jake VanderPlas, O'Reilly Media, November 2016
  • pyGAM - https://github.com/dswah/pyGAM

A list of additional readings will be distributed at the beginning of the course.

Websites

Moodle Link

In the study plans

  • Semester: Spring
  • Exam form: During the semester (summer session)
  • Subject examined: Large-scale data science for real-world data
  • Practical work: 4 hour(s) per week x 14 weeks

Reference week

        Mon  Tue  Wed   Thu  Fri
13-14             INF1
14-15             INF1
15-16             INF1

Wednesday, 13:00-16:00: Exercises, practical work, INF1

Related courses

Results from graphsearch.epfl.ch.