CS-460 / 8 credits

Lecturer(s): Ailamaki Anastasia, Kermarrec Anne-Marie

Language: English


Summary

This course is intended for students who want to understand modern large-scale data analysis systems and database systems. The course covers fundamental principles for understanding and building systems for managing and analyzing large amounts of data. It covers a wide range of topics and technologi

Content

Learning Prerequisites

Recommended courses

  • CS-107 Introduction to programming
  • CS-206 Parallelism and concurrency
  • CS-322 Introduction to database systems
  • CS-323 Introduction to operating systems
  • CS-452 Foundations of software

Important concepts to start the course

  • Algorithms and data structures.
  • Scala and/or Java programming languages will be used throughout the course. Programming experience in one of these languages is strongly recommended.
  • Basic knowledge of computer networking and distributed systems.

Learning Outcomes

By the end of the course, the student must be able to:

  • Understand how to design big data analytics systems using state-of-the-art infrastructures for horizontal scaling, e.g., Spark
  • Implement algorithms and data structures for streaming data analytics
  • Decide between different storage models based on the optimizations each model enables and the expected query workload
  • Compare concurrency control algorithms, and algorithms for distributed data management
  • Configure systems parameters, data layouts, and application designs for database systems
  • Develop data-parallel analytics programs that make use of modern clusters and cloud offerings to scale up to very large workloads
  • Analyze the trade-offs between various approaches to large-scale data management and analytics, depending on efficiency, scalability, and latency needs

Teaching methods

Lectures, project, homework, exercises and practical work

Expected student activities

  • Attend lectures and participate in class
  • Complete a project as per the guidelines posted by the teaching team

Assessment methods

  • Project
  • Midterm (as needed)
  • Final exam

Supervision

Office hours: Yes
Assistants: Yes
Forum: Yes

Resources

Bibliography

J. Hellerstein & M. Stonebraker: "Readings in Database Systems", 4th Edition, 2005.
R. Ramakrishnan & J. Gehrke: "Database Management Systems", McGraw-Hill, 3rd Edition, 2002.
A. Rajaraman & J. Ullman: "Mining of Massive Datasets", Cambridge Univ. Press, 2011.

Library resources

Moodle Link

In the study plans

  • Semester: Spring
  • Exam form: Written (summer session)
  • Subject examined: Systems for data management and data science
  • Lecture: 2 hour(s) per week x 14 weeks
  • Exercises: 2 hour(s) per week x 14 weeks
  • Practical work: 2 hour(s) per week x 14 weeks

Reference week

Monday, 14:00 - 16:00: Lecture CM2

Monday, 16:00 - 18:00: Exercise, practical work CM2

Wednesday, 14:00 - 16:00: Exercise, practical work INM10, INJ218
