Systems for data science
Summary
The course covers fundamental principles for understanding and building systems for managing and analyzing large amounts of data.
Content
Big data systems design and implementation:
- Distributed systems for data science
- Data management: locality, accesses, partitioning, replication
- Distributed machine learning systems: federated learning, parameter server, decentralized learning
- Massively parallel processing operations
Large-scale storage systems:
- Data structures: file systems, key-value stores, DBMS
- Consistency models. The CAP theorem. NoSQL and NewSQL systems
- Transactions
Large-scale processing:
- Parallel processing
- Stream processing
- Online processing
- Graph processing
Keywords
Distributed systems, Parallel programming, Large-scale storage systems, Large-scale data management
Learning Prerequisites
Recommended courses
CS-322 Introduction to database systems
CS-323 Introduction to operating systems
CS-206 Parallelism and concurrency
Important concepts to start the course
- Algorithms and data structures.
- Scala and/or Java programming languages will be used throughout the course. Programming experience in one of these languages is strongly recommended.
- Basic knowledge of computer networking and distributed systems.
Learning Outcomes
By the end of the course, the student must be able to:
- Choose systems parameters, data layouts, and application designs for database systems and applications.
- Develop data-parallel analytics programs that make use of modern clusters and cloud offerings to scale up to very large workloads.
- Analyze the trade-offs between various approaches to large-scale data management and analytics, depending on efficiency, scalability, and latency needs.
- Choose the most appropriate existing systems architecture and technology for a task.
Teaching methods
Lectures, exercises, and practical work
Expected student activities
During the semester, the students are expected to:
- attend the lectures in order to ask questions and interact with the professor,
- attend the exercise sessions to solve and discuss exercises,
- solve practical homework assignments and/or finish a project during the semester,
- take the exams during the semester.
Assessment methods
Homework, written examinations, project. Continuous assessment.
Supervision
Others: Office hours by appointment
Resources
Bibliography
Relevant resources (textbook chapters, articles, and videos) are posted on the course Moodle page.
In the study plans
- Semester: Spring
- Exam form: During the semester (summer session)
- Subject examined: Systems for data science
- Lecture: 2 hours per week x 14 weeks
- Exercises: 2 hours per week x 14 weeks
- Project: 2 hours per week x 14 weeks