MGT-499 / 4 credits

Teacher(s): Thurm Boris, Gallea Quentin, Chiarotti Edoardo

Language: English

Withdrawal: Withdrawing from this subject is not allowed after the registration deadline.

Remark: Course given on the UNIL campus.


Summary

This class provides a hands-on introduction to statistics and data science, with a focus on causal inference and applications to sustainability issues using Python.

Content

Keywords

Data Science, Statistics, Econometrics, Causal Inference, Regression, Python

Learning Prerequisites

Recommended courses

  • Analysis
  • Algebra
  • Probability and statistics
  • Econometrics
  • Introduction to Python

Important concepts to start the course

  • Basic probability and statistics knowledge (random variable, expectation, mean, conditional and joint distribution, independence, Bayes' rule, central limit theorem)
  • Basic linear algebra (matrix multiplication, system of linear equations)
  • Multivariate calculus (derivative w.r.t. vector and matrix variables)
  • Basic programming skills (labs will use Python, basic knowledge will help)

Learning Outcomes

By the end of the course, the student must be able to:

  • Describe the main pitfalls behind data analysis
  • Investigate datasets and the problems and biases behind the data
  • Explore and clean datasets
  • Visualize datasets
  • Decide which statistical or econometric methods to use for a given problem
  • Implement these methods in Python
  • Estimate model parameters and confidence bounds from empirical observations
  • Test hypotheses
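
As an illustration of the last three outcomes, here is a minimal sketch of estimating regression parameters with confidence bounds and a test statistic in Python. It is not taken from the course material: the data are synthetic, the function name ols_with_intervals is hypothetical, and the bounds assume homoskedastic errors and a normal approximation (z = 1.96).

```python
import numpy as np

def ols_with_intervals(x, y, z=1.96):
    """Fit y = a + b*x by ordinary least squares.

    Returns the coefficients, their standard errors, and approximate
    95% confidence bounds (normal approximation, z = 1.96).
    """
    X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimate
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                      # degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance under homoskedasticity
    se = np.sqrt(np.diag(cov))
    lower, upper = beta - z * se, beta + z * se
    return beta, se, lower, upper

# Synthetic data (illustrative only): true intercept 1.0, true slope 2.0
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=500)

beta, se, lower, upper = ols_with_intervals(x, y)
t_slope = beta[1] / se[1]   # test statistic for H0: slope = 0
print("estimates:", beta)
print("95% bounds:", list(zip(lower, upper)))
print("t-statistic for slope:", t_slope)
```

A slope t-statistic far above 1.96 in absolute value would lead to rejecting the null hypothesis of no linear relationship at the 5% level. In practice a library such as statsmodels reports the same quantities with exact t-distribution bounds.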

Transversal skills

  • Plan and carry out activities in a way which makes optimal use of available time and other resources.
  • Demonstrate the capacity for critical thinking
  • Use a work methodology appropriate to the task.
  • Access and evaluate appropriate sources of information.

Teaching methods

  • Lectures
  • Exercise sessions: coding lab sessions
  • Group projects

Expected student activities

The students are expected to:

  • attend and actively participate in lectures and lab sessions
  • work on the weekly theory and coding exercises
  • complete assignments (graded)
  • collaborate on group projects making use of the theory learned during lectures and code developed during lab sessions (graded)

Assessment methods

  • Assignments: 30% (individual)
  • Group projects: 70%

Supervision

Office hours: No
Assistants: Yes
Forum: No
Others: Slack channel

Resources

Virtual desktop infrastructure (VDI)

No

Bibliography

  • [not mandatory] Mostly Harmless Econometrics, by Joshua D. Angrist and Jörn-Steffen Pischke (2008), Princeton University Press, EPFL library
  • [not mandatory] Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas (2016), O'Reilly, EPFL library
  • [not mandatory] Introduction to Computation and Programming Using Python, Revised And Expanded Edition, by John V. Guttag (2013), The MIT Press, MIT Press
  • [not mandatory] A Primer on Scientific Programming with Python, by Hans Petter Langtangen (2016), Springer, Springer Link

Library resources

Notes/Handbook

Slides will be made available on a Moodle page. Notebooks will be made available in a GitHub repository.

Moodle Link

Prerequisite for

Data Science and Machine Learning (MGT-502)

In the programs

  • Semester: Fall
  • Number of places: 50
  • Exam form: During the semester (winter session)
  • Subject examined: Statistics and data science
  • Lecture: 2 hours per week x 14 weeks
  • Exercises: 2 hours per week x 14 weeks

Reference week

Wednesday, 12h - 14h: Lecture

Wednesday, 14h - 16h: Exercise, practical work (TP)