MGT-499 / 4 credits

Teacher(s): Thurm Boris, Gallea Quentin, Chiarotti Edoardo

Language: English

Withdrawal: It is not allowed to withdraw from this subject after the registration deadline.

Remark: Classes are held on the UNIL campus.

## Summary

This class provides a hands-on introduction to statistics and data science, with a focus on causal inference and applications to sustainability issues using Python.

## Keywords

Data Science, Statistics, Econometrics, Causal Inference, Regression, Python

## Recommended courses

• Analysis
• Algebra
• Probability and statistics
• Econometrics
• Introduction to Python

## Important concepts to start the course

• Basic probability and statistics knowledge (random variable, expectation, mean, conditional and joint distribution, independence, Bayes' rule, central limit theorem)
• Basic linear algebra (matrix multiplication, systems of linear equations)
• Multivariate calculus (derivatives with respect to vector and matrix variables)
• Basic programming skills (labs will use Python, basic knowledge will help)

## Learning Outcomes

By the end of the course, the student must be able to:

• Describe the main pitfalls of data analysis
• Investigate datasets and the problems and biases behind the data
• Explore and clean datasets
• Visualize datasets
• Decide which statistical/econometric methods to use for a given problem
• Implement these methods in Python
• Estimate model parameters and their confidence bounds from empirical observations
• Test hypotheses

## Transversal skills

• Plan and carry out activities in a way which makes optimal use of available time and other resources.
• Demonstrate the capacity for critical thinking
• Use a work methodology appropriate to the task.
• Access and evaluate appropriate sources of information.

## Teaching methods

• Lectures
• Exercise sessions: coding labs
• Group projects

## Expected student activities

The students are expected to:

• attend and actively participate in lectures and lab sessions
• work on the weekly theory and coding exercises
• collaborate on graded group projects that apply the theory learned in lectures and the code developed in lab sessions

## Assessment methods

• Assignments (individual): 30%
• Group projects: 70%

## Supervision

• Office hours: No
• Assistants: Yes
• Forum: No
• Others: Slack channel


## Bibliography

• [not mandatory] Mostly Harmless Econometrics, by Joshua D. Angrist and Jörn-Steffen Pischke (2008), Princeton University Press, EPFL library
• [not mandatory] Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas (2016), O'Reilly, EPFL library
• [not mandatory] Introduction to Computation and Programming Using Python, Revised and Expanded Edition, by John V. Guttag (2013), The MIT Press, MIT Press
• [not mandatory] A Primer on Scientific Programming with Python, by Hans Petter Langtangen (2016), Springer, Springer Link

## Notes/Handbook

Slides will be made available on a Moodle page. Notebooks will be made available in a GitHub repository.

## Prerequisite for

Data Science and Machine Learning (MGT-502)

## In the study plans

• Semester: Fall
• Number of places: 50
• Exam form: During the semester (winter session)
• Subject examined: Statistics and data science
• Lecture: 2 hour(s) per week x 14 weeks
• Exercises: 2 hour(s) per week x 14 weeks

## Reference week


Wednesday, 12:00-14:00: Lecture

Wednesday, 14:00-16:00: Exercises, lab