MGT-499 / 4 credits

Teacher(s): Chiarotti Edoardo, Gallea Quentin, Thurm Boris

Language: English

Withdrawal: It is not allowed to withdraw from this subject after the registration deadline.

## Summary

This class provides a hands-on introduction to statistics and data science, with a focus on causal inference, applications to sustainability issues using Python, and dissemination of scientific results to a broad audience.

## Content

• Exploratory Data Analysis: Data acquisition and cleaning; Descriptive Statistics; Data Visualization; Data Ethics, Bias, and Fairness
• Causal Inference: Linear Regression; Fixed effects; Non-linear Regression; Randomized Control Trial; Regression Discontinuity Design; Difference-in-Differences; Instrumental Variables
• Applications in Python to sustainability issues

## Keywords

Data Science, Statistics, Econometrics, Causal Inference, Regression, Python, Sustainability, Scientific dissemination

## Recommended courses

• Analysis
• Algebra
• Probability and statistics
• Econometrics
• Introduction to Python

## Important concepts to start the course

• Basic probability and statistics knowledge (random variable, expectation, mean, conditional and joint distribution, independence, Bayes' rule, central limit theorem)
• Basic linear algebra (matrix multiplication, system of linear equations)
• Multivariate calculus (derivative w.r.t. vector and matrix variables)
• Basic programming skills (labs will use Python, basic knowledge will help)

## Learning Outcomes

By the end of the course, the student must be able to:

• Describe the main pitfalls behind data analysis
• Investigate dataset, and the problems and bias behind the data
• Explore and clean datasets
• Visualize datasets
• Decide which statistical/econometrics methods to use for a given problem
• Implement these methods in Python
• Estimate model parameters from empirical observations and confidence bounds
• Test hypothesis
• Synthesize their findings to a broad audience

## Transversal skills

• Plan and carry out activities in a way which makes optimal use of available time and other resources.
• Demonstrate the capacity for critical thinking
• Use a work methodology appropriate to the task.
• Access and evaluate appropriate sources of information.

## Teaching methods

• Lectures
• Exercice sessions: coding lab sessions
• Group projects

## Expected student activities

The students are expected to:

• attend and actively participate in lectures and lab sessions
• work on the weekly theory and coding exercises
• collaborate on group projects making use of the theory learned during lectures and code developed during lab sessions (graded)

## Assessment methods

The evaluation consists of one group project. Students will have to apply the data science and econometrics techniques learned during the class to causally answer a question related to sustainability. The grade is made of 2 deliverables:

• Mid-term project (20%): Students will have to submit a short deliverable to motivate their research question, present their exploratory data analysis, and discuss the potential issues they will face in their causal analysis;
• Final report (80%): Students will have to write a short article to present their work, targeting a broad audience.

## Supervision

 Office hours No Assistants Yes Forum Yes

No

## Bibliography

• [not mandatory] Mostly Harmless Econometrics, by Angrist, Josh and Steve Pischke (2008), Princeton University Press, EPFL library
• [not mandatory] Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas (2016), O'REILLY, EPFL library
• [not mandatory] Introduction to Computation and Programming Using Python, Revised And Expanded Edition, by John V. Guttag (2013), The MIT Press, MIT Press
• [not mandatory] A Primier on Scientific Programming with Python, by Hans Petter Langtangen (2016), Springer, Springer Link

## Notes/Handbook

Slides will be made available on a Moodle page. Notebooks will be made available in a GitHub repository.

## Prerequisite for

Data Science and Machine Learning (MGT-502)

## In the programs

• Semester: Fall
• Number of places: 50
• Exam form: During the semester (winter session)
• Subject examined: Statistics and data science
• Lecture: 2 Hour(s) per week x 14 weeks
• Exercises: 2 Hour(s) per week x 14 weeks
• Type: mandatory

## Reference week

Wednesday, 13h - 15h: Lecture SG0213

Wednesday, 15h - 17h: Exercise, TP SG0213

## Related courses

Results from graphsearch.epfl.ch.