MGT-499 / 4 credits
Withdrawal: Students may not withdraw from this course after the registration deadline.
Remark: The course is given on the UNIL campus.
This class provides a hands-on introduction to statistics and data science, with a focus on causal inference and applications to sustainability issues using Python.
- Exploratory Data Analysis: Data acquisition and cleaning; Descriptive Statistics; Data Visualization; Data Ethics, Bias, and Fairness
- Causal Inference: Linear Regression; Fixed Effects; Non-linear Regression; Randomized Controlled Trials; Regression Discontinuity Design; Difference-in-Differences; Instrumental Variables
- Applications in Python
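To give a flavor of the Python applications, the sketch below illustrates one of the listed methods, Difference-in-Differences, in its simplest two-group, two-period form. It is only an illustration under stated assumptions: the toy data and the helper names `group_mean` and `diff_in_diff` are hypothetical and are not taken from the course material, and the estimate has a causal reading only if the treated and control groups would have followed parallel trends without the treatment.

```python
from statistics import mean

# Hypothetical toy data: (group, period, outcome). Values are made up for illustration.
observations = [
    ("treated", "before", 10.0), ("treated", "before", 12.0),
    ("treated", "after", 17.0), ("treated", "after", 19.0),
    ("control", "before", 8.0), ("control", "before", 10.0),
    ("control", "after", 11.0), ("control", "after", 13.0),
]


def group_mean(rows, group, period):
    """Average outcome for one group in one period."""
    return mean(y for g, p, y in rows if g == group and p == period)


def diff_in_diff(rows):
    """Change in the treated group minus change in the control group."""
    treated_change = (
        group_mean(rows, "treated", "after") - group_mean(rows, "treated", "before")
    )
    control_change = (
        group_mean(rows, "control", "after") - group_mean(rows, "control", "before")
    )
    return treated_change - control_change


did_estimate = diff_in_diff(observations)
print(did_estimate)  # (18 - 11) - (12 - 9) = 4.0
```

In practice the same quantity is usually obtained as the coefficient on an interaction term in a linear regression, which also yields standard errors; the group-mean version above only shows the arithmetic behind the estimate.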
Data Science, Statistics, Econometrics, Causal Inference, Regression, Python
- Probability and statistics
- Introduction to Python
Important concepts to start the course
- Basic probability and statistics knowledge (random variable, expectation, mean, conditional and joint distribution, independence, Bayes' rule, central limit theorem)
- Basic linear algebra (matrix multiplication, systems of linear equations)
- Multivariate calculus (derivative w.r.t. vector and matrix variables)
- Basic programming skills (labs use Python; basic knowledge will help)
By the end of the course, the student must be able to:
- Describe the main pitfalls behind data analysis
- Investigate a dataset and the problems and biases behind the data
- Explore and clean datasets
- Visualize datasets
- Decide which statistical or econometric methods to use for a given problem
- Implement these methods in Python
- Estimate model parameters and confidence bounds from empirical observations
- Test hypotheses
- Plan and carry out activities in a way which makes optimal use of available time and other resources.
- Demonstrate the capacity for critical thinking
- Use a work methodology appropriate to the task.
- Access and evaluate appropriate sources of information.
- Exercise sessions: coding lab sessions
- Group projects
Expected student activities
The students are expected to:
- attend and actively participate in lectures and lab sessions
- work on the weekly theory and coding exercises
- complete assignments (graded)
- collaborate on group projects making use of the theory learned during lectures and code developed during lab sessions (graded)
- Assignments: 30% (individual)
- Group projects: 70%
Virtual desktop infrastructure (VDI)
- [not mandatory] Mostly Harmless Econometrics, by Joshua D. Angrist and Jörn-Steffen Pischke (2008), Princeton University Press, EPFL library
- [not mandatory] Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas (2016), O'Reilly, EPFL library
- [not mandatory] Introduction to Computation and Programming Using Python, Revised and Expanded Edition, by John V. Guttag (2013), The MIT Press, MIT Press
- [not mandatory] A Primer on Scientific Programming with Python, by Hans Petter Langtangen (2016), Springer, Springer Link
Library resources
- Mostly Harmless Econometrics / Angrist
- Python Data Science Handbook / VanderPlas
- Introduction to Computation and Programming Using Python / Guttag
- A Primer on Scientific Programming with Python / Langtangen
Slides will be made available on a Moodle page. Notebooks will be made available in a GitHub repository.
Data Science and Machine Learning (MGT-502)
In the study plans
- Semester: Fall
- Number of places: 50
- Exam form: During the semester (winter session)
- Subject examined: Statistics and data science
- Lecture: 2 hour(s) per week x 14 weeks
- Exercises: 2 hour(s) per week x 14 weeks
Reference week