BIOENG-210 / 4 credits

Teacher: La Manno Gioele

Language: English


Summary

Processing, analyzing, and interpreting large biological datasets is an essential skill for modern biologists. This course aims to provide the theoretical foundations, analytical techniques, and software tools necessary to effectively manage and derive insights from complex biological data.

Content

Biological data types
Probability Distributions in Biological Data
Maximum Likelihood estimators for Univariate and Bivariate Distributions
Statistical tests
Multivariate data analysis
Multivariate Linear Regression
Principal Component Analysis (PCA)
Clustering
Priors, Bayes, and Maximum a Posteriori Estimation
Advanced Linear Regression
Logistic Regression and Classification
Model Selection
Resampling and Simulations
Time series and 1D Signal Processing
ND-image Processing
Generative Models and MCMC

Keywords

Biological data, statistical learning, probability distributions, maximum likelihood estimation, multivariate analysis, multivariate normal, PCA, SVD, multivariate regression, classification, Bayesian inference, time series, image processing, resampling methods, MCMC.

 

Learning Prerequisites

Required courses

Analysis, Linear Algebra, Probability and Statistics

 

Learning Outcomes

By the end of the course, the student must be able to:

  • Analyze multidimensional biological data
  • Apply regression and classification models
  • Perform model selection
  • Use PCA and interpret it
  • Visualize multivariate data
  • Explore different types of biological data
  • Implement basic routines of ML and MAP estimation
  • Choose the most appropriate model for a specific situation
  • Plan an analysis end-to-end
  • Interpret statistical tests and posterior distributions

Teaching methods

Lectures and exercises

Assessment methods

Written examination at the exam session (70%) and graded exercises (30%).

Supervision

Office hours: No
Assistants: Yes
Forum: Yes

Resources

Virtual desktop infrastructure (VDI)

No

Bibliography

Main:
"The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

"Methods of Multivariate Analysis" by Alvin C. Rencher

Other resources:

"Computer Age Statistical Inference" by Bradley Efron and Trevor Hastie

"Data-Driven Science and Engineering" by Steven L. Brunton and J. Nathan Kutz

 

Library resources

Notes/Handbook

Course notes in pdf format

In the programs

  • Semester: Spring
  • Exam form: Written (summer session)
  • Subject examined: Biological data science I: statistical learning
  • Courses: 2 hours per week x 14 weeks
  • Exercises: 2 hours per week x 14 weeks
  • Type: mandatory

Reference week

Monday, 13h - 15h: Lecture CM2

Thursday, 14h - 16h: Exercise, TP CE16
