Coursebooks

Randomness and information in biological data

BIO-369

Lecturer(s):

Bitbol Anne-Florence Raphaëlle

Language:

English

Summary

Biology is increasingly becoming a data science, as illustrated by the explosion of available genome sequences. This course aims to show how we can make sense of such data and harness it to understand biological processes quantitatively.

Content

Biology has increasingly become a data science. For instance, progress in sequencing has caused an explosion of available genome sequences. How can we make sense of such data and harness it to understand biological processes quantitatively? In many cases, biological data can be understood as being sampled from distributions of random variables. This course will first show the importance of randomness in biology. It will then introduce some ways of extracting information from biological data, as well as a simple method to approximately infer the probability distributions underlying such data. Some notions of statistics, information theory and statistical physics will be introduced, always with concrete applications to biological data in mind. Exercises and numerical projects will allow students to apply the methods to real biological data. The course will be organized as follows:

Part I: Randomness in biological processes and biological data

  1. Randomness and random variables. Luria-Delbrück experiment.
  2. Importance of thermal fluctuations at the cellular scale. Biopolymers, biomembranes, chemical reactions.
  3. Random walks. Population genetics. Examples of other applications (foraging, transcription factors binding to DNA).
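As an illustration of the kind of numerical exercise the random-walk topic lends itself to, here is a minimal sketch (not taken from the course material) of an unbiased one-dimensional random walk with unit steps. The function name `random_walk_msd` and its parameters are hypothetical. It estimates the mean squared displacement, which for this model is expected to grow linearly with the number of steps.

```python
import random


def random_walk_msd(n_steps, n_walkers, seed=0):
    """Estimate the mean squared displacement of unbiased 1D random walks.

    Each walker starts at 0 and takes n_steps steps of +1 or -1 with
    equal probability. The theoretical value is n_steps.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    total = 0
    for _ in range(n_walkers):
        position = 0
        for _ in range(n_steps):
            position += rng.choice((-1, 1))
        total += position ** 2
    return total / n_walkers
```

Calling `random_walk_msd(100, 2000)` should return a value close to 100, with statistical fluctuations that shrink as `n_walkers` increases.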

Part II: Extracting information from biological data

  1. Quantifying randomness in data: entropy. Link to thermodynamics. Entropy in neuroscience data.
  2. Quantifying statistical dependence: correlation; mutual information. Applications to neuroscience and to sequencing data.
  3. Inferring probability distributions from data: introduction to maximum entropy inference. Prediction of protein structure from multiple sequence alignments.
  4. Finding relevant dimensions in data: dimensionality reduction methods (linear and nonlinear).
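To make the notions of entropy and mutual information concrete, the following sketch computes simple plug-in (empirical frequency) estimates in bits. It is an illustration under stated assumptions, not the course's own method: the helper names `entropy` and `mutual_information` are hypothetical, and the three "alignment columns" are made-up toy data. Plug-in estimates are known to be biased for small samples, so this is only a starting point.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy, made-up data: three columns of a hypothetical sequence alignment.
col_a = list("AAGGAAGG")
col_b = list("TTCCTTCC")  # fully determined by col_a
col_c = list("ACACACAC")  # statistically independent of col_a in this sample

mi_dependent = mutual_information(col_a, col_b)
mi_independent = mutual_information(col_a, col_c)
```

In this toy example, `col_a` has an entropy of 1 bit, `mi_dependent` equals 1 bit because `col_b` is a deterministic function of `col_a`, and `mi_independent` equals 0 because the joint frequencies factorize.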

 

Keywords

Biological data, data science, sequencing data, neuroscience, population genetics, random variable, random walk, information theory, statistical physics, entropy, mutual information, inference, dimensionality reduction.

Learning Prerequisites

Required courses

Analysis; probability and statistics; linear algebra; general physics; programming.

Recommended courses

Introductory machine learning.

Learning Outcomes

By the end of the course, the student must be able to:

Teaching methods

Lectures, exercises, programming labs.

Assessment methods

Written final exam during the exam session, graded numerical mini-project.

Resources

Bibliography

Reference textbooks:

More advanced textbooks:

In the programs

Reference week

Under construction
 