CS-423 / 6 credits

Teacher: Aberer Karl

Language: English


Summary

This course introduces the foundations of information retrieval, data mining, and knowledge bases, which underpin today's Web-based distributed information systems.

Content

 

Information Retrieval

  1. Information Retrieval - Introduction
  2. Text-Based Information Retrieval (Boolean, vector space, probabilistic)
  3. Inverted Files
  4. Distributed Retrieval
  5. Query Expansion
  6. Embedding Models (LSI, word2vec)
  7. Link-Based Ranking

Mining Unstructured Data

  1. Document Classification (k-NN, Naive Bayes, fastText, Transformer models)
  2. Recommender Systems (collaborative filtering, matrix factorization)
  3. Mining Social Graphs (modularity clustering, Girvan-Newman)

Knowledge Bases

  1. Semantic Web
  2. Keyphrase Extraction
  3. Named Entity Recognition
  4. Information Extraction
  5. Taxonomy Induction
  6. Entity Disambiguation
  7. Label Propagation
  8. Link Prediction

 

Learning Prerequisites

Recommended courses

Introductory courses to databases and machine learning are helpful, but not required.

Programming skills in Python are helpful, but not required.

Learning Outcomes

By the end of the course, the student must be able to:

  • Characterize the main tasks performed by information systems, namely data, information, and knowledge management
  • Apply collaborative information management models, such as crowdsourcing, recommender systems, and social networks
  • Apply knowledge models, their representation through Web standards, and algorithms for storing and processing semi-structured data
  • Apply fundamental models and techniques of text retrieval and their use in Web search engines
  • Apply the main categories of data mining techniques (local rules, predictive models, and descriptive models) and master representative algorithms for each category

Teaching methods

Ex cathedra + programming projects (Python)

Assessment methods

60% Continuous evaluations of projects with bonus system during the semester
40% Final written exam (180 min) during exam session

Resources

Moodle Link

In the programs

  • Semester: Fall
  • Exam form: Written (winter session)
  • Subject examined: Distributed information systems
  • Courses: 2 Hour(s) per week x 14 weeks
  • Exercises: 1 Hour(s) per week x 14 weeks
  • Project: 1 Hour(s) per week x 14 weeks
  • Type: optional

Reference week

Thursday, 10h - 12h: Lecture CM2

Thursday, 12h - 13h: Exercise, TP CM2, CM13, CM1105

Thursday, 13h - 14h: Project, labs, other CM2, CM1105
