Coursebooks 2017-2018


Data Analysis for management research Module II


Lecturer(s) :

Younge Kenneth




19, 21, 26, 28 Sept; 3, 5, 10 Oct; 12, 19, 22 Dec 2017, from 09:15 to 12:00, except on Dec. 19 from 08:15 to 12:00


The objective of the course is to introduce doctoral students to computational methods for data-driven management research.


The course complements courses in statistics and econometrics with a bottom-up, programmatic understanding of how to acquire, store, manipulate, measure, plot, analyze and classify data for research ends. It is intended to be hands-on: every student is expected to apply the tools, methods, and ideas from the course to solve real research problems in their own domain. The course begins with an accelerated series of lectures to cover the methods, and then students use those methods on an extended project.

The course requires students to program in Python. The basics of the Python language and the Python scientific stack are reviewed during the first week of class, but students who are unfamiliar with Python should take an online Python programming course to learn the basics. We recommend completing it before this course starts, but students may also take it any time before the 4th lecture.

The instructor understands that students will enter the course with (perhaps vastly) different levels of computer programming experience. Each student will therefore be evaluated by the professor on their own dedication and progress within the course (i.e., 'within-student', 'across time'), and not necessarily in comparison with fellow classmates.


Data Processing, Visualization, Cloud Computing, Data Analysis, Text Analysis, Simulation, Machine Learning. 

Assessment methods

There are no written exams for this course. Students will be evaluated based on the components below.

Class Participation (24 points)

Physical attendance and participation in the course are mandatory. If you miss a class, you will be required to complete a makeup mini-project on the missed topic. Failure to do so to the satisfaction of the instructor will result in losing 3 points for the session, plus a penalty of another 3 points for the absence, for a total loss of 6 points from the course. Please come to class with questions about the previous material so we can work on it together. You are also expected to follow along and participate in the class.

Exercises (24 points)

The professor will provide you with a list of online resources and exercises at the end of each session. You are expected to engage in self-directed learning using the resources and to work on the exercises after each session, and before the next session. Some of the topics/exercises may be too easy for you, and other topics/exercises may be too hard; you should select appropriate ones based on your experience level. In general, however, you should dedicate at least 3 hours after each session to reviewing the material and completing exercises. Expectations for your work will be adjusted based on your proficiency as a programmer and familiarity with the concepts covered in the lecture.

You should start a new Jupyter Notebook for each session's exercises and commit it to the git repository assigned to you. I will provide more instructions and help on how to do this in the first class. Exercises are due before the start of the next class. No late exercises will be accepted.

Name exercise notebooks as: 1-lastname.ipynb, 2-lastname.ipynb, etc.
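The naming convention above can be generated programmatically. The following is a minimal sketch only: the function name `notebook_name` is hypothetical, and lowercasing the last name and dropping non-alphanumeric characters are assumptions that the syllabus does not state.

```python
import re


def notebook_name(session: int, lastname: str) -> str:
    """Return a file name of the form '<session>-<lastname>.ipynb'."""
    if session < 1:
        raise ValueError("session numbers start at 1")
    # Assumption: lowercase the name and keep only letters and digits,
    # so the file name is safe on any file system.
    cleaned = re.sub(r"[^a-z0-9]", "", lastname.lower())
    if not cleaned:
        raise ValueError("lastname must contain at least one letter or digit")
    return f"{session}-{cleaned}.ipynb"


if __name__ == "__main__":
    # Print the names for the first three sessions.
    for session in range(1, 4):
        print(notebook_name(session, "Smith"))
```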

Final Project (38 points)

The main deliverable for the course will be a research project of your choosing. The project must be of substantial academic interest, and it should be applicable to your field of study and research interests. Ideally, the experience you gain on the project will help you advance your dissertation. Projects must be committed to your git repo for evaluation. The repository should have a README.MD file (in markdown) to explain the project and how your program/data fit together. The README.MD file will be the final 'report' for your project, so take it seriously. All code must be fully documented and executable, with links to real data. If the data is large, then the data should be stored outside of the repository and your code should use a tool such as wget or curl to download the data into the local directory; alternatively, you can host the data on a SQL server, BigQuery, etc. In any event, your program(s) must be able to run remotely, immediately and on their own, so they can be fully evaluated. If projects are not ready for submission by the due date, then you may continue to work on the project with a penalty of 3 points per day.
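The download step described above can also be done from within Python instead of calling wget or curl. The following is a minimal sketch using only the standard library; the function name `fetch_data` and the default `data` directory are hypothetical choices for illustration, not requirements of the course.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_data(url: str, dest_dir: str = "data") -> Path:
    """Download `url` into `dest_dir` unless the file is already there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Infer the local file name from the last component of the URL path.
    filename = Path(urlparse(url).path).name
    if not filename:
        raise ValueError(f"cannot infer a file name from {url!r}")

    target = dest / filename
    if not target.exists():
        # Stream the response to disk so large files are not held in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

A main program could call, for example, `fetch_data("https://example.com/sample.csv")` (a placeholder URL) before loading the file. Because an existing file is not downloaded again, repeated runs stay fast.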

Final Presentation (15 points)

As a doctoral student, it is important to learn how to summarize and communicate your findings. An important component of the course is therefore to use the tools and methods covered in the course to arrive at real, substantive findings, and then to present those findings to the class. You should prepare a 15-minute final presentation for your project, and you should then be prepared to field 5 to 10 additional minutes of Q&A (depending on the size of the class). Q&A will come from the professor and other students; we will grill you on the data, tools, and methods that you used in your project. Excellent presentations will anticipate questions and have appropriate 'backup slides' to answer probable questions. In general, I recommend that you spend a substantial amount of time on 'descriptive analysis' to demonstrate that you really understand the nature and limitations of your data. Powerful descriptive and graphical evidence is a hallmark of students well-trained in Data Science.
