PhD Position/Inria Grenoble

Multimodal perception of social and pedagogical classroom interactions using a privacy-safe non-individual approach


Keywords: Multimodal Perception, Deep Learning, Affective Computing, Behavioral Computing, Teaching Analytics.

Job application URL


Deadline for application: 2019-04-28


Job start: 2019-10-01


Summary: Recent research advances in multimodal perception and interaction with humans have led to major achievements in multiple domains such as computer vision (e.g., detecting faces or people in images) and speech recognition (e.g., large-vocabulary, multi-speaker recognition on smartphones), mainly thanks to Deep Learning. These achievements are now impacting other research domains such as affective computing (emotion or sentiment analysis) and behavioral modelling for human behavior detection and prediction.

In the Teaching Lab project (Idex Formation grant, Univ. Grenoble Alpes), we aim at developing a smart classroom pervasive system providing delayed feedback on how teachers manage their instruction. The goal is to help beginning teachers increase their awareness of the class while teaching. To do so, we need to capture cues for analyzing teacher–students relationships: current teacher activity, current teaching episode, class engagement, students' attention or engagement, class ambiance, etc. These cues will be computed using signal processing and machine learning techniques, and we will rely on cognitive science background to interpret them and to draw a multimodal model of classroom interactions.

However, some privacy and ethical issues arise from this analysis. The goal of the system is to analyze the underlying teaching processes (e.g., teacher–students interaction, misbehavior management, …), not to monitor individual behaviors per se, even inappropriate ones. The multimodal perception system will thus monitor the whole classroom at a glance to help teachers enhance their instruction afterwards. Hence this system is not intended to detect and track inattentive or disruptive students.

Most current state-of-the-art systems, notably Deep Learning systems, focus on humans as individuals, i.e., each individual is processed as one entity. For instance, to detect whether a photo carries a mood of happiness as a whole, systems try to accurately detect faces and then smiles on those faces. Averaging the number of smiles in the photo leads to the decision: happy photo or not. Starting from these individual-based systems, we aim at creating and testing new multimodal models to capture global moods from whole-classroom multi-view footage. The underlying idea is to move away from individual analysis and to compute global scores for the class instead of relying on the sum of individual detections.
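To make the contrast concrete, the individual-based "happy photo" pipeline described above can be sketched as follows. This is a minimal illustration, not part of the project: detect_faces and has_smile are hypothetical placeholders standing in for real detectors (e.g., a CNN face detector and a per-face smile classifier) and operate here on toy data.

```python
def detect_faces(photo):
    # Placeholder: a real system would run a face detector on the image
    # and return one entry per detected face.
    return photo["faces"]

def has_smile(face):
    # Placeholder: a real system would run a smile classifier on the
    # cropped face region.
    return face["smiling"]

def photo_is_happy(photo, threshold=0.5):
    """Individual-based decision: average per-face smile detections,
    then threshold the ratio to label the whole photo."""
    faces = detect_faces(photo)
    if not faces:
        return False
    smile_ratio = sum(has_smile(f) for f in faces) / len(faces)
    return smile_ratio >= threshold

# Toy example: three faces, two smiling -> ratio 2/3 >= 0.5 -> "happy".
photo = {"faces": [{"smiling": True}, {"smiling": True}, {"smiling": False}]}
print(photo_is_happy(photo))  # True
```

The thesis instead targets a global score computed directly from the whole scene, without this per-individual detection and aggregation step.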

The research question addressed in this thesis is the following: can we extract global cues about instructional episodes (such as engagement, attentional level, etc.) from still images or video sequences coupled with acoustic features? Ways to address this research question remain open and will be explored in the thesis (deep fusion, reinforcement learning, Generative Adversarial Networks, …). This work will be evaluated along two axes: standard performance evaluations for perception systems, and the pedagogical benefits of the generated feedback to teachers.

Context: The PhD thesis will be co-advised by Dominique Vaufreydaz (Pervasive team, LIG/Inria, Univ. Grenoble Alpes) and Philippe Dessus (LaRAC laboratory, Univ. Grenoble Alpes). The Pervasive team has a long background in computer vision, multimodal perception, multimodal interaction and affective computing. The LaRAC investigates learning and instruction at all school levels in an interdisciplinary way (cognitive science, social psychology).


Last updated: 15 April 2019 - 16:50