Video-based action recognition is the task of analyzing a video to identify the actions taking place in it. This computer vision task has practical applications in many fields, such as video surveillance, human-computer interaction, and healthcare assistance. A video contains both spatial and temporal information, which together reveal more about the action taking place than a single still image can. However, extracting this information from video is challenging due to factors such as camera motion and scale variation. To overcome these difficulties, researchers have focused on skeleton-based action recognition, which considers the locations of the key joints of the human skeleton. Interest in this type of problem has increased with the introduction of low-cost cameras, such as Kinect, that can quickly provide such information. Several architectures for human action recognition are based on recurrent neural networks and graph-based convolutional neural networks. Recently, approaches that use attention-based mechanisms have been introduced. Transformer-based architectures represent the state of the art in sequence modeling tasks such as machine translation and language understanding, and they have also been applied to human action recognition.
The goal of this thesis is to study recent advances in the field of human action recognition, with particular emphasis on the use of Transformer architectures and their application to practical problems.
- Initial research on the history of human action recognition and its state-of-the-art models;
- Research on Transformers and how they can be applied to multi-modal contexts;
- Implementation of state-of-the-art models for human action recognition;
- Application of Transformer-based human action recognition algorithms to a real problem.
Who we’re looking for
Students who are about to obtain their Master's degree in computer science, computer engineering, mechatronic engineering, mathematical engineering, mathematics, physics, or informatics.
- Proficiency in at least one programming language (Python or C++), with Python preferred;
- Basic knowledge of machine learning and deep learning algorithms (CNN, RNN);
- Basic knowledge of at least one of these deep learning frameworks (TensorFlow, PyTorch);
- Good mathematical and analytical skills.
Duration of this project: 6-8 months