Lennart Eing M.Sc.

Research Assistant
Chair for Human-Centered Artificial Intelligence
Phone: +49 821 598 2346
Email:
Room: 2039 (N)
Open hours: upon request
Address: Universitätsstraße 6a, 86159 Augsburg

Research Interests

  • Multimodal Neural Networks
  • Self-Supervised Learning
  • Deep Learning
  • Few-Shot Learning

Theses

The given topics are not fixed. If you want to bring your own ideas or adapt the given ones, we can talk. Topics are listed in English to accommodate international students; theses can be submitted in either German or English.

 

Open Topics:

  • [BA/MA] Finetuning HaMeR using Self-Intersection Constraints: HaMeR is an approach for reconstructing 3D hand meshes from monocular (image) input. It uses a series of pre-trained deep learning models to predict the transformation parameters of the MANO hand model. Training combines loss functions that enforce a correct reconstruction of the 3D mesh and, where no such annotations are available, a correct projection of the predicted hand shape onto a 2D plane.
    The reconstructed hand mesh can, however, intersect itself. As this is physically impossible, you will fine-tune the existing HaMeR model with a self-intersection constraint, forcing it to learn hand shapes that are physically plausible. While HaMeR is already very good at predicting hand reconstructions, this could provide an additional performance boost to the system.

    Resources:
     
    • HaMeR: https://geopavlakos.github.io/hamer/
    • MANO: https://mano.is.tue.mpg.de/
    • Self-Intersection Constraints: https://arxiv.org/pdf/1904.05866
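A crude proxy for such a constraint (not the distance-field approach from the linked paper, just a hypothetical sketch) is a hinge penalty on vertex pairs from two hand parts that come closer than a small margin; the index groups and margin below are illustrative assumptions:

```python
import numpy as np

def proximity_penalty(vertices, group_a, group_b, margin=0.005):
    """Quadratic hinge penalty on cross-part vertex pairs closer than `margin`.

    vertices: (V, 3) array of mesh vertex positions (metres).
    group_a, group_b: index lists selecting vertices of two hand parts
    (e.g. thumb vs. index finger -- the grouping itself is hypothetical).
    """
    a = vertices[group_a]  # (Na, 3)
    b = vertices[group_b]  # (Nb, 3)
    # pairwise distances between the two vertex groups
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (Na, Nb)
    # zero once the two parts are at least `margin` apart
    return np.sum(np.maximum(margin - d, 0.0) ** 2)
```

During fine-tuning, such a term would simply be added to HaMeR's existing losses, e.g. `total = recon_loss + lam * proximity_penalty(...)`, with `lam` a weighting hyperparameter to be chosen.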
       
  • [BA/MA] Finetuning HaMeR for Video using Volumetric Constraints: HaMeR is an approach for reconstructing 3D hand meshes from monocular (image) input. It uses a series of pre-trained deep learning models to predict the transformation parameters of the MANO hand model. Training combines loss functions that enforce a correct reconstruction of the 3D mesh and, where no such annotations are available, a correct projection of the predicted hand shape onto a 2D plane.
    While HaMeR shows good results even on video (it is trained only on images), this can be improved. Your goal in this work will be to fine-tune HaMeR on video with an additional volumetric constraint, as the volume of a hand should not change during short segments of video.

    Resources:
     
    • HaMeR: https://geopavlakos.github.io/hamer/
    • MANO: https://mano.is.tue.mpg.de
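The volumetric constraint could be sketched as follows (a minimal NumPy illustration, not HaMeR code; actual fine-tuning would operate on batched tensors inside the training framework). The volume of a closed mesh follows from the divergence theorem, and the temporal loss is the variance of per-frame volumes:

```python
import numpy as np

def mesh_volume(vertices, faces):
    """Signed volume of a closed triangle mesh via the divergence theorem.

    vertices: (V, 3) array; faces: (F, 3) integer array with outward
    (counter-clockwise) winding. Volume = 1/6 * sum v0 . (v1 x v2).
    """
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    return np.sum(np.einsum('ij,ij->i', v0, np.cross(v1, v2))) / 6.0

def volume_consistency_loss(frame_vertices, faces):
    """Penalize changes of hand volume across the frames of a short clip.

    frame_vertices: (T, V, 3) -- per-frame MANO mesh vertices.
    Returns the variance of the per-frame volumes (zero iff constant).
    """
    vols = np.array([mesh_volume(v, faces) for v in frame_vertices])
    return np.var(vols)
```

Since MANO's mesh topology is fixed, `faces` stays constant and only the predicted vertices vary per frame.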
       
  • [BA/MA] Finetuning HaMeR for Video using Physiological Constraints: HaMeR is an approach for reconstructing 3D hand meshes from monocular (image) input. It uses a series of pre-trained deep learning models to predict the transformation parameters of the MANO hand model. Training combines loss functions that enforce a correct reconstruction of the 3D mesh and, where no such annotations are available, a correct projection of the predicted hand shape onto a 2D plane.
    While HaMeR shows good results even on video (it is trained only on images), this can be improved. Your goal in this work will be to fine-tune HaMeR on video with an additional physiological constraint, i.e. that the lengths of the hand bones are not allowed to change during short segments of video.

    Resources:
     
    • HaMeR: https://geopavlakos.github.io/hamer/
    • MANO: https://mano.is.tue.mpg.de
       
  • [BA/MA/Seminar] Isolated Sign Language Recognition using MANO pose and shape parameters: Isolated Sign Language Recognition is a classification task where single words of a given sign language are predicted from video. One of the main problems when working with sign language is that only little training data is available. To counteract this, pre-processing steps are necessary to reduce a given video to a smaller set of relevant information. While hands are by far not the only relevant feature of sign language (face, body, mouth movements and other articulators are also important), they do carry crucial information for most words.
    MANO is a parametric hand model, i.e. a 3D hand mesh model with a transformation function that maps a "base" hand and a set of parameters to a hand mesh. HaMeR is a deep neural network that predicts these transformation parameters from an input image for 3D hand mesh reconstruction.
    In this work you will implement and evaluate a system that performs Isolated Sign Language Recognition on MANO parameters extracted using HaMeR.

    Resources:
     
    • HaMeR: https://geopavlakos.github.io/hamer/
    • MANO: https://mano.is.tue.mpg.de
    • Isolated Sign Language Recognition: https://tinyurl.com/mrxptkvs
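A minimal baseline for the classification step could look like this (a sketch with hypothetical shapes; real parameter sequences would come from HaMeR output, and the weights would be trained): mean-pool the per-frame MANO parameter vectors over time and apply a linear softmax classifier:

```python
import numpy as np

def classify_sequence(params, W, b):
    """Classify one clip from its per-frame MANO parameters.

    params: (T, D) per-frame pose/shape parameter vectors of one clip.
    W: (D, C) weights and b: (C,) bias for C sign classes (to be trained).
    Returns a (C,) probability vector over classes.
    """
    pooled = params.mean(axis=0)       # temporal mean pooling -> (D,)
    logits = pooled @ W + b            # linear classifier
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()
```

A real system would likely replace mean pooling with a sequence model (e.g. a recurrent network or transformer), but this fixes the input/output contract of the task.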
       
  • [BA/MA/Seminar] Isolated Sign Language Recognition using SMPL pose and shape parameters: Isolated Sign Language Recognition is a classification task where single words of a given sign language are predicted from video. One of the main problems when working with sign language is that only little training data is available. To counteract this, pre-processing steps are necessary to reduce a given video to a smaller set of relevant information. While the body is by far not the only relevant feature of sign language (face, hands, mouth movements and other articulators are also important), it does carry crucial information for most words.
    SMPL is a parametric body model, i.e. a 3D body mesh model with a transformation function that maps a "base" body and a set of parameters to a body mesh. SMPLer-X is a system that predicts these transformation parameters from an input image for 3D body mesh reconstruction.
    In this work you will implement and evaluate a system that performs Isolated Sign Language Recognition on SMPL parameters extracted using SMPLer-X.

    Resources:
     
    • SMPLer-X: https://github.com/SMPLCap/SMPLer-X
    • SMPL: https://smpl.is.tue.mpg.de/
    • Isolated Sign Language Recognition: https://tinyurl.com/mrxptkvs
       
  • [BA/MA] Using Sapiens Features and Attentive Probing for Isolated Sign Language Recognition: Sapiens is a foundation model specialized in understanding human physiology. It was trained on millions of images of humans in a self-supervised manner using a Masked Autoencoder (MAE) approach. It performs very well on keypoint estimation, semantic segmentation, depth estimation, and normal estimation tasks. The features output by the backbone model can, however, be used for a great number of other tasks.
    Attentive Probing is a relatively new approach to performing classification on a given set of frozen input features. It is hypothesised to outperform simpler methods such as linear probing in settings where the features contain high degrees of semantic information.
    Isolated Sign Language Recognition is a classification task where single words in a given sign language are predicted.
    In this work you will implement a system that uses Sapiens and Attentive Probing for the prediction task and compare your results against other existing systems.

    Resources:
     
    • Sapiens: https://www.meta.com/emerging-tech/codec-avatars/sapiens
    • Attentive Probing: https://arxiv.org/abs/2202.03026
    • Isolated Sign Language Recognition: https://tinyurl.com/mrxptkvs
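The attentive-probing idea can be sketched as a single cross-attention pooling step (all shapes and weight names here are hypothetical, not from the Sapiens or paper code): a learnable query attends to the frozen backbone tokens, and the pooled result goes through a linear classifier:

```python
import numpy as np

def attentive_probe(features, query, Wk, Wv, Wc, bc):
    """Single-head attentive probing over frozen backbone features.

    features: (N, D) frozen tokens (e.g. per-patch Sapiens features).
    query: (D,) learnable query vector; Wk, Wv: (D, D) key/value
    projections; Wc: (D, C) and bc: (C,) linear classifier for C classes.
    Only query/Wk/Wv/Wc/bc would be trained; the backbone stays frozen.
    """
    k = features @ Wk                          # (N, D) keys
    v = features @ Wv                          # (N, D) values
    scores = k @ query / np.sqrt(k.shape[-1])  # scaled dot-product scores
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                         # softmax weights (N,)
    pooled = attn @ v                          # attention-weighted pooling
    logits = pooled @ Wc + bc
    e = np.exp(logits - logits.max())
    return e / e.sum()                         # class probabilities (C,)
```

In contrast to linear probing (which averages tokens uniformly), the learned query lets the probe focus on the tokens most relevant to the sign being classified.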

In-Progress Topics:

  • [BA] Investigation into Visual Keyword Spotting for German Sign Language Video to German Subtitle Alignment: One of the main problems in automatic sign language translation research is data scarcity. One way to counteract this could be to use data acquired from public TV broadcasts, some of which provide their own sign language translations. However, these are often direct, unscripted translations from spoken German and frequently lack correct temporal alignment with the spoken German subtitles.
    Thus, the data cannot be used directly for training translation models.
    In this work we are investigating whether we can perform (some) alignment for a set of known words. To this end, we use the fact that some words in German Sign Language are accompanied by so-called "mouthings" that mirror the mouth movements of their spoken German counterparts. Using this information and a model that can tell whether a known word occurs in a given video clip, we hope to gain a deeper understanding of how to handle large-scale sign language data.

Completed Seminar and Thesis Papers:

  • [BA] Autonomous Re-Identification of Humans by Mobile Robots in an Indoor Environment: Robotic assistants are a current hot topic in human-centered artificial intelligence research. For these assistants to perform well in the wild, they need to be able to (re-)identify human operators and/or other humans they interact with.
    In this work a student implemented a set of ROS nodes, deployable on an NVIDIA Jetson machine, that detect and re-identify known persons in real time.
