
Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child-computer Interaction

  • Identifying data

    Identifier: imarina:9332286
    Authors:
    Qayyum, A; Razzak, I; Tanveer, M; Mazher, M
    Abstract:
    A fascinating challenge in human-robot interaction is giving robots the human capability of emotion recognition, with the aim of making the interaction natural, genuine and intuitive. To achieve natural interaction in affective robots, human-machine interfaces and autonomous vehicles, understanding users' attitudes and opinions is very important, and it provides a practical, feasible path to connecting machine and human. A multimodal interface that includes voice along with facial expression can convey a far wider range of nuanced emotions than a purely textual interface, and offers great value for improving the intelligence of effective communication. Interfaces that fail to express, or that ignore, user emotions may significantly impair performance and risk being perceived as cold, socially inept, untrustworthy and incompetent. To equip a child well for life, we need to help children identify their feelings, manage them well, and express their needs in healthy, respectful and direct ways; early identification of emotional deficits can help prevent low social functioning in children. In this work, we analyze children's spontaneous behavior using multimodal facial expression and voice signals, presenting a multimodal transformer-based late feature fusion framework for facial behavior analysis in children: contextualized representations are extracted from the RGB video sequence and the voice signal, and pairwise concatenations of these contextualized representations, combined through a cross-feature fusion technique, are then used to predict the user's emotion. To validate the performance of the proposed framework, we performed experiments with different pairwise concatenations of contextualized representations, which showed significantly better performance than state-of-the-art methods. In addition, we apply t-distributed stochastic neighbor embedding (t-SNE) visualization to inspect the discriminative features in a lower-dimensional space, and probability density estimation to visualize the prediction capability of the proposed model.
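
    A minimal PyTorch sketch of the late feature fusion described in the abstract: per-modality transformer encoders whose contextualized representations are concatenated pairwise and passed to a classifier. This is an illustration only, not the authors' implementation; the module names, feature dimension, sequence length and number of emotion classes are all assumptions.

      import torch
      import torch.nn as nn

      class PairwiseFusionClassifier(nn.Module):
          def __init__(self, dim=256, heads=4, layers=2, num_classes=7):
              super().__init__()
              template = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                    batch_first=True)
              # nn.TransformerEncoder deep-copies the template layer, so the two
              # modality encoders below have independent weights.
              self.video_enc = nn.TransformerEncoder(template, num_layers=layers)
              self.audio_enc = nn.TransformerEncoder(template, num_layers=layers)
              # Classifier over the pairwise-concatenated (late-fused) features.
              self.head = nn.Linear(2 * dim, num_classes)

          def forward(self, video_feats, audio_feats):
              # Inputs: (batch, seq_len, dim) pre-extracted per-frame features.
              v = self.video_enc(video_feats).mean(dim=1)  # contextualized video repr.
              a = self.audio_enc(audio_feats).mean(dim=1)  # contextualized audio repr.
              fused = torch.cat([v, a], dim=-1)            # pairwise concatenation
              return self.head(fused)                      # emotion logits

      model = PairwiseFusionClassifier()
      logits = model(torch.randn(8, 32, 256), torch.randn(8, 32, 256))
      print(logits.shape)  # torch.Size([8, 7])

    The t-SNE inspection mentioned at the end of the abstract would correspond to something like sklearn.manifold.TSNE(n_components=2).fit_transform(fused_features) applied to these fused representations.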
  • Other details:

    Authors as listed in the article: Qayyum, A; Razzak, I; Tanveer, M; Mazher, M
    Department: Computer Engineering and Mathematics
    URV author(s): Mazher, Moona
    Keywords: Text tagging; Recognition; Phrases; Neural networks; Gaze detection; Emotion; Datasets
    Subject areas: Hardware and architecture; Engenharias IV; Computer science, theory & methods; Computer science, software engineering; Computer science, information systems; Computer networks and communications; Ciência da computação; Artes
    Usage licence: https://creativecommons.org/licenses/by/3.0/es/
    Author's e-mail address: moona.mazher@estudiants.urv.cat
    Author identifier: 0000-0003-4444-5776
    Record creation date: 2024-08-03
    Deposited article version: info:eu-repo/semantics/acceptedVersion
    Link to the original source: https://dl.acm.org/doi/10.1145/3539577
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article reference in the original source: ACM Transactions on Multimedia Computing, Communications, and Applications. 20(2):
    Item reference in APA style: Qayyum, A., Razzak, I., Tanveer, M., & Mazher, M. (2024). Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child-computer Interaction. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(2). DOI: 10.1145/3539577
    Article DOI: 10.1145/3539577
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2024
    Publication type: Journal Publications
  • Keywords:

    Text tagging
    Recognition
    Phrases
    Neural networks
    Gaze detection
    Emotion
    Datasets
  • Documents:
