
CoHAtNet: An integrated convolutional-transformer architecture with hybrid self-attention for end-to-end camera localization

  • Identifying data

    Identifier:  imarina:9463954
    Authors:  Hasan, H; Garcia, MA; Rashwan, H; Puig, D
    Abstract:
    Camera localization refers to the process of automatically determining the position and orientation of a camera within its 3D environment from the images it captures. Traditional camera localization methods often rely on Convolutional Neural Networks, which are effective at extracting local visual features but struggle to capture the long-range dependencies that are critical for accurate localization. In contrast, Transformer-based approaches model global contextual relationships well, although they often lack precision in fine-grained spatial representations. To bridge this gap, we introduce CoHAtNet, a novel Convolutional Hybrid-Attention Network that tightly integrates convolutional and self-attention mechanisms. Unlike previous hybrid models that stack convolutional and attention layers separately, CoHAtNet embeds local features extracted via Mobile Inverted Bottleneck Convolution blocks directly into the Value component of the self-attention mechanism of Transformers. This yields a hybrid self-attention block capable of dynamically capturing both local spatial detail and global semantic context within a single attention layer. Additionally, CoHAtNet enables modality-level fusion by processing RGB and depth data jointly in a unified pipeline, allowing the model to leverage complementary appearance and geometric cues throughout. Extensive evaluations have been conducted on two widely used camera localization datasets: 7-Scenes (RGB-D) and Cambridge Landmarks (RGB). Experimental results show that CoHAtNet achieves state-of-the-art performance in both translation and orientation accuracy. These results highlight the effectiveness of our hybrid design in challenging indoor and outdoor environments. This makes CoHAtNet a strong candidate for end-to-end camera localization tasks.
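
    Illustrative sketch (not taken from the article): the abstract states that convolutional features are embedded into the Value component of self-attention. The pure-Python example below shows one possible reading of that idea. Everything in it is an assumption made for illustration: the function names are hypothetical, a small depthwise 1-D convolution stands in for the Mobile Inverted Bottleneck Convolution block, and the convolutional features are assumed to replace the usual linear Value projection outright. The published architecture may differ in all of these respects.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def local_features(tokens, kernel=(0.25, 0.5, 0.25)):
    """Stand-in for an MBConv block (assumption): a depthwise 1-D
    convolution over neighbouring tokens with zero padding."""
    n, half = len(tokens), len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * len(tokens[0])
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence contribute zero
                row = [r + w * t for r, t in zip(row, tokens[j])]
        out.append(row)
    return out

def hybrid_self_attention(tokens, w_q, w_k):
    """Scaled dot-product attention whose Value matrix is the local
    (convolutional) feature map instead of a linear projection."""
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = local_features(tokens)            # local detail enters through V
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights    # global mixing of local features

# Four toy tokens with two channels each; identity projections keep it simple.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = hybrid_self_attention(tokens, identity, identity)
```

    In this sketch the attention weights are computed from the raw tokens, so they carry global context, while the quantities being mixed are local convolutional features. That is the sense in which a single layer would combine both kinds of information under the assumptions above.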
  • Other:

    Original source link: https://www.sciencedirect.com/science/article/pii/S0262885625002628?via%3Dihub
    Item reference in APA format: Hasan, H; Garcia, MA; Rashwan, H; Puig, D (2025). CoHAtNet: An integrated convolutional-transformer architecture with hybrid self-attention for end-to-end camera localization. Image and Vision Computing, 162, 105674. DOI: 10.1016/j.imavis.2025.105674
    Article reference according to the original source: Image And Vision Computing. 162 105674-
    Article DOI: 10.1016/j.imavis.2025.105674
    Journal publication year: 2025-10-01
    Entity: Universitat Rovira i Virgili
    Deposited article version: info:eu-repo/semantics/publishedVersion
    Record creation date: 2026-02-13
    URV author(s): Abdellatif Fatahallah Ibrahim Mahmoud, Hatem / Puig Valls, Domènec Savi
    Department: Enginyeria Informàtica i Matemàtiques
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Publication type: Journal Publications
    Author according to the article: Hasan, H; Garcia, MA; Rashwan, H; Puig, D
    Link to the usage license: https://creativecommons.org/licenses/by/3.0/es/
    Subject areas: Artes / música, Biotecnología, Ciência da computação, Ciências biológicas i, Computer science, artificial intelligence, Computer science, software engineering, Computer science, software, graphics, programming, Computer science, theory & methods, Computer vision and pattern recognition, Direito, Electrical and electronic engineering, Engenharias iv, Engineering, electrical & electronic, Interdisciplinar, Matemática / probabilidade e estatística, Optics, Química, Signal processing
    Author's email address: domenec.puig@urv.cat, hatem.abdellatif@urv.cat
  • Keywords:

    3-d environments
    Affordable and clean energy
    Attention mechanisms
    Camera localization
    Cameras
    Coatnet
    Convolution
    Convolutional neural network
    Convolutional neural networks
    End to end
    Hybrid cnn-transformer
    Hybrid cnn-transformers
    Hybrid self-attention
    Image processing
    Localization method
    Position and orientations
    Semantics
    Computer Science
    Artificial Intelligence
    Software Engineering
    Software
    Graphics
    Programming
    Theory & Methods
    Computer Vision and Pattern Recognition
    Electrical and Electronic Engineering
    Engineering
    Electrical & Electronic
    Optics
    Signal Processing
    Artes / música
    Biotecnología
    Ciência da computação
    Ciências biológicas i
    Direito
    Engenharias iv
    Interdisciplinar
    Matemática / probabilidade e estatística
    Química