
CoHAtNet: An integrated convolutional-transformer architecture with hybrid self-attention for end-to-end camera localization

  • Identification data

    Identifier:  imarina:9463954
    Authors:  Hasan, H; Garcia, MA; Rashwan, H; Puig, D
    Abstract:
    Camera localization refers to the process of automatically determining the position and orientation of a camera within its 3D environment from the images it captures. Traditional camera localization methods often rely on Convolutional Neural Networks, which are effective at extracting local visual features but struggle to capture the long-range dependencies critical for accurate localization. In contrast, Transformer-based approaches model global contextual relationships effectively but often lack precision in fine-grained spatial representation. To bridge this gap, we introduce CoHAtNet, a novel Convolutional Hybrid-Attention Network that tightly integrates convolutional and self-attention mechanisms. Unlike previous hybrid models that stack convolutional and attention layers separately, CoHAtNet embeds local features extracted via Mobile Inverted Bottleneck Convolution (MBConv) blocks directly into the Value component of the Transformer self-attention mechanism. This yields a hybrid self-attention block that dynamically captures both local spatial detail and global semantic context within a single attention layer. Additionally, CoHAtNet performs modality-level fusion by processing RGB and depth data jointly in a unified pipeline, allowing the model to exploit complementary appearance and geometric cues throughout the network. Extensive evaluations were conducted on two widely used camera localization datasets: 7-Scenes (RGB-D) and Cambridge Landmarks (RGB). Experimental results show that CoHAtNet achieves state-of-the-art performance in both translation and orientation accuracy. These results highlight the effectiveness of the hybrid design in challenging indoor and outdoor environments and make CoHAtNet a strong candidate for end-to-end camera localization.
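    The abstract's central idea, local MBConv features injected into the Value path of self-attention, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The following are all assumptions made for illustration: a 1-D token sequence in place of a 2-D feature map, a simplified MBConv branch (1x1 expand, SiLU, depthwise convolution, 1x1 project, with no batch normalization, squeeze-excitation, or residual), single-head attention, and additive fusion V = W_v x + Local(X). The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Bias-free linear layer: returns w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v: float) -> float:
    """SiLU (Swish) activation, commonly used inside MBConv blocks."""
    return v / (1.0 + math.exp(-v))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def depthwise_conv1d(tokens: List[Vector], kernels: List[Vector]) -> List[Vector]:
    """Depthwise convolution along the token axis: channel c uses only kernels[c].
    Zero padding keeps the sequence length (1-D stand-in for a k x k depthwise conv)."""
    n = len(tokens)
    out = []
    for i in range(n):
        row = []
        for c, kernel in enumerate(kernels):
            half = len(kernel) // 2
            acc = 0.0
            for k, wk in enumerate(kernel):
                j = i + k - half
                if 0 <= j < n:
                    acc += wk * tokens[j][c]
            row.append(acc)
        out.append(row)
    return out


def mbconv_local(x_tokens: List[Vector], w_expand: Matrix,
                 dw_kernels: List[Vector], w_project: Matrix) -> List[Vector]:
    """Simplified MBConv-style local branch (assumption: no BN, SE or residual)."""
    expanded = [[silu(v) for v in matvec(w_expand, x)] for x in x_tokens]
    mixed = depthwise_conv1d(expanded, dw_kernels)
    return [matvec(w_project, h) for h in mixed]


def hybrid_attention(x_tokens: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix,
                     w_expand: Matrix, dw_kernels: List[Vector],
                     w_project: Matrix) -> List[Vector]:
    """Single-head self-attention whose values carry local convolutional features.
    Q and K are unchanged, so attention weights reflect global context, while each
    value V_i = Wv x_i + Local(X)_i (additive fusion is an assumption)."""
    q = [matvec(wq, x) for x in x_tokens]
    k = [matvec(wk, x) for x in x_tokens]
    local = mbconv_local(x_tokens, w_expand, dw_kernels, w_project)
    v = [[a + b for a, b in zip(matvec(wv, x), loc)]
         for x, loc in zip(x_tokens, local)]
    scale = 1.0 / math.sqrt(len(q[0]))  # scaled dot-product attention
    out = []
    for qi in q:
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.3]]
    smoothing = [[0.25, 0.5, 0.25], [0.25, 0.5, 0.25]]
    print(hybrid_attention(tokens, eye, eye, eye, eye, smoothing, eye))
```

    In this sketch the convolutional features enter only through V, so the attention pattern is still computed from Q and K alone, while every attended value already carries neighbourhood detail. This is one plausible reading of a single attention layer capturing both local and global information. Setting w_project to zeros disables the local branch and recovers plain self-attention.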
  • Others:

    Link to the original source: https://www.sciencedirect.com/science/article/pii/S0262885625002628?via%3Dihub
    APA: Hasan, H; Garcia, MA; Rashwan, H; Puig, D (2025). CoHAtNet: An integrated convolutional-transformer architecture with hybrid self-attention for end-to-end camera localization. Image and Vision Computing, 162, 105674. DOI: 10.1016/j.imavis.2025.105674
    Paper original source: Image and Vision Computing, 162, 105674
    Article's DOI: 10.1016/j.imavis.2025.105674
    Journal publication date: 2025-10-01
    Entity: Universitat Rovira i Virgili
    Paper version: info:eu-repo/semantics/publishedVersion
    Record's date: 2026-02-13
    URV author(s): Abdellatif Fatahallah Ibrahim Mahmoud, Hatem / Puig Valls, Domènec Savi
    Department: Enginyeria Informàtica i Matemàtiques
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Publication Type: Journal Publications
    Authors, as they appear in the article: Hasan, H; Garcia, MA; Rashwan, H; Puig, D
    Licence for use: https://creativecommons.org/licenses/by/3.0/es/
    Thematic Areas: Arts / music, Biotechnology, Computer science, Biological sciences I, Computer science, artificial intelligence, Computer science, software engineering, Computer science, software, graphics, programming, Computer science, theory & methods, Computer vision and pattern recognition, Law, Electrical and electronic engineering, Engineering IV, Engineering, electrical & electronic, Interdisciplinary, Mathematics / probability and statistics, Optics, Chemistry, Signal processing
    Authors' email: domenec.puig@urv.cat, hatem.abdellatif@urv.cat
  • Keywords:

    3-d environments
    Affordable and clean energy
    Attention mechanisms
    Camera localization
    Cameras
    Coatnet
    Convolution
    Convolutional neural network
    Convolutional neural networks
    End to end
    Hybrid cnn-transformer
    Hybrid cnn-transformers
    Hybrid self-attention
    Image processing
    Localization method
    Position and orientations
    Semantics
    Computer Science
    Artificial Intelligence
    Software Engineering
    Software
    Graphics
    Programming
    Theory & Methods
    Computer Vision and Pattern Recognition
    Electrical and Electronic Engineering
    Engineering
    Electrical & Electronic
    Optics
    Signal Processing
    Arts / music
    Biotechnology
    Computer science
    Biological sciences I
    Law
    Engineering IV
    Interdisciplinary
    Mathematics / probability and statistics
    Chemistry