Contamination of fungal genomes of Onygenaceae (Phylum Ascomycota) in public databases: incidence, detection, and impact

  • Identifying data

    Identifier:  imarina:9469348
    Authors:  Granados-Casas, AO; Fernández-Bravo, A; Stchigel, AM; Cano-Lira, JF
  • Other:

    Author as given in the article: Granados-Casas, AO; Fernández-Bravo, A; Stchigel, AM; Cano-Lira, JF
    Department: Ciències Mèdiques Bàsiques (Basic Medical Sciences)
    e-ISSN: 1471-2164
    URV author(s): Cano Lira, José Francisco / Granados Casas, Alan Omar / Stchigel Glikman, Alberto Miguel
    Keywords: Whole genome sequencing; Taxonomy; Software; Quality assessment; Phylogeny; Onygenales; Molecular sequence annotation; Genomics; Genome, fungal; Fungi; DNA contamination; Databases, genetic; Coverage; Contamination; Bacteria; Ascomycota; Algorithm
    Abstract: Genomic datasets often contain unwanted, foreign, or erroneous nucleotide sequences that do not belong to the organism under study. Such contamination can significantly compromise genome analyses, reducing the accuracy and reliability of the results. Despite its potential impact, few studies have addressed the contamination of fungal genomes by exogenous sequences. Here, we analyzed eleven publicly available genomes of fungi from the family Onygenaceae, retrieved from the National Center for Biotechnology Information (NCBI) database. A comprehensive quality assessment was performed, evaluating genome completeness, contiguity, and contamination levels. Genomes with lower quality statistics and putative contamination were selected for further improvement. To enhance assembly quality, we built a custom Kraken 2 database including four high-quality genomes of closely related fungal taxa. After filtering, we reassessed the genomes to compare contiguity, completeness, and contamination levels before and after the process. Furthermore, structural and functional annotation was conducted to evaluate changes in predicted proteins, protein families, and domains. Additionally, average nucleotide identity and phylogenetic analyses were performed to further assess the impact of the filtering. Four genomes showed low-quality statistics and contamination levels between 5 and 12%, mainly of bacterial origin. After removing the contaminated regions, assembly quality metrics improved, and contamination levels dropped below 3% in all cases. Functional annotation of the filtered assemblies revealed a reduction in bacteria-associated protein families. Our results demonstrate the presence of contamination in publicly available Onygenaceae fungal genomes and highlight its potential to bias downstream analyses. We emphasize the importance of contamination screening and removal to ensure reliable genomic data for fungal research.
    Research group: Unitat de Micologia i Microbiologia Ambiental
    Subject areas: Animal science / fisheries resources; Public health; Chemistry; Dentistry; Veterinary medicine; Medicine III; Medicine II; Medicine I; Mathematics / probability and statistics; Interdisciplinary; Genetics & heredity; Genetics; Pharmacy; Engineering IV; Engineering III; Engineering II; Physical education; Biological sciences III; Biological sciences II; Biological sciences I; Environmental sciences; Agricultural sciences I; Food science; Computer science; Biotechnology & applied microbiology; Biotechnology; Biodiversity; Astronomy / physics
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Author email addresses: alanomar.granados@urv.cat; albertomiguel.stchigel@urv.cat; jose.cano@urv.cat
    Record creation date: 2026-02-11
    Version of the deposited article: info:eu-repo/semantics/publishedVersion
    Link to original source: https://link.springer.com/journal/12864
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article reference according to the original source: BMC Genomics. 26 (1): 1057-
    Item reference in APA style: Granados-Casas, AO; Fernández-Bravo, A; Stchigel, AM; Cano-Lira, JF (2025). Contamination of fungal genomes of Onygenaceae (Phylum Ascomycota) in public databases: incidence, detection, and impact. BMC Genomics, 26(1), 1057-. DOI: 10.1186/s12864-025-12223-3
    Article DOI: 10.1186/s12864-025-12223-3
    Institution: Universitat Rovira i Virgili
    Journal publication date: 2025-11-19
    Publication type: Journal Publications