Nonunique UPGMA clusterings of microsatellite markers

  • Identifying data

    Identifier: imarina:9280466
    Authors:
    Segura-Alabart, Natalia; Serratosa, Francesc; Gomez, Sergio; Fernandez, Alberto
    Abstract:
    Agglomerative hierarchical clustering has become a common tool for the analysis and visualization of data, thus being present in a large amount of scientific research and predating all areas of bioinformatics and computational biology. In this work, we focus on a critical problem, the nonuniqueness of the clustering when there are tied distances, for which several solutions exist but are not implemented in most hierarchical clustering packages. We analyze the magnitude of this problem in one particular setting: the clustering of microsatellite markers using the Unweighted Pair-Group Method with Arithmetic Mean. To do so, we have calculated the fraction of publications at the Scopus database in which more than one hierarchical clustering is possible, showing that about 46% of the articles are affected. Additionally, to show the problem from a practical point of view, we selected two opposite examples of articles that have multiple solutions: one with two possible dendrograms, and the other with more than 2.5 million different possible hierarchical clusterings. © The Author(s) 2022. Published by Oxford University Press.
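
The tie problem described in the abstract can be illustrated with a small brute-force sketch. This is not the authors' method or the API of any clustering package: the function name `upgma_all` and the toy distance matrices are hypothetical, and the approach simply branches on every pair of clusters tied at the minimum average distance, which is only feasible for very small inputs.

```python
from itertools import combinations


def upgma_all(dist, n):
    """Enumerate the distinct dendrogram topologies that UPGMA can yield
    when every tie at the minimum distance is broken in every possible way.

    dist is a symmetric n x n matrix (list of lists) of pairwise distances.
    Each dendrogram is returned as a frozenset of the clusters it forms.
    """

    def avg(a, b):
        # UPGMA distance: unweighted mean over all cross-cluster item pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    results = set()

    def recurse(clusters, merges):
        if len(clusters) == 1:
            results.add(frozenset(merges))
            return
        pairs = list(combinations(clusters, 2))
        dmin = min(avg(a, b) for a, b in pairs)
        for a, b in pairs:
            # Follow every pair tied at the minimum, not just the first one.
            if abs(avg(a, b) - dmin) < 1e-12:
                rest = [c for c in clusters if c not in (a, b)]
                recurse(rest + [a | b], merges + [a | b])

    recurse([frozenset([i]) for i in range(n)], [])
    return results


# Toy data (assumed for illustration): items 0-1 and 1-2 are tied at distance 1.
tied = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
# Same layout without the tie.
untied = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.5],
    [2.0, 1.5, 0.0],
]

print(len(upgma_all(tied, 3)))    # 2
print(len(upgma_all(untied, 3)))  # 1
```

With the tied matrix, merging {0, 1} first or {1, 2} first gives two different dendrograms, whereas a package that silently picks one pair reports only one of them. The number of branches can grow very quickly with the number of ties, so this enumeration is meant as an illustration rather than a practical tool.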
  • Other:

    Author(s) as listed in the article: Segura-Alabart, Natalia; Serratosa, Francesc; Gomez, Sergio; Fernandez, Alberto
    Department: Enginyeria Informàtica i Matemàtiques; Enginyeria Química
    URV author(s): Fernández Sabater, Alberto / Gómez Jiménez, Sergio / Segura Alabart, Natàlia / Serratosa Casanelles, Francesc d'Assís
    Keywords: UPGMA; Tie in proximity; STR; SSR; Microsatellite repeats; Microsatellite marker; Genetic diversity; Dendrogram; Computational biology; Cluster analysis; Simple sequences
    Subject areas: Molecular biology; Medicine (all); Mathematical & computational biology; Information systems; Biological sciences I; Computer science; Biotechnology & applied microbiology; Biochemical research methods
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Author email addresses: natalia.segura@urv.cat; alberto.fernandez@urv.cat; sergio.gomez@urv.cat; francesc.serratosa@urv.cat
    Author identifier(s): 0000-0002-1241-1646; 0000-0003-1820-0062; 0000-0001-6112-5913
    Record creation date: 2024-10-26
    Deposited article version: info:eu-repo/semantics/publishedVersion
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article reference according to the original source: Briefings in Bioinformatics. 23 (5): bbac312-bbac312
    Item reference in APA style: Segura-Alabart, Natalia; Serratosa, Francesc; Gomez, Sergio; Fernandez, Alberto (2022). Nonunique UPGMA clusterings of microsatellite markers. Briefings in Bioinformatics, 23(5), bbac312-bbac312. DOI: https://doi.org/10.1093/bib/bbac312
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2022
    Publication type: Journal Publications