Repositori institucional URV
TITLE:
A compressed file partitioner for scalable genomics analysis with serverless technology - TFG:5628

Student:Maleno Gonzalez, Francisco Damián
Language:en
Title in original language:A compressed file partitioner for scalable genomics analysis with serverless technology
Title in different languages:A compressed file partitioner for scalable genomics analysis with serverless technology
Keywords:data analytics, serverless, genomics, compressed files, partitioner
Subject:Data. Retrieval (Computer science)
Abstract:Advances in next-generation sequencing technologies have revolutionized the study of molecular biology by enabling the sequencing of millions of genomic sequences at massive scale. The resulting volumes of genomic data require exhaustive bioinformatic processing for their correct interpretation, a need that traditional computing is struggling to meet. Researchers have therefore turned to serverless architectures, which allow otherwise unfeasible volumes of data to be processed from a personal computer. These architectures take responsibilities such as resource provisioning and management away from the programmer and are based on the principles of simplicity, scalability, and billing only for the resources used. Motivated by better performance and lower cost, bioinformatics research groups have decided to migrate their experiments to this architecture using serverless data analysis frameworks such as Lithops. However, although these architectures impose fewer limitations on data storage, the frameworks have not been designed to work with all types of data. Genomic data is often stored in Gzip-compressed files of tens of terabytes, so a utility is needed that can decompress portions of these large files 'on-the-fly' for analysis in serverless functions. Thanks to the data partitioner and retriever for Gzip-compressed files implemented in this study, bioinformaticians will be able to run their experiments on the Lithops serverless data analysis framework in a simple way, enjoying a programming experience driven by data rather than by resource management. To validate the efficiency of this system, Cloudbutton's genomic use case 'SNP Variant Caller' has been implemented with satisfactory results.
Project director:García López, Pedro
Department:Enginyeria Informàtica i Matemàtiques
Education area(s):Enginyeria Informàtica
Entity:Universitat Rovira i Virgili (URV)
TFG credits:12
Creation date in repository:2023-02-07
Work's public defense date:2022-01-21
Academic year:2021-2022
Confidentiality:No
Subject areas:Computer engineering
Access rights:info:eu-repo/semantics/openAccess

Available files
File        Description    Format
Memòria     Memory         application/pdf

© 2011 Universitat Rovira i Virgili