A Seer knows best: Auto-tuned object storage shuffling for serverless analytics

  • Identifying data

    Identifier: imarina:9330486
    Authors:
    Eizaguirre, GT; Sánchez-Artigas, M
    Abstract:
    Serverless platforms offer high resource elasticity and pay-as-you-go billing, making them a compelling choice for data analytics. To craft a “pure” serverless solution, the common practice is to transfer intermediate data between serverless functions via serverless object storage (IBM COS, AWS S3). However, prior works have led to inconclusive results about the performance of object storage systems, since they have left a large margin for optimization. To verify that object storage has been underrated, we devise a novel shuffle manager for serverless data analytics called SEER. Specifically, SEER dynamically chooses between two shuffle algorithms to maximize performance. The algorithm choice is made online based on predictive models and, very importantly, without end users having to specify intermediate shuffle data sizes at job submission time. We integrate SEER with PyWren-IBM [31], a well-known serverless analytics framework, and evaluate it against both serverful (e.g., Spark) and serverless systems (e.g., Google BigQuery, Caerus [46] and SONIC [22]). Our results show that our new shuffle manager can deliver performance improvements over them.
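The abstract does not say which two shuffle algorithms SEER uses or what its predictive models look like. Purely as an illustration of the general idea of choosing a shuffle strategy online from a predicted cost, the sketch below assumes two generic object-storage shuffles (a direct all-to-all exchange and a hypothetical two-level exchange through intermediate merge functions) and a toy per-request latency plus bandwidth cost model. Every name and parameter here (`StoreModel`, `direct_time`, `two_level_time`, `choose_strategy`, `groups`) is a hypothetical assumption, not SEER's implementation or PyWren-IBM's API.

```python
from dataclasses import dataclass


@dataclass
class StoreModel:
    """Toy object-storage model (hypothetical parameters, not from the paper)."""
    request_latency_s: float  # fixed overhead per GET/PUT request, in seconds
    bandwidth_bps: float      # per-function throughput, in bytes per second


def direct_time(data_bytes: float, mappers: int, reducers: int, m: StoreModel) -> float:
    """All-to-all shuffle: each mapper PUTs one object per reducer and each
    reducer GETs one object per mapper (mappers * reducers objects in total).
    Assumes every stage runs fully in parallel."""
    map_stage = reducers * m.request_latency_s + (data_bytes / mappers) / m.bandwidth_bps
    reduce_stage = mappers * m.request_latency_s + (data_bytes / reducers) / m.bandwidth_bps
    return map_stage + reduce_stage


def two_level_time(data_bytes: float, mappers: int, reducers: int, groups: int,
                   m: StoreModel) -> float:
    """Hypothetical two-level shuffle: each mapper PUTs one object, one merge
    function per group GETs its mappers' objects and PUTs one object per
    reducer, and each reducer GETs one object per group. Data crosses storage
    twice, but fewer requests lie on the critical path whenever
    1 + groups + mappers / groups < mappers."""
    map_stage = m.request_latency_s + (data_bytes / mappers) / m.bandwidth_bps
    merge_requests = mappers / groups + reducers
    # The merge stage both reads and writes its share of the data.
    merge_stage = (merge_requests * m.request_latency_s
                   + 2 * (data_bytes / groups) / m.bandwidth_bps)
    reduce_stage = groups * m.request_latency_s + (data_bytes / reducers) / m.bandwidth_bps
    return map_stage + merge_stage + reduce_stage


def choose_strategy(estimated_bytes: float, mappers: int, reducers: int,
                    groups: int, m: StoreModel) -> str:
    """Pick the strategy with the lower predicted time, given a size estimate.
    Ignores function start-up time and monetary cost."""
    direct = direct_time(estimated_bytes, mappers, reducers, m)
    two_level = two_level_time(estimated_bytes, mappers, reducers, groups, m)
    return "direct" if direct <= two_level else "two-level"


if __name__ == "__main__":
    store = StoreModel(request_latency_s=0.02, bandwidth_bps=80e6)
    for size in (1e9, 1e11):
        print(f"{size:.0e} bytes -> {choose_strategy(size, 1000, 1000, 32, store)}")
```

With these made-up parameters and 1000 mappers and reducers, the toy model prefers the two-level exchange for 1 GB of intermediate data (request overhead dominates) and the direct exchange for 100 GB (moving the data twice dominates). The crossover depends on the intermediate data size, which, per the abstract, SEER handles online without asking users to specify it at submission time.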
  • Other details:

    Authors as listed in the article: Eizaguirre, GT; Sánchez-Artigas, M
    Department: Enginyeria Informàtica i Matemàtiques
    URV authors: Eizaguirre Suárez, Germán Telmo / Sanchez Artigas, Marc
    Keywords: Shuffle; Serverless computing; Object storage; I/O optimization
    Subject areas: Theoretical computer science; Software; Mathematics / probability and statistics; Interdisciplinary; Hardware and architecture; Engineering IV; Engineering III; Computer science, theory & methods; Computer networks and communications; Computer science; Artificial intelligence
    Licence of use: https://creativecommons.org/licenses/by/3.0/es/
    Author e-mail: germantelmo.eizaguirre@urv.cat; marc.sanchez@urv.cat
    Author identifier: 0000-0002-9700-7318
    Record creation date: 2024-08-03
    Deposited version of the article: info:eu-repo/semantics/publishedVersion
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article reference according to the original source: Journal of Parallel and Distributed Computing, 183
    APA citation: Eizaguirre, G. T., & Sánchez-Artigas, M. (2024). A Seer knows best: Auto-tuned object storage shuffling for serverless analytics. Journal of Parallel and Distributed Computing, 183. DOI: 10.1016/j.jpdc.2023.104763
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2024
    Publication type: Journal Publications
  • Keywords:

    Artificial Intelligence,Computer Networks and Communications,Computer Science, Theory & Methods,Hardware and Architecture,Software,Theoretical Computer Science
    Shuffle
    Serverless computing
    Object storage
    I/O optimization
    Theoretical computer science
    Software
    Mathematics / probability and statistics
    Interdisciplinary
    Hardware and architecture
    Engineering IV
    Engineering III
    Computer science, theory & methods
    Computer networks and communications
    Computer science
    Artificial intelligence