A Seer knows best: Auto-tuned object storage shuffling for serverless analytics

  • Identification data

    Identifier:  imarina:9330486
    Authors:  Eizaguirre, German T; Sanchez-Artigas, Marc
    Abstract:
    Serverless platforms offer high resource elasticity and pay-as-you-go billing, making them a compelling choice for data analytics. To craft a "pure" serverless solution, the common practice is to transfer intermediate data between serverless functions via serverless object storage (e.g., IBM COS, AWS S3). However, prior works have reached inconclusive results about the performance of object storage systems, because they left a large margin for optimization. To verify that object storage has been underrated, we devise a novel shuffle manager for serverless data analytics called SEER. Specifically, SEER dynamically chooses between two shuffle algorithms to maximize performance. The choice is made online based on predictive models and, importantly, without requiring end users to specify intermediate shuffle data sizes at job submission time. We integrate SEER with PyWren-IBM [31], a well-known serverless analytics framework, and evaluate it against both serverful (e.g., Spark) and serverless systems (e.g., Google BigQuery, Caerus [46] and SONIC [22]). Our results confirm that the new shuffle manager can deliver performance improvements over these systems.
  • Others:

    Link to the original source: https://www.sciencedirect.com/science/article/pii/S0743731523001338
    APA: Eizaguirre, German T; Sanchez-Artigas, Marc (2024). A Seer knows best: Auto-tuned object storage shuffling for serverless analytics. Journal of Parallel and Distributed Computing, 183, 104763. DOI: 10.1016/j.jpdc.2023.104763
    Paper original source: Journal of Parallel and Distributed Computing, 183, 104763
    Article's DOI: 10.1016/j.jpdc.2023.104763
    Journal publication year: 2024
    Entity: Universitat Rovira i Virgili
    Paper version: info:eu-repo/semantics/publishedVersion
    Record's date: 2025-01-28
    URV's Author/s: Eizaguirre Suárez, Germán Telmo / Sanchez Artigas, Marc
    Department: Enginyeria Informàtica i Matemàtiques
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Publication Type: Journal Publications
    Author(s), as they appear in the article: Eizaguirre, German T; Sanchez-Artigas, Marc
    Licence for use: https://creativecommons.org/licenses/by/3.0/es/
    Thematic Areas: Theoretical computer science, Software, Mathematics / probability and statistics, Interdisciplinary, Hardware and architecture, Engineering IV, Engineering III, Computer science, theory & methods, Computer networks and communications, Computer science, Artificial intelligence
    Author's mail: germantelmo.eizaguirre@urv.cat, marc.sanchez@urv.cat
  • Keywords:

    Shuffle
    Serverless computing
    Object storage
    I/O optimization
    Artificial Intelligence
    Computer Networks and Communications
    Computer Science
    Theory & Methods
    Hardware and Architecture
    Software
    Theoretical Computer Science
    Mathematics / Probability and Statistics
    Interdisciplinary
    Engineering IV
    Engineering III
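The abstract states that SEER picks one of two shuffle algorithms online, using predictive models and without knowing the intermediate data size in advance, but this record does not name the two algorithms or describe the models. The Python sketch below illustrates only the general idea of model-based, online algorithm selection, under explicit assumptions: the two candidate strategies (a direct all-to-all exchange and a two-round grouped exchange), their linear cost formulas, the `StorageProfile` parameters, and every function name are hypothetical and are not taken from SEER. The shuffle size is estimated at runtime by extrapolating from the map outputs observed so far.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StorageProfile:
    """Hypothetical object-storage performance profile (assumed values, not measured)."""
    request_latency_s: float     # fixed cost per PUT/GET request, in seconds
    worker_bandwidth_bps: float  # per-worker throughput to storage, in bytes/s


def estimate_total_bytes(observed_sizes: Sequence[float], num_mappers: int) -> float:
    """Extrapolate the total shuffle size from the map outputs seen so far."""
    if not observed_sizes:
        raise ValueError("need at least one observed map output size")
    return sum(observed_sizes) / len(observed_sizes) * num_mappers


def direct_shuffle_time(total_bytes: float, workers: int, p: StorageProfile) -> float:
    """Hypothetical one-round model: each worker PUTs one object per peer and GETs one from each."""
    requests = 2 * workers                  # `workers` PUTs + `workers` GETs
    data = 2 * total_bytes / workers        # its share is written once and read once
    return requests * p.request_latency_s + data / p.worker_bandwidth_bps


def two_round_shuffle_time(total_bytes: float, workers: int, p: StorageProfile) -> float:
    """Hypothetical two-round model over groups of size g ~ sqrt(workers)."""
    g = math.ceil(math.sqrt(workers))
    requests = 2 * (2 * g)                  # per round: g PUTs + g GETs
    data = 2 * (2 * total_bytes / workers)  # the per-worker share crosses storage twice
    return requests * p.request_latency_s + data / p.worker_bandwidth_bps


def choose_strategy(total_bytes: float, workers: int, p: StorageProfile) -> str:
    """Pick whichever strategy the hypothetical cost models predict is faster."""
    direct = direct_shuffle_time(total_bytes, workers, p)
    two_round = two_round_shuffle_time(total_bytes, workers, p)
    return "direct" if direct <= two_round else "two_round"


if __name__ == "__main__":
    profile = StorageProfile(request_latency_s=0.02, worker_bandwidth_bps=80e6)
    # Suppose 3 of 400 mappers have finished, each emitting about 2.5 MB of shuffle data.
    total = estimate_total_bytes([2.4e6, 2.5e6, 2.6e6], num_mappers=400)
    print(choose_strategy(total, workers=400, p=profile))  # prints "two_round"
```

With these illustrative parameters, a 1 GB shuffle over 400 workers is dominated by per-request latency, so the grouped exchange (fewer requests) is predicted to win; a 100 GB shuffle over 16 workers is dominated by bandwidth, so the direct exchange (data moved only once) is predicted to win. Real predictive models would have to be fitted to measured storage behaviour.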