A comprehensive analysis on software vulnerability detection datasets: trends, challenges, and road ahead

  • Identification data

    Identifier: imarina:9378601
    Authors:
    Guo Y; Bettaieb S; Casino F
    Abstract:
    As society's dependence on information and communication systems (ICTs) grows, so does the necessity of guaranteeing the proper functioning and use of such systems. In this context, it is critical to enhance the security and robustness of the DevSecOps pipeline through timely vulnerability detection. Usually, AI-based models enable desirable features such as automation, performance, and efficacy. However, the quality of such models highly depends on the datasets used during the training stage. The latter encompasses a series of challenges yet to be solved, such as access to extensive labelled datasets with specific properties, such as well-represented and balanced samples. This article explores the current state of practice of software vulnerability datasets and provides a classification of the main challenges and issues. After an extensive analysis, it describes a set of guidelines and desirable features that datasets should guarantee. The latter is applied to create a new dataset, which fulfils these properties, along with a descriptive comparison with the state of the art. Finally, a discussion on how to foster good practices among researchers and practitioners sets the ground for further research and continued improvement within this critical domain.
  • Others:

    Author, as appears in the article: Guo Y; Bettaieb S; Casino F
    Department: Enginyeria Informàtica i Matemàtiques
    URV's Author/s: Casino Cembellín, Francisco José
    Keywords: Benchmarking; Datasets; Devsecop; Devsecops; Software vulnerability detection
    Thematic Areas: Computer science; Computer networks and communications; Computer science, information systems; Computer science, software engineering; Computer science, theory & methods; Engineering IV; Information systems; Mathematics / probability and statistics; Safety, risk, reliability and quality; Software
    licence for use: https://creativecommons.org/licenses/by/3.0/es/
    Author's mail: franciscojose.casino@urv.cat
    Author identifier: 0000-0003-4296-2876
    Record's date: 2025-02-24
    Paper version: info:eu-repo/semantics/publishedVersion
    Paper original source: International Journal of Information Security. 23 (5): 3311-3327
    APA: Guo, Y., Bettaieb, S., & Casino, F. (2024). A comprehensive analysis on software vulnerability detection datasets: trends, challenges, and road ahead. International Journal of Information Security, 23(5), 3311-3327. DOI: 10.1007/s10207-024-00888-y
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Entity: Universitat Rovira i Virgili
    Journal publication year: 2024
    Publication Type: Journal Publications
  • Keywords:

    Benchmarking
    Datasets
    Devsecop
    Devsecops
    Software vulnerability detection
    Computer science
    Computer networks and communications
    Computer science, information systems
    Computer science, software engineering
    Computer science, theory & methods
    Engineering IV
    Information systems
    Mathematics / probability and statistics
    Safety, risk, reliability and quality
    Software