Author, as it appears in the article: Patsakis, Constantinos; Casino, Fran; Lykousas, Nikolaos
Department: Enginyeria Informàtica i Matemàtiques
URV's Author/s: Casino Cembellín, Francisco José
Keywords: Code deobfuscation; Cybersecurity; Large language models; Malware analysis
Abstract: The integration of large language models (LLMs) into various cybersecurity pipelines has become increasingly prevalent, enabling the automation of numerous manual tasks and often surpassing human performance. Recognising this potential, cybersecurity researchers and practitioners are actively investigating the application of LLMs to process vast volumes of heterogeneous data for anomaly detection, potential bypass identification, attack mitigation, and fraud prevention. Moreover, LLMs' advanced capabilities in generating functional code, interpreting code context, and summarising code present significant opportunities for reverse engineering and malware deobfuscation. In this work, we comprehensively examine the deobfuscation capabilities of state-of-the-art LLMs. Specifically, we conducted a detailed evaluation of four prominent LLMs using real-world malicious scripts from the notorious Emotet malware campaign. Our findings reveal that while current LLMs are not yet perfectly accurate, they demonstrate substantial potential in efficiently deobfuscating payloads. This study highlights the importance of fine-tuning LLMs for specialised tasks, suggesting that such optimisation could pave the way for future AI-powered threat intelligence pipelines to combat obfuscated malware. Our contributions include a thorough analysis of LLM performance in malware deobfuscation, identifying strengths and limitations, and discussing the potential for integrating LLMs into cybersecurity frameworks for enhanced threat detection and mitigation. Our experiments illustrate that LLMs can automatically extract the necessary indicators of compromise from a real-world campaign, with an accuracy of 69.56% for the dropper URLs and 88.78% for the corresponding domains.
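Note: The accuracy figures above compare indicators of compromise extracted from LLM-deobfuscated scripts against the ground truth of the campaign. The sketch below is not the authors' pipeline; it is a minimal illustration, using only the Python standard library, of how per-URL and per-domain recovery rates could be scored, assuming exact matching after normalisation and hypothetical example data.

# Minimal sketch (not the paper's code): scoring LLM-extracted indicators of
# compromise against a ground-truth list. The matching criterion (exact match
# after normalisation) and the example URLs are assumptions for illustration.
from urllib.parse import urlparse

def normalise(url: str) -> str:
    """Lower-case a URL and strip trailing slashes for comparison."""
    return url.strip().lower().rstrip("/")

def domain(url: str) -> str:
    """Extract the network location (domain) from a URL."""
    return urlparse(url if "://" in url else "http://" + url).netloc.lower()

def accuracy(extracted: list[str], ground_truth: list[str], key=normalise) -> float:
    """Share of ground-truth items recovered in the LLM output."""
    truth = {key(u) for u in ground_truth}
    found = {key(u) for u in extracted}
    return len(truth & found) / len(truth) if truth else 0.0

if __name__ == "__main__":
    # Hypothetical dropper URLs recovered from a deobfuscated script.
    truth = ["http://example-dropper.com/load.php", "http://second-site.net/x"]
    llm_output = ["http://example-dropper.com/load.php/"]
    print(f"URL accuracy:    {accuracy(llm_output, truth):.2%}")
    print(f"Domain accuracy: {accuracy(llm_output, truth, key=domain):.2%}")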
Thematic Areas: Public and business administration, accounting and tourism; Administration, accounting and tourism; Architecture, urban planning and design; Artificial intelligence; Astronomy / physics; Biodiversity; Biotechnology; Computer science; Agricultural sciences I; Environmental sciences; Biological sciences I; Biological sciences II; Biological sciences III; Applied social sciences I; Computer science applications; Computer science, artificial intelligence; Law; Economics; Education; Nursing; Engineering I; Engineering II; Engineering III; Engineering IV; Engineering (all); Engineering (miscellaneous); Engineering, electrical & electronic; Pharmacy; General engineering; Geosciences; Interdisciplinary; Mathematics / probability and statistics; Materials; Medicine I; Medicine II; Medicine III; Operations research & management science; Chemistry
Licence for use: https://creativecommons.org/licenses/by/3.0/es/
Author's mail: franciscojose.casino@urv.cat
Author identifier: 0000-0003-4296-2876
Record's date: 2025-02-18
Paper version: info:eu-repo/semantics/publishedVersion
Paper original source: Expert Systems With Applications, 256, 124912
APA: Patsakis, Constantinos; Casino, Fran; Lykousas, Nikolaos (2024). Assessing LLMs in malicious code deobfuscation of real-world malware campaigns. Expert Systems With Applications, 256, 124912. DOI: 10.1016/j.eswa.2024.124912
Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
Entity: Universitat Rovira i Virgili
Journal publication year: 2024
Publication Type: Journal Publications