Abstract:
The growing sophistication of malware poses diverse challenges, particularly for
security researchers, who are expected to develop mechanisms to thwart malicious
attacks. While security researchers have turned to machine learning to combat this surge in malware
attacks and enhance detection and prevention methods, they often encounter limitations when it
comes to sourcing malware binaries. This limitation places the burden on malware researchers to
create context-specific datasets and detection mechanisms, a time-consuming and intricate process
that involves a series of experiments. The lack of accessible analysis reports and a centralized platform
for sharing and verifying findings has resulted in many research outputs that can neither be replicated
nor validated. To address this critical gap, a malware analysis data curation platform was developed.
This platform offers malware researchers a highly customizable feature generation process drawing
from analysis data reports, particularly those generated in sandbox-based environments such as
Cuckoo Sandbox. To evaluate the platform, existing studies were replicated as case studies.
These case studies showed that the platform offers an
effective approach that can aid malware detection research. Moreover, a real-world scenario involving
over 3000 ransomware and benign samples for ransomware detection based on Portable Executable (PE)
entropy was explored. The decision tree algorithm achieved an accuracy of 98.8% and an AUC of 0.97,
with a low latency of 1.51 ms. These results underscore the value
of the proposed platform and demonstrate its capacity to support the construction of a comprehensive
detection mechanism. By fostering community-driven interactive databanks, this platform enables the creation
of datasets as well as the sharing of reports, both of which can substantially reduce experimentation
time and enhance research repeatability.
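
As an illustration of the kind of entropy-based feature referred to above, the following minimal sketch computes the Shannon entropy of a byte buffer and a few summary statistics over fixed-size windows. It is not the platform's implementation: the function names, the 256-byte window, and the choice of statistics are assumptions made for this example, and a real pipeline would typically apply such a computation to the sections of a parsed PE file.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_features(data: bytes, window: int = 256) -> dict:
    """Hypothetical feature vector: whole-buffer entropy plus per-window statistics.

    The window size and the selected statistics are illustrative assumptions.
    """
    chunks = [data[i:i + window] for i in range(0, len(data), window)]
    per_chunk = [shannon_entropy(c) for c in chunks] or [0.0]
    return {
        "entropy_total": shannon_entropy(data),
        "entropy_max": max(per_chunk),
        "entropy_min": min(per_chunk),
        "entropy_mean": sum(per_chunk) / len(per_chunk),
    }
```

Encrypted or compressed content tends toward the upper bound of 8 bits per byte, while repetitive or padded content scores close to 0, which is why entropy is a common input to ransomware classifiers such as a decision tree.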