Abstract:
This article presents a cloud-based method to classify 0-day attacks from a novel dataset
called UGRansome1819. The primary objective of the research is to classify potential unknown threats
using Machine Learning (ML) algorithms and cloud services. Our contribution is to use a novel
anomaly detection dataset containing 0-day attacks to train and test ML algorithms with Amazon
Web Services (AWS) such as S3 buckets and SageMaker. The proposed method uses Ensemble Learning with
a Genetic Algorithm (GA) optimizer over three ML algorithms: Naive Bayes (NB), Random
Forest (RF), and Support Vector Machine (SVM). These classifiers are combined to analyze the
dataset, and their accuracy in classifying 0-day threats is assessed. We evaluate the selected
algorithms with several metrics: Accuracy, Precision, Recall, F1-Score, and the Confusion Matrix.
We then compare the classification performance on UGRansome1819 with that on existing datasets
using the same optimization settings. The RF implementation
(before and after optimization) yields constant results on UGRansome1819, which outperformed the
CAIDA and UNSW-NB15 datasets. The optimization technique improved Accuracy only on the
UNSW-NB15 and CAIDA datasets, whereas sufficient F1-Score performance was achieved on
UGRansome1819 using a multi-class classification scheme. The experimental results demonstrate
a UGRansome1819 classification ratio of 1% before and after optimization. Compared with the
UNSW-NB15 and CAIDA datasets, UGRansome1819 attains the highest accuracy, 99.6% (prior to
optimization). The GA was used as a feature selector and dropped five attributes of
UGRansome1819, reducing computational time and over-fitting. The most straightforward
way to increase the model's accuracy after optimization is to add more samples to the training
data. Doing so would add more detail to the data, and fine-tuning the model on it would result in
more accurate and better-optimized performance. The experiments demonstrate the
instability of single classifiers such as SVM and NB and motivate the proposed optimized validation
technique, which can aggregate weak classifiers (e.g., SVM and NB) into a GA-optimized ensemble to
enhance classification performance. The specificity and sensitivity of the UGRansome1819 model
were estimated at 100% for the three threat classes (Signature, Synthetic Signature, and Anomaly).
Lastly, the test classification accuracy of the SVM model improved by 6%
after optimization.
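The abstract combines NB, RF, and SVM into an ensemble and evaluates it with Accuracy, Precision, Recall, F1-Score, a confusion matrix, and per-class sensitivity and specificity. The following is a minimal, stdlib-only Python sketch of that evaluation step under stated assumptions: the ensemble uses hard (majority) voting, which the abstract does not specify, and the label lists are made-up stand-ins for the outputs of trained classifiers. The function names (`majority_vote`, `accuracy`, `confusion_matrix`, `per_class_metrics`) are hypothetical; this is not the authors' implementation.

```python
from collections import Counter

# Threat classes named in the abstract; this order fixes confusion-matrix rows/columns.
CLASSES = ["Signature", "Synthetic Signature", "Anomaly"]


def majority_vote(*predictions):
    """Hard voting (an assumption): per sample, return the label most classifiers chose.

    Counter.most_common orders equal counts by first appearance, so a three-way
    tie falls back to the label of the first classifier passed in.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix, classes=CLASSES):
    """One-vs-rest precision, recall (sensitivity), specificity and F1 for each class."""
    total = sum(map(sum, matrix))
    metrics = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[name] = {"precision": precision, "recall": recall,
                         "specificity": specificity, "f1": f1}
    return metrics


# Made-up labels standing in for a test split and three trained classifiers.
SIG, SYN, ANO = CLASSES
y_true = [SIG, ANO, SYN, ANO, SIG, SYN]
nb_pred = [SIG, SIG, SYN, ANO, ANO, SYN]
rf_pred = [SIG, ANO, SYN, ANO, SIG, SIG]
svm_pred = [SIG, ANO, SIG, SIG, SIG, SYN]

ensemble_pred = majority_vote(nb_pred, rf_pred, svm_pred)

for name, pred in [("NB", nb_pred), ("RF", rf_pred), ("SVM", svm_pred), ("Ensemble", ensemble_pred)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f}")
print("SVM confusion matrix:", confusion_matrix(y_true, svm_pred))
print("SVM Signature metrics:", per_class_metrics(confusion_matrix(y_true, svm_pred))[SIG])
```

With these toy labels each single classifier misclassifies one or two samples, but the errors fall on different samples, so the majority vote recovers every label and reaches 100% sensitivity and specificity for all three classes. This only illustrates the argument for aggregating weak classifiers; it says nothing about the actual UGRansome1819 results.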
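The abstract reports using the GA as a feature selector that dropped five UGRansome1819 attributes. A common way to set up GA feature selection is to encode each individual as a binary mask over the features (1 = keep), score each mask with a fitness function, and evolve the population with selection, crossover, and mutation. The Python sketch below is generic and stdlib-only, and its details are assumptions rather than settings from the paper: tournament selection, single-point crossover, bit-flip mutation, and elitism are illustrative choices, and `toy_fitness` with its `RELEVANCE` scores is a hypothetical stand-in for what would, in practice, be something like the cross-validated accuracy of NB, RF, or SVM trained on the kept columns minus a per-feature penalty.

```python
import random

# Hypothetical per-feature relevance scores for an 8-feature toy problem.
RELEVANCE = [0.9, 0.05, 0.7, 0.02, 0.8, 0.6, 0.03, 0.5]
COST_PER_FEATURE = 0.1  # penalty that rewards dropping weak features


def toy_fitness(mask):
    """Placeholder fitness: relevance of kept features minus a per-feature cost."""
    kept = sum(mask)
    if kept == 0:
        return float("-inf")  # an empty feature set cannot train a classifier
    return sum(r for r, keep in zip(RELEVANCE, mask) if keep) - COST_PER_FEATURE * kept


def tournament(population, fitness, rng):
    """Pick two random individuals and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_feature_selection(n_features, fitness, pop_size=20, generations=40,
                         crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve binary masks (1 = keep feature) and return the fittest mask found."""
    rng = random.Random(seed)
    # Start from the full feature set plus random masks.
    population = [[1] * n_features]
    population += [[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            parent1 = tournament(population, fitness, rng)
            parent2 = tournament(population, fitness, rng)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - gene if rng.random() < mutation_rate else gene for gene in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


best_mask = ga_feature_selection(len(RELEVANCE), toy_fitness)
dropped = [i for i, keep in enumerate(best_mask) if not keep]
print("kept mask:", best_mask)
print("dropped feature indices:", dropped)
```

Indices where `best_mask` is 0 are the features this sketch proposes to drop. With the toy scores the unique optimum keeps features 0, 2, 4, 5, and 7; whether a given run reaches it depends on the seed and generation budget, and elitism only guarantees that the returned mask is never worse than the full feature set.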