Abstract:
This research introduces a methodology for producing an anomaly detection
dataset guided by ten desirable requirements. The article then presents the resulting dataset,
UGRansome, created with up-to-date network traffic (netflow), which represents
cyclostationary patterns of normal and abnormal classes of threatening behaviours. The timestamp
of various network attacks was found to be less than one minute, and this feature pattern
was used to record the time taken by a threat to infiltrate a network node. The main asset of the
proposed dataset is its applicability to the detection of zero-day attacks and anomalies that have
not been explored before and cannot be recognised by known threat signatures. For instance, the
UDP Scan attack was found to use the lowest netflow in the corpus, while Razy uses
the highest. In turn, the EDA2 and Globe malware are the most abnormal zero-day threats
in the proposed dataset. These feature patterns are included in the corpus but are derived from two
well-known datasets that include real-life instances, namely UGR’16 and a ransomware dataset. The former
incorporates cyclostationary patterns while the latter includes ransomware features. The UGRansome
dataset was tested with cross-validation and compared to the KDD99 and NSL-KDD datasets to
assess the performance of Ensemble Learning algorithms. False alarms were minimised, with
zero empirical error during the experiment, which demonstrates that applying the Random
Forest algorithm to UGRansome can yield accurate results and enhance zero-day threat
detection. Additionally, most zero-day threats such as Razy, Globe, EDA2, and TowerWeb are
recognised as advanced persistent threats that are cyclostationary in nature, and they are predicted
to use spamming and phishing for intrusion. Lastly, balancing the UGRansome dataset
was found to be NP-hard because its real-life threatening classes are not uniformly distributed
in terms of the number of instances.
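
The cross-validation step and the class imbalance mentioned above can be illustrated with a minimal sketch, assuming a stratified k-fold split. The class labels and counts below are hypothetical and are not taken from UGRansome, and the helper `stratified_folds` is an illustrative name rather than part of any library or of the authors' pipeline.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds class by class (round-robin),
    so each fold keeps roughly the class proportions of the full set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for pos, idx in enumerate(by_class[label]):
            folds[pos % k].append(idx)
    return folds


# Hypothetical, deliberately imbalanced labels (NOT real UGRansome counts).
labels = ["normal"] * 60 + ["Razy"] * 25 + ["EDA2"] * 10 + ["Globe"] * 5

# Simple imbalance measure: size of the largest class over the smallest.
counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())

folds = stratified_folds(labels, k=5)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]

for n, fc in enumerate(fold_counts):
    print(f"fold {n}: {dict(fc)}")
print(f"imbalance ratio (largest/smallest class): {imbalance_ratio:.1f}")
```

In a full experiment, each fold would serve once as the held-out test set while a classifier (Random Forest in the article) is trained on the remaining folds. Stratification only preserves the skewed class proportions within every fold; it does not remove the imbalance itself.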