UGRansome1819 : a novel dataset for anomaly detection and zero-day threats
dc.contributor.author | Nkongolo, Mike Nkongolo Wa | |
dc.contributor.author | Van Deventer, Jacobus Philippus | |
dc.contributor.author | Kasongo, Sydney Mambwe | |
dc.contributor.email | u21629545@tuks.co.za | en_US |
dc.date.accessioned | 2022-06-09T12:57:51Z | |
dc.date.available | 2022-06-09T12:57:51Z | |
dc.date.issued | 2021-09-30 | |
dc.description.abstract | This research introduces the production methodology of an anomaly detection dataset based on ten desirable requirements. Subsequently, the article presents the produced dataset, named UGRansome, created with up-to-date and modern network traffic (netflow), which represents cyclostationary patterns of normal and abnormal classes of threatening behaviours. It was discovered that the timestamp of various network attacks is less than one minute, and this feature pattern was used to record the time taken by the threat to infiltrate a network node. The main asset of the proposed dataset is its use in the detection of zero-day attacks and anomalies that have not been explored before and cannot be recognised by known threat signatures. For instance, the UDP Scan attack has been found to utilise the lowest netflow in the corpus, while Razy utilises the highest. In turn, the EDA2 and Globe malware are the most abnormal zero-day threats in the proposed dataset. These feature patterns are included in the corpus, but are derived from two well-known datasets, namely UGR’16 and a ransomware dataset, which include real-life instances. The former incorporates cyclostationary patterns while the latter includes ransomware features. The UGRansome dataset was tested with cross-validation and compared to the KDD99 and NSL-KDD datasets to assess the performance of Ensemble Learning algorithms. False alarms were minimised, with zero empirical error during the experiment, which demonstrates that applying the Random Forest algorithm to UGRansome can produce accurate results and enhance zero-day threat detection. Additionally, most zero-day threats, such as Razy, Globe, EDA2, and TowerWeb, are recognised as advanced persistent threats that are cyclostationary in nature, and they are predicted to use spamming and phishing for intrusion.
Lastly, balancing UGRansome was found to be NP-hard because the real-life threatening classes do not have a uniform distribution in terms of the number of instances. | en_US |
dc.description.department | Informatics | en_US |
dc.description.librarian | am2022 | en_US |
dc.description.uri | https://www.mdpi.com/journal/information | en_US |
dc.identifier.citation | Nkongolo, M.; van Deventer, J.P.; Kasongo, S.M. UGRansome1819: A Novel Dataset for Anomaly Detection and Zero-Day Threats. Information 2021, 12, 405. https://DOI.org/10.3390/info12100405. | en_US |
dc.identifier.issn | 2078-2489 | |
dc.identifier.other | 10.3390/info12100405 | |
dc.identifier.uri | https://repository.up.ac.za/handle/2263/85771 | |
dc.language.iso | en | en_US |
dc.publisher | MDPI | en_US |
dc.rights | © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. | en_US |
dc.subject | Netflow | en_US |
dc.subject | Anomaly detection | en_US |
dc.subject | Ensemble learning | en_US |
dc.subject | Zero-day threats | en_US |
dc.subject | Feature extraction | en_US |
dc.subject | Feature engineering | en_US |
dc.subject | Datasets | en_US |
dc.subject | Feature selection | en_US |
dc.subject | Cyclostationarity | en_US |
dc.subject | Ransomware | en_US |
dc.subject | Advanced persistent threats | en_US |
dc.title | UGRansome1819 : a novel dataset for anomaly detection and zero-day threats | en_US |
dc.type | Article | en_US |