
UGRansome1819 : a novel dataset for anomaly detection and zero-day threats

dc.contributor.author: Nkongolo, Mike Nkongolo Wa
dc.contributor.author: Van Deventer, Jacobus Philippus
dc.contributor.author: Kasongo, Sydney Mambwe
dc.contributor.email: u21629545@tuks.co.za [en_US]
dc.date.accessioned: 2022-06-09T12:57:51Z
dc.date.available: 2022-06-09T12:57:51Z
dc.date.issued: 2021-09-30
dc.description.abstract: This research introduces a methodology for producing an anomaly detection dataset that satisfies ten desirable requirements. The article then presents the resulting dataset, UGRansome, built from up-to-date network traffic (netflow) that exhibits cyclostationary patterns of normal and abnormal classes of threatening behaviour. The timestamp of various network attacks was found to be less than one minute, and this feature pattern was used to record the time a threat takes to infiltrate a network node. The main asset of the proposed dataset is its usefulness for detecting zero-day attacks and anomalies that have not been explored before and cannot be recognised by known threat signatures. For instance, the UDP Scan attack was found to use the lowest netflow in the corpus, while Razy uses the highest. In turn, the EDA2 and Globe malware are the most abnormal zero-day threats in the proposed dataset. These feature patterns are included in the corpus but are derived from two well-known datasets, namely UGR’16 and ransomware, that include real-life instances. The former contributes cyclostationary patterns, while the latter contributes ransomware features. The UGRansome dataset was tested with cross-validation and compared to the KDD99 and NSL-KDD datasets to assess the performance of Ensemble Learning algorithms. False alarms were minimised, with a null empirical error during the experiment, which demonstrates that applying the Random Forest algorithm to UGRansome can yield accurate results that enhance zero-day threat detection. Additionally, most zero-day threats, such as Razy, Globe, EDA2, and TowerWeb, are recognised as advanced persistent threats that are cyclostationary in nature, and they are predicted to use spamming and phishing for intrusion. Lastly, balancing UGRansome was found to be NP-Hard because the real-life threatening classes are not uniformly distributed in terms of the number of instances. [en_US]
dc.description.department: Informatics [en_US]
dc.description.librarian: am2022 [en_US]
dc.description.uri: https://www.mdpi.com/journal/information [en_US]
dc.identifier.citation: Nkongolo, M.; van Deventer, J.P.; Kasongo, S.M. UGRansome1819: A Novel Dataset for Anomaly Detection and Zero-Day Threats. Information 2021, 12, 405. https://DOI.org/10.3390/info12100405. [en_US]
dc.identifier.issn: 2078-2489
dc.identifier.other: 10.3390/info12100405
dc.identifier.uri: https://repository.up.ac.za/handle/2263/85771
dc.language.iso: en [en_US]
dc.publisher: MDPI [en_US]
dc.rights: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. [en_US]
dc.subject: Netflow [en_US]
dc.subject: Anomaly detection [en_US]
dc.subject: Ensemble learning [en_US]
dc.subject: Zero-day threats [en_US]
dc.subject: Feature extraction [en_US]
dc.subject: Feature engineering [en_US]
dc.subject: Datasets [en_US]
dc.subject: Feature selection [en_US]
dc.subject: Cyclostationarity [en_US]
dc.subject: Ransomware [en_US]
dc.subject: Advanced persistent threats [en_US]
dc.title: UGRansome1819 : a novel dataset for anomaly detection and zero-day threats [en_US]
dc.type: Article [en_US]

Files

Original bundle

Name: Nkongolo_UGRansome1819_2021.pdf
Size: 1.16 MB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.75 KB
Format: Item-specific license agreed upon to submission