Feature selection using Benford’s law to support detection of malicious social media bots

dc.contributor.author: Mbona, Innocent
dc.contributor.author: Eloff, Jan H.P.
dc.contributor.email: u15256422@tuks.co.za [en_ZA]
dc.date.accessioned: 2021-11-30T07:44:43Z
dc.date.issued: 2022-01
dc.description.abstract: The increased amount of high-dimensional imbalanced data in online social networks challenges existing feature selection methods. Although feature selection methods such as principal component analysis (PCA) are effective for solving high-dimensional imbalanced data problems, they can be computationally expensive. Hence, an effortless approach for identifying meaningful features that are indicative of anomalous behaviour between humans and malicious bots is presented herein. The most recent Twitter dataset that encompasses the behaviour of various types of malicious bots (including fake followers, retweet spam, fake advertisements, and traditional spambots) is used to understand the behavioural traits of such bots. The approach is based on Benford’s law for predicting the frequency distribution of significant leading digits. This study demonstrates that features closely obey Benford’s law on a human dataset, whereas the same features violate Benford’s law on a malicious bot dataset. Finally, it is demonstrated that the features identified by Benford’s law are consistent with those identified via PCA and the ensemble random forest method on the same datasets. This study contributes to the intelligent detection of malicious bots such that their malicious activities, such as the dissemination of spam, can be minimised. [en_ZA]
dc.description.department: Computer Science [en_ZA]
dc.description.embargo: 2023-09-15
dc.description.librarian: hj2021 [en_ZA]
dc.description.sponsorship: The University of Pretoria and Bank Seta. [en_ZA]
dc.description.uri: http://www.elsevier.com/locate/ins [en_ZA]
dc.identifier.citation: Mbona, I. & Eloff, J.H.P. 2022, 'Feature selection using Benford’s law to support detection of malicious social media bots', Information Sciences, vol. 582, pp. 369-381. [en_ZA]
dc.identifier.issn: 0020-0255 (print)
dc.identifier.issn: 1872-6291 (online)
dc.identifier.other: 10.1016/j.ins.2021.09.038
dc.identifier.uri: http://hdl.handle.net/2263/82899
dc.language.iso: en [en_ZA]
dc.publisher: Elsevier [en_ZA]
dc.rights: © 2021 Elsevier Inc. All rights reserved. Notice: this is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. A definitive version was subsequently published in Information Sciences, vol. 582, pp. 369-381, 2022. doi: 10.1016/j.ins.2021.09.038. [en_ZA]
dc.subject: Benford’s law [en_ZA]
dc.subject: High-dimensional imbalanced dataset [en_ZA]
dc.subject: Malicious bots [en_ZA]
dc.subject: Feature selection [en_ZA]
dc.subject: Online social network (OSN) [en_ZA]
dc.title: Feature selection using Benford’s law to support detection of malicious social media bots [en_ZA]
dc.type: Postprint Article [en_ZA]
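
The abstract describes comparing a feature's leading-digit frequencies against the distribution predicted by Benford’s law, P(d) = log10(1 + 1/d) for d = 1..9. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the feature holds non-negative integer counts (for example, a follower count), and it scores conformity with the mean absolute deviation between observed and expected frequencies; that measure and all function names are illustrative assumptions rather than details taken from the record.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First digit of an integer count, or None for zero.

    Assumption: features are integer counts, so the first character of
    the decimal string is the leading significant digit.
    """
    value = abs(int(value))
    if value == 0:
        return None
    return int(str(value)[0])


def observed_distribution(values):
    """Relative frequency of each leading digit 1..9, ignoring zeros."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_absolute_deviation(values):
    """Average gap between observed and Benford frequencies.

    Hypothetical conformity score: smaller means closer to Benford's law.
    """
    expected = benford_expected()
    observed = observed_distribution(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


# Synthetic illustration only (not data from the study):
# powers of two are known to track Benford's law closely, while the
# integers 100..999 have uniformly distributed leading digits.
benford_like = [2 ** k for k in range(100)]
uniform_like = list(range(100, 1000))
closer_to_benford = (
    mean_absolute_deviation(benford_like) < mean_absolute_deviation(uniform_like)
)
```

In this sketch a feature would be scored once per class (for example, human accounts and bot accounts separately), and a large gap between the two scores would mark the feature as a candidate for selection. Any threshold for "large" would have to be chosen by the reader, since none is given in the record.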

Files

Original bundle

Name: Mbona_Feature_2022.pdf
Size: 381.09 KB
Format: Adobe Portable Document Format
Description: Postprint Article

License bundle

Name: license.txt
Size: 1.75 KB
Format: Item-specific license agreed upon to submission
Description: