An investigation of the effectiveness of using Twitter data for predicting South African protests with Graph Neural Networks

Ngomane, Derwin

dc.contributor.advisor	Marivate, Vukosi
dc.contributor.coadvisor	Ahmed, Maxamed
dc.contributor.postgraduate	Ngomane, Derwin
dc.date.accessioned	2024-09-12T09:08:11Z
dc.date.available	2024-09-12T09:08:11Z
dc.date.created	2024-04
dc.date.issued	2024-04
dc.description	Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2024.	en_US
dc.description.abstract	Social media creates an echo chamber effect that is closely related to social movement theory, which aims to mobilise people to change society. In South Africa, there has been an increase in protests that appear to have started on social media. For example, consider the riots that occurred in July 2021 following the arrest of former President Jacob Zuma. Protests in South Africa, on the other hand, have culminated in violent incidents, such as the July 2021 protest. In that situation, the South African Human Rights Commission found that social media sites such as WhatsApp, Facebook, and Twitter aided the violence by sharing protest information. This study investigates whether social media can be utilised to signal upcoming South African protests. This research investigates the effectiveness of noise reduction techniques on Twitter data for predicting protest-related events in South Africa using Graph Neural Networks. It addresses research gaps by addressing the need for graph-based methodologies in the South African context, addressing the lack of noise reduction research for Twitter data, and using an automated method to extract relevant keywords in the word networks. The work aims to provide a new avenue for noise reduction in real-world scenarios where future events have not occurred. This study examines a three-year data window between 2019 and 2021 using the Global Dataset of Events, Location, and Tone (GDELT) and Twitter data. GDELT focuses on CAMEO codes related to protests and conflict, while Twitter extracts social media text related to protest-related posts. A sliding window approach is used to combine the data, with noise-reduction filtration techniques guiding the filtration. This work explores the potential of processing Twitter data to reveal signals for improved predictive capability. Derivative metrics, from hashtags, links, and mentions, are used to reveal such signals.
The study compares different machine learning methods, including Logistic Regression, Graph Convolutional Networks, and Graph Isomorphism Networks, to model the data. It is discovered that the geometric deep learning methods struggle with overfitting in hold-out testing data but are stable and have better cross-validation scores. The GIN model exhibits higher accuracy and isomorphism detection, making it suitable for the task. However, graph neural networks struggle with limited data and hence overfit the training data, as well as isomorphism and isolated nodes due to the message-passing paradigm. The intricacy of Twitter interactions and conversations is highlighted in this work, emphasising the need for future research in data processing and model building. The study excluded other data features to add more information about the data space’s complexity, such as user interactions. Keyword selection was done independently, but node eigenvector centrality could be used for informed decision-making. The graph neural network paradigm of message passing has limited capability in the presence of isolated nodes, and isomorphism is crucial for network performance. Further research should investigate dynamic capabilities and edge weights in GIN networks.	en_US
dc.description.availability	Unrestricted	en_US
dc.description.degree	MIT (Big Data Science)	en_US
dc.description.department	Computer Science	en_US
dc.description.faculty	Faculty of Engineering, Built Environment and Information Technology	en_US
dc.description.sdg	SDG-09: Industry, innovation and infrastructure	en_US
dc.identifier.citation	*	en_US
dc.identifier.other	A2024	en_US
dc.identifier.uri	http://hdl.handle.net/2263/98149
dc.language.iso	en	en_US
dc.publisher	University of Pretoria
dc.rights	© 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject	UCTD	en_US
dc.subject	Twitter data	en_US
dc.subject	Graph Neural Networks	en_US
dc.subject	South African	en_US
dc.title	An investigation of the effectiveness of using Twitter data for predicting South African protests with Graph Neural Networks	en_US
dc.type	Mini Dissertation	en_US