Fast Data Analysis Methods For Social Media Data

Nhlabano, Valentine Velaphi



dc.contributor.advisor	Lutu, Patricia Elizabeth Nalwoga
dc.contributor.postgraduate	Nhlabano, Valentine Velaphi
dc.date.accessioned	2019-12-09T08:55:16Z
dc.date.available	2019-12-09T08:55:16Z
dc.date.created	2019-12-15
dc.date.issued	2018-08-07
dc.description	Dissertation (MSc)--University of Pretoria, 2019.	en_ZA
dc.description.abstract	The advent of Web 2.0 technologies, which support the collaborative and participatory creation and publishing of social media content by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide, which use this new channel to communicate and engage with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally used to handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, which is one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect to and collect public data sets from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume and to store this data efficiently in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open source software library that runs on low-cost commodity hardware and can store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost.
A lexicon-based sentiment analysis approach was taken, and the AFINN-111 lexicon was used for scoring. The Twitter data stored in HDFS was analysed using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time.	en_ZA
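The lexicon-based scoring described in the abstract can be illustrated with a minimal sketch in plain Java. It is not the dissertation's implementation and does not use the Hadoop API: the class name `LexiconSentimentSketch`, its methods, and the six-word lexicon are assumptions made for illustration, and a real run would load the full AFINN-111 file and wrap the two steps in a Hadoop Mapper and Reducer. The "map" step scores each tweet by summing the valence of its words; the "reduce" step counts tweets per sentiment label.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LexiconSentimentSketch {

    // Illustrative entries in the style of an AFINN lexicon (word -> integer
    // valence). This is an assumption, not the real AFINN-111 word list.
    static final Map<String, Integer> LEXICON = Map.of(
            "good", 3, "love", 3, "happy", 3,
            "bad", -3, "hate", -3, "sad", -2);

    // "Map" step: sum the valence of every known word in one tweet.
    // Words that are not in the lexicon contribute 0.
    static int score(String tweet) {
        int total = 0;
        for (String token : tweet.toLowerCase().split("[^a-z]+")) {
            total += LEXICON.getOrDefault(token, 0);
        }
        return total;
    }

    // Turn a numeric score into a coarse sentiment label.
    static String label(int score) {
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    // "Reduce" step: count how many tweets fall under each label.
    static Map<String, Integer> countByLabel(List<String> tweets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tweet : tweets) {
            counts.merge(label(score(tweet)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "I love this phone, good battery",
                "Sad and bad service today",
                "Flight lands at noon");
        System.out.println(countByLabel(tweets));
    }
}
```

Because each tweet is scored independently of the others, the scoring step can run in parallel across a cluster, which is the property that makes the approach a natural fit for MapReduce.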
dc.description.availability	Unrestricted	en_ZA
dc.description.degree	MSc	en_ZA
dc.description.department	Computer Science	en_ZA
dc.description.sponsorship	National Research Foundation (NRF) - Scarce skills	en_ZA
dc.identifier.citation	Nhlabano, VV 2018, Fast Data Analysis Methods For Social Media Data, MSc Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/72546>	en_ZA
dc.identifier.other	A2020	en_ZA
dc.identifier.uri	http://hdl.handle.net/2263/72546
dc.language.iso	en	en_ZA
dc.publisher	University of Pretoria
dc.rights	© 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject	Big Data	en_ZA
dc.subject	Machine Learning	en_ZA
dc.subject	Sentiment Analysis	en_ZA
dc.subject	Text Mining	en_ZA
dc.subject	Apache Hadoop	en_ZA
dc.subject	UCTD	en_ZA
dc.title	Fast Data Analysis Methods For Social Media Data	en_ZA
dc.type	Dissertation	en_ZA