Fast Data Analysis Methods For Social Media Data

dc.contributor.advisor Lutu, Patricia Elizabeth Nalwoga
dc.contributor.postgraduate Nhlabano, Valentine Velaphi
dc.date.accessioned 2019-12-09T08:55:16Z
dc.date.available 2019-12-09T08:55:16Z
dc.date.created 2019-12-15
dc.date.issued 2018-08-07
dc.description Dissertation (MSc)--University of Pretoria, 2019. en_ZA
dc.description.abstract The advent of Web 2.0 technologies, which support the collaborative and participatory creation and publishing of social media content by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of all sizes worldwide, which use this new channel to communicate and engage with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally accustomed to handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data (tweets). Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) that allows researchers and software developers to connect to the Twitter database and collect public data sets. A Twitter application was created and used to collect streams of real-time public data via the Twitter source provided by Apache Flume and to store this data efficiently in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open source software library that runs on low-cost commodity hardware and can store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, flexibly and at low cost.
A lexicon-based sentiment analysis approach was taken, and the AFINN-111 lexicon was used for scoring. The Twitter data stored in HDFS was analysed using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that this approach is a fast, efficient and economical way to analyse unstructured data from social media in real time. en_ZA
dc.description.availability Unrestricted en_ZA
dc.description.degree MSc en_ZA
dc.description.department Computer Science en_ZA
dc.description.sponsorship National Research Foundation (NRF) - Scarce skills en_ZA
dc.identifier.citation Nhlabano, VV 2018, Fast Data Analysis Methods For Social Media Data, MSc Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/72546> en_ZA
dc.identifier.other A2020 en_ZA
dc.identifier.uri http://hdl.handle.net/2263/72546
dc.language.iso en en_ZA
dc.publisher University of Pretoria
dc.rights © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject Big Data en_ZA
dc.subject Machine Learning en_ZA
dc.subject Sentiment Analysis en_ZA
dc.subject Text Mining en_ZA
dc.subject Apache Hadoop en_ZA
dc.subject UCTD en_ZA
dc.title Fast Data Analysis Methods For Social Media Data en_ZA
dc.type Dissertation en_ZA

