Fast data analysis methods for social media data

dc.contributor.advisor: Lutu, Patricia Elizabeth Nalwoga
dc.contributor.email: valezw@gmail.com (en_ZA)
dc.contributor.postgraduate: Nhlabano, Valentine Velaphi
dc.date.accessioned: 2019-12-09T08:55:16Z
dc.date.available: 2019-12-09T08:55:16Z
dc.date.created: 2019-12-15
dc.date.issued: 2018-08-07
dc.description: Dissertation (MSc)--University of Pretoria, 2019. (en_ZA)
dc.description.abstract: The advent of Web 2.0 technologies, which support the collaborative and participatory creation and publishing of social media content by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide, which want to take advantage of this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally used to handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, which is one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect to the Twitter database and collect public Twitter data sets. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume and to store this data efficiently in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system which is used to efficiently collect, aggregate and move large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open source software library that runs on low-cost commodity hardware and has the ability to store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost.
A lexicon-based sentiment analysis approach was taken and the AFINN-111 lexicon was used for scoring. The Twitter data was analysed from the HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time. (en_ZA)
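The lexicon-based scoring described in the abstract can be illustrated with a minimal sketch in plain Java. It is not the dissertation's implementation: it runs on a single machine without Hadoop, the class and method names (`LexiconSentimentSketch`, `score`, `label`, `countByLabel`) are hypothetical, and the four-word lexicon holds illustrative values standing in for the full AFINN-111 word list. The per-tweet scoring plays the role of a map step and the per-label counting plays the role of a reduce step.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class LexiconSentimentSketch {

    // Tiny stand-in lexicon with illustrative values (an assumption);
    // a real run would load the full AFINN-111 word list instead.
    static final Map<String, Integer> LEXICON = new HashMap<>();
    static {
        LEXICON.put("good", 3);
        LEXICON.put("love", 3);
        LEXICON.put("hate", -3);
        LEXICON.put("terrible", -3);
    }

    // "Map" step: score one tweet by summing the lexicon values of its words.
    // Words that are not in the lexicon contribute 0.
    static int score(String tweetText) {
        int total = 0;
        for (String token : tweetText.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            total += LEXICON.getOrDefault(token, 0);
        }
        return total;
    }

    // Turn a numeric score into a coarse sentiment label.
    static String label(int score) {
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    // "Reduce" step: count how many tweets fall under each label.
    static Map<String, Long> countByLabel(List<String> tweets) {
        return tweets.stream()
                .map(t -> label(score(t)))
                .collect(Collectors.groupingBy(l -> l, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "I love this phone, good battery",
                "Terrible service, I hate waiting",
                "Flight lands at 9");
        System.out.println(countByLabel(tweets));
    }
}
```

In a Hadoop job the same split would typically place the scoring logic in a mapper that emits a label per tweet and the counting in a reducer that sums per label; the stream pipeline above only mimics that shape on an in-memory list.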
dc.description.availability: Unrestricted (en_ZA)
dc.description.degree: MSc (en_ZA)
dc.description.department: Computer Science (en_ZA)
dc.description.sponsorship: National Research Foundation (NRF) - Scarce skills (en_ZA)
dc.identifier.citation: Nhlabano, VV 2018, Fast Data Analysis Methods For Social Media Data, MSc Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/72546> (en_ZA)
dc.identifier.other: A2020 (en_ZA)
dc.identifier.uri: http://hdl.handle.net/2263/72546
dc.language.iso: en (en_ZA)
dc.publisher: University of Pretoria
dc.rights: © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: Big data (en_ZA)
dc.subject: Machine learning (en_ZA)
dc.subject: Sentiment analysis (en_ZA)
dc.subject: Text mining (en_ZA)
dc.subject: Apache Hadoop (en_ZA)
dc.subject: UCTD (en_ZA)
dc.title: Fast data analysis methods for social media data (en_ZA)
dc.type: Dissertation (en_ZA)

Files

Original bundle

Name: Nhlabano_Fast_2019.pdf
Size: 5.74 MB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle

Name: license.txt
Size: 1.75 KB
Format: Item-specific license agreed upon to submission