News classification and categorization with smart function sentiment analysis

dc.contributor.author: Nkongolo, Mike Nkongolo Wa
dc.contributor.email: mike.wankongolo@up.ac.za (en_US)
dc.date.accessioned: 2024-01-11T10:30:27Z
dc.date.available: 2024-01-11T10:30:27Z
dc.date.issued: 2023-11
dc.description: DATA AVAILABILITY: The data supporting the current study are available from the corresponding author upon request. (en_US)
dc.description.abstract: Search engines are tools used to find information on the Internet. Since the web has a plethora of websites, the engine queries the majority of active sites and builds a database organized according to the keywords used in searches. As a result, when a user types a few descriptive words on the home page of the search engine, the search function lists the websites that correspond to these keywords. However, this search approach has some problems. For instance, if a user wants information about the word Jaguar, most search results concern both animals and cars. This polysemy problem leads search engines to return the most popular results rather than the most relevant ones. This article presents a study of using sentiment analysis to support news classification and categorization and to improve classification accuracy. We introduce a smart search function embedded into a search engine to tackle polysemy issues and to record relevant results so that their sentimentality can be determined. The study therefore involves several aspects of natural language processing (NLP) and sentiment analysis for news categorization and classification. A web crawler was used to collect British Broadcasting Corporation (BBC) news from the Internet, the text was preprocessed using NLP, and sentiment analysis methods were applied to determine the polarity of the processed text data. The sentimentality represents the negative, positive, or neutral polarities assigned by the sentiment analysis algorithms. The research used a web crawler to collect information from the BBC news site and a database to explore the sentimentality of BBC news. The Natural Language Toolkit (NLTK) and BM25 were used to index and preprocess patterns in the database. The experimental results show the proposed search function outperforming normal search, with an accuracy rate of 85%.
Moreover, the results show a negative polarity of BBC news using the SentiStrength algorithm. Furthermore, the Valence Aware Dictionary and sEntiment Reasoner (VADER) was the best-performing sentiment analysis model for news classification. This model obtained an accuracy of 85% using data collected with the proposed smart function. (en_US)
dc.description.department: Informatics (en_US)
dc.description.librarian: hj2023 (en_US)
dc.description.sdg: None (en_US)
dc.description.sponsorship: The University of Pretoria’s Faculty of Engineering, Built Environment, and Information Technology through the Doctorate University Capacity Development Program (UCDP). Open access funding was enabled and organized by SANLiC Gold. (en_US)
dc.description.uri: https://www.hindawi.com/journals/ijis (en_US)
dc.identifier.citation: Nkongolo, M.N.W. 2023, 'News classification and categorization with smart function sentiment analysis', International Journal of Intelligent Systems, vol. 2023, art. 1784394, pp. 1-24, doi: 10.1155/2023/1784394. (en_US)
dc.identifier.issn: 0884-8173 (print)
dc.identifier.issn: 1098-111X (online)
dc.identifier.other: 10.1155/2023/1784394
dc.identifier.uri: http://hdl.handle.net/2263/93927
dc.language.iso: en (en_US)
dc.publisher: Hindawi (en_US)
dc.rights: © 2023 Mike Nkongolo Wa Nkongolo. This is an open access article distributed under the Creative Commons Attribution License. (en_US)
dc.subject: Valence aware dictionary and sentiment reasoner (VADER) (en_US)
dc.subject: Natural language processing (en_US)
dc.subject: Sentiment analysis (en_US)
dc.subject: News categorization and classification (en_US)
dc.title: News classification and categorization with smart function sentiment analysis (en_US)
dc.type: Article (en_US)

Files

Original bundle

Name: Nkongolo_News_2023.pdf
Size: 3.92 MB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission