Toxic Speech Classification using Machine Learning Algorithms

IEEE BASE PAPER ABSTRACT:

In today’s era of online social media platforms, there has been a massive surge in the propagation of toxic speech. These platforms provide many benefits. However, people with sharply differing viewpoints have contributed to growing hostility in internet posts and debates. With the outbreak of the pandemic, corporations, educational institutions, students, and the general public have all increased their use of websites. The growing popularity of internet platforms like Twitter and Facebook has long been a major cause of concern. These platforms not only allow for improved communication, but also allow users to express their thoughts, which are quickly shared with the rest of the world. Furthermore, given the diversity of these platforms’ users in history, beliefs, race, and customs, many of them choose to use disparaging, abusive, and antagonistic language while interacting with those who do not share their background. This online toxicity has been increasing rapidly, aided by the features these social media platforms provide and by the cloak of anonymity. Rather than being handled manually, this problem can be addressed using Machine Learning. Labels like “Obscene”, “Toxic”, “Severe Toxic”, “Threat”, “Insult”, and “Identity Hate” are used interchangeably and hence have been grouped under “Toxic” speech content. As a result, it is vital to recognise and eliminate toxic speech from internet-based social media networks automatically. Several varieties of Machine Learning approaches, such as traditional Machine Learning and ensemble approaches, are explored in this paper. We use a corpus collected from the online platform Twitter to perform binary and multi-class classification and investigate two techniques: (a) a method that consists of extracting word embeddings and then generating the model; (b) improving the existing models: RF, DT, VC, LR, KNN.
The proposed methods can be applied to any other kind of social media comment. In this way, we developed a model that classifies given comments into different categories of toxicity with improved precision, recall, and accuracy scores.
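The grouping of the six labels under a single “Toxic” class, as described in the abstract, can be sketched roughly as follows. This is a minimal illustration only: the label keys (`severe_toxic`, `identity_hate`, and so on) and the helper `to_binary_label` are hypothetical names chosen for this example, and the paper's actual dataset schema may differ.

```python
from typing import Dict

# Hypothetical label keys mirroring the six categories named in the abstract.
TOXIC_LABELS = (
    "toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate",
)


def to_binary_label(annotations: Dict[str, int]) -> int:
    """Return 1 if any of the six categories is flagged, otherwise 0."""
    return int(any(annotations.get(label, 0) for label in TOXIC_LABELS))


# Two made-up annotation records, for illustration only.
clean = {label: 0 for label in TOXIC_LABELS}
flagged = dict(clean, insult=1, obscene=1)

print(to_binary_label(clean), to_binary_label(flagged))
```

Keeping the six per-category flags alongside the merged label leaves the multi-class task available, while the merged label serves the binary task.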

ALGORITHM / MODEL USED:

Random Forest Classifier.
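
As a rough illustration of how a Random Forest text classifier might be put together, the sketch below uses scikit-learn. It makes several assumptions that do not come from the project itself: TF-IDF features stand in for the word embeddings mentioned in the base paper, the hyperparameters are arbitrary, and the handful of training comments are invented placeholders rather than samples from the Twitter corpus.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented placeholder data (NOT from the paper's corpus): 1 = toxic, 0 = non-toxic.
train_texts = [
    "you are a complete idiot",
    "I will hurt you if you post again",
    "what a stupid, worthless opinion",
    "thank you for sharing this article",
    "great explanation, very helpful",
    "I respectfully disagree with this point",
]
train_labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features (an assumption, swapped in for word embeddings) feeding a Random Forest.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_texts, train_labels)

new_comments = ["you are an idiot", "thanks for the helpful answer"]
predictions = model.predict(new_comments)
print(list(zip(new_comments, predictions)))
```

With so little data the predictions are not meaningful; the point is only the shape of the pipeline, in which vectorization and classification are fitted together so that new comments pass through the same preprocessing.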

OUR PROPOSED ABSTRACT:

The internet and social media are now fundamental components of how people disseminate and receive information. Social media has changed considerably over time, and roughly half of all people now use it to communicate their ideas and thoughts. The way individuals communicate has changed substantially over the past ten years, partly as a result of social media’s pervasive rise. It has made a more connected and informed world possible, but it has also given rise to a new phenomenon: toxic speech.

Because open platforms make it easy to create, discuss, and share content, some people have taken the opportunity to post toxic speech and generally negative comments. This pattern is what inspired our project.

To identify toxic speech in comments and texts effectively, we intend to build a high-accuracy toxic speech classifier using the Random Forest Classifier algorithm. Since labels like “Obscene,” “Toxic,” “Severe Toxic,” “Threat,” “Insult,” and “Identity Hate” are frequently used together, they have been grouped under the category of “Toxic” speech. It is therefore crucial to identify and automatically remove toxic speech from internet-based social media networks. To that end, we created a model with improved precision, recall, and accuracy scores when categorizing supplied comments into various levels of toxicity.
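
The precision, recall, and accuracy scores mentioned above can be computed from true and predicted labels as in the following minimal sketch. It covers the binary case only (1 = toxic, 0 = non-toxic); the function name `binary_metrics` and the sample label lists are illustrative assumptions, not results from the project.

```python
from typing import Sequence, Tuple


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) for binary labels, 1 = toxic."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, passed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Made-up labels for illustration: one toxic comment is missed.
y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
precision, recall, accuracy = binary_metrics(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Reporting all three together matters for toxicity detection because toxic comments are usually a minority class, so accuracy alone can look high even when many toxic comments are missed.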

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

System : Intel Core i3 Processor.
Hard Disk : 500 GB.
Monitor : 15" LED
Input Devices : Keyboard, Mouse
RAM : 4 GB

SOFTWARE REQUIREMENTS:

Operating System : Windows 10 / 11.
Coding Language : Python 3.8.
Web Framework : Flask.
Frontend : HTML, CSS, JavaScript.

REFERENCE:

Pabba Sumanth; Syed Samiuddin; K. Jamal; Srikanth Domakonda; Pathi Shivani, “Toxic Speech Classification using Machine Learning Algorithms”, 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC), IEEE Conference, 2022.
