SMS Spam Detection using Machine Learning
IEEE BASE PAPER TITLE:
Investigating Evasive Techniques in SMS Spam Filtering: A Comparative Analysis of Machine Learning Models
IEEE BASE PAPER ABSTRACT:
The persistence of SMS spam remains a significant challenge, highlighting the need for research aimed at developing systems capable of effectively handling the evasive strategies used by spammers. Such research efforts are important for safeguarding the general public from the detrimental impact of SMS spam. In this study, we aim to highlight the challenges encountered in the current landscape of SMS spam detection and filtering. To address these challenges, we present a new SMS dataset comprising more than 68K SMS messages with 61% legitimate (ham) SMS and 39% spam messages. Notably, this dataset, we release for further research, represents the largest publicly available SMS spam dataset to date. To characterize the dataset, we perform a longitudinal analysis of spam evolution. We then extract semantic and syntactic features to evaluate and compare the performance of well-known machine learning based SMS spam detection methods, ranging from shallow machine learning approaches to advanced deep neural networks. We investigate the robustness of existing SMS spam detection models and popular anti-spam services against spammers’ evasion techniques. Our findings reveal that the majority of shallow machine learning based techniques and anti-spam services exhibit inadequate performance when it comes to accurately classifying SMS spam messages. We observe that all of the machine learning approaches and anti-spam services are susceptible to various evasive strategies employed by spammers. To address the identified limitations, our study advocates for researchers to delve into these areas to advance the field of SMS spam detection and anti-spam services.
ALGORITHM / MODEL USED:
SVC (Support Vector Classifier) and CatBoost.
OUR PROPOSED PROJECT ABSTRACT:
The project “SMS Spam Detection using Machine Learning” addresses the problem of identifying spam messages in SMS communication using machine learning. The backend is implemented in Python and served through the Flask framework, while the frontend is built with HTML, CSS, and JavaScript, so users can submit a message through a web page and see its classification.
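A minimal sketch of how a Flask backend might expose the classifier to the frontend is shown below. The `/predict` route, the JSON field names, and the keyword-based `predict_label` function are hypothetical placeholders for illustration; in the actual project a trained model would be called in its place.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_label(message):
    """Hypothetical stand-in for the trained model's prediction call."""
    suspicious = ("free", "prize", "winner", "claim")
    text = message.lower()
    return "spam" if any(word in text for word in suspicious) else "ham"


@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body sent by the frontend; tolerate missing or invalid JSON.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    message = str(payload.get("message", "")).strip()
    if not message:
        return jsonify(error="message is required"), 400
    return jsonify(message=message, label=predict_label(message))


# Exercise the endpoint without starting a server, using Flask's test client.
client = app.test_client()
response = client.post("/predict", json={"message": "Claim your free prize now"})
print(response.get_json())
```

The frontend JavaScript could send the same JSON body with a `fetch` call and display the returned label.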
The core of this project is the development and deployment of two machine learning models that classify SMS messages as either spam or ham (non-spam). The first model is a Support Vector Classifier (SVC), which achieved a training accuracy of 99.2% and a test accuracy of 98.30%. The small gap between the two figures suggests that the model generalizes well rather than memorizing the training data.
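As a rough illustration of how such a classifier could be assembled, the sketch below pairs a TF-IDF vectorizer with scikit-learn's `SVC`. The TF-IDF features, the linear kernel, and the handful of in-line sample messages are assumptions made for illustration only; the project description does not state which feature extraction or hyperparameters were used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy messages for illustration only; the real project trains on a far larger dataset.
messages = [
    "WIN a free prize now, click the link to claim",
    "Urgent: your account is suspended, verify your password now",
    "Congratulations, you won cash, reply WIN to claim your prize",
    "Free entry in a weekly prize draw, text WIN now",
    "Are we still meeting for lunch tomorrow?",
    "Can you pick up the kids after school today?",
    "I'll call you when I get home tonight",
    "Thanks for the notes, see you in class tomorrow",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Assumed pipeline: raw text -> TF-IDF vectors -> linear-kernel SVC.
model = make_pipeline(TfidfVectorizer(lowercase=True), SVC(kernel="linear"))
model.fit(messages, labels)


def classify(text):
    """Return the predicted label ("spam" or "ham") for a single message."""
    return model.predict([text])[0]


print(classify("Claim your free prize today"))
```

Wrapping the vectorizer and classifier in one pipeline means the same preprocessing is applied at training and prediction time.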
In parallel, a CatBoost classifier, which is based on gradient boosting, was developed and evaluated. This model reached a training accuracy of 97.76% and a test accuracy of 97.19%, about 1.1 percentage points below the SVC on the test set.
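To make the comparison concrete, the short sketch below defines accuracy and computes the train-test gap for each model from the figures reported above. The function and variable names are illustrative and not taken from the project code.

```python
# Accuracies in percent, as reported in this abstract.
reported = {
    "SVC": {"train": 99.2, "test": 98.30},
    "CatBoost": {"train": 97.76, "test": 97.19},
}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def generalization_gap(train_acc, test_acc):
    """Train accuracy minus test accuracy, in percentage points."""
    return round(train_acc - test_acc, 2)


gaps = {name: generalization_gap(a["train"], a["test"]) for name, a in reported.items()}
test_difference = round(reported["SVC"]["test"] - reported["CatBoost"]["test"], 2)

print(gaps)             # gap per model, in percentage points
print(test_difference)  # SVC test accuracy minus CatBoost test accuracy
```

Accuracy alone can hide errors on the minority class, so precision and recall for the spam class are worth reporting alongside it.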
The dataset used for training and evaluation comprises 67,010 instances with two attributes, one of which is the target attribute for classification. A dataset of this size gives the models ample training examples, which should help them generalize to new, unseen messages.
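The description does not say how the data were divided into training and test sets. One common approach is a stratified hold-out split, sketched below with the standard library only. The 80/20 ratio, the fixed seed, and the `(text, label)` row layout are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split (text, label) rows into train and test sets, preserving label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Shuffle again so the classes are interleaved rather than grouped.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy rows standing in for the real dataset.
rows = [(f"ham message {i}", "ham") for i in range(60)]
rows += [(f"spam message {i}", "spam") for i in range(40)]

train, test = stratified_split(rows)
print(len(train), len(test))
```

Stratifying keeps the spam-to-ham ratio the same in both sets, so test accuracy is measured on a class mix that matches the training data.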
Overall, the project applies established machine learning methods to a real-world problem and provides a practical tool for SMS spam detection. Delivering it through a web interface makes it accessible and easy to use for individuals and organizations that want to filter and manage SMS communications.
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System : Intel Core i3 Processor.
- Hard Disk : 500 GB.
- Monitor : 15" LED.
- Input Devices : Keyboard, Mouse.
- RAM : 8 GB.
SOFTWARE REQUIREMENTS:
- Operating System : Windows 10 / 11.
- Coding Language : Python 3.10.9.
- Web Framework : Flask.
- Frontend : HTML, CSS, JavaScript.
REFERENCE:
Muhammad Salman, Muhammad Ikram, and Mohamed Ali Kaafar, “Investigating Evasive Techniques in SMS Spam Filtering: A Comparative Analysis of Machine Learning Models”, IEEE Access, vol. 12, 2024.