Detection of Fraud Scam Calls Using Machine Learning

IEEE BASE PAPER TITLE:

Classifying Scam Calls Through Content Analysis With Dynamic Sparsity Top-k Attention Regularization

IEEE BASE PAPER ABSTRACT:

The rise of scam calls in recent years necessitated effective countermeasures against these fraudulent activities, which cause financial losses and threaten personal security. Although previous research utilizing traditional machine learning techniques has fallen short in today’s technological landscape, this study introduces a novel approach for recognizing scam calls by analyzing their content. By leveraging natural language processing techniques and deep learning methodologies, we propose the D-STAR (Dynamic Sparse Attention with Top-k Regularization) model, a transformer-based architecture designed to enhance scam call content detection. Unlike conventional models, D-STAR integrates Dynamic Sparse Attention (DSA), Top-k selection, and sparsity regularization, optimizing computational efficiency while preserving key scam-related contextual information. Our data set consists of 400 scam and 400 non-scam call transcripts, collected from publicly available sources such as social media, news reports, and discussion forums. To ensure dataset diversity, ChatGPT was utilized only to augment real scam scenarios across different contexts while preserving their core fraudulent structures. The model was evaluated using various hyperparameter configurations and managed to achieve an accuracy of 94%, a recall of 91.67%, and an F1-score of 84.98% in classifying scam call contents, outperforming state-of-the-art baselines such as CNN, LSTM, Decision Tree, Random Forest, and SVM in the scam call detection domain. A knowledge graph-based preprocessing technique was also introduced to enrich scam-related contextual understanding. The proposed approach demonstrates its effectiveness in enhancing scam call classification while maintaining computational efficiency. Future work will focus on real-world validation with telecom providers and further optimizations for real-time deployment.
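The abstract names Top-k selection without showing how it works. The NumPy sketch below is a generic illustration of top-k sparse attention, not the authors' D-STAR code. Each query keeps only its `top_k` highest scaled dot-product scores, and the remaining scores are masked out before the softmax. The function name `topk_sparse_attention` is hypothetical, and the paper's dynamic sparsity and sparsity regularization term are not modeled here.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention that keeps only the top_k scores per query.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v), with 1 <= top_k <= n_k.
    Returns (output, weights).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (n_q, n_k) similarity scores
    # Column indices of the top_k largest scores in each row (order among them is irrelevant)
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)         # 0 for kept keys, -inf for dropped keys
    masked = scores + mask
    # Numerically stable softmax over the surviving scores only
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)                             # exp(-inf) == 0, so dropped keys get no weight
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With `top_k` equal to the number of keys this reduces to ordinary dense attention; smaller values force each query to concentrate on a few tokens.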

ALGORITHM / MODEL USED:

Stacking Classifier.

OUR PROPOSED PROJECT ABSTRACT:

The rapid increase in fraudulent scam calls has emerged as a serious threat to individuals and organizations, leading to financial loss, privacy breaches, and erosion of trust in communication systems. As scammers increasingly rely on persuasive language and social engineering, there is a growing need for intelligent systems that can automatically identify and classify scam calls with high accuracy. This project detects fraud scam calls using machine learning techniques, analyzing call transcripts and voice recordings to distinguish scam calls from legitimate ones.

To address this need, a machine learning–based detection system is developed using Python as the core programming language, Flask as the web framework, and HTML, CSS, and JavaScript for the front end. The system is trained and evaluated on a balanced dataset of 800 call transcripts (400 scam and 400 non-scam). Textual features are extracted through text preprocessing and TF-IDF vectorization, and classification is performed by an ensemble Stacking Classifier, which combines the predictions of multiple base learners to improve predictive performance. The proposed model achieved a test accuracy of 98.75%, demonstrating its effectiveness in identifying scam calls.

The developed system operates in two modes to improve usability and real-world applicability. In Text Mode, users paste the call transcript directly into the system, which analyzes the content and classifies the call as scam or non-scam. In Voice Mode, users upload recorded calls in .wav format, which are converted to text and then classified in the same way. In addition to prediction results, the system provides performance analysis and graphical visualizations to help users understand model behavior and effectiveness.

Overall, this project presents a reliable and user-friendly solution for fraud scam call detection, offering significant support in mitigating communication-based fraud through machine learning.
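Text Mode can be served by a small Flask route that accepts a transcript and returns a label. The sketch below is only an illustration: the route `/predict/text`, the JSON field `transcript`, and the helper `classify_transcript` are hypothetical names, and the keyword rule inside `classify_transcript` is a toy stand-in so the example runs without a trained model. It is not the project's Stacking Classifier.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify_transcript(text: str) -> str:
    """Hypothetical placeholder for the trained model (e.g. model.predict([text])[0])."""
    # Toy keyword rule used only so this sketch runs without a trained classifier.
    suspicious = ("gift card", "one-time password", "verify your account", "arrest warrant")
    return "scam" if any(p in text.lower() for p in suspicious) else "non-scam"

@app.route("/predict/text", methods=["POST"])
def predict_text():
    # Accept JSON like {"transcript": "..."}; silent=True returns None on malformed JSON
    data = request.get_json(silent=True) or {}
    transcript = (data.get("transcript") or "").strip()
    if not transcript:
        return jsonify(error="transcript is required"), 400
    return jsonify(label=classify_transcript(transcript))
```

Keeping the model call behind one function means the same helper can be reused by a Voice Mode route once the audio has been transcribed.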

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

System : Intel Core i3 Processor.
Hard Disk : 20 GB.
Monitor : 15" LED.
Input Devices : Keyboard, Mouse.
RAM : 8 GB.

SOFTWARE REQUIREMENTS:

Operating System : Windows 10 / 11.
Coding Language : Python 3.12.0.
Web Framework : Flask.
Frontend : HTML, CSS, JavaScript.

REFERENCE:

Brendan Hong Jun Zhi, Tee Connie, Thian Song Ong, and Andrew Beng Jin Teoh, “Classifying Scam Calls Through Content Analysis With Dynamic Sparsity Top-k Attention Regularization,” IEEE Access, vol. 13, 2025.

Frequently Asked Questions (FAQs) and Answers

Q1. What is the objective of this project?

The main objective of this project is to automatically detect and classify fraud scam calls as scam or non-scam using machine learning techniques by analyzing call transcripts and voice call recordings.

Q2. What problem does this system address?

The system addresses the growing problem of scam calls that rely on deceptive language and social engineering. Traditional methods based on caller ID or phone numbers are often ineffective because scammers can spoof caller IDs or switch numbers, so this system uses content-based analysis to identify fraudulent calls.

Q3. What technologies are used in this project?

The project is developed using Python for backend processing, Flask as the web framework, and HTML, CSS, and JavaScript for the frontend. Machine learning models are implemented using standard Python libraries.

Q4. What type of dataset is used?

The system uses a balanced dataset of 800 call transcripts, 400 scam and 400 non-scam, so that neither class dominates training.

Q5. Which machine learning model is used?

The project uses an ensemble Stacking Classifier. Four base learners (Logistic Regression, SGD Classifier, Random Forest, and Gradient Boosting) each make predictions, and a Logistic Regression meta-learner combines those predictions into the final scam or non-scam decision.
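A minimal sketch of this stacking setup follows, assuming scikit-learn is the library in use (the project only mentions "standard Python libraries"). The four base learners and the Logistic Regression meta-learner match the description above; the TF-IDF settings, `n_estimators`, `cv=5`, and the helper name `build_scam_classifier` are illustrative assumptions, and the SMOTE step is omitted.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.pipeline import make_pipeline

def build_scam_classifier(random_state: int = 42):
    """TF-IDF features followed by a stacking ensemble (hypothetical helper)."""
    base_learners = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("sgd", SGDClassifier(random_state=random_state)),  # hinge loss: stacking uses decision_function
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        ("gb", GradientBoostingClassifier(random_state=random_state)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
        cv=5,  # meta-learner is trained on out-of-fold base-learner predictions
    )
    return make_pipeline(TfidfVectorizer(lowercase=True, ngram_range=(1, 2)), stack)
```

With `cv=5`, the meta-learner learns from predictions the base learners made on transcripts they were not trained on, which keeps it from over-trusting a base model that has memorised the training data.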

Q6. How are textual features extracted from call transcripts?

Textual features are extracted using TF-IDF (Term Frequency–Inverse Document Frequency) vectorization, which converts call transcripts into numerical representations based on word importance.
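To make the weighting concrete, the from-scratch sketch below computes L2-normalised TF-IDF vectors using the smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is also scikit-learn's `TfidfVectorizer` default. Tokenization is simplified to lowercase whitespace splitting, and the function name `tfidf` is hypothetical; the project itself presumably relies on a library vectorizer.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows), where each row is an L2-normalised TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]        # simplified tokenization
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}  # smoothed IDF
    rows = []
    for toks in tokenized:
        tf = Counter(toks)                                     # raw term counts in this transcript
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0       # avoid dividing by zero
        rows.append([x / norm for x in vec])
    return vocab, rows
```

For the two transcripts "your account is blocked" and "your order is ready", the words "your" and "is" occur in both and get idf = 1, while "account" occurs in only one and gets idf = ln(3/2) + 1, so it carries more weight in the first vector.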

Q7. How does the system handle class imbalance?

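The full dataset is already balanced (400 scam and 400 non-scam transcripts). As an additional safeguard, SMOTE (Synthetic Minority Oversampling Technique) is applied during training; it generates synthetic minority-class samples only when the training data is uneven, so that scam and non-scam calls stay balanced during learning.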
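The core SMOTE rule places each synthetic sample on the line segment between a minority sample and one of its k nearest minority neighbours. The NumPy sketch below shows only that interpolation step and is not the project's code; in practice the imbalanced-learn library provides `SMOTE`, and its pipeline applies resampling only while fitting, so synthetic samples never reach the test set. The function name `smote` and its parameters are hypothetical.

```python
import numpy as np

def smote(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples with the basic SMOTE rule.

    x_new = x + u * (x_nn - x), with u ~ U(0, 1) and x_nn one of the k
    nearest minority neighbours of x. Requires at least 2 minority samples.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    k = min(k, len(X) - 1)                        # cannot have more neighbours than other points
    # Pairwise squared Euclidean distances between minority samples
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbours
    base = rng.integers(0, len(X), size=n_new)    # which minority sample to start from
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # interpolation factor in [0, 1)
    return X[base] + gap * (X[nn] - X[base])
```

With TF-IDF features, each row of `X_minority` would be a (dense) transcript vector from the minority class of the training split.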

Q8. What are the different input modes supported by the system?

The system supports two input modes:
• Text Mode: Users paste call transcripts directly.
• Voice Mode: Users upload recorded call audio files in .wav format, which are converted into text before classification.
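Voice Mode reduces to "read the .wav, transcribe it, classify the text". The sketch below uses Python's standard `wave` module to validate the upload and read its basic properties; the speech-to-text engine is not named in the source, so `transcribe` and `classify` are passed in as hypothetical callables, and the function names are illustrative.

```python
import wave
from typing import Callable

def read_wav_info(path: str) -> dict:
    """Return basic properties of a .wav file (raises wave.Error if it is not valid WAV)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }

def classify_voice_call(path: str,
                        transcribe: Callable[[str], str],
                        classify: Callable[[str], str]) -> str:
    """Voice Mode sketch: validate the .wav, convert speech to text, then reuse the text classifier."""
    info = read_wav_info(path)
    if info["duration_s"] == 0:
        raise ValueError("empty recording")
    transcript = transcribe(path)   # hypothetical speech-to-text hook
    return classify(transcript)     # same classifier as Text Mode
```

Routing the transcript through the same `classify` function keeps Text Mode and Voice Mode predictions consistent.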

Q9. What accuracy does the system achieve?

The Stacking Classifier achieves a test accuracy of 98.75% on held-out call transcripts, indicating strong performance in detecting fraud scam calls on this dataset.
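Accuracy is the share of all calls classified correctly, while recall (also reported by the base paper) is the share of actual scam calls the model catches. The sketch below computes accuracy, precision, recall, and F1 from true and predicted labels, treating scam as the positive class (1). The function name is hypothetical; scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` give the same values.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary scam (positive) vs non-scam task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # scams correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # legitimate calls flagged as scam
    fn = sum(t == positive and p != positive for t, p in pairs)  # scams that slipped through
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with true labels [1, 1, 1, 0, 0, 0, 0, 1] and predictions [1, 1, 0, 0, 0, 1, 0, 1], there are 3 true positives, 1 false positive, and 1 false negative, so all four metrics equal 0.75.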

Q10. Is any personal data stored by the system?

No. The system does not store personal or sensitive information. Uploaded audio files and transcripts are processed securely and used only for classification.

Q11. What makes this project unique?

The project uniquely combines ensemble machine learning, dual-mode input (text and voice), and a web-based interface to provide an accurate and practical solution for detecting fraud scam calls.
