
Advanced Fake Product Review Detection using Machine Learning
IEEE BASE PAPER TITLE:
Enhancing Fake Review Detection Using Linguistic Exaggeration, BERT Embeddings, and Fuzzy Logic
IEEE BASE PAPER ABSTRACT:
The rapid proliferation of online reviews has made them a crucial factor in consumer decision-making. However, the presence of fake reviews threatens the credibility of review platforms, which requires advanced detection mechanisms. The core objective of this work is to develop a hybrid model that combines interpretable handcrafted linguistic cues with deep semantic features for more accurate and robust fake review detection. In this study, we propose a novel hybrid approach to fake review detection that integrates fuzzy logic, BERT embeddings, attention-based feature fusion, and an enhanced multi-centroid K-means classifier. Our method first preprocesses reviews and extracts key linguistic features based on extravagant words using a fuzzy logic-based membership function that highlights extreme linguistic patterns. Simultaneously, BERT embeddings capture the deep semantic meaning of the text. These features are then combined using a softmax-based attention mechanism to optimize the contribution of each component. To classify the reviews, we introduce a personalized multi-centroid K-means algorithm, which improves traditional K-means by allowing a more precise clustering of fake and genuine reviews. We evaluated our approach against traditional classifiers such as Support Vector Machines (SVM) and Logistic Regression. Our final model demonstrated statistically significant improvements across key metrics, supported by cross-validation and hypothesis testing (p < 0.01). Confidence intervals were calculated to assess the reliability of the observed performance gains. To validate the effectiveness of our proposed method, we conducted extensive evaluations on two real-world datasets. The results confirm that our approach effectively balances semantic understanding, linguistic heuristics, and clustering efficiency, leading to enhanced detection performance.
This research provides a scalable and interpretable solution to the problem of detecting fake reviews, with potential applications in e-commerce platforms, online marketplaces, and digital content verification systems.
ALGORITHM / MODEL USED:
Decision Tree Classifier, AdaBoost Classifier.
OUR PROPOSED PROJECT ABSTRACT:
The rapid growth of e-commerce platforms has led to an overwhelming increase in user-generated product reviews, making it challenging for consumers to distinguish between genuine opinions and deceptive, computer-generated content. Fake reviews significantly influence purchasing decisions, brand reputation, and customer trust, creating a strong need for intelligent and automated detection mechanisms. To address this issue, this project develops a machine learning system for advanced fake product review detection that identifies fraudulent reviews and supports informed purchasing decisions.
The proposed system uses Python for backend processing, HTML, CSS, and JavaScript for the frontend, and the Flask web framework to connect the machine learning models to a user-friendly web interface. Two machine learning models, a Decision Tree Classifier and an AdaBoost Classifier, are implemented and evaluated independently to analyze review authenticity.
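As a rough illustration of training and evaluating the two models independently, the following minimal sketch uses Scikit-learn on synthetic data. The synthetic features, the 80/20 split, and the hyperparameters are assumptions made for the example; they are not the project's dataset or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the numeric review features (assumption, not the real data).
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The two models are fitted and scored separately, never combined.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, model.predict(X_test)),
    }
    print(name, results[name])
```

Reporting training and test accuracy side by side for each model makes it easier to spot overfitting, since an unpruned decision tree will usually fit its training data almost perfectly.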
The dataset used in this project consists of 40,432 records, including features such as category, rating, review text, label, binary label, vocabulary richness, average word length, sentence count, and text length. Reviews are classified into CG (Computer Generated) or OR (Original) categories based on linguistic and statistical characteristics.
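The statistical features named above could be derived from raw review text along the following lines. This is a minimal sketch: the function name `extract_features` is hypothetical, and the exact definitions (for example, vocabulary richness as unique words divided by total words) are assumptions, since the feature formulas are not specified here.

```python
import re


def extract_features(review_text: str) -> dict:
    """Compute simple text statistics for one review (assumed definitions)."""
    # Words: runs of letters, digits, or apostrophes, compared case-insensitively.
    words = re.findall(r"[A-Za-z0-9']+", review_text.lower())
    # Sentences: split on runs of ., !, or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", review_text) if s.strip()]
    n_words = len(words)
    return {
        "text_length": len(review_text),
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Assumed definition: unique words divided by total words.
        "vocabulary_richness": len(set(words)) / n_words if n_words else 0.0,
    }


features = extract_features("Great phone. Great price!")
print(features)
```

Guarding the divisions against empty input keeps the function safe for blank or punctuation-only reviews.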
The system operates in two distinct modes. In Manual Prediction mode, users can enter review text directly to receive instant classification results, with an additional option to export the prediction report in PDF format. In Live Prediction mode, users can paste product URLs from popular e-commerce platforms such as Amazon, Walmart, and Flipkart, enabling the system to extract real-time reviews automatically, analyze multiple reviews in bulk, and generate authenticity predictions using the trained machine learning models.
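Before any reviews can be extracted in Live Prediction mode, the pasted URL has to be matched to a supported platform. The sketch below shows one way this check might look using only the Python standard library. The function `detect_platform` and the matching rule are hypothetical, and the extraction step itself is not shown because its details are not described here.

```python
from urllib.parse import urlparse

# Platforms named in the project description.
SUPPORTED_PLATFORMS = ("amazon", "walmart", "flipkart")


def detect_platform(product_url: str):
    """Return the platform name for a product URL, or None if unsupported."""
    host = (urlparse(product_url).hostname or "").lower()
    # Compare against whole hostname labels, e.g. "www.amazon.in" -> ["www", "amazon", "in"].
    labels = host.split(".")
    for platform in SUPPORTED_PLATFORMS:
        if platform in labels:
            return platform
    return None


print(detect_platform("https://www.amazon.in/dp/EXAMPLE"))
print(detect_platform("https://example.com/amazon"))
```

Matching on hostname labels rather than on the full URL string avoids false positives when a platform name only appears in the path. It is a routing convenience, not a security check.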
Experimental results demonstrate strong model performance. The Decision Tree Classifier achieved a training accuracy of 100% and a test accuracy of 97%, while the AdaBoost Classifier achieved a training accuracy of 98% and a test accuracy of 98%. The system also provides detailed performance evaluation metrics, including accuracy, precision, recall, and confusion matrix, offering a comprehensive analysis of model effectiveness. Overall, the developed system presents a robust, scalable, and practical solution for detecting fake product reviews in real-world e-commerce environments.
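To make the reported metrics concrete, the following sketch computes accuracy, precision, recall, and a 2x2 confusion matrix from label lists in plain Python. Treating CG as the positive class is an assumption made for the example, and the labels shown are toy values, not project results.

```python
def evaluate(y_true, y_pred, positive="CG"):
    """Return accuracy, precision, recall, and a 2x2 confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Rows are true classes (negative, positive); columns are predictions.
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Toy labels for illustration only.
y_true = ["CG", "CG", "OR", "OR", "CG"]
y_pred = ["CG", "OR", "OR", "CG", "CG"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Precision and recall are worth reporting alongside accuracy because they separate two different mistakes: flagging a genuine review as fake, and missing a fake one.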
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System : Intel Core i3 Processor.
- Hard Disk : 20 GB.
- Monitor : 15" LED.
- Input Devices : Keyboard, Mouse.
- RAM : 8 GB.
SOFTWARE REQUIREMENTS:
- Operating System : Windows 10 / 11.
- Coding Language : Python 3.12.0.
- Web Framework : Flask.
- Frontend : HTML, CSS, JavaScript.
REFERENCE:
Mohammed Ennaouri and Ahmed Zellou, “Enhancing Fake Review Detection Using Linguistic Exaggeration, BERT Embeddings, and Fuzzy Logic”, IEEE Access, vol. 13, 2025.
Frequently Asked Questions (FAQs) and Answers
1. What is the purpose of this project?
The purpose of this project is to automatically detect fake product reviews on e-commerce platforms by analyzing review text using machine learning techniques. It helps distinguish between computer-generated and original reviews to improve trust and transparency for users.
2. What technologies are used in this project?
The project is developed using Python for backend processing, Flask as the web framework, HTML, CSS, and JavaScript for the frontend, and machine learning models implemented using Scikit-learn.
3. Which machine learning algorithms are used?
Two supervised machine learning algorithms are used:
- Decision Tree Classifier
- AdaBoost Classifier
Both models are trained and evaluated independently to analyze review authenticity.
4. What type of data is used for training the models?
The system is trained on a dataset containing 40,432 product reviews with features such as category, rating, review text, vocabulary richness, average word length, sentence count, and text length. Reviews are labeled as CG (Computer Generated) or OR (Original).
5. What are the different modes available in the system?
The system supports two modes:
- Manual Prediction: Users enter review text manually to get instant prediction results.
- Live Prediction: Users paste product URLs from platforms like Amazon, Flipkart, or Walmart to extract and analyze multiple reviews automatically.
6. How does live review prediction work?
In live prediction mode, the system extracts real-time reviews from the provided product URL, preprocesses the extracted text, applies the trained machine learning models, and displays authenticity predictions for each review.
7. Can the prediction results be saved or downloaded?
Yes. In manual prediction mode, users have the option to export the prediction results as a PDF file for documentation or reporting purposes.
8. How is the performance of the system evaluated?
The system uses standard evaluation metrics such as accuracy, precision, recall, and confusion matrix to measure the effectiveness and reliability of the machine learning models.
9. Is the system suitable for real-world deployment?
Yes. The system is designed as a web-based application with live data extraction, automated analysis, and user-friendly interfaces, making it suitable for real-world e-commerce review monitoring scenarios.
10. Can this system handle large numbers of reviews?
Yes. The system is capable of analyzing multiple reviews simultaneously, especially in live prediction mode, making it suitable for bulk review analysis.
11. Does the system require technical knowledge to use?
No. The system is designed for ease of use. Users can analyze reviews by simply entering text or pasting a product URL without any technical expertise.
12. What type of users can benefit from this project?
This system can be useful for:
- Online shoppers
- E-commerce platform administrators
- Researchers and students
- Businesses and product analysts
13. Does the system work only for specific e-commerce platforms?
Currently, the system supports popular platforms such as Amazon, Flipkart, and Walmart, but it can be extended to support additional platforms in the future.
14. Is user data stored in the system?
No sensitive personal user data is stored. The system processes only review text for analysis, ensuring basic privacy and ethical compliance.
15. What makes this project suitable as a final-year academic project?
The project combines machine learning, web application development, real-time data processing, and performance evaluation, making it a comprehensive and industry-relevant final-year project.



