Social Media Spammers Fake Review Detection System
ABSTRACT:
Nowadays, many people rely on content available in social media when making decisions (e.g., reviews and feedback on a topic or product). The fact that anybody can leave a review gives spammers a golden opportunity to write spam reviews about products and services for various interests. Identifying these spammers and their spam content is an active research topic. Although a considerable number of studies have addressed it recently, the methodologies proposed so far still detect spam reviews poorly, and none of them shows the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which uses spam features to model review datasets as heterogeneous information networks, mapping the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us obtain better results in terms of different metrics on real-world review datasets from the Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first type performs better than the others.
EXISTING SYSTEM:
- Existing techniques can be classified into different categories: some use linguistic patterns in text, mostly based on unigrams and bigrams; others rely on behavioral patterns, using features extracted from users' behavior that are mostly metadata-based; and some use graphs together with graph-based algorithms and classifiers.
- Existing approaches can therefore be grouped into three categories: Linguistic-based Methods, Behavior-based Methods and Graph-based Methods.
- Feng et al. use unigrams, bigrams and their combination. Other studies use further features to find spam reviews, such as pairwise features (features between two reviews, e.g., content similarity) and the percentage of CAPITAL words in a review.
- Lai et al. used probabilistic language modeling to spot spam. This study demonstrates that 2% of reviews written on business websites are actually spam.
- A deeper analysis of the literature shows that behavioral features yield higher accuracy than linguistic ones.
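The linguistic features mentioned above (unigram and bigram counts, the share of capital words, and pairwise content similarity) can be sketched in Python as follows. This is a minimal illustration, not the method of Feng et al. or of any other cited study; the tokenizer, the choice to count only multi-letter all-caps words as "capital words", and the use of cosine similarity as the content-similarity measure are assumptions.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens of a review."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_features(text):
    """Bag of unigram and bigram counts (keys are 1- and 2-word tuples)."""
    words = tokens(text)
    unigrams = Counter((w,) for w in words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams + bigrams


def capital_ratio(text):
    """Share of words written entirely in capitals.

    Assumption: single letters such as 'I' are not counted as capital words.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(1 for w in words if len(w) > 1 and w.isupper()) / len(words)


def content_similarity(a, b):
    """Pairwise feature: cosine similarity of two reviews' n-gram count vectors."""
    va, vb = ngram_features(a), ngram_features(b)
    dot = sum(count * vb[key] for key, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0
```

For example, `capital_ratio("GREAT product BEST ever")` gives 0.5, and two identical reviews have a content similarity of 1.0.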
DISADVANTAGES OF EXISTING SYSTEM:
- The fact that anyone, under any identity, can leave comments as reviews gives spammers a tempting opportunity to write fake reviews designed to mislead users' opinions. These misleading reviews are then multiplied by the sharing functions of social media and propagate across the web.
- Many aspects of the problem have been overlooked or remain unsolved.
- Previous works also addressed the importance of features, but mainly in terms of the accuracy obtained and not as a built-in function of their framework (i.e., their approaches depend on ground truth to determine each feature's importance).
PROPOSED SYSTEM:
- The general concept of our proposed framework is to model a given review dataset as a Heterogeneous Information Network (HIN) and to map the problem of spam detection into a HIN classification problem.
- In particular, we model the review dataset as a HIN in which reviews are connected through different node types (such as features and users). A weighting algorithm then calculates each feature's importance (or weight), and these weights are used to calculate the final labels for reviews with both unsupervised and supervised approaches.
- We propose NetSpam, a novel network-based framework that models review networks as heterogeneous information networks. Its classification step uses different metapath types, which are new to the spam detection domain.
- A new weighting method for spam features is proposed to determine the relative importance of each feature and to show how effective each feature is in distinguishing spam from normal reviews.
- NetSpam improves on the state of the art in both accuracy and time complexity. Time complexity depends heavily on the number of features used to identify a spam review; hence, using only the features with higher weights makes fake reviews easier to detect at lower time complexity.
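A minimal Python sketch of how a review dataset might be turned into metapath links, assuming each review carries spam feature values normalized to [0, 1] (with user-level features copied onto that user's reviews) and that two reviews are linked through a feature, i.e., a Review-Feature-Review metapath, only when their values fall into the same discretization level. The number of levels, the link value of level/levels, and the names `quantize` and `build_metapaths` are illustrative assumptions, not NetSpam's published definitions.

```python
from itertools import combinations


def quantize(value, levels):
    """Map a feature value in [0, 1] to a discrete level in 1..levels."""
    return min(int(value * levels), levels - 1) + 1


def build_metapaths(features, levels=20):
    """Build Review-Feature-Review metapath links (hypothetical scheme).

    features: dict review_id -> dict feature_name -> value in [0, 1].
    Returns dict feature_name -> dict (r, s) -> link value, with r < s.
    Two reviews are linked through feature f only if their quantized
    values of f are equal; the link value is that level / levels, so
    reviews sharing a high (more suspicious) level get a stronger link.
    """
    names = {f for values in features.values() for f in values}
    paths = {f: {} for f in names}
    for r, s in combinations(sorted(features), 2):
        for f in names:
            if f in features[r] and f in features[s]:
                level = quantize(features[r][f], levels)
                if level == quantize(features[s][f], levels):
                    paths[f][(r, s)] = level / levels
    return paths
```

With `levels=10`, reviews whose burstiness values are 0.85 and 0.88 share level 9 and are linked with value 0.9, while a review at 0.2 falls on level 3 and stays unlinked. Comparing every pair is quadratic in the number of reviews, which is acceptable for a sketch but would need grouping by level on real datasets.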
ADVANTAGES OF PROPOSED SYSTEM:
- Improved accuracy
- Easier detection of fake reviews
- Lower time complexity
- As explained in our unsupervised approach, NetSpam can find feature importance even without ground truth, relying only on the metapath definitions and the values calculated for each review.
- No previous method incorporates feature importance (known as weights in our proposed framework, NetSpam) into the classification step. By using these weights, on the one hand, we involve feature importance in calculating the final labels, which increases NetSpam's accuracy.
- On the other hand, we can determine which features perform better in terms of their involvement in connecting spam reviews (in the proposed network).
MODULES:
1. Admin
2. User
3. Classification
MODULES DESCRIPTION:
1. Admin
In this module, the Admin logs in with a valid user name and password. After a successful login, the Admin can perform operations such as Add Product, View All Products, View Users, Purchase History and View Spam Detection Details.
Admin tasks:
- Upload products
- View all user activities
- Activate accounts
- Block an account when the number of reviews from the same IP address exceeds three
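The Admin's IP-based blocking rule could be enforced roughly as below. This is a hypothetical Python sketch: the `ReviewGuard` class, the in-memory storage, and the reading of the rule (the account that submits a review from an IP address that already has three reviews is blocked) are assumptions; a real deployment would keep these records in the database listed under the software requirements.

```python
from collections import defaultdict

MAX_REVIEWS_PER_IP = 3  # threshold taken from the Admin rule above


class ReviewGuard:
    """Tracks reviews per IP address and blocks accounts that exceed the limit."""

    def __init__(self, limit=MAX_REVIEWS_PER_IP):
        self.limit = limit
        self.reviews_by_ip = defaultdict(list)  # ip -> account ids, one entry per review
        self.blocked = set()

    def submit_review(self, account_id, ip):
        """Record a review; return False if the account is (or becomes) blocked."""
        if account_id in self.blocked:
            return False
        self.reviews_by_ip[ip].append(account_id)
        if len(self.reviews_by_ip[ip]) > self.limit:
            # More than `limit` reviews from this IP: block the submitting account.
            self.blocked.add(account_id)
            return False
        return True
```

Counting per IP rather than per account means several accounts sharing one address together count toward the limit, which matches the wording "from the same IP address".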
2. User
In this module, there are n users. A user must register before performing any operations, and the registered user details are stored in the user module. After successful registration, the user logs in with an authorized user name and password. After a successful login, the user can perform operations such as viewing all products, buying a product, viewing top-ranked products, inserting and viewing reviews, sending messages, and accessing anomaly messages and followers.
User tasks:
- Register.
- After registering, users cannot log in immediately; they must first get permission from the Admin.
- View all products, purchase products and insert reviews.
- Review types: Positive, Negative, Fake.
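The registration rule above, where new accounts cannot log in until the Admin approves them, can be sketched as a small account store with three states. The `AccountStore` class, the `Status` states and the use of PBKDF2 password hashing are assumptions made for illustration, not details taken from the system.

```python
import hashlib
import hmac
import os
from enum import Enum


class Status(Enum):
    PENDING = "pending"  # registered, waiting for Admin approval
    ACTIVE = "active"    # approved by the Admin, may log in
    BLOCKED = "blocked"  # blocked by the Admin


def _hash(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AccountStore:
    def __init__(self):
        self._users = {}  # user name -> {"salt", "hash", "status"}

    def register(self, name, password):
        """New accounts start in the PENDING state."""
        if name in self._users:
            raise ValueError("user name already taken")
        salt = os.urandom(16)
        self._users[name] = {"salt": salt, "hash": _hash(password, salt),
                             "status": Status.PENDING}

    def activate(self, name):
        """Admin action: approve a pending account."""
        self._users[name]["status"] = Status.ACTIVE

    def login(self, name, password):
        """Succeed only for a correct password on an ACTIVE account."""
        user = self._users.get(name)
        if user is None or not hmac.compare_digest(
                user["hash"], _hash(password, user["salt"])):
            return False
        return user["status"] is Status.ACTIVE
```

A freshly registered user's `login` returns False until the Admin calls `activate`, after which the same credentials succeed.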
3. Classification
The classification part of NetSpam includes two steps: (i) Weight Calculation, which determines the importance of each spam feature in spotting spam reviews, and (ii) Labeling, which calculates the final probability of each review being spam. Next, we describe them in detail.

1) Weight Calculation: This step computes the weight of each metapath. We assume that node classification is based on each node's relations to other nodes in the review network; linked nodes are likely to take the same label. The relations in a heterogeneous information network include not only direct links but also paths, which can be measured using the metapath concept. Therefore, we use the metapaths defined in the previous step, which represent heterogeneous relations among nodes. This step computes the weight of each relation path (i.e., the importance of each metapath), which is then used in the next step (Labeling) to estimate the label of each unlabeled review.
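The two classification steps can be sketched in Python under an assumed formulation: a metapath's weight is the share of its link strength that connects two labeled spam reviews, W_p = sum of mp_p(r, s) * y_r * y_s divided by the sum of mp_p(r, s) over labeled pairs (y = 1 for spam, 0 otherwise), and a review u's spam probability is the average over the other reviews v of Pr(u, v) = 1 - prod_p (1 - mp_p(u, v) * W_p). The input format (metapath link values keyed by sorted review-id pairs) and the function names are hypothetical, and the sketch covers only the supervised case.

```python
def metapath_weights(paths, labels):
    """Step (i): weight of each metapath.

    paths:  dict metapath -> dict (r, s) -> link value in [0, 1], keys sorted.
    labels: dict review -> 1 (spam) or 0 (normal) for the labeled reviews.
    W_p = sum(mp_p(r, s) * y_r * y_s) / sum(mp_p(r, s)) over labeled pairs.
    """
    weights = {}
    for p, links in paths.items():
        total = spam = 0.0
        for (r, s), value in links.items():
            if r in labels and s in labels:
                total += value
                spam += value * labels[r] * labels[s]
        weights[p] = spam / total if total else 0.0
    return weights


def spam_probability(review, reviews, paths, weights):
    """Step (ii): average over other reviews v of
    Pr(u, v) = 1 - prod_p (1 - mp_p(u, v) * W_p)."""
    others = [v for v in reviews if v != review]
    if not others:
        return 0.0
    total = 0.0
    for v in others:
        key = tuple(sorted((review, v)))
        product = 1.0  # running product over metapaths
        for p, links in paths.items():
            product *= 1.0 - links.get(key, 0.0) * weights.get(p, 0.0)
        total += 1.0 - product
    return total / len(others)
```

For instance, if metapath "burst" links a spam pair (r1, r2) and a normal pair (r3, r4), each with value 1.0, its weight is 0.5; a metapath linking only a spam review to a normal one gets weight 0. A higher weight means the metapath more often joins spam reviews, so its links contribute more to the final label.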
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System : Intel Core i3 Processor
- Hard Disk : 500 GB
- Monitor : 15" LED
- Input Devices : Keyboard, Mouse
- RAM : 4 GB
SOFTWARE REQUIREMENTS:
- Operating system : Windows 10/11
- Coding Language : C# (.NET)
- Frontend : ASP.NET, HTML, CSS, JavaScript
- IDE Tool : Visual Studio
- Database : SQL Server 2005