Ranking Fraud Detection in Google Play Store
ABSTRACT:
Fraudulent behaviors in Google Play, the most popular Android app market, fuel search rank abuse and malware proliferation. To identify malware, previous work has focused on app executable and permission analysis. In this paper, we introduce FairPlay, a novel system that discovers and leverages traces left behind by fraudsters to detect both malware and apps subjected to search rank fraud. FairPlay correlates review activities and uniquely combines detected review relations with linguistic and behavioral signals gleaned from Google Play app data (87K apps, 2.9M reviews, and 2.4M reviewers, collected over half a year) in order to identify suspicious apps. FairPlay achieves over 95% accuracy in classifying gold standard datasets of malware, fraudulent, and legitimate apps. We show that 75% of the identified malware apps engage in search rank fraud. FairPlay discovers hundreds of fraudulent apps that currently evade Google Bouncer’s detection technology. FairPlay also enabled the discovery of more than 1,000 reviews, reported for 193 apps, that reveal a new type of “coercive” review campaign: users are harassed into writing positive reviews, and into installing and reviewing other apps.
EXISTING SYSTEM:
- Google Play uses the Bouncer system to remove malware. However, of the 7,756 Google Play apps we analyzed using VirusTotal, 12% (948) were flagged by at least one anti-virus tool and 2% (150) were identified as malware by at least 10 tools.
- Sarma et al. use risk signals extracted from app permissions, e.g., rare critical permissions (RCP) and rare pairs of critical permissions (RPCP), to train SVMs and inform users of the risk vs. benefit tradeoffs of apps.
- Peng et al. propose a score to measure the risk of apps, based on probabilistic generative models such as Naive Bayes.
- Yerima et al. also use features extracted from app permissions, API calls, and commands extracted from the app executables.
DISADVANTAGES OF EXISTING SYSTEM:
- Previous work has focused on app executable and permission analysis only, and thus misses search rank fraud signals.
- Not efficient.
- Lower detection rate.
- Takes more time.
PROPOSED SYSTEM:
- We propose FairPlay, a system that leverages the traces fraudsters leave behind to efficiently detect Google Play fraud and malware. Our major contributions are:
- To detect fraud and malware, we propose and generate relational, behavioral, and linguistic features that we use to train supervised learning algorithms.
- We formulate the notion of co-review graphs to model reviewing relations between users.
- We develop PCF, an efficient algorithm to identify temporally constrained co-review pseudo-cliques, formed by reviewers with substantially overlapping co-reviewing activities across short time windows.
- We use temporal dimensions of review post times to identify suspicious review spikes received by apps; we show that to compensate for a negative review of an app rated R, a fraudster needs to post at least ⌈(R − 1)/(5 − R)⌉ positive reviews. We also identify apps with “unbalanced” review, rating, and install counts, as well as apps with permission request ramps.
- We use linguistic and behavioral information to (i) detect genuine reviews, from which we then (ii) extract user-identified fraud and malware indicators.
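The review-compensation claim above can be made precise. Assuming, as a simplification, that a negative review is a 1-star rating and each fraudulent positive review is 5 stars, keeping the combined average of one negative review and n positive reviews at or above the app’s current rating R requires:

```latex
\frac{1 + 5n}{n + 1} \ge R
\;\Longrightarrow\;
n(5 - R) \ge R - 1
\;\Longrightarrow\;
n \ge \left\lceil \frac{R - 1}{5 - R} \right\rceil
```

For example, an app rated R = 4 needs at least ⌈3/1⌉ = 3 positive reviews to offset each negative one; the cost grows quickly as R approaches 5.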
ADVANTAGES OF PROPOSED SYSTEM:
- We build this work on the observation that fraudulent and malicious behaviors leave behind telltale signs on app markets.
- FairPlay achieves over 97% accuracy in classifying fraudulent and benign apps, and over 95% accuracy in classifying malware and benign apps.
- FairPlay significantly outperforms the malware indicators of Sarma et al. Furthermore, we show that malware often engages in search rank fraud as well: when trained on fraudulent and benign apps, FairPlay flagged more than 75% of the gold standard malware apps as fraudulent.
- FairPlay discovers hundreds of fraudulent apps.
- FairPlay also enabled us to discover a novel, coercive review campaign attack type, where app users are harassed into writing a positive review for the app, and into installing and reviewing other apps.
MODULES:
- System model
- Adversarial model
- The Co-Review Graph (CoReG) Module
- Reviewer Feedback (RF) Module
MODULES DESCRIPTION:
System model:
In the first module of the project, we develop the system environment model to evaluate the performance of our system for search rank fraud. We focus on the Android app market ecosystem of Google Play. The participants, consisting of users and developers, have Google accounts. Developers create and upload apps, which consist of executables (i.e., “apks”), a set of required permissions, and a description. The app market publishes this information, along with the app’s received reviews, ratings, aggregate rating (over both reviews and ratings), install count range, size, version number, price, time of last update, and a list of “similar” apps. Each review consists of a star rating ranging between 1 and 5 stars, and some text. The text is optional and consists of a title and a description. Google Play limits the number of reviews displayed for an app. In this module, we illustrate the participants in Google Play and their relations.
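The entities described above can be sketched as a minimal data model. This is an illustrative assumption for this module only: the class and field names below are not Google Play’s actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical data model for the Google Play entities described above.
@dataclass
class Review:
    reviewer_id: str
    stars: int        # star rating, 1-5
    text: str = ""    # optional title and description

@dataclass
class App:
    app_id: str
    permissions: list[str] = field(default_factory=list)
    reviews: list[Review] = field(default_factory=list)
    ratings: list[int] = field(default_factory=list)  # star ratings without text

    def aggregate_rating(self) -> float:
        """Average over both reviews and ratings, as the market publishes."""
        stars = [r.stars for r in self.reviews] + self.ratings
        return sum(stars) / len(stars) if stars else 0.0
```

For instance, an app with two reviews (5 and 4 stars) and one text-free 3-star rating has an aggregate rating of 4.0.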
Adversarial model:
In the second module, we develop the Adversarial model for considering the malicious users. We consider not only malicious developers, who upload malware, but also rational fraudulent developers. Fraudulent developers attempt to tamper with the search rank of their apps, e.g., by recruiting fraud experts in crowdsourcing sites to write reviews, post ratings, and create bogus installs. While Google keeps secret the criteria used to rank apps, the reviews, ratings and install counts are known to play a fundamental part.
To review or rate an app, a user needs to have a Google account, register a mobile device with that account, and install the app on the device. This process complicates the job of fraudsters, who are thus more likely to reuse accounts across jobs. The motivation behind search rank fraud attacks is impact: apps that rank higher in search results tend to receive more installs. This is beneficial both for fraudulent developers, who increase their revenue, and malicious developers, who increase the impact of their malware.
The Co-Review Graph (CoReG) Module:
This module exploits the observation that fraudsters who control many accounts will reuse them across multiple jobs. Its goal is then to detect subsets of an app’s reviewers that have performed significant common review activities in the past. In the following, we describe the co-review graph concept, formally present the weighted maximal clique enumeration problem, then introduce an efficient heuristic that leverages natural limitations in the behaviors of fraudsters. Let the co-review graph of an app be a graph where nodes correspond to user accounts that reviewed the app, and undirected edges have a weight that indicates the number of apps reviewed in common by the edge’s endpoint users. The co-review graph concept naturally identifies user accounts with significant past review activities.
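A minimal sketch of building a co-review graph and greedily growing a dense reviewer group is shown below. The greedy heuristic is a simplified stand-in for the paper’s PCF algorithm (which additionally enforces temporal constraints on review times); function names, the `min_weight` threshold, and the data layout are illustrative assumptions.

```python
from itertools import combinations

def co_review_graph(app_reviewers, history):
    """Build one app's co-review graph: nodes are the app's reviewers, and
    an edge's weight counts apps reviewed in common by its two endpoints.
    `history` maps reviewer id -> set of app ids that reviewer reviewed."""
    weights = {}
    for u, v in combinations(sorted(app_reviewers), 2):
        w = len(history[u] & history[v])
        if w > 0:
            weights[(u, v)] = w
    return weights

def greedy_pseudo_clique(weights, min_weight=2):
    """Greedily grow a densely co-reviewing group: seed with the heaviest
    edge, then add any node whose average edge weight to the current group
    is at least `min_weight`. (Simplified; not the paper's PCF.)"""
    if not weights:
        return set()
    (u, v), _ = max(weights.items(), key=lambda kv: kv[1])
    clique = {u, v}
    nodes = {n for edge in weights for n in edge}

    def w(a, b):
        return weights.get((min(a, b), max(a, b)), 0)

    improved = True
    while improved:
        improved = False
        for n in nodes - clique:
            if sum(w(n, m) for m in clique) / len(clique) >= min_weight:
                clique.add(n)
                improved = True
    return clique
```

On a toy history where accounts a, b, and c repeatedly review the same apps but d does not, the heuristic returns the group {a, b, c}, matching the intuition that heavy co-reviewing is a fraud signal.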
Reviewer Feedback (RF) Module:
Reviews written by genuine users of malware and fraudulent apps may describe negative experiences. The RF module exploits this observation through a two-step approach: (i) detect and filter out fraudulent reviews, then (ii) identify malware and fraud indicative feedback from the remaining reviews.
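The two-step flow above can be sketched as follows. The fraud filter and the keyword list are deliberately naive stand-ins for the paper’s trained components (the real system filters reviews with a supervised classifier and mines indicative feedback linguistically); all names and words here are illustrative assumptions.

```python
# Toy list of malware/fraud-indicative words; a stand-in for the
# paper's learned indicators, not its actual vocabulary.
MALWARE_WORDS = {"virus", "malware", "scam", "steals", "spam"}

def reviewer_feedback(reviews, flagged_reviewers):
    """Step (i): drop reviews posted by accounts flagged as fraudulent
    (e.g., members of dense co-review groups). Step (ii): count how many
    of the remaining, presumed-genuine reviews contain indicative words.
    `reviews` is a list of {"reviewer": ..., "text": ...} dicts."""
    genuine = [r for r in reviews if r["reviewer"] not in flagged_reviewers]
    hits = sum(
        1 for r in genuine
        if any(word in r["text"].lower() for word in MALWARE_WORDS)
    )
    return hits, len(genuine)
```

A high ratio of indicative hits to genuine reviews would then feed FairPlay’s classifier as one behavioral feature among the relational and linguistic ones.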
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System : Pentium i3 Processor
- Hard Disk : 500 GB
- Monitor : 15" LED
- Input Devices : Keyboard, Mouse
- RAM : 4 GB
SOFTWARE REQUIREMENTS:
- Operating system : Windows 10/11
- Coding Language : C#.NET
- Frontend : ASP.NET, HTML, CSS, JavaScript
- IDE Tool : Visual Studio
- Database : SQL Server 2005