
Advanced Fake Job Post Prediction Using Machine Learning for Online Recruitment Scam Detection
IEEE BASE PAPER TITLE:
Detection of Fake Online Recruitment using Machine Learning Approach
IEEE BASE PAPER ABSTRACT:
While many organizations now post job openings on the web so that job seekers can access them conveniently, the practice also creates an opening for swindlers, who offer job hunters tasks and services in exchange for money. Many people fall victim to this type of fraud and lose considerable amounts of money as a result. The proposed approach uses a variety of machine learning algorithms, including supervised learning tools and natural language processing methods, to analyze and classify job advertisements. Using both single and ensemble classifiers, the system compares their results to recognize fraudulent job advertisements on the Internet. Model performance is evaluated using accuracy, precision, recall, and F1-score. The study aims to demonstrate the potential of boosting techniques for achieving high accuracy in fake job post prediction. The value of the research lies in helping to create a more secure online job market, which can build trust among job seekers and protect them from the financial and emotional risks of deceptive job postings.
ALGORITHM / MODEL USED:
MLP Classifier, Passive Aggressive Classifier, Gradient Boosting Classifier, K-Neighbors Classifier.
OUR PROPOSED PROJECT ABSTRACT:
The widespread growth of online recruitment portals has inadvertently contributed to a significant increase in fraudulent job postings, leading to financial losses and emotional setbacks for unsuspecting job seekers. To tackle this pressing issue, the project titled “Advanced Fake Job Post Prediction Using Machine Learning for Online Recruitment Scam Detection” presents a robust and intelligent web-based solution. Developed using Python with a front-end powered by HTML, CSS, and JavaScript under the Flask framework, this system integrates multiple machine learning algorithms to accurately detect fraudulent job listings.
The core of the system relies on four classification models:
- MLP Classifier (Train Accuracy: 99%, Test Accuracy: 98%),
- Passive Aggressive Classifier (Train Accuracy: 98%, Test Accuracy: 97%),
- Gradient Boosting Classifier (Train Accuracy: 92%, Test Accuracy: 91%), and
- K-Neighbors Classifier (Overall Accuracy: 96%).
These models are trained using the Employment Scam Aegean Dataset (EMSCAD), which comprises 17,880 job postings and 18 detailed attributes such as title, location, company profile, job description, employment type, and fraud status.
Preprocessing steps include data loading and handling missing values to ensure quality input. For prediction, the system selectively utilizes key features: Description, Telecommuting, Has Company Logo, Has Questions, Employment Type, Required Experience, Required Education, and Function, to deliver accurate classification results. Users can choose a preferred model (MLP, Passive Aggressive, or Gradient Boosting) for prediction and conveniently export the result in PDF format for documentation.
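The feature selection described above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the project's actual code: the column names follow the commonly distributed EMSCAD CSV and are assumed here, the TF-IDF and one-hot encodings are assumed choices for turning text and categories into numbers, and the six-row table is invented toy data used only to make the example runnable.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed column names for the eight selected features and the label.
TEXT_COL = "description"
BINARY_COLS = ["telecommuting", "has_company_logo", "has_questions"]
CATEGORICAL_COLS = ["employment_type", "required_experience",
                    "required_education", "function"]
TARGET_COL = "fraudulent"


def build_pipeline() -> Pipeline:
    """Encode each feature group, then classify with gradient boosting."""
    features = ColumnTransformer([
        # A single column name (not a list) hands the vectorizer a 1-D text column.
        ("text", TfidfVectorizer(stop_words="english"), TEXT_COL),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
        ("bin", "passthrough", BINARY_COLS),
    ])
    return Pipeline([
        ("features", features),
        ("clf", GradientBoostingClassifier(random_state=0)),
    ])


# Invented toy rows, only for demonstration (1 = fraudulent, 0 = legitimate).
toy = pd.DataFrame({
    "description": [
        "Earn money fast from home, pay a small registration fee to start",
        "Software engineer to build backend services in Python",
        "No experience needed, send bank details to receive weekly payments",
        "Marketing analyst responsible for campaign reporting",
        "Data entry work, unlimited income, deposit required today",
        "Accountant to manage monthly closing and audits",
    ],
    "telecommuting": [1, 0, 1, 0, 1, 0],
    "has_company_logo": [0, 1, 0, 1, 0, 1],
    "has_questions": [0, 1, 0, 1, 0, 0],
    "employment_type": ["Part-time", "Full-time", "Other",
                        "Full-time", "Part-time", "Full-time"],
    "required_experience": ["Not Applicable", "Mid-Senior level", "Entry level",
                            "Associate", "Not Applicable", "Associate"],
    "required_education": ["Unspecified", "Bachelor's Degree", "Unspecified",
                           "Bachelor's Degree", "Unspecified", "Bachelor's Degree"],
    "function": ["Other", "Engineering", "Other",
                 "Marketing", "Administrative", "Accounting/Auditing"],
    "fraudulent": [1, 0, 1, 0, 1, 0],
})

model = build_pipeline()
model.fit(toy.drop(columns=[TARGET_COL]), toy[TARGET_COL])
predictions = model.predict(toy.drop(columns=[TARGET_COL]))
print(predictions)
```

Keeping the encoders and the classifier in one pipeline means the same transformations are applied at training time and at prediction time, which matters when the input comes from a web form rather than the training file.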
A standout feature of this system is live job post analysis from the Internshala platform. Users can input any Internshala job URL, from which the system extracts the job description and predicts its legitimacy using the K-Neighbors Classifier, thus enabling real-time detection of suspicious listings.
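The extraction step of the live analysis might look like the sketch below, which pulls the text out of one element of an already downloaded page using only the standard library. The CSS class `job-description` is a hypothetical placeholder, since the real Internshala markup is not given here, and the sample HTML is invented. Downloading the page and passing the text to the K-Neighbors model are left out.

```python
from html.parser import HTMLParser


class DescriptionExtractor(HTMLParser):
    """Collect the text inside the first element carrying a given CSS class."""

    # Tags that never have a closing tag, so they must not affect nesting depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while inside the target element
        self.done = False     # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_description(html: str, target_class: str = "job-description") -> str:
    """Return the target element's text with whitespace collapsed."""
    parser = DescriptionExtractor(target_class)
    parser.feed(html)
    return " ".join(" ".join(parser.chunks).split())


# Invented sample page; the class name is a placeholder, not Internshala's.
sample_html = """
<html><body>
  <h1>Marketing Intern</h1>
  <div class="job-description">
    <p>Assist with campaign planning.</p>
    <p>Prepare weekly reports.<br>Work with the design team.</p>
  </div>
  <div class="footer">Apply now</div>
</body></html>
"""

description = extract_description(sample_html)
print(description)
```

The extracted string would then be vectorized in the same way as the training descriptions before it is passed to the classifier.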
To validate model performance, the system provides comprehensive evaluation metrics including Precision, Recall, F1-Score, Confusion Matrix, and static charts for visual analysis. This advanced and practical solution not only ensures safer online job browsing but also empowers users with insights to avoid falling prey to employment scams.
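The metrics listed above all derive from the four confusion-matrix counts. With fraudulent posts as the positive class, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them in plain Python; the function names and the ten toy labels are illustrative and not taken from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = fraudulent, 0 = legitimate.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.8 while precision and recall on the fraudulent class are both 2/3. When fraudulent posts are a small minority of the data, the class-specific figures are usually more informative than accuracy alone.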
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System : Intel Core i3 Processor.
- Hard Disk : 20 GB.
- Monitor : 15" LED.
- Input Devices : Keyboard, Mouse.
- RAM : 8 GB.
SOFTWARE REQUIREMENTS:
- Operating System : Windows 10 / 11.
- Coding Language : Python 3.12.0.
- Web Framework : Flask.
- Frontend : HTML, CSS, JavaScript.
REFERENCE:
Jayanth Medapati, Yashaswi Arradi, Ronan Kongala, Shanmugasundaram Hariharan, J. Shanmugapriyan, and Karuppiah Natarajan, “Detection of Fake Online Recruitment using Machine Learning Approach”, 2025 Third International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), IEEE Xplore, 2025.
1. What is the main objective of this project?
The primary objective of this project is to build a web-based system that uses machine learning algorithms to detect whether a job posting is legitimate or fraudulent, helping job seekers avoid online recruitment scams.
2. Which machine learning models are used in the project?
The system uses the following models:
- MLP Classifier: Train Accuracy 99%, Test Accuracy 98%
- Passive Aggressive Classifier: Train Accuracy 98%, Test Accuracy 97%
- Gradient Boosting Classifier: Train Accuracy 92%, Test Accuracy 91%
- K-Nearest Neighbors Classifier (KNN): Overall Accuracy 96% (used for live prediction)
3. What dataset is used for training the models?
The models are trained on the Employment Scam Aegean Dataset (EMSCAD), which contains 17,880 job postings and 18 features, including job title, location, salary range, company profile, description, employment type, and fraud label.
4. How does the system make predictions?
Users enter the job description and the selected attributes through the web form. The system processes these inputs and, using the chosen model, predicts whether the job post is “Legitimate” or “Fraudulent”.
5. What is the purpose of the live prediction feature?
The live prediction feature allows users to paste a job URL from Internshala. The system automatically extracts the job description and uses the K-Nearest Neighbors model to determine its legitimacy.
6. Can users choose which model to use for prediction?
Yes, users can manually select one of the three models—MLP, Passive Aggressive, or Gradient Boosting—based on their preference. The selected model is used for that prediction session.
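One way to wire the user's choice to a trained model is a small lookup keyed by model name, followed by a mapping from the numeric prediction to the displayed label. The sketch below is an assumption about how this could be done, not the project's code: the key names, the 0/1 label encoding, and `KeywordStub` (a stand-in for a trained classifier) are all hypothetical.

```python
# Assumed encoding of the model output: 0 = legitimate, 1 = fraudulent.
LABELS = {0: "Legitimate", 1: "Fraudulent"}

# The three models a user may pick for form-based prediction.
SELECTABLE_MODELS = ("mlp", "passive_aggressive", "gradient_boosting")


def predict_label(models, choice, features):
    """Look up the chosen model and return a human-readable verdict."""
    key = choice.strip().lower().replace(" ", "_")
    if key not in SELECTABLE_MODELS:
        raise ValueError(
            f"Unknown model {choice!r}; choose one of {SELECTABLE_MODELS}"
        )
    prediction = models[key].predict([features])[0]
    return LABELS[int(prediction)]


class KeywordStub:
    """Stand-in for a trained classifier, used only to make the demo runnable."""

    def predict(self, rows):
        return [1 if "registration fee" in row["description"].lower() else 0
                for row in rows]


# In a real application these would be fitted models loaded at startup.
models = {name: KeywordStub() for name in SELECTABLE_MODELS}

result = predict_label(
    models,
    "Gradient Boosting",
    {"description": "Pay a registration fee to start earning"},
)
print(result)
```

Validating the choice against a fixed tuple keeps an unexpected form value from reaching the lookup, and it keeps the K-Neighbors model, which is reserved for live URL analysis, out of the selectable set.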
7. What are the preprocessing steps applied to the dataset?
The preprocessing includes:
- Loading the dataset
- Handling missing values
- Selecting specific attributes for training and prediction
- Converting textual features into a format suitable for model training
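The first three steps can be sketched with pandas as follows. The column names are assumed from the commonly distributed EMSCAD CSV, and the specific fill values (an empty string, the placeholder category "Unspecified", and 0 for missing flags) are illustrative choices, not the project's documented behavior. The four-row table is invented; a real run would load the file with `pd.read_csv` instead.

```python
import pandas as pd

# Assumed column names for the selected attributes and the label.
TEXT_COL = "description"
BINARY_COLS = ["telecommuting", "has_company_logo", "has_questions"]
CATEGORICAL_COLS = ["employment_type", "required_experience",
                    "required_education", "function"]
TARGET_COL = "fraudulent"


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the selected attributes and fill or drop missing values."""
    columns = [TEXT_COL] + BINARY_COLS + CATEGORICAL_COLS + [TARGET_COL]
    out = df[columns].copy()
    # Missing text becomes an empty string so it can be filtered out below.
    out[TEXT_COL] = out[TEXT_COL].fillna("").str.strip()
    # Missing categories get an explicit placeholder level (assumed choice).
    out[CATEGORICAL_COLS] = out[CATEGORICAL_COLS].fillna("Unspecified")
    # Missing flags are treated as 0, i.e. "not present" (assumed choice).
    out[BINARY_COLS] = out[BINARY_COLS].fillna(0).astype(int)
    # A post without any description carries no text signal; drop it.
    return out[out[TEXT_COL] != ""].reset_index(drop=True)


# Invented rows with gaps, standing in for the loaded dataset.
raw = pd.DataFrame({
    "title": ["Data Entry Clerk", "Backend Developer",
              "Sales Associate", "Home Assistant"],
    "description": ["Enter records into the CRM  ", "Build REST APIs",
                    None, "Easy money, pay a fee first"],
    "telecommuting": [0, 0, 1, 1],
    "has_company_logo": [1, 1, 0, None],
    "has_questions": [0, None, 0, 0],
    "employment_type": ["Full-time", None, "Part-time", None],
    "required_experience": ["Entry level", "Mid-Senior level", None, None],
    "required_education": [None, "Bachelor's Degree", None, None],
    "function": ["Administrative", "Engineering", "Sales", None],
    "fraudulent": [0, 0, 0, 1],
})

clean = preprocess(raw)
print(clean)
```

The last step in the list, converting text into numeric features, would follow on the cleaned frame, for example with a TF-IDF vectorizer.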
8. What features are used for prediction in the user input?
Only essential features are considered for prediction:
- Description
- Telecommuting
- Has Company Logo
- Has Questions
- Employment Type
- Required Experience
- Required Education
- Function
9. Can the prediction results be saved?
Yes, users can export the prediction result to a PDF file directly from the results page for reference and record-keeping.
10. Are there any visual analytics provided in the system?
Yes, the system displays:
- Static pie charts for fraud vs. legitimate job distributions
- Confusion matrices
- Precision, Recall, and F1-score values for each model
These help users evaluate each model's performance.
11. Why do all real Internshala links show “Legitimate”?
Internshala is a reputable job platform with strong content moderation. Its job listings typically lack the red flags found in fraudulent posts, so the system classifies them as legitimate. To test the fraud path, you can submit scam-like descriptions of your own.
12. Is any personal data stored in this system?
No. The system does not collect or store any personal data. All predictions are done in real-time based on user input or live URL analysis.
13. Can this system be integrated into other job platforms?
Yes, as future work. With some modifications, the system can be extended to work with other job portals such as LinkedIn, Naukri, and Indeed for broader fraud detection coverage.
14. What makes this system different from previous works?
This project includes:
- Multiple model comparison
- Real-time job link prediction
- Selective feature-based predictions
- PDF report generation
- User-friendly web interface
These features provide a more practical, interactive, and accessible tool for end users.