AI-Powered Android Malware Detection using Machine Learning

IEEE BASE PAPER TITLE:

Enhancing the Sustainability of Machine Learning-Based Malware Detection Techniques for Android Applications

IEEE BASE PAPER ABSTRACT:

The rapid increase in smartphone usage has led to a corresponding rise in malicious Android applications, making it important to develop efficient and sustainable malware detection methods that maintain high accuracy. This paper presents a two-stage machine learning approach aimed at improving both detection accuracy and sustainability in Android malware classification. The first stage estimates the release year of an app using its SDK version information, while the second stage classifies apps as benign or malicious through a weighted voting mechanism applied to year-specific malware detection models. This method balances the high accuracy of retraining with reduced computational overhead, delivering robust and scalable malware detection. Using a dataset spanning 2014 to 2023, we evaluate the performance of the proposed method in comparison to retraining-based and incremental learning-based approaches. Experimental results demonstrate that while the retraining-based method achieves the highest accuracy and F1 score, it incurs a significant increase in training time. Conversely, the incremental learning-based method offers lower accuracy but reduced training time. Our two-stage model-based classification method effectively balances these trade-offs, providing accuracy comparable to the retraining-based approach while maintaining stable training times and moderate model sizes, making it a viable option for sustainable malware detection in real-world environments. Future research will explore non-machine-learning-based release year prediction methods to further optimize training efficiency and improve adaptability to the rapidly evolving malware detection landscape.
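The abstract does not state how the second-stage votes are weighted. As a rough illustration only, the sketch below assumes that each year-specific model reports a malware probability and that a model's weight shrinks as its year moves away from the estimated release year. The function names, the inverse-distance weighting, and the 0.5 threshold are all assumptions made for this example, not the paper's method.

```python
def year_weights(estimated_year, model_years):
    """Assumed weighting: closer model years get larger, normalized weights."""
    raw = {year: 1.0 / (1 + abs(year - estimated_year)) for year in model_years}
    total = sum(raw.values())
    return {year: weight / total for year, weight in raw.items()}


def weighted_vote(estimated_year, malware_probs, threshold=0.5):
    """Combine per-year malware probabilities into one label.

    malware_probs maps a model's year to the probability it assigns to
    the "malicious" class for the app under test.
    """
    weights = year_weights(estimated_year, malware_probs.keys())
    score = sum(weights[year] * prob for year, prob in malware_probs.items())
    label = "malicious" if score >= threshold else "benign"
    return label, score


# Stage 1 (not shown) is assumed to have estimated 2022 as the release year.
label, score = weighted_vote(2022, {2021: 0.2, 2022: 0.9, 2023: 0.8})
print(label, round(score, 3))
```

With these made-up probabilities, the 2022 model carries half of the total weight, so its vote dominates the combined score.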

ALGORITHM / MODEL USED:

Logistic Regression, ExtraTree Classifier, Random Forest Classifier, Stacking Classifier.

OUR PROPOSED PROJECT ABSTRACT:

The rapid growth of Android applications has significantly increased the risk of malware attacks, leading to data theft, unauthorized access, privacy breaches, and system-level exploitation. To address this challenge, this project proposes an AI-Powered Android Malware Detection System that integrates machine learning techniques for effective identification of malicious applications. The system is developed using Python for backend processing, Flask as the web framework, and HTML, CSS, and JavaScript for the frontend interface, providing an interactive and user-friendly web-based platform.

The models are developed using the TUANDROMD dataset, which contains 4,465 application records and 242 attributes related to permissions and API behaviors. Four machine learning models were trained and evaluated: Logistic Regression, ExtraTree Classifier, Random Forest Classifier, and Stacking Classifier. The Logistic Regression model achieved a training accuracy of 93.88% and a test accuracy of 94.17%. The ExtraTree Classifier, Random Forest Classifier, and Stacking Classifier each achieved a training accuracy of 97.19% and a test accuracy of 97.42%, outperforming Logistic Regression by about three percentage points on the test set.
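The comparison above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the project's code: it uses small synthetic 0/1 data in place of TUANDROMD, assumes an 80/20 train/test split, and assumes that "ExtraTree Classifier" refers to scikit-learn's ExtraTreesClassifier ensemble. The helper names, the labeling rule, and the hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_synthetic_data(n_samples=400, n_features=20, seed=0):
    """Stand-in for TUANDROMD: random binary features with a simple label rule."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_features))
    # Hypothetical rule: label 1 when at least two of the first three flags are set.
    y = (X[:, :3].sum(axis=1) >= 2).astype(int)
    return X, y


def train_and_evaluate(X, y, seed=0):
    """Fit the four model types and return train/test accuracy for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "ExtraTrees": ExtraTreesClassifier(n_estimators=50, random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "Stacking": StackingClassifier(
            estimators=[
                ("et", ExtraTreesClassifier(n_estimators=50, random_state=seed)),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ],
            final_estimator=LogisticRegression(max_iter=1000),
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = {
            "train": accuracy_score(y_train, model.predict(X_train)),
            "test": accuracy_score(y_test, model.predict(X_test)),
        }
    return results


if __name__ == "__main__":
    for name, scores in train_and_evaluate(*make_synthetic_data()).items():
        print(f"{name}: train={scores['train']:.4f} test={scores['test']:.4f}")
```

The accuracies printed by this sketch come from the synthetic data and are not comparable to the TUANDROMD figures reported above.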

The system supports two modes of detection:

Manual Detection Mode: Selected Android permissions such as ACCESS_FINE_LOCATION, READ_CONTACTS, ACCESS_WIFI_STATE, READ_SMS, CAMERA, and others are entered manually, and the trained machine learning model predicts whether the application is malware or benign.

APK File Analysis Mode: An uploaded APK file is examined through static analysis. The system extracts the APK contents to identify dangerous permissions, suspicious API calls, malware signatures, multiple DEX files, and native code libraries. A risk score is computed from predefined security heuristics to classify the APK as Benign, Suspicious, or Malware.
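The risk scoring in APK File Analysis Mode could look roughly like the sketch below. It covers only part of the description: it counts DEX files and native libraries by reading the APK as a ZIP archive with the standard library, and it takes the permission list as an input, because decoding the binary AndroidManifest.xml needs a dedicated parser. Checks for suspicious API calls and malware signatures are left out. All weights, thresholds, and function names are hypothetical and chosen only for illustration.

```python
import io
import zipfile

# Hypothetical weights and thresholds, not the project's actual rules.
DANGEROUS_PERMISSIONS = {
    "READ_SMS": 3,
    "READ_CONTACTS": 2,
    "ACCESS_FINE_LOCATION": 2,
    "CAMERA": 1,
}
MULTI_DEX_POINTS = 2
NATIVE_LIB_POINTS = 2
SUSPICIOUS_THRESHOLD = 4
MALWARE_THRESHOLD = 8


def inspect_archive(apk_file):
    """Count top-level DEX files and native libraries in an APK (a ZIP archive)."""
    with zipfile.ZipFile(apk_file) as archive:
        names = archive.namelist()
    dex_count = sum(1 for n in names if "/" not in n and n.endswith(".dex"))
    native_count = sum(1 for n in names if n.startswith("lib/") and n.endswith(".so"))
    return dex_count, native_count


def risk_score(permissions, dex_count, native_count):
    """Add up points for dangerous permissions, multiple DEX files, native code."""
    score = sum(DANGEROUS_PERMISSIONS.get(p, 0) for p in permissions)
    if dex_count > 1:
        score += MULTI_DEX_POINTS
    if native_count > 0:
        score += NATIVE_LIB_POINTS
    return score


def classify(score):
    """Map a cumulative risk score to one of the three verdicts."""
    if score >= MALWARE_THRESHOLD:
        return "Malware"
    if score >= SUSPICIOUS_THRESHOLD:
        return "Suspicious"
    return "Benign"


def demo_archive():
    """Build a tiny in-memory ZIP that mimics an APK layout (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("classes.dex", b"")
        archive.writestr("classes2.dex", b"")
        archive.writestr("lib/arm64-v8a/libnative.so", b"")
    buffer.seek(0)
    return buffer


dex, native = inspect_archive(demo_archive())
score = risk_score(["READ_SMS", "CAMERA"], dex, native)
print(dex, native, score, classify(score))
```

A three-way verdict lets the tool flag borderline APKs as Suspicious instead of forcing every file into Benign or Malware.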

Finally, the system provides a comprehensive performance dashboard, visualizing model evaluation metrics such as Accuracy, Precision, Recall, F1-Score, Training Time, and Prediction Time using graphical comparisons to aid interpretability. Overall, the proposed system demonstrates high detection accuracy and practical usability, making it suitable for real-time malware screening and cybersecurity applications in Android ecosystems.

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

System : Intel Core i3 Processor.
Hard Disk : 20 GB.
Monitor : 15" LED.
Input Devices : Keyboard, Mouse.
RAM : 8 GB.

SOFTWARE REQUIREMENTS:

Operating System : Windows 10 / 11.
Coding Language : Python 3.12.0.
Web Framework : Flask.
Frontend : HTML, CSS, JavaScript.

REFERENCE:

SEYEON PARK, HOJUN LEE, DAEUN KIM, HYEUN JUN MOON, SEONG-JE CHO, YOUNGSUP HWANG, HYOIL HAN, AND KYOUNGWON SUH, “Enhancing the Sustainability of Machine Learning-Based Malware Detection Techniques for Android Applications”, IEEE Access, Volume: 13, 2025.

Frequently Asked Questions (FAQs) and Answers

1. What is the main objective of this project?

The main objective of this project is to develop an automated system that detects malware in Android applications using machine learning algorithms. It aims to improve mobile security by using trained models to identify potential threats from permission-based and behavioral features.

2. Why is malware detection important for Android applications?

Android is the most widely used mobile operating system globally, making it a prime target for malicious software attacks. Malware can compromise user data, steal personal information, and harm device performance. Detecting malware early helps protect user privacy, secure sensitive data, and maintain system integrity.

3. What dataset is used in this project?

The project uses the TUANDROMD dataset, which contains 4,465 records with 242 attributes. The dataset includes features representing app permissions, API access, and behavioral characteristics that help distinguish between Malware and Benign Android applications.

4. Which machine learning algorithms are implemented in this project?

The following algorithms are implemented and evaluated:

1. Logistic Regression
2. ExtraTree Classifier
3. Random Forest Classifier
4. Stacking Classifier

These models are trained using the TUANDROMD dataset to classify applications and compared based on performance metrics like accuracy, precision, recall, and F1-score.

5. Which algorithm achieved the best accuracy?

The Stacking Classifier, Random Forest Classifier, and ExtraTree Classifier achieved the highest performance, each with 97.19% training accuracy and 97.42% test accuracy, compared with 94.17% test accuracy for Logistic Regression. These ensemble-based models combine multiple classifiers to enhance prediction stability and accuracy.

6. What are the technologies used in this project?

• Backend: Python
• Frontend: HTML, CSS, JavaScript
• Framework: Flask (for backend integration)
• Environment: Jupyter Notebook / Visual Studio Code

7. What are the two modes of detection in this system?

The system provides two detection modes:

1. Manual Detection Mode: Users manually input permission-based features (e.g., READ_SMS, ACCESS_FINE_LOCATION), and the trained model predicts whether the app is Malware or Benign.
2. APK File Analysis Mode: Users upload an APK file, and the system automatically extracts relevant features and classifies the file as Benign, Suspicious, or Malware.
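For Manual Detection Mode, the selected permissions have to be turned into the 0/1 feature vector that the trained model expects. The sketch below shows one way to do that. It is an assumption about the encoding, uses only the five permissions named in the project abstract instead of the full TUANDROMD attribute set, and the names FEATURE_COLUMNS and encode_permissions are hypothetical.

```python
# Hypothetical column order; the real model would use the dataset's full column list.
FEATURE_COLUMNS = [
    "ACCESS_FINE_LOCATION",
    "READ_CONTACTS",
    "ACCESS_WIFI_STATE",
    "READ_SMS",
    "CAMERA",
]


def encode_permissions(selected, columns=FEATURE_COLUMNS):
    """Return a 0/1 list aligned with `columns` for the selected permissions."""
    chosen = {name.strip().upper() for name in selected}
    unknown = chosen - set(columns)
    if unknown:
        # Reject names the model was never trained on instead of ignoring them.
        raise ValueError(f"Unknown permissions: {sorted(unknown)}")
    return [1 if column in chosen else 0 for column in columns]


vector = encode_permissions(["READ_SMS", "camera"])
print(vector)  # [0, 0, 0, 1, 1]
```

The resulting list can be passed as a single row to a trained classifier's predict method, provided the column order matches the one used during training.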

8. How does the system classify an APK file as malware?

When an APK file is uploaded, the backend unpacks its contents and checks for dangerous permissions, suspicious API calls, malware signatures, multiple DEX files, and native libraries. A cumulative risk score is computed from predefined security rules, and the APK is classified as Benign, Suspicious, or Malware according to that score.

9. What performance metrics are used to evaluate the models?

The models are evaluated using standard classification metrics such as:

• Accuracy
• Precision
• Recall
• F1-score
• Prediction Time

These metrics ensure comprehensive performance analysis and help identify the most effective algorithm for deployment.
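As a reminder of how these metrics relate to each other, the sketch below computes them from true and predicted labels using only the standard library and times a prediction call with time.perf_counter. The function names are hypothetical, and treating label 1 as the malware class is an assumption.

```python
import time


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def timed_predict(predict_fn, samples):
    """Run predict_fn on samples and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    predictions = predict_fn(samples)
    return predictions, time.perf_counter() - start


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy example there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.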

10. How accurate is the system overall?

The system demonstrates high overall accuracy: the ensemble models achieve 97.42% test accuracy and Logistic Regression achieves 94.17%, indicating that few applications in the held-out test data are misclassified.
