Mining Health Examination Records
ABSTRACT:
General health examinations are an integral part of healthcare in many countries. Identifying the participants at risk is important for early warning and preventive intervention. The fundamental challenge in learning a classification model for risk prediction lies in the unlabeled data, which makes up the majority of the collected dataset. In particular, the unlabeled data describes participants in health examinations whose health conditions can vary greatly, from healthy to very ill, and there is no ground truth for differentiating their states of health. In this paper, we propose a graph-based, semi-supervised learning algorithm called SHG-Health (Semi-supervised Heterogeneous Graph on Health) for risk prediction, which classifies a progressively developing situation in which the majority of the data is unlabeled. An efficient iterative algorithm is designed and a proof of convergence is given. Extensive experiments on both real health examination datasets and synthetic datasets are performed to show the effectiveness and efficiency of our method.
EXISTING SYSTEM:
- Huang et al. proposed iSELF, an SSL method based on local Fisher discriminant analysis for disease gene classification.
- Nguyen et al. constructed a protein-protein interaction network, which defines interacting genes as candidate genes and the rest as negative genes, for SSL based on Gaussian fields and harmonic functions.
- Garla et al. applied Laplacian SVM as an SSL approach for cancer case management.
- Wang et al. proposed a graph-based SSL method that is able to learn patient risk groups for patient risk stratification.
- Kim et al. proposed a co-training graph-based SSL method for breast cancer survivability prediction. It iteratively assigns pseudo-labels to unlabeled data when there is a consensus among the learners, and adds the pseudo-labeled instances to the labeled set until the unlabeled set stops shrinking.
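The co-training loop described in the last item can be illustrated with a minimal sketch. It is not the method of Kim et al.: their learners are graph-based, whereas this sketch swaps in two simple nearest-centroid learners, one per feature view, purely to show the consensus-and-stop logic. The function names and the two-feature data layout are assumptions made for illustration.

```python
from statistics import mean


def nearest_centroid(labeled, view):
    """Build a learner for one feature view (stand-in for a graph-based learner)."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features[view])
    centroids = {label: mean(values) for label, values in by_label.items()}
    # Predict the label whose centroid is closest in this view.
    return lambda features: min(
        centroids, key=lambda label: abs(features[view] - centroids[label])
    )


def co_train(labeled, unlabeled):
    """Pseudo-label by consensus until the unlabeled set stops shrinking."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        learners = [nearest_centroid(labeled, view) for view in (0, 1)]
        still_unlabeled = []
        for features in unlabeled:
            votes = {learner(features) for learner in learners}
            if len(votes) == 1:
                # Both learners agree: accept the pseudo-label.
                labeled.append((features, votes.pop()))
            else:
                still_unlabeled.append(features)
        if len(still_unlabeled) == len(unlabeled):
            break  # no instance was pseudo-labeled in this round
        unlabeled = still_unlabeled
    return labeled, unlabeled


# Made-up toy data: each instance is (view-0 feature, view-1 feature).
seed = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")]
pool = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9)]
final_labeled, remaining = co_train(seed, pool)
```

In this toy run the third instance stays unlabeled because the two views keep disagreeing on it, which is the stopping condition the text describes.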
DISADVANTAGES OF EXISTING SYSTEM:
- Most existing classification methods for healthcare data do not consider the issue of unlabeled data. They either rely on expert-defined low-risk or control classes, or simply treat non-positive cases as negative.
- Methods that do consider unlabeled data are generally based on Semi-Supervised Learning (SSL), which learns from both labeled and unlabeled data.
PROPOSED SYSTEM:
- This paper proposes a semi-supervised heterogeneous graph-based algorithm called SHG-Health (Semi-supervised Heterogeneous Graph on Health) as an evidence-based risk prediction approach to mining longitudinal health examination records.
- To handle heterogeneity, it explores a heterogeneous graph based on health examination records, called the HeteroHER graph, in which examination items in different categories are modelled as different types of nodes and their temporal relationships as links.
- To tackle the large amount of unlabeled data, SHG-Health features a semi-supervised learning method that uses both labeled and unlabeled instances. In addition, it is able to learn an additional (K+1)-th "unknown" class for the participants who do not belong to any of the K known high-risk disease classes.
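To make the idea of an extra "unknown" class concrete, the following is a minimal sketch of generic graph label propagation with K+1 score columns. It is not the SHG-Health algorithm itself: it runs on a plain (homogeneous) graph, uses the common update F ← αSF + (1 − α)Y with a symmetrically normalized adjacency matrix S, and assumes that unlabeled nodes start with a small prior on the unknown column. The function name, `alpha`, `unknown_prior` and `iterations` are all illustrative assumptions.

```python
def propagate_labels(adjacency, seeds, num_known,
                     alpha=0.8, unknown_prior=0.1, iterations=50):
    """Label propagation with an extra 'unknown' class at index num_known."""
    n = len(adjacency)
    num_classes = num_known + 1
    unknown = num_known
    degree = [sum(row) for row in adjacency]

    def s(i, j):
        # Entry of S = D^(-1/2) W D^(-1/2); zero where there is no edge.
        w = adjacency[i][j]
        return w / (degree[i] * degree[j]) ** 0.5 if w else 0.0

    # Prior Y: labeled nodes get their class, unlabeled nodes a weak
    # 'unknown' prior (an assumption of this sketch).
    prior = [[0.0] * num_classes for _ in range(n)]
    for node in range(n):
        if node in seeds:
            prior[node][seeds[node]] = 1.0
        else:
            prior[node][unknown] = unknown_prior

    scores = [row[:] for row in prior]
    for _ in range(iterations):
        # F <- alpha * S * F + (1 - alpha) * Y
        scores = [
            [alpha * sum(s(i, j) * scores[j][c] for j in range(n))
             + (1 - alpha) * prior[i][c]
             for c in range(num_classes)]
            for i in range(n)
        ]
    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]


# Made-up toy graph with edges 0-1, 2-3 and 4-5; nodes 0 and 2 are labeled.
W = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
predicted = propagate_labels(W, seeds={0: 0, 2: 1}, num_known=2)
```

Nodes 1 and 3 inherit the class of their labeled neighbour, while nodes 4 and 5, which are not connected to any labeled node, fall into the unknown class (index 2). Keeping `unknown_prior` below `alpha` is what lets a labeled neighbour outweigh the unknown prior in this sketch.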
ADVANTAGES OF PROPOSED SYSTEM:
- We present the SHG-Health algorithm to handle a challenging multi-class classification problem with a substantial number of unlabeled cases, which may or may not belong to the known classes. This work is a pioneering effort in risk prediction based on health examination records in the presence of a large amount of unlabeled data.
- A novel graph extraction mechanism is introduced for handling the heterogeneity found in longitudinal health examination records.
- The proposed graph-based semi-supervised learning algorithm, SHG-Health, combines the advantages of heterogeneous graph learning and class discovery, and shows a significant performance gain on a large and comprehensive real health examination dataset.
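As a rough illustration of what extracting a heterogeneous graph from longitudinal records could look like, the sketch below turns a list of yearly records into typed nodes and links. It is an assumption-laden toy, not the HeteroHER construction from the paper: the record layout (`participant`, `year`, `abnormal_items`), the node naming scheme, and the choice to link consecutive records of the same participant as the temporal relationship are all hypothetical.

```python
from collections import defaultdict


def build_hetero_graph(records):
    """Return (nodes, edges): node id -> node type, node id -> neighbour set."""
    nodes = {}
    edges = defaultdict(set)

    def link(a, b):
        # Undirected link between two nodes.
        edges[a].add(b)
        edges[b].add(a)

    previous = {}  # participant -> most recent record node
    ordered = sorted(records, key=lambda r: (r["participant"], r["year"]))
    for rec in ordered:
        rec_node = f"record:{rec['participant']}:{rec['year']}"
        nodes[rec_node] = "record"
        # One node type per examination category (e.g. physical, mental).
        for category, items in rec["abnormal_items"].items():
            for item in items:
                item_node = f"{category}:{item}"
                nodes[item_node] = category
                link(rec_node, item_node)
        # Temporal link to the participant's previous record, if any.
        if rec["participant"] in previous:
            link(previous[rec["participant"]], rec_node)
        previous[rec["participant"]] = rec_node
    return nodes, edges


# Made-up records for two participants.
records = [
    {"participant": "p1", "year": 2010,
     "abnormal_items": {"physical": ["high_blood_pressure"]}},
    {"participant": "p1", "year": 2011,
     "abnormal_items": {"physical": ["high_blood_pressure"],
                        "mental": ["low_mood"]}},
    {"participant": "p2", "year": 2010,
     "abnormal_items": {"mental": ["low_mood"]}},
]
nodes, edges = build_hetero_graph(records)
```

Item nodes shared across records (here `mental:low_mood`) are what connect different participants in such a graph, which is the kind of structure a graph-based learner can then exploit.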
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System : Intel Core i3 Processor
- Hard Disk : 500 GB
- Monitor : 15" LED
- Input Devices : Keyboard, Mouse
- RAM : 4 GB
SOFTWARE REQUIREMENTS:
- Operating System : Windows 10/11
- Coding Language : C#.NET
- Frontend : .NET, HTML, CSS, JavaScript
- IDE Tool : Visual Studio
- Database : SQL Server 2005