An Improved Particle Swarm Optimization Based Classification Model for High-Dimensional Medical Disease Prediction

Recently, machine learning techniques have become popular and widely accepted for medical disease detection and classification on high-dimensional datasets. Classification is one of the essential machine learning tasks for medical disease prediction due to its fast processing speed and high efficiency, even on noisy datasets. Traditional machine learning models fail to estimate disease patterns with a high true positive rate when the number of features and the data size are large. In this paper, a novel particle swarm optimization based hybrid classifier is implemented for medical disease prediction in high dimensions. The main objective of the feature selection based hybrid classifier is to classify high-dimensional data with a large medical feature set. The proposed filter-based hybrid classifier is designed and implemented to improve the medical prediction rate on high-dimensional data. In this work, we use different ensemble learning models such as ACO+NN, PSO+ELM, and PSO+WELM to analyze the performance of the proposed model (IPSO+WELM). Experimental results are evaluated on different types of medical datasets, including lung cancer, diabetes, ovarian, and DLBCL-Stanford. Performance results show that the proposed IPSO+WELM ensemble model has high computational efficiency in terms of true positive rate, error rate, and accuracy.


Introduction
Classification can be defined as a special kind of learning model that categorizes different gene-disease datasets into a finite or infinite set of classes. Apart from supervised and unsupervised machine learning approaches, two other machine learning techniques are generally used: regression and clustering. In regression, a learning function maps the original data to a real-valued variable, estimating a predictive value for every individual sample. Clustering falls under unsupervised learning; groups, known as clusters, are formed according to the similarity of data items. Data items with high similarity are placed in the same cluster, whereas data items with little or no similarity are placed in different clusters. Bagging (bootstrap aggregation) is a classification scheme that is both a machine learning method and a meta-algorithm, i.e., an algorithm developed to improve stability. Bagging has a wide range of applications in statistical classification and regression; it not only reduces variance but also limits overfitting, and it is commonly applied to decision trees. Three common factors are generally responsible for the errors of machine learning algorithms: noise, bias, and variance. Noise is the error introduced by the target function itself; bias corresponds to targets that the classification algorithm is unable to learn; variance is the outcome of the sampling process. The bagging approach described above reduces the overall error.
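The bagging idea described above can be illustrated with a minimal sketch: decision stumps (one-split trees, chosen here only as a simple unstable base learner) are each trained on a bootstrap resample and combined by majority vote. This is a generic illustration of bootstrap aggregation, not the paper's own classifier.

```python
import numpy as np

def fit_stump(X, y):
    """Find the single (feature, threshold, polarity) split with the
    lowest training error on binary labels in {0, 1}."""
    best = (0, 0.0, 1, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) > 0, 1, 0)
                err = np.mean(pred != y)
                if err < best[3]:
                    best = (f, t, pol, err)
    return best[:3]

def stump_predict(stump, X):
    f, t, pol = stump
    return np.where(pol * (X[:, f] - t) > 0, 1, 0)

def bagging_fit(X, y, n_estimators=25, rng=None):
    """Bootstrap aggregation: train each stump on a resampled copy of the data."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, n)      # sample n rows with replacement
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def bagging_predict(stumps, X):
    votes = np.mean([stump_predict(s, X) for s in stumps], axis=0)
    return (votes >= 0.5).astype(int)    # majority vote over the ensemble
```

Because each stump sees a different bootstrap sample, their individual errors are partially decorrelated, and averaging the votes reduces the variance component of the error, exactly the effect the paragraph attributes to bagging.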
Boosting is another machine learning meta-algorithm. Its prime objective is to reduce bias significantly, and it also decreases variance. In other words, boosting is the process of transforming weak learners into strong learners: weak learners are only poorly correlated with the true classification, whereas strong learners are strongly correlated with it. The Extreme Learning Machine (ELM) can be defined as a single-hidden-layer feed-forward neural network (SLFN) learning model [4]. Traditional optimization approaches such as gradient-descent-based back-propagation [5] iteratively evaluate the weights and biases; ELM instead decreases the training time effectively through random assignment of the input weights and biases. This yields better efficiency and performance compared to the traditional approaches. ELM has a wide range of applications in domains such as face recognition, human action recognition, landmark recognition, protein sequence classification, and medical disease prediction [1,3,4]. However, ELM has two major issues: 1) it is prone to overfitting, so its performance cannot be predicted for unknown datasets; and 2) it is poorly suited to binary classification on uncertain datasets.
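The ELM training procedure just described, with random fixed hidden weights and a closed-form output layer, can be sketched in a few lines. This is a minimal generic ELM with a sigmoid hidden layer and the Moore-Penrose pseudoinverse solution, not the paper's weighted variant.

```python
import numpy as np

def elm_train(X, y, n_hidden=50, seed=0):
    """Extreme Learning Machine: input weights and biases are assigned
    randomly and never tuned; only the output weights are solved in
    closed form by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden layer
    beta = np.linalg.pinv(H) @ y                  # Moore-Penrose solution
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                               # threshold at 0.5 for labels
```

Because training reduces to one pseudoinverse, there is no iterative weight update at all, which is why ELM trains orders of magnitude faster than back-propagation on the same architecture.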

Feature extraction based ELM:
The process of feature extraction is highly significant in the field of classification. Features can be divided into two groups: 1) features extracted using noisy attributes and contextual information, and 2) correlated features. Traditional feature extraction models discard noisy features in order to reduce the high-dimensional feature set to a lower-dimensional one. Let Nmin and Nmax be the minimum and maximum numbers of hidden neurons, respectively, and let N denote the current number of hidden neurons. For each N, the average accuracy of the ELM is evaluated through a 10-fold cross-validation scheme, and the number of hidden neurons with the maximum average accuracy is chosen as optimal. After selecting the optimal number of hidden neurons, the ELM classifier is applied to the outcomes of PCA to evaluate the classification accuracy, and the outcomes are averaged. Medical data prediction is one of the most important and complicated requirements of the present era. Many approaches have been developed for medical disease prediction, such as cancer prediction. In [9], an association rule mining approach is integrated with MLP and back-propagation approaches in order to predict and detect the chances of breast cancer. A modular neural network is implemented to recognise and analyse cardiac diseases, using the Gravitational Search Algorithm (GSA) and a fuzzy-logic-embedded algorithm for better performance [3]. With the exponential growth of information technologies, data mining approaches are being applied in various domains such as biomedical applications and disease prediction. Special emphasis can be given not only to detecting cancer, but also to predicting the disease at an early stage [6]. A gene selection technique is generally applied to pre-existing pattern classification approaches; it not only decreases the influence of noise, but also reduces the error rate on medical datasets [4][5][6]. According to [1], gene selection is used to enhance the performance of the classifier through the detection of gene expression subsets.
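The hidden-neuron selection step described above, scanning N from Nmin to Nmax and keeping the size with the best average 10-fold cross-validation accuracy, can be sketched as follows. The tiny inline ELM (sigmoid hidden layer, pseudoinverse output weights) is an assumed stand-in for the paper's classifier, and the step size is an illustrative parameter.

```python
import numpy as np

def elm_cv_accuracy(X, y, n_hidden, n_folds=10, seed=0):
    """Average accuracy of a minimal sigmoid ELM over n_folds-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate(folds[:k] + folds[k + 1:])
        W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
        b = rng.normal(size=n_hidden)
        H = 1 / (1 + np.exp(-(X[train] @ W + b)))
        beta = np.linalg.pinv(H) @ y[train]           # closed-form output layer
        Ht = 1 / (1 + np.exp(-(X[test] @ W + b)))
        accs.append(np.mean((Ht @ beta > 0.5) == y[test]))
    return float(np.mean(accs))

def select_hidden_neurons(X, y, n_min, n_max, step=5):
    """Scan [n_min, n_max] and return the hidden-layer size with the
    best average cross-validation accuracy."""
    scores = {n: elm_cv_accuracy(X, y, n) for n in range(n_min, n_max + 1, step)}
    return max(scores, key=scores.get)
```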
Among the most widely implemented gene selection approaches are principal component analysis [4], singular value decomposition [5], independent component analysis [6], the genetic algorithm (GA) [7], and the recursive feature elimination method [8,9]. Deep learning is a special type of back-propagation scheme for multi-layer networks. It supports multilayer unsupervised initialization rather than classical random initialization; after successful initialization, the whole network is trained with the back-propagation neural network technique. The major issue with this model is that the hidden parameters of the deep learning framework are fine-tuned many times, which makes the overall training phase lengthy and time consuming. These approaches are very slow because of improper learning phases or convergence to local minima. Additionally, many iterative learning schemes have been developed to achieve better learning efficiency and performance; such approaches add neurons incrementally and depend completely on kernel-based techniques. In the Bayesian ELM, the hidden layer is not tuned, unlike the training of a conventional feed-forward network: all the parameters of the hidden layer are selected randomly, and the input weights, output weights, and biases are chosen so as to minimize the training error. The major disadvantages of the above schemes are the slower learning rate of the networks and their reduced capability to manage complexity. The complexity issue of traditional neural networks is overcome by the Modular Neural Network (MNN), a neural network in which each module is assigned a separate task; the outcomes of all modules are combined to form the final outcome. In the past, machine learning models used a single classification model to predict the test data from the training samples.
However, multiple classifiers can be used to predict the same test data from the training samples; this process is known as ensemble learning. Ensemble classification has been successfully applied to different classification problems to improve classification accuracy using optimal feature selection measures.
Particle swarm optimization (PSO) is a very popular optimization technique in machine learning. In the literature, PSO is generally applied to adjust the initialization parameters of the base classifiers in ensemble learning models. The main objective of this paper is to optimize the traditional PSO parameters in the ensemble classification model in order to improve the accuracy and error rate. In the ensemble model, a neural network is used as one of the base classifiers, and its weights are initialized using the proposed PSO technique. During the last two decades, different feature selection methods and classification models have been used for the prediction and diagnosis of different cancer diseases. Today, with the exponential growth of technology, it is nearly impossible to use these conventional methods for cancer prediction due to high-dimensional features and class imbalance. Machine learning models are used to classify different medical datasets, taking microarray data, clinical data, and proteomic data as input. Most traditional approaches treat the features as independent and linear; most biological systems, however, are non-linear with interdependent parameters, so machine learning has become the better choice. The curse of dimensionality is another problem, arising when there are many variables and few examples. Both machine learning and conventional methods suffer from this problem, which can be resolved either by decreasing the number of variables or by increasing the number of training samples.
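For reference, the canonical continuous PSO that this paper builds on can be sketched as follows: each particle keeps its personal best, the swarm shares one global best, and the velocity update blends inertia with attraction toward both. The coefficients below are common textbook defaults, not the paper's tuned values.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=100, seed=0,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Canonical particle swarm optimization for minimizing f over R^dim."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))        # positions
    v = np.zeros((n_particles, dim))                   # velocities
    pbest = x.copy()                                   # personal bests
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()             # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())
```

The proposed IPSO modifies exactly this loop: the uniform random factors and the fitness measure are replaced by the optimized and chaos-based variants described in the methodology section.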
The sample-to-feature ratio should always be at least 5:1. Machine learning algorithms can be categorized into three broad types: supervised, unsupervised, and reinforcement learning. In supervised learning, a labelled training dataset is provided as input to the algorithm, which produces output after learning the mapping. In contrast, in unsupervised learning only the training data are given as input, without labels; examples include self-organizing feature maps, hierarchical clustering, and k-means clustering. All machine learning algorithms used for cancer prediction fall under supervised learning; the most widely applied are artificial neural networks, decision trees, genetic algorithms, linear discriminant analysis, k-nearest neighbour, etc. The problem of gene extraction or selection has become a main challenge in the field of microarray disease datasets. There are two types of feature selection models: filter-based and wrapper-based. Filter-based feature selection ranks features according to a statistical measure such as information gain, gain ratio, or the t-statistic. Wrapper-based feature selection models are more complex than filter-based ones, because they require a training dataset to select the feature subset: a wrapper method finds a subset of features and predicts its relevance using a machine learning classifier.
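A filter-based method of the kind mentioned above, ranking by the t-statistic, can be sketched in a few lines. This is a generic two-class t-statistic filter, not the paper's specific ranking criterion.

```python
import numpy as np

def t_statistic_ranking(X, y):
    """Filter-based feature selection: rank features by the absolute
    two-sample (Welch) t-statistic between the two class groups.
    No classifier is involved, which is what distinguishes a filter
    from a wrapper method."""
    g0, g1 = X[y == 0], X[y == 1]
    m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
    v0, v1 = g0.var(axis=0, ddof=1), g1.var(axis=0, ddof=1)
    t = (m1 - m0) / np.sqrt(v0 / len(g0) + v1 / len(g1) + 1e-12)
    return np.argsort(-np.abs(t))   # most discriminative features first
```

A wrapper method would instead evaluate each candidate subset by training a classifier on it, which is far more expensive but accounts for feature interactions.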

Related Work
Sarkar et al. developed an advanced classifier ensemble framework to classify huge amounts of multimedia big data [1]. There exist a number of different classification approaches for different data types, and a single classifier cannot give optimal results on all types of datasets; hence, ensemble learning approaches have become very popular. Generally, the accuracy of ensemble learning approaches depends on the base classifiers and the data dimensionality. In this work, an advanced classifier ensemble framework is developed along with a group of predefined base classifiers. The main objective of this approach is to classify test instances based on the majority voting of multiple base classifiers. The ensemble system was executed in a Spark environment in order to process large quantities of big data. A series of experiments performed on multimedia big data showed that this framework performs better than other traditional classification algorithms. Future work can modify and enhance the ensemble model for high dimensionality, and advanced techniques can be developed to resolve multi-class classification issues on big datasets. Myoung et al. proposed a model to resolve the issues of evolutionary undersampling (EUS) for imbalanced big data classification [2]. Most applications require appropriate classification approaches in order to classify big data, and the dimensionality issue of microarray data becomes more complex in imbalanced big data classification. A proper solution for imbalanced classification is an evolutionary ensemble approach. In this paper, an optimized MapReduce approach is implemented to classify imbalanced data. The whole process can be broadly divided into a two-phase MapReduce job: in the initial phase, a decision tree is learnt on every individual map after pre-processing, and the test samples are predicted using the trained model.
A windowing approach is applied in order to enhance the EUS process, and EUS is implemented on arbitrarily sized datasets in order to achieve parallelization. In the future, this model will be extended to implement hybrid oversampling or undersampling techniques instead of EUS in order to enhance the system's performance. I. Triguero et al. introduced a new approach for evolutionary under-sampling for extremely imbalanced big data classification [3]. Ensemble under-sampling is considered a good solution for skewed class distributions, but due to memory and time limitations an evolutionary approach cannot be implemented in high-dimensional applications. To resolve such problems, divide-and-conquer techniques are implemented using the MapReduce scheme, in which the whole dataset is divided into a number of subsets. On high-dimensional data, however, divide-and-conquer methods fail to handle the majority and minority class problems. [3] proposed a novel technique using Apache Spark whose main objective is to solve the problem of extremely imbalanced datasets. In the proposed technique, majority and minority class instances are handled independently in order to decrease the influence of the small sample size. The evaluation is carried out on different datasets of up to one million instances. In the future, several preprocessing schemes such as hybrid oversampling or under-sampling techniques can be implemented to improve the true positive rate and accuracy. B. Sudha et al. developed a new approach for efficient big data classification [4]. They implemented a novel model known as Random Projections Fuzzy k-Nearest Neighbor (RPFKNN). As the number of data features increases, there is a need to minimize the dimensionality of the feature space for classification models; consider, for example, an activity recognition process.
Here, large amounts of high-dimensional data must be processed for real-time classification applications such as medical disease prediction and sensor applications. Some traditional dimension reduction techniques such as PCA, as well as feature selection models, are appropriate only for datasets of limited size and dimensionality. In this paper, they integrated a traditional classifier with the fuzzy k-nearest neighbor and feature selection through a random projection technique in order to classify big datasets. In the PCA approach, the projection matrix is evaluated only once through a least-squares optimality criterion, but in random projections the projection matrix is evaluated more than once. A fusion scheme is required to combine the outcomes of the random projection technique, and the result is named RPFKNN. This technique depends on the class membership values generated by FKNN and on the classification accuracy. Zhao et al. developed a cloud-based framework for big data applications [5]. Classical classification models and feature reduction methods are incapable of processing such big data efficiently. The framework involves two phases, feature extraction and machine learning, evaluated in both training and testing modes. The proposed technique gives a better solution for big data classification with limited feature components, and the automatic adjustment of classification makes this approach very efficient compared to other traditional classification approaches. Lan et al. designed and developed a solution to imbalanced classification on small datasets with a limited feature space [6]. The main problems of traditional classification models are imbalanced data, high dimensionality, and error cost estimation. Here, a new distributed evolutionary cost-sensitive balancing approach is proposed in order to handle imbalanced data as well as error cost estimation.
At first, a genetic algorithm is applied to obtain an optimal cost matrix along with the base classifier. In this paper, two distinct alternatives are proposed: a computation-oriented technique and a data-oriented technique. Both techniques are implemented using the Apache Watchmaker framework and deployed in a Hadoop-based environment. A series of experiments evaluating both approaches demonstrated that the computation-oriented technique performs better than the other one, although it is incapable of decreasing the overall execution time. The data-oriented technique, in contrast, can decrease the overall execution time significantly compared to traditional SVM and kNN. The main limitations of these models are high runtime and high error rate. In the future, this work can be extended to implement faster algorithms such as ensemble decision trees. Begum et al. introduced an adaptive classification scheme for microarray analysis using big data [7]. Microarray technology is used to study the biological operations within cells: every cell contains gene expression data that are represented by a microarray, and microarray data make it possible to extract the required information from numerous genes within a common cell at the same time. By studying these gene expressions thoroughly, several diseases, such as viral infection and cancer, can be identified at a very early stage of diagnosis. Consider a sample tissue in which a number of different genes are expressed: many of them are not related to the medical diagnosis, and diseases can be properly classified only when a small portion of genes is chosen from among the vast number available. The model of [7] emphasizes some commonly implemented gene expression classification schemes based on the MapReduce framework.
They also worked on detecting differentially expressed genes and choosing relevant genes from the gene expression data; they cross-validated their theory, and the outcomes of the evaluation process show better performance compared to other conventional classification techniques. Huang et al. proposed an advanced hybrid classification approach for microarray datasets [8]. In this research work, they developed a new hybrid technique in order to improve traditional microarray data classification. Their technique depends on base classifiers such as nearest neighbour, Naïve Bayes, and SVM, and its performance is compared with classical classification techniques (such as SVM and Naïve Bayes). Feature selection is carried out with the help of the discrete wavelet transform and a moving-window approach. The hybrid classifier is developed by merging the features of NN, NB, and SVM. During the classification phase, similar features are retrieved and compared with the references generated in the training phase. A three-fold cross-validation process is carried out to measure the robustness of the system, with each fold associated with a single classifier to test the system's accuracy; the classification accuracies generated by the classifiers are assumed to have the same weight in the hybrid classifier. The technique is evaluated over five gene expression datasets, with experiments performed on both real and benchmark datasets. The main limitations of this model concern the feature space and feature subset selection. Zong et al. proposed a new approach for big data classification using RIPPER [9], which they termed a distributed multiclass rule based classification scheme. Retrieving hidden patterns requires considerable computational resources in both space and time.
Hence, distributed computing has gained popularity in the data mining process: the limitations of space and time can be overcome by a data mining approach capable of mining in a non-centralized manner. Here, they implemented a new algorithm for distributed multiclass rule based classification (DiRUC). DiRUC applies a repeated evolutionary pruning strategy to achieve error reduction at the local level; these local error reduction schemes are then combined at the global level in a distributed manner, and by integrating all the local models, the global model is built and placed at all sites for the next class label prediction. The outcomes of the evaluation phase show better performance compared to the conventional RIPPER approach. The main limitation of the model is its high error rate and false positive rate as the number of dimensions increases. Huang et al. presented a new technique for gene selection and decision-tree-based classification [10]. The main objective of this paper is to predict cancer severity class labels. Due to the high dimensionality of microarray gene datasets and gene expression sequences, data classification is getting more complicated for traditional models. High-dimensional problems such as gene selection and classifier construction can be alleviated by advanced rule based classification techniques [12][13][14][15][16]. Rough set theory and decision tree approaches are integrated with mathematical and statistical techniques to resolve these issues. In the evaluation, the overall accuracy of the rule-based classifier J48 is computed for every dataset. In the classification model, the genes are ranked by correlation and the Jaccard coefficient to predict the cancer label.

Proposed Methodology
The proposed PSO based ensemble classifier is a multi-objective technique that finds local and global optimum solutions by iteratively searching a high-dimensional space. If an attribute is numerical and its value is null, it is replaced with the value computed by equation (1); if the attribute is nominal and has missing values, it is replaced with the value computed by equation (2); and if the class is numeric, it is converted to nominal. The proposed PSO based ensemble model is designed and implemented to improve the overall classification true positive rate with high-dimensional feature selection. Generally, an ensemble learning model is constructed from a group of base classifiers to predict the high-dimensional class labels. Here, the search space of the traditional PSO model is optimized using the proposed measures: an optimized fitness measure, candidate solutions, and a chaos-Gauss based randomization measure. In our model, different base classifiers such as ACO+NN, PSO+ELM, and PSO+WELM are used to compare the performance of the proposed model (IPSO+WELM) with the traditional models.
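The pre-processing step above can be sketched as follows. Since equations (1) and (2) are not reproduced here, the sketch assumes the common choices of column mean for numeric attributes and most-frequent category (mode) for nominal attributes; substitute the paper's formulas where they differ.

```python
import numpy as np

def impute_numeric(col):
    """Replace NaNs in a numeric column with the column mean
    (assumed form of equation (1))."""
    col = np.asarray(col, dtype=float)
    mean = np.nanmean(col)                 # mean over the observed values
    return np.where(np.isnan(col), mean, col)

def impute_nominal(col, missing=""):
    """Replace missing nominal values with the most frequent category
    (assumed form of equation (2))."""
    observed = [v for v in col if v != missing]
    values, counts = np.unique(observed, return_counts=True)
    mode = str(values[np.argmax(counts)])  # most frequent category
    return [mode if v == missing else v for v in col]
```

A numeric class attribute would then be discretized into nominal labels (e.g. by thresholding) before the ensemble classifier is trained.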

Proposed Improved PSO based Ensemble Classification Algorithm:
Step 4: Apply the KNN classification model with the highest true positive rate acc_i to the features selected in the i-th iteration.
Step 5: For each particle, compute the fitness value and the true positive rate of the attributes from the previous step.
Step 6: Update each particle's position, velocity, local best, global best, fitness, and logistic chaos randomization function.
Step 7: Repeat this procedure until a minimum error rate or a high true positive rate is reached.
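The loop in the steps above can be sketched with a binary PSO over feature subsets, using leave-one-out 1-NN accuracy as the fitness (a proxy for the KNN true positive rate) and a logistic map in place of the uniform random factors. This is an assumed reading of the paper's "logistic chaos randomization": the exact IPSO update rule is not given, so the coefficients and the chaotic substitution below are illustrative.

```python
import numpy as np

def knn_accuracy(X, y, subset):
    """Leave-one-out 1-NN accuracy on the selected feature subset,
    used as the particle fitness."""
    Xs = X[:, subset]
    d = np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude each point itself
    return float(np.mean(y[d.argmin(axis=1)] == y))

def binary_pso_select(X, y, n_particles=10, iters=20, seed=0):
    """Sketch of the IPSO feature-selection loop (Steps 4-7): binary
    particles encode feature subsets; a logistic map (r = 4) supplies
    the chaotic random factors in the velocity update."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = (rng.random((n_particles, dim)) > 0.5).astype(float)
    vel = np.zeros((n_particles, dim))
    chaos = rng.random((n_particles, dim)) * 0.9 + 0.05   # chaotic state

    def fitness(rows):
        return np.array([knn_accuracy(X, y, p.nonzero()[0]) if p.any() else 0.0
                         for p in rows])

    pbest, pbest_fit = pos.copy(), fitness(pos)
    g = pbest[pbest_fit.argmax()].copy()                  # global best subset
    for _ in range(iters):
        chaos = 4.0 * chaos * (1.0 - chaos)               # logistic map step
        vel = (0.7 * vel + 1.5 * chaos * (pbest - pos)
               + 1.5 * (1 - chaos) * (g - pos))
        # Sample each bit with probability sigmoid(velocity)
        pos = (rng.random((n_particles, dim)) < 1 / (1 + np.exp(-vel))).astype(float)
        fit = fitness(pos)
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        g = pbest[pbest_fit.argmax()].copy()
    return g.nonzero()[0], float(pbest_fit.max())
```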

Experimental Results
In this section, experimental results on cancer medical datasets are reported for the proposed feature selection based classification model. The precision measure computes the ratio of correctly predicted cancer instances among all instances predicted as cancer. Table 2 describes the performance of the proposed model compared to the existing models in terms of classification accuracy. Here the accuracy is computed in terms of true positive rate and true negative rate on the medical datasets. From the table, it is clearly observed that the classification rate on the medical datasets is optimized by our proposed feature selection based weighted ELM in the ensemble classifier.
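The reported measures can be computed from the confusion-matrix counts in the standard way; a minimal sketch for binary labels:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Standard binary metrics used in the evaluation: true positive rate
    (sensitivity), precision, accuracy, and error rate."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # correctly predicted cancer
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "tpr": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / len(y_true),
        "error_rate": (fp + fn) / len(y_true),
    }
```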

Conclusion
In this paper, an optimized PSO feature selection method is integrated with the weighted ELM model for ensemble learning on microarray datasets. Most traditional feature selection based classification algorithms suffer from computational issues such as dimension reduction, uncertainty, and class imbalance on microarray datasets. The feature selection based ensemble classifier is one of the scalable models for the extreme learning machine due to its high efficiency and fast processing speed for real-time applications. The proposed filter-based hybrid classifier is designed and implemented to improve the medical prediction rate on high-dimensional data.