Extreme Learning Machine Neural Networks for Multi-Agent System in Power Generation

Extreme Learning Machine (ELM) is widely recognized as a more effective learning algorithm than conventional learning methods in terms of both learning speed and generalization. In an ELM, the hidden neurons need not be tuned; only the output weights, which link the hidden layer to the output layer, have to be learned. Ensemble models, on the other hand, integrate the independent predictions of several ELMs to produce a final output, and this approach has also been incorporated into Multi-Agent Systems (MAS). By combining these two approaches, this paper presents a novel extreme learning machine based multi-agent system (ELM-MAS) for handling classification problems. ELM-MAS contains two layers of ELMs: an individual agent layer and a parent agent layer. The developed ELM-MAS was tested with several activation functions on benchmark datasets and real-world applications, namely satellite image, image segmentation, and fault diagnosis in power generation (a circulating water system and a GAST governor). The experimental results suggest that ELM-MAS achieves good accuracy rates relative to other approaches.


Introduction
Extreme Learning Machine (ELM) has recently become widely recognized as a more effective learning algorithm than conventional learning methods in terms of both learning speed and generalization [1][2][3][4][5][6]. ELM is capable of universal approximation with randomly generated input weights and biases [7]. In other words, the hidden neurons need not be tuned, and only the output weights, which connect the hidden layer to the output layer, have to be learned. According to Huang et al. [8], ELM is extremely efficient and tends to reach the global optimum, in contrast to the conventional feedforward neural network (FNN). Furthermore, with commonly used activation functions, ELM can attain the generalization bounds of conventional FNNs in which all parameters are learned [9]. In terms of generalization and efficiency, ELM has performed much better than traditional FNNs on diverse types of problems [1][2][3][4][5][6]. ELM has also been applied in other fields, namely hyperspectral imaging [10], biomedical analysis [11][12], system modelling [13][14], chemical processes [15], action recognition [16], power systems [17], and others. Several research groups have focused on ensemble models that integrate the independent predictions of several ELMs to produce a final output [18][19][20][21][22]. This approach has also been incorporated into a Multi-Agent System (MAS) [23]. Over the past decade, MASs have attracted considerable attention. Researchers have applied them successfully to problems in different domains, as shown by their extensive applications in decision support [24], military support [25], healthcare [26], control systems [27], e-commerce [28], and knowledge management [29]. The general structure of an MAS is illustrated in Figure 4.1, in which the base platform consists of a collection of ELMs known as individual agents.
In this structure, the output of each individual ELM (individual agent) is normally delivered to a parent agent, which is the decision combination module where the final decision is made. The decision combination module generally derives its output using methods such as weighted averaging [7], exact averaging [20], voting [30], and the confusion matrix [31]. Unfortunately, these approaches often require additional algorithms in the decision combination module to generate the outcome. In this paper, a decision combination module based on ELM is proposed. Meta-learning can be roughly described as learning from the information produced by one or more learners, and it is a common approach to combining the outputs of multiple learners [32][33]. In the Meta-ELM model, multiple ELMs serve as hidden neurons, while the meta-learner learns from the outputs of these hidden neurons. Theoretical analysis and experimental results on artificial and benchmark regression datasets showed that Meta-ELM, which is trained with multiple ELMs, can give good performance at a lower computational cost [34]. Meta-ELM [34] is an ELM with a special design, i.e., it uses ELMs as hidden neurons. In this paper, however, an ELM-MAS (extreme learning machine based multi-agent system) is designed from another perspective. It has two layers of full ELMs: the first layer consists of one or more ELMs, each of which is considered an individual agent; the second layer consists of a single ELM, which is the parent agent. This two-layer structure of the proposed ELM-MAS resembles a typical multi-agent neural network (Fig. 1).
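For contrast with the ELM-based combiner proposed here, the conventional exact/weighted averaging and voting rules can be sketched as below. This is a minimal illustration assuming that each agent outputs one class-score vector per sample; the function names are hypothetical, and the confusion-matrix method [31] is not shown.

```python
import numpy as np

def average_combiner(scores, weights=None):
    """Exact (weights=None) or weighted average of agent class scores.
    scores: array of shape (n_agents, n_samples, n_classes)."""
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.ones(scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    # Contract the agent axis: result has shape (n_samples, n_classes)
    avg = np.tensordot(weights / weights.sum(), scores, axes=1)
    return avg.argmax(axis=1)

def majority_vote(scores):
    """Each agent votes for its top-scoring class; ties go to the lowest class index."""
    scores = np.asarray(scores)
    votes = scores.argmax(axis=2)                 # (n_agents, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, votes,
                                 minlength=scores.shape[2])  # (n_classes, n_samples)
    return counts.argmax(axis=0)
```

Both rules are fixed formulas applied on top of the agents; ELM-MAS instead learns the combination with a parent ELM.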

The Algorithms of ELM and ELM-MAS
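Depending on the type of activation function it uses, an ELM can be either a feedforward or an RBF network with a sophisticated learning algorithm (Fig. 2). Consider a set of N training samples $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$, where $\mathbf{x}_j$ is an input vector and $\mathbf{t}_j$ the corresponding target output vector, and an ELM with L hidden neurons. In this paper, five ELMs are generated as individual agents, each with different random input weights. As shown in Fig. 3, the output of each ELM k (for k = 1, 2, ..., 5) in response to $\mathbf{x}_j$ is

$$\mathbf{y}_k(\mathbf{x}_j) = \sum_{i=1}^{L} \boldsymbol{\beta}_{k,i}\, G(\mathbf{a}_{k,i}, b_{k,i}, \mathbf{x}_j), \qquad j = 1, \dots, N, \tag{1}$$

where $\mathbf{a}_{k,i}$ and $b_{k,i}$ are the input weights and bias of hidden neuron i, $\boldsymbol{\beta}_{k,i}$ is its output weight vector, and G is the hidden-neuron activation function. For additive hidden neurons,

$$G(\mathbf{a}_{k,i}, b_{k,i}, \mathbf{x}) = g(\mathbf{a}_{k,i} \cdot \mathbf{x} + b_{k,i}), \tag{2}$$

and for RBF hidden neurons,

$$G(\mathbf{a}_{k,i}, b_{k,i}, \mathbf{x}) = g\big(b_{k,i}\, \lVert \mathbf{x} - \mathbf{a}_{k,i} \rVert\big). \tag{3}$$

The procedure of the training phase is as follows.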
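Equations (1) to (3) can be made concrete with a short NumPy sketch. It assumes a sigmoid g for additive neurons and g(u) = exp(-u^2) for RBF neurons, which are common choices but not necessarily the exact ones used in this paper; the function names are hypothetical.

```python
import numpy as np

def additive_nodes(X, A, b):
    """Eq. (2) with a sigmoid g. X: (N, n) inputs, A: (L, n) input weights, b: (L,) biases.
    Returns an (N, L) matrix of hidden-neuron outputs."""
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

def rbf_nodes(X, A, b):
    """Eq. (3) with g(u) = exp(-u**2): centres A (L, n), impact factors b (L,)."""
    dist = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)  # (N, L) distances
    return np.exp(-(b * dist) ** 2)

def elm_output(X, A, b, beta, nodes=additive_nodes):
    """Eq. (1): y(x_j) = sum_i beta_i * G(a_i, b_i, x_j) for every row x_j of X.
    beta: (L, m) output weights; returns an (N, m) array."""
    return nodes(X, A, b) @ beta
```

Passing `nodes=rbf_nodes` switches the same ELM from a feedforward (additive) network to an RBF network, which is the distinction drawn in the text.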
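Step 1: Randomly assign the input weights $\mathbf{a}_{k,i}$ and biases $b_{k,i}$, i = 1, ..., L, of each ELM k (for k = 1, 2, ..., 5).
Step 2: Calculate the hidden layer output matrix of ELM k (for k = 1, 2, ..., 5), $\mathbf{H}_k$, as follows:

$$\mathbf{H}_k = \begin{bmatrix} G(\mathbf{a}_{k,1}, b_{k,1}, \mathbf{x}_1) & \cdots & G(\mathbf{a}_{k,L}, b_{k,L}, \mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ G(\mathbf{a}_{k,1}, b_{k,1}, \mathbf{x}_N) & \cdots & G(\mathbf{a}_{k,L}, b_{k,L}, \mathbf{x}_N) \end{bmatrix}_{N \times L} \tag{4}$$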
Step 3: Compute the output weights of ELM k, $\boldsymbol{\beta}_k$. Because $\mathbf{H}_k$ is in general a non-square matrix, it has no ordinary inverse. The Moore-Penrose generalized inverse $\mathbf{H}_k^{\dagger}$ is therefore adopted to circumvent this problem:

$$\boldsymbol{\beta}_k = \mathbf{H}_k^{\dagger}\mathbf{T},$$

where $\mathbf{T} = [\mathbf{t}_1, \dots, \mathbf{t}_N]^{\top}$ is the matrix of target output vectors.
Step 4: After the output weights of ELM k have been calculated, compute the outputs of ELM k on the training samples, i.e., $\mathbf{H}_k\boldsymbol{\beta}_k$.
Step 5: Randomly assign the input weights of the parent ELM, i.e., $\mathbf{p}_i$ and $q_i$ for i = 1, ..., $L_1$, where $L_1$ is the number of hidden neurons of the parent ELM.
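Step 6: Compute the hidden layer output matrix of the parent ELM, $\mathbf{S}$:

$$\mathbf{S} = \begin{bmatrix} G(\mathbf{p}_1, q_1, \mathbf{w}_1) & \cdots & G(\mathbf{p}_{L_1}, q_{L_1}, \mathbf{w}_1) \\ \vdots & \ddots & \vdots \\ G(\mathbf{p}_1, q_1, \mathbf{w}_N) & \cdots & G(\mathbf{p}_{L_1}, q_{L_1}, \mathbf{w}_N) \end{bmatrix}_{N \times L_1},$$

where $\mathbf{w}_j$ is the combination of the outputs of ELM k (for k = 1, 2, ..., 5) in response to $\mathbf{x}_j$, i.e., $\mathbf{w}_j = [\mathbf{y}_1(\mathbf{x}_j), \dots, \mathbf{y}_5(\mathbf{x}_j)]$.
Step 7: Use the outputs of ELM k to calculate the output weights of the parent ELM, $\boldsymbol{\alpha}$:

$$\boldsymbol{\alpha} = \mathbf{S}^{\dagger}\mathbf{T},$$

where $\mathbf{T} = [\mathbf{t}_1, \dots, \mathbf{t}_N]^{\top}$ is the matrix of target output vectors.

Once training on all samples has been completed following Steps 1 to 7, the ELM-MAS can predict the output for an unknown input vector $\mathbf{z}$ from $\mathbf{a}_k$, $\mathbf{b}_k$, $\mathbf{p}$, $\mathbf{q}$, $\boldsymbol{\beta}_k$ and $\boldsymbol{\alpha}$, i.e.,

$$\mathbf{h}_k(\mathbf{z}) = [G(\mathbf{a}_{k,1}, b_{k,1}, \mathbf{z}), \dots, G(\mathbf{a}_{k,L}, b_{k,L}, \mathbf{z})], \qquad \mathbf{y}_k(\mathbf{z}) = \mathbf{h}_k(\mathbf{z})\,\boldsymbol{\beta}_k,$$

$$\mathbf{w}(\mathbf{z}) = [\mathbf{y}_1(\mathbf{z}), \dots, \mathbf{y}_5(\mathbf{z})], \qquad \mathbf{o}(\mathbf{z}) = [G(\mathbf{p}_1, q_1, \mathbf{w}(\mathbf{z})), \dots, G(\mathbf{p}_{L_1}, q_{L_1}, \mathbf{w}(\mathbf{z}))]\,\boldsymbol{\alpha},$$

where $\mathbf{h}_k$ and $\mathbf{y}_k$ are the hidden layer output and the output of ELM k, respectively, and $\mathbf{o}(\mathbf{z})$ is the output of the ELM-MAS. Besides the functions in (2) and (3), several activation functions are used in this paper; Equations (13), (14), and (15) are the Gaussian activation function, the Laplacian activation function, and the Laplacian basis function, respectively.

The proposed ELM-MAS and Meta-ELM [34] have similar structures. However, they differ subtly in structural representation and in the way they handle the training dataset. Firstly, Meta-ELM partitions the entire training dataset into several random subsets, and each ELM (hidden neuron) learns one subset, whereas every ELM k (individual agent) of the proposed ELM-MAS is trained on the same entire training dataset. Secondly, the ELMs in Meta-ELM are treated as hidden neurons, whereas in ELM-MAS each ELM k is treated as an individual agent. Lastly, Meta-ELM is a three-layer neural network, whereas each ELM k and the parent ELM of ELM-MAS has a full three-layer neural network structure, giving a total of six layers. In conclusion, the structure of Meta-ELM is the same as that of a standard ELM (Fig. 2), except that its hidden neurons are ELMs. This differs from the proposed ELM-MAS, whose two layers of ELMs (six layers of neurons) form a multi-agent system.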
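Steps 1 to 7 and the prediction equations can be condensed into a short NumPy sketch. It assumes sigmoid hidden neurons in both layers, uniform(-1, 1) random weights, one-hot targets, and column-wise concatenation of the agent outputs as $\mathbf{w}_j$; the class names `ELM` and `ELMMAS` are hypothetical, and this is an illustration rather than the authors' MATLAB implementation.

```python
import numpy as np

def sigmoid_hidden(X, A, b):
    # Additive sigmoid neurons: rows = samples, columns = hidden neurons
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

class ELM:
    """Basic ELM: random hidden layer, output weights via the Moore-Penrose pseudo-inverse."""
    def __init__(self, n_hidden, rng):
        self.n_hidden, self.rng = n_hidden, rng

    def fit(self, X, T):
        # Steps 1/5: random input weights and biases
        self.A = self.rng.uniform(-1, 1, (self.n_hidden, X.shape[1]))
        self.b = self.rng.uniform(-1, 1, self.n_hidden)
        # Steps 2/6: hidden layer output matrix; Steps 3/7: output weights
        self.beta = np.linalg.pinv(sigmoid_hidden(X, self.A, self.b)) @ T
        return self

    def predict(self, X):
        return sigmoid_hidden(X, self.A, self.b) @ self.beta

class ELMMAS:
    """Two layers of ELMs: n_agents individual agents feeding one parent agent."""
    def __init__(self, n_agents=5, L=50, L1=50, seed=0):
        rng = np.random.default_rng(seed)   # shared generator: each agent gets different weights
        self.agents = [ELM(L, rng) for _ in range(n_agents)]
        self.parent = ELM(L1, rng)

    def _combine(self, X):
        # w_j = [y_1(x_j), ..., y_K(x_j)]: concatenate agent outputs column-wise
        return np.hstack([agent.predict(X) for agent in self.agents])

    def fit(self, X, T):
        for agent in self.agents:            # every agent sees the full training set
            agent.fit(X, T)
        self.parent.fit(self._combine(X), T) # Step 4 outputs feed Steps 5 to 7
        return self

    def predict(self, X):
        return self.parent.predict(self._combine(X))
```

For classification, the predicted class is the index of the largest entry of each output row, e.g. `ELMMAS().fit(X, T).predict(Z).argmax(axis=1)`.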

Experimental and Results using Benchmark Data
In this paper, two benchmark datasets (i.e., Satellite Image and Image Segmentation) were used to examine the performance of ELM-MAS. The specifications of the datasets are detailed in Table I [35]. All experiments were run in MATLAB (version 2010) on a personal computer equipped with an Intel(R) Core(TM) i7 2.9 GHz CPU and 8 GB of RAM. Following the suggestion of Liang [35], the number of hidden neurons of each ELM k (i.e., L) was fixed at 400 for Satellite Image and 180 for Image Segmentation. In addition, 2/3 of the training samples were used for training, while the remaining 1/3 were used to determine the optimum number of hidden neurons of the parent ELM (i.e., $L_1$) through a validation process. For each type of activation function of ELM-MAS, the training and validation processes start with $L_1$ = 50, which is then increased in increments of 50. As an example, Table II shows the training and validation processes for the sigmoid activation function. The number of hidden neurons with the best validation result in Table II is chosen for evaluating the performance of ELM-MAS.
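The validation procedure for $L_1$ can be sketched as follows. This is a minimal illustration, assuming the 2/3 to 1/3 split is drawn after a random shuffle and that the search stops at a hypothetical upper limit `max_L1` (the text does not state one); `val_accuracy` stands for any routine that trains ELM-MAS with a given $L_1$ and returns its validation accuracy.

```python
import numpy as np

def split_two_thirds(X, T, seed=0):
    """Shuffle, then keep 2/3 of the training samples for training and 1/3 for validation."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = (2 * len(X)) // 3
    tr, va = idx[:cut], idx[cut:]
    return X[tr], T[tr], X[va], T[va]

def select_L1(val_accuracy, start=50, step=50, max_L1=500):
    """Grow L1 from `start` in increments of `step` and keep the value with the
    best validation accuracy. `val_accuracy(L1)` is a caller-supplied function."""
    best_L1, best_acc = None, -np.inf
    for L1 in range(start, max_L1 + 1, step):
        acc = val_accuracy(L1)
        if acc > best_acc:   # strict '>' keeps the smallest L1 on ties
            best_L1, best_acc = L1, acc
    return best_L1, best_acc
```

Keeping the smallest $L_1$ on ties is a design choice that favours the simpler parent ELM, in line with the overfitting concern discussed next.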
Table II shows that increasing the number of hidden neurons does not necessarily improve the accuracy rate. This is due to overfitting, a condition in which the neural network overestimates the complexity of the target problem. Overfitting also significantly reduces generalization capability, which results in substantial variation in the predictions. Hence, allocating a proper number of hidden neurons to a feedforward neural network in order to avoid overfitting is of the utmost importance in function approximation. The proposed ELM-MAS was also compared with other variants of ELM, such as ELM [35] and ensemble ELM [20]. Based on Table IV, the test accuracy rates of ELM-MAS were comparable to (if not greater than) those of ELM (RBF) and ELM (Sigmoid). In terms of processing time, the comparison between ELM and ELM-MAS was inconclusive because computer hardware has improved over the past several years. In general, ELM-MAS is considerably fast, with processing times within a few seconds. Moreover, it can be made faster if all ELM k are trained in parallel.
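Because the individual agents share no parameters, the parallel training mentioned above can be sketched with Python's `concurrent.futures`. This is a minimal illustration, assuming sigmoid agents and one random seed per agent; `train_agent` and `train_agents_parallel` are hypothetical names, and whether threads give a real speed-up depends on the NumPy/BLAS build, since BLAS may already use several cores.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_agent(X, T, L, seed):
    """Train one individual agent (ELM k): random sigmoid hidden layer + pseudo-inverse."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1, 1, (L, X.shape[1]))
    b = rng.uniform(-1, 1, L)
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return A, b, np.linalg.pinv(H) @ T

def train_agents_parallel(X, T, L, n_agents=5):
    # The agents are independent, so they can be trained concurrently.
    # NumPy can release the GIL inside many numerical routines, so threads may overlap.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        futures = [pool.submit(train_agent, X, T, L, seed) for seed in range(n_agents)]
        return [f.result() for f in futures]
```

Seeding each agent explicitly makes the parallel result identical to training the agents one after another.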

Application in Power Generation
The application of the developed ELM-MAS to power generation systems is discussed in this section.

Circulating Water System
The circulating water system (CWS) considered here is described in [36][37]. The system comprises turbine condensers, drum strainers, and the piping between the seawater inlet and the drain, through which the water is returned to the sea. The turbine condenser is the major component of the CWS; it simultaneously removes heat from the low-pressure steam and maintains the turbine backpressure at the lowest possible yet constant level. The efficiency of heat transfer in a condenser has a major impact on the condenser vacuum. Maintaining the turbine backpressure at a low level through an effective heat transfer process allows the turbine to generate power at high efficiency. However, if an excessive amount of gas is present in the condenser, the vacuum level is affected and the heat transfer efficiency of the condenser is reduced. Additionally, the cleanliness of the condenser tubes has a substantial effect on the ability of the condenser to transfer heat from the exhaust steam to the cooling water. Mud and fine solid materials such as sand, shells and seaweed that escape the filtering process of the CWS accumulate and block the tubes, which ultimately leads to inefficient heat transfer in the condenser tubes. A total of 2500 data samples were collected and divided into training, validation, and test sets (Table V) [38]. Before testing, the ELM-MAS model was trained and validated to determine the most appropriate number of hidden neurons. As shown in Table VI, ELM-MAS trained with the Laplacian activation function attained the highest test accuracy, 96.96%. The test accuracy rate of ELM-MAS is comparable to (if not better than) those of other approaches, including FAM [39] and SVM [40] (Table VII).

GAST Governor
Modern power plants generate and supply high-quality electricity to customers. Many computers have been installed with simulation programs to analyze the characteristics of power systems, both in the planning phase and in actual operation, which includes monitoring and control activities. System analysis software is executed repeatedly in the planning stage of a power plant [41]. The input data to the software are modified and adjusted based on the engineers' experience and heuristic knowledge until satisfactory plans have been determined. Currently, the programs for power plant analysis and planning are developed from mathematical models and are implemented only for numerical computation. Thus, techniques and methodologies for incorporating practical planning knowledge into sophisticated approaches have been developed and applied to system planning [41]. Computer-based Energy Management Systems have frequently been introduced for gas turbine monitoring and control, and these systems are commonly used in energy control centers. Gas turbine analysis software and other application software have been introduced into Energy Management Systems to examine and forecast the behaviour of gas turbines during steady-state operation [41]. Although such software is a powerful tool, its ability to assist operation engineers in making the best decisions is very limited when unplanned or unexpected operating modes are detected. Most abnormal modes in system operation are triggered by reactive or active power imbalances, network faults, or frequency deviations. A partial or complete system blackout can occur during unplanned operation [41].
In such emergency situations, experienced operation engineers therefore make the decisions needed to restore the gas turbine to its normal state. There is a need to combine the knowledge of experienced operation engineers with conventional application software in areas such as operational strategies for network restoration, efficient diagnosis of network faults, and the balancing of reactive and active power [41]. Therefore, fast and efficient methods of predicting abnormal system behaviour need to be developed.
Malaysia has experienced several large-scale blackout incidents in recent years [42]. In the latest incident, which occurred in 2005, several gas turbine plants tripped inadvertently in sequence following a frequency drop of about 1.5 Hz, causing a total generation loss of 5760 MW. Following this, several studies were performed to observe the responses of combined cycle power plants to frequency drops [42][43][44][45]. These studies used gas turbine models developed by Rowen [46] and Mello et al. [47] to simulate practical plants and compute their responses to frequency changes. However, a comprehensive analysis of the behaviour of plant variables during frequency drops has not yet been carried out. The main priority for most power generation companies is to achieve maximum availability of the gas turbine. To do so, a company must prevent accidental shutdowns and, if an accidental shutdown does occur, minimize the recovery time. Continuous monitoring of the gas turbine condition is the best way to achieve maximum availability, because minor problems can then be detected before they develop into major ones. An artificial intelligence based gas turbine condition monitoring system is therefore expected to be able to determine the condition of the gas turbine parameters during contingencies, minimizing troubleshooting time and helping to restore the gas turbine to its normal operating condition. The GAST model represents the key dynamic features of industrial gas turbines driving generators connected to electric power systems. Speed deviations from nominal are assumed to be small (roughly 5%). Figure 5 illustrates the model, which comprises a forward path with a governor time constant, $T_1$, and a combustion chamber time constant, $T_2$, in addition to a load-limiting feedback path.
The load limit is sensitive to the turbine exhaust temperature, where $T_3$ is the time constant of the exhaust gas measurement system. The constant $K_T$ adjusts the gain of the load-limit ($A_T$) feedback path. The training data are collected at the output of the GAST block, i.e., the mechanical power $P_{mech}$ of a normally operating gas turbine [48]. A total of 630 samples were gathered by varying all 7 input attributes of the GAST model within their operating ranges [49] (Table VIII). As shown in Table IX, the data were divided into training, validation, and test sets.
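The GAST structure described above can be sketched as a small fixed-step simulation that produces $P_{mech}$ samples of the kind used as training data. This follows the commonly published GAST block diagram (droop 1/R on the speed deviation, a low-value select between the speed signal and the load-limit signal, and the lags $T_1$, $T_2$, $T_3$); the droop R, the valve limits VMAX/VMIN, the damping D_turb and all numerical defaults are illustrative assumptions, not the settings of Table VIII, and `simulate_gast` is a hypothetical name.

```python
import numpy as np

def simulate_gast(d_omega, P_ref, R=0.05, T1=0.4, T2=0.1, T3=3.0,
                  AT=1.0, KT=2.0, VMAX=1.5, VMIN=0.0, D_turb=0.0, dt=0.01):
    """Euler-type sketch of a GAST governor (all values per unit, assumed).
    d_omega: speed deviation at each time step. Returns P_mech at each step."""
    # Start from x = P_ref (a steady state only if P_ref is below the load limit)
    x1 = x2 = x3 = P_ref
    p_mech = np.empty(len(d_omega))
    for n, dw in enumerate(d_omega):
        speed_signal = P_ref - dw / R          # droop (speed-governing) demand
        load_limit = AT + KT * (AT - x3)       # load-limit feedback from exhaust lag
        demand = min(speed_signal, load_limit) # low-value select
        x1 += dt * (demand - x1) / T1          # governor lag T1
        x1 = min(max(x1, VMIN), VMAX)          # valve position limits
        x2 += dt * (x1 - x2) / T2              # combustion chamber lag T2
        x3 += dt * (x2 - x3) / T3              # exhaust gas measurement lag T3
        p_mech[n] = x2 - D_turb * dw           # mechanical power output
    return p_mech
```

With a speed drop (negative `d_omega`) the droop path raises the demand, while the load-limit path caps the steady-state output at $A_T$; sweeping the parameters within their ranges would yield data sets of the type summarized in Table VIII.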

Conclusion
In this study, a novel ELM-MAS model with two layers of ELMs was developed. The model was validated on benchmark datasets, namely Satellite Image and Image Segmentation. According to the results, the test accuracy rates of ELM-MAS were comparable to (if not better than) those of ELM (RBF) and ELM (Sigmoid). Furthermore, the model was evaluated on power generation systems, including a circulating water system and a governor (GAST). Thus far, the results show that the test accuracy rate of ELM-MAS on the circulating water system is comparable to (if not better than) those of other algorithms.
Although the results obtained from the benchmark studies (Satellite Image and Image Segmentation datasets) and from the applications in power generation (circulating water system and GAST governor) are encouraging, further research with datasets from diverse application fields is needed to validate the suitability of ELM-MAS for real-world applications.