As environmental performance becomes increasingly important, the sintering process is receiving more attention because it consumes large amounts of energy. This paper proposes a data-driven model for sintering energy consumption that considers both model accuracy and time efficiency. The proposed model first removes data anomalies using a local outlier factor (LOF) algorithm and then selects attributes using the RReliefF method. Then, to accurately predict sintering energy consumption, an integrated predictive model is employed that uses bagging-enhanced extreme learning machine (ELM) and support vector regression (SVR) models, combined with an entropy weight method. A case study using one year of actual production data demonstrates the effectiveness of the proposed model. Results show that the proposed model outperforms other models and is computationally efficient. The optimal LOF parameter (1.3) and number of attributes (30) were identified. It was found that coke powder has the most significant impact on the solid energy consumption (SEC), while cooling water flow rate has the most significant impact on the gas energy consumption (GEC) within each recorded attribute variation. Parametric analysis further revealed the relationships between energy consumption and the significant attributes mentioned above. It is suggested that the proposed model could help reduce energy consumption by identifying more efficient attribute settings.

## Introduction

Iron and steel production is globally recognized as being energy intensive [1,2]. China is the world's largest steel producer and accounts for about 50% of the total world output (822.0 × 10^{6} tons in 2013 and 822.7 × 10^{6} tons in 2014) [3]. This share is expected to keep increasing due to growth in domestic demand. Correspondingly, the amount of energy consumed by steel producers in China is extraordinarily large, and steel companies in China are increasingly being asked to achieve higher energy efficiencies. One of the critical process steps involved in steelmaking is sintering, which converts iron-rich materials (e.g., fine-grained iron ore, mill scale, dust, and sludge) along with fuel (coke breeze and fine coal) and additives (e.g., limestone and dolomite) into a porous material suitable for adding to a blast furnace. The sintering process typically accounts for 8–10% of the energy use of a steel-making enterprise. With this in mind, efforts are being made to reduce the energy consumption and emissions associated with the sintering process, for example, through sintering burdening [4]. However, it has been reported that the energy efficiency of Chinese sintering operations lags that of other steel-producing nations, and sintering energy consumption varies significantly from plant to plant. Sintering energy consumption modeling is therefore seen as critical to achieving higher energy efficiency [5].

The energy consumption of a sintering process is largely determined by the feed material composition and the process parameters, and these can be optimized to achieve higher energy efficiencies. Such process optimization needs an accurate model that can predict the sintering energy consumption as a function of input material composition and process parameters. Several attempts have been made to predict the quality of the sinter product based on process mechanisms. For example, models based on multiphase theory [6] and the finite difference method [7] have been developed, which aim to comprehensively simulate the sintering process and predict various properties [8,9]. Unfortunately, due to the complex mechanisms involved in the sintering process, it is extremely challenging to develop a mechanistic model based on first principles. The models mentioned above, on the one hand, usually make many assumptions that simplify the model at the cost of fidelity to the actual process. On the other hand, more complex models that better mimic the actual process may suffer from having too many parameters that need to be estimated. For the latter case, the parameters are often set by trial-and-error or cross-validation methods since theoretical guidance on the parameter values does not exist. Due to these issues, mechanistic models, in general, suffer from poor accuracy in actual application.

As an alternative approach, data-driven methods based on machine learning and artificial intelligence for energy consumption prediction have gained attention in recent years [10,11]. A handful of sintering process studies have aimed at developing a relationship between the input variables and energy consumption (output) from historical data using artificial neural network (ANN) algorithms, such as back propagation neural networks [12–14] and radial basis function networks [15]. One issue associated with these studies is the unsatisfactory accuracy of the models that have been developed (which generally use a single learning algorithm). ANN models have been used in many applications, such as predicting the tool wear in cutting operations [16] and temperature in a hot strip mill [17]. However, many quantities, e.g., motor shaft misalignment [18] and cutting force [19], have been reported to be predicted better with SVR. One advantage of SVR is its good generalization capability, even with a small number of samples. This is because it minimizes structural risk, which includes both empirical risk and the Vapnik–Chervonenkis dimension [20], and can thereby avoid the “curse of dimensionality.” ELM is another efficient method [21] that has been used in many applications [22]. Unlike ANNs trained with a gradient descent algorithm, which is likely to become stuck in local optima, ELM can theoretically achieve a globally optimal solution and has far fewer parameters to be set before training. In addition, ELM has the advantage of shorter computation time. Moreover, integrated learning methods have the potential to avoid the drawbacks of a single learning algorithm and to produce models with improved accuracy. To the best of our knowledge, such integrated learning methods have yet to be applied to the prediction of energy consumption in the sintering process.

Another issue is the selection of the appropriate observations, as this is often based on experience. Real production data contain “outliers” due to faulty measurements or abnormal operations and also involve “pseudo-outliers” due to short-run production (working less than 24 hrs) on some days. These pseudo-outliers are normal-operation data but are prone to being treated as anomalies. In addition, with real data, since a large number of factors could potentially affect energy consumption through complicated mechanisms, it is difficult to identify the key factors and eliminate redundant variables to obtain a parsimonious model. These three issues (limited accuracy of a single learning algorithm, experience-based selection of observations, and experience-based selection of input variables) are largely responsible for poor generalization in data-driven energy consumption prediction. They motivate the development of a framework that can effectively detect outliers, objectively select input variables, and improve on previous learning algorithms for better accuracy.

In the present work, an integrated predictive model for two types of sintering energy consumption is proposed that takes advantage of multiple predictive methods. A general framework of the predictive model is presented in which two data-preprocessing methods, the LOF and RReliefF algorithms, are adopted to detect anomalies and extract input attributes that are significant in terms of energy consumption. Then, bagging-enhanced ELM and SVR methods are presented, which give two candidate prediction models for energy consumption. Subsequently, results from each model are weighted and integrated using an entropy weight method. Thereafter, a case study is used to validate the generalization capability and computational efficiency of the proposed approach, using measured data from a typical sintering plant in China. The optimal parameter configuration of the proposed model is identified, and sensitivity and parametric analyses are performed to reveal key attributes and hidden relationships.

## Sintering Process Analysis and Framework of the Integrated Predictive Model

### Energy Consumption in Sintering Process.

The sintering process, as shown in Fig. 1, begins with the preparation of the raw materials consisting of iron ore fines, fluxes, in-plant metallurgical waste materials, fuel, and return fines from the sintering process. These materials are mixed in a rotating drum, and water is added in order to reach the desired agglomeration of the raw materials. The raw materials are continuously charged together with hearth-layer material (serves as bedding) to form a bed of approximately 500–800 mm thick and ignited using gas or oil burners. Air is drawn downward through the moving bed causing the fuel to burn. The velocity of the strand and gas flow rate are controlled to ensure that “burn through” occurs just prior to the sinter being discharged. At the end of the grate, the sintered material in the form of a cake is discharged into the hot sinter crusher, where the hot sinter cake is crushed to the desired size. After crushing, the hot sinters are sieved and the undersize portion (the hot return fines, usually smaller than 5 mm) is delivered back to the feeding area and used as the hearth-layer material. The oversize portion is then cooled and transferred to the cold-screening machines where materials that are large enough are transferred to the blast furnace. Before charging, they are screened again, and the undersize portion together with the cold return fines is recycled.

where $m_{coke}$ and $m_{gas}$ denote the amount of coke (in tons (t)) and gas (in m^{3}) consumed in a day; $m_i$ (in t) denotes the amount of the *i*th input raw material, $d_i$ (in t) denotes the loss rate of the *i*th raw material, and $m_{return}$ (in t) denotes the amount of sinter material that ends up in the blast furnace waste streams, yet is recovered and returned as fines for future use. Both equations have the same denominator, which is the total sinter throughput for a day. It is assumed that the heating value of the two fuels, i.e., coke and gas, remains constant (unit of heating value: for coke MJ/kg and for gas MJ/m^{3}). In this way, we can use mass or volume to represent energy consumption. The SEC (in kg/t) and GEC (in m^{3}/t) reflect the ratio of mass/volume for the respective energy source. As is evident, the specific energy consumption for sinter ores is not simply the ratio of the mass of the coke (or the ratio of the volume of the gas) to the total mass of the raw materials since there are burning losses and returned fines produced during the process. In addition, loss rates of the raw materials are not constant; they vary with the extent of combustion. Moreover, the amount of return fines is also difficult to predict as it is influenced by many parameters, such as ignition temperature and strand speed. The authors are unaware of any study that predicts the amount of return fines based on the mechanisms involved. Given the lack of understanding of the mechanisms, the data-based intelligent predictive modeling approach adopted here is appropriate.

### General Framework of the Data-Driven Energy Consumption Model.

In this section, a framework of the energy consumption model is proposed based on the sintering process data. There are many factors that influence energy consumption during sintering, which can be summed up into three types: feed material composition, system state parameters, and operation parameters. Feed material composition variables include the amounts of primary raw materials, fluxes, and media. Primary raw materials are a mixture of iron ore, coke, gas dust, slag, and return ores; fluxes include burnt lime, limestone, and dolomite; and media include mixed water, cooling water, and gas. The amounts of all these variables are known before the sintering process begins. System state parameters are associated with the production status of the system, e.g., windbox temperature and negative pressure, central flue gas temperature and negative pressure, and exhaust gas temperature. Sensors may be used to measure these parameter levels. Operational parameters are parameters selected prior to ignition and include strand speed, bed depth, amount of water added to the blending drum, amount of gas and air used during ignition, ignition temperature and pressure, and cooling water flow rate. The levels for these variables are also set before sintering starts.

Some parameters, like primary raw materials, are recorded once per day. Here, we assume that the quality of all the raw materials remains consistent from day to day. Others, like windbox temperature, are measured every hour, which means that there is more than one record per day for those parameters. For those parameters, we transform all the records in a day into one by averaging. Thus, given that the number of parameters is $n$, each observation, in which all the parameters are recorded, exists in an *n*-dimensional space. The basic structure of the sintering data is illustrated in Table 1.
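The daily aggregation described above can be sketched as follows. This is only an illustrative sketch: the function name `build_daily_observation`, the attribute names, and all numerical values are hypothetical and are not taken from the plant data.

```python
from statistics import mean


def build_daily_observation(daily_values, hourly_records):
    """Combine once-per-day values with hourly records averaged over the day."""
    observation = dict(daily_values)           # values already recorded daily
    for name, readings in hourly_records.items():
        observation[name] = mean(readings)     # collapse hourly readings into one value
    return observation


# Hypothetical attribute names and values, for illustration only.
daily = {"coke_t": 310.0, "dolomite_t": 95.0}
hourly = {"windbox_temp_C": [118.0, 121.0, 124.0, 117.0]}
obs = build_daily_observation(daily, hourly)
```

Each resulting dictionary corresponds to one row of the observation table, i.e., one point in the *n*-dimensional attribute space.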

Considering the characteristics of the sintering process data, the proposed framework is composed of five modules: anomaly detection module, attribute selection module, bagging module, weight assignment module, and database module, as shown in Fig. 2. The anomaly detection module is based on the LOF algorithm and detects outliers in the original dataset by employing a density-based method. The attribute selection module is based on RReliefF and rapidly selects attributes that are significant in terms of energy consumption; this module reduces the dimensionality of the sintering data set. The bagging module is designed to improve the stability and generalization capability of both the ELM and SVR algorithms which are applied in the prediction of sintering energy consumption. The number of candidate models in each bagging module is set to be greater than or equal to one. If “one” is selected, a single method is employed without using the bagging technique for comparison. The weight assignment module is based on entropy and evaluates the weight of each bagging model to obtain final energy consumption predictions. The database module stores all the raw sintering process data as well as model parameters.

## Anomaly Detection and Attribute Selection for Sintering Process Data

### Anomaly Detection Using LOF.

In the case of sintering, anomalies may emerge because of malfunctions in sensors or equipment breakdown/irregularity, during which data may be recorded incorrectly. However, on some days the sintering machines may undergo short-run production (less than 24 hrs) because of maintenance. For such cases, the throughput and corresponding raw material consumption during these part-time working days will be less than that associated with round-the-clock operation, so these normal-operation observations are likely to be treated as outliers. Statistical methods and other methods, like distance-based methods [23], are prone to removing these pseudo-outliers, which may lead to the loss of valuable information. A better way to remove the true outliers while keeping the pseudo-outliers is needed. As an alternative, the LOF method, a density-based method [24], assigns to each data point an “outlier degree” instead of treating outlier status as a binary property. That is, every data point is treated as an outlier to some degree determined by the relative density of data points in the local neighborhood. Due to the local approach, e.g., “reachability distance,” LOF is able to detect outliers within some subclusters instead of seeing the data as a whole. As a result, data from partial-day operations would not necessarily be treated as outliers, and correspondingly, outliers within local clusters may be found easily.

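Given a sample of size *N*, $d(p,o)$ denotes the distance between two sintering observations *o* and *p* in the sample (each observation represents one row of records in Table 1), and $k\text{-}distance(p)$ is the *k*th smallest distance among all the distances measured from *p* to other sintering observations in the sample (here, Euclidean distance is adopted). Then, the reachability distance of a sintering observation *p* with respect to *o* is defined as

$$reach\text{-}dist_k(p,o)=\max\{k\text{-}distance(o),\,d(p,o)\} \tag{3}$$

Let $N_k(p)$ denote the *k*-distance neighborhood of *p*, which contains every sintering observation whose distance from *p* is not greater than $k\text{-}distance(p)$. The local reachability density of a sintering observation *p* is then defined as

$$lrd_k(p)=\left(\frac{\sum_{o\in N_k(p)} reach\text{-}dist_k(p,o)}{|N_k(p)|}\right)^{-1} \tag{4}$$

where $|N_k(p)|$ is the number of sintering observations in the *k*-distance neighborhood of *p*. Consequently, the LOF of observation *p* can be defined as

$$LOF_k(p)=\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\frac{lrd_k(o)}{lrd_k(p)} \tag{5}$$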

$LOF_k(p)$ describes the degree to which a given sintering observation *p* is an outlier, and it is approximately equal to 1 for observations inside a cluster. That is, since the whole sintering data may fall into several clusters, observations within each cluster will be seen as normal records although some clusters are relatively large while others are relatively small. Only the observations that apparently do not belong to any cluster would have LOF values larger than 1 and be likely to be treated as outliers, which helps keep the diversity of the raw sintering data yet remove the true outliers. As $LOF_k(p)$ increases, the possibility of *p* being an outlier increases. In addition, most of the time is spent on finding the *k* nearest neighbors, so the computational complexity of LOF within the model is $O(N \cdot \log N)$ when using the quicksort algorithm.
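To make the density-based idea concrete, the following minimal sketch computes LOF scores with the standard definitions of reachability distance and local reachability density [24]. It is not the authors' implementation: it uses a brute-force distance matrix rather than a sorted neighbor search, the two-dimensional data are synthetic, and the cut-off of 1.3 is applied to the score purely for illustration.

```python
import math


def lof_scores(data, k):
    """Return the LOF score of every observation (brute-force sketch)."""
    n = len(data)
    dist = [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    k_dist, neigh = [], []
    for p in range(n):
        others = sorted((dist[p][o], o) for o in range(n) if o != p)
        kd = others[k - 1][0]                     # k-th smallest distance from p
        k_dist.append(kd)
        # k-distance neighborhood: every observation not farther than kd
        neigh.append([o for d, o in others if d <= kd])

    def lrd(p):
        # reachability distance of p w.r.t. o is max(k-distance(o), d(p, o));
        # note: exact duplicate points would give a zero sum here
        reach = [max(k_dist[o], dist[p][o]) for o in neigh[p]]
        return len(reach) / sum(reach)

    dens = [lrd(p) for p in range(n)]
    return [sum(dens[o] for o in neigh[p]) / (len(neigh[p]) * dens[p])
            for p in range(n)]


# Synthetic data: two dense clusters (e.g., full-day and short-run days)
# plus one stray record. All values are illustrative.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
data = cluster_a + cluster_b + [(5.0, 20.0)]
scores = lof_scores(data, k=3)
flagged = [i for i, s in enumerate(scores) if s > 1.3]
```

In this toy example, the members of both clusters receive scores of about 1 regardless of where the cluster lies, and only the stray record exceeds the cut-off, which mirrors how short-run production days can be retained while true outliers are removed.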

### RReliefF-Based Attribute Selection.

As stated earlier, there are many factors that could influence energy consumption, and the relationship between those attributes is unknown. Previous work has generally selected these factors based on experience. Such a subjective way may fail to capture key variables or include variables that are unimportant. Certainly, objective approaches like attribute selection methods may be employed using historical data to identify the important factors. Since the total attributes embedded in the sample are relatively large, many evolution-based methods could result in overfitting and significant computation time. Filter methods such as gray correlation analysis [25], principal component analysis (PCA) [26], and RReliefF [27,28] are commonly used. However, the drawback of gray correlation analysis is that the degree of correlation is still graded subjectively. Other methods, like PCA, cannot derive the weight of each attribute directly and/or have requirements on the independence of the original variables.

Since the attributes related to sintering energy consumption are not clearly elaborated before, and the potential related attributes might be large, it is necessary to address the relevance of attributes with solid/gas energy consumption. As a special type of relief algorithm, the RReliefF algorithm can directly find the relevancy between continuous attributes and energy consumption [29]. RReliefF estimates the quality of attributes according to how well they distinguish similar sintering observations and only cares about the correlation between the attributes and the energy consumption. It endeavors to separate the sintering observations within a given neighborhood. The basic idea is that good attributes separate sintering observations with significantly different energy consumption values and do not separate sintering observations with close energy consumption values. Compared to the other methods, RReliefF is particularly effective in terms of computational cost and avoiding overfitting, especially when the number of attributes is large.

For a randomly selected sintering observation $D_i$, the $k_n$ nearest neighbors of $D_i$ are identified and designated as set $S_k$ (once again, Euclidean distance is employed). Assume the corresponding SEC/GEC is $\tau(\cdot)$. The algorithm estimates the weight of attribute *A*, *W*[*A*], using Bayes' theorem

$$W[A]=\frac{P_{diff\tau|diffA}\,P_{diffA}}{P_{diff\tau}}-\frac{(1-P_{diff\tau|diffA})\,P_{diffA}}{1-P_{diff\tau}}=\frac{P_{diff\tau\,and\,diffA}}{P_{diff\tau}}-\frac{P_{diffA}-P_{diff\tau\,and\,diffA}}{1-P_{diff\tau}} \tag{6}$$

where $P_{diffA}=P(diff(A,D_i,D_j)\,|\,D_j\in S_k)$ denotes the probability of different values of attribute *A* between $D_i$ and the $k_n$ nearest neighbors. $P_{diff\tau}=P(diff(\tau,D_i,D_j)\,|\,D_j\in S_k)$ denotes the probability of different values of energy consumption $\tau(\cdot)$ between $D_i$ and the $k_n$ nearest neighbors. $P_{diff\tau|diffA}=P(diff(\tau,D_i,D_j)\,|\,diff(A,D_i,D_j),D_j\in S_k)$ denotes the conditional probability of different energy consumptions $\tau(\cdot)$ given that different values of attribute *A* between $D_i$ and the $k_n$ nearest neighbors are known. $P_{diff\tau\,and\,diffA}=P(diff(\tau,D_i,D_j)\cdot diff(A,D_i,D_j)\,|\,D_j\in S_k)$ represents the probability of different energy consumptions and different attributes between $D_i$ and the $k_n$ nearest neighbors, where

$$diff(A,D_i,D_j)=\frac{|value(A,D_i)-value(A,D_j)|}{\max(A)-\min(A)} \tag{7}$$

in which $value(A,D_i)$ is the value of attribute *A* in sintering observation $D_i$, and $\max(A)$ and $\min(A)$ are the maximum and minimum of attribute *A* among $D_i$ and its $k_n$ nearest neighbors. $diff(\tau(\cdot),D_i,D_j)$ can also be calculated in a similar way.

The procedure of RReliefF algorithm applied to the proposed model is as follows:

Step 1: Set $i=1$ and $j=1$. Usually $k_n$ is chosen to be between 10 and 20.

Step 2: Randomly select sintering observation $D_i\ (i=1,\ldots,N)$, then find the $k_n$ neighbors that are nearest to $D_i$ from the remaining $N-1$ observations to form the set $S_k$. Use the quicksort algorithm to achieve efficient sorting that has computational complexity of $O(N\cdot\log N)$. $i=i+1$.

Step 3: Pick one sintering observation $D_j\ (j\in 1,\ldots,k_n)$ in $S_k$ and calculate the weights for different energy consumptions, $N_{diff\tau}$: $N_{diff\tau}=N_{diff\tau}+diff(\tau(\cdot),D_i,D_j)/k_n$. $j=j+1$.

Step 4: For each attribute *A*, calculate the weights for different attributes, $N_{diffA}$, and for different energy consumptions and attributes, $N_{diff\tau\,and\,diffA}$: $N_{diffA}=N_{diffA}+diff(A,D_i,D_j)/k_n$ and $N_{diff\tau\,and\,diffA}=N_{diff\tau\,and\,diffA}+diff(\tau(\cdot),D_i,D_j)\cdot diff(A,D_i,D_j)/k_n$.

Step 5: If $j<k_n$, go to step 3, else if $i<N$, go to step 2; otherwise, for each attribute *A*, calculate the final estimation of each attribute according to Eq. (6):

$W[A]=N_{diff\tau\,and\,diffA}/N_{diff\tau}-(N_{diffA}-N_{diff\tau\,and\,diffA})/(N-N_{diff\tau})$.

Using the obtained attribute weights, the attributes are sorted in descending order. Larger weights imply greater relevance of the corresponding attributes.

With the attributes now sorted, attention shifts to discarding those attributes that have little relevance to the energy consumption. The number of attributes remaining usually is determined by the performance of the predictive model. In addition, the computational complexity of RReliefF within the model asymptotically approaches $O(a\cdot N\cdot\log N)$ with the quicksort algorithm, in which $a$ is the number of attributes.
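The weighting procedure in Steps 1 to 5 can be sketched as below. This is a simplified illustration rather than the authors' code, and it rests on several assumptions: each observation is visited exactly once in a random order, the normalized difference uses the maximum and minimum among $D_i$ and its $k_n$ nearest neighbors as stated in the text, a zero local range is treated as a difference of zero, and the data set (one informative attribute, one noisy attribute, one constant attribute) is synthetic.

```python
import math
import random


def norm_diff(values, i, j, members):
    """Absolute difference between observations i and j, scaled by the local range."""
    lo = min(values[t] for t in members)
    hi = max(values[t] for t in members)
    return abs(values[i] - values[j]) / (hi - lo) if hi > lo else 0.0


def rrelieff(X, tau, k_n, seed=0):
    """Return one RReliefF weight per attribute (column of X)."""
    n, a = len(X), len(X[0])
    cols = [[row[c] for row in X] for c in range(a)]
    n_dtau = 0.0                      # accumulates diff in energy consumption
    n_da = [0.0] * a                  # accumulates diff in each attribute
    n_dtau_da = [0.0] * a             # accumulates the product of both diffs
    order = list(range(n))
    random.Random(seed).shuffle(order)        # assumption: visit each D_i once
    for i in order:
        others = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        s_k = [j for _, j in others[:k_n]]    # k_n nearest neighbors of D_i
        members = s_k + [i]
        for j in s_k:
            d_tau = norm_diff(tau, i, j, members)
            n_dtau += d_tau / k_n
            for c in range(a):
                d_a = norm_diff(cols[c], i, j, members)
                n_da[c] += d_a / k_n
                n_dtau_da[c] += d_tau * d_a / k_n
    return [n_dtau_da[c] / n_dtau - (n_da[c] - n_dtau_da[c]) / (n - n_dtau)
            for c in range(a)]


# Synthetic data: column 0 drives the target, column 1 is noise, column 2 is constant.
rng = random.Random(42)
X = [[float(i), rng.uniform(0.0, 5.0), 1.0] for i in range(40)]
tau = [2.0 * row[0] + 5.0 for row in X]
weights = rrelieff(X, tau, k_n=10)
ranking = sorted(range(len(weights)), key=lambda c: weights[c], reverse=True)
```

The list `ranking` holds the attribute indices in descending order of weight, so keeping its first entries corresponds to discarding the attributes with little relevance. In this toy data set, the constant attribute receives a weight of zero because it never differs between neighbors.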

## Proposed Ensemble-Based Predictive Models for Sintering Energy Consumption

As stated before, SVR and ELM are promising methods for sintering energy consumption prediction. However, SVR faces some challenging issues, such as the need for tedious human intervention [30], and ELM is sensitive to noise in the data [31]. Sintering observations sometimes are recorded with noise because of the uncertainty of the environment and the zero drift of sensors. In fact, previous studies employing one of the methods have shown this problem, and many efforts have been made to improve their performance [32]. Among these, integrated learning is an emerging and efficient scheme. Bagging (bootstrap aggregating) is one of the most popular machine learning ensemble meta-algorithms designed to promote the stability and accuracy of machine learning algorithms applied in classification and regression [33,34]. Another type of aggregation method, entropy weighting, derived from Shannon entropy theory [35], makes use of explicit and implicit information and is aimed at assigning different weights to different submodels to reduce bias and variance. A scheme combining the bagging technique and the entropy weight method is expected to have the advantages of both, i.e., reducing the errors of energy consumption prediction and decreasing the possibility of overfitting due to human intervention and noise in the sintering data when employing one learning algorithm.

### Proposed Bagging-Enhanced ELM.

Previous research shows that a single ELM does not have good generalization capability for data with much noise, and overfitting usually occurs in such circumstances [36]. With this in mind, we propose an enhanced ELM that utilizes the bagging technique, which improves upon the generalization capability of the original ELM algorithm.

Denote the number of hidden-layer nodes as *K*, and the activation function as $g(x)$. There are *N* distinct sintering observations $(x_i,t_i)$, $i=1,2,\ldots,N$, in the training sample space $S_{tr}$, where $x_i=[x_{i1},x_{i2},\ldots,x_{in}]^T\in R^n$ is an observation of sintering, and $t_i=[t_{i1},t_{i2},\ldots,t_{im}]^T\in R^m$ is the energy consumption vector. The attribute matrix and the energy consumption vector are related by

$$H\beta=T \tag{8}$$

where $H=\{h_{ij}\}\ (i=1,\ldots,N,\ j=1,\ldots,K)$ is the output matrix of the hidden layer, and $h_{ij}=g(w_j\cdot x_i+b_j)$ is the output value of the *j*th hidden-layer node with respect to sintering observation $x_i$. $w_j=[w_{j1},w_{j2},\ldots,w_{jn}]^T$ is the weight vector connecting the *j*th hidden-layer neuron and each attribute neuron, $w_j\cdot x_i$ denotes the inner product of $w_j$ and $x_i$, and $b_j$ is the threshold of the *j*th hidden-layer node. $\beta=[\beta_1^T,\beta_2^T,\ldots,\beta_K^T]^T$ is the output weight vector, in which $\beta_j=[\beta_{j1},\beta_{j2},\ldots,\beta_{jm}]^T$, $j=1,\ldots,K$, is the weight vector connecting the *j*th hidden-layer node with the energy consumption neurons, and $T=[t_1^T,t_2^T,\ldots,t_N^T]^T$ is the energy consumption matrix.

In ELM, the input weight $w_j$ and hidden-layer threshold $b_j$ can be set to random initial values and need not change during the learning process. Thus, Eq. (8) can be seen as a linear system, for which the output weight $\beta$ can be solved by

$$\beta=H^{\dagger}T \tag{9}$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse matrix of $H$.

Then, to enhance the stability and generalization capability of the ELM energy consumption predictive models, the bagging technique is adopted. It uses bootstrap sampling to establish multiple candidate ELM networks whose training samples are as diverse as possible, and the integrated predicted energy consumption value is then calculated by averaging. The procedure of the proposed bagging ELM (B-ELM) is as follows:

Step 1: Set the number of candidate ELM learning machines, *M*. It is common to choose *M* = 10 (if *M* is set to one, there is only one candidate ELM model, namely, an ELM model without bagging).

Step 2: Sample from the sintering data set $S_{tr}$ with replacement. Using the bootstrap method, obtain the training sample $S_{tr}^{c}\ (c=1,\ldots,M)$ of each candidate energy consumption submodel. The size of $S_{tr}^{c}$ is *N*. Set $c=1$.

Step 3: Use the sintering data set $S_{tr}^{c}$ to train the associated energy consumption submodel. Set the initial input weight $w_j$ and threshold $b_j$ arbitrarily, where $j=1,\ldots,K$. Calculate the output matrix of the hidden layer $H=\{h_{ij}\}\ (i=1,\ldots,N,\ j=1,\ldots,K)$, then calculate the output weight $\beta$ using Eq. (9). Test the accuracy of the model using the uniform and independent testing observation set $S_{te}$, which was not used for training the model. Set $c=c+1$.

Step 4: If $c\le M$, then return to step 3; otherwise, calculate the integrated predictive results of the testing sintering observation set via averaging and compare with the measured SEC or GEC.

The computational complexity of the standard ELM is $O(K^2\cdot N)$, while the proposed B-ELM model has the computational complexity $O(K^2\cdot M\cdot N)$. It may be noted that the number of hidden-layer nodes, $K$, and the number of candidate ELM models, $M$, generally are much less than the sample size $N$ in actual models, that is, $K\ll N$ and $M\ll N$. With this in mind, the proposed B-ELM will require approximately $M$ times longer to train than a standard ELM approach. This difference may not be significant when the algorithm is implemented for predicting sintering energy given that a standard ELM is often trained very quickly.
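The B-ELM procedure can be illustrated with the short NumPy sketch below, which trains each candidate ELM on a bootstrap sample, solves the output weights with the Moore–Penrose inverse as in Eq. (9), and averages the predictions. Several details are assumptions made only for this sketch: the sigmoid activation, the uniform range of the random input weights, the default values of `K` and `M`, and the synthetic data, none of which come from the case study.

```python
import numpy as np


def hidden_output(X, W, b):
    """Hidden-layer output matrix H with a sigmoid activation (assumed g)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, T, K, rng):
    """Train one ELM: random input weights, output weights via pseudoinverse."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], K))   # random, never updated
    b = rng.uniform(-1.0, 1.0, size=K)                 # random thresholds
    beta = np.linalg.pinv(hidden_output(X, W, b)) @ T  # beta = pinv(H) T
    return W, b, beta


def train_bagged_elm(X, T, K=20, M=10, seed=0):
    """Train M candidate ELMs, each on a bootstrap sample of size N."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)               # sampling with replacement
        models.append(train_elm(X[idx], T[idx], K, rng))
    return models


def predict_bagged(models, X):
    """Average the predictions of all candidate ELMs."""
    preds = [hidden_output(X, W, b) @ beta for W, b, beta in models]
    return np.mean(preds, axis=0)


# Synthetic regression data standing in for attributes and energy consumption.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0.0, 1.0, size=(200, 3))
T = X @ np.array([1.0, 2.0, 3.0]) + data_rng.normal(0.0, 0.01, size=200)
models = train_bagged_elm(X[:150], T[:150], K=20, M=10)
pred = predict_bagged(models, X[150:])
rmse = float(np.sqrt(np.mean((pred - T[150:]) ** 2)))
```

Setting `M=1` reduces the sketch to a single ELM trained on one bootstrap sample, which is close to, but not identical with, the non-bagged comparison case, since a plain ELM would use the original training set.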

### SVR Model.

SVR extends the support vector machine by introducing an *ε*-insensitive loss function, and it seeks an optimal surface that minimizes the error between all the sintering observations and the surface [37]. The principle of SVR as used in the sintering process has been elaborated in a previously published paper [38], and the procedure of a bagging-enhanced SVR (B-SVR) is similar to that of B-ELM. Since SVR is a strong learning machine, the performance of B-SVR needs to be validated on a real case. Here, we only discuss the computational cost. SVR has a computational complexity of $O(N^2)$ [36], and B-SVR tends to be slower owing to its computational complexity of $O(M'\cdot N^2)$, where $M'$ denotes the number of candidate SVR models.

### Entropy Weighting Integration.

After obtaining the two types of sintering energy consumption models, entropy weight method is adopted to assign weights based on the variance of the predicted energy consumption. The basic idea is that the bigger the relative error of each sintering observation, the smaller will be the corresponding weight [39].

where $euq$ and $yuq$ refer to the relative error and predicted energy consumption of the *q*th sintering observation in the *u*th energy consumption model, respectively; $y\u0303q$ denotes the measured energy consumption of the *q*th sintering observation, $q=1,2,\u2026,N$, and $u=1,2$ stands for the two proposed energy consumption models.

*u*th energy consumption model is represented as

where $p_{uq}=e_{uq}/\sum_{r=1}^{N}e_{ur}$.

The entropy weight method only has a computational complexity of $o(N)$. Therefore, for the proposed model, the overall computational cost results in $o(N\cdot\log N+a\cdot N\cdot\log N+K^{2}\cdot M\cdot N+M'\cdot N^{2}+N)$, which can be simplified as $o(M'\cdot N^{2})$. This implies that SVR is likely to take the longest time in the proposed model.
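As a rough illustration of the entropy weighting step, the following Python sketch computes weights for two candidate models and combines their predictions. Several details are assumptions rather than the exact formulation used in this paper: the relative error is taken as $|y_{uq}-\tilde{y}_{q}|/\tilde{y}_{q}$, the entropy is normalized by $\ln N$, and the weights follow one common form of the entropy weight method, $w_{u}=(1-d_{u}/\sum_{v}d_{v})/(m-1)$ with $d_{u}=1-H_{u}$. All function names and the toy data are hypothetical.

```python
import math

def relative_errors(predicted, measured):
    # Assumed form of the relative error: |y_uq - y~_q| / y~_q
    return [abs(p - m) / m for p, m in zip(predicted, measured)]

def normalized_entropy(errors):
    # p_uq = e_uq / sum_r e_ur, then H_u = -(1 / ln N) * sum_q p_uq * ln p_uq
    total = sum(errors)
    if total == 0:
        return 1.0  # no error at all: treat as maximum entropy
    h = 0.0
    for e in errors:
        p = e / total
        if p > 0:
            h -= p * math.log(p)
    return h / math.log(len(errors))

def entropy_weights(predictions_by_model, measured):
    # Assumed weighting rule: w_u = (1 - d_u / sum(d)) / (m - 1), d_u = 1 - H_u
    d = [1.0 - normalized_entropy(relative_errors(p, measured))
         for p in predictions_by_model]
    m = len(d)
    total = sum(d)
    if total == 0:
        return [1.0 / m] * m
    return [(1.0 - di / total) / (m - 1) for di in d]

def combine(predictions_by_model, weights):
    # Weighted sum of the candidate predictions for every observation
    n = len(predictions_by_model[0])
    return [sum(w * p[q] for w, p in zip(weights, predictions_by_model))
            for q in range(n)]

# Toy data, not taken from the case study
measured = [100.0, 110.0, 120.0, 130.0]
model_a = [101.0, 111.0, 121.0, 131.0]
model_b = [100.0, 110.0, 120.0, 140.0]
weights = entropy_weights([model_a, model_b], measured)
print(weights, combine([model_a, model_b], weights))
```

Each pass over the $N$ observations is linear, which is consistent with the $o(N)$ cost stated above.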

## Case Study

To demonstrate the proposed modeling approach, data were collected from a sintering plant that operates two sintering lines with a total annual sinter throughput of 8.36 × 10^{6} tons and an annual steel throughput of 6.5 × 10^{6} tons. For the purposes of this case study, one of the two sintering lines was considered. (The other sintering line, which is independent of this one, was also validated with the proposed model, and similar results were obtained.) For this sintering line, there were 311 working days (Jan. 1–Dec. 31), and as a result, a total of 311 observations were obtained. Each observation contains raw material information, production indicators, and equipment states of the sintering line (73 attributes in total) and two energy consumption indicators, SEC and GEC. All 73 attributes (12 feed material compositions, 44 system state parameters, and 17 operation parameters) were summarized and grouped based on their nature and location (shown in Table 2), and most of them are also illustrated in Fig. 1. It should be noted that the last column in Table 2 denotes the number of attributes within each group. For example, since there are two rows of blowers beneath the strand and each row involves eight wind boxes, there are 16 records in total for the windbox negative pressure in each observation.

The raw dataset was first processed by the LOF algorithm to remove outliers. Using the RReliefF algorithm, the most related attributes were then identified by sorting the attribute weights. The top ten attributes and their weights are listed in Tables 3 and 4. For SEC, the five most related attributes turned out to be: (1) amount of coke, (2) amount of dolomite, (3) amount of ore blends, (4) ignition temperature #3, and (5) material flow rate on conveyor 1S-1 (ascending to raw mix hopper). For GEC, they were: (1) material flow rate on conveyor PD-1 (carrying hearth-layer material), (2) ignition temperature #3, (3) cooling water flow rate, (4) #1 pipe gas flow rate, and (5) ignition temperature #2. The top five significant attributes identified for each indicator were confirmed by the plant managers.

It should be noted that all the parameters in all the models have been optimized by cross-validation, and anomaly detection and attribute selection are employed in all the models. After repeated trials, the LOF threshold for anomaly detection (denoted as $\theta$) was set to 1.3, which retains 286 of the 311 observations. The training sample consists of 229 randomly selected observations, and the remaining 57 observations are held out as a testing sample (that is, the testing data were not used to train the model; they are independent of the training data and were used only to validate the trained model). Every model was run 30 times, and the average serves as the “final result.” For attribute selection, the number of attributes retained in the models after RReliefF (denoted as *η*) was set to 30. (Other parameter combinations show similar results.) After the energy consumption model was trained on the training data, the testing data were employed to validate its performance. The predicted values and measured values, denoted by circles and asterisks, respectively, are displayed together in Fig. 3.

As can be seen in Fig. 3, the predicted values are close to the corresponding measured ones in most cases, which shows the effectiveness of the proposed model. The following four evaluation metrics are commonly employed to validate and compare the accuracy and performance of models and have been used here:

- (1) Mean relative error (MRE): $MRE=\frac{1}{L}\sum_{t=1}^{L}\left|\bar{y}_{t}-\tilde{y}_{t}\right|/\bar{y}_{t}$ (14)
- (2) Residual error average: $e=\frac{1}{L}\sum_{t=1}^{L}\left|\bar{y}_{t}-\tilde{y}_{t}\right|$ (15)
- (3) Maximum error: $E_{\max}=\max\{\left|\bar{y}_{t}-\tilde{y}_{t}\right|\}$ (16)
- (4) Precision: $Pr=\left(1-\frac{\sigma}{\frac{1}{L}\sum_{t=1}^{L}\tilde{y}_{t}}\right)\times 100\%$ (17)

where $\bar{y}_{t}$ and $\tilde{y}_{t}$ are the predicted and measured values for observation $t$, $L$ is the size of the testing sample space $S_{te}$, and $\sigma$ is the standard deviation: $\sigma=\sqrt{(1/L)\sum_{t=1}^{L}(\bar{y}_{t}-\tilde{y}_{t})^{2}}$. Equation (17) indicates the dimensionless normalized precision metric for the model predictions.
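The four metrics can be computed in a few lines. The sketch below is illustrative only: the function name and the toy values are hypothetical, and it assumes that $\sigma$ is the root mean square of the residuals, so that the precision is dimensionless. Note that the MRE of Eq. (14) divides by the predicted value, which the code follows.

```python
import math

def evaluate(predicted, measured):
    """Return MRE, residual error average, maximum error, and precision."""
    L = len(predicted)
    abs_err = [abs(p - m) for p, m in zip(predicted, measured)]
    mre = sum(a / p for a, p in zip(abs_err, predicted)) / L   # Eq. (14)
    e = sum(abs_err) / L                                       # Eq. (15)
    e_max = max(abs_err)                                       # Eq. (16)
    # Assumption: sigma is the root mean square of the residuals
    sigma = math.sqrt(sum(a * a for a in abs_err) / L)
    pr = (1.0 - sigma / (sum(measured) / L)) * 100.0           # Eq. (17)
    return {"MRE": mre, "e": e, "Emax": e_max, "Pr": pr}

# Toy values, not taken from the case study
print(evaluate([10.0, 20.0], [11.0, 18.0]))
```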

### Evaluation of Energy Consumption Models.

In order to validate the effectiveness of the proposed model, predictions from the proposed integrated model (int. model) are compared with those of the ELM, B-ELM, SVR, and B-SVR models. As shown in Figs. 4 and 5, the MREs of the proposed integrated models are lower than those of the other four models. Other evaluation metrics such as $e$ and $Pr$ also show advantages of the proposed integrated model to different degrees. These results reveal the effectiveness of our integrated model. As for $E_{\max}$, it is reasonable that the $E_{\max}$ of the integrated model lies between those of the two candidate models, B-ELM and SVR, since the integrated prediction is the normalized weighted sum of the two candidate models (see Eq. (13)).

It may be noted that the ELM results after bagging are significantly improved, while the SVR results after bagging deteriorate. This may be due to the fact that the accuracy of the SVR model is determined by support vectors (SVs). The bootstrap sampling for each candidate model may miss some significant sintering observations, which would have been selected as SVs, thus lowering the accuracy of each candidate model. Therefore, our proposed integrated model adopts B-ELM and SVR (instead of B-SVR) as the two candidate models for final entropy weighting.
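To give a sense of how many observations a single bootstrap sample can leave out, the sketch below simulates sampling with replacement. It assumes each bootstrap sample has the same size as the training set (229 observations here), which is a common choice but is not stated explicitly in the text; the function name is hypothetical.

```python
import random

def out_of_bag_fraction(n, rng):
    """Fraction of n observations never drawn in one bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n

rng = random.Random(0)
n_train = 229
fractions = [out_of_bag_fraction(n_train, rng) for _ in range(200)]
average = sum(fractions) / len(fractions)
expected = (1.0 - 1.0 / n_train) ** n_train  # probability that a given point is never drawn
print(round(average, 3), round(expected, 3))
```

Under this assumption, roughly 37% of the training observations are absent from any one bootstrap sample, so an observation that would otherwise act as a support vector has a substantial chance of being missing from a given candidate SVR.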

Some descriptive statistics of the relative errors (Eq. (10)) of the models are shown in Table 5. It shows several statistical metrics: the sample mean for the relative error (Eq. (14)), standard deviation of the relative error, and the 95% confidence interval for the MRE. These metrics, specifically the small size of the sample mean residual error, indicate that the overall performance of the proposed model is better than that of the other models. The last column of Table 5 displays the results of a paired *t*-test to validate the goodness-of-fit of the models. The null hypothesis for the paired *t*-test is that the difference between the model prediction and the measured response is zero. As is evident, all of the *p*-values are greater than 0.05, so there is not enough evidence to reject the null hypothesis. In other words, there is no significant difference evident between the actual and predicted values. Based on the statistical analysis of the models, it could be concluded that the proposed model outperforms the other models and is able to capture the dynamics of the sintering process in the prediction of energy consumption.
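For reference, the paired *t* statistic behind such a test can be computed as follows. This is a minimal sketch using only the Python standard library; the function name and data are hypothetical. It returns the statistic only. Obtaining a *p*-value requires the *t* distribution, for example through `scipy.stats.ttest_rel` in SciPy.

```python
import math
import statistics

def paired_t_statistic(predicted, measured):
    """t = mean(d) / (s_d / sqrt(n)), where d are the paired differences."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Toy values, not taken from the case study
print(paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

If the test is run on the 57 testing observations, there are 56 degrees of freedom, and the null hypothesis is retained at the 5% level when the absolute value of the statistic stays below about 2.0.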

Figure 6 was prepared to examine the effectiveness of LOF and RReliefF algorithms. It shows the performance of the integrated models with and without these two methods. Here, only results from models for SEC are shown (models for GEC show similar results). In Fig. 6, “I.Model” denotes the proposed integrated model with neither LOF nor RReliefF. “I.M.RReliefF” denotes the proposed integrated model with RReliefF, but without LOF. “I.M.LOF” denotes the proposed integrated model with LOF, but without RReliefF. Finally, “I.M.L.R” denotes the proposed integrated model with both RReliefF and LOF. It can be seen that LOF and RReliefF both help to achieve better performance for SEC prediction, while RReliefF leads to a larger improvement than LOF. When combining them together, the largest improvement is achieved. Here, the threshold of LOF was set to 1.3 and the number of attributes used is 30. Similar results are observed when other combinations of these two parameters were used.

Regarding computational cost, as stated before, B-ELM does not require network weight adjustments and therefore trains quickly, and LOF and RReliefF also have high time efficiency, so the time efficiency of the proposed model largely depends on that of SVR. As shown in Table 6, the time consumption of the integrated predictive model is around 39 s, which is acceptable in actual production applications, and the enhanced energy consumption predictive capability would compensate for the incremental computational cost compared to the first three models. It is worth noting that the final proposed model adopts only a single SVR model, that is, *M*′ = 1, which leads to a computational complexity of $o(N^{2})$. As expected, the integrated model requires only half of the time needed for B-SVR. All the computations were conducted on a personal computer with an Intel® Core™ i7-4710HQ CPU (2.50 GHz) and 8 GB RAM.

### Model Parameter Selection.

In the proposed model framework, two parameters need to be selected for best performance. These are the local outlier threshold factor (denoted as $\theta$) and the number of attributes retained by the RReliefF algorithm (denoted as $\eta$). Values for $\theta$ of 1.2, 1.3, 1.5, 2.0, and $+\infty$ were examined (the corresponding numbers of outlier observations were 43, 25, 15, 3, and 0). Values for $\eta$ that were considered were 10, 30, 50, and 73. The performance of the integrated model with respect to $\eta$ under the different $\theta$ values for SEC prediction is illustrated in Fig. 7 (GEC models show similar results).

As shown in Fig. 7, the performance of the proposed model varies with $\eta $. When $\eta =30$, i.e., retaining the top 30 attributes following the modeling procedure, the best observed performance is achieved. When a smaller number of attributes are selected, the model misses some important attributes. On the other hand, when a larger number of attributes are selected, there is excessive redundancy. Similarly, it can be seen that a threshold factor of $\theta =1.3$ outperforms the other values. This suggests that when $\theta $ is larger than 1.3, the threshold of LOF is too loose, thus allowing more noise to be kept in the dataset. When $\theta $ is less than 1.3, the LOF threshold is likely too tight so that some observations are misclassified as outliers. This leads to loss of diversity in the original data and results in deterioration of accuracy.
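The role of the threshold $\theta$ can be illustrated with a small, self-contained LOF implementation. The sketch below follows the standard LOF definition (k-distance, reachability distance, local reachability density) with Euclidean distance. The neighborhood size `k`, the rule that an observation is removed when its LOF exceeds $\theta$, the function names, and the toy points are all assumptions made for illustration, not settings reported in this paper.

```python
import math

def lof_scores(points, k):
    """Local outlier factor of every point, using Euclidean distance."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    neighbours, k_dist = [], []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][others[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbours.append([j for j in others if dist[i][j] <= kd])  # ties included
    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

def remove_outliers(points, k, theta):
    """Keep the points whose LOF does not exceed the threshold theta."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= theta]

# Toy data: a tight square of four points plus one distant point
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(remove_outliers(points, k=2, theta=1.3))
print(len(remove_outliers(points, k=2, theta=float("inf"))))
```

A point inside a uniform cluster has an LOF close to 1, so lowering $\theta$ toward 1 starts to discard ordinary observations, while $\theta=+\infty$ keeps everything.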

### Sensitivity and Parametric Analysis of the Proposed Models.

This section is aimed at validating the robustness of the proposed model by conducting sensitivity analysis (SA) and parametric analysis. The major reasons for performing these analyses are to reveal relationships between energy consumption and process/state parameters and to improve our understanding of the sintering process in support of energy-conservation efforts [5].

where $EC_{\max}(x_{i})$ and $EC_{\min}(x_{i})$ are the maximum and minimum SEC/GEC predicted outputs for the *i*th input interval, respectively. Other attributes are kept fixed at their mean values. $R_{i}$ denotes the range of variation for the *i*th attribute, and $SA_{i}$ is the percentage of each variation range. The top ten ranked variables obtained from RReliefF are analyzed as shown in Fig. 8. It is found that coke has the highest sensitivity for SEC, and cooling water has the highest sensitivity for GEC. This suggests that changing the amount of coke and the cooling water flow rate may result in the maximum change in the SEC and GEC, respectively. Meanwhile, from Fig. 8(b), the third-ranked input variable (cooling water) is shown to have a stronger effect than the first two ranked ones (i.e., material flow rate on conveyor PD-1 and ignition temperature #3), although both of those were found to be more relevant to GEC by RReliefF. This implies that the recorded cooling water flow rate varies more widely than the material flow rate on conveyor PD-1 and ignition temperature #3. From a physical standpoint, since coke is the main solid energy material for sintering, it is reasonable that it has a major impact on SEC. As cooling water is used for waste heat recovery from the exhaust and then for heating the air above the sintering machines, its flow rate may produce a significant impact on GEC.

The parametric analysis demonstrates how the energy consumption changes in response to variation in the attributes. With the proposed model, the energy consumption is computed as one attribute is changed across its variation interval, while the other attributes are kept fixed at their mean values. Figure 9 plots the predicted SEC and GEC over the ranges of the most significant attributes mentioned above, respectively. As shown in Fig. 9, the SEC increases monotonically with respect to coke, and GEC increases monotonically with respect to cooling water. It is easy to understand that SEC would rise with an increase in the amount of coke consumed. For cooling water, a larger circulating flow indicates that more waste heat from the exhaust needs to be recycled, which implies that the gas consumption would increase.
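The one-at-a-time procedure described above can be sketched as follows. The `model` argument stands for any trained predictor that maps an attribute vector to an energy consumption value. The linear stand-in used here, the function names, and the numbers are hypothetical placeholders, not the integrated model or data of this study.

```python
def sweep_attribute(model, means, index, low, high, steps=20):
    """Vary one attribute over [low, high] while the others stay at their means."""
    curve = []
    for s in range(steps + 1):
        x = list(means)
        x[index] = low + (high - low) * s / steps
        curve.append((x[index], model(x)))
    return curve

def output_span(curve):
    """Difference between the largest and smallest predicted output on the curve."""
    values = [y for _, y in curve]
    return max(values) - min(values)

def toy_model(x):
    # Stand-in predictor for illustration only
    return 2.0 * x[0] + 0.5 * x[1]

curve = sweep_attribute(toy_model, means=[1.0, 4.0], index=0, low=0.0, high=2.0, steps=4)
print(curve)
print(output_span(curve))
```

Repeating the sweep for each attribute and comparing the resulting output spans gives a simple way to rank attributes by their effect over the recorded ranges.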

The above analysis shows the relationships of energy consumption with respect to the significant attributes. Other attributes can also be analyzed similarly, which could then guide sintering production by choosing more efficient values for such attributes.

## Conclusion and Discussion

Modeling the energy consumption of the sintering process represents a first key step in improving the energy efficiency of steelmaking. In this paper, an integrated predictive model framework combined with anomaly detection and feature selection for sintering energy consumption is proposed. The contributions of this study are as follows. First, considering the pseudo-outliers within the sintering data, LOF is adopted for detecting anomalous points without removing those pseudo-outliers to obtain more diverse yet purified data. Second, RReliefF is used to eliminate redundant attributes in order to enhance robustness in energy consumption prediction and reduce computation time. Since production managers cannot reliably identify all of these key factors from experience, and attributes have conventionally been selected subjectively, this paper provides a novel and more objective approach. Third, a bagging technique integrated with the entropy weight method is employed to take advantage of both algorithms, aiming at improving the stability and generalization capability of ELM and SVR. The effectiveness of the proposed model framework is validated using data collected from a typical sintering line over 1 year.

Several conclusions can be drawn: (1) The results suggest that the integrated model shows good generalization capability, high time efficiency, and excellent potential for engineering application and decision-making support, compared to other typical models. (2) Outlier detection and attribute selection play important roles in the accuracy of the model. In our case study, the three most related attributes for SEC are: amount of coke, amount of dolomite, and amount of blending ore; and for GEC, they are: material flow rate on the conveyor PD-1 (carrying hearth-layer material), ignition temperature #3, and cooling water flow rate. This reveals that different energy indicators are related to different attributes. (3) An LOF threshold of 1.3 and 30 retained attributes were identified as the optimal configuration for the proposed model. (4) The sensitivity and parametric analyses further indicated the robustness of the model by illustrating the relationships between the energy consumption and the significant attributes. It was found that the amount of coke has the most significant impact on the SEC, while the cooling water flow rate has the most significant impact on the GEC.

It is noted that the emphasis of our work is to suggest a generalizable methodology for sintering energy consumption modeling. The process parameters may vary slightly between specific sintering lines. Nevertheless, according to a prior survey, the case study adopted in this paper is typical of Chinese steel plants, so the approach could be easily applied to many other sintering lines. Even if the data structure is different in other sintering plants, the model could be established robustly to adapt to the new cases, which is also an advantage of the model. Meanwhile, in attribute selection, by using the objective attribute selection method, RReliefF, we acknowledge that some selected attributes are difficult to explain through the physical process mechanism, which might imply unknown potential for energy reduction. Further research is needed to address these implicit relationships with energy consumption. In addition, the consistency of material quality is also an important factor for energy consumption. Further study could also consider changes in material quality within the recorded data, which would also influence energy consumption.

In addition, we acknowledge that some attributes are coupled and would influence the quality, cost, and robustness of the process when variables are manipulated to obtain optimum energy efficiency. Further research could be done on decoupling these attributes. It should also be pointed out that the entire system can be seen as an optimization problem in which the objective function is the sintering process burdening and the aim is energy conservation. Such energy-conservation-oriented optimization of sintering raw materials deserves more attention, as it could lead to energy-efficient burdening solutions. Some initial work has been done on this issue [4], in which energy consumption was described as a linear model. It is believed that the proposed nonlinear prediction models could result in better solutions for such optimization problems. Therefore, the proposed model has high potential for application to this problem.

## Acknowledgment

This work was supported by the NSF of China under Grant No. 61273046.