Manufacturers face an increasing need for predictive models that anticipate mechanical failures and estimate the remaining useful life (RUL) of manufacturing systems or components. Classical model-based or physics-based prognostics often require an in-depth physical understanding of the system of interest to develop closed-form mathematical models. However, prior knowledge of system behavior is not always available, especially for complex manufacturing systems and processes. To complement model-based prognostics, data-driven methods have been increasingly applied to machinery prognostics and maintenance management, transforming legacy manufacturing systems into smart manufacturing systems with artificial intelligence. While previous research has demonstrated the effectiveness of data-driven methods, most of these prognostic methods are based on classical machine learning techniques, such as artificial neural networks (ANNs) and support vector regression (SVR). With the rapid advancement in artificial intelligence, various machine learning algorithms have been developed and widely applied in many engineering fields. The objective of this research is to introduce a random forests (RFs)-based prognostic method for tool wear prediction and to compare the performance of RFs with that of feed-forward back propagation (FFBP) ANNs and SVR. Specifically, the performance of FFBP ANNs, SVR, and RFs is compared using experimental data collected from 315 milling tests. Experimental results have shown that RFs can generate more accurate predictions than FFBP ANNs with a single hidden layer and SVR.

## Introduction

Smart manufacturing aims to integrate big data, advanced analytics, high-performance computing, and Industrial Internet of Things (IIoT) into traditional manufacturing systems and processes to create highly customizable products with higher quality at lower costs. As opposed to traditional factories, a smart factory utilizes interoperable information and communications technologies (ICT), intelligent automation systems, and sensor networks to monitor machinery conditions, diagnose the root cause of failures, and predict the remaining useful life (RUL) of mechanical systems or components. For example, almost all engineering systems (e.g., aerospace systems, nuclear power plants, and machine tools) are subject to mechanical failures resulting from deterioration with usage and age or abnormal operating conditions [1–3]. Some of the typical failure modes include excessive load, overheating, deflection, fracture, fatigue, corrosion, and wear. The degradation and failures of engineering systems or components will often incur higher costs and lower productivity due to unexpected machine downtime. In order to increase manufacturing productivity while reducing maintenance costs, it is crucial to develop and implement an intelligent maintenance strategy that allows manufacturers to determine the condition of in-service systems in order to predict when maintenance should be performed.

Conventional maintenance strategies include reactive, preventive, and proactive maintenance [4–6]. The most basic approach to maintenance is reactive, also known as run-to-failure maintenance planning. In the reactive maintenance strategy, assets are deliberately allowed to operate until failures actually occur. The assets are maintained on an as-needed basis. One of the disadvantages of reactive maintenance is that it is difficult to anticipate the maintenance resources (e.g., manpower, tools, and replacement parts) that will be required for repairs. Preventive maintenance is often referred to as use-based maintenance. In preventive maintenance, maintenance activities are performed after a specified period of time or amount of use based on the estimated probability that the systems or components will fail in the specified time interval. Although preventive maintenance allows for more consistent and predictable maintenance schedules, it requires more maintenance activities than reactive maintenance. To improve the efficiency and effectiveness of preventive maintenance, predictive maintenance is an alternative strategy in which maintenance actions are scheduled based on equipment performance or conditions instead of time. The objective of predictive maintenance is to determine the condition of in-service equipment and ultimately to predict the time at which a system or a component will no longer meet desired functional requirements.

The discipline that predicts health condition and remaining useful life (RUL) based on previous and current operating conditions is often referred to as prognostics and health management (PHM). Prognostic approaches fall into two categories: model-based and data-driven prognostics [7–12]. Model-based prognostics refer to approaches based on mathematical models of system behavior derived from physical laws or probability distribution. For example, model-based prognostics include methods based on Wiener and Gamma processes [13], hidden Markov models (HMMs) [14], Kalman filters [15,16], and particle filters [17–20]. One of the limitations of model-based prognostics is that an in-depth understanding of the underlying physical processes that lead to system failures is required. Another limitation is that it is assumed that underlying processes follow certain probability distributions, such as gamma or normal distributions. While probability density functions enable uncertainty quantification, distributional assumptions may not hold true in practice.

To complement model-based prognostics, data-driven prognostics refer to approaches that build predictive models using learning algorithms and large volumes of training data. For example, classical data-driven prognostics are based on autoregressive (AR) models, multivariate adaptive regression, fuzzy set theory, ANNs, and SVR. The unique benefit of data-driven methods is that an in-depth understanding of system physical behaviors is not a prerequisite. In addition, data-driven methods do not assume any underlying probability distributions, an assumption that may not hold in real-world applications. While ANNs and SVR have been applied in the area of data-driven prognostics, little research has been conducted to evaluate the performance of other machine learning algorithms [21]. Because RFs can handle a large number of input variables without variable selection and are reported to be resistant to overfitting [22–24], we investigate the ability of RFs to predict tool wear using an experimental dataset. Further, the performance of RFs is compared with that of FFBP ANNs and SVR in terms of accuracy and training time.

The main contributions of this paper include the following:

- Tool wear in milling operations is predicted using RFs along with cutting force, vibration, and acoustic emission (AE) signals. Experimental results have shown that the predictive model trained by RFs is very accurate. The mean squared error (MSE) on the test tool wear data is up to 7.67. The coefficient of determination ($R^2$) on the test tool wear data is up to 0.992. To the best of our knowledge, the random forest algorithm is applied to predict tool wear for the first time.
- The performances of ANNs, support vector machines (SVMs), and RFs are compared using an experimental dataset with respect to the accuracy of regression (e.g., MSE and $R^2$) and training time. While the training time for RFs is longer than that of ANNs and SVMs, the predictive model built by RFs is the most accurate for the application example.

The remainder of the paper is organized as follows: Section 2 reviews the related literature on data-driven methods for tool wear prediction. Section 3 presents the methodology for tool wear prediction using ANNs, SVMs, and RFs. Section 4 presents an experimental setup and the experimental dataset acquired from different types of sensors (e.g., cutting force sensor, vibration sensor, acoustic emission sensor) on a computer numerical control (CNC) milling machine. Section 5 presents experimental results, demonstrates the effectiveness of the three machine learning algorithms, and compares the performance of each. Section 6 provides conclusions that include a discussion of research contribution and future work.

## Data-Driven Methods for Tool Wear Prediction

Tool wear is the most commonly observed and unavoidable phenomenon in manufacturing processes, such as drilling, milling, and turning [25–27]. The rate of tool wear is typically affected by process parameters (e.g., cutting speed and feed rate), cutting tool geometry, and properties of workpiece and tool materials. Taylor's equation for tool life expectancy [28] provides an approximation of tool wear. However, with the rapid advancement of sensing technology and increasing number of sensors equipped on modern CNC machines, it is possible to predict tool wear more accurately using various measurement data. This section presents a review of data-driven methods for tool wear prediction.

Schwabacher and Goebel [29] conducted a review of data-driven methods for prognostics. The most popular data-driven approaches to prognostics include ANNs, decision trees, and SVMs in the context of systems health management. ANNs are a family of computational models based on biological neural networks which are used to estimate complex relationships between inputs and outputs. Bukkapatnam et al. [30–32] developed effective tool wear monitoring techniques using ANNs based on features extracted from the principles of nonlinear dynamics. Özel and Karpat [33] presented a predictive modeling approach for surface roughness and tool wear for hard turning processes using ANNs. The inputs of the ANN model include workpiece hardness, cutting speed, feed rate, axial cutting length, and mean values of three force components. Experimental results have shown that the model trained by ANNs provides accurate predictions of surface roughness and tool flank wear. Palanisamy et al. [34] developed a predictive model for predicting tool flank wear in end milling operations using feed-forward back propagation (FFBP) ANNs. Experimental results have shown that the predictive model based on ANNs can make accurate predictions of tool flank wear using cutting speeds, feed rates, and depth of cut. Sanjay et al. [35] developed a model for predicting tool flank wear in drilling using ANNs. The feed rates, spindle speeds, torques, machining times, and thrust forces are used to train the ANN model. The experimental results have demonstrated that ANNs can predict tool wear accurately. Chungchoo and Saini [36] developed an online fuzzy neural network (FNN) algorithm that estimates the average width of flank wear and maximum depth of crater wear. A modified least-square backpropagation neural network was built to estimate flank and crater wear based on cutting force and acoustic emission signals. 
Chen and Chen [37] developed an in-process tool wear prediction system using ANNs for milling operations. A total of 100 experimental data were used for training the ANN model. The input variables include feed rate, depth of cut, and average peak cutting forces. The ANN model can predict tool wear with an error of 0.037 mm on average. Paul and Varadarajan [38] introduced a multisensor fusion model to predict tool wear in turning processes using ANNs. A regression model and an ANN were developed to fuse the cutting force, cutting temperature, and vibration signals. Experimental results showed that the coefficient of determination was 0.956 for the regression model trained by the ANN. Karayel [39] presented a neural network approach for the prediction of surface roughness in turning operations. A feed-forward back-propagation multilayer neural network was developed to train a predictive model using the data collected from 49 cutting tests. Experimental results showed that the predictive model has an average absolute error of 2.29%.

Cho et al. [40] developed an intelligent tool breakage detection system with the SVM algorithm by monitoring cutting forces and power consumption in end milling processes. Linear and polynomial kernel functions were applied in the SVM algorithm. It has been demonstrated that the predictive model built by SVMs can recognize process abnormalities in milling. Benkedjouh et al. [41] presented a method for tool wear assessment and remaining useful life prediction using SVMs. The features were extracted from cutting force, vibration, and acoustic emission signals. The experimental results have shown that SVMs can be used to estimate the wear progression and predict RUL of cutting tools effectively. Shi and Gindy [42] introduced a predictive modeling method by combining least squares SVMs and principal component analysis (PCA). PCA was used to extract statistical features from multiple sensor signals acquired from broaching processes. Experimental results showed that the predictive model trained by SVMs was effective to predict tool wear using the features extracted by PCA.

Another data-driven method for prognostics is based on decision trees. Decision trees are a nonparametric supervised learning method used for classification and regression. The goal of decision tree learning is to create a model that predicts the value of a target variable by learning decision rules inferred from data features. A decision tree is a flowchart-like structure in which each internal node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf node holds a class label. Jiaa and Dornfeld [43] proposed a decision tree-based method for the prediction of tool flank wear in a turning operation using acoustic emission and cutting force signals. The features characterizing the AE root-mean-square and cutting force signals were extracted from both time and frequency domains. The decision tree approach was demonstrated to be able to make reliable inferences and decisions on tool wear classification. Elangovan et al. [44] developed a decision tree-based algorithm for tool wear prediction using vibration signals. Ten-fold cross-validation was used to evaluate the accuracy of the predictive model created by the decision tree algorithm. The maximum classification accuracy was 87.5%. Arisoy and Özel [45] investigated the effects of machining parameters on surface microhardness and microstructure such as grain size and fractions using a random forests-based predictive modeling method along with finite element simulations. Predicted microhardness profiles and grain sizes were used to understand the effects of cutting speed, tool coating, and edge radius on the surface integrity.

In summary, the related work presented in this section builds on previous research to explore how the conditions of tool wear can be monitored as well as how tool wear can be predicted using predictive modeling. While earlier work focused on prediction of tool wear using ANNs, SVMs, and decision trees, this paper explores the potential of a new method, random forests, for tool wear prediction. Further, the performance of RFs is compared with that of ANNs and SVMs. Because RFs are an extension of decision trees, the performance of RFs is not compared with that of decision trees.

## Methodology

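The input to the ANN, SVR, and RF algorithms is the labeled training dataset $D=\{(x_i,y_i)\}$, where $x_i=(F_X,F_Y,F_Z,V_X,V_Y,V_Z,AE)$ and $y_i\in\mathbb{R}$. The description of these input data can be found in Table 1.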

### Tool Wear Prediction Using ANNs.

ANNs are a family of models inspired by biological neural networks. An ANN is defined by three types of parameters: (1) the interconnection pattern between different layers of neurons, (2) the learning process for updating the weights of the interconnections, and (3) the activation function that converts a neuron's weighted input to its output activation. Among many types of ANNs, the feed-forward neural network is the first and the most popular ANN. Back-propagation is a learning algorithm for training ANNs in conjunction with an optimization method such as gradient descent.

Figure 1 illustrates the architecture of the FFBP ANN with a single hidden layer. In this research, the ANN has three layers, including input layer $i$, hidden layer $j$, and output layer $k$. Each layer consists of one or more neurons or units, represented by the circles. The flow of information is represented by the lines between the units. The first layer has input neurons which act as buffers for distributing the extracted features (i.e., $F_i$) from the input data (i.e., $x_i$). The number of neurons in the input layer is the same as the number of features extracted from the input variables. Each value from the input layer is duplicated and sent to all neurons in the hidden layer. The hidden layer is used to process and connect the information from the input layer to the output layer in a forward direction. Specifically, the values entering a neuron in the hidden layer are multiplied by weights $w_{ij}$. Initial weights are randomly selected between 0 and 1. A neuron in the hidden layer sums up the weighted inputs and generates a single output. This value is the input of an activation function (sigmoid function) in the hidden layer $f_h$ that converts the weighted input to the output of the neuron. Similarly, the outputs of all the neurons in the hidden layer are multiplied by weights $w_{jk}$. A neuron in the output layer sums up the weighted inputs and generates a single value. An activation function in the output layer $f_o$ converts the weighted input to the predicted output $y_k$ of the ANN, which is the predicted flank wear $VB$. The output layer has only one neuron because there is only one response variable. The performance of ANNs depends on the topology or architecture of ANNs (i.e., the number of layers) and the number of neurons in each layer. However, there are no standard or well-accepted methods or rules for determining the number of hidden layers and neurons in each hidden layer. In this research, single-hidden-layer ANNs with 2, 4, 8, 16, and 32 neurons in the hidden layer are selected. Training terminates when the fit criterion (i.e., least squares) falls below $1.0\times10^{-4}$.
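As a minimal sketch of the forward pass described above, the following Python code evaluates a single-hidden-layer network for one feature vector. The function names, the identity output activation, the absence of bias terms, and the example dimensions are assumptions made for illustration; the text specifies only a sigmoid hidden activation and initial weights drawn between 0 and 1.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used in the hidden layer (f_h in the text)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_weights(n_in, n_hidden, seed=0):
    """Draw initial weights uniformly from [0, 1), as described in the text."""
    rng = random.Random(seed)
    w_ij = [[rng.random() for _ in range(n_hidden)] for _ in range(n_in)]
    w_jk = [rng.random() for _ in range(n_hidden)]
    return w_ij, w_jk


def forward(features, w_ij, w_jk):
    """Return (hidden activations, predicted output) for one feature vector."""
    n_hidden = len(w_jk)
    hidden = []
    for j in range(n_hidden):
        # Weighted sum of all inputs entering hidden neuron j.
        z_j = sum(features[i] * w_ij[i][j] for i in range(len(features)))
        hidden.append(sigmoid(z_j))
    # Single output neuron; the identity output activation is an assumption.
    y_k = sum(hidden[j] * w_jk[j] for j in range(n_hidden))
    return hidden, y_k


# Hypothetical example: 28 scaled features and 8 hidden neurons.
w_ij, w_jk = init_weights(n_in=28, n_hidden=8)
x = [0.1] * 28
hidden, vb_pred = forward(x, w_ij, w_jk)
```

Training by back-propagation would then adjust `w_ij` and `w_jk` to reduce the squared error between `vb_pred` and the measured flank wear; that step is omitted here.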

### Tool Wear Prediction Using SVR.

The original SVM for regression was developed by Vapnik and coworkers [46,47]. An SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification and regression.

The framework of SVR for linear cases is illustrated in Fig. 2. Formally, SVR can be formulated as a convex optimization problem.
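For reference, the standard $\varepsilon$-insensitive primal formulation of linear SVR is sketched here. The symbols $w$, $b$, $C$, $\varepsilon$, $\xi_i$, and $\xi_i^{*}$ follow the common textbook convention and are assumptions, since they may differ from the notation used by the authors.

$$
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}}\quad & \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{N}\left(\xi_i+\xi_i^{*}\right)\\
\text{subject to}\quad & y_i-\langle w,x_i\rangle-b\le\varepsilon+\xi_i,\\
& \langle w,x_i\rangle+b-y_i\le\varepsilon+\xi_i^{*},\\
& \xi_i,\ \xi_i^{*}\ge 0,\quad i=1,\ldots,N.
\end{aligned}
$$

In this form, $C>0$ trades off the flatness of the regression function against the tolerance for deviations larger than $\varepsilon$, and the slack variables $\xi_i$ and $\xi_i^{*}$ measure how far each observation lies outside the $\varepsilon$-tube.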

### Tool Wear Prediction Using RFs.

The random forest algorithm, developed by Breiman [22,48], is an ensemble learning method that constructs a forest of decision trees from bootstrap samples of a training dataset. Each decision tree produces a response, given a set of predictor values. In a decision tree, each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label for classification or a response for regression. A decision tree in which the response is continuous is also referred to as a regression tree. In the context of tool wear prediction, each individual decision tree in a random forest is a regression tree because tool wear describes the gradual failure of cutting tools. A comprehensive tutorial on RFs can be found in Refs. [22,48,49]. Some of the important concepts related to RFs, including bootstrap aggregating or bagging, splitting, and stopping criterion, are introduced in Secs. 3.3.1–3.3.4.

#### Bootstrap Aggregating or Bagging.

Given a training dataset $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, bootstrap aggregating or bagging generates $B$ new training datasets $D_i$ of size $N$ by sampling from the original training dataset $D$ with replacement. $D_i$ is referred to as a bootstrap sample. By sampling with replacement or bootstrapping, some observations may be repeated in each $D_i$. Bagging helps reduce variance and avoid overfitting. The number of regression trees $B$ is a parameter specified by users. Typically, a few hundred to several thousand trees are used in the random forest algorithm.
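The bootstrap step can be sketched in a few lines of Python. The helper name `bootstrap_sample`, the placeholder dataset, and the small value of `B` are assumptions chosen for illustration only.

```python
import random


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) observations from dataset with replacement."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]


rng = random.Random(42)
# Hypothetical training set of N = 630 (x_i, y_i) pairs; the values are placeholders.
D = [((float(i),), 0.5 * i) for i in range(630)]
B = 5  # number of bootstrap samples, kept small here for illustration
samples = [bootstrap_sample(D, rng) for _ in range(B)]

# Fraction of distinct original observations that appear in the first sample.
unique_fraction = len(set(samples[0])) / len(D)
```

Because sampling is with replacement, each bootstrap sample is expected to contain roughly 63% of the distinct original observations, with the remainder of the sample made up of repeats.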

#### Choosing Variables to Split On.

For each of the bootstrap samples, an unpruned regression tree is grown with the following procedure: at each node, $m$ variables are sampled at random and the best split is chosen among those variables rather than among all $p$ predictors. This process is sometimes called “feature bagging.” A random subset of the predictors or features is used because it reduces the correlation among the trees relative to ordinary bagging. For regression, the default is $m=p/3$.

#### Splitting Criterion.

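A regression tree is grown by recursive binary splitting. At each node, the splitting variable $j$ and split point $s$ are chosen to minimize the sum of squared errors over the two resulting regions $R_1(j,s)$ and $R_2(j,s)$:

$$\min_{j,\,s}\left[\min_{c_1}\sum_{x_i\in R_1(j,s)}(y_i-c_1)^2+\min_{c_2}\sum_{x_i\in R_2(j,s)}(y_i-c_2)^2\right]\qquad(3.7)$$

For any choice of $j$ and $s$, the inner minimization is solved by

$$\hat{c}_1=\operatorname{ave}\left(y_i\mid x_i\in R_1(j,s)\right),\qquad \hat{c}_2=\operatorname{ave}\left(y_i\mid x_i\in R_2(j,s)\right)\qquad(3.8)$$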

Having found the best split, the algorithm partitions the dataset into the two resulting regions and repeats the splitting process on each of them. This splitting process is repeated until a predefined stopping criterion is satisfied.

#### Stopping Criterion.

Tree size is a tuning parameter governing the complexity of a model. The splitting process proceeds until the number of records in a node falls below a threshold; in this research, the threshold is five.

The random forest algorithm for regression can be summarized as follows:

- (1) For $b=1,\ldots,B$, draw a bootstrap sample $Z$ of size $N$ from the training data.
- (2) For each bootstrap sample, construct a regression tree $T_b$ by splitting a node into two child nodes until the stopping criterion is satisfied.
- (3) Output the ensemble of trees $\{T_b\}_1^B$.
- (4) Make a prediction at a new point $x$ by aggregating the predictions of the $B$ trees.

The framework of predicting flank wear using an RF is illustrated in Fig. 3. In this research, a random forest is constructed using $B=500$ regression trees. Given the labeled training dataset $D=\{(x_i,y_i)\}$, a bootstrap sample of size $N=630$ is drawn from the training dataset for each tree. At each node of a regression tree, $m=9$ variables ($m=\lfloor p/3\rfloor$ with $p=28$) are selected at random from the 28 variables/features. The best variable/split point is selected among the nine variables. A regression tree progressively splits the training dataset into two child nodes: a left node (with samples $<z$) and a right node (with samples $\geq z$). A splitting variable and split point are selected by solving Eqs. (3.7) and (3.8). The process is applied recursively on the dataset in each child node. The splitting process stops if the number of records in a node is less than 5. An individual regression tree is built by starting at the root node of the tree, performing a sequence of tests about the predictors, and organizing the tests in a hierarchical binary tree structure as shown in Fig. 4. After 500 regression trees are constructed, a prediction at a new point can be made by averaging the predictions from all the individual binary regression trees on this point.
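The procedure described above can be illustrated with a compact, unoptimized Python sketch. It is not the implementation used in this research: all function names are hypothetical, the data are synthetic placeholders, and the problem size (6 features, 20 trees) is deliberately much smaller than the reported setting so that the example runs quickly.

```python
import random


def sum_sq(values):
    """Sum of squared deviations from the mean of one region."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, candidate_vars):
    """Return (sse, variable j, split point s) minimizing the two-region SSE."""
    best = None
    for j in candidate_vars:
        for s in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[j] < s]
            right = [y for r, y in zip(rows, targets) if r[j] >= s]
            if not left or not right:
                continue
            sse = sum_sq(left) + sum_sq(right)
            if best is None or sse < best[0]:
                best = (sse, j, s)
    return best


def grow_tree(rows, targets, m, min_node_size, rng):
    """Grow an unpruned regression tree; leaves store the mean response."""
    leaf_value = sum(targets) / len(targets)
    if len(rows) < min_node_size:
        return leaf_value
    # Feature bagging: only m randomly chosen variables compete at this node.
    split = best_split(rows, targets, rng.sample(range(len(rows[0])), m))
    if split is None:  # no candidate variable separates the records
        return leaf_value
    _, j, s = split
    left = [i for i, r in enumerate(rows) if r[j] < s]
    right = [i for i, r in enumerate(rows) if r[j] >= s]
    return (j, s,
            grow_tree([rows[i] for i in left], [targets[i] for i in left],
                      m, min_node_size, rng),
            grow_tree([rows[i] for i in right], [targets[i] for i in right],
                      m, min_node_size, rng))


def predict_tree(tree, x):
    """Follow the split tests from the root until a leaf value is reached."""
    while isinstance(tree, tuple):
        j, s, left, right = tree
        tree = left if x[j] < s else right
    return tree


def fit_forest(rows, targets, n_trees, m, min_node_size=5, seed=0):
    """Grow n_trees regression trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                m, min_node_size, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees in the forest."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)


# Synthetic placeholder data: 6 features, only the first one drives the response.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(6)] for _ in range(80)]
y = [10.0 * row[0] + data_rng.gauss(0.0, 0.1) for row in X]
forest = fit_forest(X, y, n_trees=20, m=2)  # m = p/3 with p = 6
pred_low = predict_forest(forest, [0.1, 0.5, 0.5, 0.5, 0.5, 0.5])
pred_high = predict_forest(forest, [0.9, 0.5, 0.5, 0.5, 0.5, 0.5])
```

With the settings reported in the text, the corresponding call would be `fit_forest(X, y, n_trees=500, m=9, min_node_size=5)` on the 28-feature training set, although this exhaustive split search would be slow at that scale compared with an optimized library.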

## Experimental Setup

The data used in this paper were obtained from Li et al. [50]. Some details of the experiment are presented in this section. The experimental setup is shown in Fig. 5.

The cutter material and workpiece material used in the experiment are high-speed steel and stainless steel, respectively. The detailed description of the operating conditions in the dry milling operation can be found in Table 2. The spindle speed of the cutter was 10,400 RPM. The feed rate was 1555 mm/min. The *Y* depth of cut (radial) was 0.125 mm. The *Z* depth of cut (axial) was 0.2 mm.

A total of 315 cutting tests were conducted on a three-axis high-speed CNC machine (Röders Tech RFM 760). During each cutting test, seven signal channels, including cutting force, vibration, and acoustic emission data, were monitored in real time. The sampling rate was 50 kHz/channel. Each cutting test took about 15 s. A stationary dynamometer, mounted on the table of the CNC machine, was used to measure cutting forces along three mutually perpendicular axes (*x*, *y*, and *z*). Three piezo accelerometers, mounted on the workpiece, were used to measure vibration along three mutually perpendicular axes (*x*, *y*, and *z*). An acoustic emission (AE) sensor, mounted on the workpiece, was used to monitor the high-frequency oscillations that occur spontaneously within metals due to crack formation or plastic deformation. Acoustic emission is caused by the release of strain energy as the microstructure of the material is rearranged. After each cutting test, the value of tool wear was measured off-line using a microscope (Leica MZ12). The total size of the condition monitoring data is about 8.67 GB.

## Results and Discussion

In machine learning, feature extraction is an essential preprocessing step in which raw data collected from various signal channels are converted into a set of statistical features in a format supported by machine learning algorithms. The statistical features are then given as an input to a machine learning algorithm. In this experiment, the condition monitoring data were collected from (1) cutting force, (2) vibration, and (3) acoustic emission signal channels. A set of statistical features (28 features) was extracted from these signals, including maximum, median, mean, and standard deviation as listed in Table 3.
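The feature extraction step can be sketched as follows, assuming that the 28 features are the four listed statistics computed on each of the seven signal channels (7 × 4 = 28). That reading is consistent with the text but is an assumption, as are the channel labels, the function name, and the use of the sample standard deviation.

```python
import statistics

# Channel labels follow the input vector (F_X, F_Y, F_Z, V_X, V_Y, V_Z, AE).
CHANNELS = ["FX", "FY", "FZ", "VX", "VY", "VZ", "AE"]


def extract_features(signals):
    """Map each channel's raw samples to max, median, mean, and standard deviation."""
    features = {}
    for name in CHANNELS:
        samples = signals[name]
        features[name + "_max"] = max(samples)
        features[name + "_median"] = statistics.median(samples)
        features[name + "_mean"] = statistics.mean(samples)
        # Sample standard deviation; whether the sample or population form
        # was used is not stated in the text, so this choice is an assumption.
        features[name + "_std"] = statistics.stdev(samples)
    return features


# Placeholder signals: four samples per channel instead of a full cutting test.
signals = {name: [0.0, 1.0, 2.0, 5.0] for name in CHANNELS}
features = extract_features(signals)
```

Each cutting test is thereby reduced from several hundred thousand raw samples per channel to one fixed-length feature vector, which is the form the learning algorithms expect.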

Three predictive models were developed using ANNs, SVR, and RFs, respectively. Two-thirds (2/3) of the input data (i.e., three datasets) were selected at random for model development (training). The remainder (1/3) of the input data was used for model validation (testing). Figures 6–8 show the predicted against observed tool wear values with the test dataset using ANNs, SVR, and RFs, respectively. Figure 9 shows the tool wear against time with RFs.
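A random two-thirds/one-third split can be sketched as below. The total of 945 samples is inferred from three datasets of 315 cutting tests each, which agrees with the training set size $N=630$ used for the random forest; the function name and the seed are illustrative assumptions.

```python
import random


def train_test_split(n_samples, train_fraction, seed=0):
    """Return shuffled index lists for training and testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# Assumption: 3 datasets x 315 cutting tests = 945 samples in total.
train_idx, test_idx = train_test_split(n_samples=945, train_fraction=2 / 3)
```

The two index lists are disjoint, so no test observation is seen during training.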

In addition, the performance of the three algorithms was evaluated on the test dataset using accuracy and training time. Accuracy is measured using the $R^2$ statistic, also referred to as the coefficient of determination, and mean squared error (MSE). In statistics, the coefficient of determination is defined as $R^2=1-(SSE/SST)$, where $SSE$ is the sum of the squares of residuals and $SST$ is the total sum of squares. The coefficient of determination is a measure that indicates the percentage of the response variable variation that is explained by a regression model. A higher *R*-squared indicates that more variability is explained by the regression model. For example, an $R^2$ of 100% indicates that the regression model explains all the variability of the response data around its mean. In general, the higher the *R*-squared, the better the regression model fits the data. The MSE of an estimator measures the average of the squares of the errors. The MSE is defined as $MSE=(1/n)\sum_{i=1}^{n}(\hat{y}_i-y_i)^2$, where $\hat{y}_i$ is a predicted value, $y_i$ is an observed value, and $n$ is the sample size. The ANN, SVR, and RF algorithms use between 50% and 90% of the input data for model development (training) and use the remainder for model validation (testing). Because the performance of ANNs depends on the hidden layer configuration, five ANNs with a single hidden layer but different numbers of neurons were tested on the training dataset. Tables 4–8 list the MSE, *R*-squared, and training time for the ANNs with 2, 4, 8, 16, and 32 neurons. With respect to the performance of the ANN, the training time increases as the number of neurons increases. However, the increases in training time are not significant, as shown in Fig. 10. In addition, while the prediction accuracy increases as the number of neurons increases, the performance is not significantly improved by adding more than eight neurons in the hidden layer, as shown in Figs. 11 and 12. Tables 9 and 10 list the MSE, *R*-squared, and training time for SVR and RFs. While the training time for RFs is longer than that of ANNs and SVR, the predictive model built by RFs is the most accurate, as shown in Figs. 10–12.
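The two accuracy measures defined above can be computed directly from their definitions. The sketch below uses hypothetical observed and predicted wear values that are placeholders, not data from the experiment.

```python
def mean_squared_error(observed, predicted):
    """MSE = (1/n) * sum of squared differences between predictions and observations."""
    n = len(observed)
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """R^2 = 1 - SSE/SST, with SST taken around the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical observed and predicted flank wear values (placeholders).
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
mse = mean_squared_error(observed, predicted)  # (4 + 4 + 1 + 1) / 4 = 2.5
r2 = r_squared(observed, predicted)            # 1 - 10 / 500 = 0.98
```

Note that MSE is expressed in squared units of the response, so it is only comparable across models evaluated on the same test data, whereas $R^2$ is dimensionless.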

## Conclusions and Future Work

In this paper, the prediction of tool wear in milling operations was conducted using three popular machine learning algorithms, including ANNs, SVR, and RFs. The performance of these algorithms was evaluated on the dataset collected from 315 milling tests. The performance measures include mean squared error, *R*-squared, and training time. A set of statistical features was extracted from cutting forces, vibrations, and acoustic emissions. The experimental results have shown that while the training time on this particular dataset is longer for RFs than for the FFBP ANNs with a single hidden layer and SVR, RFs generate more accurate predictions than both. The main contribution of this paper is twofold: (1) we demonstrated, for the first time to the best of our knowledge, that the predictive model trained by RFs can predict tool wear in milling processes very accurately and (2) we compared the performance of RFs with that of FFBP ANNs and SVR and observed that RFs outperform FFBP ANNs and SVR for this particular application example.

In the future, a comparison of the performance of SVR and RFs with that of other types of ANNs, such as recurrent neural networks and dynamic neural networks, will be conducted. In addition, our future work will focus on designing the parallel implementation of machine learning algorithms that can be applied to large-scale and real-time prognosis.

## Acknowledgment

The research reported in this paper is partially supported by NSF under Grant Nos. IIP-1238335 and DMDII-15-14-01. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation and the Digital Manufacturing and Design Innovation Institute.