The results from this study indicate that all algorithms can, in principle, produce remaining life estimates, although the actual estimates vary considerably. The figure below shows the prediction trajectories obtained from the three algorithms. As the figure shows, the algorithms start their predictions from different damage levels because each algorithm uses its own estimate of the current damage level at the time of prediction. All algorithms exhibit similar trends, with some variation in the local slopes. Two of the algorithms produce late predictions as the time approaches failure, whereas the RVM does not. Note that, from a safety point of view, conservative predictions are often preferred over late ones. The superimposed predictions of the three algorithms are shown at times t=3750, t=4250, t=4750, and t=5250, which correspond to true remaining lives of 1637, 1137, 637, and 137 time units, respectively. The numerical results are summarized in Table I.

Damage prediction trajectory of the three algorithms at different times using algorithm specific damage estimates

Table I

Current state estimation accuracy is a function of the diagnostic capability of an algorithm. To provide a fairer comparison, we deployed all algorithms from the same starting damage levels, choosing the damage level estimates provided by the GPR algorithm as the common initial point. The results are summarized in Table II. In this case, the RVM algorithm also produces late predictions.

Damage prediction trajectory of the three algorithms at different times using a common damage estimate

Table II

One issue that arises when the estimated time of failure is later than the actual time of failure is which operational conditions to use, since only the operational conditions up to actual failure exist. We assumed here that the conditions would repeat, using the same cycles as before failure. Clearly, the error will change depending on which operational conditions are chosen.
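This cycle-repetition assumption can be sketched as follows; the function name, the array contents, and the cycle counts are illustrative and not taken from the study:

```python
import numpy as np

def extend_profile(conditions, horizon):
    """Extend an operational-condition history past actual failure by
    repeating the recorded cycles, as assumed in the text.

    conditions : 1-D array of per-cycle operating conditions up to failure
    horizon    : total number of cycles required (may exceed the record length)
    """
    conditions = np.asarray(conditions)
    reps = int(np.ceil(horizon / len(conditions)))  # enough full repetitions
    return np.tile(conditions, reps)[:horizon]      # trim to the horizon

# Hypothetical 3-cycle load profile extended to 8 cycles:
profile = extend_profile([1.0, 1.5, 0.8], 8)
# → [1.0, 1.5, 0.8, 1.0, 1.5, 0.8, 1.0, 1.5]
```

Any other extension rule (e.g. repeating only the last cycle, or sampling cycles at random) would change the prediction error accordingly, which is the sensitivity noted above.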

**DISCUSSION**
Generally, the prediction accuracy seems acceptable. The (somewhat arbitrary) criterion that a prediction performed halfway between first fault detection and actual failure should be within 20% of the true remaining life is met by all three algorithms. However, this metric does not account for prediction accuracy at later times, and performance at other times varies considerably. One would generally expect the prediction error to shrink as one approaches the actual end of life, yet this holds only for the RVM, and only when it uses its own damage estimates. Moreover, all three algorithms predict late as the remaining useful life shrinks when using the common damage estimate. This is clearly in part a function of the damage state estimation. The figure below uses the GPR predictions to illustrate how the damage level estimates affect prediction quality. It is apparent that the damage estimate does not increase monotonically, which accounts for much of the variation in the remaining life estimates. Superimposed are also the ground truth measurements, which would not typically be available in a fielded system.

The possible explanation that the damage progression does not follow the same model as at earlier times should not distract from the lack of a metric that (besides accuracy) quantifies prediction quality over time. Indeed, while data-driven techniques are generally an attractive alternative for prognostics in situations where models are hard to come by, unstable predictions can occur due to sensitivity to state estimation (for the NN-based approach) or to training data coherence (for the RVM-based approach).

RUL predictions using GPR
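As a concrete reading of the halfway criterion discussed above, the following sketch checks whether a prediction lies within 20% of the true remaining life; the detection time, failure time, and predicted value in the example are hypothetical, chosen only for illustration:

```python
def halfway_accuracy_met(t_detect, t_fail, predicted_rul, tol=0.20):
    """Check the (somewhat arbitrary) criterion from the text: the RUL
    predicted halfway between first fault detection and actual failure
    must lie within `tol` (default 20%) of the true remaining life."""
    t_half = 0.5 * (t_detect + t_fail)   # halfway prediction time
    true_rul = t_fail - t_half           # true remaining life at that time
    return abs(predicted_rul - true_rul) <= tol * true_rul

# Hypothetical case: detection at t=2500, failure at t=5387.
# True RUL at the halfway point is 1443.5, so a prediction of 1300 passes:
halfway_accuracy_met(2500, 5387, 1300)   # → True
halfway_accuracy_met(2500, 5387, 900)    # → False
```

A metric of this single-point form is easy to satisfy while still hiding the late predictions near end of life noted above, which is why a time-resolved quality measure is needed.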

The intrinsic ability of RVM and GPR to fit probability distribution functions (PDFs) to the data is desirable for prognostics where uncertainty management is of paramount importance. What remains is a validation that the uncertainty estimates are in fact reasonable. A more formal approach for uncertainty management that gives an upper bound for the confidence would be desirable. In addition, we note here again the need for a metric that describes the quality of the uncertainty properties.

**Neural Networks**

The limitations of the NN-based approach stem from the difficulty of obtaining a smooth damage-rate curve from the training data. If the training data, as was the case here, exhibit trajectories that do not support the assumed model, it is hard to eliminate those trajectories when only few training data exist and no knowledge about the underlying physics is used. Consequently, NN performance varies primarily with the choice of training data and, of course, with the design of its architecture.

**Relevance Vector Machine**

In the case of the RVM, its power to detect underlying trends in noisy data lies in its ability to use probabilistic kernels that account for the inherent uncertainties of the application domain. However, this advantage can become a drawback if the training dataset contains too few points, or if the test dataset is unknown or differs significantly, such that validating the RVM on the training data has little bearing on its test performance. The figure below shows how the RVM's performance in estimating the damage level from the feature values of the test dataset varies when trained on selected datasets with different kernel widths. The plot on the left shows that all widths do well on the training sets, while their performance differs widely on the test case, with 7 being the optimal width. It is therefore difficult to devise a strategy for selecting the kernel width without any knowledge of the test data.

Impact of kernel width on damage mapping for regression and prediction
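The kernel-width sensitivity described above can be illustrated with a minimal sketch; plain kernel ridge regression stands in for the RVM's sparse Bayesian fit, and the damage-vs-feature data are synthetic, so the widths and values are assumptions rather than results from the study:

```python
import numpy as np

def rbf_kernel(x1, x2, width):
    # Gaussian (RBF) kernel; `width` plays the role of the RVM kernel width
    return np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / width) ** 2)

def kernel_fit_predict(x_train, y_train, x_test, width, lam=1e-3):
    """Minimal kernel ridge regression: fit weights on the training set,
    then map test features to damage-level estimates."""
    K = rbf_kernel(x_train, x_train, width)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_test, x_train, width) @ alpha

# Synthetic damage-vs-feature data (illustrative only)
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 10, 30))
y_train = 0.01 * x_train**2 + 0.02 * rng.standard_normal(30)
x_test = np.linspace(0, 10, 50)

# Same training data, different widths -> different test-set mappings
preds = {w: kernel_fit_predict(x_train, y_train, x_test, w)
         for w in (1.0, 3.0, 7.0)}
```

All three widths can fit the training data closely, yet their test predictions diverge, which is exactly why width selection without knowledge of the test data is hard.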

**Gaussian Process Regression**

While GPR provides a theoretically sound framework for prediction tasks, it too has limitations. As mentioned earlier, choosing a correct covariance function is critical because it encodes our assumptions about the inter-relationships within the data. While several covariance functions are available in the literature, it is sometimes difficult to make a choice in the absence of knowledge about the actual process governing the system. Although methods have been suggested to evaluate covariance functions based on likelihood values, this reduces the task to picking the best of the available candidates and still does not guarantee that our assumptions about the process were correct.
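Likelihood-based comparison of covariance functions can be sketched as follows; the two kernels, the toy signal, and the noise level are illustrative assumptions, not those used in the study:

```python
import numpy as np

def log_marginal_likelihood(K, y, noise=1e-2):
    """GP log marginal likelihood:
    -0.5 y^T K_n^{-1} y - 0.5 log|K_n| - (n/2) log(2*pi), K_n = K + noise*I."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))     # 0.5 * log|K_n| via Cholesky
            - 0.5 * n * np.log(2 * np.pi))

def sq_exp(x, ell=1.0):
    # squared-exponential covariance (very smooth sample paths)
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def exp_cov(x, ell=1.0):
    # exponential (Ornstein-Uhlenbeck) covariance (rough sample paths)
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / ell)

x = np.linspace(0, 5, 40)
y = np.sin(x)   # smooth toy signal standing in for a damage trajectory
scores = {name: log_marginal_likelihood(k(x), y)
          for name, k in [("sq_exp", sq_exp), ("exp", exp_cov)]}
```

Ranking candidates by `scores` picks the best of the set on offer, but, as noted above, says nothing about whether any of the candidates matches the true process.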

GPR provides a variance around its mean predictions. It computes the posterior by constraining the prior to fit the available training data. Therefore, prediction points lying close to the training data in input space are often predicted accurately and with high confidence (small variance). In regions where training data were sparse, GPR may still predict the mean functions fairly well, assuming a suitable covariance function was identified and the hyper-parameters were set reasonably; however, the confidence bounds it provides tend to be extremely conservative (large variance). While this is not counter-intuitive for predictions over a long time horizon, these bounds become unmanageable unless somehow contained.
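The contrast between tight variance near the training data and near-prior variance far from it can be reproduced with a minimal exact-GP sketch; the squared-exponential covariance, hyper-parameters, and synthetic data are assumptions for illustration only:

```python
import numpy as np

def gp_posterior(x_train, y_train, x_star, ell=1.0, noise=1e-2):
    """Exact GP posterior mean and variance with a squared-exponential
    covariance: the posterior is the prior constrained by the training data."""
    def k(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    K = k(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = k(x_star, x_train)
    Kss = k(x_star, x_star)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)   # prior minus data-explained part
    return mean, np.diag(cov)

x_train = np.linspace(0, 5, 20)
y_train = np.sin(x_train)                 # synthetic observations
x_star = np.array([2.5, 10.0])            # one point inside, one far beyond the data
mean, var = gp_posterior(x_train, y_train, x_star)
# var[0] (inside the data) is small; var[1] (far outside) approaches the
# prior variance of 1, i.e. the bounds revert to the prior when extrapolating.
```

This reversion to the prior is the mechanism behind the extremely conservative bounds at long horizons, and why some external containment of the bounds is needed.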

Another limitation arises from the fact that exact GPR typically scales as O(n³) with the number of training examples. In our application this did not pose a problem, as the training dataset was small. However, it may become a limitation in terms of computational time and power in an online prognosis application. Various methods have been suggested for approximating the computations to reduce this cost, but it can get tricky as the data size increases and the prediction horizon shrinks.
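The O(n³) bottleneck is the factorisation of the n×n covariance matrix, which the following naive training sketch makes explicit (synthetic data; exact GP training only, no approximation):

```python
import numpy as np

def gp_train(x, y, ell=1.0, noise=1e-2):
    """Exact GP training: build the n-by-n covariance matrix (O(n^2) memory),
    factorise it (O(n^3) time -- the bottleneck), and solve for the weights."""
    n = len(x)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2) + noise * np.eye(n)
    L = np.linalg.cholesky(K)                 # O(n^3): dominates for large n
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # O(n^2) given L
    return alpha

x = np.linspace(0, 10, 200)
alpha = gp_train(x, np.sin(x))   # fine at n=200; cost grows ~8x per doubling of n
```

Sparse and inducing-point approximations trade some accuracy for a lower cost, which is the direction the approximation methods mentioned above take.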