We compare three approaches: a Relevance Vector Machine (RVM), a Neural Network (NN)-based approach, and Gaussian Process Regression (GPR). These choices were motivated by the relative ease with which neural nets can approximate the coefficients of an exponential damage propagation function in response to different operational stimuli, and by the elegance with which the Bayesian treatment of kernel-based methods, in the form of the RVM and GPR, deals with uncertainty.

In our first study we used a dataset obtained on a test stand involving rotating equipment in an aerospace setting. The dataset contains time-series data from experiments running from a no-fault condition to some time after fault initiation. Several, but not all, experiments trip the failure threshold (set here at 45 units); in some experiments the equipment keeps operating after the failure criterion has been reached. Damage was measured several times during each run, once shortly after fault initiation and several times afterwards. Only a few measurements exist per run because they were costly and impractical to obtain. The training runs used operational and environmental conditions that differed between runs but were held constant within each run, except where an experiment was interrupted to take a ground-truth measurement. In contrast, the test set was subjected to varying conditions (cyclic loading).

We did not consider anomaly detection or diagnostics in this study, focusing instead on the prognostic aspects. The data include a diagnostic flag indicating the absence or presence of the fault. Perfect diagnostics is assumed, and prognostics is triggered whenever the diagnostic flag turns true. While this is an unrealistic assumption, it does not significantly affect this study. The primary challenges arise from training with sparse damage measurements: interpolating between measurements, or fitting a curve to the set of measurements, does not account for the fact that damage propagation is not necessarily a smooth process and can occur in non-linear increments. Another major issue is the extremely noisy nature of the data.

There are two main requirements for making predictions in such cases: first, we must estimate the current state of the system; second, we must estimate the damage accumulation from there on until the failure condition is met. Features are expected to be good indicators of the damage level (system state), and the operational conditions (e.g., system loading) are expected to affect the rate of damage accumulation. With these requirements in mind, we used our algorithms to learn two relationships for this dataset. We assumed the damage growth model to be exponential in nature, i.e., D = exp(αt + C). First, we excluded from training those cases where ground-truth data was missing, consisted of fewer than 3 data points, or did not follow a monotonic trend. We then fit an exponential curve to the damage ground-truth data for the remaining cases and estimated the values of the parameters α and C for each case. This provided a regression model to compute the damage progression rate for any set of operational conditions. Next, we established a relationship between the feature values and the extent of damage, based on all ground-truth data available from the training set. The model thus learned was used to estimate the current damage state from the feature values available at the time. Since the feature data was extremely noisy, we applied a simple moving-average filter with a window size of ten to smooth out sharp variations.
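The two preprocessing steps above can be sketched briefly. Assuming the damage growth model takes the form D(t) = exp(α·t + C) (we write α for the growth-rate parameter), taking logarithms linearizes it, so α and C can be recovered by ordinary least squares in log space; the moving-average filter is a plain window-ten convolution. The data below is synthetic and purely illustrative:

```python
import numpy as np

def fit_damage_growth(t, d):
    """Fit D(t) = exp(alpha * t + C) by least squares in log space.

    Taking logs linearizes the model: log D = alpha * t + C.
    np.polyfit returns coefficients highest degree first: (alpha, C).
    """
    alpha, C = np.polyfit(t, np.log(d), deg=1)
    return alpha, C

def moving_average(x, window=10):
    """Simple moving-average filter used to smooth noisy feature data."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Example: recover known parameters from synthetic, noiseless
# ground-truth damage measurements at sparse times.
t = np.array([0.0, 5.0, 10.0, 20.0])
d = np.exp(0.12 * t + 1.0)
alpha, C = fit_damage_growth(t, d)
```

In the study, one such (α, C) pair is estimated per training run and then regressed against that run's operational conditions.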

We now briefly describe the three prediction algorithms used in our study.

*Neural Network-based Power Law Parameter Estimation*

For the NN-based approach, we first transformed the data into log space, where damage propagation is linear. The rate of change for the operational settings could then be learned such that states with no supporting experimental data were covered by a smooth curve, using a network of low complexity (to avoid overfitting). The network was tasked with learning the damage propagation rate from the operational conditions, which were given by two features. Data were preprocessed to remove bias, and the results were smoothed to deal with large non-monotonicities.
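A minimal numpy sketch of this idea follows: a deliberately small one-hidden-layer network trained by gradient descent to map two operational-condition features to a propagation rate. The target mapping here is synthetic and purely illustrative (the experimental mapping is not reproduced), and the network size and learning rate are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: a damage propagation rate that varies smoothly
# with two operational-condition features (illustrative target only).
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1

# Low-complexity network: one hidden layer with four tanh units,
# kept small to avoid overfitting sparse experimental data.
n_hidden = 4
W1 = rng.normal(0.0, 0.5, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

lr = 0.1
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)             # hidden-layer activations
    pred = (h @ W2 + b2).ravel()         # predicted propagation rate
    g_out = (2.0 / len(y)) * (pred - y)[:, None]  # dL/d(output), MSE loss
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)         # backprop through tanh
    W2 -= lr * h.T @ g_out
    b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * X.T @ g_h
    b1 -= lr * g_h.sum(axis=0)

pred = (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
rmse = np.sqrt(np.mean((pred - y) ** 2))
```

The smooth tanh activations give the interpolation behavior described above: operational conditions between the training experiments still receive a smoothly varying rate estimate.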

*Relevance Vector Machine (RVM)*

Although the Support Vector Machine (SVM) is a state-of-the-art technique for classification and regression, it suffers from a number of disadvantages, one of which is the lack of probabilistic outputs, which are more meaningful in health monitoring applications. The RVM addresses these issues in a Bayesian framework. Besides the probabilistic interpretation of its output, it uses far fewer kernel functions for comparable generalization performance.
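The sparsity mechanism can be illustrated with a minimal sketch of Tipping-style evidence re-estimation for RVM regression. The kernel choice, bandwidth, pruning threshold, and iteration count are all our own assumptions for this toy example, not the study's settings:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=10.0):
    """RBF basis functions centered at the training inputs (assumed kernel)."""
    return np.exp(-gamma * (X1[:, None] - X2[None, :]) ** 2)

def rvm_fit(X, t, gamma=10.0, n_iter=500, prune=1e5):
    """Minimal RVM regression via iterative evidence re-estimation.

    Each weight w_i has its own precision hyperparameter alpha_i; basis
    functions whose alpha_i diverges are pruned, leaving a sparse set of
    'relevance vectors'.
    """
    Phi = rbf_kernel(X, X, gamma)
    N = len(t)
    keep = np.arange(N)            # indices of surviving relevance vectors
    alpha = np.ones(N)             # per-weight precisions
    beta = 1.0 / np.var(t)         # noise precision
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ t          # posterior mean weights
        g = 1.0 - alpha * np.diag(Sigma)       # "well-determinedness"
        alpha = g / (mu ** 2 + 1e-12)
        beta = (N - g.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12)
        mask = alpha < prune                   # drop diverging weights
        Phi, alpha, keep, mu = Phi[:, mask], alpha[mask], keep[mask], mu[mask]
    return keep, mu

# Toy usage: noisy sine data; most basis functions get pruned away.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
t = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=40)
keep, w = rvm_fit(x, t)
pred = rbf_kernel(x, x[keep]) @ w
rmse = np.sqrt(np.mean((pred - t) ** 2))
```

The posterior covariance Sigma also yields the predictive variance, which is the probabilistic output that motivates using the RVM over the SVM here.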

*Gaussian Process Regression (GPR)*

A Gaussian Process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution. A real GP f(x) is completely specified by its mean function m(x) and covariance function k(x,x'). The index set X in R is the set of possible inputs, which need not necessarily be a time vector. Given prior information about the GP and a set of training points {(x_i, y_i)}, where y_i = f(x_i) + ε and ε is additive IID Gaussian noise N(0, σ_n²), a posterior distribution over functions can be derived. Once we have this posterior distribution, it can be used to compute predictive values for the test data points.
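The posterior computation can be sketched with the standard Cholesky-based prediction equations; the squared-exponential covariance function, its length scale, and the noise level are assumed here for illustration:

```python
import numpy as np

def sq_exp(X1, X2, ell=0.3, sf=1.0):
    """Squared-exponential covariance k(x, x') (assumed kernel choice)."""
    return sf ** 2 * np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ell ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    """Posterior predictive mean and variance of a zero-mean GP."""
    K = sq_exp(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    Ks = sq_exp(x_test, x_train)
    L = np.linalg.cholesky(K)                       # K = L L^T
    a = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ a                                   # predictive mean
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(sq_exp(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Toy usage: the predictive variance grows away from the training data,
# which is exactly the uncertainty information prognostics needs.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=20)
x_test = np.array([0.5, 3.0])                       # inside vs. far outside
mean, var = gp_predict(x_train, y_train, x_test)
```

Note that the predictive variance at x = 3.0, far from all training inputs, reverts to the prior variance, while near the data it is much smaller.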