We have experimentally evaluated our approach on a number of ADAPT fault scenarios. To enable RTOS embedding, the ADAPT BN was compiled off-line into an arithmetic circuit, which was then evaluated on-line. What distinguishes this work from previous Bayesian network-based research on EPS diagnosis is how we reduced a complex diagnostic search space to an arithmetic circuit, supported by a small-footprint arithmetic circuit evaluator. Compiling an ADAPT BN, which contains over 400 nodes representing over 100 EPS components, into an arithmetic circuit and evaluating it with the arithmetic circuit evaluator (ACE) gives accurate diagnostic results as well as inference times of less than one millisecond for all our fault scenarios. This is a successful demonstration of our approach on a real-world problem of great importance to NASA.
We now turn to experiments using ADAPT and different inference algorithms. The experiments are divided into two sets: hand-crafted, real-world scenarios from ADAPT, and simulated scenarios that were automatically generated from an ADAPT BN. In both cases, we executed probabilistic queries over the health variables to find out which components or sensors, if any, were in non-healthy states.
The ACE system was used to (i) compile an ADAPT BN into an arithmetic circuit and (ii) evaluate that arithmetic circuit. The timing measurements reported here were made on a PC with an Intel 4 1.83 GHz processor, 1 GB RAM, and Windows XP.
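To make the compile-then-evaluate split concrete, here is a minimal sketch of an arithmetic circuit for a two-node network (one health node, one sensor node). The network, its state names, and all probabilities are hypothetical and are not taken from the ADAPT BN; the circuit is written by hand rather than produced by ACE. It evaluates the network polynomial with evidence indicators, which is the kind of computation an arithmetic circuit evaluator performs on-line.

```python
# Toy network (hypothetical, not from ADAPT): Health -> Sensor.
THETA_H = {"healthy": 0.95, "stuckOpen": 0.05}
THETA_S_GIVEN_H = {
    "healthy": {"closed": 0.99, "open": 0.01},
    "stuckOpen": {"closed": 0.02, "open": 0.98},
}


def evaluate_circuit(lam_h, lam_s):
    """Evaluate f = sum_h lam_h * theta_h * (sum_s lam_s * theta_{s|h})."""
    total = 0.0
    for h, theta_h in THETA_H.items():
        inner = sum(lam_s[s] * p for s, p in THETA_S_GIVEN_H[h].items())
        total += lam_h[h] * theta_h * inner
    return total


def indicators(domain, observed=None):
    """Evidence indicators: all 1 if unobserved, else 1 only for the observed value."""
    return {v: 1.0 if observed is None or v == observed else 0.0 for v in domain}


def posterior_health(sensor_reading):
    """Commit the sensor evidence, then query each health state."""
    lam_s = indicators(["closed", "open"], sensor_reading)
    p_evidence = evaluate_circuit(indicators(THETA_H), lam_s)
    return {
        h: evaluate_circuit(indicators(THETA_H, h), lam_s) / p_evidence
        for h in THETA_H
    }


if __name__ == "__main__":
    print(posterior_health("open"))
```

This sketch re-evaluates the circuit once per queried state for simplicity. A production evaluator would typically obtain all marginals from one upward and one downward pass over the circuit instead.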
ID | Fault Description | Diagnosis | Match |
---|---|---|---|
304 | Relay EY2 failed open | Health_relay_ey2_cl = stuckOpen | Yes |
305 | Relay feedback sensor ESH175 failed | Health_relay_ey175_cl = stuckOpen | Yes |
306 | Circuit breaker ISH262 tripped | Health_breaker_ey262_op = stuckOpen | Yes |
308 | Voltage sensor E261 failed | Health_e261 = stuckVoltageLo | Yes |
309 | Battery BATT1 voltage low | Health_battery1 = stuckDisabled | Yes |
310 | Inverter INV1 failed off | Health_inv1 = stuckOpen | Yes |
311 | Load sensor LT500 failed | Health_LT500 = stuckLow | Yes |
Diagnostic results for different fault scenarios (with IDs 304, 305, ...) for the electrical power system testbed ADAPT.
For experimentation using real-world data, EPS failure scenarios were generated using the ADAPT EPS at NASA Ames. These scenarios cover both component failures (experiments 304, 306, 309, and 310 in the table above) and sensor failures (experiments 305, 308, and 311); many previous efforts have only considered one type of failure. After ADAPT system reconfigurations and fault insertion (for example, insertion of "Relay EY260 failed open"; see ID 304 in the table above), the ADAPT BN or an arithmetic circuit compiled from it is used to compute a diagnosis. The variant of the ADAPT BN used here was largely auto-generated and contains 434 nodes and 482 edges; the BN node cardinalities range from 2 to 4 with mean 2.27. ACE was used to compute most probable explanations (MPEs) and most likely values (MLVs). To compute maximum a posteriori probability (MAP), SamIam was used. Here are the timing results for ACE:
Execution time results, in milliseconds, for ACE for the ADAPT testbed when computing diagnoses using the most probable explanation (MPE).
Execution time results, in milliseconds, for ACE for the ADAPT testbed when computing diagnoses using the most likely value (MLV).
The results of the ADAPT experiments are provided in the result table and figures above. Since there are over 120 nodes, we only show the variables deemed to be non-healthy in the table. Further, the diagnostic results of the MPE, MLV, and MAP queries turned out to be the same; hence we consolidate them into one column called “Diagnosis” in the table. ADAPT uses a 2 Hz sampling rate, and a probabilistic query was posed to ACE after each sample in an experimental run. The execution time statistics displayed in the above figures are based on the execution times for all probabilistic queries during an experimental run. Each execution time is for an entire inference step, i.e., translating measurements to evidence, committing evidence to the arithmetic circuit, and evaluating the arithmetic circuit.
Our main observations regarding these experiments are as follows. First, we see in the table above that the different diagnostic queries correctly diagnose all these component and sensor failure scenarios. Second, we emphasize the fast and predictable inference times for the arithmetic circuits (ACs) in the timing figures above. Both are very important factors in real-time electrical power system health management.
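The relationship between the MPE and MLV queries can be illustrated with a brute-force sketch on a toy model. Everything here is hypothetical (two health variables, one sensor, made-up probabilities), and we assume MLV means taking, for each health variable separately, the most probable state under its posterior marginal. In this toy case the two queries agree, as they did in the experiments, although in general they need not.

```python
from itertools import product

# Hypothetical toy model: two health variables and one sensor that tends
# to read "low" when either component has failed.
STATES = ["healthy", "failed"]
PRIOR = {
    "H1": {"healthy": 0.9, "failed": 0.1},
    "H2": {"healthy": 0.8, "failed": 0.2},
}


def p_sensor_low(h1, h2):
    """P(sensor = low | h1, h2): likely when any component has failed."""
    return 0.95 if "failed" in (h1, h2) else 0.05


def joint_with_evidence():
    """Unnormalized P(h1, h2, sensor = low) for every health assignment."""
    return {
        (h1, h2): PRIOR["H1"][h1] * PRIOR["H2"][h2] * p_sensor_low(h1, h2)
        for h1, h2 in product(STATES, STATES)
    }


def mpe(joint):
    """Most probable explanation: the single best joint assignment."""
    return max(joint, key=joint.get)


def mlv(joint):
    """Most likely value of each variable, taken from its own marginal."""
    result = []
    for i in range(2):
        marginal = {
            s: sum(p for a, p in joint.items() if a[i] == s) for s in STATES
        }
        result.append(max(marginal, key=marginal.get))
    return tuple(result)


if __name__ == "__main__":
    j = joint_with_evidence()
    print("MPE:", mpe(j))
    print("MLV:", mlv(j))
```

Enumeration is only feasible for tiny models like this one; it is shown to define the queries, not as a way to run them on a network the size of the ADAPT BN.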
Simulated data was created by a program that (i) generated a set of failure scenarios according to the probabilities of the ADAPT BN's health nodes, and (ii) for each failure scenario, generated an evidence set on sensor nodes. This large number of evidence sets was then run through different inference systems. In addition to arithmetic circuit evaluation (ACE), we performed experiments with variable elimination (VE) and clique tree propagation (CTP).
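The two-step generation procedure can be sketched as forward sampling on a toy network. The network below (one health node with one sensor child), its probabilities, and the function names are hypothetical stand-ins for the ADAPT BN and the actual generator program, which are not shown in this section.

```python
import random

# Hypothetical stand-in for the BN: one health node and one sensor node.
HEALTH_PRIOR = {"healthy": 0.95, "stuckOpen": 0.05}
SENSOR_GIVEN_HEALTH = {
    "healthy": {"closed": 0.99, "open": 0.01},
    "stuckOpen": {"closed": 0.02, "open": 0.98},
}


def sample(dist, rng):
    """Draw one value from a {value: probability} distribution."""
    values = list(dist)
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=1)[0]


def generate_scenarios(n, seed=0):
    """Step (i): sample a health state; step (ii): sample sensor evidence."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        health = sample(HEALTH_PRIOR, rng)
        evidence = {"sensor": sample(SENSOR_GIVEN_HEALTH[health], rng)}
        scenarios.append((health, evidence))
    return scenarios


if __name__ == "__main__":
    for health, evidence in generate_scenarios(5):
        print(health, evidence)
```

Keeping the sampled health state alongside each evidence set gives a ground truth against which a computed diagnosis can later be compared.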
Statistic | VE, MPE (ms) | ACE, MPE (ms) | CTP, marginals (ms) | ACE, marginals (ms) |
---|---|---|---|---|
Minimum | 17.25 | 0.17 | 8.527 | 0.4934 |
Maximum | 38.45 | 2.779 | 54.51 | 5.50 |
Median | 17.63 | 0.1995 | 9.204 | 0.24 |
Mean | 17.79 | 0.2370 | 10.02 | 0.6981 |
St. Dev. | 1.513 | 0.2137 | 4.451 | 0.6669 |
Results for different inference algorithms (VE, ACE, and CTP) when computing MPEs and marginals using data generated from the ADAPT BN.
The table above summarizes the results of experiments with 200 simulated evidence sets generated from the ADAPT BN. ACE is, on average, over 75 times faster than VE when computing MPEs. In addition, we note that ACE can compute all marginals using just slightly more time than it needs for MPEs. In other words, ACE can compute over 400 probabilities 25 times faster than VE computes a single probability. CTP can be used to compute marginals in order to overcome VE's limitation of computing only one probability at a time, but even CTP is over 14 times slower than ACE and has a higher standard deviation.
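The speedup factors quoted above follow directly from the mean inference times in the table. The short check below uses only those reported means; the dictionary layout and function name are our own.

```python
# Mean inference times in milliseconds, copied from the table above.
MEAN_MS = {
    ("MPE", "VE"): 17.79,
    ("MPE", "ACE"): 0.2370,
    ("Marginals", "CTP"): 10.02,
    ("Marginals", "ACE"): 0.6981,
}


def speedup(slow, fast):
    """Ratio of mean times: how many times faster `fast` is than `slow`."""
    return MEAN_MS[slow] / MEAN_MS[fast]


ve_vs_ace_mpe = speedup(("MPE", "VE"), ("MPE", "ACE"))
ve_mpe_vs_ace_marginals = speedup(("MPE", "VE"), ("Marginals", "ACE"))
ctp_vs_ace_marginals = speedup(("Marginals", "CTP"), ("Marginals", "ACE"))

if __name__ == "__main__":
    print(f"VE vs ACE (MPE): {ve_vs_ace_mpe:.1f}x")
    print(f"VE (MPE) vs ACE (all marginals): {ve_mpe_vs_ace_marginals:.1f}x")
    print(f"CTP vs ACE (marginals): {ctp_vs_ace_marginals:.1f}x")
```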
In summary, VE, CTP, and ACE all run quite efficiently on the ADAPT system, but ACE is one to two orders of magnitude faster than the other algorithms and has a lower standard deviation. Diagnostic inference for ADAPT is therefore very efficient for two reasons. First, the BN was carefully generated, using our novel auto-generation algorithm, in a manner that supports efficient inference using any reasonable exact inference algorithm. Second, the particular arithmetic circuit algorithms we have emphasized here, as implemented in ACE, provide very large additional gains.