My research (Jun 2003 to Present) has focused on making explicit state model checking a tractable verification technique for finding subtle concurrency errors in software systems. I have designed, implemented, and evaluated various heuristics for guided model checking, randomized search techniques, and benchmarking criteria for explicit state model checking.
Abstraction Guided Symbolic Execution: The abstraction-guided symbolic execution technique quickly detects errors in concurrent programs. The input to the technique is a set of target locations that represent a possible error in the program. We generate an abstract system from a backward slice for each target location. The backward slice contains program locations relevant in testing the reachability of the target locations. The backward slice only considers sequential execution and does not capture any inter-thread dependencies. A combination of heuristics are to guide a symbolic execution along locations in the abstract system in an effort to generate a corresponding feasible execution trace to the target locations. When the symbolic execution is unable to make progress, we refine the abstraction by adding locations to handle inter-thread dependencies. We demonstrate empirically that abstraction-guided symbolic execution generates feasible execution paths in the actual system to find concurrency errors in a few seconds where exhaustive symbolic execution fails to find the same errors in an hour.
Distance heuristics in Estes: I have worked extensively on the
Estes model checker that reasons about the semantic concurrent
behaviors of C programs at the machine code level. I have designed and
implemented various distance estimate heuristics in Estes. Distance
estimate heuristics operate on the structure and control flow of the
machine-level instructions and are used to compute a heuristic
value. The heuristic value guides the model checking search in order
to drive the state space exploration toward the error. The heuristic
value is an estimated number of steps from the current location of the
program to a possible error location. Different distance estimates use
varying levels of calling context information to improve the accuracy
of the distance estimate computed on the control flow representation
of the machine-level instructions. The EFSM heuristic adds partial
context information by inlining function calls up to a certain
user-specified depth and also extracts the runtime call trace to
compute context information. The e-FCA heuristic computes a fully
context-sensitive distance estimate for non-recursive programs. The
algorithm is polynomial in the number of machine-level instructions.
Randomized guided search in JPF: I have done work to show that it is important to control for default search order in model checker implementations when conducting empirical analysis so performance is understood as algorithmic improvement rather than optimal search order in the default implementation. Randomized guided search in JPF controls for default search order in guided search techniques. It is implemented through a new searcher that includes with the heuristic estimate a secondary randomly generated value and a new priority queue that uses the randomly generated value as a secondary key while prioritizing states. This fully randomly guided search effectively controls for default search order in the model checker implementations over states that produce identical heuristic estimates. The search is embarrassingly parallel, and an extensive empirical analysis has shown it to improve performance over randomized depth-first and breadth-first search.
Labeled Transition Graphs for Gene Regulatory Networks: This is research that I did for a graduate class and continued to work on it after the class ended. Biologists attribute Down Syndrome characteristics to the interactions of chromosome 21 genes with non-chromosome 21 genes. The size and complexity of gene regulatory networks make it hard for biologists to empirically determine how the chromosome 21 genes are related to non-chromosome 21 genes. A pathway analysis tool mines available gene regulatory network databases. To determine gene relationships the analyses first flattens the gene regulatory networks into a single labeled transition graph. Common genes in different networks are the basis for interconnecting the gene networks. Next, a reachability analysis on the transition graph derives various gene relationships between chromosome 21 genes and non-chromosome 21 genes. The results are exciting for biologists because it now provides them a list of interesting gene interactions that have not been previously considered in understanding Down Syndrome characteristics.