My research (Jun 2003 to Present) has focused on making explicit state model checking a tractable verification technique for finding subtle concurrency errors in software systems. I have designed, implemented, and evaluated various heuristics for guided model checking, randomized search techniques, and benchmarking criteria for explicit state model checking.
Abstraction Guided Symbolic Execution: The abstraction-guided symbolic execution technique quickly detects errors in concurrent programs. The input to the technique is a set of target locations that represent a possible error in the program. We generate an abstract system from a backward slice for each target location. The backward slice contains program locations relevant in testing the reachability of the target locations. The backward slice only considers sequential execution and does not capture any inter-thread dependencies. A combination of heuristics are to guide a symbolic execution along locations in the abstract system in an effort to generate a corresponding feasible execution trace to the target locations. When the symbolic execution is unable to make progress, we refine the abstraction by adding locations to handle inter-thread dependencies. We demonstrate empirically that abstraction-guided symbolic execution generates feasible execution paths in the actual system to find concurrency errors in a few seconds where exhaustive symbolic execution fails to find the same errors in an hour.
Distance heuristics in Estes: I have worked extensively on the
Estes model checker that reasons about the semantic concurrent
behaviors of C programs at the machine code level. I have designed and
implemented various distance estimate heuristics in Estes. Distance
estimate heuristics operate on the structure and control flow of the
machine-level instructions and are used to compute a heuristic
value. The heuristic value guides the model checking search in order
to drive the state space exploration toward the error. The heuristic
value is an estimated number of steps from the current location of the
program to a possible error location. Different distance estimates use
varying levels of calling context information to improve the accuracy
of the distance estimate computed on the control flow representation
of the machine-level instructions. The EFSM heuristic adds partial
context information by inlining function calls up to a certain
user-specified depth and also extracts the runtime call trace to
compute context information. The e-FCA heuristic computes a fully
context-sensitive distance estimate for non-recursive programs. The
algorithm is polynomial in the number of machine-level instructions.
Randomized guided search in JPF: I have done work to show that it is important to
control for default search order in model checker
implementations when conducting empirical analysis so
performance is understood as algorithmic improvement rather than
optimal search order in the default implementation. Randomized
guided search in JPF controls for default search order in guided
search techniques. It is implemented through a new searcher that
includes with the heuristic estimate a secondary randomly
generated value and a new priority queue that uses the randomly
generated value as a secondary key while prioritizing
states. This fully randomly guided search effectively controls
for default search order in the model checker implementations
over states that produce identical heuristic estimates. The
search is embarrassingly parallel, and an extensive empirical
analysis has shown it to improve performance over randomized
depth-first and breadth-first search.
Labeled Transition Graphs for Gene Regulatory Networks: This is
research that I did for a graduate class and continued to work
on it after the class ended. Biologists attribute Down Syndrome
characteristics to the interactions of chromosome 21 genes with
non-chromosome 21 genes. The size and complexity of gene
regulatory networks make it hard for biologists to empirically
determine how the chromosome 21 genes are related to
non-chromosome 21 genes. A pathway analysis tool mines available
gene regulatory network databases. To determine gene
relationships the analyses first flattens the gene regulatory
networks into a single labeled transition graph. Common genes in
different networks are the basis for interconnecting the gene
networks. Next, a reachability analysis on the transition graph
derives various gene relationships between chromosome 21 genes
and non-chromosome 21 genes. The results are exciting for
biologists because it now provides them a list of interesting
gene interactions that have not been previously considered in
understanding Down Syndrome characteristics.
