A Brief Introduction to SIGMA: An Intelligent Visual Programming Environment for Scientific Modeling

Richard M. Keller

Recom Technologies Inc., Computational Sciences Division, NASA Ames Research Center



Introduction and Motivation

Within both NASA and the scientific community at large, computer models are playing an increasingly important role in the conduct of science. Scientists construct software models to analyze data, to validate theories, and to predict a wide variety of phenomena. Developing a new scientific model is a time-intensive and painstaking process. Scientific models are usually implemented in a general-purpose programming language such as FORTRAN. Implementation can involve writing large and complex programs that access multiple datasets and use numerous statistical and numerical processing packages. Developing the software for a large scientific model can take many months to years of effort.

Although considerable resources must be expended to build a scientific model, it may be difficult, for a variety of reasons, to share the completed model with colleagues in the scientific community. Model-sharing is highly desirable because it enables researchers to conserve resources and build upon each other's efforts. Unfortunately, modeling code is typically low-level and idiosyncratic, and it may be difficult for anyone but the model's developer to understand. The relationship between the computations in the code and the physical situation being modeled may be obscure. Furthermore, much important information about the modeler's assumptions is buried in the code and is very difficult to recover. Finally, documentation may be minimal or missing altogether.

Despite these well-recognized problems and despite the acknowledged importance of scientific model-building, scientists today generally lack adequate software engineering tools to facilitate the development and sharing of modeling software.

The SIGMA modeling tool

We have constructed a prototype knowledge-based software development environment that makes it easier for scientists to construct, modify, share, and understand scientific models. The SIGMA (Scientists' Intelligent Graphical Modeling Assistant) system provides a type of "visual programming" environment customized for scientists. Rather than construct models using a conventional programming language, scientists use SIGMA's graphical interface to "program" visually using a high-level data flow modeling language. The vocabulary of this modeling language includes high-level scientific constructs (e.g., physical quantities, scientific equations, and datasets) rather than low-level programming constructs (e.g., arrays, loops, counters). Because SIGMA enables users to express their models using a natural vocabulary and an intuitive format, colleagues can more rapidly understand and modify the content of a model without assistance from the modeler. These same characteristics make SIGMA an excellent instructional environment for demonstrating the principles underlying a scientific model.

During the model development process, SIGMA acts as a knowledgeable and active assistant to the scientist rather than a passive and uninformed subordinate. SIGMA assists the scientist during the model-building process and checks the model for consistency and coherence as it is being constructed. Using knowledge about the modeling problem and the scientific domain, SIGMA can automatically interpret the high-level scientific model as an executable program, freeing the scientist from error-prone implementation details. Users can test these models, conduct sensitivity analyses, plot results, and modify models -- all within the SIGMA environment.


Figure 1. Data flow diagram representing computational dependencies in a model fragment.


The visual data flow interface

Within SIGMA, the scientist views a computational model as a graphical structure called a data flow diagram, as illustrated in Figure 1. The data flow diagram represents the computational dependencies between the scientific quantities being modeled. By scanning the diagram, users can rapidly understand how one quantity is derived from others through a series of scientific equations.

The data flow graph in Figure 1 consists of two types of nodes: equation nodes and quantity nodes. The equation nodes are depicted in thick-bordered boxes, while the quantity nodes are shown in thin-bordered boxes. The direction of computation in the data flow graph is from right to left. The quantities at the extreme right represent known input data or exogenous quantities in the model. These quantities flow toward one or more equation nodes, where they are used in an equation formula to yield an output quantity. In turn, these intermediate quantities flow toward other equation nodes, and the entire computation cascades along as new quantities are computed and passed forward to new equations. The entire model execution culminates in the production of one or more final output quantities at the extreme left of the diagram. To compute a model output, the user clicks with a mouse on the "Compute" button associated with that output quantity node. (The "Compute" button is only active if all the required input quantities for the computation have been properly entered.)
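The right-to-left cascade described above, including the rule that "Compute" is active only when all required inputs are available, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SIGMA's implementation (SIGMA is written in Common Lisp); the names `Equation`, `DataFlowGraph`, `can_compute`, and `compute`, and the toy formulas, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Equation:
    """An equation node: reads named input quantities, yields one output quantity."""
    name: str
    inputs: List[str]
    output: str
    formula: Callable[..., float]


@dataclass
class DataFlowGraph:
    equations: Dict[str, Equation] = field(default_factory=dict)  # keyed by output quantity
    values: Dict[str, float] = field(default_factory=dict)       # entered or computed values

    def add(self, eq: Equation) -> None:
        self.equations[eq.output] = eq

    def can_compute(self, quantity: str) -> bool:
        """Analogue of an active "Compute" button: True only if the quantity
        has a value or every upstream input can itself be obtained."""
        if quantity in self.values:
            return True
        eq = self.equations.get(quantity)
        if eq is None:
            return False  # exogenous input that has not been entered yet
        return all(self.can_compute(q) for q in eq.inputs)

    def compute(self, quantity: str) -> float:
        """Evaluate a quantity by cascading through the upstream equations."""
        if quantity in self.values:
            return self.values[quantity]
        if not self.can_compute(quantity):
            raise ValueError(f"missing inputs for {quantity!r}")
        eq = self.equations[quantity]
        args = [self.compute(q) for q in eq.inputs]
        self.values[quantity] = eq.formula(*args)
        return self.values[quantity]


# Toy two-equation graph: b = 2a, then d = b + c.
g = DataFlowGraph()
g.add(Equation("Double", ["a"], "b", lambda a: 2 * a))
g.add(Equation("Sum", ["b", "c"], "d", lambda b, c: b + c))
g.values["a"] = 3.0
print(g.can_compute("d"))  # False: input "c" has not been entered
g.values["c"] = 1.0
print(g.compute("d"))  # 7.0
```

Intermediate results (here `b`) are cached in `values`, so each equation node fires at most once per evaluation.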

Accessing model information

Users can access a wide variety of documentation about the quantities and equations in the model by navigating through the data flow diagram. For example, by clicking on the "Info" button of an equation node, the user gets detailed information about the equation, including the equation formula and its inputs and outputs. Figure 2 illustrates the information window associated with the "Density Computation" equation. In addition to the formula and a brief description of the equation, note how each symbol in the formula is described in terms of the experimental situation being modeled. For instance, the symbol N represents the number density of a parcel of gases in the atmosphere of Titan, while R and p represent the refractivity and polarizability associated with a Voyager radiation source interacting with the atmospheric parcel.


Figure 2. Information window describing "Density Computation" equation


By clicking on the "Citation" button in the information window (Figure 2), the user can access a literature citation for the Density Computation equation. This citation is shown in Figure 3. If the user wants to go further and inspect the cited source itself, the "Text" button brings up a scanned bitmap image of the relevant portion of the cited material.


Figure 3. Citation information associated with "Density Computation" equation


Clicking the "Info" button on a quantity node also provides useful information. Suppose the user has clicked the "Compute" button on the output density quantity at the extreme left of the data flow diagram in Figure 1. SIGMA computes the model output, after which the user can view the results by clicking "Info". This action brings up the window shown in Figure 4. Because density is a gridded quantity, the system displays the value at each altitude gridpoint. The user can plot the values by clicking the "Plot" button at the bottom of the window. To see the results in a different set of units, the user simply clicks on the displayed units and specifies new ones; SIGMA handles the conversion automatically.


Figure 4. Calculated values for number density
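For multiplicative units, converting a gridded quantity such as the number density above amounts to scaling every gridpoint by one factor. The Python sketch below is only an illustration: SIGMA's actual units machinery is not described here, and the `TO_SI` table, the `convert_grid` function, and the sample values (which are not taken from Figure 4) are hypothetical. Offset units such as degrees Celsius would need more than a scale factor.

```python
# Hypothetical table: factor converting one unit of each kind to SI (m^-3).
TO_SI = {
    "m^-3": 1.0,
    "cm^-3": 1.0e6,   # 1 cm^-3 = 1e6 m^-3
    "km^-3": 1.0e-9,  # 1 km^-3 = 1e-9 m^-3
}


def convert_grid(values, from_unit, to_unit):
    """Convert a list of gridpoint values between two number-density units."""
    factor = TO_SI[from_unit] / TO_SI[to_unit]
    return [v * factor for v in values]


# Illustrative number densities at three altitude gridpoints, in cm^-3.
density_cm3 = [1.0e19, 5.0e18, 2.0e18]
density_m3 = convert_grid(density_cm3, "cm^-3", "m^-3")
print(density_m3)
```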


Modifying the model

Aside from executing a model, users may wish to modify it or to conduct a "what-if" analysis. SIGMA facilitates modification because all changes are made via the high-level data flow interface; the user makes no low-level programming changes. To change an input value, the user clicks the "Input" button on an input quantity node and enters a new value. Any previously computed value that depends on this input is then invalidated, and the user must request recomputation if desired.
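Invalidating dependent values after an input changes is essentially a traversal of the data flow graph in the direction of computation. The Python sketch below illustrates the idea under the assumption that each equation's inputs and output are recorded as edges; the `DependencyTracker` class, its methods, and the quantity names and placeholder values are hypothetical, not SIGMA's internals.

```python
from collections import defaultdict


class DependencyTracker:
    """Records which quantities are computed from which, so that entering a
    new input value discards every downstream result (hypothetical sketch)."""

    def __init__(self):
        self.consumers = defaultdict(set)  # quantity -> quantities computed from it
        self.values = {}

    def add_dependency(self, output, inputs):
        for q in inputs:
            self.consumers[q].add(output)

    def set_input(self, quantity, value):
        """Enter a new input value and invalidate all stale downstream values."""
        self.values[quantity] = value
        self._invalidate_downstream(quantity)

    def _invalidate_downstream(self, quantity):
        stack = list(self.consumers[quantity])
        seen = set()
        while stack:
            q = stack.pop()
            if q in seen:
                continue
            seen.add(q)
            self.values.pop(q, None)  # forget the stale value; recompute on request
            stack.extend(self.consumers[q])


t = DependencyTracker()
t.add_dependency("density", ["pressure", "temperature"])
t.add_dependency("column-amount", ["density"])
# Placeholder numbers standing in for previously entered or computed results.
t.values.update({"pressure": 1.0, "temperature": 2.0, "albedo": 0.5,
                 "density": 3.0, "column-amount": 4.0})
t.set_input("temperature", 2.5)
print(sorted(t.values))  # ['albedo', 'pressure', 'temperature']
```

Invalidated values are simply removed rather than recomputed, matching the behavior described above where recomputation happens only on request.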

A more fundamental type of modification consists of changing one or more equations used to compute quantities in the model. This is done by clicking the right arrow button on the node representing the output quantity of the equation to be modified. For example, if the user wishes to compute number density using a different equation than the "Density Computation" shown in Figure 2, he or she clicks the right arrow button on the number density node and gets a menu of alternative equations to apply (Figure 5). These equations are fetched from SIGMA's equation library. Because SIGMA has a record of the conditions under which each equation in its library is applicable, SIGMA only presents the user with viable alternatives. These alternatives are filtered from among the set of over 150 different scientific equations in SIGMA's library. (Note that SIGMA's library contains black box subroutines, as well as explicit scientific equations. Users may add their favorite FORTRAN or C subroutines to the library and these can be inserted into SIGMA data flow diagrams.) If the user selects a different equation from the menu, SIGMA will modify the data flow diagram to reflect the change.
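The filtering step, where only equations whose applicability conditions hold are offered, can be pictured as a predicate attached to each library entry. In the Python sketch below, the scenario is assumed to be a simple dictionary of facts; the `LibraryEquation` class, the `applicable_equations` function, the condition lambdas, and the "Liquid Density Table" entry are all hypothetical illustrations rather than SIGMA's representation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scenario = Dict[str, object]  # hypothetical: facts about the modeled situation


@dataclass
class LibraryEquation:
    name: str
    output: str
    applicable: Callable[[Scenario], bool]  # applicability condition


def applicable_equations(library: List[LibraryEquation], output: str,
                         scenario: Scenario) -> List[str]:
    """Return only the equations that compute `output` and whose
    applicability conditions hold in the current scenario."""
    return [eq.name for eq in library
            if eq.output == output and eq.applicable(scenario)]


library = [
    LibraryEquation("Density Computation", "number-density",
                    lambda s: s.get("has_refractivity_data", False)),
    LibraryEquation("Ideal Gas Law", "number-density",
                    lambda s: s.get("phase") == "gas"),
    LibraryEquation("Liquid Density Table", "number-density",
                    lambda s: s.get("phase") == "liquid"),
    LibraryEquation("Barometric Formula", "pressure",
                    lambda s: True),
]

scenario = {"phase": "gas", "has_refractivity_data": True}
print(applicable_equations(library, "number-density", scenario))
# ['Density Computation', 'Ideal Gas Law']
```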


Figure 5. Applicable equations


For example, if the user decides to calculate density by applying the Ideal Gas Law rather than the Density Computation, the data flow graph is modified as shown in Figure 6. The Ideal Gas Law requires pressure and temperature as inputs to compute density. (The value of Boltzmann's Constant is already stored in SIGMA's knowledge base, so the user does not need to enter its value.) The user must now decide to either enter values for the required pressure and temperature inputs, or to select an equation to compute these input quantities. As with the number density computation above, the relevant equations can be viewed by clicking the right arrow button for these quantity nodes. The process of extending the data flow graph to the right of the Ideal Gas Law continues recursively until each of its inputs can be computed from known data.
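The recursive extension of the graph to the right of the Ideal Gas Law resembles backward chaining: a quantity is either known data, or an equation is chosen for it and each of that equation's inputs is planned in turn. The Python sketch below assumes a tiny hypothetical library (`EQUATIONS`, including a "Barometric Formula" entry) and an acyclic set of equations; the `plan` and `ideal_gas_number_density` functions are illustrative names, and the ideal gas law is written in its number-density form, n = P / (k_B T).

```python
BOLTZMANN = 1.380649e-23  # J/K (exact SI value), analogous to a stored constant

# Hypothetical library: output quantity -> list of (equation name, inputs).
EQUATIONS = {
    "number-density": [("Ideal Gas Law", ["pressure", "temperature"])],
    "pressure": [("Barometric Formula", ["surface-pressure", "scale-height", "altitude"])],
}


def plan(quantity, known, chosen=None):
    """Backward-chain from `quantity` to known data. Returns a dict mapping
    each computed quantity to the equation chosen for it, or None if no
    chain of equations reaches known data. Assumes the library is acyclic."""
    chosen = {} if chosen is None else chosen
    if quantity in known:
        return chosen
    for name, inputs in EQUATIONS.get(quantity, []):
        trial = dict(chosen)
        trial[quantity] = name
        for q in inputs:
            trial = plan(q, known, trial)
            if trial is None:
                break
        if trial is not None:
            return trial
    return None


def ideal_gas_number_density(pressure_pa, temperature_k):
    """Ideal Gas Law in number-density form: n = P / (k_B T), in m^-3."""
    return pressure_pa / (BOLTZMANN * temperature_k)


known = {"temperature", "surface-pressure", "scale-height", "altitude"}
print(plan("number-density", known))
# {'number-density': 'Ideal Gas Law', 'pressure': 'Barometric Formula'}
print(ideal_gas_number_density(101325.0, 273.15))  # about 2.687e25 (Loschmidt's number)
```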


Figure 6. Modified data flow graph after applying "Ideal Gas Law" equation


SIGMA's critical resource: Science Knowledge

There are a number of different visual programming tools available to scientists today, including tools for image processing and scientific visualization (Khoros [Khoros, 1992], AVS [AVS, 1992], SGI's Explorer [Explorer, 1993], Iconicode/IDF [Iconicode/IDF, 1992]), tools for scientific instrument design (LabVIEW [LabVIEW, 1992]), and tools for modeling or simulation (STELLA/IThink [STELLA/IThink, 1992], Extend [Extend, 1992]). Although these tools enforce simple syntactic checks on data flow graphs and perform some type-checking, none of these tools has an "understanding" of what the data flow program is doing or whether the operations on the data make sense. Because these software tools have virtually no information about the application domain, they have no basis upon which to evaluate the appropriateness of a data flow program for solving a particular application problem. As a result, it is possible with these tools to create a syntactically valid data flow graph that is semantically incoherent and fails to solve the intended problem.

SIGMA is unique because it utilizes an extensive knowledge base of information about the scientific domain to assist the user during the modeling process. SIGMA's knowledge base contains both general-purpose science knowledge (e.g., descriptions of widely-used quantities, scientific units, scientific constants, equations, scientific concepts) and problem-specific knowledge (information related to the specific modeling problem and scientific discipline). The general-purpose knowledge comes as a standard reusable component of SIGMA, while the problem-specific knowledge must be added by the user to support each new modeling domain.

Utilizing its extensive knowledge base, SIGMA can provide the following types of unique knowledge-based support for the model-builder:

* Equation applicability testing: SIGMA actively screens each equation in its library to determine whether it is applicable in the current modeling situation. The user only sees a viable set of candidate equations.

* Model consistency checks: During the model-building process, SIGMA works to maintain the global consistency and scientific coherence of the evolving model.

* Equation entry error-checking: When new scientific equations are entered, SIGMA ensures dimensional consistency.

* Automated scientific units maintenance: During model execution, units conversions are performed automatically to keep quantities consistent.

* Reusable libraries: SIGMA's knowledge base includes reusable libraries of scientific equations, quantities, and constants.
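Dimensional consistency checking can be illustrated by representing each quantity's dimension as a map of exponents over SI base dimensions: multiplying quantities adds exponents, and an equation passes only if both sides reduce to the same map. The Python sketch below is a simplification that handles only equations whose right-hand side is a product of quantities; the `DIMENSIONS` table and the function names are hypothetical, and the example uses the Ideal Gas Law, P = n k T.

```python
from collections import Counter

# Dimensions as exponent maps over SI base dimensions (only entries used here).
DIMENSIONS = {
    "pressure": {"kg": 1, "m": -1, "s": -2},
    "number-density": {"m": -3},
    "boltzmann-constant": {"kg": 1, "m": 2, "s": -2, "K": -1},
    "temperature": {"K": 1},
}


def product_dimension(quantities):
    """Dimension of a product of quantities: add exponents, drop zeros."""
    total = Counter()
    for q in quantities:
        total.update(DIMENSIONS[q])  # Counter.update adds counts, negatives included
    return {base: exp for base, exp in total.items() if exp != 0}


def dimensionally_consistent(lhs, rhs_factors):
    """Check an equation of the form lhs = product(rhs_factors)."""
    return product_dimension([lhs]) == product_dimension(rhs_factors)


# Ideal Gas Law P = n k T is consistent; dropping k is not.
print(dimensionally_consistent("pressure", ["number-density", "boltzmann-constant", "temperature"]))  # True
print(dimensionally_consistent("pressure", ["number-density", "temperature"]))  # False
```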

Establishing the modeling context

Aside from its extensive knowledge about the scientific domain, SIGMA has access to a detailed description of the background context in which the modeling activity occurs. This background knowledge about the modeling problem is essential for SIGMA to understand and communicate properly with the scientist.

One of the first and most important steps taken by a scientific modeler is to abstract a given real-world modeling problem by casting it in terms of a set of equations. Thereafter, the problem can be solved purely using mathematics. Unfortunately, as a result of this initial abstraction step, an important link back to the original problem has vanished; subsequently, model users may have difficulty making the connection between the equations and the real-world modeling context. Because the contextual information that gave rise to the set of equations is unavailable to these users, they may have a hard time understanding, interpreting, and modifying the model. Similarly, without the appropriate contextual information, SIGMA cannot understand and assist users with their modeling tasks.

Within SIGMA, we provide this essential connection to the modeling context by linking the numeric computation depicted in the data flow diagram with an object-oriented description of the physical system being modeled. We call this object-oriented description the modeling scenario.


Figure 7. Modeling scenario for Titan/Voyager encounter


Figure 7 illustrates the modeling scenario upon which the data flow diagram in Figure 1 is based. The diagram represents one portion of a model intended to compute an atmospheric profile of Saturn's moon Titan based on radio signals sent from the Voyager 1 spacecraft during its encounter with Titan in 1980. The scenario in Figure 7 describes all details of the Voyager/Titan encounter relevant to the modeling task. Associated with the Titan object in the figure is an Atmospheric Grid of Location objects. At each location, there is an Atmospheric Parcel, which represents the mixture of gases at that location. Each parcel is composed of pure gas Constituents, such as nitrogen. The Voyager Signal originates from the Voyager Spacecraft and subsequently passes through the parcel, where it causes an energy-matter interaction represented by the Signal/Parcel Interaction object. Associated with each of these objects is a set of quantity attributes relevant to the modeling problem. Some of these attributes have known or assumed values, while other attributes are computed by applying scientific equations to the known attributes.

SIGMA relates the abstract numeric computation specified in a data flow graph to the real-world modeling context by linking each quantity node in the data flow diagram with a specific attribute of some object in the modeling scenario. For example, the node representing the number density quantity in Figure 1 corresponds to an attribute called "number-density" associated with the atmospheric parcel object in Figure 7, whereas the refractivity quantity corresponds to the "refractivity" attribute of the energy-matter interaction between the signal and the parcel.

SIGMA maintains useful information about each of the objects represented in the modeling scenario and their associated attributes. Each attribute has a text description and a set of associated scientific units. There is a hierarchy of object types, and each specific object instance in the scenario inherits information and attributes from more general objects in the hierarchy. For example, the Titan Atmospheric Parcel object is a specialization of the more general Physical Entity object, and all subclasses of Physical Entity inherit attributes such as mass, density, and temperature. Because it relies on object-oriented techniques, SIGMA's infrastructure is easily modified to accommodate new scientific domains.
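The scenario hierarchy and the node-to-attribute links can be mimicked with ordinary class inheritance. In the Python sketch below, `PhysicalEntity` and `TitanAtmosphericParcel` echo object names from the text, but the intermediate `AtmosphericParcel` class, the attribute descriptions, the `all_attributes` method, and the `QuantityNode` class are hypothetical; SIGMA's own knowledge representation is not shown here.

```python
from dataclasses import dataclass


class PhysicalEntity:
    # attribute name -> (text description, units)
    attributes = {
        "mass": ("mass of the entity", "kg"),
        "density": ("mass density", "kg m^-3"),
        "temperature": ("temperature", "K"),
    }

    def __init__(self, name):
        self.name = name

    @classmethod
    def all_attributes(cls):
        """Merge attribute definitions down the hierarchy, general to specific."""
        merged = {}
        for klass in reversed(cls.__mro__):  # most general class first
            merged.update(vars(klass).get("attributes", {}))
        return merged


class AtmosphericParcel(PhysicalEntity):  # hypothetical intermediate type
    attributes = {
        "number-density": ("molecules per unit volume", "m^-3"),
        "pressure": ("gas pressure", "Pa"),
    }


class TitanAtmosphericParcel(AtmosphericParcel):
    attributes = {}  # specialization: inherits everything above


@dataclass
class QuantityNode:
    """A data flow quantity node linked to one attribute of a scenario object."""
    obj: PhysicalEntity
    attribute: str

    def describe(self):
        text, units = type(self.obj).all_attributes()[self.attribute]
        return f"{self.attribute} of {self.obj.name}: {text} [{units}]"


parcel = TitanAtmosphericParcel("Titan atmospheric parcel")
node = QuantityNode(parcel, "number-density")
print(node.describe())
# number-density of Titan atmospheric parcel: molecules per unit volume [m^-3]
print(sorted(TitanAtmosphericParcel.all_attributes()))
# ['density', 'mass', 'number-density', 'pressure', 'temperature']
```

Because descriptions and units live on the object types, a quantity node needs to store only its object and attribute name.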

Two domains we have worked on extensively are planetary atmospheric modeling and terrestrial forest carbon-water transport modeling. Although these two domains seem quite different, they share some basic object and attribute definitions. SIGMA exploits these commonalities to reduce the user's burden of providing information to the system.

Status and Limitations

SIGMA has been developed in close collaboration with scientists in planetary sciences and ecosystem sciences at NASA Ames Research Center. We have successfully used SIGMA to reimplement and extend portions of two scientific models reported in the literature: TGM (Titan Greenhouse Model [McKay, Pollack, & Courtin, 1989]), and Forest-BGC (Forest Biogeochemical Cycles [Running & Coughlan, 1988]).

SIGMA is a prototype system and is still undergoing development and testing. The current version of SIGMA is being tested by several different types of users:

* model developers -- people who develop new models from scratch;

* model users -- people who primarily use models developed by others but who may need to make some modifications;

* model observers -- people interested in understanding a model, primarily for educational or training purposes.

SIGMA has shown promise for all three categories of users, but currently, its limitations are most serious with respect to the model developer.

SIGMA's main limitation concerns the types of mathematical models that can be built within the framework. SIGMA currently handles non-coupled algebraic and first-order ordinary differential equations. However, many models require simultaneous equations, which cannot easily be handled within the current system, although extensions are planned to support them.
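One way to picture why coupled equations strain a data flow framework: a graph can be evaluated in a single pass only if its quantities admit a topological order, and simultaneous equations introduce a cycle. The Python sketch below (Python 3.9+ for `graphlib`) illustrates that reasoning; it is not SIGMA's actual check, and the `evaluation_order` function and the quantity names `x`, `y`, `a`, `b` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(depends_on):
    """depends_on maps each computed quantity to the quantities its equation
    reads. Returns a one-pass evaluation order, or None if equations are coupled."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Non-coupled chain: density from pressure and temperature, then a downstream value.
chain = {"density": ["pressure", "temperature"], "column-amount": ["density"]}
print(evaluation_order(chain))

# Coupled: x is defined in terms of y and y in terms of x (simultaneous equations).
coupled = {"x": ["y", "a"], "y": ["x", "b"]}
print(evaluation_order(coupled))  # None
```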

SIGMA is written in Common Lisp and GINA, a Motif-based graphical user interface package. SIGMA runs on Sun workstations.

For further information, see [Keller, Rimon, & Das, 1994], view the SIGMA Home Page at URL http://ic-www.arc.nasa.gov/ic/projects/sigma, or contact the author via email at keller@ptolemy.arc.nasa.gov.

References

AVS (1992). Software product. Sunnyvale, CA: Stardent Computer, Inc.

Explorer (1993). Software product. Mountain View, CA: Silicon Graphics, Inc.

Extend (1992). Software product. San Jose, CA: Imagine That, Inc.

Iconicode/IDF (1992). Software product. Palo Alto, CA: Iconicon.

R. M. Keller, M. Rimon, and A. Das (1994). A Knowledge Based Prototyping Environment for Construction of Scientific Modeling Software. Automated Software Engineering, 1(1).

Khoros (1992). Software product. Albuquerque, NM: Khoros Consortium, EECE Dept., Univ. of New Mexico.

LabVIEW (1992). Software product. Austin, TX: National Instruments.

C. P. McKay, J. B. Pollack, and R. Courtin (1989). The Thermal Structure of Titan's Atmosphere. Icarus, 80, 23-53.

S. W. Running and J. C. Coughlan (1988). A General Model of Forest Ecosystem Processes for Regional Applications: I. Hydrologic Balance, Canopy Gas Exchange and Primary Production Processes. Ecological Modelling, 42, 125-154.

STELLA/IThink (1992). Software products. Lyme, NH: High Performance Systems.

Acknowledgments

The development of SIGMA has been an interdisciplinary effort, including contributions from many people over several years. Primary contributors include: Aseem Das, Jennifer Dungan, Caitlin Griffith, Chris McKay, Pandu Nayak, Esther Podolak, Michal Rimon, Michael Sims, and David Thompson. Thanks to Jennifer and Barry for providing useful comments on this article.

Joint funding for SIGMA has come from NASA's Applied Information Systems Research Program and from NASA's Mission Operations Program.