Spatial Modeling Environment Overview

This page describes the structure of our Spatial Modeling Environment, the ModelBase-View-Driver architecture, and our Modular Modeling Language. It then discusses some computational issues related to distributed ecosystem modeling (interprocessor communication, portability, load balancing, and linking existing simulation code).

1. INTRODUCTION

Economic activity, climate, populations, technology, and environmental change are all intimately intertwined. There are many signs that the collective global economic activity is dramatically altering the self-repairing aspects of the global ecosystem. Our ability to change the economic and the ecological systems, and the rate of spread of the impacts of these changes, far exceed our ability to predict the full extent of these impacts. Protecting and preserving our natural life support systems requires the ability to understand the direct and indirect effects of human activities over long periods of time and over large areas. Crucial questions, like the long-term ecological and economic implications of global anthropogenic emissions of CO2, methane, sulfur oxides or other gases, the economic evaluation of long-term ecological damages, and the prospects for sustainable development in a world with high rates of population growth, are not being adequately investigated. Computer simulations are now becoming important tools to investigate these interactions, but our modeling and understanding of these systems has been largely isolated and unconnected in disciplinary specialties.

Such questions are not being addressed, except in a piecemeal fashion, because of the formidable challenges of comprehending and modeling the extensive interconnections between the climate system, natural ecosystems, and socioeconomic systems, although each of these systems has been usefully modeled in smaller, isolated pieces. We believe that it is now possible, because of recent developments in data acquisition and display, information-sharing technologies, and large-scale computing and modeling, to build computer simulation models that cover the salient features of the world's climate, economy, and ecosystems, and the major interactions among them. To achieve this goal it will be necessary to mobilize this scholarly community within a worldwide "collaboratory" based on new electronic information sharing technologies ( Smarr 1994 ), bringing together leaders in advanced computation and software development with leaders in global ecological and economic data collection and modeling. The combined global ecological economic models will provide both an integrated conceptual framework and a practical tool allowing researchers from many disciplines to collaborate in producing effective answers to the questions mentioned above. The following sections discuss an ongoing development program in support of this collaborative spatial modeling effort.

2. SUPPORTING COLLABORATIVE GLOBAL MODELING

Spatially explicit modeling of ecological-economic systems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future system behavior ( Risser, Karr et al. 1984 ; Costanza, Sklar et al. 1990 ; Sklar and Costanza 1991 ). There exists a rich set of research problems associated with the implementation of computer based collaborative technologies for global scale spatially-articulated ecological economic modeling. Five important areas of ongoing research and development are integrated support for 1) modular, collaborative model development, 2) transparent access to high performance computing resources, 3) graphical display & manipulation of model structure and dynamics, 4) multiple spatial representations, and 5) multiple dynamic modes.

2.1 Collaborative, Modular Model Development

Development of ecosystem models in general has been limited by the ability of any single team of researchers to deal with the conceptual complexity of formulating, building, calibrating, and debugging complex models. The need for collaborative model building has been recognized ( Goodall 1974 ; Acock and Reynolds 1990 ) in the environmental sciences. Realistic ecosystem models are becoming much too complex for any single group of researchers to implement single-handedly, requiring collaboration between species specialists, hydrologists, chemists, land managers, economists, ecologists, and others. The current generation of models tends to be "idiosyncratic monoliths that are comprehensible only to the builders" ( Acock and Reynolds 1990 ). Communicating the structure of the model to others can become an insurmountable obstacle to collaboration and acceptance of the model. Policy makers are unlikely to trust a model they don't understand.

A well-recognized method for reducing program complexity involves structuring the model as a set of distinct modules with well-defined interfaces ( Gauthier and Ponto 1970 ; Goodall 1974 ; Acock and Reynolds 1990 ; Silvert 1993 ). Modular, hierarchical model structuring is well developed in the context of discrete event modeling ( Zeigler 1976 ), but has received comparatively little development in the realm of continuous modeling ( Goodall 1974 ; Cellier 1991 ; Silvert 1993 ). Ecosystem models with a modular hierarchical structure should be closer to natural ecosystem structure than procedural models ( Goodall 1974 ; Silvert 1993 ), since the component populations of ecosystems are themselves complex hierarchical systems with their own internal dynamics. Modular design facilitates collaborative model construction, since teams of specialists can work independently on different modules with minimal risk of interference. Modules can be archived in distributed libraries and serve as a set of templates to speed future development. The inheritance property of object-oriented languages allows the properties of object-modules to be utilized and modified without editing the archived object. A modeling environment that supports modularity could provide a universal modeling language to promote worldwide collaborative model development.

2.2 High Performance Computing

Tremendous computational resources are required to integrate the equations of a large spatial model in a reasonable amount of computer time. Large models typically require supercomputers for efficient execution. This class of models is a near ideal application for parallel processing since a typical model consists of a large number of cells that can be simulated semi-independently. Each processor can be assigned a different subset of cells, and most interprocessor communication is nearest-neighbor only. Despite their great promise and increasing availability, parallel architectures have not found much usage in the life sciences. The major barrier to wide acceptance of these techniques has been the difficulty of programming and debugging large parallel programs, and reluctance on the part of scientists to invest time in learning new languages and architectures. Model builders must usually make a substantial time investment learning a new language or development system when beginning work on a parallel computer.

2.3 Graphical Display

A second step toward reducing model complexity involves the utilization of graphical, icon-based module interfaces, wherein the structure of the module is represented diagrammatically, so that new users can recognize the major interactions at a glance. Scientists with little or no programming experience can begin building and running models almost immediately. Inherent constraints make it much easier to generate bug-free models. Built-in tools for display and analysis facilitate understanding, debugging, and calibration of the module dynamics.

One major advantage of this graphical approach to modeling is that the process of modeling can become a consensus building tool. The graphical representation of the model can serve as a blackboard for group brainstorming, allowing policy makers, scientists, and stakeholders to all be involved in the modeling process. New ideas can be tested and scenarios investigated using the model within the context of group discussion as the model grows through a collaborative process of exploration. When applied in this manner the process of creating a model may be more valuable than the finished product.

2.4 Multiple Spatial Representations

Building realistic spatially explicit ecological-economic models requires the integration of multiple spatial data structures in a single model. For example, variables such as elevation and vegetation cover may require a grid representation, while entities such as roads, rivers, and canals may favor a vector representation. An "area" representation may be most appropriate for lumped-parameter models that may be embedded in a spatial grid, such as a spatially-aggregated lake model that covers multiple grid cells in a landscape. Other objects may be represented as mobile points, such as entities that can wander around in the landscape. These and other spatial data structures should be implemented in the modeling environment, and the details of linking, transferring data between, and decomposing (over multiple processors) spatial representations should be transparent to the modelers.

2.5 Multiple Temporal Modes

Building realistic ecological-economic models requires the integration of multiple dynamic modes in a single model. For example, many processes are best represented using differential equations, others are best represented using event-based simulation, and others, such as input-output economic models, use a "black-box" or look-up table implementation. Some processes, such as storm events, are best handled with a hybrid approach. Since continuous (differential-equation based) simulation can be emulated in a discrete event framework (but not vice-versa), the underlying temporal dynamics of the simulation environment should be event based, but structured to efficiently emulate continuous systems.
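The idea of emulating continuous dynamics inside an event-based framework can be sketched as follows. This is a minimal illustration, not the SME implementation: the function name simulate, the use of forward Euler integration, and the "pulse" event type (standing in for something like a storm event) are all assumptions made for the example. Each integration step is scheduled as an event in a time-ordered queue, and discrete events share the same queue.

```python
import heapq

def simulate(rate_fn, y0, dt, t_end, pulses=()):
    """Advance dy/dt = rate_fn(t, y) with Euler steps scheduled as events.

    pulses: iterable of (time, delta) discrete events added to y.
    Returns the final value of y and a list of (time, y) after each event.
    """
    queue = []  # heap of (time, sequence, kind, payload)
    sequence = 0
    heapq.heappush(queue, (0.0, sequence, "step", 0))
    for when, delta in pulses:
        sequence += 1
        heapq.heappush(queue, (float(when), sequence, "pulse", delta))

    t, y = 0.0, float(y0)
    history = []
    while queue:
        when, _, kind, payload = heapq.heappop(queue)
        if when > t_end:
            break
        # Continuous part: carry the state forward to the event time (Euler).
        y += rate_fn(t, y) * (when - t)
        t = when
        if kind == "pulse":
            # Discrete part: apply the instantaneous change.
            y += payload
        else:
            # A "step" event only reschedules itself one timestep later.
            sequence += 1
            heapq.heappush(queue, ((payload + 1) * dt, sequence, "step", payload + 1))
        history.append((t, y))
    return y, history

# Exponential decay alone, then the same decay with a pulse at t = 0.5.
decay_only, _ = simulate(lambda t, y: -y, 1.0, 0.1, 1.0)
with_pulse, _ = simulate(lambda t, y: -y, 1.0, 0.1, 1.0, pulses=[(0.5, 1.0)])
```

Because every state change passes through the same queue, a hybrid process needs no special handling: the pulse simply arrives between two integration steps.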

3. SPATIAL MODELING ENVIRONMENT

In an attempt to address the conceptual and computational complexity barriers to spatio-temporal model development, we have developed a spatial modeling environment (SME), which links icon-based graphical modeling environments with parallel supercomputers and a generic object database ( Costanza and Maxwell 1991 ; Maxwell and Costanza 1994 ; Maxwell and Costanza 1995). This system allows users to create and share modular, reusable model components, and utilize advanced parallel computer architectures without having to invest unnecessary time in computer programming or learning new systems. The following sections give a brief description of the current design of the SME.

The SME design has arisen from the need to support collaborative model development among a large, distributed network of scientists involved in creating a global-scale ecological/economic model. It is intended that its design be progressively more inclusive of the full range of relevant ecologic/economic modeling activities. In the interest of maximizing accessibility to a distributed network of collaborators, the system is designed to support a range of platforms, both in the front-end development environment and in the back-end parallel computing environment. We are thus led to the formulation of a three-part ModelBase-View-Driver architecture. The three components are displayed in the Figure and described below.

3.1 View

The View component of the SME is used to graphically construct, calibrate, and test biological/ecological modules. Although a customized graphical module constructor which can interact directly with the ModelBase would be an optimal choice for this component, due to current time and manpower limitations we have chosen to initially utilize commercially available graphical modeling tools such as STELLA, EXTEND, SimuLab, or VenSim, and later develop our own customized tool as time permits. As an example of a completed ecological module in View see the STELLA version of the CELSS unit model (figure) or the simple example STELLA model (figure).

Thus a number of commercially available modeling packages can be utilized as the View component of the SME (figure). The View graphical interface uses symbols based on Jay Forrester's systems dynamics language (Forrester, 1961), which has become popular among modeling practitioners as a way to define and communicate a model's structure, as well as user-defined symbols. These icons, representing variables and functional relationships in the model, are manipulated with the mouse to graphically build the model structure. Once the structure is established, the dynamics of the model are defined by clicking on the appropriate icon to generate a dialog box. The defining equations can then be typed in analytically, making use of numerous built-in functions, or entered graphically, using either a graph pad or a data table. In some packages specialized modules exist for implementing either continuous-time or discrete-time dynamics. When the definitions are complete, the model can be run. View will scale and plot the variables of interest in various formats. View greatly increases the ease with which one can change the model and see the effects of those changes on the model's behavior. It allows the computer to handle the computational details and frees the user to concentrate on modeling, greatly reducing model development time.

3.2 ModelBase

In the next step toward constructing a spatial model, the Module Constructor translates the View ecosystem component modules into Module objects defined in our text-based Modular Modeling Language (MML). An example of a simple STELLA model simulating a predator-prey system is shown in the figure and the corresponding set of MML objects is shown in the table. The MML modules can then be archived in the ModelBase to be accessed by other researchers, and/or used immediately to construct a working spatial simulation. Many MML objects can be combined hierarchically in the MML. This MML hierarchy can then be converted by the Code Generator into a C++ object hierarchy within the spatial modeling environment (SME), where it drives a spatial simulation.

The MML, which is described in greater detail below, is designed to capture only the relevant dynamics of the simulation module being constructed, and leave out all implementation-specific details. For example, the features that can be represented in the MML include dynamics of growth, death, and transformation of biological/ecological entities, fluxes of water, nutrients, pollutants, etc., and the internal decision and learning processes of biological agents. The features that are not represented in the MML include the spatio-temporal implementation of the model, input and output of model data, and the distribution of the model over a set of processors. These features are implemented by the Code Generator and the simulation drivers.

3.3 Code Generators

The Code Generators convert a MML object hierarchy into a C++ object hierarchy which is incorporated into the simulation driver application to create a spatial simulation. The user customizes the set of objects generated by entering information into a set of configuration files that are initially generated by the Code Generator. An example of a simple configuration file is shown in the table. In the final version a menu-driven interface will be provided to facilitate this configuration step. During the configuration step the user specifies the additional information that is required to transform the MML object into a dynamic simulation object. The information entered falls into several general categories:

1) Space-time implementation. In this step, each MML object is associated with a frame, which specifies its space-time implementation. A frame is a C++ object which specifies the topology of the spatial implementation of the module, methods for interacting with and transferring data to other frames, and temporal methods for handling the passing of time. The driver geometry object maintains a catalog of available frames. Examples of available frames include two-dimensional grids (e.g. for landscapes), graphs and networks (e.g. for river, canal, or neural networks), and agents (e.g. for individual agents moving about in the landscape). The user specifies a frame type (in the configuration language, see the table) as well as one or more GIS maps that the frame will read at runtime to configure itself. For example, with the g(2D,StudyArea) command in the table the user has configured a two-dimensional grid frame with an initialization map called StudyArea. At runtime the frame will read the map StudyArea from the GIS, configure itself to have the same number of rows and columns as StudyArea, and denote all cells corresponding to non-zero values in StudyArea as active.

2) Input/Output configuration. In this step the user configures input to the simulation from the biological/ecological databases and GIS. Input configuration must be done at code generation time because the Code Generator uses this information (together with the variable dependency graph) to determine variable types. Output configuration is done at runtime, although default values can be specified in the CG configuration files. For example, the command d(HareMap.PLM94) in the table denotes that at runtime the map HareMap.PLM94 will be read from the GIS and used to initialize the (spatial) variable HARES. The command A(2,0.2,0.0) in the same table means that by default the variable HARE_DENSITY will be output as an animation with a frame generated every 2 simulation timesteps and rescaled using the parameters (0.2,0.0).
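The configuration steps above can be illustrated with a short sketch. It assumes only the general command shape name(arg1,arg2,...) visible in the examples g(2D,StudyArea), d(HareMap.PLM94), and A(2,0.2,0.0); the full configuration grammar is not reproduced here, and the function names parse_command and configure_grid, as well as the in-memory stand-in for the GIS, are hypothetical.

```python
import re

# Assumed general shape of a configuration command: name(arg1,arg2,...)
COMMAND = re.compile(r"^\s*([A-Za-z]\w*)\(([^()]*)\)\s*$")

def parse_command(text):
    """Split a command such as 'g(2D,StudyArea)' into its name and arguments."""
    match = COMMAND.match(text)
    if match is None:
        raise ValueError(f"not a configuration command: {text!r}")
    name, arg_text = match.groups()
    args = [a.strip() for a in arg_text.split(",")] if arg_text.strip() else []
    return name, args

def configure_grid(study_area):
    """Size a 2D grid frame from a map and mark non-zero cells as active."""
    rows, cols = len(study_area), len(study_area[0])
    active = {(r, c)
              for r in range(rows)
              for c in range(cols)
              if study_area[r][c] != 0}
    return rows, cols, active

# Hypothetical stand-in for the GIS: map name -> raster of integer codes.
gis = {"StudyArea": [[0, 1, 1],
                     [1, 1, 0]]}

name, args = parse_command("g(2D,StudyArea)")
rows, cols, active = configure_grid(gis[args[1]])
```

Here the grid frame takes its dimensions from the map, and the two zero-valued cells are left inactive, matching the runtime behavior described for the StudyArea map.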




Figure: Overview of the Modelbase-View-Driver architecture.

3.4 PointGrid Library


The PointGrid library (PGL) is a set of C++ distributed objects designed to support computation on irregular, distributed networks and grids. It contains the core set of objects on which the SME Driver is constructed. The PGL builds spatial representations from sets of Point objects (see below) with links. It transparently handles: 1) creation and decomposition (over processors) of PointSets, 2) mapping of data over and between PointSets, 3) iteration over PointSets and Point sub-sets, 4) data access and update at each Point, and 5) swapping of variable-sized PointSet boundary (ghost) regions.

Some of the important PGL classes are illustrated by the following example. Consider the study area for the Patuxent Landscape Model, displayed in Figure 2. Each cell in the (non-black) study area is represented by a Point object. Each cell (Point) in the blue-green/gray area is also part of the PLM river network. The PLM utilizes two PointSets: 1) a base grid PointSet which includes all Points in the study area with links to nearest neighbors in eight directions, and 2) a Network PointSet which includes all Points in the river network, each Point having a single link to its downstream cell.

The white lines on the figure show how a DistributedPointSet object distributes the Points among four processors. The distribution algorithm attempts to allocate equal sub-sets of the study area (base grid) to each processor; all other PointSets inherit the same decomposition.

A Coverage object associates a floating-point number with each Point in its PointSet. Each PointSet contains methods for efficiently 1) iterating over Points in the Set, 2) accessing neighboring Points in the Set, 3) accessing and updating associated Coverage values, and 4) mapping Coverages between PointSets.
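The relationship between Points, PointSets, and Coverages described above can be sketched in a few lines. This is a simplified single-process illustration in Python rather than the PGL's C++ interface; the class and method names below (neighbors, map_to, grid_point_set, network_point_set) are assumptions chosen for the example, and Points are represented as plain (row, column) tuples.

```python
class PointSet:
    """A set of Points, here (row, col) tuples, with links between them."""

    def __init__(self, links):
        self.links = {p: tuple(n) for p, n in links.items()}

    def __iter__(self):
        return iter(self.links)

    def __len__(self):
        return len(self.links)

    def __contains__(self, point):
        return point in self.links

    def neighbors(self, point):
        return self.links[point]

def grid_point_set(mask):
    """Base grid: every non-zero cell, linked to active neighbors in 8 directions."""
    active = {(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v != 0}
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    return PointSet({(r, c): [(r + dr, c + dc) for dr, dc in offsets
                              if (r + dr, c + dc) in active]
                     for (r, c) in sorted(active)})

def network_point_set(downstream):
    """Network: each Point has a single link to its downstream cell (or none)."""
    return PointSet({p: ([d] if d is not None else [])
                     for p, d in downstream.items()})

class Coverage:
    """Associates one floating-point value with each Point of a PointSet."""

    def __init__(self, point_set, fill=0.0):
        self.point_set = point_set
        self.values = {p: float(fill) for p in point_set}

    def __getitem__(self, point):
        return self.values[point]

    def __setitem__(self, point, value):
        if point not in self.point_set:
            raise KeyError(point)
        self.values[point] = float(value)

    def map_to(self, other, fill=0.0):
        """Copy values onto another PointSet wherever the Points coincide."""
        target = Coverage(other, fill)
        for point in other:
            if point in self.values:
                target[point] = self.values[point]
        return target

grid = grid_point_set([[0, 1, 1], [1, 1, 1], [1, 1, 1]])
river = network_point_set({(0, 1): (1, 1), (1, 1): (2, 1), (2, 1): None})
depth = Coverage(grid, fill=0.5)
depth[(1, 1)] = 2.0
river_depth = depth.map_to(river)
```

The river Coverage holds values only for the three network Points, which mirrors the idea that a Network PointSet is a subset of the base grid with its own link structure.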

3.5 Driver

The driver is a distributed object-oriented simulation environment which incorporates the set of code modules that actually perform the spatial simulation on the targeted platform. It is implemented as a set of distributed C++ objects linked by message passing. The code generator produces a set of code modules which are transferred to the target platform, compiled, and linked with the local driver modules to produce a working spatial simulation. The code generators also produce a set of simulation resource files that are used for runtime configuration of model parameters, input, output, and other simulation parameters. The driver then handles input-output of parameter, database, and GIS files and execution of the simulation.

The general structure of the driver is displayed in the figure below. Of all the objects displayed, only the interface object is visible to the user; the rest perform their tasks automatically and invisibly.

Figure: The major Components of the SME driver.

4. A MODULAR MODELING LANGUAGE


The core of the SME is our text-based Modular Modeling Language (MML). The structure and syntax of the MML are described in greater detail in a separate document. As noted in Section 3.2, the MML is designed to capture only the relevant dynamics of the simulation module being constructed and to leave out all implementation-specific details, such as the spatio-temporal implementation of the model, input and output of model data, and the distribution of the model over a set of processors. These features are implemented by the Code Generator and the simulation drivers.

As an example of MML Module development, the table displays a set of MML modules corresponding to the STELLA model in the figure, which simulates a simple deer-vegetation system. These equations were generated by the ModuleConstructor application from the STELLA model's equation export file. There are three modules displayed: DEER_Module (encapsulating deer dynamics), VEGETATION_Module (encapsulating vegetation dynamics), and Globals_Module (representing the linked model). Each Module has a set of Variable Objects, which can be internal (declared with the Variable command) or input from another Module (declared with the Input command). All internal Variables can serve as exports to other modules. The higher level Module Globals_Module incorporates the Connection commands which link outputs of one module to inputs of another. A more detailed example is available separately.

The table displays the configuration file associated with this model. A default version of this file is generated by the ModuleConstructor application. The lines beginning with '$' configure Module objects, and lines beginning with '*' configure Variable objects. The g(2D,file) command links the Modules to 2D grid frames which are initialized with a map file. The d(file) commands declare the associated variables to be map input objects, which are initialized with a map file. The A() commands configure animation output.
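The module structure described above (internal Variables, Inputs, and Connections declared at a higher level) can be modeled with a small data-structure sketch. This is not MML syntax, which is not reproduced here; it is a Python analogy. The module names DEER_Module and VEGETATION_Module come from the example, while the variable and input names (DEER, BIOMASS, FORAGE, GRAZERS) and the helper functions are hypothetical.

```python
class Module:
    """A module with internal variables and named inputs awaiting connection."""

    def __init__(self, name, variables, inputs=()):
        self.name = name
        self.variables = dict(variables)  # internal variables, all exportable
        self.inputs = {input_name: None for input_name in inputs}

def connect(modules, connections):
    """Link (source module, variable) to (target module, input).

    Returns the list of (module name, input name) pairs left unconnected.
    """
    by_name = {m.name: m for m in modules}
    for (src_name, variable), (dst_name, input_name) in connections:
        source, target = by_name[src_name], by_name[dst_name]
        if variable not in source.variables:
            raise KeyError(f"{src_name} has no variable {variable}")
        if input_name not in target.inputs:
            raise KeyError(f"{dst_name} has no input {input_name}")
        target.inputs[input_name] = (source, variable)
    return [(m.name, i) for m in modules
            for i, src in m.inputs.items() if src is None]

def input_value(module, input_name):
    """Read an input through its connection to the exporting module."""
    source, variable = module.inputs[input_name]
    return source.variables[variable]

deer = Module("DEER_Module", {"DEER": 50.0}, inputs=["FORAGE"])
vegetation = Module("VEGETATION_Module", {"BIOMASS": 1200.0}, inputs=["GRAZERS"])

# The role played by the Connection commands in the higher level module.
unresolved = connect([deer, vegetation], [
    (("VEGETATION_Module", "BIOMASS"), ("DEER_Module", "FORAGE")),
    (("DEER_Module", "DEER"), ("VEGETATION_Module", "GRAZERS")),
])
```

Keeping the connections outside the two modules is what lets each module be archived and reused independently; only the linking step knows about both.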

5. PATUXENT LANDSCAPE MODEL EXAMPLE APPLICATION



The current applications of this framework include the Everglades Landscape Model (ELM), spatial modeling at the UIUC GMSLab, and the Patuxent Landscape Model (PLM), a regional landscape simulation model that can address the effects of different management and climate scenarios on the ecosystems in the Patuxent Watershed (see Figure 2). The PLM is being developed as part of the Ecological Ecosystem Models for Evaluating the Interactive Dynamics of the Patuxent River Watershed and Estuary Project funded by the Chesapeake Bay Research & Monitoring Division, Maryland Department of Natural Resources. The PLM contains about 6,000 spatial cells, each containing a dynamic simulation model (based on the GEM model ( Fitz, DeBellevue et al. 1995 )) of approximately 20 state variables partitioned into 14 modules. It uses two frames: a 2D grid frame covering the entire study area (for modules such as Consumers, Nitrogen, Hydrology, Macrophytes, Detritus, etc.) and a tree-network frame covering the river network (blue areas in the figure below) for the River module. The model will be calibrated with data from 1973 and 1985, and run for a scenario analysis period from 1985 to 2020 with selectively variable time steps from hourly to daily depending on forcing function dynamics. Application of this model in the Patuxent watershed is expected to allow extensive analysis of past and future management options, and will form the basis for future application to other areas in the Chesapeake Bay watershed.





Figure 2: The study area for the Patuxent Landscape Model, showing decomposition over 4 processors.

6. DISTRIBUTED PROCESSING

The SME simulation drivers are designed to run on a single platform, a heterogeneous distributed network of platforms, or on most massively parallel supercomputers. The following sections discuss some of the issues that must be addressed in creating a simulation application with this degree of versatility.

6.1 Inter-Processor Communication

When a simulation is distributed over a network of processors, it must stop at each timestep to exchange information between the processors, e.g. to flux water from a part of the landscape being simulated on one processor to an adjacent part being simulated on a different processor. The process of sending a packet of information from one processor to another is called "message passing".
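The per-timestep exchange can be illustrated with a small in-process sketch. No real message-passing library is used: two Python lists stand in for the sub-domains held by two processors, and a plain function call stands in for sending a message. The one-dimensional row of cells, the explicit diffusion update, and all function names are assumptions chosen to keep the example short. Each sub-domain carries one extra "ghost" cell holding a copy of its neighbor's boundary value.

```python
def step_serial(h, k):
    """One explicit diffusion step on a 1D row of cells with closed ends."""
    return step_local(h, k, ghost_left=False, ghost_right=False)

def step_local(local, k, ghost_left, ghost_right):
    """Update the owned cells of a sub-domain; ghost cells are only read."""
    first = 1 if ghost_left else 0
    last = len(local) - (1 if ghost_right else 0)
    new = local[:]
    for i in range(first, last):
        left = local[i - 1] if i > 0 else local[i]
        right = local[i + 1] if i < len(local) - 1 else local[i]
        new[i] = local[i] + k * (left - 2 * local[i] + right)
    return new

def exchange(left_local, right_local):
    """Stand-in for message passing: each side fills the other's ghost cell."""
    to_right = left_local[-2]   # last cell owned by the left sub-domain
    to_left = right_local[1]    # first cell owned by the right sub-domain
    left_local[-1] = to_left
    right_local[0] = to_right

def run_distributed(h, k, steps, split):
    left = h[:split] + [0.0]    # ghost cell on the right end
    right = [0.0] + h[split:]   # ghost cell on the left end
    for _ in range(steps):
        exchange(left, right)   # the simulation pauses here every timestep
        left = step_local(left, k, ghost_left=False, ghost_right=True)
        right = step_local(right, k, ghost_left=True, ghost_right=False)
    return left[:-1] + right[1:]

water = [4.0, 0.0, 0.0, 0.0, 0.0, 0.0]
serial = water
for _ in range(5):
    serial = step_serial(serial, 0.25)
distributed = run_distributed(water, 0.25, steps=5, split=3)
```

With the exchange in place, the distributed run reproduces the single-domain result; omitting it would leave water unable to cross the boundary between the two sub-domains.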

Portability & Message Passing

The SME version 1 distributed simulation driver is currently implemented on the Connection Machine 2 (CM2) and on networks of Transputers and Sun Workstations. Each of these versions was implemented using a different set of communication protocols, making each new version of the driver machine-specific. The SME version 2 simulation driver is implemented using the MPI message passing library, allowing it to be ported to a wide variety of machines with only minor reconfiguration. The goal of this exercise is to generate code that is compatible with a wide range of architectures, not to obtain maximum performance on any single architecture. As increasingly efficient versions of MPI become available for the various platforms supported, the computational efficiency of the SME will increase.

Link configuration

The View component of the SME is utilized to build components of ecosystem site models. The site models correspond to single cells in the spatial array represented by the full spatial model. These site models must be linked by fluxes of materials and information in the process of building the spatial model. The SME provides two methods for configuring interactions between cells.

The first method utilizes the following naming convention within the modeling environment. Variables that are given names such as name@(x,y) (e.g., surface_water_depth@(1,3)) are configured by the code generators to represent the value of variable name (e.g., surface_water_depth) x cells to the north and y cells to the east of the current cell. Through the use of this formalism, the user can configure a wide range of complex interactions between cells.
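A sketch of how such a name might be resolved is shown below. It assumes a row-major grid in which row 0 is the northern edge, so that moving north decreases the row index; it also assumes that negative offsets (south and west) are allowed and that references falling outside the grid return a default value. None of these details is specified above, and the function names are hypothetical.

```python
import re

# name@(x,y): x cells to the north, y cells to the east of the current cell.
REFERENCE = re.compile(r"^(\w+)@\((-?\d+),(-?\d+)\)$")

def parse_reference(text):
    """Split 'surface_water_depth@(1,3)' into (name, north, east)."""
    match = REFERENCE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a cell reference: {text!r}")
    name, north, east = match.groups()
    return name, int(north), int(east)

def resolve(reference, variables, row, col, default=0.0):
    """Look up the referenced variable relative to the cell at (row, col)."""
    name, north, east = parse_reference(reference)
    grid = variables[name]
    r, c = row - north, col + east  # assumption: row 0 is the northern edge
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        return grid[r][c]
    return default

# A 3 x 5 grid whose values encode their own position as row * 10 + column.
variables = {"surface_water_depth": [[float(r * 10 + c) for c in range(5)]
                                     for r in range(3)]}
value = resolve("surface_water_depth@(1,3)", variables, row=2, col=1)
```

From the cell at row 2, column 1, the reference points one row up and three columns right, that is, to row 1, column 4.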

The second method allows the user to attach predefined fluxes, residing in Driver libraries, to variables in the modeling environment (e.g., Manning's equation attached to variable surface_water_flow) by editing the object-constructor configuration file, mentioned above. This process is described in more detail in (Maxwell & Costanza, 1993).
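As an illustration of the kind of predefined flux that could be attached this way, the sketch below implements Manning's equation in SI units, V = (1/n) R^(2/3) S^(1/2), where n is the roughness coefficient, R the hydraulic radius in metres, and S the slope. The dictionary FLUX_LIBRARY is a hypothetical stand-in for a Driver library; how the SME actually registers and invokes fluxes is not described here.

```python
def manning_velocity(roughness_n, hydraulic_radius_m, slope):
    """Mean flow velocity (m/s) from Manning's equation in SI units."""
    if roughness_n <= 0 or hydraulic_radius_m < 0 or slope < 0:
        raise ValueError("roughness must be positive; radius and slope non-negative")
    return (1.0 / roughness_n) * hydraulic_radius_m ** (2.0 / 3.0) * slope ** 0.5

def manning_discharge(roughness_n, area_m2, hydraulic_radius_m, slope):
    """Discharge (m^3/s): velocity times cross-sectional flow area."""
    return area_m2 * manning_velocity(roughness_n, hydraulic_radius_m, slope)

# Hypothetical registry: model variable name -> predefined flux function.
FLUX_LIBRARY = {"surface_water_flow": manning_discharge}

flow = FLUX_LIBRARY["surface_water_flow"](0.03, 2.0, 1.0, 0.0009)
```

Attaching a flux by name keeps the hydraulic formula in one shared place, so a site model only has to declare which variable it drives.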

6.2 Load Balancing

Load balancing is an important concern for this type of application. In SME version 1 the spatial extent of the rectangular grid (see Figure 1) was divided into a set of nearly equally-sized rectangles, which were then distributed among the processors. This arrangement resulted in very efficient inter-processor communication with very poor load balancing, since some processors might be handling a sub-grid that fell entirely outside of the study area, and hence were idle for the duration of the simulation. In SME version 2 the rectangular grid is divided using a recursive N-section algorithm which allocates nearly equal portions of the study area to each processor (see Figure 2). This algorithm results in slightly less efficient inter-processor communication which is more than compensated by excellent load balancing.
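The difference between the two strategies can be seen in a small sketch. The first function counts active cells per processor when the bounding grid is cut into equal rectangles. The second uses recursive bisection of the active cells, the two-way special case of a recursive N-section; it is offered as a simplified stand-in, not as the SME version 2 algorithm, and its cuts may split a row or column between two processors. Function names and the triangular test area are assumptions.

```python
def rectangle_loads(mask, proc_rows, proc_cols):
    """Active cells per processor when the grid is cut into equal rectangles."""
    rows, cols = len(mask), len(mask[0])
    loads = [0] * (proc_rows * proc_cols)
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                owner = (r * proc_rows // rows) * proc_cols + (c * proc_cols // cols)
                loads[owner] += 1
    return loads

def recursive_bisection(cells, parts):
    """Split active cells into `parts` groups of nearly equal size."""
    if parts == 1 or len(cells) <= 1:
        return [cells] + [[] for _ in range(parts - 1)]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    # Cut across the longer extent to keep the pieces compact.
    axis = 0 if max(rows) - min(rows) >= max(cols) - min(cols) else 1
    ordered = sorted(cells, key=lambda p: (p[axis], p[1 - axis]))
    first_parts = parts // 2
    cut = len(ordered) * first_parts // parts
    return (recursive_bisection(ordered[:cut], first_parts)
            + recursive_bisection(ordered[cut:], parts - first_parts))

# Triangular study area inside an 8 x 8 bounding grid (36 active cells).
mask = [[1 if r + c < 8 else 0 for c in range(8)] for r in range(8)]
cells = [(r, c) for r in range(8) for c in range(8) if mask[r][c]]

block_loads = rectangle_loads(mask, 2, 2)
balanced_loads = [len(part) for part in recursive_bisection(cells, 4)]
```

In this example the rectangular split leaves one of the four processors with no active cells at all, while the bisection gives every processor the same number.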

6.3 Linking Existing Simulation Code into the SME

In order to create a new module in the SME one must develop it in the View graphical modeling environment or the Modular Modeling Language (MML). There is a wealth of complex simulation code in existence in the world today, written mainly in FORTRAN or C, that would be too difficult to completely rewrite in the MML or a View-supported language. Therefore we are developing a standalone version of the network object displayed in figure 6 that will form the core of a "SME wrapper". This "wrapper" is a library of FORTRAN or C functions that a simulation developer can embed in existing "legacy" code to give it the ability to interact and exchange data with the SME over the internet. We are currently investigating the feasibility of layering this functionality on top of existing object request broker infrastructure.

Once the wrapper is incorporated into the legacy simulation code, then SME variables can be linked with legacy variables using simple configuration commands. The SME and the legacy code can be run simultaneously and feed information back and forth across the internet. For example, a SME landscape simulation might wish to link with an existing hydrodynamics simulation to handle the hydrodynamics of the watershed.

7. REFERENCES

Acock, B. and J. F. Reynolds (1990). Model Structure and Data Base Development. Process Modeling of Forest Growth Responses to Environmental Stress. R. K. Dixon, R. S. Meldahl, G. A. Ruark and W. G. Warren. Portland, OR, Timber Press.

Cellier, F. E. (1991). Continuous System Modeling. New York, NY, Springer-Verlag.

Costanza, R. and T. Maxwell (1991). "Spatial Ecosystem Modeling Using Parallel Processors." Ecological Modelling 58: 159-183.

Costanza, R., F. H. Sklar, et al. (1990). "Modeling Coastal Landscape Dynamics." BioScience 40: 91-107.

Extend Simulation Software (1995). Imagine That Inc. (408)-365-0305.

Fitz, H. C., E. DeBellevue, et al. (1995). "Development of a General Ecosystem Model (GEM) for a range of scales and ecosystems." Ecological Modelling (in press).

Gauthier, R. L. and S. D. Ponto (1970). Designing Systems Programs. Englewood Cliffs, NJ, Prentice-Hall.

Goodall, D. W. (1974). The Hierarchical Approach to Model Building. Proceedings of the First International Congress of Ecology, Wageningen, Center for Agricultural Publishing and Documentation.

Maxwell, T. and R. Costanza (1994). Spatial Ecosystem Modeling in a Distributed Computational Environment. Toward Sustainable Development: Concepts, Methods, and Policy. J. van den Bergh and J. van der Straaten. Washington, D.C., Island Press: pp. 111-138.

Maxwell, T. and R. Costanza (1995). "Distributed Modular Spatial Ecosystem Modelling." International Journal of Computer Simulation: Special Issue on Advanced Simulation Methodologies 5(3): 247-262.

Risser, P. G., J. R. Karr, et al. (1984). Landscape Ecology: Directions and Approaches, Illinois Natural History Survey, Champaign, IL.

Silvert, W. (1993). "Object-Oriented Ecosystem Modeling." Ecological Modelling 68: 91-118.

Sklar, F. H. and R. Costanza (1991). The Development of Dynamic Spatial Models for Landscape Ecology. Quantitative Methods in Landscape Ecology. M. G. Turner and R. Gardner. New York, NY, Springer-Verlag. 82: 239-288.

Smarr, L. (1994). Personal Communication.

Zeigler, B. P. (1976). Theory of Modeling and Simulation. New York, N.Y., Wiley.