Preparing an initial data load for a model sometimes requires almost as much work as creating and maintaining the model's dynamics. Data inconsistencies and data holes require attention. In a model like IFs, which combines physical representations in partial equilibrium sectoral models (agriculture and energy) with a general equilibrium multi-sector model represented in value terms, there is also the need to reconcile the physical and value data.
Creating a data pre-processor within IFs moved the project from manual handling of data-load issues to automatic, algorithm-based processing. The pre-processor greatly facilitates both partial data updates as better data become available and rebasing of the entire model to a new initial year (such as the rebasing from 2005 to 2010). It works with an extensive raw data file covering all areas of the model, with data gathered for 1960 through the most recent year available. This allows it to create a historical data load (based in 1960) for historical validation analysis, as well as the load for forecasting.
It is not the purpose here to fully document the pre-processor, but a summary description is important (see Hughes and Irfan 2006 for much more detail). In general, pre-processing begins with demographics and imposes total population data on the cohort-specific data by normalizing cohort numbers to the total. The pre-processor reads values for a wide range of population-related variables: total fertility rate, life expectancy, crude birth rate, crude death rate, urban population, migration, etc. IFs uses cross-sectionally estimated relationships to fill holes in such data, generally with separate functions for the 1960 (historical base) and 20xx (forecasting base) data loads. Most often, functions driven by GDP per capita at purchasing power parity (PPP) have had the highest correlations with existing data; the best functions have often been logarithmic, because the most rapid structural change occurs at lower levels of GDP per capita (Chenery 1979; Hughes 2001). The philosophy in demographics, and in subsequent issue areas of the pre-processor, is that values for all countries in IFs come from data when data are available and are estimated when they are not.
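The two demographic steps just described, normalizing cohorts to a reported total and filling holes with a logarithmic function of GDP per capita, can be sketched in Python. This is a minimal illustration under stated assumptions, not the IFs implementation: the function names and data are hypothetical, the fit is a plain ordinary least squares regression of the value on ln(GDP per capita), and the actual IFs functions are estimated separately for each variable and each base year.

```python
import math
from typing import Dict, List, Optional, Tuple


def normalize_cohorts(cohorts: List[float], total: float) -> List[float]:
    """Scale cohort-specific values so that they sum to the reported total."""
    cohort_sum = sum(cohorts)
    if cohort_sum <= 0:
        raise ValueError("cohort values must have a positive sum")
    factor = total / cohort_sum
    return [c * factor for c in cohorts]


def fit_log_function(gdp_pc: List[float], values: List[float]) -> Tuple[float, float]:
    """Fit value = a + b * ln(gdp_pc) by ordinary least squares; return (a, b)."""
    xs = [math.log(g) for g in gdp_pc]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fill_holes(
    data: Dict[str, Optional[float]], gdp_pc: Dict[str, float]
) -> Dict[str, float]:
    """Keep observed values; estimate missing ones from the cross-sectional fit."""
    observed = [c for c, v in data.items() if v is not None]
    a, b = fit_log_function(
        [gdp_pc[c] for c in observed], [data[c] for c in observed]
    )
    return {
        c: v if v is not None else a + b * math.log(gdp_pc[c])
        for c, v in data.items()
    }


# Hypothetical cohort counts rescaled to a reported total of 120.
print(normalize_cohorts([10.0, 20.0, 30.0], 120.0))  # [20.0, 40.0, 60.0]

# Hypothetical life expectancy data with a hole for country "D".
life_exp = {"A": 60.0, "B": 70.0, "C": 75.0, "D": None}
gdp = {"A": 1000.0, "B": 5000.0, "C": 20000.0, "D": 2500.0}
print(fill_holes(life_exp, gdp))
```

Observed values pass through untouched, which mirrors the stated philosophy of using data where it exists and estimating only where it does not.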
The pre-processor then proceeds to the agriculture and energy issue areas. In agriculture, it reads data on production and trade. It aggregates production of various crops into the single crop production variable used by the model, and similarly aggregates meat and fish production. It computes apparent consumption (production plus imports minus exports). It reads data on variables such as water use and the use of grain for livestock feed. It uses estimated functions or algorithms to fill holes and to check consistency (for instance, checking grain use against livestock herd and grazing land data).
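As a minimal sketch of the aggregation and apparent-consumption steps, the Python below sums crop-level production into a single crop total and computes apparent consumption as production plus imports minus exports. The crop names, quantities, and the negative-consumption flag are hypothetical; the source does not specify which consistency rules IFs applies beyond the examples it gives.

```python
from typing import Dict, Iterable, Tuple


def aggregate(by_item: Dict[str, float], items: Iterable[str]) -> float:
    """Sum production over the listed items; items with no data count as zero."""
    return sum(by_item.get(item, 0.0) for item in items)


def apparent_consumption(
    production: float, imports: float, exports: float
) -> Tuple[float, bool]:
    """Return production + imports - exports, plus a flag if the result is negative.

    The flag is a hypothetical consistency check: negative apparent consumption
    signals inconsistent production or trade data.
    """
    consumption = production + imports - exports
    return consumption, consumption < 0


# Hypothetical crop production for one country, in million metric tons.
crops = {"wheat": 10.0, "maize": 5.0, "rice": 3.0}
crop_total = aggregate(crops, ["wheat", "maize", "rice", "sorghum"])
print(crop_total)                                  # 18.0
print(apparent_consumption(crop_total, 2.0, 4.0))  # (16.0, False)
```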
In energy, the pre-processor reads energy production and consumption data and converts them to common units (billion barrels of oil equivalent). It checks production and reserve/resource data against each other and adjusts reserves and resources when they are inconsistent. Null or missing production values are often overridden with a very small non-zero value so that a “seed” exists in each production category for the model's subsequent dynamics (a technique used by the OECD's Interfutures model). World energy exports and imports are summed; world trade is set at the average of the two, and country-specific levels are normalized to that average.
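The seeding and trade-balancing steps for energy can be sketched in Python. This is an illustrative sketch, not IFs code: the seed value of 1e-6 billion barrels of oil equivalent and the function names are hypothetical, and zero values are treated as null for seeding. The balancing rule follows the text: world trade is set to the average of world exports and world imports, and each country's exports and imports are scaled proportionally so that both sides sum to that average.

```python
from typing import Dict, Optional, Tuple

SEED = 1e-6  # hypothetical seed, in billion barrels of oil equivalent


def seed_production(
    production: Dict[str, Optional[float]], seed: float = SEED
) -> Dict[str, float]:
    """Replace null or zero production with a tiny seed value."""
    return {k: seed if v is None or v == 0 else v for k, v in production.items()}


def balance_trade(
    exports: Dict[str, float], imports: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Normalize country exports and imports to the average of the world totals."""
    world_exports = sum(exports.values())
    world_imports = sum(imports.values())
    world_trade = (world_exports + world_imports) / 2
    scaled_exports = {c: x * world_trade / world_exports for c, x in exports.items()}
    scaled_imports = {c: m * world_trade / world_imports for c, m in imports.items()}
    return scaled_exports, scaled_imports


print(seed_production({"coal": 2.5, "hydro": None, "nuclear": 0.0}))
exports_bal, imports_bal = balance_trade({"A": 6.0, "B": 4.0}, {"A": 3.0, "C": 9.0})
print(sum(exports_bal.values()), sum(imports_bal.values()))  # both close to 11.0
```

Proportional scaling preserves each country's share of world exports and of world imports while closing the global gap; the sketch assumes both world totals are positive.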
The outputs from processing agricultural and energy data become inputs to the economic stage of pre-processing. Economic processing begins by reading GDP at both market exchange rates and purchasing power parity and saving the ratio of the two for later use in forecasting. The first substantive stages of economic data pre-processing center on trade. Total imports and exports for each country are read; world sums are computed, world trade is set at the average of world imports and exports, and country imports and exports are normalized to that global average. Physical quantities of agricultural and energy trade are read and converted to value terms. Data on materials, merchandise, service, and ICT trade are read. Merchandise trade is checked to ensure that it is at least as large as the combined food, energy, and materials trade, and manufactures trade is computed as the residual. All categories of trade are then normalized. When this process is complete, the global trade system is in balance. IFs' use of pooled rather than bilateral trade makes this easier, but a similar process could be applied to bilateral trade with Armington structures.
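A minimal Python sketch of two of the trade steps above: converting physical trade to value terms and deriving manufactures trade as the residual of merchandise trade. The price, quantities, and function names are hypothetical, and so is the adjustment rule used when reported merchandise trade falls short of food, energy, and materials trade combined (raising merchandise to that sum); the text says only that merchandise trade is checked against those categories.

```python
from typing import Tuple


def to_value(quantity: float, price_per_unit: float) -> float:
    """Convert a physical trade quantity to value terms at a given price."""
    return quantity * price_per_unit


def manufactures_residual(
    merchandise: float, food: float, energy: float, materials: float
) -> Tuple[float, float]:
    """Return (checked merchandise, manufactures), manufactures being the residual."""
    primary = food + energy + materials
    # Hypothetical rule: merchandise may not fall below food + energy + materials.
    checked_merchandise = max(merchandise, primary)
    return checked_merchandise, checked_merchandise - primary


# Hypothetical: 0.5 billion barrels of oil equivalent at 60 dollars per barrel.
energy_value = to_value(0.5, 60.0)  # 30.0 billion dollars
print(manufactures_residual(100.0, 10.0, energy_value, 15.0))  # (100.0, 45.0)
print(manufactures_residual(40.0, 10.0, energy_value, 15.0))   # (55.0, 0.0)
```

Treating manufactures as the residual guarantees that the trade categories add up to merchandise trade, and the check keeps that residual from going negative.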
The processes for filling the social accounting matrix (SAM) with goods and services production and consumption, and with financial flows among agent classes, follow next; more information can be found in the documentation of the economic model. The pre-processor then moves on to other models in IFs, including education, health, infrastructure, and socio-political models.
See Hughes and Hossain (2003) for earlier documentation of the SAM and its initialization in the pre-processor, and Hughes with Irfan (2006) for documentation of the pre-processor.