ANOTHER IT CO
Loading Fact Tables
When populating fact tables, the value of foreign keys must be known before a fact row can be inserted. These foreign key values, primary key values in dimension tables, may not be initially known. This most often is the case when new dimension rows are discovered, and your warehouse design requires that dimension table primary key values be automatically assigned by the database. To overcome this situation, a multi-step design must be applied to processing fact rows.
Processing fact rows in a multi-step manner exploits the unique capabilities of DataStage to integrate operating system and database features within a DataStage job.
- Process fact rows without regard to dimension key values, instead retaining dimension column values. Theses dimension column values will later be used to determine dimension key values.
- For each dimension table, create a temporary dimension table in you database whose structure is similar to the dimension table.
- Populate the temporary dimension tables using the retained dimension column values from step 1, setting the dimension key column value to NULL.
- Join the temporary dimension tables with the dimension tables, updating the dimension key column in each temporary dimension table.
- For all rows in the temporary dimension tables with a dimension key column value of NULL, insert the row into its dimension tables.
- Join the temporary dimension tables with the dimension tables for all rows in the temporary dimension tables with a dimension key column value of NULL, updating the dimension key column in each temporary dimension table.
- Create a hash file for each temporary dimension table whose key columns are all columns other than the dimension key value.
- Populate the temporary dimension hash files with the rows from the temporary dimension tables.
- Process the fact rows created in step 1, performing reference lookups to the temporary dimension hash files, resolving the dimension key values, and creating an output file compatible with your database’s bulk loader (e.g. SQLLDR).
- Execute your database’s bulk loader using the file created in step 9 as input.
The implementation of this multi-step process in simpler than its description. The entire process can be implemented as three DataStage jobs and two database scripts.
- Step 1 of the process is simple a DataStage job with multiple outputs, one for the fact table, and one for each (temporary) dimension table.
- Steps 2 through 6 are implemented as a single database script that is executed as an after job stage of the step 1 DataStage job.
- Steps 7 and 8 represent a single DataStage job with an independent active stage to load each hash dimension table from the temporary dimension table.
- Step 9 is another simple DataStage job with a single input and output, and multiple reference lookups.
- Step 10 is another database script that is executed as an after job stage of the step9 DataStage job.
Copyright 2016 Another IT Co