Not everybody displays data warehouse information in the same way. To avoid confusion and establish some uniform ground rules, we have created four so-called special dimension members. Do you want to refresh your memory on data warehouses, dimensions or fact tables? Then be sure to read my previous blog post on the subject.
NULL values in records may not only cause errors in calculations, they may also be interpreted differently by data warehouse users. To handle NULLs consistently across data warehouse scenarios, we introduce special row members in the dimension tables. Each one gives NULL an explicit meaning:
- NULL as Unknown
- NULL as Not Specified
- NULL as Not Applicable
- NULL as History/Archive
Figure 1: Product dimension table including special members
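As a rough sketch of what such a dimension table could look like, the snippet below builds a small product dimension in an in-memory SQLite database. The table name `dim_product`, the column names, the negative surrogate keys and the two sample products are all assumptions made for illustration; they are not taken from the figure.

```python
import sqlite3

# Hypothetical reserved surrogate keys for the four special members.
# Negative values are an assumption; any range that cannot collide
# with regular product keys would do.
SPECIAL_MEMBERS = [
    (-1, "UNKNOWN", "Unknown"),
    (-2, "NOT_SPECIFIED", "Not Specified"),
    (-3, "NOT_APPLICABLE", "Not Applicable"),
    (-4, "HISTORY", "History/Archive"),
]

# Made-up regular products, only there to show both kinds of rows side by side.
SAMPLE_PRODUCTS = [
    (1, "P-100", "Bicycle"),
    (2, "P-200", "Helmet"),
]


def create_product_dimension(conn):
    """Create the dimension table and seed it with special and regular members."""
    conn.execute(
        """
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_code TEXT NOT NULL UNIQUE,
            product_name TEXT NOT NULL
        )
        """
    )
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", SPECIAL_MEMBERS)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", SAMPLE_PRODUCTS)
    conn.commit()


conn = sqlite3.connect(":memory:")
create_product_dimension(conn)

rows = conn.execute(
    "SELECT product_key, product_name FROM dim_product ORDER BY product_key"
).fetchall()
for row in rows:
    print(row)
```

Because the special members are ordinary rows, a fact table can reference them through the same foreign key as any regular product, and no column in the fact table needs to hold NULL.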
The Unknown special member is used as a replacement for a missing mandatory value. To illustrate: the sale of a product X, on date A, to customer B. Suppose that the sales record carries a wrong product code, either an invalid one or none at all. To prevent incorrect or incomplete data from being uploaded into the data warehouse, one possible solution would be simply not to upload this sales fact. However, this would also affect analyses that involve other dimensions: sales reports by time, customer and so on would no longer be generated over the complete data set.
Allocating this sales fact to the Unknown special member is a better answer. The sales row can then be added to the fact table, and analyses of sales data along the other dimensions remain complete. So, the Unknown member represents an erroneous situation: it is applied when a mandatory dimension value is missing or does not exist in the dimension table concerned. If it later turns out that the sales fact can be linked to a valid product code, the row can be uploaded again, referring to the correct product.
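The Unknown scenario could be sketched as follows. The key value `-1`, the product codes, the sample sales rows and the helper names `resolve_product_key` and `relink_sale` are hypothetical; a real ETL tool would do the lookup against the dimension table itself.

```python
from collections import defaultdict

UNKNOWN_KEY = -1  # hypothetical surrogate key of the Unknown member

# Product code -> surrogate key, as it might be read from the dimension table.
product_keys = {"P-100": 1, "P-200": 2}


def resolve_product_key(product_code):
    """Return the product key, or the Unknown key for a missing or invalid code."""
    return product_keys.get(product_code, UNKNOWN_KEY)


# Made-up source records: sale 2 has an invalid code, sale 3 has no code at all.
source_sales = [
    {"sale_id": 1, "customer": "B", "product_code": "P-100", "amount": 250.0},
    {"sale_id": 2, "customer": "B", "product_code": "P-999", "amount": 40.0},
    {"sale_id": 3, "customer": "C", "product_code": None, "amount": 15.0},
]

# Every sale is loaded; none is dropped because of a bad product code.
fact_sales = [
    {
        "sale_id": row["sale_id"],
        "customer": row["customer"],
        "product_key": resolve_product_key(row["product_code"]),
        "amount": row["amount"],
    }
    for row in source_sales
]

# Analysis along another dimension (customer) still covers the full data set.
sales_per_customer = defaultdict(float)
for fact in fact_sales:
    sales_per_customer[fact["customer"]] += fact["amount"]


def relink_sale(facts, sale_id, corrected_code):
    """Point a fact at the correct product once a valid code becomes known."""
    for fact in facts:
        if fact["sale_id"] == sale_id:
            fact["product_key"] = resolve_product_key(corrected_code)


# Suppose it later turns out that sale 2 was really product P-200.
relink_sale(fact_sales, 2, "P-200")
```

In this sketch, the total for customer B includes the sale with the invalid product code, which would have been lost if the row had been rejected during the upload.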
Sometimes, facts are linked to non-mandatory dimensions. For example: suppose that our fact table registers online customer surveys. A possible dimension would be an optional telephone number. In many survey results this value would be missing. This is not an erroneous situation but an acceptable one, so these facts are allocated to the Not Specified special member.
In certain fact tables, the absence of a dimension value is an explicit choice. For example: a vacation trip. Often, some type of insurance (repatriation, accidents, cancellation …) will be offered. However, not opting for an insurance package is an acceptable situation, and such trips are allocated to the Not Applicable special member.
Sometimes a data warehouse has been operational for a while when new business needs require the upload of a data set from an old, archived system. To illustrate: the production orders from a former ERP system that has since been replaced. Some dimension values, such as product categories, may no longer be available. In this case, the fact table could be expanded with the old production records, using the History/Archive special member as the value for the product category dimension. This way, old data can be uploaded into an existing model.
Special members for NULL scenarios
The distinct NULL scenarios described above should no longer force you to expand your data warehouse model. Using special row members in the dimension tables allows you to keep using your existing fact tables. The ETL process should map facts to the corresponding special members.
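A minimal sketch of such a mapping step is shown below. The key values, the function name `resolve_member_key` and the flags `mandatory`, `declined` and `from_archive` are assumptions for illustration: how a source system signals an explicit choice or an archived origin will differ from case to case.

```python
from enum import IntEnum


class SpecialMember(IntEnum):
    """Hypothetical surrogate keys reserved for the special members."""
    UNKNOWN = -1
    NOT_SPECIFIED = -2
    NOT_APPLICABLE = -3
    HISTORY = -4


def resolve_member_key(value, lookup, *, mandatory, declined=False, from_archive=False):
    """Map a source value to a dimension key, using special members for NULLs.

    value        -- business key from the source record, or None
    lookup       -- dict of business key -> surrogate key for the dimension
    mandatory    -- True if the dimension must be filled in for this fact
    declined     -- True if the absence is an explicit choice (assumed flag)
    from_archive -- True if the record comes from an archived system (assumed flag)
    """
    if declined:
        return SpecialMember.NOT_APPLICABLE
    if value is None:
        if from_archive:
            return SpecialMember.HISTORY
        return SpecialMember.UNKNOWN if mandatory else SpecialMember.NOT_SPECIFIED
    # A value that is present but absent from the dimension is also an error case.
    return lookup.get(value, SpecialMember.UNKNOWN)


# Made-up lookup for an insurance dimension.
insurance_keys = {"CANCELLATION": 10, "REPATRIATION": 11}

key_valid = resolve_member_key("CANCELLATION", insurance_keys, mandatory=False)
key_declined = resolve_member_key(None, insurance_keys, mandatory=False, declined=True)
key_optional = resolve_member_key(None, insurance_keys, mandatory=False)
key_missing = resolve_member_key(None, insurance_keys, mandatory=True)
key_archive = resolve_member_key(None, insurance_keys, mandatory=True, from_archive=True)
```

The order of the checks is a design choice in this sketch: an explicit choice takes precedence, then the archived origin, and only then the mandatory or optional nature of the dimension.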