Looking at the BI market place today, a lot of products and vendors are telling us that we don’t need data warehouses to create those flashy and easy-to-use reports and dashboards. Although from a technical point of view this statement could be valid, in the real world you are not only using the data warehouse to gather information in one location. In the real world we see that a data warehouse does much more than that.
Next to just loading the data to a central store, data warehouses also validate data during ETL. This can be done in various ways. At the basic level, data will be checked whether or not it has the correct format or whether the business keys are indeed unique. When a relation should exist, data warehouses validate this relation and verify whether all mandatory fields are filled out.
Besides basic validations, also more advanced data quality checks could be performed. Think about validations to make sure that pricing is set correctly. Combine customer or other information from several sources and make sure that this information is merged correctly and in valid way. So both checking the quality of loaded data as improving or rating the quality are features of data warehouses. These data quality checks and improvements will assist in getting coherent, correct and structured information.
Although data relations were briefly mentioned already in the previous paragraph, it is important enough to mention it again as a separate topic. Linking data correctly is the basic foundation for getting correct information and figures. If a relation is using wrong or non-existing codes, no warnings are given. Yet data will be missing in reports. For example: if you have sales records linked to a customer that is not part of the self-service solution, the sales figures will not be showing up on the reports or dashboard. As a user you will most likely not see that these records are missing, because the tools aren’t equipped to handle that kind of problems.
Having relations validated by performing lookups in the correct dimension tables will make sure that records are linked correctly. If the dimension information is missing but records are still linked, it most likely will turn out to be a dummy record. If all sales records are linked to existing dimensional information, all sales figures will be visualized in reports and dashboards.
After data quality has been checked and relational lookups are performed, we verify the consistency of our data warehouse. We log all data warehouse information to provide the principal users of the system with the necessary information on the loading and data quality. Delivering users with data quality reports enables them to take necessary actions to improve data at the source system, resulting in higher quality at the reporting side. Also, by showing quality issues to other users, they can work proactively on avoiding and fixing the problems. Giving key users a tool to work proactively on quality will give a better “feeling” to end users as opposed to when these users would need to inform the key users about problems with the data.
Next to having data available for analysis and standard reporting, a lot of other projects benefit from automated reports. Some examples that are only a small subset of the existing possibilities:
Looking at operational systems we see that some information is very fragmented. Getting that fragmented information into a model will result in a more difficult reporting model. One of the major advantages of creating a data warehouse is that you can actually convert a complicated source model into an easy to use reporting model that provides information at the correct level.
When looking back at the past, most will probably remember the excel hell where all business users were creating their own reports and giving identical names to different calculations. Next to reinventing the wheel over en over again, this could also lead to different implementations of that same wheel. Leading to different results. So, one of the main reasons for having a data warehouse is having one location where data is loaded consistently over time according to standardized processes. This gives you the same figures with the same co-notation in different reports.
As you can see, there are plenty of reasons to perform your self-service reporting on top of a data warehouse. Especially for data that is at the core of your organisation. This doesn’t mean of course that you can’t enrich data with external or your own created data. Extending the information will give added value and more insight.