Data Warehouse

How it started

Data warehouse has become an important part of support information system for audit work of the Supreme Audit Office. We process and evaluate available data from individual institutions of public administration. Prior to the setup of the data warehouse, the processing of external data took place in separate departments without complex coordination. This resulted in duplication, higher error rate, and incompatible outputs. As a consequence, the work of auditors was not always as effective and efficient as needed due to longer collection and preparation of the data. Consequently, the auditors did not have enough time for the actual audit activity.

To eliminate the above mentioned hindrances, it was the reason to centralise the collection of selected data, and provide for their processing in the data warehouse (DWH). Our focus was mainly set on data integrity and their historisation (performance of analyses focusing on time development, and data series over time). We had evaluated the pilot project and started to build the data ware house in January 2015.

Content and technology

We have chosen a standard database solution based on the Microsoft SQL Server 2012 platform. Thanks to its components it provides for corresponding environment with required flexibility and possibility for upgrade. Due to the potential variability of external sources, we had to ensure the flexibility of import procedures (various data formats, physical location, technical possibilities of data administrators, etc.). ETL processes are carried out within the SQL Server Integration Services (SSIS). At present, we upload the data in batches with monthly or quarterly periods. Most of the incoming files are in Excel (*.xlsx) or dBase (*.dbf) format. The intermediate stage between source files and tables in the DWH is secured by stage databases which, inter alia, help to correct and validate the data. The final product of the whole process is than a relational database containing data in the form of fact and dimensional tables. Nowadays, we have been storing data from three information systems of public administration.

An imaginary transition between the DWH and BI at the Supreme Audit Office provides OLAP technology and its multidimensional structure (cube). The structure is defined with the help of SQL Server Analysis Services (SSAS), and produces data readable even for ordinary users. Another advantage of this structure is the speed of data access. The speed is enabled by data storage in MOLAP which ensures the best performance for queries. This we use mainly for analyses and reporting.

As it was not possible to specify all tasks and requirements for data processing in advance, we had to arrange for substantially flexible and user-friendly tools. This role has been currently fulfilled by reliable MS Excel and a third tool from the Data Tools series - SQL Server Reporting Services (SSRS).

We carried out a partial upgrade of the system in the second half of 2016, and while the Database Engine and SSIS stayed in the 2012 versions, the BI part (SSAS and SSRS) are running separately on SQL Server 2016.

Future

For the next steps in development of our DWH we plan to use modern, but well-tried technology, for example, we will migrate the system into the SQL Server 2016 environment. An inseparable part of the DWH extension process is a gradual transition from the batch file import to fully automated import process within information systems of public administration.

The content should be consequently expanded by the use of new resources and the possibilities in the field of “Open Data”. This is currently beyond the DWH area, as it would concern directly prompted and connected LOD for immediate analyses of external data.

All SAO staff involved in the DWH agenda strives for the effectiveness of the whole process. It means that the DWH should now primarily integrate data from the most frequently used systems and stand as a reliable source of information in order to make the activities of the SAO more efficient.

„It is a capital mistake to theorise before one has data.“

Arthur Conan Doyle, Sherlock Holmes