Testing for BI & DWH

Since ever testing is part of every IT project plan – that’s true as well for Business Intelligence (BI) & Data Warehouse (DWH) projects. The practical implementation of testing in the BI / DWH environment has confronted me with troubles in the past again and again. Often I’ve had the impression that the BI / DWH world is still back in the Stone Age regarding development processes and environments. At least it is significantly behind the maturity level I know from the software engineering domain. The below chart illustrates this gap:

rbra_testing1 (1)

Cultural differences between the software development and BI community (Source: http://agiledata.org/essays/culturalImpedanceMismatch.html)

If there is something tested at all, typically in the BI frontend area things are tested manually. In the DWH backend we see – besides manual tests – self coded test routines, e.g. in the form of stored procedures or dedicated ETL jobs. However the integration into a test case management tool and systematic evaluation of the test results doesn’t happen. This is heavily contrasting with the software engineering domain where automated regression testing combined with modern development approaches like test driven design are applied. At least for some time we find first inputs regarding BI specific testing (cf. the (German) TDWI book here). Concepts and paper are patient though. Where are we with regard to a possible tool support, namely for the area of regression tests?

Since summer 2014 we at IT-Logix are actively looking for better (tool based) solutions for BI specific testing. We do this together with the Austrian company Tricentis. Tricentis develops the Tosca product suite, one of the worldwide leading software solutions for test automation. In a first step we run a proof of concept (POC) for regression tests for BI frontend artefacts, namely typical reports. One of the architectural decisions was to use Excel and PDF export files as a base for our tests. With this choice of a generic file interface the efforts to develop BI product specific tests were omitted. And this way we reduced the implementation effort to about two days in the POC. The goal was to run “Before-After” tests in batch mode. We took 20 reports for the POC case (these were actually SAP BusinessObjects Web Intelligence reports, but you can imagine whatever tool you like as long as you can export to PDF and / or Excel). A current version of the PDF or Excel output of the report is compared with a corresponding reference file. Typical real life situations where you can use this scenario are:

  • recurringly scheduled regression tests to monitor side effects of ongoing DWH changes: The reference files are created somewhen e.g. after a successful release of the DWH. Imagine there are ongoing change requests on the level of your DWH. Then you want to make sure these changes only impact the reports where a change is expected. To make sure all the rest of your reports aren’t concerned by any side effects, you now run your regression tests e.g. every weekend and compare the hereby produced files with the reference files.
  • BI platform migration projects: If you run a migration project for example to migrate your SAP BusinessObjects XI 3.1 installation to 4.1, you’ll want to make sure reports still work and look the same in 4.1 as they did in XI 3.1. In this case you create the reference files in XI 3.1 and compare them with the ones from 4.1. (As the export drivers vary between the two versions, especially the Excel exports are not very useful for this use case. Still, PDF worked pretty fine in my experience)
  • Database migration projects: If you run a database migration project for example migrating all your Oracle databases to Teradata or SAP HANA, then you want to make sure all of your reports show still the correct data (or at least the data as was shown with the original datasource…)
rbra_testing1 (3)

Sample configuration of a test case template using the GUI of Tosca (Source: IT-Logix POC)

Tosca searches for the differences between the two files. For Excel this happens on a cell by cell basis, for PDF we used a text based approach as well as an image compare approach.

rbra_testing1 (2)

Depending on the chosen test mode the differences can be visualized differently (Source: IT-Logix POC)

Using the solution implemented during the POC we could see very quickly which reports were different in their current state compared to the reference state.

Another important aspect of the POC was the scalability of the solution approach as I work primarily with (large) enterprise customers. If I have not only 20 but hundreds of reports (and therefore test cases), I have to prioritize and manage the creation, execution and error analysis of these test cases somehow. Tosca helps with the feature to model business requirements and to connect them with the test cases. Based on that we can derive and report classical test metrics like test case coverage or test execution rate.

rbra_testing1 (1)

Requirements and test cases are tightly related (Source: IT-Logix POC)

In my eyes an infrastructure like Tosca is a basic requirement to systematically increase and keep the quality in BI / DWH systems. In addition advanced methods like test driven development are only adaptable to BI / DWH undertakings if the necessary infrastructure for test automation is available.

In this blog post I’ve shown a first, rudimentary solution for regression tests for BI frontend tools. In a next article I’ll show the possibilities to implement regression test for DWH backend components.

Event recommendation: Learn about a real life scenario to run a SAP BusinessObjects migration project in an agile manner. Hence test automation is key and explained in some more details. Join me during sapInsider’s BI2015 at Nice by mid of June. Find more information here.

(This blog post was first published by me in German here)