Positioning Architecture T-Shirt Sizes

In my previous post, I introduced the idea of architecture T-shirt sizes to illustrate that your BI architecture should grow with your requirements. In this blog post, I position the four example T-shirt sizes on Damhof’s Data Management Quadrants.

T-Shirt Sizes in the Context of Data Management Quadrants

If you haven’t read my previous blog post, you should do so first.

In [Dam], Ronald Damhof describes a simple model for positioning data management projects in an organization. He identifies two dimensions and four quadrants (see also Figure 2). On the x-axis, Damhof uses the term push–pull strategy, known from business economics; it expresses how strongly the production process is controlled and individualized by demand. On the right, or pull, side, topic-specific data marts, and from them information products such as reports and dashboards, are developed in direct response to business requirements. Agility and domain knowledge are the key here. The first two T-shirt sizes, S and M, belong on this side. On the left, or push, side, the BI department connects various source systems and prepares the data in a data warehouse. The focus here is on economies of scale and on deploying a stable basic BI infrastructure for the company. This is where the other two T-shirt sizes, L and XL, fit.

Figure2
Figure 2: Damhof’s Data Quadrant Model

On the y-axis, Damhof shows how an information system or product is produced. In the lower half, development is opportunistic, and developers and users are often the same people. For example, a business user analyzes a current data problem directly in Excel or another tool. This corresponds to the S-size T-shirt. As my own case shows, the flexibility gained for research, innovation, and prototyping comes at the expense of the uniformity and maintainability of the results. If a business user leaves the company, the knowledge about the analyses and the business rules applied is often lost.

In contrast, development in the upper half is systematic: developers and end users are typically different people. The data acquisition processes are largely automated and so do not depend on the presence of a specific person in daily operations. The data is highly reliable due to systematic quality assurance, and key figures are uniformly defined. The L- and XL-size T-shirts can be placed here in most cases.

The remaining T-shirt, the M-size, sits somewhere “on the way” between quadrants IV and II. A business user can certainly implement a data mart without IT support; if the solution’s development and operation are then systematized, the approach ends up in the second quadrant. This also shows that the architecture sizes grow in more than just the number of levels used.

The positioning of the various T-shirt sizes in the quadrant model (see Figure 3) indicates two further movements.

  • The movement from bottom to top: We increase systematization by making the solution independent of the original business user. In my own dashboard, for example, this was expressed by the fact that at some point the data was no longer accessed using my personal SAP user name but via a technical account. Another aspect of systematization is the use of data modeling: while my initial dashboard simply imported one wide table, in the tabular model the data was already dimensionally modeled.
  • The movement from right to left: While the first two T-shirt sizes are clearly dominated by business requirements and the corresponding domain knowledge, further to the left increasing technical skills are required, for example to manage different data formats and types and to automate processes.
Figure3
Figure 3: T-Shirt Sizes in the Data Quadrant Model
Summary and Outlook

Let’s summarize: BI solutions have to grow with their requirements. The architectural solutions shown as T-shirt sizes illustrate what this growth path can look like in concrete terms. The DWH solution is built, so to speak, from top to bottom – we start with the pure information product and then build up, step by step, to the complete data warehouse architecture. The various architectural approaches can also be positioned in Ronald Damhof’s quadrant model: A new BI solution is often created in the fourth quadrant, where business users work exploratively with data and create the first versions of information products. If these prove successful, it is particularly important to systematize and standardize the approach. At first, a data mart serves as the guarantor of a common language for the various information products. Decoupling the data from the source systems then allows the development work to scale further. Finally, a data warehouse can be added to the previous levels to merge data from different sources and, if required, make it permanently and historically available.

Organizations should aim to institutionalize the growth process of a BI solution. Business users can’t wait for every new data source to be integrated across multiple layers before it’s made available for reporting. On the other hand, individual solutions must be continuously systematized, gradually placed on a stable data foundation, and operated properly. The architecture approaches shown in T-shirt sizes provide some hints as to what this institutionalization could look like.

This article was first published in TDWI’s BI-SPEKTRUM 3/2019

Growing a BI Solution Architecture Step by Step

T-shirt sizes for BI solutions

The boss needs a new evaluation based on a new data source. There isn’t time to load the data into an existing data warehouse, let alone into a new one. It seems obvious to reach for “agile” BI tools such as Excel or Power BI. After several months and several more “urgent” evaluations, a maintenance nightmare threatens. But there is another way: In this article I introduce the idea of architectural T-shirt sizes, illustrated using our own dashboard. Of course, different requirements lead to different architectural approaches, but at the same time, the architecture should be able to grow as the BI system expands.

The summer holidays were just over, and I was about to take on a new management function within our company. It was clear to me: I wanted to make my decisions in a data-driven way and to offer my employees the same opportunity. From an earlier trial, I already knew how to extract the required data from our SAP system. As a BI expert, I would have loved a data warehouse (DWH) with automated loading processes and all the rest. The problem was that, as a small business, we had no real BI infrastructure of our own. And alongside ongoing customer projects, my own time was pretty tight. So did the BI project die? Of course not. I just helped myself with the tool I had to hand, in this case Microsoft Power BI. Within a few hours the first dashboard was ready, and it was published in our cloud service even faster.

The Drawbacks of Quick and Dirty

My problem was solved in the short term: I could make daily-updated data available to my employees. Further requirements followed, so I copied the Power BI file and began adapting it here and there. Over the next few weeks, I added new key figures and made more copies. However, I was only partly able to keep the various dashboard copies “in sync”. In addition, operational problems came up. Of course, I had connected the SAP system using my personal username, whose password has to be changed regularly. This in turn led to interruptions in the data refresh and manual effort to reconfigure the new password in Power BI.

T-Shirt Sizes for Step-by-Step Development of BI Architecture

I guess a lot of business users have been in my shoes. You have to find a solution quickly – and before you know it, you’re in your own personal maintenance nightmare. At the same time, a fully developed BI solution is a distant prospect, usually for organizational or financial reasons.

As an expert in adapting agile methods for BI projects, I have been dealing with this problem for a long time: How can we address the short-term needs of the business user and still achieve the sustainability of a “clean” solution? As a father of two daughters, I remembered how my children grew up – including the fact that you regularly have to buy bigger clothes. The growth process is continuous, but from time to time a new dress size is needed. Exactly this principle can also be applied to BI solutions and their architecture. Figure 1 shows four example T-shirt sizes, which are explained in more detail below.

Figure1
Figure 1 Architectural solutions in T-shirt sizes at a glance
Think Big, Start Small: The S-Size T-Shirt

This approach corresponds to my first set of dashboards. A BI front-end tool connects directly to a source. All metadata, such as access credentials for the source systems, business rules, and key figure definitions, is developed and stored directly within the information products created.

This T-shirt size is suitable for a BI project that is still in its infancy. It is often only at this stage that a broader aim is formulated, covering everything you would like to analyze and evaluate – this is when “thinking big” begins. In practice, however, not much more is known than the data source. But you would also like to work exploratively and produce first results promptly, so it makes sense to begin small and technically undemanding.

However, this approach reaches its limits very quickly. Here is a list, by no means complete, of criteria for deciding when the time has come to think about the next architectural dress size:

  • Several similar information products or data queries exist that use the same key figures and attributes over and over again.
  • Different users regularly access the information products, but access to the data source still runs through the personal account of the original developer.
  • The source system suffers under the multiple, mostly very similar, data queries of the various information products.
Developing a Common Language: The M-Size T-Shirt

In this size, we store commonly used metadata, such as key figure definitions and the access information for the source systems, in a layer of its own rather than in the information product itself. This layer is often called the data mart or the semantic layer. In the case of my own dashboard, we developed a tabular model for it in Azure Analysis Services (AAS). The various variants and copies of the dashboards largely remained as they were – only the substructure changed. However, all variants now rested on the same central foundation. The advantages of this T-shirt size compared to the previous one are clear: Your maintenance effort is considerably reduced, because the basic metadata is stored centrally once and does not need to be maintained for every single information product. At the same time, you bring consistency to the definition and naming of the key figures. In a multilingual environment, the added value becomes even more apparent, because translations are stored centrally once and maintained uniformly in the semantic layer. All your dashboards thus speak a common language.

With the M-size T-shirt, we still do not store any data permanently outside the source. Even though the source data is transferred into the tabular model in AAS, it must be re-imported for larger adjustments to the model. Other vendors’ products do without data storage entirely – for example, the universes in SAP BusinessObjects. This means that a high load is sometimes placed on the source systems, especially during the development phase. Here is a list of possible reasons to give your “BI child” the next larger dress size:

  • The load on the source systems is still too high despite the semantic layer and should be reduced further.
  • The BI solution is also to be used as an archive for the source data, for example in the case of web-based data sources where the history is only available for a limited period of time.
  • The source system itself does not historize changes to data records – another reason to use the BI solution as a data archive.
Decoupling from the Source: The L-Size T-Shirt

The next larger T-shirt for our BI architecture, the L-size, replaces the data mart’s direct access to the source. To do this, the data is extracted from the source and permanently stored in a separate database. This layer corresponds to the concepts of a persistent staging area (PSA) (see also [Kim04] pp. 31ff. and [Vos]) or an actively managed data lake (see also [Gor19], for example p. 9). They all have one thing in common: the data is taken from the source with as little change as possible and stored permanently. This means that the existing data mart can be switched over to the new source relatively easily. In my own dashboard example, we’re not at this stage yet, but as a next step we plan to extract the SAP data using Azure Data Factory and store it permanently in an Azure SQL database. For those who, like us, use their ERP as a cloud solution, this layer also reduces the lock-in effect of the ERP cloud provider. Further advantages of this persistent data storage outside the source systems are the archiving and historization functions it brings with it: Both new and changed data from the source are continuously stored, and deleted records can be marked accordingly. Technically, our data model stays very close to the data model of the source, although under certain circumstances we already harmonize some data types. While this layer can be realized pragmatically and fast, there are again indicators of when to jump to the next T-shirt size (a small code sketch of the PSA principle follows the list below):

  • The desired information products require integrated and harmonized data from several data sources.
  • The data quality of the source data is not sufficient and can’t simply be improved in the source system.
  • Key calculations should be saved permanently, for example for audit purposes.
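
To make the PSA principle more tangible, here is a minimal PowerShell sketch of an append-only load. It is a toy illustration only – not our planned Azure Data Factory pipeline – and the source URL, the psa.Orders table, and the server name are all hypothetical:

# Minimal PSA sketch: store source records unchanged, plus load metadata.
# Requires the SqlServer module (Install-Module SqlServer); all names are made up.
Import-Module SqlServer

$rows   = Invoke-RestMethod -Uri 'https://example.com/api/orders'   # hypothetical source
$loadTs = (Get-Date).ToUniversalTime().ToString('yyyy-MM-dd HH:mm:ss')

foreach ($row in $rows) {
    # Keep the record as-is (raw JSON payload) together with a load timestamp;
    # deletions in the source would only be marked via IsDeleted, never removed here.
    $payload = ($row | ConvertTo-Json -Compress) -replace "'", "''"
    $query   = "INSERT INTO psa.Orders (OrderId, Payload, LoadTimestamp, IsDeleted) " +
               "VALUES ('$($row.id)', '$payload', '$loadTs', 0);"
    Invoke-Sqlcmd -ServerInstance 'myserver.database.windows.net' -Database 'staging' -Query $query
}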
Integrated, Harmonized, and Historized Data: The XL-Size T-Shirt

The next, and for now last, T-shirt size, the XL, extends the existing architecture with a classical data warehouse layer between the PSA and the data mart. It foregrounds the integration and harmonization of master data from the various source systems. This is done using a central data model that exists independently of the sources used (for instance, a data vault model or a dimensional model). It enables systematic data quality controls to be carried out when data is initially loaded and processed. Historization concepts can also be built into the data here as required. And the persistent storage of data in this DWH layer means it is permanently available for audit purposes.

The various T-shirt sizes don’t only differ in the number of levels they use. To characterize the four architectural approaches more comprehensively, it’s worth taking a brief look at Damhof’s model of Data Management Quadrants [Dam]. I’ll do this in the next blog post.

Literature

[Dam] Damhof, Ronald: “Make data management a live issue for discussion throughout the organization”. https://prudenza.typepad.com/files/english—the-data-quadrant-model-interview-ronald-damhof.pdf , accessed on 22.11.2019

[Gor19] Gorelik, Alex: The Enterprise Big Data Lake. O’Reilly 2019

[Kim04] Kimball, Ralph; Caserta, Joe: The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. Wiley 2004

[Vos] Vos, Roelant: “Why you really want a Persistent Staging Area in your Data Vault architecture”. 25.06.2016, http://roelantvos.com/blog/why-you-really-want-a-persistent-staging-area-in-your-data-vault-architecture/, accessed on 22.11.2019

This article was first published in TDWI’s BI-SPEKTRUM 3/2019

Debugging access to REST data sources in Power BI

Recently I was involved in building a Power BI report connecting to a REST API as the data source. In this blog post I show you which tools made my life easier when working with REST services.

The Error

In order to access the REST service, we first needed to create a token and pass it to the service in the HTTP header. We followed the advice here to achieve this directly in Power BI. But it didn’t work, and at some point we always got the following error: “Specified value has invalid HTTP Header characters”

PBIErrorREST

From what I observed, Power BI is pretty opaque about what happens behind the scenes. So first of all I wanted to know whether the REST service itself worked as expected.

Testing the REST service outside of Power BI

There are various options for making a REST call; I’ll mention two of them.

PowerShell

An approach with a low installation footprint is PowerShell. With the following command you can send a REST request and retrieve the result:

Invoke-RestMethod -Uri <URL> -Body ('{"systemId": "it-logix", "apiToken": "abcd"}') -Method POST
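
Depending on your service, you can then pass the returned token on to the actual data request in an HTTP header. Here is a sketch under assumed names – the .token property and the X-ApiToken header name are made up, and your service will most likely differ:

# Hypothetical follow-up call: the URLs, the .token property, and the header name are placeholders.
$response = Invoke-RestMethod -Uri '<TOKEN-URL>' -Body ('{"systemId": "it-logix", "apiToken": "abcd"}') -Method POST
Invoke-RestMethod -Uri '<DATA-URL>' -Headers @{ 'X-ApiToken' = $response.token } -Method GET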

SoapUI

At least for me, a more convenient way is to use a non-command-line tool, for example SoapUI.

Using SoapUI I could prove that the REST service works as expected. In the second screenshot, on the right-hand side, you can see the actual data returned by the REST service.

SOAPUI

SOAPUI_Response

But why would Power BI throw an error anyway?

Shedding light on what happens behind the scenes of Power BI with Fiddler

Looking at the error message, my first guess was that something was wrong with how we set the request header in Power Query. To shed more light on this I used Fiddler. Fiddler hooks into all HTTP traffic on your computer and makes it visible for analysis.

The Power BI traffic can be found in the microsoft.mashup… process.

Fiddler1

To my surprise, I found that Power BI called the REST service successfully and even got the correct data back, similar to what I saw in SoapUI. So why did Power BI still throw an error?

Compare with a generic REST service and get the riddle solved

As always in a debugging process, it is helpful to have a second example to check whether an error is generic or specific to one case. For that I used a service called JSON Placeholder. Here the data import into Power BI worked like a charm. Once again using Fiddler, I turned my attention to the response headers and compared those of the generic service with those of the service in my project (a PowerShell alternative for dumping headers follows after the screenshots). Can you spot the difference?

JSONResponse1
The HTTP response header of the actual REST service
JSONResponse2
The HTTP response header of the JSON placeholder service
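
If you don’t have Fiddler at hand, PowerShell can also dump response headers. A quick sanity check against the public JSON Placeholder endpoint might look like this (substitute your own service’s URL for the comparison call):

# Invoke-WebRequest (unlike plain Invoke-RestMethod in Windows PowerShell) exposes the response headers:
$resp = Invoke-WebRequest -Uri 'https://jsonplaceholder.typicode.com/posts/1'
$resp.Headers   # compare this header dictionary with the one from your own service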

Finally, we asked the REST service developer to remove the header line starting with an apostrophe – and suddenly it worked in Power BI 🙂

My life as a BI consultant: Update Spring 2017

Spring 2017 in Provence (France)

Obviously I haven’t had time to write much on my blog during the last nine months. Let me share with you what topics kept me busy:

For the upcoming months I’ll be visiting and speaking at various events:

  • IBCS Annual Conference Barcelona: June 2nd, Discussion of the next version of the International Business Communication Standards.
  • TDWI 2017 Munich: June 26-28, Half-day workshop about practical gathering of requirements for a dashboard.
  • MAKEBI 2017 Zurich: July 3rd, I’ll be presenting a new keynote on alternatives to traditional estimation practices.
  • BOAK 2017 Zurich: September 12th, as at MAKEBI, I’ll be presenting a new keynote on alternatives to traditional estimation practices.
  • WhereScape Test Drive / AgileBI introduction Zurich: September 13th, during this practical half-day workshop you’ll learn hands-on how to use a DWH automation tool and get an introduction to the basics of Agile BI.
  • My personal highlight of today: I’ll be speaking at Agile Testing Days 2017, where I’ll run a 2.5-hour workshop on introducing Agile BI in a sustainable way.

It would be a pleasure to meet you at one of these events – if you’re joining, send me a little heads-up!

Last but not least, let me mention the Scrum Breakfast Club, which I attend on a regular basis. We gather once a month, using the Open Space format, to discuss practical issues around the application of agile methods in all kinds of projects, including Business Intelligence and data warehousing. The club has chapters in Zurich and Bern as well as in Milan and Lisbon.