Nurturing relations and solidarity around online meetings

Today I have the pleasure of publishing a co-authored blog post together with Patricia and Felicity, with whom I regularly work on language learning and interpersonal communication topics; see also the author information at the end of the article!

Are you missing the chats by the coffee machine, or maybe you’re exhausted by back-to-back Zoom meetings? A lack of small talk may be one of the missing links in today’s dystopian world of work.


We all ‘do’ small talk, from greeting people as we arrive in the office to talking about what we did at the weekend or where we are planning to go on holiday. Far from superficial, such exchanges help to build and maintain good relations. They allow staff to discover areas of common ground beyond work. But with Covid-19, many of us no longer meet by the coffee machine or even have brief informal chats before a meeting begins or after it ends.

But there are solutions. By scheduling a little extra time around meetings and giving participants the opportunity to enter a meeting before the start, those who wish can catch up with each other on a more social level, perhaps sharing some of the frustrations of working at home or talking about post-Covid plans. Another option is to organize a virtual after-work session once a week. Such small talk provides some down time, not only nurturing relations, but also building solidarity, which can be so important in difficult times. Happy, mutually supportive staff are more likely to be productive staff.

So allow time for small talk and don’t forget its value when we really do get back to face-to-face normality – even if it is with a mask for a while!


*** About the authors ***

Patricia and Felicity are enthusiastic lifelong language learners and teachers. They are keen to share their knowledge and unlock the secrets of successful language learning.

Raphael is passionate about data & analytics. He works as an agile coach and is therefore aware of the important role of language and communication in both business and IT projects.

Jira (Server) Reporting: Creating the issue dimension using pagination (part 2)

In part 1 I explained how to retrieve an issue list via the Jira REST API. That approach works fine as long as you do not have too many issues (e.g. fewer than 1000). To retrieve more issues, we need to use Jira’s paging mechanism: we load issues in chunks of 1000 and repeat this as many times as needed to fetch all the issues we have.

Let’s start with an overview of the queries and functions we will need to build; I’ll explain each of them in detail below:

  • load_issue: This is where the pagination magic happens, by calling the first of the three “get” functions described below.
  • stage_dim_issue: Contains the logic you already know from part 1, i.e. expanding certain fields etc.
  • dim_issue: For now, this is a 1:1 copy of the data from stage_dim_issue. This way we keep the final table separated from any transformation logic (see the sketch right after this list).
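To give you a rough idea of how these queries relate, here is a minimal sketch of the latter two; the expanded column names (“key”, “fields”) are an assumption, part 1 walks through the actual expansion steps:

// stage_dim_issue (sketch): expands the issue records returned by load_issue, as in part 1
let
    Source = load_issue,
    #"Expanded Issues" = Table.ExpandRecordColumn(Source, "Column1", {"key", "fields"}, {"key", "fields"})
in
    #"Expanded Issues"

// dim_issue (sketch): a plain 1:1 reference to stage_dim_issue
let
    Source = stage_dim_issue
in
    Source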

load_issue table

Next, we’ll have a look at the M script of the load_issue table:

// Read all pages of data.
// After every page, we check the "NextLink" record on the metadata of the previous request.
// get_NextPage will keep asking for more pages until we return null.
get_NextPage((previous) =>
    let
        // if previous is null, then this is our first page of data
        newStartAt = if (previous = null) then 0 else Value.Metadata(previous)[NextLink]?,
        // if NextLink was set to null by the previous call, we know we have no more data
        page = if (newStartAt <> null) then get_IssueWithStartAt(newStartAt) else null
    in
        page
)

The code above was inspired by this article in the Microsoft Power Query documentation as well as this tutorial. Important for your understanding: both the documentation and the tutorial refer to helper functions like “Table.GenerateByPage”. It is stated that “these functions may eventually be moved to the official M library, but for now can be copied into your extension file code”. Unfortunately, as far as I know, this extension file only exists for Power BI, not for Power Query in Excel. Nevertheless, you can benefit from the documented logic by creating the functions directly in the Power Query editor. My function get_NextPage is the equivalent of Table.GenerateByPage, and the load_issue code above is the equivalent of “GetAllPagesByNextLink” in the tutorial.

Looking at the code above, we first call the get_NextPage function. This starts the looping and will return the combined chunks of Jira issues. Then we define “newStartAt”, i.e. the beginning of the next chunk of issues to be loaded. For this we refer to a metadata value “NextLink”, which is set in the function get_IssueWithStartAt. This function is called subsequently and does the actual job of calling the REST API.

get_NextPage aka Table.GenerateByPage

As mentioned before, I copied and pasted this code from here:

// The getNextPage function takes a single argument and is expected to return a nullable table
(getNextPage as function) as table =>
    let
        listOfPages = List.Generate(
            () => getNextPage(null),            // get the first page of data
            (lastPage) => lastPage <> null,     // stop when the function returns null
            (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
        ),
        // concatenate the pages together
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?
    in
        // if we didn't get back any pages of data, return an empty table
        // otherwise expand the pages based on the columns of the first page
        if (firstRow = null) then
            Table.FromRows({})
        else
            Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1]))


The function get_IssueWithStartAt is the equivalent of GetPage in the tutorial. It basically calls the REST API, calculates the beginning of the next chunk (cf. the function get_newStartAt further down) and sets the metadata record entry:

(startAt as number) as table =>
    let
        // "Quelle" is simply German for "source"
        Quelle = Json.Document(Web.Contents(jiraBaseUrl,
            [RelativePath = "rest/api/2/search?jql=project=myproject&maxResults=1000&startAt=" & Text.From(startAt) & "&fields=summary,issuetype,timespent,labels,components,customfield_10101"])),
        nextLink = get_newStartAt(startAt, Number.From(Quelle[total])),
        issues = Quelle[issues],
        #"Converted to Table" = Table.FromList(issues, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
    in
        #"Converted to Table" meta [NextLink = nextLink]

We are now using the “startAt” parameter in the URL. In addition, we are fetching the total number of records using the field “total” (this number is returned by Jira so that we know the overall number of issues to fetch). This value is handed over to get_newStartAt together with the currently used startAt value.

By the way, compared to the direct URL used in part 1, I’m now using a parameter “jiraBaseUrl” to keep the base server name separate and reusable. In addition, I’m using [RelativePath] to tell the Power Query security mechanism that I’m always connecting to the same server, albeit to different endpoints of the API.
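If you haven’t created it yet, jiraBaseUrl can simply be a small query of its own returning the base URL; the server name below is of course just a placeholder for your own environment:

// jiraBaseUrl (sketch): the base server URL used by Web.Contents above
"https://jira.mycompany.example/"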


The function get_newStartAt is used to calculate the beginning of the next chunk, assuming we always fetch 1000 issues per page:

(lastStartAt as nullable number, total as nullable number) =>
    // return null when there is no further page to load
    if lastStartAt + 1000 > total then null else lastStartAt + 1000


REST APIs often use “paging” in order to break large result sets into chunks. So does the JIRA REST API when you want to retrieve more than 1000 issues for the issue dimension. In this post I explained how I adapted a generic tutorial for paging in Power Query to querying JIRA issues in Excel. I hope it saves you some time…

Jira (Server) Reporting: Creating the issue dimension (part 1)

In this part we’ll query the Jira REST API in order to retrieve a simple issue list with Power Query.

In the previous blog post I introduced you to the target data model I’m about to create for doing some reporting on top of Jira issue and worklog data. In this article I’ll outline how to create the issue dimension using Excel Power Query (by the way, this works the same in Power BI!).

Both the Jira REST API and Power Query were quite new to me. Therefore I started with an internet search and found “Reporting Jira data in Power BI” by OptimalBI. The second important piece of information is the API reference, e.g. this one here. For my sample case I want to retrieve all issues for a specific project or Jira filter. This can be accomplished using the “search” method.

Before doing anything in Power Query, let’s have a look at the REST URL itself; replace the placeholder elements (server name, project and fields) with your own values:
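A minimal example (the server name is a placeholder, and the custom field ID will most likely differ in your environment; more on that below):

https://yourjiraserver/rest/api/2/search?jql=project=myproject&maxResults=1000&fields=summary,issuetype,timespent,labels,components,customfield_10101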


In the above URL I’m searching for all issues in the project myproject. By default, the REST API will return a maximum of 50 issues. To increase this number, you can define the parameter maxResults, as shown above. Nevertheless, be aware that there is a server setting which limits the number of returned objects even if you set the maxResults parameter. That’s why we will look at pagination in part 2. For now, let’s stay with our REST URL above: as there are tons of fields you can retrieve, we want to limit ourselves to the fields we really need, in this case summary, issuetype, timespent, labels, components and customfield_10101. Now what is this last field all about? In my case it represents the link to the epic. As far as I know, this may differ between Jira installations. To find out which field name it is in your environment, you can use the following URL:
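A request of the following form will do (again, replace the server name with your own):

https://yourjiraserver/rest/api/2/issue/myissue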


Replace “myissue” with an issue number you are familiar with and whose associated epic you know. Log in to your Jira site and then copy & paste this URL into your browser. It will return all fields for the given issue. Afterwards, use your browser’s search feature to look for the epic name, note the associated field name, and add it to the field list of your URL.

Now let’s do a first test in Power Query. In Excel go to the Data tab and click Get Data, then From Other Sources and From Web.

Copy & paste the URL from above:

Set the login credentials; I used my Jira login with basic authentication:

In the upcoming Power Query Editor you should then see the initial result. Right click and choose “Into table”:

Now that you have a table object, right click on the “List” item and choose “Drill Down”

You should retrieve a list of records, which you can once again turn into a table:

Choose the default settings in the next dialog:

Click on the “Expand” button:

I uncheck “Use original column name as prefix” as this would simply add “Column1” which doesn’t make sense at this stage.

Repeat the same step once again on the “fields” column:

For issuetype, components and labels you want to expand the columns as well; here you might choose the detail columns more selectively, e.g. like this:

Finally we end up with our basic issue list:

Choose “Close & Load” in the upper left corner to load the data into Excel.
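For reference, the M script generated by these steps looks roughly like the following sketch; it is not a literal transcript of the UI clicks, the URL is the one from above and the selected columns are just an example of what you might pick in the expand dialogs:

let
    Source = Json.Document(Web.Contents("https://yourjiraserver/rest/api/2/search?jql=project=myproject&maxResults=1000&fields=summary,issuetype,timespent,labels,components,customfield_10101")),
    issues = Source[issues],
    #"Converted to Table" = Table.FromList(issues, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"key", "fields"}, {"key", "fields"}),
    #"Expanded fields" = Table.ExpandRecordColumn(#"Expanded Column1", "fields", {"summary", "issuetype", "timespent", "labels", "components", "customfield_10101"}, {"summary", "issuetype", "timespent", "labels", "components", "customfield_10101"}),
    #"Expanded issuetype" = Table.ExpandRecordColumn(#"Expanded fields", "issuetype", {"name"}, {"issuetype.name"})
in
    #"Expanded issuetype"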

In the second part we’ll have a look at how we can circumvent a server side limit of returned issues (e.g. 1000) using pagination.

Jira (Server) Reporting using Power Query – an overview

This is the introduction to a blog post series on extracting data from Jira into Excel using Power Query and Power Pivot.

Having read the #NoEstimates book by Vasco Duarte, I started to shift from estimating efforts to forecasting them. The most crucial foundation of a forecast is empirical progress data. Many of us use Jira to handle our projects. Besides managing “issues” (which can be all kinds of things like stories, tasks etc.) you can also log your work hours directly on issues. Once you start registering the hours worked on issues, you have a good starting point for the data collection needed to create a forecast later on.

The big question is: How can I extract the data from Jira and load it into Excel? If you are using the cloud version of Jira you can access the data e.g. by using the following add-in:

Unfortunately, my customers are often running a self-hosted Jira server. For now, I didn’t want to buy a commercial add-in but instead use Power Query and Power Pivot, which are both built-in features of Excel. Nor did I want to load the data into a proper data warehouse for now, as I need a “portable” solution which I can easily share with my different customers. Have a look at my earlier blog post about architecture t-shirt sizes and you’ll find that this is about an S-sized solution.

In a first instance, the target model looks like this:

Center stage is the fact_worklog table, which contains a row for every worklog. The fact table is surrounded by three dimensions: an issue dimension, an epic dimension and a date dimension.

In the upcoming blog post series I’ll write about my experience connecting to the Jira REST API, loading the data into Excel using Power Query and modelling the data in Power Pivot:

  • Getting the issue details incl. paging
  • Getting the worklog details by looping through the issue list
  • Getting the epic details
  • Generating a date dimension incl. ISO week number
  • Defining the data model in Power Pivot

Please be patient with me, I hope to find some time soon to start with the first part 🙂

Generating a Persistent Staging Area using Data Warehouse Automation

When building a new data warehouse or a data lake, one of the questions that will arise is whether you want to build a Persistent Staging Area (from now on PSA). In this blog post we will show you what a PSA is, what benefits to expect from it, typical design patterns to build it and finally an evaluation of how you can implement these patterns using a data warehouse automation toolset like the one of WhereScape.

The following article was written by my co-worker Elgar Brunott and edited by myself. Elgar is a long-time data warehouse expert who joined IT-Logix AG last autumn. We’ve worked together on various projects so far and have had the pleasure of exchanging a lot of thoughts since then. So it’s a pleasure to see parts of our joint work published in the following blog post.


Benefits of a Persistent Staging Area

Let’s start off with explaining what a PSA might look like:


  • A database or schema where all the needed tables from one or multiple source systems are transferred one to one (no transformations allowed, but adding some technical columns like an insert timestamp or a batch key is perfectly fine)
  • In the PSA you have a historical overview of data changes in the source system. That means you can run so-called “as of” queries, which return the data as it looked at a specific moment (depending, of course, on how frequently you load new or changed data into the PSA)
  • This important feature of supporting “as of” queries comes with the requirement that you keep track of rows that are deleted in the source system and mark them in the PSA. Otherwise those “as of” queries will not return exactly the right results. This is a tricky requirement which we will elaborate on further down in this article.

Having such a PSA will give you all the historical changes in data since you started loading data. This has the following benefits:

  • Loading new data into the PSA can be automated to a large degree as we will show in this article. Hence it doesn’t take much effort to load (and historize) data. Now you can take your time to model your (core) data warehouse properly and load new data from the PSA once it is appropriate.
  • Historical data is available in the PSA which is possibly not available anymore in the source system itself.
  • Being able to respond to changing requirements over following a plan. For example, you may need to massively extend one of the modules in the data warehouse because of changed business requirements. When implementing the new requirements, you can initially load the adapted model with the historical data already available in the PSA.
  • Decreasing the development time and being able to do initial loads without burdening the source system with heavy data requests.

Basic Design Pattern

For implementing a PSA we separated the corresponding design patterns into basic and advanced. In the diagram below you see a basic implementation of a PSA where the following assumptions apply:

  • We only load data in full, not in delta.
  • For every table we handle the technical historization in the PSA (technical historization can be done in different ways, e.g. doing insert only, snapshots or SCD2. In this blog post we will not go into the details of these historization types)
  • The source system will inform us in case data is physically deleted (for example with a delete flag in the source table itself).

PSA Basic

In this diagram we took the table “customer” from source system X as an example.

Let’s explain the different objects you see in the diagram:

  • src_customer: Data from the source system is transferred one to one. No business transformations are allowed. The table will always be emptied before starting a new load.
  • v_src_customer_input: A view that is used as an interface for loading the data into the (persistent) PSA table. Normally this view only selects the data without adding any logic. But for some source systems it is necessary to add basic data quality checks like checking on duplicate primary keys.
  • psa_customer: This table holds the historized data of the customer table. In this table you will find all the changes for every data set loaded by the ETL/ELT process, e.g. using a SCD2 pattern.
  • v_psa_customer_cid: This view is used as an interface for subsequent layers, for example the core warehouse. CID stands for “current records including deletes”, that means all the current truth records (including those which have a delete flag set to Yes).
  • v_psa_customer_cur: same as above, but without deleted records.

As described in the assumptions above, we assume that delete detection is handled by the source system itself, and that data is loaded in full over and over again. For some projects this will work perfectly fine. But in many situations you will need to load only the delta from big tables. In addition, many source systems sadly do not support delete detection on their own. In such cases we need to extend the PSA with more advanced logic, as described in the next paragraph.

Advanced Design Pattern

The advanced pattern extends the basic pattern with several database objects:

PSA Advanced

Let’s have a closer look at these objects:

  • v_src_customer_dct: This view handles the delete detection (dct). For tables which are loaded in full mode the view does the following – the logic for incrementally loaded tables is explained further down:
    • Check all rows which are coming from the v_psa_customer_cur view and compare these with all the rows which are loaded in the src_customer table. For records which are only present in the PSA view but not the src table anymore, we can assume they were physically deleted from the source system.
    • The view will return only the missing rows and mark the field “deleted_in_source” with “Yes”.
  • stg_customer_dct: The rows which are marked as deleted in the v_src_customer_dct view are temporarily stored in this staging table and loaded into the psa_customer table afterwards (with the “deleted_in_source” flag set to “Yes”)
  • src_customer_bk: This table is only created for tables which are regularly loaded in delta mode. Its only goal is enabling the option to implement delete detection:
    • For incrementally loaded tables you cannot detect deleted records in the source as explained above.
    • Therefore we create this bk table which consists only of the business key columns of the given source table (as the PSA is very source system driven, often the business key is here the same as the primary key of the source system table, which uniquely identifies the source row).
    • After loading all the data the delete detection view (v_src_customer_dct) will use the bk table as input to check if any rows were deleted in the source system.
    • This bk table is loaded in full mode only. Yet loading only one or a few attributes is typically much faster than doing the same for the complete table. Proper indexing can also speed up the delete detection process.
    • In cases where a full load of the bk table still takes too long, you may decide to execute this load only once a week or even less often.

Minimizing the PSA investment with data warehouse automation

Given the explanations above, you realize that such a PSA needs a lot of database objects and logic for each and every source table. Creating them manually results in a high investment with a lot of repetitive work. Nevertheless, from a long-term architecture perspective, building up a PSA is a very good investment, as explained in the introduction of this article. On the other hand, defending it to stakeholders can be hard because your investment in the starting phase of the project is higher without a directly visible business benefit.

This challenge can be addressed using data warehouse automation tools. Our goal was to generate a PSA including delete detection, historization and delta handling with a minimal amount of manual effort. We selected WhereScape as the right tool suite for this and developed such a solution. In WhereScape you’ll find two applications to help you generate the PSA: WhereScape 3D (3D = data driven design) and WhereScape RED.

WhereScape RED is the actual development environment. You define the metadata for every needed object (like the tables and views shown above) and RED will generate the DDL and SQL statements for you and deploy them to the target database. Even though this already speeds up the process tremendously, you still need to add every object manually.

In WhereScape 3D you start with a model of your source system’s tables. Afterwards you’ll define “model conversion rules”. In those rules you can define what should be done with every source table using a rich set of rules and features, e.g. “copy entity” to derive the necessary objects for every source table. This allows for generating all needed database objects automatically. Afterwards you can export these objects to RED and let them be deployed to your target database. Whereas RED automates the code generation, 3D “automates the automation” by feeding RED with the necessary metadata. Here the true power of data warehouse automation becomes visible. But it’s not only about the decrease in development time. The following aspects are also very important selling points for using a data warehouse automation approach:

  • Quality & consistency: While everything is generated using a specific number of design patterns, you will not have any problems caused by manually written code.
  • Transparency & maintenance: As a direct consequence of the previous point, the solution is transparent and easier to maintain.
  • Data lineage: While using a metadata driven approach, you will always have a complete data lineage from the source system throughout your data warehouse.
  • Documentation: Using the metadata you can automatically generate technical and user documentation fo the data warehouse.

Findings & Challenges

WhereScape 3D gives you all the tools for defining all the layers of a data warehouse. But it doesn’t give you a complete set of “ready to use” model conversion rules which will do this for you (at least not for the PSA requirements described in this article). We had to implement this part ourselves.

For generating the basic version of the PSA, setting up the generation process was fairly easy. When we made the step to the advanced PSA, we had several challenges which needed some out-of-the-box thinking. These were related to topics like:

  • Different ETL flow for delta loads and full loads
  • The PSA table gets data from two different sources (delete detection and delta data)
  • The delete detection view has different logic based on the delta or full load pattern
  • Some technical challenges, for example data type mappings

Learning to work with 3D is hard in the beginning (we highly recommend taking some training!). But once you have overcome this first hurdle, you can set up a very powerful and flexible data warehouse automation environment. For the generation of the PSA layer we found that generating the objects this way is the fastest approach. Using data warehouse automation for the core data warehouse itself is going to be our next challenge.

The fact that WhereScape is platform independent also makes the investment very valuable. For example, you can migrate from Oracle to Azure SQL by simply choosing a different target platform. The needed ETL and database logic will then be generated in Azure-optimized SQL code.

Thanks a lot for reading this article. We hope it was informative. On many topics we purposely only scratched the surface and did not go into too much detail; otherwise the article would have become even longer than it already is :-).

Please leave any comments / questions below, we’d love to hear from you!

(User) Stories for Analytics Projects – Part 2

You can’t spend long in agile project management without encountering the term “user story”. I want to use this two-part article series to summarize how the concept, which originated in software development, can be applied to analytics projects, also known as business intelligence and data warehousing. In the first part, I introduced the basics of stories. Now I would like to explain how we can structure or tailor stories to suit our circumstances in analytics systems and development projects and meet these three principles:

  1. The story is the smallest planning unit I use to plan a project.
  2. The story covers a timespan limited to one or two working days.
  3. A story always has an end-to-end characteristic. In analytics, this often means that a story has a data source as its start and a form of data evaluation as its end.

From feature to story

Let’s take a look at this graphic and go through it point by point:

On the left, we see the simplified architecture stack of an analytics solution. For requirements elicitation and subsequent agile project structuring, the three levels shown are sufficient in most cases:

  • BI application: These are the most visible aspects of an analytics solution. They incorporate information products of all kinds, from reports to dashboards and data interfaces to other systems. Often, the finished product at this level is composed of data from several entities on the level below, such as data from a sales data mart and a logistics data mart.
  • DWH or data basis: Here I deliberately combine the core data warehouse and data mart levels. Even if they are technically different things, a Core DWH alone hardly adds any value to the practical use of a story. It’s only through the implementation of initial key figures and the application of case-specific business rules that the data really becomes useful.
  • Connectivity and infrastructure: Developing a new data source can be a costly business. The same applies to providing the initial infrastructure. Clearly a maximum implementation time for a story of one to two days requires an established infrastructure. But this also means that we need our own stories to develop this foundation.

In many cases, an analytics solution consists of all three levels. At this point, the technology plays a subordinate role. For instance, the DWH aspect might be mapped purely virtually in the BI front-end tool. Nevertheless, it must be considered equally in terms of content (cf. also the model of T-shirt size architecture). These three levels represent for me the macro variant of end to end, from the actual source system to the finished information product. At this macro level, I find it useful to speak of “features”. A feature is formulated analogously to a user story; you can find an example in the upper right corner of the graphic. A feature is often implemented within two to three weeks. But this is too long for the daily monitoring of success in a project. A feature must then be cut further into “epics” and stories on the three levels of BI application, DWH, and connectivity and infrastructure. Epics are merely a means of bracketing stories together. Let’s assume that Feature 1 in the fictitious club dashboard introduced in Part 1 can be outlined in four stories at the BI application level. Then I would gather these four stories into an epic to emphasize the coherence of the stories.

The central aspect of the division into the three architectural levels is that each level also has end-to-end characteristics, but this time at the micro level. Moreover, these three levels overlap somewhat. Let’s look at the club dashboard example and the story there: “As a TDWI back-office employee, I’d like a diagram showing the participants registered per round table in a city that I can select”. To implement this story, the underlying data model is our data source. If necessary, it may need to be adapted or extended as part of the application story. The final product is part of the finished dashboard and potentially generates immediate benefits for the end user.

At the DWH level, we build on an established interface to the source system and a functioning data infrastructure. But what does the other end look like, where we also expect some form of evaluation? We don’t need to build a complete dashboard for this either; there’s a story for that. Instead, we can generate a simple analysis on top of any data model, for instance with Excel, and make data visible. And immediately the product owner can start a conversation with the DWH developer about whether these initial results meet expectations.

This approach can even be applied at the connectivity level. Here, the starting point is the source system. In our example, it’s a fictitious, web-based system that allows either a CSV export or a web service query. In a first user story, a CSV export is loaded into a load layer of the DWH. The result the product owner sees is not just a table definition but a simple evaluation of the raw data. This enables the product owner to detect possible data quality problems early.

As you can see from these example stories, I like to use the traditional story pattern “I in my role as <end user> want <function> so that <benefit>” at the feature and BI application levels. At the DWH and connectivity and infrastructure levels, I prefer an alternative pattern: <action> <result> <after/for/from/in> <object>.

Summary and outlook

In this second article on using stories in analytics, I showed how small stories can be reconciled with the demand for an end-to-end story. It’s important not to lose sight of the big picture, the feature, and to always work towards concrete visible analysis at the epic and story levels.
Even with this structure, it is still a challenge to tailor stories so that they can be implemented in one or two days. There are also many practical approaches to this—and material for another blog 😉

(This article was first published in German on my IT-Logix blog. Translated to English by Simon Milligan)

(User) Stories for Analytics Projects – Part 1

You can’t spend long in agile project management without encountering the term “user story”. I want to use this two-part article series to summarize how the concept, which originated in software development, can be applied to analytics projects, also known as business intelligence and data warehousing.

User stories or general stories are a tool for structuring requirements. Roman Pichler summarizes their use like this:

 “(User) stories are intended as a lightweight technique that allows you to move fast. They are not a specification, but a collaboration tool. Stories should never be handed off to a development team. Instead, they should be embedded in a conversation: The product owner and the team should discuss the stories together. This allows you to capture only the minimum amount of information, reduce overhead, and accelerate delivery.”

To keep my explanations a bit less abstract, I’ll introduce a simple, fictitious case study. Let’s assume that a club like TDWI, of which I am an active member, wants a new club dashboard. For this purpose, the project manager has already sketched their first ideas as a mockup:


All user stories follow a particular pattern, of which there are different basic variations. The most common pattern is: “As a <role>, I want <function> so that <benefit>”.

Some projects involve systems that aren’t designed for end users. Let’s assume we build a persistent staging area and a data interface from it that delivers integrated data to another system. In such cases, it seems a bit artificial to speak of “user stories”. Here’s an alternative pattern for formulating a story (source): <action> <result> <after/for/from/in> <object>.

One of the central questions when using stories is always how they are tailored. How broad should a story be? I follow three principles:

  1. The story is the smallest planning unit I can use to plan a project. Of course, you can also divide a story into tasks and technical development steps. But that’s up to the people who implement it. As long as we are at the level where the product owner, and thus a specialist, is involved, we remain at the story level.
  2. The story covers a timespan limited to one or two working days. This includes the specification of requirements (a reminder to have a conversation about it), their technical implementation, and integrated testing. This forces us to create small stories of a more or less uniform size. This principle makes it possible to measure a flow in implementation daily (by now, it should be clear that I don’t think much of story points, but that’s another story).
  3. A story always has an end-to-end characteristic, and it should provide something usable in the implementation of the overall system. In analytics, this often means that a story has a data source as its start and a form of data evaluation as its end.

Let’s take a more concrete look at this using the case study. First, we focus on a specific part of the dashboard, let’s call it Feature 1, where the number of participants is evaluated:

Independently of technological implementation, we can deduce two things here: a simple, analytical data model, such as a star schema, and the data sources required. From the principles formulated above, the question now arises what “end to end” means. In simple cases, it may well be that the scope of a story extends from the data source to the data model, including the actual dashboard, and that this scope can easily be implemented in one or two working days. But what if the technical development of the data source connectivity alone means several days of work? Or the calculation of a complex key figure requires several workshops?

Particularly in larger organizations, I hear people say: “This end-to-end idea is nice, but it doesn’t work for us. We prefer to write ‘technical’ stories”. This means stories like “I as data architect model fact table XY and the event dimension”, “I as ETL developer load fact table XY and the event dimension”, or “I as data protection officer define the row-level security for fact table XY”. Although each of these examples provides a benefit to the overall system, on its own each does not add value and success is difficult to test. Another common response is: “This end-to-end story makes perfect sense, but it doesn’t work with a lead time of one to two working days. We just work on a story for two or three weeks”. That’s just as problematic. If a story takes several weeks to implement, then monitoring success and progress becomes very difficult. In the worst case, we don’t realize that we’re on the wrong track after one or two days but only after three or even four weeks.
Both technical stories and long stories should be avoided. How to do this exactly is explained in the second part of this series.

(This article was first published in German on my IT-Logix blog. Translated to English by Simon Milligan)

Positioning Architecture T-Shirt-Sizes

In my previous post, I introduced the idea of architecture t-shirt sizes to depict the idea that your BI architecture should grow with your requirements. In this blog post I position the four example t-shirt sizes on Damhof’s Data Management Quadrants.

T-Shirt Sizes in the Context of Data Management Quadrants

If you haven’t read my previous blog post, you should do so first.

In [Dam], Ronald Damhof describes a simple model for the positioning of data management projects in an organization. He identifies two dimensions and four quadrants (see also Figure 1). On the x-axis, Damhof uses the push-pull strategy known from business economics. This expresses how strongly the production process is controlled and individualized by demand. On the right or pull side, topic-specific data marts, and from them information products such as reports and dashboards, are developed in response to purely business-driven requirements. Agility and specialist knowledge are the key to this. The first two T-shirt sizes, S and M, can be categorized as belonging on this side. On the left or push side, the BI department connects various source systems and prepares the data in a data warehouse. The focus here is on economies of scale and deploying a stable basic infrastructure for BI in the company. Here we can place the other two T-shirt sizes, L and XL.

Figure 1: Damhof’s Data Quadrant Model

On the y-axis, Damhof shows how an information system or product is produced. In the lower half, development is opportunistic. Developers and users are often identical here. For example, a current problem with data in Excel or with other tools is evaluated directly by the business user. This corresponds to the S-size T-shirt. As can be seen in my own case, the flexibility gained for research, innovation, and prototyping, for example, is at the expense of the uniformity and maintainability of results. If a specialist user leaves the company, knowledge about the analysis and the business rules applied is often lost.

In contrast, development in the upper half is systematic: developers and end users are typically different people. The data acquisition processes are largely automated and so do not depend on the presence of a specific person in daily operations. The data is highly reliable due to systematic quality assurance, and key figures are uniformly defined. The L- and XL-size T-shirts can be placed here in most cases.

The remaining T-shirt, the M-size, is somewhere “on the way” between quadrants IV and II. This means it is certainly also possible for a business user without IT support to implement a data mart. If the solution’s development and operation is also systematized, this approach can also be found in the second quadrant. This also shows that the architecture sizes are not only growing in terms of the number of levels used.

The positioning of the various T-shirt sizes in the quadrant model (see Figure 2) indicates two further movements.

  • The movement from bottom to top: We increase systematization by making the solution independent of the original professional user. In my own dashboard, for example, this was expressed by the fact that at some point the data was no longer accessed using my personal SAP user name but using a technical account. Another aspect of systematization is the use of data modeling: while my initial dashboard simply imported a wide table, in the tabular model the data was already dimensionally modelled.
  • The movement from right to left: While the first two T-shirt sizes are clearly dominated by business requirements and the corresponding domain knowledge, further to the left increasing technical skills are required, for example to manage different data formats and types and to automate processes.

Figure 2: T-Shirt Sizes in the Data Quadrant Model

Summary and Outlook

Let’s recap: BI solutions have to grow with their requirements. The architectural solutions shown in T-shirt sizes illustrate how this growth path can look in concrete terms. The DWH solution is built, so to speak, from top to bottom – we start with the pure information product and then build step by step up to the complete data warehouse architecture. The various architectural approaches can also be positioned in Ronald Damhof’s quadrant model: A new BI solution is often created in the fourth quadrant, where business users work exploratively with data and create the first versions of information products. If these prove successful, it is of particular importance to systematize and standardize the approach. At first, a data mart serves as a guarantor of a common language across the various information products. Decoupling the data from the source systems then allows further scaling of the development work. Finally, a data warehouse can be added to the previous levels to permanently merge data from different sources and, if required, make it permanently and historically available.

Organizations should aim to institutionalize the growth process of a BI solution. Business users can’t wait for every new data source to be integrated across multiple layers before it’s made available for reporting. On the other hand, individual solutions must be continuously systematized, gradually placed on a stable data foundation, and operated properly. The architecture approaches shown in T-shirt sizes provide some hints as to what this institutionalization could look like.

This article was first published in TDWI’s BI-SPEKTRUM 3/2019

Growing a BI Solution Architecture Step by Step

The boss needs a new evaluation based on a new data source. There isn’t time to load the data into an existing data warehouse, let alone into a new one. In this article I introduce the idea of architectural T-shirt sizes. Of course, different requirements lead to different architectural approaches, but at the same time, it should be possible that architecture can grow as a BI system is expanded.

T-shirt sizes for BI solutions

The boss needs a new evaluation based on a new data source. There isn’t time to load the data into an existing data warehouse, let alone into a new one. It seems obvious to reach for “agile” BI tools such as Excel or Power BI. After several months and more “urgent” evaluations, a maintenance nightmare threatens. But there is another way: In this article I introduce the idea of architectural T-shirt sizes, illustrated using our own dashboard. Of course, different requirements lead to different architectural approaches, but at the same time, it should be possible that architecture can grow as a BI system is expanded.

The summer holidays were just over, and I was about to take on a new management function within our company. It was clear to me: I wanted to make my decisions in a data-driven way and to offer my employees this opportunity. From an earlier trial, I already knew how to extract the required data from our SAP system. As a BI expert, I would love to have a data warehouse (DWH) with automated loading processes and all the rest. The problem was that, as a small business, we ourselves had no real BI infrastructure. And alongside attending to ongoing customer projects, my own time was pretty tight. So did the BI project die? Of course not. I just helped myself with the tool I had to hand, in this case Microsoft Power BI. Within a few hours the first dashboard was ready, and it was published in our cloud service even faster.

The Drawbacks of Quick and Dirty

My problem was solved in the short term. I could make daily-updated data available to my employees. Further requirements followed, so I copied the Power BI file and began adapting it here and there. Over the next few weeks, I added new key figures and made more copies. However, I was only partly able to keep the various dashboard copies “in sync”. In addition, operational problems came up. Of course, I had connected the SAP system under my personal username, whose password has to be changed regularly. This in turn led to interruptions to the data updates and required manual effort to reconfigure the new password in Power BI.

T-Shirt Sizes for Step-by-Step Development of BI Architecture

I guess a lot of professional users have been in my shoes. You have to find a solution quickly, and before you know it, you’re in your own personal maintenance nightmare. At the same time, a fully developed BI solution is a distant prospect, usually for organizational or financial reasons.

As an expert in the adaptation of agile methods for BI projects, I have been dealing with this problem for a long time: How can we both address the short-term needs of the professional user and create the sustainability of a “clean” solution? As a father of two daughters, I remembered how my children grew up – including the fact that you regularly have to buy bigger clothes. The growth process is continuous, but from time to time we have to get a new dress size. It is exactly this principle that can also be applied to BI solutions and their architecture. Figure 1 shows four example T-shirt sizes, which are explained in more detail below.

Figure 1: Architectural solutions in T-shirt sizes at a glance

Think Big, Start Small: The S-Size T-Shirt

This approach corresponds to my first set of dashboards. A BI front-end tool connects directly to a source. All metadata, such as access data for the source systems, business rules, and key figure definitions, are developed and stored directly in the context of the information products created.

This T-shirt size is suitable for a BI project that is still in its infancy. It is often only then that a broader aim is formulated involving everything that you would like to analyze and evaluate – this is when “thinking big” begins. Practically, however, not much more is known than the data source. But you would also like to be explorative and produce first results promptly, so it makes sense to begin small and technically undemanding.

However, this approach reaches its limits very quickly. Here is a list, by no means complete, of criteria for deciding when the time has come to think about the next architectural dress size:

  • There exist several similar information products or data queries that use the same key figures and attributes over and over again.
  • Different users regularly access the information products, but access to the data source is through the personal access account of the original developer.
  • The source system suffers from multiple, and mostly very similar, data queries of the various information products.
Developing a Common Language: The M-Size T-Shirt

In this size, we try to store commonly useful metadata such as key figure definitions and the access information for the source systems at a level of its own rather than in the information product itself. This level is often called the data mart or semantic layer. In the case of my own dashboard, we developed a tabular model for it in Azure Analysis Services (AAS). The various specifications or copies of the dashboards as such largely remained – only the substructure changed. However, all variants now rested on the same central foundation. The advantages of this T-shirt size compared to the previous one are clear: Your maintenance effort is considerably reduced, because this basic information is stored centrally once and does not need to be maintained for every single information product. At the same time, you bring consistency to the definition and designation of the key figures. In a multilingual environment, the added value becomes even more apparent, because translations are centralized and maintained uniformly in the semantic layer. All your dashboards thus speak a common language.

In this M-size t-shirt, we still do not store any data permanently outside the source. Even if the source data is transferred to the tabular model in AAS, it must be re-imported for larger adjustments to the model. Other manufacturers’ products come completely without data storage, for example, the Universes in SAP BusinessObjects. This means that a high load is sometimes applied to the source systems, especially during the development phase. Here is a list of possible reasons to give your “BI child” the next largest dress size:

  • The load on the source systems is still too large despite the semantic layer and should be further reduced.
  • The BI solution is also to be used as an archive for the source data, for example in the case of web-based data sources where the history is only available for a limited period of time.
  • If the source system itself does not historize changes to data records, this can be interpreted as another reason for using the BI solution as a data archive.
Decoupling from the Source: The L-Size T-Shirt

The next larger T-shirt for our BI architecture, the L-size, replaces the direct data access of the data mart to the source. To do this, the data is extracted from the source and permanently stored in a separate database. This level corresponds to the concepts of a persistent staging area (PSA) (see also [Kim04] pp. 31ff. and [Vos]) or an actively managed data lake (see also [Gor19], for example p. 9). They all have one thing in common: that the data is taken from the source with as little change as possible and stored permanently. This procedure means that the existing data mart can be reassigned to this new source relatively easily. In my own dashboard example, we’re not at this stage yet. But in the next step, we have planned to extract the SAP data using the Azure Data Factory and store it permanently in an Azure SQL database. For people who, like us, use the ERP as a cloud solution, this layer reduces the lock-in effect of the ERP cloud provider. Other advantages of this persistent data storage beyond the source systems include the archiving and historization function it brings with it: Both new and changed data from the source are continuously stored. Deleted records can be marked accordingly. Technically, our data model becomes very close to the data model of the source we use, and under certain circumstances, we are already harmonizing some data types. While this layer can be realized practically and fast, there are also indicators here when to jump to the next T-shirt size:

  • The desired information products require integrated and harmonized data from several data sources.
  • The data quality of the source data is not sufficient and can’t simply be improved in the source system.
  • Key calculations should be saved permanently, for example for audit purposes.
Integrated, Harmonized and Historicized Data: The XL-Size T-Shirt

The next, and for the time being last, T-shirt size, XL, extends the existing architecture with a classical data warehouse layer between the PSA and the data marts. It foregrounds the integration and harmonization of master data from the various source systems. This is done using a central data model which exists independently of the sources used (for instance, a data vault model or a dimensional model). This enables systematic data quality controls to be carried out when initially loading and processing the data. Historization concepts can also be applied to the data here as required. The persistent storage of data in this DWH layer means it is also permanently available for audit purposes.

The various T-shirt sizes don’t only differ in the number of levels they use. To characterize the four architectural approaches more comprehensively, it’s worth taking a brief look at Damhof’s model of Data Management Quadrants [Dam]. I’ll do this in the next blog post.


[Dam] Damhof, Ronald: “Make data management a live issue for discussion throughout the organization”.—the-data-quadrant-model-interview-ronald-damhof.pdf , accessed on 22.11.2019

[Gor19] Gorelik, Alex: The Enterprise Big Data Lake. O’Reilly 2019

[Kim04] Kimball, Ralph: The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data. Wiley 2004

[Vos] Vos, Roelant: “Why you really want a Persistent Staging Area in your Data Vault architecture”. 25.06.2016,, accessed on 22.11.2019

This article was first published in TDWI’s BI-SPEKTRUM 3/2019

Debugging access to REST data sources in Power BI

Recently I was involved in building a Power BI report connecting to a REST API as the data source. In this blog post I show you what tools made my life easier working with REST services.

The Error

In order to access the REST service we needed to create a token first and pass this to the service in the HTTP header. We followed the advice here to achieve this in Power BI directly. But it didn’t work and we always got the following error at some point: “Specified value has invalid HTTP Header characters”
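For context, setting such a header in Power Query typically looks something like the sketch below; the URL, header name and token value are invented and will differ for your service:

let
    // hypothetical endpoint and token: adapt to your own REST service
    source = Json.Document(
        Web.Contents(
            "https://api.example.com/v1/data",
            [Headers = [#"apiToken" = "abcd-1234"]]
        )
    )
in
    source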


As far as I could observe, Power BI is pretty opaque about what happens behind the scenes. So first of all I wanted to know whether the REST service itself works as expected.

Testing the REST service outside of Power BI

There are various options to make a REST call; I’ll mention two of them.


A low installation footprint approach is using PowerShell. With the following command you can send a REST request and retrieve the result:

Invoke-RestMethod -Uri <URL> -Body ('{"systemId": "it-logix", "apiToken": "abcd"}') -Method POST


At least for me a more convenient way is using a non-command-line tool, for example SoapUI.

Using SoapUI I could prove that the REST service works as expected. In the second screenshot, on the right side, I can see the actual data returned from the REST service.



But why would Power BI throw an error anyway?

Bringing light into what happens behind the scenes of Power BI with Fiddler

Looking at the error message, my first guess was that something was wrong with how we set the request header in Power Query. To shed more light on this, I used Fiddler. Fiddler hooks into all HTTP traffic on your computer and makes it visible for you to analyze.

The Power BI traffic can be found in the microsoft.mashup… process.


To my surprise I found that Power BI called the REST service successfully including getting the correct data back similar to what I got in SoapUI. But why would Power BI throw an error anyway?

Compare with a generic REST service and get the riddle solved

As always in a debugging process, it is helpful to have a second example to check whether an error is generic or specific to one case. For that I used a service called JSON Placeholder. Here the data import into Power BI worked like a charm. Once again using Fiddler, my interest was in the response headers. I compared the headers of the generic service with those in my project. Can you see it?

The HTTP response header of the actual REST service

The HTTP response header of the JSON placeholder service

Finally we asked the REST service developer to remove the header line starting with an apostrophe – and suddenly it worked in Power BI 🙂