In my previous post, I introduced the idea of architectural T-shirt sizes to convey that your BI architecture should grow with your requirements. In this blog post, I position the four example T-shirt sizes on Damhof’s Data Management Quadrants.
T-Shirt Sizes in the Context of Data Management Quadrants
In [Dam], Ronald Damhof describes a simple model for positioning data management projects in an organization. He identifies two dimensions and four quadrants (see also Figure 1). On the x-axis, Damhof uses the terms push and pull, known from business economics, to express how strongly the production process is controlled and individualized by demand. On the right, or pull, side, topic-specific data marts, and from them information products such as reports and dashboards, are developed in response to specific business requirements. Agility and domain knowledge are the keys to success here. The first two T-shirt sizes, S and M, belong on this side. On the left, or push, side, the BI department connects various source systems and prepares the data in a data warehouse. The focus here is on economies of scale and on deploying a stable basic BI infrastructure for the company. This is where the other two T-shirt sizes, L and XL, can be found.
On the y-axis, Damhof shows how an information system or product is produced. In the lower half, development is opportunistic: developers and users are often identical. For example, a current data problem is evaluated directly by the business user in Excel or another tool. This corresponds to the S-size T-shirt. As my own case shows, the flexibility gained for research, innovation, and prototyping comes at the expense of the uniformity and maintainability of the results. If a business user leaves the company, knowledge about the analysis and the business rules applied is often lost.
In contrast, development in the upper half is systematic: developers and end users are typically different people. The data acquisition processes are largely automated and so do not depend on the presence of a specific person in daily operations. The data is highly reliable due to systematic quality assurance, and key figures are uniformly defined. The L- and XL-size T-shirts can be placed here in most cases.
The remaining T-shirt, the M-size, sits somewhere “on the way” between quadrants IV and II. This means it is certainly possible for a business user to implement a data mart without IT support. If the solution’s development and operation are also systematized, this approach can be found in the second quadrant as well. This also shows that the architecture sizes do not grow only in terms of the number of layers used.
The positioning of the various T-shirt sizes in the quadrant model (see Figure 2) indicates two further movements.
The movement from bottom to top: We increase systematization by making the solution independent of the original business user. In my own dashboard, for example, this was expressed by the fact that at some point the data was no longer accessed using my personal SAP user name but via a technical account. Another aspect of systematization is the use of data modeling. While my initial dashboard simply imported one wide table, in the tabular model the data was already dimensionally modelled.
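To make the difference concrete, here is a minimal sketch of what moving from one wide table to a dimensional structure involves: repeated descriptive attributes move into a dimension with surrogate keys, and the fact table only references them. The table, column names, and values are purely illustrative, not taken from my actual SAP data.

```python
# Hypothetical wide table as it might arrive from a direct source query.
wide_rows = [
    {"order_id": 1, "customer": "Acme", "country": "CH", "amount": 100.0},
    {"order_id": 2, "customer": "Acme", "country": "CH", "amount": 50.0},
    {"order_id": 3, "customer": "Beta", "country": "DE", "amount": 75.0},
]

# Build a customer dimension: one row per distinct customer, with a surrogate key.
dim_customer = {}
for row in wide_rows:
    key = (row["customer"], row["country"])
    if key not in dim_customer:
        dim_customer[key] = len(dim_customer) + 1  # surrogate key

# The fact table references the dimension instead of repeating the attributes.
fact_sales = [
    {"customer_sk": dim_customer[(r["customer"], r["country"])],
     "order_id": r["order_id"],
     "amount": r["amount"]}
    for r in wide_rows
]
```

In a real tabular model this separation is what lets key figures and attribute hierarchies be defined once instead of per dashboard.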
The movement from right to left: While the first two T-shirt sizes are clearly dominated by business requirements and the corresponding domain knowledge, further to the left increasing technical skills are required, for example to manage different data formats and types and to automate processes.
Summary and Outlook
To sum up: BI solutions have to grow with their requirements. The architectural approaches shown as T-shirt sizes illustrate what this growth path can look like in concrete terms. The DWH solution is built, so to speak, from top to bottom: we start with the pure information product and then build up, step by step, to the complete data warehouse architecture. The various architectural approaches can also be positioned in Ronald Damhof’s quadrant model: A new BI solution is often created in the fourth quadrant, where business users work exploratively with data and create the first versions of information products. If these prove successful, it becomes particularly important to systematize and standardize the approach. At first, a data mart guarantees a common language across the various information products. Decoupling the data from the source systems then allows the development work to scale further. Finally, a data warehouse can be added to the previous layers to merge data from different sources and, if required, make it permanently and historically available.
Organizations should aim to institutionalize the growth process of a BI solution. Business users can’t wait for every new data source to be integrated across multiple layers before it’s made available for reporting. On the other hand, individual solutions must be continuously systematized, gradually placed on a stable data foundation, and operated properly. The architecture approaches shown in T-shirt sizes provide some hints as to what this institutionalization could look like.
This article was first published in TDWI’s BI-SPEKTRUM 3/2019
T-shirt sizes for BI solutions
The boss needs a new evaluation based on a new data source. There isn’t time to load the data into an existing data warehouse, let alone into a new one. It seems obvious to reach for “agile” BI tools such as Excel or Power BI. After several months and more “urgent” evaluations, a maintenance nightmare threatens. But there is another way: in this article I introduce the idea of architectural T-shirt sizes, illustrated using our own dashboard. Of course, different requirements lead to different architectural approaches, but at the same time, the architecture should be able to grow as the BI system expands.
The summer holidays were just over, and I was about to take on a new management function within our company. It was clear to me: I wanted to make my decisions in a data-driven way and to offer my employees this opportunity. From an earlier trial, I already knew how to extract the required data from our SAP system. As a BI expert, I would love to have a data warehouse (DWH) with automated loading processes and all the rest. The problem was that, as a small business, we ourselves had no real BI infrastructure. And alongside attending to ongoing customer projects, my own time was pretty tight. So did the BI project die? Of course not. I just helped myself with the tool I had to hand, in this case Microsoft Power BI. Within a few hours the first dashboard was ready, and it was published in our cloud service even faster.
The Drawbacks of Quick and Dirty
My problem was solved in the short term: I could make daily updated data available to my employees. Further requirements followed, so I copied the Power BI file and began adapting it here and there. Over the next few weeks, I added new key figures and made more copies. However, I was only partly able to keep the various dashboard copies “in sync”. In addition, operational problems came up. Of course, I had connected the SAP system under my personal username, whose password has to be changed regularly. This in turn led to interruptions in the data updates and required manual effort to reconfigure the new password in Power BI.
T-Shirt Sizes for Step-by-Step Development of BI Architecture
I guess a lot of business users have been in my shoes. You have to find a solution quickly, and before you know it, you’re in your own personal maintenance nightmare. At the same time, a fully developed BI solution remains a distant prospect, usually for organizational or financial reasons.
As an expert in adapting agile methods for BI projects, I have been dealing with this problem for a long time: how can we address the short-term needs of the business user while also achieving the sustainability of a “clean” solution? As a father of two daughters, I remembered how my children grew up, including the fact that you regularly have to buy bigger clothes. The growth process is continuous, but from time to time a new dress size is needed. Exactly this principle can also be applied to BI solutions and their architecture. Figure 1 shows four example T-shirt sizes, which are explained in more detail below.
Think Big, Start Small: The S-Size T-Shirt
This approach corresponds to my first set of dashboards. A BI front-end tool connects directly to a source. All metadata, such as access data for the source systems, business rules, and key figure definitions, are developed and stored directly in the context of the information products created.
This T-shirt size is suitable for a BI project that is still in its infancy. At this stage, a broader aim is often formulated covering everything you would like to analyze and evaluate some day; this is where “thinking big” begins. In practice, however, not much more is known than the data source. Since you would also like to work exploratively and produce first results promptly, it makes sense to start small and technically undemanding.
However, this approach reaches its limits very quickly. Here is a list, by no means complete, of criteria for deciding when the time has come to think about the next architectural dress size:
Several similar information products or data queries exist that use the same key figures and attributes over and over again.
Different users regularly access the information products, but access to the data source is through the personal access account of the original developer.
The source system suffers from multiple, and mostly very similar, data queries of the various information products.
Developing a Common Language: The M-Size T-Shirt
In this size, we try to store commonly useful metadata, such as key figure definitions and access information for the source systems, at a layer of its own rather than in the information product itself. This layer is often called the data mart or the semantic layer. In the case of my own dashboard, we developed a tabular model for it in Azure Analysis Services (AAS). The various variants and copies of the dashboards largely remained as they were; only the substructure changed. All variants now rested on the same central foundation. The advantages of this T-shirt size compared to the previous one are clear: your maintenance effort is considerably reduced, because the basic data is stored centrally once and does not need to be maintained for every single information product. At the same time, you bring consistency to the definition and naming of the key figures. In a multilingual environment, the added value becomes even more apparent, because translations are stored once and maintained uniformly in the semantic layer. All your dashboards thus speak a common language.
With the M-size T-shirt, we still do not store any data permanently outside the source. Even if the source data is transferred to the tabular model in AAS, it must be re-imported for larger adjustments to the model. Some other vendors’ products work entirely without data storage, for example the Universes in SAP BusinessObjects. This means that a high load is sometimes placed on the source systems, especially during the development phase. Here is a list of possible reasons to give your “BI child” the next larger dress size:
The load on the source systems is still too large despite the semantic layer and should be further reduced.
The BI solution is also to be used as an archive for the source data, for example in the case of web-based data sources where the history is only available for a limited period of time.
If the source system itself does not historize changes to data records, this is another reason to use the BI solution as a data archive.
Decoupling from the Source: The L-Size T-Shirt
The next larger T-shirt for our BI architecture, the L-size, replaces the data mart’s direct access to the source. To do this, the data is extracted from the source and permanently stored in a separate database. This layer corresponds to the concepts of a persistent staging area (PSA) (see also [Kim04] pp. 31ff. and [Vos]) or an actively managed data lake (see also [Gor19], for example p. 9). They all have one thing in common: the data is taken from the source with as little change as possible and stored permanently. This means the existing data mart can be reassigned to this new source relatively easily. In my own dashboard example, we’re not at this stage yet, but as a next step we plan to extract the SAP data using Azure Data Factory and store it permanently in an Azure SQL database. For companies that, like us, use their ERP as a cloud solution, this layer reduces the lock-in effect of the ERP cloud provider. Other advantages of this persistent data storage outside the source systems include the archiving and historization functions it brings with it: both new and changed data from the source are continuously stored, and deleted records can be marked accordingly. Technically, our data model remains very close to the data model of the source, although under certain circumstances we already harmonize some data types. While this layer can be realized quickly and pragmatically, there are again indicators for when to jump to the next T-shirt size:
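To illustrate the core of such a PSA load, here is a simplified sketch in Python. It assumes a full extract per load and compares it with the latest stored state; the names and logic are my own simplification for illustration, not the actual Azure Data Factory implementation.

```python
def load_psa(psa, extract, load_date):
    """Append new and changed records; flag records missing from the extract as deleted.

    psa:     list of dicts (the persistent staging area, append-only)
    extract: dict mapping business key -> source payload (a full extract)
    """
    # Reconstruct the latest visible state per business key.
    current = {}
    for rec in psa:
        if rec["deleted"]:
            current.pop(rec["key"], None)
        else:
            current[rec["key"]] = rec

    # New or changed records are appended with the load date.
    for key, payload in extract.items():
        prev = current.get(key)
        if prev is None or prev["payload"] != payload:
            psa.append({"key": key, "payload": payload,
                        "load_date": load_date, "deleted": False})

    # Records no longer present in the source are marked as deleted.
    for key, rec in current.items():
        if key not in extract:
            psa.append({"key": key, "payload": rec["payload"],
                        "load_date": load_date, "deleted": True})
    return psa
```

Because the PSA is append-only, every load preserves the previous state, which is exactly what gives this layer its archiving and historization properties.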
The desired information products require integrated and harmonized data from several data sources.
The data quality of the source data is not sufficient and can’t simply be improved in the source system.
Key calculations should be saved permanently, for example for audit purposes.
Integrated, Harmonized and Historicized Data: The XL-Size T-Shirt
The next, and for now last, T-shirt, the XL-size, extends the existing architecture with a classical data warehouse layer between the PSA and the data mart. It foregrounds the integration and harmonization of master data from the various source systems. This is done using a central data model that exists independently of the sources used (for instance, a data vault model or a dimensional model). It enables systematic data quality controls to be carried out when data is initially loaded and processed. Historization concepts can also be built into the data here as required. The persistent storage of data in this DWH layer means it is also permanently available for audit purposes.
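Historization in this layer is commonly implemented as a type 2 slowly changing dimension, where each version of a record carries a validity interval. Here is a minimal sketch of that idea; it is my own simplification, and a real DWH would typically implement this as SQL merge logic with surrogate keys.

```python
HIGH_DATE = "9999-12-31"  # conventional "open end" marker for the current version

def apply_scd2(history, key, new_attrs, load_date):
    """Insert a new version and close the previous one (simplified SCD2 logic)."""
    for rec in history:
        if rec["key"] == key and rec["valid_to"] == HIGH_DATE:
            if rec["attrs"] == new_attrs:
                return history           # no change: keep the current version
            rec["valid_to"] = load_date  # change: close the current version
    # Append the new version, valid from the load date until further notice.
    history.append({"key": key, "attrs": new_attrs,
                    "valid_from": load_date, "valid_to": HIGH_DATE})
    return history
```

With validity intervals in place, any report can be reconstructed as of a given date, which is what makes the data usable for audit purposes.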
The various T-shirt sizes don’t differ only in the number of layers they use. To characterize the four architectural approaches more comprehensively, it’s worth taking a brief look at Damhof’s model of Data Management Quadrants [Dam]. I’ll do this in the next blog post.
Quite a while ago, I published a blog post about my Agile BI Maturity Model. In this post I’d like to show you the current state of the model.
First of all, I renamed the model to “Agile BI Building Blocks”. I don’t like the term “maturity” anymore, as it implicitly judges the way you are doing things; building blocks are more neutral. I simply want to show which aspects you need to take into consideration to introduce Agile BI in a sustainable way. The following picture shows the current model:
What changed compared to version 1? Let me go through the individual building blocks:
Agile Basics & Mindset. No change – still very important: You need to start with agile basics and the agile mindset. A good starting point is still the Agile Manifesto or the newer Modern Agile.
Envision Cycle & Inception Phase. No change – this is about the importance of the Inception Phase, especially for BI projects. Don’t jump straight into development; do some minimal upfront work, like setting up the infrastructure, creating a high-level release scope, and securing funding.
BI specific User Stories. Changed the term from simply User Stories to “BI specific User Stories”. Unfortunately I haven’t managed to write a blog post about this yet, but you’ll find some ideas around it in my recent workshop materials.
No / Relative Estimating. Changed from Relative Estimating (which is mainly about Story Points) to also include No Estimating, which is basically about the #NoEstimates movement. I recently gave a presentation on this topic at TDWI Schweiz 2017 (in German only for now).
(Self Organizing) Teams. Changed and put the term “Self Organizing” in brackets as this building block is about teams and team roles in general.
Workspace & Co-Location. Added “Workspace”, as this building block is not only about co-location (though that is an important aspect of the agile workspace in general).
Agile Contracting. No change. In my recent presentation at TDWI Schweiz 2017 I talked about Agile Contracting, including an overview of the idea of the “Agiler Festpreis”; you can find more details in the (German) book here.
New: Data Modeling & Metadata Mgt. Data modeling tool support and the question of how to deal with metadata are crucial, and not only for AgileBI. In combination with Data Warehouse Automation, these elements become even more important in the context of AgileBI.
New: Data Warehouse Automation. The more I work with Data Warehouse Automation tools like WhereScape, the more I wonder how we ever worked without them. These kinds of tools are an important building block on your journey towards a more agile BI environment. You can get a glimpse of these tools in my recent TDWI / BI-Spektrum article (again, in German only, unfortunately).
Version Control. No change here – still a pity that version control and integration with common tools like Git are not standard in the BI world.
Test Automation. No change here; a very important aspect. Glad to finally see some DWH-specific tools emerging, like BiGeval.
Lean & Fast processes. No change here – this block refers to introducing an iterative-incremental process. There are various kinds of process frameworks available. I personally favor Disciplined Agile providing you with a goal-centric approach and a choice of different delivery lifecycles.
Identify & Apply Design Patterns. No change, except that I removed “Development Standards” as a separate building block, as these are often tool- or technology-specific expressions of a given design pattern. Common design patterns in the BI world range from requirements modeling patterns (e.g. the BEAM method by Lawrence Corr as well as the IBIREF framework) to data modeling patterns like Data Vault or Dimensional Modeling and design patterns for data visualization like the IBCS standards.
New: Basic Refactoring. Refactoring is a crucial skill for becoming more agile and continuously improving existing artefacts. Basic refactoring means being able to refactor within the same technology or tool type, e.g. within the database using database refactoring patterns.
New: Additive Iterative Data Modeling. At a certain point in your journey to AgileBI, you can’t draw the full data model upfront but want to design the model more iteratively. A first step in that direction is the additive way: you typically enhance your data model iteration by iteration, but you model in a way that leaves the existing model largely unchanged. A good resource on agile / iterative modeling can be found here.
Test Driven Development / Design (TDD). No change here. On the data warehouse layer, tools like BiGeval simplify the application of this approach tremendously. There are also materials available online to learn more about TDD in the database context.
Sandbox Development Infrastructure. No change, but also not much progress since version 1.0. Most BI systems I know still work with a three- or four-system landscape. There is no way yet for every developer to have their own full stack.
Datalab Sandboxes. No change. The idea here is that (power) business users can get their own temporary data warehouse copy to run their own analyses and add and integrate their own data. I currently see this only in the data science context, where a data scientist uses such a playground to experiment with data of various kinds.
Scriptable BI/DWH toolset. No change. Still a very important aspect: if your journey to AgileBI takes you to the third stage, “Agile Infrastructure & Patterns”, which includes topics like individual developer environments and subsequently Continuous Integration, a scriptable BI/DWH toolset is an important precondition. Otherwise, automation will become pretty difficult.
Continuous Integration. No change. Still a vision for me – will definitely need some more time to invest into this in the BI context.
Push Button Deployments. No change. Data Warehouse Automation tools (cf. building block 9) can already help with this a lot. You still need a lot of manual automation coding to link with test automation or to coordinate deployments across multiple technology and tool layers.
New: Multilayer Refactoring. In contrast to Basic Refactoring (cf. building block 14), this is the vision of refactoring your artefacts across multiple technologies and tools. Clearly a vision (and not yet reality) for me…
New: Heavy Iterative Data Modeling. In contrast to Additive Iterative Data Modeling (cf. building block 15), this is about the vision of constantly evolving your data model, including its existing aspects. Multilayer refactoring capabilities are an important precondition for achieving this.
Looking at my own journey towards more agility in BI and data warehouse environments, I’m in the midst of the second phase about basic infrastructure and basic patterns & standards. Looking forward to an exciting year 2018. Maybe the jump over the chasm will work 😉
What about your journey? Where are you now? Do you have experience with the building blocks in the third phase about agile infrastructure & patterns? Let me know in the comments section!
Another aspect of this project is that I’m working exclusively with the Microsoft BI stack. SQL Server 2016 with SQL Server Reporting Services (SSRS) as well as Power BI is a great experience, especially in combination with the IBCS dataviz standard.
Regarding blogs and publications, I was a bit more active contributing German articles:
My personal highlight of today: I’ll be speaking at Agile Testing Days 2017, where I’ll run a 2.5-hour workshop on introducing Agile BI in a sustainable way.
It would be a pleasure to meet you at one of these events – if you join, send me a little heads-up!
Last but not least, let me mention the Scrum Breakfast Club, which I visit on a regular basis. We gather once a month in the OpenSpace format to discuss practical issues around the application of agile methods in all kinds of projects, including Business Intelligence and Data Warehousing. The Club has chapters in Zurich and Bern as well as in Milan and Lisbon.
Just a quick note for my UK based readers who are interested in AgileBI: I’m proud to have been selected as a speaker for the upcoming Enterprise Data & Business Intelligence (EDBI) conference in London, taking place on November 7 – 10. I’ll lead a half day workshop on Monday afternoon November 7 around my Agile BI Maturiy Model and of course would be happy to welcome you there too!
In a previous blog post I discussed various building blocks which can help establish and manifest an agile approach to Business Intelligence within an organisation. In this article I will focus on the aspect “Agile Mindset & Organisation”. The probing question is which approach is best suited to developing a data warehouse (DWH; not just the BI frontend) incorporating agile principles. Closely linked to this is the question of cutting user stories: is it sensible to size a user story “end to end”, i.e. from the connection of the source, through the staging area and core DWH, all the way to the output (e.g. a dashboard)? As you can imagine, the initial answer is “it depends”.
When looking at the question more closely, we can identify multiple factors that will have an impact on our decision.
First of all, I would like to distinguish between the following two cases of development organisation:
Organizational Structure A) DWH and BI Frontend are considered one application and are developed by the same team.
Organizational Structure B) DWH and BI Frontend are considered separate applications (loosely coupled systems) and are developed by different teams.
Characteristics of the Organizational Culture
As a next step, we need to differentiate between two possible characteristics of organizational culture, taken from Michael Bischoff’s book “IT Manager, gefangen in Mediokristan” (English: “IT Manager, trapped in Mediokristan”). A nice review of the book can be found in this blog entry (in German).
“Mediokristan”: The “country” Mediokristan is described as a sluggish environment where hierarchies and frameworks are predetermined, and risk and management concepts are the dominating factors. It stands symbolically for the experiences I have had in large corporate IT organizations. Everything moves at a slow pace, cycle times tend to be long; mediocrity is the highest standard.
“Extremistan”: The “country” Extremistan is best described as the opposite of Mediokristan. New, innovative solutions are developed and implemented quickly, fostered by individual responsibility, self-organization, and a start-up atmosphere. Its citizens strive for the extreme: extremely good, extremely flexible, regardless of the consequences. What works is pursued further; what does not is discontinued. Any form of regulation or framework is rejected just as extremely if it is seen to hinder innovation.
Needless to say, the two cultures described are extremes; there are various other characteristics between the two ends of the spectrum.
Alternatives to agile development approaches
The third distinction I would like to point out are the following two alternative approaches to agile development:
Development Approach A) Single Iteration Approach (SIA): A number of user stories is selected at the beginning of an iteration (called a Sprint in Scrum jargon), with the goal of having a potentially deployable result at the end of the iteration. With Organizational Structure A in place (see above), the user story is cut “end to end” and has to encompass all aspects: connecting the required source system (if not already available), data modeling, loading the data into the staging layer, EDWH, and data mart layer, and, last but not least, creating a usable information product (e.g. a report or a dashboard). Processing the user story also includes developing and carrying out tests, writing the appropriate documentation, etc. It is a very challenging approach and generally requires a team of T-skilled people, where each member possesses the skills to take on any of the upcoming tasks over time.
Development Approach B) Pipelined Delivery Approach (PDA): The PDA exists because of the assumption that the SIA is illusory in many cases. One reason is the lack of T-skilled people in our specialist-driven industry, or, in extreme cases, the involvement of multiple teams (e.g. separate teams for business analysis and testing) in the development process. Another reason is the sheer complexity we often see in DWH solutions. An iteration cycle of two to four weeks is already quite ambitious when, in the figurative sense, doing business in Mediokristan.
As an alternative, the PDA describes the creation of a DWH/BI solution based on a production line (see also Ralph Hughes’s book “Agile Data Warehousing Project Management: Business Intelligence Systems Using Scrum”, Morgan Kaufmann, 2012). The production line (= pipeline) consists of at least three work stations: 1. Analysis & Design, 2. Development, 3. Testing. In the illustrations shown below (taken from a concrete customer project), I added a fourth station dedicated to BI frontend development. A user story runs through each work station in at most one iteration. Ideally, the entire production line is run by the same team, which we will assume in the following example:
The user stories to be tackled first are defined and prioritized during a regular story conference at the outset of a production cycle. They are then worked on in the first work station, “Analysis & Design”. Evidently, in this initial phase, the pipeline as well as some members of the team are not used to full capacity. In line with the Inception Phase in the Disciplined Agile approach, these capacity gaps can be put to good use for other tasks necessary at the beginning of a project.
After the first iteration, the next set of user stories that will be worked on will be defined at the story conference and passed on to the first work station. Simultaneously, the user stories that were worked on in the first work station in the previous iteration will be passed on to the next one, the development station.
After the second iteration, more user stories will be chosen from the backlog and passed on to the first work station, while the previous sets move further along the pipeline. Thus, the user stories of the first iteration will now be worked on in the testing work station. (A word on testing: of course we test throughout development too! But why would we need a separate testing iteration then? The reason lies in the nature of a DWH, namely data: during the development iteration, a developer works and tests with a limited set of (test) data. During the dedicated testing iteration, the full, and ideally production, data set is processed over multiple days; something you can hardly do during development itself.)
Consequently, the pipeline will have produced “production ready” solutions for the first time after three iterations have passed. Once the pipeline has been filled, it will deliver working increments of the DWH solution after each iteration – similar to the SIA.
The two approaches differ in the overall lead time of a user story between the story conference and the finished solution, which tends to be shorter with the SIA than with the PDA. What they have in common is the best practice of cutting user stories: they should always be vertical to the architecture layers, i.e. “end-to-end” from the integration of a source system up to the finished report (cf. Organizational Structure A) or at least to the reporting layer within the DWH (cf. Organizational Structure B). While the PDA does incorporate all the values and basic principles of agile development (a topic that might be taken up in another article), in case of doubt the SIA is more flexible. In practice, however, the SIA is much more challenging to implement, and the temptation to cut user stories per architectural layer (e.g. “analysis story”, “staging story”, “data mart story”, “test story”) rather than end-to-end is ever present.
When is which approach best suited?
Finally, I would like to show how the aspects mentioned above correlate with each other.
Let’s have a look at Organizational Structure A), where DWH and BI Frontend are considered one application. If the project team works out of Extremistan and consists mostly of T-skilled team members, chances are that they will be able to implement a complete DWH/BI solution end-to-end using the SIA. Success with this approach is less likely if large parts of the team are located in Mediokristan: internal clarifications and dependencies will require additional time for analysis and design, and internal and legal governance regulations will have to be considered. Due to these factors, in my experience, the PDA has a better chance of success than the SIA when developing a DWH/frontend solution in Mediokristan. Or, to put it more positively: yes, agile development is possible even in Mediokristan.
The situation I have come across more often is “Organizational structure B) DWH and BI Frontend are considered two separate applications”. As a matter of principle, agile development with the SIA is simpler in the BI frontend than in DWH backend development. That being said, the environment (Mediokristan vs. Extremistan) also has a great impact. It is possible to combine the two approaches (PDA for the DWH backend, SIA for the BI frontend), especially if the BI frontend is already connected to an existing DWH and the DWH does not need to be adapted for every user story in the BI frontend. Another interesting question in organizational structure B) is how to cut the user stories in the DWH backend. Does it make sense to formulate a user story when there is no concrete user at hand and the DWH is, for all practical purposes, developed to establish a “basic information supply”? And if so, how do we best go about it? An interesting approach in this case is Feature Driven Development (FDD), described in this blog article by agile visionary Mike Cohn. Adapting the FDD approach to DWH development might be interesting material for a future article…
As you can see, the answer “it depends” mentioned at the beginning of this blog post is quite valid. What do you think? What is your experience with Agile BI in Mediokristan or Extremistan? Please feel free to get in touch with me personally or respond with a comment to this blog post. I look forward to your responses and feedback.
(A preliminary version of this blog post was published in German here. Many thanks to my teammate Nadine Wick for the translation of the text!)
In this post I outline how I got a cloud-based training environment ready in which WhereScape RED, a data warehouse automation tool, connects to a Teradata database test machine.
A few weeks ago I had to organize a so-called “testdrive” for a local WhereScape prospect. The prospect uses a Teradata database appliance, hence they wanted to evaluate WhereScape RED on Teradata as well. As the local Swiss WhereScape partner, we received a virtual machine containing a SQL Server based WhereScape RED environment. The training had to be run onsite at the customer’s location, and IT-Logix provided its set of training laptops, each containing 4 GB of RAM. These were my starting conditions.
First of all, I thought about how to provide Teradata for a training setup. Fortunately, Teradata provides a set of preconfigured VMs here. You can easily download them as zipped files and run them using the free VM Player.
Based on my previous experience organizing hands-on sessions, e.g. during our local Swiss SAP BusinessObjects user group events, I wanted to use Cloudshare. This makes it much easier (and faster!) to clone an environment for multiple training participants than copying tons of gigabytes to multiple laptops. In addition, the 4 GB of RAM wouldn’t have been enough to run Teradata and WhereScape with adequate performance. So I had two base VMs (one from WhereScape, one from Teradata) – a perfect use case for trying Cloudshare’s VM upload feature for the first time.
I started with this support note, which explains how to prepare a local VM and upload it to your Cloudshare FTP folder. From there you can simply add it to an environment:
After having uploaded both VMs it looks like this in Cloudshare:
I increased the RAM and CPU power a bit and, more importantly, configured the network between the two machines:
Go to “Edit Environment” -> “Edit Networks”:
Here I had to specify which virtual network to connect the VMs to. Please keep in mind that this doesn’t provide an automatic DHCP server or similar. Either you create one within your machine or – as in my case – you set static IPs within the individual VMs (both were delivered with dynamic IPs provided by the VM Player). Changing the IPs wasn’t a big deal, neither on Windows nor on Linux.
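As a rough sketch, this is what setting a static IP on the Linux side can look like. All file names and addresses below are assumptions (the exact path and interface name depend on the VM’s Linux distribution and will differ per setup):

```shell
# Assumed example: a SUSE-style interface config file; adjust the
# interface name (eth0), IP, and netmask to your own virtual network.
cat > /etc/sysconfig/network/ifcfg-eth0 <<'EOF'
BOOTPROTO='static'
IPADDR='192.168.0.10'
NETMASK='255.255.255.0'
STARTMODE='auto'
EOF

# Restart networking so the static address takes effect.
service network restart
```

On the Windows VM, the equivalent change can be made in the network adapter’s IPv4 properties dialog.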
However, I quickly found out that afterwards the Teradata service no longer ran properly.
First of all, I created a simple test case to check whether I could connect from the WhereScape VM to the Teradata machine. Besides a simple ping (which worked), I installed the Teradata Tools & Utilities on the WhereScape machine. As I couldn’t establish a proper connection, I had to google a bit. The following article gave me the hint to add a “cop” entry to the hosts file:
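For illustration, such a “cop” entry follows Teradata’s client naming convention: the client resolves the TDPID with the suffix “cop1” (cop2, cop3, … for additional nodes) to find the database. The IP and host name below are example values, not the ones from my actual setup:

```
192.168.0.10   tdexpress   tdexpresscop1
```

With this line in the hosts file on the client machine, the Teradata client tools can connect using the TDPID “tdexpress”.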
After a restart of the machine, Teradata was up and running again. By the way, you can verify this with the command “pdestate -a”:
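For reference, the check looks roughly like this on the Teradata VM (the exact output wording may vary by Teradata version):

```shell
# Run as root on the Teradata machine; reports the state of the
# Parallel Database Extensions (PDE) and the database itself.
pdestate -a
# When everything is up, the output reports that the PDE state is
# RUN/STARTED and that logons are enabled.
```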
The next step was to create a new WhereScape metadata repository on the Teradata database. For this I first created a new schema and user in Teradata and then created the metadata repository using the WhereScape Administrator:
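As a minimal sketch of the first part of that step, the user for the metadata repository can be created with bteq, Teradata’s command-line client. All names, passwords, the IP, and the space sizes below are example assumptions, not the values from my environment:

```shell
# Log on as the administrative user (dbc) and create a dedicated user
# for the WhereScape metadata repository; names and sizes are examples.
bteq <<'EOF'
.LOGON 192.168.0.10/dbc,dbc
CREATE USER wsl_meta FROM dbc AS
  PASSWORD = wsl_meta,
  PERM  = 2e9,   -- permanent space for the repository tables
  SPOOL = 1e9;   -- spool space for queries
.LOGOFF
.QUIT
EOF
```

In Teradata a user created this way also acts as a database (schema), so it can directly serve as the target for the repository created in the WhereScape Administrator.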
In WhereScape RED I created a connection to point to the new Teradata database:
… and finally loaded a few tables from the SQL Server to Teradata:
Once I had finished the work, the most important step was to create a snapshot:
Based on this snapshot, I then cloned the environment for the number of participants in the testdrive with just a few clicks. In the end, every participant had their own isolated environment consisting of a full stack: a source database (SQL Server), WhereScape, and the target DWH database (Teradata).