Data Warehouse and Scrum: An elephant in a porcelain shop
Posted on Jan 11, 2018
The Scrum framework is nowadays commonly used in Data Warehouse (DWH) development as well, with varying success. Unlike typical agile software development teams, DWH teams face challenges that have been built into how they are managed for decades. I have a background as a Java architect, and I have also been responsible for DWH teams, so I have witnessed their pain from a fresh perspective:
- Sprint activities get too complex. DWH structures can grow huge in large enterprises and governmental organisations. Dependencies in DWH design are typically tighter and more critical than in other forms of software development. Packages contain procedures and views that require a massive number of tables before they can be tested. Tables, in turn, often contain data that must be correctly available for testing. Everything relates to everything. And that is the problem for Scrum, which encourages working on small, atomic development activities with well-defined deliverables and a clear definition of done. One way to overcome this challenge is to accept that not everything can be built at once. Be less greedy when negotiating new features with the customer. Also, don’t make sprints too short. It is easy to split tasks artificially into smaller steps, but in Scrum we are more interested in fully functioning deliverables than in step-by-step implementation.
- Traditional DWH team roles do not fit into a Scrum team. Typically, at least architects and developers are separate roles, while in a Scrum team everyone should share a single developer role. DWH work is not as “development oriented” as other software development: instead of creative features or new functions, the focus is more on predefined data structures and dependencies. Recently I have solved this by separating architects from the rest of the team and making them available to developers as an external resource. Scrum teams interact with many stakeholders anyway, and one architect can serve several Scrum teams working on the same warehouse. The developers’ responsibility is then to understand and follow the requirements the architect provides.
- Continuous Integration is difficult to implement. CI is not strictly mandatory for Scrum, but it reveals a lot about the challenges. Unlike modern software development frameworks, the tools DWH teams use are typically not built for CI. While Java, for example, has plenty of frameworks, platforms and tools for DevOps, DWH teams must largely fend for themselves and rely on the database vendor’s own solutions. Another major challenge is that databases are stateful. Deployment is not just about starting microservices with the latest snapshot of the code; it requires many data structure changes, very time-consuming data loads and tricky data conversions. Unfortunately there is no straightforward solution, but a skilled developer can build a scripted CI environment even without ready-made frameworks and tools. After all, in information technology everything is possible; it’s just a matter of resources and costs.
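To make the point about scripted CI a little more concrete, here is a minimal sketch of one possible approach, assuming versioned schema changes kept in version control plus a tracking table in the database. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse database, and the table names (`dim_customer`, `fact_sales`, `schema_version`) and the functions `migrate` and `smoke_test` are hypothetical. Because the database is stateful, the script records which changes each environment has already received instead of rebuilding everything from scratch.

```python
import sqlite3

# Hypothetical, ordered list of versioned schema changes. In a real DWH these
# would typically live as numbered SQL files in version control.
MIGRATIONS = [
    (1, "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER "
        "REFERENCES dim_customer(customer_id), amount REAL)"),
    (3, "ALTER TABLE dim_customer ADD COLUMN country TEXT"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list[int]:
    """Apply pending migrations in version order; return the versions applied."""
    # The database is stateful, so record which changes it has already received.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already deployed to this environment
        with conn:  # commits on success, rolls back the pending transaction on error
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    return newly_applied


def smoke_test(conn: sqlite3.Connection) -> None:
    """A minimal CI check: load test rows and query across the dependent tables."""
    with conn:
        conn.execute(
            "INSERT INTO dim_customer (customer_id, name, country) VALUES (1, 'Acme', 'FI')"
        )
        conn.execute(
            "INSERT INTO fact_sales (sale_id, customer_id, amount) VALUES (10, 1, 99.5)"
        )
    total = conn.execute(
        "SELECT SUM(s.amount) FROM fact_sales s JOIN dim_customer c USING (customer_id)"
    ).fetchone()[0]
    assert total == 99.5, total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn))  # first run applies all three changes: [1, 2, 3]
    print(migrate(conn))  # second run is a no-op: []
    smoke_test(conn)
```

Running `migrate` twice against the same database applies the changes once and then does nothing, which is the property a CI pipeline needs when it redeploys to an environment that already holds data. A real warehouse pipeline would also have to handle what this sketch leaves out: large data loads and data conversions.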
Overall, I think DWH teams can benefit a lot from Scrum, even though it still requires evolution in tools, methodologies and work culture. Here’s an interesting blog post that goes slightly deeper into these challenges: The Agile Data Warehouse (Medium.com).