An obeya to build and industrialize our IT delivery system

During a recent experience in an international insurance company, I coached and supported several teams as they worked to accelerate the delivery of an application server upgrade. A total of four teams of six to nine people were tasked with tackling server obsolescence by updating Operating Systems (OS) and Databases (DB) and by encrypting data (files, tables, etc.).

The teams were composed of internal customers (business representatives), IT project managers, architects, database analysts, operations people, security specialists, and external consultants. The IT system of this large company includes several thousand servers, and the main goal of the upgrade projects was to ensure their robustness, availability and security. These projects were first launched several years ago but were suspended and restarted many times. The final push to complete the updates and ensure the conformity of the servers came within the legislative context surrounding GDPR.

To accelerate the delivery of the updates and enable strong collaboration among different company departments, it was decided to establish an Obeya and have the teams work within that context. In this large room, there were seven panels: Vision, Voice of the Customer, Product, Macro plan, Micro plan, KPIs and Continuous Improvement.

Illustration of the seven Obeya panels, by Edmond Nguyen at Operae Partners

My initial concern was to clarify the project’s vision. So, I interviewed the project sponsors, the CEO and the Security Officer, to define our goals in terms of topics, deadlines, metrics and the security levels we had to reach. The first panel was then shared with the teams and displayed in the Obeya, always visible as our True North.

After that, however, I quickly realized that the teams didn’t have a clear enough understanding of the steps that were necessary to address server obsolescence. In particular, there was no agreement on the process and on who was responsible for what part of it. Things were even less clear when we got into the details of each activity: the execution time of each step (for example, how long it takes to encrypt 1TB on MariaDB and what the impact is on read and write performance), the necessary prerequisites, and the roles and responsibilities. Everyone had their own point of view on the actions they had to perform. The teams didn’t have a global overview of the process, nor did they agree on how different activities interfaced.

STEP 1 – LEARNING TO MANUALLY DELIVER A FIRST “GOOD PIECE”

Rather than embarking on an endless RACI (Responsible Accountable Consulted Informed) diagram, I asked the teams to use post-it notes to represent on the wall the macro-steps on which they agreed at least in principle. It was in this “minimalist” context that we held our first weekly Obeya ceremony with the four core teams (made up of IT specialists and operations people coming together to collaborate on improving security of systems), each of them led by an IT Project Leader turned Chief Engineer (the individuals my coaching was meant to support most directly).

We then asked the teams to set a first attainable goal, a Minimum Viable Product. The idea behind an MVP is on the one hand to bring value to the internal customer as quickly as possible by delivering a first version of the product on which they can give feedback (limiting the tunnel effect) and on the other hand to allow the team to learn how to work together. In Lean Thinking it is akin to the first good piece, offering confirmation that the process works and that the team is able to deliver the expected product.

Our first MVP was deliberately simple: a single Red Hat server hosted on a single physical machine. The objective was to deliver it in two weeks (10 working days).

Based on this MVP, the teams went on to build the Product, Macro plan (months and weeks) and Micro plan (day to day) panels with the support of the chief engineer. The result was a “visual roadmap” that represented the product to be delivered and the steps necessary to deliver it, and on which the teams were aligned.

MVP #1 was delivered one month later: the first server was OK, but two weeks late. With thousands of servers to upgrade, we knew this was just a beginning. In order to improve our next MVP, we decided to interview the internal customer (business representative) to identify any unfulfilled expectations. The feedback was that this first iteration was quite laborious: he was asked to stop and restart the server six times in order to carry out the update, which impacted the service to end users, and also had to perform four different test sessions following each main upgrade sequence. In the end, the server was unavailable for a total of four days, which penalized end users during the test sessions.

At the team level, as one can easily see from the spaghetti diagram below, there were many difficulties and obstacles. Spaghetti diagrams are very useful to show waste as experienced by the people during the execution of a process. In this first iteration of the MVP, there was some tension and disagreement (the issue was even escalated to managers at one point) over the prerequisites for the update operation as well as on the precise medium to be used to communicate the prerequisites (e.g. mail versus ticket, PDF versus Word document, and so on).

Spaghetti diagram: Exchanges between different people (each colored square) to prepare the operation
More than 200 emails sent, 4 managerial escalations, 6 face-to-face workshops.

In the first month, the team had addressed 37 problems (12 definitively solved and 25 in progress) using the PDCA (Plan Do Check Act) cycle, the real driving force behind continuous improvement in the Obeya. Each cycle/iteration was recorded on the seventh panel on the wall. PDCA develops the team members’ expertise, makes them autonomous in solving their problems, and removes obstacles to improvement one by one. Each improvement is then shared with the rest of the team at the weekly core-team ceremony.

Despite all these difficulties and a “moderately satisfied” internal customer, the team had managed to deliver its first server, achieve an initial success, and learn a lot in the process (with a lot of room for further improvement). In particular, they:

Designed a first version of the process, with six major steps and a dozen activities for each step and a measurement of the time spent on the key steps. This process was of course displayed in the Obeya.
Came up with four working standards (in the lean perspective) on key operations, featuring a list of prerequisites (versions, last server restart date, etc.), the exhaustive list of tasks to perform, the key point and the reason why this task is necessary: stopping the database, backup of tablespaces, command to encrypt the tablespaces with a key, and so on.
Created three checklists: how to make sure the data is encrypted, conditions to meet before launching the load tests, and required target disk space based on current size.
Introduced a reference chart (an “abacus”) on the impact of data encryption on read and write performance, on the size of filesystems, and more.
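
To illustrate what such a working standard contains, it can be written down as a small data structure, which also prepares the ground for scripting the operation later. The sketch below is only an illustration in Python: the class names, the field names and the placeholder key points and reasons are assumptions, not the team's actual standards. Only the task names and the example prerequisites come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Task:
    """One line of a working standard: what to do, the key point, and why."""
    action: str
    key_point: str  # placeholder wording, hypothetical
    reason: str     # placeholder wording, hypothetical


@dataclass
class WorkStandard:
    """A working standard: prerequisites plus the exhaustive list of tasks."""
    name: str
    prerequisites: List[str] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)

    def missing_prerequisites(self, confirmed: Set[str]) -> List[str]:
        """Return the prerequisites that have not been confirmed yet."""
        return [p for p in self.prerequisites if p not in confirmed]


# Task names follow the article; key points and reasons are left as placeholders.
encryption_standard = WorkStandard(
    name="Encrypt tablespaces",
    prerequisites=["versions checked", "last server restart date known"],
    tasks=[
        Task("Stop the database", "<key point>", "<reason>"),
        Task("Back up the tablespaces", "<key point>", "<reason>"),
        Task("Encrypt the tablespaces with a key", "<key point>", "<reason>"),
    ],
)

# Before the operation, list what is still missing.
todo = encryption_standard.missing_prerequisites({"versions checked"})
print("Missing prerequisites:", todo)
```

Keeping prerequisites separate from tasks mirrors the distinction the teams argued about during MVP #1: what must be true before the operation starts versus what is done during it.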

STEP 2 – ACCELERATING DELIVERY AND IMPROVING SATISFACTION

Proud of this success, but with the firm intention of improving with the next iteration, the team set off to produce a second MVP of the server update process. This version had 10 virtualized servers (the first MVP had one) on two physical servers. The company set itself a target of two weeks to deliver it.

We started by building the product on the wall of the Obeya, so that the whole team could align on the scope of the project and succeed within the two-week timeframe. We created a panel with the different Virtual Machines (VMs), databases, physical servers, OS/Middleware versions and tablespace sizes. This is when the team discovered that two databases installed on one of the four servers were not in the scope of MVP #2: they belonged to another application serving a different part of the business. It was therefore necessary to exclude them and to inform the business when the servers were stopped/restarted.

Starting from the work they had done with MVP #1, the team built an improved version of the server update process, which entailed pooling certain activities in order to reduce the workload of the business (for example, the shutdown and restart of databases and the acceptance phases) and eliminating unnecessary steps (like beginning to make corrections on the spot). The learnings from the experience with MVP #1 were still clear in the team’s mind, and the A3 sheets of the many PDCAs were still hanging on the walls!

By the time the new process was ready, one step and many sub-activities had been removed or pooled.

On the Obeya’s Macro-plan panel, the team represented the final two-week milestone for delivering MVP #2. They then agreed on the operations to be performed day by day, as listed on the Micro-plan panel: prepare scripts, provide technical prerequisites, prepare the load tests (like stress tests simulating a large number of users), plan the intervention windows, and run business tests after the operation. Everyone knew what action they were supposed to perform, what the expected deliverable was, and what standard or checklist they could rely on.

To the satisfaction of the internal customer, the team was able to deliver the MVP #2 on time. The process was improved, different standards were tested, and checklists prepared. They also scripted (wrote specific code for commands) some of the operations that were previously performed manually, like the data encryption operation. A team member commented: “We stopped pushing projects like wagons on a train and started to pull them from the customer perspective.”

They also encountered a number of problems, however. For example, the script erased the Wallet containing some OS access codes, which meant that some rework was necessary to repair the Wallet and take into account the backup. The two servers excluded from the scope of the second MVP were also partially impacted.

For the second iteration, the team didn’t draw a spaghetti diagram, but they went from over 200 email exchanges to a few dozen, with no need to escalate problems to managers and no tension within the team. This is another way in which waste was drastically reduced.

TO CONCLUDE

The team has undeniably developed individual expertise and learned a lot about its delivery process. The Obeya has been very useful in helping people collaborate more effectively on a common goal. (One of the team members said: “The big added value of the Obeya was the ability to transcribe and share the explicit and implicit knowledge of each person, which must be extracted.”) The intensive problem-solving sessions they participated in allowed them to experience continuous improvement firsthand.

To achieve the expected takt time, the pace set by the customer, they will need to become even faster with their next iteration, MVP #3, whose target is to deliver 10 servers/day. This will only be possible with a fully industrialized process, with standards to ensure consistent levels of quality and a set of stable and well-defined steps that don’t cost more than they should. It’s clear that the team will have to introduce a pulled flow to reveal the problems that need solving and shed light on the waste that can be removed.
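
To give a sense of the gap between the iterations, takt time can be computed as the available working time divided by customer demand. The short Python sketch below compares the delivery rates of the three MVPs. Several figures are assumptions made for illustration only: an 8-hour working day, a fleet of 3,000 servers (the article only says "several thousand"), and about 20 working days for MVP #1 (one month). The function and variable names are hypothetical.

```python
def takt_time_minutes(available_minutes_per_day: float, demand_per_day: float) -> float:
    """Takt time = available working time / customer demand."""
    return available_minutes_per_day / demand_per_day


def working_days_needed(servers: int, servers_per_day: float) -> float:
    """Working days needed to upgrade a fleet at a given delivery rate."""
    return servers / servers_per_day


WORKDAY_MINUTES = 8 * 60  # assumption: 8-hour working day
FLEET_SIZE = 3000         # assumption: "several thousand servers"

# Delivery rates in servers per working day.
throughput = {
    "MVP #1": 1 / 20,       # 1 server in about 20 working days (assumed)
    "MVP #2": 10 / 10,      # 10 servers in 10 working days
    "MVP #3 target": 10.0,  # stated target: 10 servers/day
}

for name, rate in throughput.items():
    days = working_days_needed(FLEET_SIZE, rate)
    print(f"{name}: {rate:.2f} servers/day, "
          f"about {days:.0f} working days for {FLEET_SIZE} servers")

takt = takt_time_minutes(WORKDAY_MINUTES, 10)
print(f"Takt time at 10 servers/day: {takt:.0f} minutes per server")
```

Under these assumptions, a target of 10 servers/day corresponds to one server delivered every 48 minutes of working time, which is why manual steps and repeated stop/restart cycles have to disappear from the process.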