How about fixing the CI?

Every good story has a beginning somewhere. I'd like to think that this very story started on a whiteboard just like this one.

[Image: whiteboard]

And then the lucky hero of our story made a great design, putting all the pieces together for a project just around the corner, about to happen in the best of ways. And then all the members of the team gathered together and spiked a few ideas, giving birth to a proof-of-concept design. And then they started building the lovely product.

But it didn’t happen that way. In reality it rarely does.

On the contrary, it started with our Jenkins server celebrating its 9,000th unsuccessful job. I remember staring at it, confused. Why do we have a Continuous Integration setup with a Jenkins that can’t even access its repository? And it won’t stop trying.

OK, I thought - let’s get to the whiteboard and see what we can do about it. We started drawing workflows, then we added some stakeholders, wiped the board clean and gave it another spin with a different setup.

The funny thing about making a design is that it always plays out so nicely in your head. Then it goes through the scrutiny of the team, finally facing its greatest challenge: reality itself. There it usually breaks a few times until we get it right, only to find out that there is so much more work to be done. Surprisingly, the apps need to adapt too. And it doesn’t help when I’m being sarcastic about it.

What is a good CI anyway?

I went through the digital book library we use, searching for some knowledge. I found around 20k books mentioning the topic. I felt confident that, as they said in one movie, “the truth is out there”. So I started digging through countless statements of what a successful CI should look like.

We need to use SCM.

That’s fairly straightforward - we use Git. Great.

When we develop a software feature we need to branch off, and having a good branching strategy is key.

Alright, we have that too. I mean - there is the develop branch and we merge all our feature branch goodies to it before going any further. So that’s checked too.

We should build before deploying to the testing environment. And we should make sure the build succeeds.

That’s OK - we set up our Jenkins to do so every time we push to the Git server.

We should push the code to testing and run all our tests.

Hmm…

If all goes well we might be able to release!

Hold on a second. Let’s go back a step. We have all kinds of dependencies to figure out when building a working testing environment. For example, we often have common configuration that doesn’t change often and has very little to do with the features of our application. This configuration supports our product, but we can’t keep it in our repositories for several reasons, such as:

Security. API keys, credentials and certificates are very sensitive information and moving them around with our code is a bad idea.
Application instances and infrastructure configuration should usually be decoupled from the application code.
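
One common way to keep such values out of the repository is to read them from the environment when the application starts. The following is a minimal Python sketch of that idea, not the setup described in this story; the variable names `API_KEY` and `DATABASE_URL` and the `Settings` class are made up for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Values that live outside the repository (hypothetical names)."""
    api_key: str
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if any is missing."""
    required = ("API_KEY", "DATABASE_URL")
    # Treat unset and empty variables the same way: both count as missing.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return Settings(api_key=env["API_KEY"], database_url=env["DATABASE_URL"])
```

With something like this, the CI server, staging and production can each inject their own values, and the application code stays the same in every environment.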

We choose a design to solve problems. We decided to write down those problems first:

We need to have a CI that runs our tests in an automated way.
We want to be able to test features developed in feature branches without locking out the whole testing environment until we are finished.

The first problem is pretty straightforward. We tuned Jenkins to use a database and configuration dedicated to CI. The tests run, and we are happy.

However, when testing a feature we want to test it with the exact same configuration that production has. To elaborate, our testing environment is heavily customized: we have some middleware to help us see transactions, and our API runs in debug mode, returning a lot of sensitive information with every response, not to mention all the mocks and testing frameworks.

If we execute one test in production and one in the testing environment, the branch we test could behave very differently, and we might get different results because of the significant differences between those two environments. To solve this we would normally go to staging and test our code there. In theory, staging and production are the same. And there goes our next challenge: the staging environment is also used for UAT (user acceptance testing), and introducing changes there whenever the developers please would disrupt the UAT process. The fact that UAT is a lengthy procedure, in many cases taking more than a week, doesn’t help.
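
To make the gap between two environments visible, it can help to compare their settings key by key. Below is a small Python sketch of that idea; the function name `config_drift` and the sample keys are assumptions for illustration, loosely based on the differences mentioned above rather than on the real configuration.

```python
from typing import Any, Dict, Mapping, Tuple

# Placeholder shown when a key exists in only one of the two configurations.
_MISSING = "<missing>"


def config_drift(a: Mapping[str, Any], b: Mapping[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return every key whose value differs between the two configurations."""
    drift = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, _MISSING)
        right = b.get(key, _MISSING)
        if left != right:
            drift[key] = (left, right)
    return drift


# Hypothetical settings for two environments.
testing = {"debug": True, "transaction_middleware": True, "workers": 4}
production = {"debug": False, "workers": 4}

for key, (test_value, prod_value) in config_drift(testing, production).items():
    print(f"{key}: testing={test_value!r} production={prod_value!r}")
```

A check like this does not remove the differences, but it turns "the environments are not the same" into a concrete list that can be discussed and reduced story by story.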

What started as a small ticket on our scrum board is now being developed and cut into small, actionable pieces. We turned the initial story ticket into an epic and estimated new stories for gradually transforming the testing, staging and production environments. The nice thing about doing so in an Agile way is that we continuously get the benefits of every story, sprint after sprint, as things progress.

Now we have better testing, and the clash the developers were experiencing is gone, together with the constantly failing Jenkins jobs.