When Digger started out as an open source alternative to Terraform Cloud, there was one very basic idea behind it: the hunch that you don't need yet another parallel CI system to run your Terraform. We understand why a plethora of CI systems dedicated to Terraform have come into the picture, such as Spacelift, Terraform Cloud, Scalr, env0 and so on. There are real pains in setting up Terraform collaboration: concurrency issues, controlling access, setting up locking properly, configuring all your environments quickly, and scaling your setup to larger teams where the codebase is split into hundreds of smaller independent pieces.

So we set out to solve all the original pains of Terraform collaboration, but this time there was one line we didn't want to cross: Digger has to reuse your existing CI system. Digger will never implement its own CI runners, because we believe that CI is a solved problem. Instead, we want to reuse existing CI and solve the problems of Terraform collaboration right there.

Our second hypothesis is that Digger should be CI-agnostic and source-control-agnostic. We want to support all CI systems, including GitHub Actions, Jenkins, GitLab pipelines and ArgoCD. We also want to support all source control platforms.

In the initial iterations of Digger we decided to focus on GitHub as source control, AWS as the target platform and GitHub Actions as the CI system. We needed a place to store which Digger project was locked by which pull request. A Digger project is nothing but a piece of Terraform that needs to be applied and maintained. We settled on DynamoDB for storing the locks, since it supports distributed mutex locks: it guarantees that if a lock is written, other processes checking for its existence will not receive a conflicting result from the table check, an important characteristic for guaranteeing lock safety.
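As a rough illustration of the idea (not Digger's actual schema or code), a lock can be acquired with a single conditional write: the `attribute_not_exists` condition makes DynamoDB reject the put if another pull request already holds the lock. The table name, key names and function below are hypothetical.

```go
package locking

import (
	"context"
	"errors"
	"strconv"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// AcquireLock tries to claim a project lock for a pull request.
// The ConditionExpression makes the write fail atomically if an item
// with the same key already exists - this is what gives DynamoDB its
// mutex-like guarantee.
func AcquireLock(ctx context.Context, db *dynamodb.Client, table, project string, pr int) (bool, error) {
	_, err := db.PutItem(ctx, &dynamodb.PutItemInput{
		TableName: aws.String(table),
		Item: map[string]types.AttributeValue{
			"ProjectId":   &types.AttributeValueMemberS{Value: project},
			"PullRequest": &types.AttributeValueMemberN{Value: strconv.Itoa(pr)},
		},
		ConditionExpression: aws.String("attribute_not_exists(ProjectId)"),
	})
	if err != nil {
		var conflict *types.ConditionalCheckFailedException
		if errors.As(err, &conflict) {
			return false, nil // another pull request already holds the lock
		}
		return false, err
	}
	return true, nil
}
```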

We got a lot of traction from our initial version, and a large number of support requests came through. Here is a summary of some of the major feature requests:

| Feature | Category / Dimension |
| --- | --- |
| Support GCP based locks | Platform |
| Support Azure based locks | Platform |
| Support storage of plans in a GCP backend | Platform |
| Support storage of plans in GitHub artifacts | Platform |
| Support GitLab pipelines | CI/CD |
| Support Azure DevOps | CI/CD |
| Support Jenkins | CI/CD |
| Support GitLab | SCM |
| Support Bitbucket (with Jenkins) | SCM |
| Support Bitbucket (with Bitbucket pipelines) | SCM |

We quickly realised that each of these requests added a new parallel dimension to the feature set, and the number of implementations multiplies with every new dimension. For example, supporting a new SCM requires implementing the APIs for publishing comments to that SCM. The number of implementations needed was going to explode, and that would be hard for our team of 3 engineers to maintain while ensuring that each combination is reliable. Sure, we could write integration tests for all the flows, but our past experience has shown that the number of flows or paths a product has to support is inversely proportional to its overall reliability, since there are too many edge cases to deal with. A back-of-the-envelope calculation across these dimensions (SCM, CI system, target platform, orchestrator function, output location) showed us that we could be looking at thousands of interface implementations and end-to-end paths: just five options in each of those five dimensions already gives 5^5, roughly 3,000 combinations.
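To make the dimension explosion concrete, here is a hypothetical sketch (the interface and method names are illustrative, not Digger's actual code) of what each dimension looks like as an interface. Every new SCM, lock backend or plan store means another implementation of the corresponding interface.

```go
package core

// Illustrative interfaces only; names are hypothetical.

// SCM is one dimension: each source control system (GitHub, GitLab,
// Bitbucket, ...) needs its own way of publishing PR/MR comments.
type SCM interface {
	PublishComment(prNumber int, body string) error
}

// LockProvider is another dimension: DynamoDB, a GCP backend, Azure, ...
type LockProvider interface {
	Lock(project string, prNumber int) (bool, error)
	Unlock(project string) error
}

// PlanStorage is a third dimension: S3, a GCP backend, GitHub artifacts, ...
type PlanStorage interface {
	StorePlan(project string, plan []byte) error
	RetrievePlan(project string) ([]byte, error)
}
```

Each interface on its own is small; the problem is that reliability has to be proven for every combination of implementations, not for each implementation in isolation.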

Therefore we decided to take a step back and think more deeply about how to architect this product in a way that is laser-focused on making it extremely reliable, flexible and open.

The current approach

The current approach is that the Digger CLI runs independently in the pipeline and interfaces with the target platform to perform functions such as locking, plan storage, webhooks and state management. As mentioned above, this leads to a combinatorial explosion in product size.
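A rough sketch of what that looks like in practice, using the same hypothetical interfaces as above (again illustrative, not Digger's actual code): the CLI, running inside the CI job, wires a specific SCM, lock backend and plan store together for every run, so every supported combination is a distinct code path that has to be correct.

```go
package pipeline

import "fmt"

// Minimal stand-ins for the hypothetical dimension interfaces sketched earlier.
type SCM interface{ PublishComment(pr int, body string) error }
type LockProvider interface{ Lock(project string, pr int) (bool, error) }
type PlanStorage interface{ StorePlan(project string, plan []byte) error }

// runProject approximates what the CLI does inside a CI job for one project:
// acquire the lock, run the plan, persist its output, and report back on the PR.
func runProject(scm SCM, locks LockProvider, plans PlanStorage, project string, pr int) error {
	ok, err := locks.Lock(project, pr)
	if err != nil {
		return err
	}
	if !ok {
		// Another pull request holds the lock; surface that on the PR and stop.
		return scm.PublishComment(pr, fmt.Sprintf("project %s is locked by another PR", project))
	}
	planOutput := []byte("terraform plan output would go here")
	if err := plans.StorePlan(project, planOutput); err != nil {
		return err
	}
	return scm.PublishComment(pr, fmt.Sprintf("plan for %s completed and stored", project))
}
```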