Five practices for serverless & distributed systems productivity

To productively build serverless & distributed systems, we need to adopt new practices, some of which may seem counterintuitive at first. This post suggests five concrete practices to up your game.

Let's jump in, roughly in order of importance:

1. Prefer running your system in the cloud over local emulation

Local development is a sacred cow for many developers: the ability to deploy, run and test an application or system on your own laptop. But the reality of modern distributed systems is that some things cannot easily be emulated locally, if at all.

It is not for a lack of trying: systems running on Kubernetes are often emulated with Docker Compose or Minikube, with varying levels of success. For AWS Serverless, we have AWS SAM and LocalStack.

There are however a few issues with this type of emulation:

The cost of maintaining emulation is high, since you effectively maintain two stacks - one for local development, and one for "real" deployment.
Emulation isn't the real thing: you will find configuration drift, and small or large differences in behaviour between the emulated stack and the real one.

In summary, our experience of trying to maintain emulation is that the costs far outweigh the benefits. That effort should instead be put towards the ability to deploy local code quickly to the cloud environment, or where possible, connect a locally running process to a real environment in the cloud.

One of the great benefits of Serverless in particular is that the cost of creating and provisioning Feature Environments is approaching zero. Moving local development to the cloud is a low-cost proposition.

2. CI/CD pipelines are not enough: local deployment automation is crucial

Given our stance on local emulation, the next instinct to suppress is that of relying on CI/CD to manage deployments to feature environments. If we rely entirely on CI pipelines for our ongoing development work, we end up wasting large amounts of time waiting for CI pipelines to build and deploy.

If, instead, CI & local development workflows share as much of the deployment automation as possible, we can reduce or remove this bottleneck on development, while also minimizing the duplicated effort of maintaining two types of automation.

In summary, any developer should trivially be able to:

Deploy or connect locally built resources to their own feature environment in seconds with a single command from their laptop.
Deploy only the unit of deployment which has changed.
Have a development experience that is practically indistinguishable from "local development".
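
One way to approach "deploy only the unit which has changed" is to fingerprint each unit's source directory and compare it with the fingerprint recorded at the last deployment. The following is a minimal sketch under stated assumptions, not a complete tool: the names fingerprint and changed_units, and the idea of one directory per deployable unit, are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def fingerprint(unit_dir: Path) -> str:
    """Hash every file under a unit's directory, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in unit_dir.rglob("*") if p.is_file()):
        # Include the relative path so that renames also change the fingerprint.
        digest.update(path.relative_to(unit_dir).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def changed_units(units: dict[str, Path], deployed: dict[str, str]) -> list[str]:
    """Return the names of units whose fingerprint differs from the deployed one.

    `units` maps a (hypothetical) unit name to its source directory;
    `deployed` maps unit names to the fingerprint recorded at the last deploy.
    """
    return sorted(
        name
        for name, directory in units.items()
        if deployed.get(name) != fingerprint(directory)
    )
```

A single-command wrapper could call the deployment tool of your choice for each name returned, then store the new fingerprints. The same function can run in CI, which keeps the two workflows on shared automation.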

3. For AWS Serverless & "Function-as-a-Service" - monolithic functions are OK

The first thing many teams do when they start using AWS Lambda is create a separate deployable function for every type of function invocation. For instance, a REST API gets a function for every single endpoint on the API. However, if you think in terms of Domain Driven Design, this might mean you end up with a number of separately deployed functions that logically make up a single Bounded Context.

There are also practical considerations: as an example, AWS CloudFormation caps the number of resources per stack (historically 200, now 500). When you start adding up the multiplicative effect of the resources required for each function in a Serverless application, you quickly realise that this limit can easily be reached. Sticking to one service per Bounded Context, which acts as the target for multiple types of Lambda invocations, is a sensible thing to do. Doing this will also reduce the automation and coordination overhead.
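
As an illustration, a single function serving one Bounded Context can dispatch on the shape of the incoming event. This is a minimal sketch, assuming an API Gateway REST proxy integration (events carrying httpMethod and resource) and an SQS trigger (events carrying Records tagged aws:sqs). The "orders" context, the routes and the handler names are hypothetical.

```python
import json


def get_order(event):
    # Hypothetical endpoint: echo back the requested order id.
    order_id = event["pathParameters"]["id"]
    return {"statusCode": 200, "body": json.dumps({"id": order_id})}


def create_order(event):
    # Hypothetical endpoint: accept a JSON payload and return it.
    payload = json.loads(event["body"])
    return {"statusCode": 201, "body": json.dumps(payload)}


def on_queue_message(record):
    # Hypothetical queue consumer: decode a single SQS message body.
    return json.loads(record["body"])


# One routing table for the whole Bounded Context.
HTTP_ROUTES = {
    ("GET", "/orders/{id}"): get_order,
    ("POST", "/orders"): create_order,
}


def handler(event, context=None):
    # API Gateway REST proxy events carry httpMethod and resource.
    if "httpMethod" in event:
        route = HTTP_ROUTES.get((event["httpMethod"], event["resource"]))
        if route is None:
            return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
        return route(event)
    # SQS delivers batches under "Records", each tagged with its event source.
    if "Records" in event:
        return [
            on_queue_message(record)
            for record in event["Records"]
            if record.get("eventSource") == "aws:sqs"
        ]
    raise ValueError("unsupported event shape")
```

With this layout, adding an endpoint means adding an entry to the routing table rather than a new function with its own set of stack resources.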

4. Consider adopting a monorepo, with tooling appropriate for monorepos

Let us start with a caveat: monorepos without appropriate tooling can be a disaster of slow CI pipelines & low productivity. The upside is that some amazing and battle-tested tooling for monorepos exists these days (a personal favourite is Bazel, which originated at Google).

Without a monorepo, building distributed systems can be painful.

With dozens of repositories, code navigation for larger changes spanning multiple services becomes difficult. Dependency management, code sharing & reuse quickly become problems too. It is not uncommon to find that different components have different, mutually incompatible dependencies, which are hard to upgrade.

A monorepo negates these pains, while also making things such as security audits easier to conduct and address.

However, to avoid the "rebuild the universe" problem with monorepos, you need tooling that solves three problems:

Change detection & dependency-graph tracking.
Dependency-graph based rebuilds (rebuild only what has changed and what is invalidated by the change).
Build caching.
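
To make these three problems concrete, here is a minimal sketch of the underlying idea, not how Bazel or any other specific tool implements it. It assumes the dependency graph is already known and acyclic; the target names and the functions affected_targets and cache_key are hypothetical.

```python
import hashlib
from collections import defaultdict, deque


def affected_targets(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return the changed targets plus everything that transitively depends on them.

    `deps` maps each target to the set of targets it depends on.
    """
    # Invert the graph: for each target, which targets depend on it?
    dependents = defaultdict(set)
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(target)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


def cache_key(target, deps, source_hashes, _memo=None):
    """Derive a cache key from a target's own sources and its dependencies' keys.

    `source_hashes` maps each target to a hash of its own source files.
    """
    memo = {} if _memo is None else _memo
    if target not in memo:
        digest = hashlib.sha256(source_hashes[target].encode())
        for dep in sorted(deps.get(target, ())):
            digest.update(cache_key(dep, deps, source_hashes, memo).encode())
        memo[target] = digest.hexdigest()
    return memo[target]
```

A target whose cache key is unchanged can reuse its previous build output, while any change to a shared library changes the key of every target that depends on it.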

5. Implement all three pillars of observability

Observability in control theory is defined as the ability to infer the internal state of a system from knowledge of its external outputs. In practice, this is the triumvirate of event logs (log aggregation & analytics), metrics (for alerting) & tracing (driving visualization). Most organizations today skew heavily towards event logs only, with maybe some metrics, but very few do all three well.

In a distributed systems world, doing all three pillars of observability well means that the time to detect, find and address bugs and other issues can be cut down to a fraction of what it would otherwise be. Furthermore, great observability will also improve developer productivity, since we can better understand the state of our entire system.

This is perhaps an area where AWS Serverless stands out as a leader. While the Kubernetes ecosystem is filled with many options of varying complexity and quality, AWS gives us a straightforward path: CloudWatch Logs, CloudWatch Metrics and AWS X-Ray can provide observability for a Serverless architecture at a relatively low threshold of effort, cost and learning.
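
As a rough sketch of how the three pillars can be tied together inside a Lambda function, the helpers below emit a structured log line stamped with the X-Ray trace ID and build a metric record in the CloudWatch Embedded Metric Format. The assumptions are that the code runs in Lambda with X-Ray tracing enabled (so the _X_AMZN_TRACE_ID environment variable is set) and that JSON lines printed to standard output end up in CloudWatch Logs. The function names, the ExampleApp namespace and the Service dimension are hypothetical.

```python
import json
import os
import time


def trace_id() -> str:
    """Extract the root trace ID from the X-Ray header, if one is present."""
    header = os.environ.get("_X_AMZN_TRACE_ID", "")
    fields = dict(part.split("=", 1) for part in header.split(";") if "=" in part)
    return fields.get("Root", "unknown")


def log_event(message, **fields):
    """Emit one structured log line that can be correlated with a trace."""
    record = {"message": message, "trace_id": trace_id(), **fields}
    print(json.dumps(record))
    return record


def metric_record(name, value, unit, service, namespace="ExampleApp"):
    """Build a metric record following the Embedded Metric Format layout."""
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["Service"]],
                    "Metrics": [{"Name": name, "Unit": unit}],
                }
            ],
        },
        "Service": service,
        name: value,
    }
```

Printing json.dumps(metric_record("OrderLatency", 12.5, "Milliseconds", "orders")) alongside the log lines keeps logs, metrics and traces linked through the same invocation.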

Conclusion

It should be obvious by now that Serverless & distributed systems require great discipline in deployment automation. It is unfortunate that many still treat deployment as an "ops" concern, separate from development.

Dev & Ops in modern systems are intertwined: the level and quality of automation has a great impact on DevEx (Developer Experience), and thus developer productivity. It is not a concern that can be postponed or thought about after the fact: it requires effort initially, and disciplined refinement throughout. If this is done well, you will be able to develop with a speed, reliability and level of productivity that will run rings around the competition.

We will return to this subject in the future to show what a practical Serverless toolchain and reference architecture could look like. Feel free to sign up to our email list below to get notified when we do!