Is Nomad a better Kubernetes?

In 2022, Kubernetes reigns supreme in mindshare among workload schedulers/orchestrators. But do mindshare and popularity mean it is the best one out there? There is another kid on the block, HashiCorp’s Nomad, that might just give Kubernetes a run for its money. Let’s take a look and compare!

Why do we need schedulers in the first place?

If you run a small or mid-sized shop, you might not. Serverless offerings, or even just running a set of servers on EC2 or Google Compute Engine, might be enough (we would lean towards Serverless in this case, to avoid having to babysit servers ourselves).

But as organisations grow, their economics and custom requirements may outgrow these options. While we here at Chaordic have a bias towards Serverless, we acknowledge that there is no one-size-fits-all solution: at a certain scale, or with specific non-functional requirements, building your own in-house infrastructure competence, and in some cases even your own infrastructure, might make sense.

If you hit this scale, or have very specific non-functional requirements that push you towards building your own platform based on Infrastructure-as-a-Service or even on-premise data centers, a scheduler makes sense. Why? The job of a scheduler is exactly what the name implies: to schedule software jobs so that they run optimally on your available fleet of servers. If you have a set of servers, but no scheduler like Nomad or Kubernetes, you would be forced to do the following on your own:

Find the optimal server to place the execution of software on, for each deployment, given existing resource utilization of other processes & hardware.
Consider the optimal placement, also with regards to resilience and redundancy.
Manually recover whenever a process crashes.
Manually replace jobs if a server fails or crashes.
Manually configure your deployments so that processes run securely in isolation from each other.
Manually keep track of, and update, the configured network locations of processes that depend on each other.
Manually consider load balancing concerns.
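
To get a feel for how quickly this grows, here is a minimal sketch of just two items from the list above: resource-aware placement with a redundancy preference, and replacing jobs when a server fails. It is an illustration only, not how Nomad or Kubernetes actually schedule. The Node, Job, place and reschedule_failed names are hypothetical, and the scoring rule (prefer a zone that has no copy of the job yet, then the tightest fit) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    zone: str
    cpu_free: int  # unallocated CPU, in MHz
    mem_free: int  # unallocated memory, in MB
    jobs: list = field(default_factory=list)  # names of jobs placed here


@dataclass
class Job:
    name: str
    cpu: int  # requested CPU, in MHz
    mem: int  # requested memory, in MB


def place(job, nodes):
    """Place the job on one of the nodes; return that node, or None if nothing fits."""
    candidates = [n for n in nodes if n.cpu_free >= job.cpu and n.mem_free >= job.mem]
    if not candidates:
        return None

    def score(node):
        # Redundancy first: avoid zones that already run a copy of this job.
        zone_has_copy = any(job.name in n.jobs for n in nodes if n.zone == node.zone)
        # Then bin packing: prefer the node with the least capacity left over.
        # Adding MHz to MB is crude, but good enough for a sketch.
        leftover = (node.cpu_free - job.cpu) + (node.mem_free - job.mem)
        return (zone_has_copy, leftover)

    best = min(candidates, key=score)
    best.cpu_free -= job.cpu
    best.mem_free -= job.mem
    best.jobs.append(job.name)
    return best


def reschedule_failed(failed, nodes, catalog):
    """Re-place every job from a failed node onto the surviving nodes.

    Returns a list of (job name, new node name or None) pairs.
    """
    survivors = [n for n in nodes if n is not failed]
    moved = []
    for job_name in failed.jobs:
        target = place(catalog[job_name], survivors)
        moved.append((job_name, target.name if target else None))
    failed.jobs.clear()
    return moved
```

Even this toy version ignores process isolation, service discovery, load balancing, health checking and what to do when nothing fits, which is exactly the point of the list above.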

“Why do you keep saying manually? I can write some code/automation to do that!” someone might say. The answer is simple: congratulations, if you do, you have just started writing your own scheduler. Chances are you’ll do it worse than Kubernetes or Nomad, which have hundreds of man-years of work poured into them collectively.

Additionally, once you start down the path, you'll soon realise all the other things you need to build, and then you are truly down the rabbit hole of writing your own scheduler that, if you're lucky, will many years from now probably look a lot like Nomad or Kubernetes.

Enter HashiCorp Nomad

Unless you’ve been living under a rock for the last five years, you have probably heard of Kubernetes. It’s built a position as practically the default scheduler in the industry - it has vendor support, cloud provider support, mindshare, eco-system, you name it.

But HashiCorp’s Nomad scheduler has quietly existed in parallel for a long time. Chances are you are more familiar with some of HashiCorp’s other software, such as Terraform, Consul or Vault. If you know them, you also know that HashiCorp’s tools are easy to use, easy to operate and, more often than not, have an excellent developer experience. What are some of the things that set Nomad apart?

Integration with Consul & Vault: As you’d expect, Nomad integrates easily & natively with both Consul & Vault. This practically solves the concerns of service discovery, load balancing, service mesh (if you need it) and secrets management.
Single binary that is easy to operate: Nomad runs as a single binary, the same binary for the servers as for the clients (nodes in K8s speak). Compare that to Kubernetes, which requires a number of separately running components.
Excellent dev experience: Almost a sub-point to the previous point, but you can have a truly identical experience on a developer laptop as on your cluster. If you have ever tried to duplicate on a laptop how Kubernetes runs & deploys on your cloud of choice, you know Kubernetes has no real answer here, despite K3s, Minikube and a number of other purported solutions.
Pluggable runtimes: Nomad supports containers, binaries, VMs, Java processes and even Windows and Mac as host systems, which allows running Windows and Mac native software on Nomad. This might be a game changer for some businesses.
Scalability: Kubernetes supports clusters up to 5000 nodes and 300,000 containers. Nomad has been tested with clusters exceeding 10,000 nodes and two million(!) containers.
Easy multi data-centre federation: If you want to run federated clusters across multiple regions and potentially multiple cloud providers, Nomad & Consul not only make this possible, but compared to Kubernetes, relatively easy.
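
To make the pluggable-runtimes point a little more concrete, here is a rough sketch of one job shape reused with three different task drivers. The field names follow Nomad’s JSON job format as we understand it (the HTTP API registers jobs via POST /v1/jobs), so check them against the API documentation for your Nomad version before relying on them. The make_job helper, the job names, the image name and the file paths are all hypothetical.

```python
import json


def make_job(job_id, driver, config, count=1, cpu_mhz=500, memory_mb=256):
    """Build a minimal service job payload with a single task (hypothetical helper)."""
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "service",
            "Datacenters": ["dc1"],
            "TaskGroups": [{
                "Name": job_id,
                "Count": count,
                "Tasks": [{
                    "Name": job_id,
                    "Driver": driver,   # the only part that selects the runtime
                    "Config": config,   # driver-specific settings
                    "Resources": {"CPU": cpu_mhz, "MemoryMB": memory_mb},
                }],
            }],
        }
    }


# Same job shape, three runtimes. Image name and paths are made up.
container = make_job("web", "docker", {"image": "example/web:1.0"})
binary = make_job("worker", "exec", {"command": "/usr/local/bin/worker", "args": ["--verbose"]})
jvm = make_job("billing", "java", {"jar_path": "local/billing.jar"})

print(json.dumps(container, indent=2))
```

Only the Driver and Config entries change between the three jobs; everything else, including the resource request the scheduler uses for placement, stays the same.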

What are the disadvantages?

We have looked at some of the differentiating points of Nomad compared to Kubernetes. Nomad certainly stacks up well, especially in key areas such as developer experience and ease of operations.

The elephant in the room is obviously that Nomad is a single-vendor product, even though it is open source. If HashiCorp decides to stop development, progress would slow down significantly. The good news is that Nomad seems integral to HashiCorp’s future plans, so this seems unlikely. They also have a pretty high-profile list of clients who use Nomad. For us, the single-vendor factor is the biggest negative.

Another negative is that Nomad only has a fraction of the ecosystem that Kubernetes has through the Cloud Native Computing Foundation. This may be both a positive and a negative: we will surely miss some excellent software, but at the same time, do you really need it? The Kubernetes ecosystem is truly a bewildering jungle of choices: some good, some pointless, some seemingly good ideas with poor implementations. So before making this a deciding factor, consider what exactly you would miss, and whether these are really show-stoppers.

A third negative is the obvious lack of managed Nomad services from major cloud providers. GKE (especially Autopilot), AKS & EKS are options that take Kubernetes from bewildering to palatable as a compute platform. We’d love to see a well thought-out, simple to set up, safe by default Nomad, Consul & Vault PaaS cloud offering (HashiCorp, are you listening?). That being said, Nomad is only a fraction as difficult to set up as on-premise Kubernetes, which takes the edge off the lack of cloud-vendor support.

What about vendor lock-in?

Nomad is a product from a single vendor, but open source. Kubernetes is also open source, but has support from a wider array of industry participants. At first glance, this might suggest that Kubernetes means less vendor lock-in.

On closer inspection, this may not be entirely true if you run Kubernetes on a cloud provider’s managed service (and if you have made the decision to run Kubernetes, you probably should).

Have you tried moving from Kubernetes on AKS to GKE? From on-premise OpenShift to EKS? Or any other combination? If you have, you will be painfully aware that different Kubernetes vendors have as much lock-in as any other two products. Ingress controllers work in subtly different ways, networking isn’t quite the same, and a thousand subtle differences make seamless migration between Kubernetes providers a pipe dream.

When should we consider using Nomad?

There are a few use-cases where we see Nomad as a definite go-to scheduler:

You have already bought into HashiCorp’s other products like Consul & Vault.
You run on-premise data centers: If you are not moving to a public cloud, and won’t run a GCP, AWS, or Azure managed Kubernetes cluster, you’ll probably be better off running Nomad.
You need to run Windows or Mac workloads: Nomad’s ability to run both Windows and Mac workloads might be a clincher for many.
You need to run clusters at massive, global scale: If you are hitting the scale where Kubernetes simply falls apart, the choice is being made for you. Likewise if you need to globally federate your data centres.
You don’t want to run Serverless or any managed container-service, but you want developers to run services: If you lack dedicated ops resources and/or want developers to operate the services you build, Nomad might be simpler to both operate and develop against.

Conclusion

Many organisations jump straight to Kubernetes when it comes to their compute orchestration, simply because it is what the industry is doing. We like to challenge this in most instances unless you already have a large investment in Kubernetes.

The advantages of Serverless are obvious, but if Serverless or other managed services are not an option for one reason or another, Nomad provides yet another alternative. An alternative that in many cases does the core job of Kubernetes better than Kubernetes does it.

Image credit: Nomad on AWS