Certbot is pets, not cattle


  • Sun 03 March 2019
  • misc

I've long been a fan of Let's Encrypt, though I wasn't as early an adopter as I could have been: the early tooling was extremely Linux-centric, and that's not what we run, at least in our shop.

A substantial fraction of Let's Encrypt's success is due to Certbot. If you're not familiar with it, it's basically set-and-forget X.509 certificate generation and renewal, by default via the http-01 challenge.

I don't want to talk anyone out of using Certbot. For the degenerate case of small sites or places with no automation (the target audience for Let's Encrypt), it's a great choice.

It's not without its drawbacks and tradeoffs though. For starters, use of Certbot is fundamentally incompatible with immutable infrastructure and the cattle-not-pets philosophy, where the source of truth for rebuilding a (server|firewall|load balancer|whatever) lives elsewhere, and the box itself is always subject to being rebuilt on a whim.

Unless special care is taken with Certbot, the only copy of both the account key and the certificates lives on the server itself. What if your certificates got compromised and you wanted to revoke them, only to discover that the bad guy had deleted both the certificates and the account key from your server?

Back in the early days of Let's Encrypt, you couldn't create nearly as many certificates per domain name under a Public Suffix as you can today, and there was a very real risk that you'd find yourself penalty-boxed by the rate limits after rebuilding all of your VMs at once to address some kind of emerging vulnerability or something of that ilk.

To avoid these problems, I've never used Certbot. Until recently I used http-01 with a little Python-based ACME client called acme-tiny. Each VM passed the challenge through to a web server running on the Ansible jumphost: either just the /.well-known/acme-challenge subdirectory (using nginx) or, for VMs that didn't need a web server, the entire socket (using socat or haproxy). In this manner, I ended up with certs on the provisioning (Ansible) server as a central source of truth and could then push them out, repeatedly if necessary, to VMs as I created or reprovisioned them.
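For illustration only, the central issuance step on the jumphost might look something like the Python sketch below. It is not the actual tooling described here. It assumes acme-tiny is installed as the `acme-tiny` console script (otherwise substitute `python acme_tiny.py`), that a CSR for each hostname already exists, and that the jumphost's web server serves the challenge directory at /.well-known/acme-challenge/. All paths and the `issue` helper are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical layout on the provisioning host; none of these paths
# come from the post itself.
ACCOUNT_KEY = Path("/etc/acme/account.key")
CHALLENGE_DIR = Path("/var/www/challenges")
CERT_STORE = Path("/etc/acme/certs")


def acme_tiny_command(csr, account_key=ACCOUNT_KEY, challenge_dir=CHALLENGE_DIR):
    """Build the acme-tiny invocation for a single CSR."""
    return [
        "acme-tiny",
        "--account-key", str(account_key),
        "--csr", str(csr),
        "--acme-dir", str(challenge_dir),
    ]


def issue(hostname, cert_store=CERT_STORE, run=subprocess.run):
    """Request a certificate for hostname and keep the chain centrally.

    acme-tiny prints the signed chain on stdout. With check=True a failed
    run raises before anything is written, so an existing chain file is
    left untouched.
    """
    csr = cert_store / f"{hostname}.csr"
    chain = cert_store / f"{hostname}.crt"
    result = run(acme_tiny_command(csr), check=True, capture_output=True)
    chain.write_bytes(result.stdout)
    return chain
```

Because the account key, CSRs, and signed chains all stay on the provisioning host, rebuilding a VM means pushing an existing chain back out rather than requesting a new certificate.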

I could have gone off-label and run Certbot on the central jumphost, which would have made a certain degree of sense too, had I not found acme-tiny agreeable with my pre-existing workflow for putting X.509 certificates on VMs.

I'm about to swap that http-01 handshake for dns-01. But that's a different discussion and a different blog post. The salient point here is that, unless you jump through hoops, Certbot is incompatible with a cattle-not-pets methodology.