Gaige and I have been using Let’s Encrypt to create certificates for SSL/TLS services since Let’s Encrypt was an invite-only beta. Naturally, this expanded to our day job a year or a little more ago (more on that later). This is a bit of a travelogue of our transition to the DNS-01 challenge method, and of the flexibility and security concerns that come with it.

How does ACME work?

Let’s Encrypt uses a protocol called ACME to automate the signing of X.509 certificates. Automation and the removal of human intervention are the whole point: without them, democratizing domain-validated (DV) certificates by driving their cost to zero would not be possible.

At a high level, ACME works like this:

Client: Hey, I’m going to ask you to sign X.509 certs for (list of one or more hostnames)

Server: Sure, here is a secret number for each hostname. Put each one in a place where only someone who controls the domain could put it, and let me know.

Client: [puts the secret numbers in a place the server can retrieve them from]

Client: Hey Server, I put those secret numbers out there. Here’s a certificate signing request. Will you sign?

Server: [checks] Yup, looks good to me! Here’s your signed certificate!
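In ACME’s own terms (RFC 8555), the “secret number” is a token issued by the server, and what the client actually publishes is a key authorization: the token joined with a thumbprint (RFC 7638) of the client’s account public key. A minimal sketch of that computation for an RSA account key, assuming you already have the public exponent e and modulus n as integers; the helper names are illustrative and not taken from any particular client:

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def int_to_bytes(value: int) -> bytes:
    """Big-endian, minimal-length encoding of a non-negative integer."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def rsa_jwk_thumbprint(e: int, n: int) -> str:
    """RFC 7638 thumbprint of an RSA public key."""
    # Only the required members, keys in lexicographic order, no whitespace.
    jwk = {"e": b64url(int_to_bytes(e)), "kty": "RSA", "n": b64url(int_to_bytes(n))}
    canonical = json.dumps(jwk, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, thumbprint: str) -> str:
    """The value that proves control: token + "." + account key thumbprint."""
    return f"{token}.{thumbprint}"


if __name__ == "__main__":
    # Toy modulus purely for illustration; a real one is 2048 bits or more.
    thumb = rsa_jwk_thumbprint(65537, 0xC0FFEE)
    print(key_authorization("example-token", thumb))
```

Because the thumbprint binds the response to the account key, someone who merely sees the token cannot answer the challenge on behalf of a different account.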

ClueTrust ACME tooling timeline

Our approach (and tooling) has evolved over time. Originally, the widely promoted validation method was HTTP-01, and the leading clients were various web server plugins or Certbot. Unfortunately, since we are both fans of unusual platforms, a precompiled plugin that made everything Just Work inside our web server created more than one problem for us. What if we didn’t want to run a web server on a device at all? Things like SMTP and IMAP run over TLS as well. And did it make sense to keep the ACME account key right out there alongside the certificate, so that our revocation capability could be compromised at the same time as our certificate?

Besides, we are proponents of cattle-not-pets and redeploy-don’t-patch. We already had a workflow involving pushing out certificates from a central repository of truth with Ansible.

I finally settled on an arrangement involving acme-tiny, an ACME client written in Python that assumes a local web server and access to the directory the HTTP-01 challenge is served from (http://example.org/.well-known/acme-challenge). Let’s Encrypt’s validation query would go to the server, where it would be proxied to the Ansible jumphost, either with a rule in nginx like so:

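    server {
        listen       80 default_server;
        listen       [::]:80 default_server;
        # ... (rest of the server block elided) ...

        # Forward HTTP-01 validation requests to the Ansible jumphost,
        # which holds the challenge files.
        location '/.well-known/acme-challenge' {
            proxy_pass http://ansible-jumphost.example.com/.well-known/acme-challenge;
            proxy_http_version 1.1;
            # Keep the original Host header on the proxied request.
            proxy_set_header Host $host;
        }
    }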

or in the case of servers that didn’t run a web server at all, either socat or haproxy running on port 80 and forwarding the query to the Ansible jumphost.
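Something on the jumphost then has to answer the forwarded query. acme-tiny writes each challenge into a directory (its --acme-dir option) as a file named after the token, which an ordinary web server would normally serve. As a minimal stand-in, here is a sketch using only Python’s standard library; the directory /var/www/acme-challenge and port 8080 are hypothetical, and this is not what we ran in production:

```python
import os
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

PREFIX = "/.well-known/acme-challenge/"
# ACME tokens are base64url; anything else (including "../") is rejected.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def lookup_challenge(path: str, acme_dir: str) -> Optional[bytes]:
    """Return the stored key authorization for a challenge URL path, or None."""
    if not path.startswith(PREFIX):
        return None
    token = path[len(PREFIX):]
    if not TOKEN_RE.fullmatch(token):
        return None
    try:
        with open(os.path.join(acme_dir, token), "rb") as f:
            return f.read()
    except OSError:
        return None


def make_handler(acme_dir: str):
    class ChallengeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = lookup_challenge(self.path, acme_dir)
            if body is None:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ChallengeHandler


if __name__ == "__main__":
    # Hypothetical directory and port; acme-tiny's --acme-dir would point here too.
    HTTPServer(("", 8080), make_handler("/var/www/acme-challenge")).serve_forever()
```

The forwarded Host header is ignored here: tokens are unique, so a single directory can serve challenges for every proxied host.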

HTTP-01 drawbacks

This was kind of awesome, but once the novelty of free X.509 certificates wore off, we grew tired of some awkwardness in the workflow. To wit:

  • Hosts for which we were issuing certificates had to be reachable from the global Internet. At a minimum, this meant a public IP address, plus either running wide open or filtering to permit only the addresses Let’s Encrypt had been observed validating from (unlike GitHub, which publishes the source addresses of its webhook notifications, Let’s Encrypt didn’t publish this information last time I looked).

  • Moreover, the hosts had to be up.

  • Not only that, but the forwarder process had to be up and running. With socat or haproxy this wasn’t such a big deal, but with nginx it led to a chicken-and-egg problem: the server won’t start without a certificate file, and you can’t get a certificate file without the server up. This resulted in splitting the Ansible role in two: bogusserver-keys, which copied in a set of self-signed certs to keep nginx happy if there wasn’t already a file there, and realserver-keys, which copied in the real certs once they existed. Awkward.

  • Lastly, it was a non-starter for appliance hosts that don’t lend themselves to running plugins or having hacks made to their web server configs, and really want certs copied in: routers, OpenVPN server VM images, and the like.

DNS-01

It turns out that there’s a better way. We can put the challenge in the DNS in the form of a TXT record. It has the following advantages:

  • Hosts for which you are issuing certs don’t have to be accessible from the global Internet, so you can now issue certs for internal infrastructure in RFC 1918 space or non-routed IPv6 space. (More on that later.)

  • It stands to reason that if hosts don’t have to be on the Internet, then even Internet-connected hosts don’t have to be up. So now we can have good certificates generated for servers that haven’t been created yet.

  • We can have wildcard certificates (e.g. *.home.example.com), which Let’s Encrypt only issues via DNS-01.
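Concretely, RFC 8555 puts the DNS-01 record at _acme-challenge under the name being validated, and the TXT value is not the key authorization (the token joined to the account key thumbprint) itself but the unpadded base64url SHA-256 digest of it. A minimal sketch of that mapping; the function name is hypothetical:

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_record(domain: str, key_authorization: str) -> Tuple[str, str]:
    """Return (owner name, TXT value) for a DNS-01 challenge."""
    # A wildcard such as *.home.example.com is validated on the base name.
    if domain.startswith("*."):
        domain = domain[2:]
    name = "_acme-challenge." + domain.rstrip(".") + "."
    # Publish a digest of the key authorization, not the value itself.
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return name, b64url(digest)


if __name__ == "__main__":
    print(dns01_txt_record("*.home.example.com", "example-token.example-thumbprint"))
```

Note that a wildcard and its base name (*.home.example.com and home.example.com) share the same owner name, so a certificate covering both needs two TXT values there at once.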

We host our own DNS, so putting records into a zone and taking them out again was just a matter of sending TSIG-signed DNS UPDATE messages. There’s an Ansible module for that, it turns out.
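Our workflow uses the Ansible module, but to show what such an update amounts to, here is a sketch that builds a script for BIND’s nsupdate(1) utility and pipes it in on stdin. The server, zone, key name, secret, and 60-second TTL are all placeholders, and the helper names are hypothetical:

```python
import subprocess
from typing import List


def nsupdate_script(server: str, zone: str, key_name: str, key_secret: str,
                    record: str, value: str, ttl: int = 60) -> str:
    """Build an nsupdate script that replaces a TXT record, signed with TSIG."""
    lines: List[str] = [
        f"server {server}",
        f"zone {zone}",
        # hmac-sha256 rather than the hmac-md5 of older examples.
        f"key hmac-sha256:{key_name} {key_secret}",
        f"update delete {record} TXT",
        f'update add {record} {ttl} TXT "{value}"',
        "send",
    ]
    return "\n".join(lines) + "\n"


def publish(script: str) -> None:
    # Feed the script on stdin so the TSIG secret never appears on a command line.
    subprocess.run(["nsupdate"], input=script, text=True, check=True)
```

Deleting before adding keeps a stale challenge value from lingering next to the new one; the cleanup step after validation would be the same script with only the delete line.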

Iron Chef Secret Ingredient: CNAME

It turns out that mixing dynamic updates and hand-edited content in the same zone (we hand-edit most of our stuff) is a recipe for pain. But Boulder, the Let’s Encrypt CA software, has no problem following CNAMEs. Thus, one can create a zone that is updated purely dynamically and CNAME into it, along these lines:

Let’s say we want to get a certificate issued for www.example.org. We create a zone, acme.example.org, which can be updated automatically and has no manually maintained content. Then we set up a CNAME record as follows: _acme-challenge.www.example.org. IN CNAME _acme-challenge.www.example.org.acme.example.org. Now we can issue updates to _acme-challenge.www.example.org.acme.example.org, and Let’s Encrypt will see them as valid responses to the challenge.

This, by the way, even works across zones, including entirely separate registered domains (i.e. different delegations below the Public Suffix List), so you really need only one such zone to service all the zones under one administrative control, unless you fancy having more than one. Just keep in mind that the ability to put records in acme.example.org translates into the ability to issue certs for any host that has a CNAME pointing into that zone, so you might want to authenticate updates with hmac-sha256 or better, rather than the hmac-md5 used in all the dated examples on the Internet. More here.
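The naming scheme in the example above is mechanical enough to generate. A minimal sketch, with hypothetical helper names, that produces the name Let’s Encrypt queries, the name the dynamic update actually targets, and the one-time static CNAME line for the hand-edited zone:

```python
def _bare(host: str) -> str:
    # Wildcards are validated on the base name; drop any trailing dot.
    if host.startswith("*."):
        host = host[2:]
    return host.rstrip(".")


def challenge_name(host: str) -> str:
    """The name Let's Encrypt queries for a DNS-01 challenge."""
    return f"_acme-challenge.{_bare(host)}."


def delegated_name(host: str, acme_zone: str) -> str:
    """Where the TXT record actually lives, inside the dynamic-update-only zone."""
    return f"_acme-challenge.{_bare(host)}.{acme_zone.rstrip('.')}."


def cname_record(host: str, acme_zone: str) -> str:
    """The one-time static record to add to the hand-edited zone."""
    return f"{challenge_name(host)} IN CNAME {delegated_name(host, acme_zone)}"


if __name__ == "__main__":
    print(cname_record("www.example.org", "acme.example.org"))
```

Because the full challenge name is kept under acme.example.org, a host in an unrelated domain (say mail.example.net) gets its own distinct name in the same zone, which is what makes the single shared zone work.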

This is how our Ansible role works. We have developed a thin playbook workflow (pretty much the antithesis of the monorepo approach favored by some others), and the certificates get stored in the host_files subdirectory of the inventory_directory. There, they can be encrypted with Ansible Vault as a layer of protection against accidental data spills.
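A rough sketch of that storage step, assuming a hypothetical per-host subdirectory under host_files and caller-chosen file names (our role’s actual layout may differ). Encryption shells out to the ansible-vault encrypt command, which encrypts a file in place:

```python
import os
import subprocess
from typing import Optional


def store_host_file(inventory_dir: str, host: str, filename: str, content: str,
                    vault_password_file: Optional[str] = None) -> str:
    """Write a cert or key under <inventory_dir>/host_files/<host>/ (hypothetical layout)."""
    target_dir = os.path.join(inventory_dir, "host_files", host)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, filename)
    with open(path, "w") as f:
        f.write(content)
    # Private keys should not be world-readable, even before encryption.
    os.chmod(path, 0o600)
    if vault_password_file is not None:
        # Encrypt in place with Ansible Vault.
        subprocess.run(["ansible-vault", "encrypt", "--vault-password-file",
                        vault_password_file, path], check=True)
    return path
```

The certificate itself is public, so the encryption mainly earns its keep on the private key files stored alongside it.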

Certificate Transparency

My spidey senses are tingling. Someone, or some people, are about to beat on my door right now and yell at me for putting records in the public DNS that relate to my internal servers. After all, _acme-challenge.supersecret.int.example.com IN TXT blah is a pretty good clue that there is a host called supersecret.int.example.com, even if you don’t put the actual IP address in your external DNS, right? And putting your internal hosts in externally visible DNS is not a “best practice”, amirite?

Well, I’ve got some bad news there. First off, security through obscurity isn’t security at all. But suppose it were: is any information being leaked here that isn’t already being leaked via another channel?

It turns out that if you’re using a public CA, Certificate Transparency (CT) is already telling all your secrets. Prompted by the Comodo compromise several years ago (in which it was hard even to tell that someone had issued rogue certificates), CT publishes the public information associated with every certificate a public CA issues, in near real time. Pretty much all the public CAs participate, too; Chrome announced in 2016 that it would require CT, and has enforced it for all newly issued certificates since 2018.

Hackers wearing all shades of hats are hip to this, and you can expect any penetration tester worth the money you’re paying them to start here. Since the horse is already out of the barn, I’m not inclined to think of this as a big deal. Note that the _acme-challenge name is a completely different domain name from the host’s, and there is no reason whatsoever to have an A or AAAA record in the public DNS at that node.
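To see for yourself what CT already discloses, crt.sh offers a public search over the CT logs. The sketch below assumes its q and output=json query parameters and the newline-separated name_value field in its JSON results; these match how the service behaves as far as I know, but they are not a documented, stable API, and the helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List


def crtsh_entries(domain: str) -> list:
    """Fetch CT log entries for a domain and its subdomains from crt.sh."""
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    with urllib.request.urlopen(f"https://crt.sh/?{query}", timeout=60) as resp:
        return json.load(resp)


def names_from_entries(entries: Iterable[dict]) -> List[str]:
    """Collect distinct hostnames; name_value may hold several, one per line."""
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return sorted(names)


if __name__ == "__main__":
    for name in names_from_entries(crtsh_entries("example.com")):
        print(name)
```

Run against your own domain, this is roughly the first thing that penetration tester will do.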

Don’t want Certificate Transparency? Fine. Run your own CA; there are plenty to choose from. This is its own kettle of fish, of course, since you’ll need to populate your own trust anchor stores, and it may be more painful than it’s worth, or a complete non-starter if you have external customers.