An Experiment in Datacenter Power Use Reduction

  • Wed 13 March 2019
  • misc

As previously mentioned, ClueTrust has a standard SmartOS server deployment of a DL160g6 with 2 x Intel L5520 Nehalem EP CPUs, 72gb of RAM in the form of 18 x 4gb ECC DDR3 sticks, and 4 x WD Red drives of various sizes depending on when we installed the particular machine.

These machines are generally up to our requirements (despite a weak disk subsystem) but are getting up there in years. The L5520 was released 10 years ago and our particular boxes are 8 or 9 years old.

I was content to leave well enough alone and concentrate on our "next generation" server specs but we are up against our power budget in colo and have gotten pricing for a power increase that is uncompelling. Consequently I gave some thought to where we could go in terms of cutting our power use - even a few watts per machine adds up and contributes to fitting within our power envelope.

The mechanical socket for this CPU is called LGA1366 aka Socket B (yes, there are actually 1366 connections to the bottom of the chip!) and it spans a couple of generations of CPU.

I got to wondering if I could go to a later processor with more cores and cut down to running a single die. I could also cut the memory stick count in half by going to higher density memory (8gb sticks vs. 4gb). Halving the stick count is unfortunately not optional: unlike the PCI slots, the main memory is tightly tied to each die in most current NUMA designs. In short, if you get rid of a CPU die, you get rid of its associated memory too.

In my searches, I happened upon the L5640. Same TDP (thermal design power, roughly the sustained power draw the CPU's cooling is engineered to handle) and clock speed as the L5520, but two more cores, so I'd only be down 25% on core count if I went to a single die. Which is fine; the long pole in the tent on our systems is the disk subsystem, not CPU load. The L5640 is supported from the factory by HP in this generation of server, so it should be drop-in.
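The before-and-after arithmetic can be sketched in a few lines of Python. This is only an illustration: the `config` helper is a made-up name, the core counts (4 per L5520, 6 per L5640) follow from "two more cores" and the 25% figure, and the 9 x 8gb layout assumes exactly half the original 18 sticks get repopulated. Hyperthreading is ignored.

```python
def config(cpus, cores_per_cpu, dimms, gb_per_dimm):
    """Summarize a server build as total physical cores and total RAM."""
    return {"cores": cpus * cores_per_cpu, "ram_gb": dimms * gb_per_dimm}

# Current build: 2 x L5520 (quad-core), 18 x 4gb sticks.
before = config(cpus=2, cores_per_cpu=4, dimms=18, gb_per_dimm=4)

# Proposed build: 1 x L5640 (six-core), half the sticks at double the density.
after = config(cpus=1, cores_per_cpu=6, dimms=9, gb_per_dimm=8)

# Fraction of physical cores given up by dropping to one socket.
core_loss = 1 - after["cores"] / before["cores"]

print(f"cores: {before['cores']} -> {after['cores']} ({core_loss:.0%} fewer)")
print(f"RAM:   {before['ram_gb']}gb -> {after['ram_gb']}gb")
```

Under those assumptions the machine keeps its full 72gb of RAM while giving up a quarter of its cores.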

As an added benefit, these are Westmere-EP CPUs, which gets us across the line for bhyve support on SmartOS - for whatever reason, our Nehalem CPUs won't work despite having VMX and EPT support.

Pleasant surprise: the L5640 has the AES-NI instructions, which OpenSSH and all of the FOSS TLS/SSL libraries know how to talk to. The old L5520 does not.

How much for the CPUs? $27.99 for a pair on eBay. How much for the RAM? I have plenty of 8gb sticks in my spares box and they seem to be less than $15 apiece on eBay.

Installation went without a hitch from a hardware and BIOS perspective. The only difficulty was that I ran into a regression in the CPU identification code that affected older Westmere processors on the particular version of SmartOS that I was running. Talk about bad luck. Fixed by upgrading to the very next release.

This test was on the machine in my basement, where we are at 120v, using a Kill-A-Watt for instrumentation, so bear this in mind when thinking of significant digits (accuracy and repeatability are actually fairly good on cheap test equipment here in the future). In this case, the disks were 8TB WD Reds (WD80EFZX).

With the old CPUs the system was drawing 1.37a (165 watts) after the disks settled, with two KVM virtual machines running. The older KVM in SmartOS is both a performance issue and a resource hog, which informs our interest in running bhyve.

Once I shut down the KVM virtual machines, leaving four LX zones and 15 OS zones (many of which are tftp servers and similar "insignificant load" guests) the draw dropped to 1.25a (150 watts).

After I swapped to the new CPU and memory configuration, the system was drawing 0.89 to 0.91a or 108 watts with the same (no KVM) load.
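As a rough cross-check of the readings, the current figures can be multiplied by the nominal line voltage. This is a sketch under assumptions: it takes the line to be exactly 120v, uses 0.90a as the midpoint of the last reading, and treats volts times amps as watts, which only holds when the power factor is close to 1. The names `LINE_VOLTS` and `volt_amps` are invented for the example.

```python
LINE_VOLTS = 120.0  # nominal; the actual line voltage was not recorded

def volt_amps(amps, volts=LINE_VOLTS):
    """Apparent power in VA; equals real watts only at a power factor of 1."""
    return amps * volts

# (amps measured, watts quoted) for each configuration in the text.
readings = {
    "old CPUs, KVM running": (1.37, 165),
    "old CPUs, no KVM": (1.25, 150),
    "new CPU, no KVM": (0.90, 108),  # midpoint of the 0.89 to 0.91a range
}

for label, (amps, quoted_watts) in readings.items():
    va = volt_amps(amps)
    print(f"{label}: {amps:.2f}a x {LINE_VOLTS:.0f}v = {va:.1f} VA "
          f"(quoted {quoted_watts} watts)")
```

The products land within a watt of the quoted figures, which is consistent with the amp and watt numbers describing the same load.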

About 40 watts of savings doesn't seem like a lot, but when you multiply by 5 (or if we're lucky, 7) machines where the upgrade/downgrade could be performed, things start to look very interesting. 200-280 watts would be enough to get us out of being complained at for power draw. It might have a slight salutary effect on my UPS runtime at home, but at 13.6 percent load we are hardly stressing it and likely down in the "inefficient because it's loafing" zone.
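The fleet-level arithmetic is simple enough to sketch. This assumes every machine saves the same amount as the basement test box (150 watts before, 108 after, both without KVM), which is optimistic since the colo machines carry different disks and loads; the variable names are made up for illustration.

```python
WATTS_BEFORE = 150  # old CPUs, KVM guests shut down
WATTS_AFTER = 108   # single L5640 with 8gb sticks, same load

# Per-machine savings measured on the test box.
savings_per_machine = WATTS_BEFORE - WATTS_AFTER

# Candidate fleet sizes: 5 machines for sure, 7 if we're lucky.
fleet_savings = {count: count * savings_per_machine for count in (5, 7)}

for count, watts in fleet_savings.items():
    print(f"{count} machines: about {watts} watts saved")
```

Using the measured 42 watts rather than the rounded 40 gives a slightly higher range than the 200-280 watts quoted above.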