Using Docker Machine for Canary Testing

April 13th, 2015

Aaron Welch

SVP, Product

We have a tough job here at Packet. Making bare metal servers provision on-demand in 5 minutes or less is a tricky proposition. Bringing a server up from power-off and no operating system to booting into an OS with functional network and user access is complicated. It involves configuring and orchestrating many different systems, many of which don’t really like to talk to each other (*cough* IPMI *cough*) which can make troubleshooting extraordinarily difficult. Add to the mix some of the unusual and cool things we are doing at Packet, like bonded line-rate network interfaces or programming customer networks with software overlays, and things start getting real.

We’ve set up a nice suite of tools that helps us monitor, diagnose, test, and fix problems as they arise: service monitoring, aggregated logging, unit and functional tests, exception reporting, and alerts all play a critical role in exposing problems and helping us fix issues in the stack. But they don’t give us a consistent indicator that all things in the life cycle of a device are working properly at all times. We need a way to ensure that everything works as expected from an initial “device create” API call, to building a correctly configured environment for the end user, to a clean deprovision once the device is terminated, across all operating systems and server configurations, at all times. We need a canary that can tell us when something has gone wrong.

Docker Machine

The solution, as can happen, presented itself while I was working on building the Packet driver for Docker Machine.

As it states in the README, Docker Machine makes it really easy to create Docker hosts on your computer, on cloud providers, and inside your own data center. It creates servers, installs Docker on them, then configures the Docker client to talk to them. In short, it makes it really easy to create and manage hosts running Docker, regardless of platform or provider. For example, if I want to create a host called “funtimes” running Docker on Packet, I simply do:

$ docker-machine create -d packet --packet-api-key=xxxxxxxx --packet-project-id=xxxxxxxxx --packet-os=ubuntu_14_04 funtimes

and once complete, I can then SSH into the machine using the SSH key pair generated during setup, like so:

$ docker-machine ssh funtimes


and if I set up the Docker environment, I can issue Docker commands against the remote host from my local machine:

$ eval "$(docker-machine env funtimes)"
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
WARNING: open /home/welch/.dockercfg: permission denied
511136ea3c5a: Pull complete
31cbccb51277: Pull complete
e45a5af57b00: Pull complete
hello-world:latest: The image you are pulling has been verified. Important: image verification is a tech preview feature and should not be relied on to provide security.
Status: Downloaded newer image for hello-world:latest
Hello from Docker.
This message shows that your installation appears to be working correctly.
 
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (Assuming it was not already locally available.)
3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.
 
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
 
For more examples and ideas, visit:
http://docs.docker.com/userguide/
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
hello-world         latest              e45a5af57b00        3 months ago        910 B


Neat!

Machine also supports creating nodes and having them join a Docker Swarm pool, and although it only supports Ubuntu at the moment, there is active work on adding additional operating systems (e.g. CoreOS and RancherOS).
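
For a rough idea of what that looks like, here’s a sketch only: the --swarm flags below match the Docker Machine releases of this era, and the discovery token is a placeholder you’d get from “docker run swarm create”. Creating a Swarm master and a node on Packet would look something like:

$ docker-machine create -d packet --packet-api-key=xxxxxxxx --packet-project-id=xxxxxxxxx --packet-os=ubuntu_14_04 --swarm --swarm-master --swarm-discovery token://&lt;token&gt; swarm-master
$ docker-machine create -d packet --packet-api-key=xxxxxxxx --packet-project-id=xxxxxxxxx --packet-os=ubuntu_14_04 --swarm --swarm-discovery token://&lt;token&gt; swarm-node-01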

Packet Canary

As I was developing the driver, I had the realization that this was exactly what we wanted for testing the end-user life cycle. Docker Machine creates a server using our API, creates an SSH key pair that is installed on the server, logs in with it, and installs Docker. Once set up, it’s trivial to execute remote commands over SSH, and then destroy the device using “docker-machine rm”.
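
For example, running a quick remote check and then tearing the host down looks like this (the uname command is just illustrative):

$ docker-machine ssh funtimes "uname -a"
$ docker-machine rm funtimes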

Nice!

Once the machine driver was working well, I made a little program in Go to ping one of our channels in Slack, and made a Docker image for it using Quay.io. I then whipped up a bash script (canary.sh) which uses Docker Machine to create a server with a hostname based on the current time and operating system flavor, run my Slack pinger Docker image on the new host, and then deprovision the host. The script logs these events to Logentries, where we have two alerts configured: the first triggers if a failure is logged, and the second triggers if provision test log entries *don’t* appear after a period of time. This ensures that we’ll be notified if the canary logs a failure, but also if something has happened to the canary which is preventing it from logging, or has stopped it from working altogether.
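
In spirit, the script looks something like the sketch below. The Quay.io image name, the log() helper, and the environment variables are placeholders, not the production script:

#!/bin/bash
# canary.sh (sketch): provision a host, run the Slack pinger on it, then tear it down.
# The image name and log() helper below are placeholders.

OS="ubuntu_14_04"                                  # OS flavor under test
HOST="canary-${OS}-$(date +%Y%m%d%H%M%S)"          # hostname built from time + OS flavor

log() {
  # Placeholder: ship one line to our log aggregator (Logentries in our setup)
  echo "$(date -u +%FT%TZ) canary ${HOST} $*"
}

log "provision start"
if docker-machine create -d packet --packet-api-key="$PACKET_API_KEY" \
     --packet-project-id="$PACKET_PROJECT_ID" --packet-os="$OS" "$HOST"; then
  eval "$(docker-machine env $HOST)"
  if docker run --rm quay.io/packet/slack-pinger; then   # placeholder image name
    log "provision test ok"
  else
    log "failure: slack ping did not run"
  fi
  docker-machine rm "$HOST"
  log "deprovision complete"
else
  log "failure: provision"
fi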

Slack is great!

We plan to expand the tests that run on the host so that in addition to pinging Slack, we do things like checking that the network configuration is set up properly, that the disk, CPU, and RAM all report what they should, and that the host is getting the expected metadata from our metadata service (a few illustrative checks are sketched below). We also plan to use Docker Machine and a slightly more sophisticated application to do full rack burn-in and benchmarking on new inventory (we get servers in bunches of 120 per rack, so doing this manually is out of the question).
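
Those checks could be as simple as a few commands run over “docker-machine ssh” (purely illustrative; the metadata URL is an assumption and may differ):

$ docker-machine ssh "$HOST" "cat /proc/net/bonding/bond0"
$ docker-machine ssh "$HOST" "nproc; free -m; lsblk"
$ docker-machine ssh "$HOST" "curl -sf https://metadata.packet.net/metadata"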

Hope you enjoyed the post!