Wednesday, June 22, 2016

Getting to grips with Docker

A while ago, I described how we took an existing application build script and managed to run it inside Docker.

Having played with this inside Docker a little more, it's probably worth scribbling down a few notes I happened to stumble across on the way.

I'm looking at having 2 basic images: as a foundation, Ubuntu with all the packages we want added; then an image that inherits FROM that with our application stack built and installed (but not configured). The idea behind this layering is simply to separate the underlying OS, which is fairly standard, from the unique stuff that is all ours.

Then, you create an instance image from the application image, simply by running a configuration script that you COPY in. Once you've got a configured application instance, you create a volume container from it, and then run the application image using the volume(s) from the instance image. You keep that volume container around, just as a home for your data, essentially forever. And you can run multiple application instances from the same base image, you just need to configure and create a volume container for each instance.

That's a brief overview of the workflow, now some tweaks and pitfalls.

We're using Ubuntu, so the first step is to run apt-get with our list of packages. This originally created a 965MB image. It's not going to be small, we need both java and a full development stack to create our application.

However, some of the stuff installed we'll never need. Using the --no-install-recommends flag to apt-get saved us about 150M. The recommends list is stuff that might be useful, but not essential. But remember - our Docker container is only ever going to run a fixed set of applications, so we'll never need any of the optional stuff. The only thing to be careful of here is if you accidentally depend on something in the recommends list without realizing you're only getting it indirectly.

We can do slightly better in terms of saving space. We use postgresql, but get it to store the database files in our own locations, so we can remove /var/lib/postgresql/9.X and what's underneath it, saving almost another 40M.

One thing to be aware of is that the list of packages in the official Ubuntu Docker image isn't quite the same as you would get from a regular Ubuntu install. There are one or two packages we didn't bother adding because they were there in a regular install that we need to add with Docker. Things like sudo and wget are on this list, so I needed to add those to the apt-get list.

Another thing to be aware of is that because you're building images afresh each time, you aren't guaranteed that new users will always get the same uid and gid. If you change the list of packages (even by just adding --no-install-recommends), this might change which users exist, and that affects the uid assigned to later users. I got burnt when a later base build ended up giving the postgres user a different uid, so it didn't own its database files on the persistent volume any more. I think the long term fix here is to create the users you need by hand before installing any packages, forcing the uid and gid to known values.

In order to keep image sizes small, you'll often see "rm -rf /var/lib/apt/lists/*" in a Dockerfile. In general, deleting temporary files is a good idea. This includes any files created by your own software deployment stage. Cleaning that up properly saved me another 200M or so in the final image. (Remember to clean up /tmp, that's part of the image too.)

It isn't strictly related to Docker, but I hit an ongoing problem - in some environments I ended up blocking on /dev/random. Search around and you'll find a lot of problems reported, especially related to java and SecureRandom (or, in our case, jruby). Running Docker on my Mac was fine, running it on a server in the cloud gave me 15-minute startup times. The solution here is to add or to your java startup (or JAVA_OPTIONS).

(And, by the way, this illustrates that while Docker can guarantee that your app is the same in all environments, it doesn't magically protect you from differences in the underlying environment that can have a massive impact on your application.)

My application listens on ports 8080 and 8443, which I map on the host to the common ports, with

docker run -p 80:8080 -p 443:8443 ...

This works fine for me in testing, when I'm only running one copy and simply point a browser at the host. Networking gets a whole lot more complicated with multiple containers, although I think something like a load-balancer in front might work.

I've been using the Docker for Mac beta for some of this - while at times it's been beta in terms of stability, generally I can say it's a very impressive piece of work.

No comments: