If you’re into DevOps, Docker is one of those tools you’ll encounter sooner or later. It’s pretty awesome. But getting started with it can be a tad difficult, so I’m here to help with that.
I’ve been on a bittersweet honeymoon with Docker, and there are certainly things I wish I’d known about when I first started out. One of the greatest benefits of learning from the Internet is the massive amount of information that is available. One of the greatest downsides? The massive amount of information that is available.
A downside of learning things on your own is that there is just so much content out there, and it’s up to you to organize and process it all into something you can understand… and sometimes, after banging your head against the wall to figure out a strange bug, you realize that the information is outdated!
So in this post, I’ll give a quick and simple introduction to Docker, what it is capable of, and how you can get started with it as of Docker version 1.9.0.
Table of Contents
- Applications of Docker
- The Problem
- Docker: Containerization vs. Virtualization + Configuration Management
- Getting Started
- Wrapping Up
- References and Further Reading
Applications of Docker
There is no such thing as a tool that can do everything, and there is no magical panacea that can solve all of our problems. Docker is no different. Like any tool, there are some use cases that better suit it, and other situations when it is ill suited. You wouldn’t use a hammer to dig a hole, or a shovel to screw a nail. Whenever you learn a new tool, instead of thinking, “Hey, I wonder if I can use it to do ____?” a better question to ask would be “What is this tool best suited for?”
Applications of Docker that really make it shine include using it for:
- Ephemeral (temporary) isolated processes (e.g. Codepen, Codeship)
- Short running tasks, maintenance scripts, or worker processes that receive then process a workload
- Running and deploying stateless applications
- Web frontends
- Backend APIs
- Running isolated environments with imposed CPU/memory limits
However, because of the ephemeral nature of Docker containers, it is not a great choice for applications that need to store state or files, though it is possible to mount folders/volumes from the Docker host into the container.
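For instance, imposing CPU/memory limits and mounting a host folder both come down to flags on docker run. This is only a sketch: the image name, paths, limit values, and worker script below are hypothetical placeholders, and Docker must be installed for the commands to do anything.

```shell
# Placeholder values for this sketch; substitute your own.
MEMORY_LIMIT=256m    # hard cap on the container's memory
CPU_SHARES=512       # relative CPU weight (the default is 1024)
HOST_DIR=/srv/appdata

# -m and --cpu-shares impose the resource limits; -v mounts the host
# folder into the container at /data so state outlives the container.
docker run -d --name worker \
  -m "$MEMORY_LIMIT" --cpu-shares "$CPU_SHARES" \
  -v "$HOST_DIR":/data \
  debian:latest /data/run-worker.sh
```

Mounted volumes are the escape hatch for state, but the container itself stays disposable.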
The Problem

Imagine you have a web application that involves an API server, Redis, MySQL, and a web app server. All of those different components may have their own dependencies and require a unique configuration to fit the needs of the application. It might be easy enough to tweak and hammer away at things on your local development machine until everything works, but how can we be sure that what works in development will work in production as well?
The problem here can be broken down into three parts:
- How can we be sure that our development environment is the same as our production environment?
- How can we be sure that all of our configuration changes and the tweaks applied in our development environment will be replicated faithfully in the production environment?
- How can we deploy our code/application to our production environment, which may involve multiple servers?
Virtualization: Development Environment = Production Environment
By using virtual machines to run the same Linux/kernel in development as in production, any configuration and tweaks that work in the development virtual machine should work on the production server as well. (Running Ubuntu 14.04 on the production server? Then use an Ubuntu 14.04 virtual machine for development.)
You can think of it as being similar to a stone carver trying to carve two identical sculptures. If the carver started out with two very different stones, many tweaks would be necessary to get them to look similar, and the corrections applied to one might not work on the other!
Virtualization is much like using two identical pieces of stone so that the starting material is the same.
However, now that we have a starting development environment that is (ideally) identical to the production environment, that still leaves the second problem: actually making sure that the configuration/tweaks applied on our local development virtual machine are applied to all of our production servers as well.
Configuration management: Development Configuration = Production Configuration
Virtualization makes sure that the starting environments for development and production are the same. However, as we work on a web application, it is highly likely that special configuration and tweaks (additional packages, for example) will be necessary to get the app up and running smoothly.
Configuration management tools like Chef and Puppet make sure that the changes applied to the development environment are also applied to the production environment as well.
Using our analogy of the stone carver, configuration management is much like reapplying all the carving applied to one stone to the other so that both sculptures will be identical. But, this only works as intended when used along with virtualization to ensure that the starting environment (the starting material) is the same.
Then finally, once the development and production environments have been made identical, your codebase can be pushed to your production servers using an orchestration tool like Capistrano. This works, but Docker takes the same problem and approaches it in another way.
Docker: Containerization vs Virtualization + Configuration Management
To recap, the virtualization + configuration management workflow involves:
- Ensuring that the development starting environment = production starting environment (Vagrant)
- Reapplying configuration applied in development to production (Chef, Puppet)
- Orchestration to push changes in code to all production servers (Capistrano)
The Docker workflow, however, involves:
- Building an image on your development environment that runs your application (along with any necessary tweaks)
- Pushing the image to a registry (or tarballing it to a shared location)
- Orchestration to pull the new image onto your servers and spin up new containers based off the new image, which is the same image that was built in development
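The three steps above can be sketched with the Docker CLI. The registry address and image name here are hypothetical placeholders, and Docker must be installed for the commands to run:

```shell
# A hypothetical image name; the registry address is a placeholder.
IMAGE=registry.example.com:5000/myapp:1.0

# On the development machine: build the image from the Dockerfile in
# the current directory, then push it to the registry.
docker build -t "$IMAGE" .
docker push "$IMAGE"

# On each production server: pull the very same image and start a
# container from it; no configuration is replayed on the server.
docker pull "$IMAGE"
docker run -d --name myapp "$IMAGE"
```

The key point is that the image built in development is the exact artifact that runs in production.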
Instead of reapplying the changes you make in your development environment onto your production servers, any changes you make in a Docker image (analogous to a virtual machine image), whether in your local development environment or in a hosted registry, can be saved into that image, which can then be deployed onto the server.
That is, the image you work with in development will be the same image shipped into production, with all of the saved changes already applied; there is no need to replay those changes in every single production server.
To do this, Docker uses the concept of containers.
Containers are, well, containers that isolate processes and filesystems, acting in a similar fashion to virtual machines. However, while a virtual machine includes a kernel in addition to the filesystem, a container includes only the filesystem, running on top of a shared kernel.
Multiple containers (and their different filesystems, e.g. Debian/Fedora/Ubuntu) can run on a single, shared Linux kernel. Each virtual machine, however, has its own kernel, and different virtual machine instances cannot share the same kernel.
The implications of this are that:
- Containers are smaller than virtual machines; they’re also faster to start up.
- Containers can integrate with each other more easily since they run on the same kernel.
A useful analogy for the role of Docker containers is the shipping container. In the same way that shipping containers disrupted the way goods were loaded and unloaded onto ships and other means of transportation, Docker containers are poised to disrupt the way code is shipped as well.
Items that need to be shipped come in many different sizes and shapes. A piano, a car, a lamp, a candelabra… Packing all of these goods onto a ship is much like a game of Tetris: they need to be loaded and unloaded in a certain way, otherwise the result would be a disaster. Needless to say, this process was very time consuming and inefficient, but it was how goods were transported until the shipping container came along and changed everything, rendering the previous method obsolete.
With the shipping container, goods are packed into a standardized container. It doesn’t matter what size and shape the goods packed inside are; from the perspective of the loader/unloader, all containers look the same from the outside. As a result, the tight dependencies between the shipped items (the piano should be loaded before the lamp, which should be placed behind the candelabra, and so on) disappear, and transporting goods from one place to another becomes easy.
Docker containers work in the same way. From the perspective of the system, your app is just another Docker container, and all the configuration tweaks your app needs to get running are isolated inside. Whatever is inside one container won’t influence the way another container needs to be shipped or loaded; from the outside, they’re standardized.
In other words, virtualization + configuration management remembers the method and order in which you load/unload all the items, while containerization makes the items you load look the same from the outside: it doesn’t matter what language the application within the container is written in, the result is always a Docker image/container. As a result, they can all be loaded the same way.
This doesn’t mean that virtualization and configuration management are no longer necessary. Instead, Docker can be used as an addition to your DevOps toolkit. Configuration management tools can still be used, for example, to set up Docker on new servers and push/pull images as needed, and virtualization can still be an invaluable asset for consistency between development and production environments.
|Containers|Virtual machines|
|---|---|
|Kernel of the host is shared between all containers|Kernel is not shared|
|Isolated processes/filesystems|Isolated OS|
|Ephemeral or long-lasting use|Long-lasting use|
|Fast to get running|Slower to get running|
|A single base image can be used for multiple containers|A single virtual machine image can be used for multiple virtual machines|
|Make your goods a standard size/shape from the outside, then load them all the same way|Remember and load the goods according to the same recipe every time|
|Only one startup process: whatever you told it to run when you started the container|Many, many startup processes|
Docker server/daemon: The Docker server/daemon runs on the host and runs containers as processes within it.
Docker client: The CLI that is able to connect to the Docker server and tell it what to do. This can run on your local machine or on the same host as the Docker server.
Image: A master “template” for creating Docker containers, analogous to virtual machine instances.
Container: An instance of an image. Multiple containers can be created from a single image.
Docker registry: A server that hosts your Docker images; it acts in a fashion analogous to code repositories in that you can push and pull images from the registry. You can use a public registry like Docker Hub, or you can set up your own for private images.
Docker Hub: A public Docker registry that is the default for pulling images. It provides many common base images such as Debian, Ubuntu, Fedora, etc.
Dockerfile: A file that describes the steps needed to build a Docker image. Each line in the Dockerfile creates a new image/filesystem layer, which is cached between builds. To make building an image faster, put commands that are unlikely to change at the top and commands with more churn at the bottom.
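To make the caching behavior concrete, here is a minimal Dockerfile sketch; the base image, package, paths, and start script are illustrative assumptions, not a prescribed layout:

```dockerfile
# Base image and rarely-changing system packages go at the top so
# their layers stay cached between builds.
FROM debian:latest
RUN apt-get update && apt-get install -y nginx

# Application code changes most often, so it is copied last; only the
# layers from here down are rebuilt when the code changes.
COPY . /app
CMD ["/app/start.sh"]
```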
Docker Compose: A tool for defining and running multi-container applications. It uses a compose file (docker-compose.yml) that defines what containers your app needs to run. You will need to install it separately after Docker.
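As a sketch of what such a compose file might look like for a web app backed by Redis (the service names, ports, and images are illustrative, using the docker-compose.yml format of the Compose 1.x era):

```yaml
# Hypothetical docker-compose.yml: a web app container linked to Redis.
web:
  build: .          # build the web image from the local Dockerfile
  ports:
    - "80:80"       # forward host port 80 to container port 80
  links:
    - redis         # make the redis container reachable as "redis"
redis:
  image: redis:latest
```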
Docker Remote API: An API for the Docker daemon/server that allows you to make requests to/query the server for creating/editing or for information on containers/images.
Getting Started

The documentation on Docker is a great place to get started and to find the installation instructions pertinent to your platform.
The Docker server/daemon runs on the host and runs containers within it as processes.
From a networking perspective, it acts as a virtual bridge between the host system and the network shared by the containers it runs. Each container has its own Ethernet interface and corresponding IP address, connected to the virtual bridge via the docker0 interface. Since the containers are on the same network, they can easily communicate with each other; for a container to communicate with the host, however, traffic must go through the virtual bridge. As a result, when we want to make a process running on a port in the container (e.g. Nginx or Apache on port 80) available on the host as well, we need to specify the port forwarding explicitly.
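As a sketch, container-to-container communication and explicit port publishing might look like this; the app image and container names are hypothetical placeholders, and Docker must be installed for the commands to run:

```shell
APP_PORT=8080   # host port to publish the app on (an example value)

# Start a Redis container; other containers on the docker0 bridge can
# reach it, and --link below adds a hosts entry for it named "db".
docker run -d --name db redis:latest

# Start the app container: linked to the Redis container, with container
# port 80 forwarded explicitly to $APP_PORT on the Docker host.
docker run -d --name app --link db:db -p "$APP_PORT":80 myapp:latest
```

Note that the link needs no port forwarding at all; only traffic meant for the host does.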
If you have Linux, the Docker host will be your base OS, but if you are on a non-Linux system, then Docker will create a new virtual machine (docker-machine) that will act as the host for your Docker server/containers. This is because Docker needs a Linux kernel to use as the shared base between the Docker containers.
What this means is that things are relatively straightforward on a Linux host. However, things can get quite gnarly on hosts that aren’t Linux.
On a Linux host, the Docker host = the base OS, making things relatively straightforward. Mounting volumes and forwarding ports between the host and a container is simple. The IP address of the Docker host is just the machine’s own IP address, and a port that is forwarded from a container onto the host (port 80, for example) will be available at 0.0.0.0:80. You can point a browser on your base OS at http://localhost or http://0.0.0.0 to see the result from the server running in your container.
Ah, such simplicity!…Especially compared to the alternative.
On non-Linux hosts, a virtual machine with a Linux kernel will be the Docker host, instead of the base machine/OS. This adds another complicating layer on top of our interactions with the Docker container.
Mounting volumes and forwarding ports is done not between the base MacOS/Windows host system and the Docker containers, but between the Linux virtual machine running on the host and the Docker containers. As a result, any volumes that are mounted from the host will be volumes inside the virtual machine. What this means is that if you’re on a Mac and you want to mount the folder ~/myproject into your container, it would involve:
- Mounting the volume from host OS (e.g. Mac/Windows) to the Docker host virtual machine
- Mounting the volume from the Docker host virtual machine to the Docker container
Furthermore, since the Docker host isn’t the base OS but a virtual machine running on the base OS, when ports are forwarded, the IP address for the available ports will not be that of base OS but of the virtual machine.
For example, let’s say you have a Docker container running Nginx on port 80. You want to make that server available on your base OS (Mac or Windows) for viewing/debugging, so you forward port 80 from the container onto your Docker host:
docker run --name="myapp" -p 80:80 -d nginx:latest nginx -g "daemon off;"
Then, you happily go to http://localhost from a browser on your base OS and see… nothing; the connection fails.
Well, that wasn’t what we expected. What went wrong?
On a Linux host, things would go as expected: going to http://0.0.0.0 or http://localhost would give us the result from the server. But on a non-Linux host, we’re forwarding the port from the container to the virtual machine running Linux/Docker. As a result, instead of localhost, we have to go to port 80 on the IP address of the Docker server virtual machine.
We can find the IP address of the virtual machine running Docker by:
docker-machine ip default
Which will return the IP address of the virtual machine named default, our Docker host that runs our Docker server/daemon. If our Docker host is running at 192.168.99.100, then we can see our application by going to http://192.168.99.100:80 in the browser from our base OS.
|Command|Description|
|---|---|
|docker-machine create --driver virtualbox default|Create a new Docker host virtual machine named "default" using the VirtualBox driver|
|docker-machine start default|Start the Docker host virtual machine named default|
|docker-machine ls|List Docker machines|
|docker-machine ip default|Get the IP address of the Docker machine named default|
|docker-machine stop default|Stop the Docker machine named default|
|docker-machine restart default|Restart the Docker machine named default|
|eval "$(docker-machine env default)"|Set the environment variables for the Docker machine named default so that it is our Docker host. Try this if you get a "Cannot connect to the Docker daemon. Is the docker daemon running on this host?" error. If it doesn’t resolve the issue, run|
To get an image:
docker pull debian:latest
This pulls the latest Debian image from Docker Hub (the default registry) onto your computer. By default, Docker will first look for a debian:latest image on your local machine, and if it doesn’t find one, it will look on Docker Hub for a corresponding image.
If you want to pull an image from a separate registry, you will have to log in to the registry first:
docker login myownregistry.com:5000
Then, once authorized, you can pull from the repository as usual:
docker pull myownregistry.com:5000/node:latest
To see your available images:
docker images
You can also build your own images based on instructions in a Dockerfile:
docker build -t myimage:latest .
This will create an image tagged myimage:latest using the Dockerfile in the current directory (and using the current directory as the build context).
Check out the Docker documentation for best practices on writing Dockerfiles and for a reference of Dockerfile commands. In the future I will also be writing a post on Dockerfiles and will update this post with the link once it is up.
|Command|Description|
|---|---|
|docker images|See all downloaded images|
|docker pull debian:latest|Pull an image from a repository|
|docker push myuser/debian:latest|Push an image to a repository (myuser is a namespace corresponding to your username on Docker Hub)|
|docker rmi debian:latest|Remove an image|
|docker build -t myimage:latest .|Build an image tagged myimage:latest from a Dockerfile in the current directory (using the current directory as the build context)|
|docker save myimage:latest > myimage.tar|Save an image to a tarball|
|docker load < myimage.tar|Load an image from a tarball created with docker save|
|docker commit -m "newconfig" mycontainer myimage:latest|Commit/save the current state of the container named mycontainer (or use the container ID) to myimage:latest|
Once you have an image, you can create and run a container from the image:
docker run --name="myapp" -it debian:latest bash
Which will create a container named “myapp” from the debian:latest image, run it interactively (-i) with a pseudo-TTY (-t), and run bash inside it. Once you’re done, you can exit the shell by typing exit or pressing Ctrl+D.
To see your running containers:
# only running containers
docker ps

# all containers, running or exited
docker ps -a
Now that we have a container, how can we get a terminal into it? The reflexive solution would be to use SSH, but that’s not necessary… and is adding an SSH server to your container truly necessary for your app to run?
Instead, we can just use:
docker exec -it mycontainer bash
We will then be executing bash within the container interactively and with a pseudo TTY.
|Command|Description|
|---|---|
|docker ps -a|See all containers, running or exited|
|docker run -it --name="myapp" debian:latest bash|Create a container from an image, then start it and run the specified command within it (interactive mode; runs bash)|
|docker run -dit --name="myapp" debian:latest /src/bin/startserver|Create a container from an image, then start it and run the specified executable within it (daemon mode; runs the script at /src/bin/startserver or wherever you desire)|
|docker exec -it myapp bash|Enter the container to execute commands within it|
|docker stop myapp|Stop a container (SIGTERM)|
|docker kill myapp|Kill a container (SIGKILL)|
|docker start myapp|Start a stopped container|
|docker restart myapp|Restart a container|
|docker rm myapp|Remove a container|
|docker logs myapp|View the output/logs of a container|
Docker Remote API
By default, the Docker daemon listens on the Unix socket at /var/run/docker.sock. To get started with it, make sure that you have cURL version 7.40 or greater so that the --unix-socket option is available. Then, we can use curl to interact with the Remote API:
curl --unix-socket /var/run/docker.sock http://localhost/containers/json
It is also possible to bind the Docker daemon to a TCP port at startup by:
docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
It will then be possible to access the Remote API on the host over TCP, for example:
curl http://192.168.99.100:2375/containers/json
Obviously, however, exposing the Remote API publicly like this can be a security issue, so think carefully before doing so.
For more information on the Docker Remote API and what it’s capable of, check out the Docker Remote API documentation.
Wrapping Up

So, as a recap:
- Docker uses containers instead of virtual machines + configuration management to make continuous delivery easier.
- Docker is great for running ephemeral, isolated processes, restricting memory/CPU usage for processes, short running tasks, maintenance scripts, worker processes, and stateless applications.
- The Docker server/daemon runs on the Docker host (the Linux machine itself, or a Linux virtual machine running on Mac/Windows). The Docker command line interface (available through the docker command) interacts with this server.
- A Docker registry stores Docker images, and multiple Docker containers can be run from a single Docker image.
- A Dockerfile includes all the instructions needed to build a Docker image.
- The Docker remote API is available for getting information on containers/images on the Docker server and for creating/editing containers/images.
I hope that now y’all are as excited about Docker as I am! 🙂
Have any suggestions or tips, or did you spot an error? Feel free to leave a comment; I’d love to hear from you! I’m still learning, so I’d also love to hear about your experience with Docker.