# Docker
[a container runtime for software deployments]
## Concepts and Introduction
by [João Rocha da Silva](, based on ['Using Docker: Developing and Deploying Software with Containers']( by *Adrian Mouat* and [other sources](#references).
## Agenda
.indexpill[[Virtual Machines](#virtual-machines)]
.indexpill[[Virtual Machines vs. 'bare metal'](#virtual-machines-vs-bare-metal)]
.indexpill[[VM Hypervisor](#hypervisor)]
.indexpill[[Why Docker?](#why-docker)]
.indexpill[[VMs vs Containers](#vms-vs-containers)]
.indexpill[[Example container (Web Server)](#example)]
## Virtual Machines
### What are they?
- A virtual machine is a piece of software that *simulates* a physical computer's hardware and software components
### Host vs. Guest
- Several **virtual** machines can run on a single **physical** (or *bare-metal*) machine
- Virtual machines are typically called "guests"
- The physical machines where they run are called "hosts"
## Virtual machines vs. 'bare-metal' - Advantages
- .good[Portability and hardware-agnosticism]
- The same VM can run on computers with very different hardware and software configurations
- The [*hypervisor*](#hypervisor) provides an abstraction layer between the virtual and physical hardware configurations
- .good[Faster disaster recovery]
- Virtual machines can be backed up and restored simply by copying their *virtual* hard drives, which are ordinary files on the host's file system.
- .good[Isolation]
- Improved security on shared hardware: several users can have full administrator privileges inside their own VMs, but without any access to the host
- VMs make it easier to set up a *multi-tenant* environment, where resources are shared among the various VMs, which can even belong to different people.
## Virtual machines vs. 'bare-metal' - Disadvantages
- .bad[High resource consumption]
- Each VM virtualizes a complete computer, including its own operating system, so it needs about as much RAM as a 'bare-metal' machine. The host needs a lot of RAM to run several VMs at the same time.
- Running many VMs on the same host can slow down even powerful servers, because disk access is split among many concurrent, random reads and writes; this hurts mechanical hard drives much more than SSDs.
- .bad[Lack of access to some low-level functions]
- If your application needs direct access to some low-level / hardware capabilities (such as 3D acceleration), those may be unavailable.
- .bad[Backups need to include the entire virtual machine]
- Backups are very large, as each VM's "virtual hard drive" can take as much space as the entire hard drive of a 'bare-metal' machine (possibly hundreds of GB **each**!).
## The hypervisor
- Software that powers the virtual machines
- It provides virtual networking and storage layers (virtual network cards and virtual hard drives)
- Controls how much of the host's CPU and RAM is given to each guest
.tiny[[VMware Workstation by VMware](]
.tiny[[VirtualBox by Oracle]( -
(Open source hypervisor)]
## Why Docker? (1/2)
- .good[**Flexibility**]
- Practically any application can be containerized
- .good[**Lightweight**]
- Containers share the host's kernel instead of virtualizing a full OS for every application, saving a lot of resources compared to Virtual Machines
- .good[**Portability**]
- Provides the same execution environment on any machine that runs Docker
- Application, prerequisites and dependencies are packaged together
- .good[**Scalability**]
- From a single container on your laptop to thousands in the cloud, it is all the same technology
- .good[**Loose coupling**]
- Containers are highly self-sufficient and encapsulated, and can be updated individually without affecting others
- .good[**Security**]
- Containers apply aggressive constraints and isolations to processes without any configuration required on the part of the user.
.tiny[.footnote[Source: "Orientation and setup", by [Docker](]]
## Why Docker? (2/2)
- .good[**Elasticity for the cloud**]
- In the cloud, computational resources should be purchased as needed
- Too many users for too little computational power → poor system performance
- Too many resources for too few users → waste of money
- Docker makes it easier to scale applications up to meet peak loads, and then scale down when demand drops
- Spin up more or fewer containers (*replicas* of the application) across a datacenter, in response to application load
- .good[**Separation of code from state**]
- Separates code and infrastructure configuration (*application logic*) from data (*application state*)
- Backups only need to cover the data, as code and infrastructure can be rebuilt on the fly
- .good[**Native CPU scheduling**]
- On Linux, the kernel sees containers as ordinary (isolated) processes, so they are handled by the host's CPU scheduler like any other process
## VMs vs Containers (Cont'd)
- Containers share the host's OS and kernel, and do not need to virtualize the Operating System → saving memory
.footnote[.tiny[Images from [*Using Docker: Developing and Deploying Software with Containers*](#references)]]
## Architecture
.tiny[An overview of the Docker architecture (Image by Docker - [Source](]
## Images
.tiny[Docker Images (Based on an image by Docker - [Source](]
- Read-only **templates** with instructions for creating a Docker container
- Images are built using [Dockerfiles](#dockerfile), written as a sequence of steps to go from the *base image* to your final image. Each step that changes the file system (such as `RUN` or `COPY`) creates a new *layer*.
- You can `pull` them from an image registry, e.g. [Docker Hub](, or build your own and `push` them to Docker Hub to publish your work.
- Often based on other images, e.g. you can start from an `ubuntu` image (*base image*) and install additional libraries, resulting in a new image.
- The changes made by each step are stored as *layers*, because an image is like an onion: made up of several successive sets of changes.
- When an image is rebuilt, the layers of unchanged steps are reused from the *cache*; only the first modified step and the steps after it are rebuilt. This makes image building much more efficient than building a VM using, say, [Vagrant](
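The image workflow above maps onto a few real `docker` subcommands (`pull`, `history`, `tag`, `push`). A minimal sketch, wrapped in two shell functions; the function names and the `myuser/my-app` repository in the usage line are hypothetical.

```shell
# Hypothetical helper: download an image from a registry (Docker Hub by default)
# and list its layers, one row per build step, newest first.
inspect_layers() {
  local image="$1"                 # e.g. ubuntu:18.04
  docker pull "$image"             # layers already present locally are not downloaded again
  docker history "$image"
}

# Hypothetical helper: give a locally built image a name in your own
# repository and publish it (requires a previous `docker login`).
publish_image() {
  local local_image="$1"           # e.g. my-app:latest
  local remote_image="$2"          # e.g. myuser/my-app:1.0
  docker tag "$local_image" "$remote_image"
  docker push "$remote_image"
}
```

Usage: `inspect_layers ubuntu:18.04`, then `publish_image my-app:latest myuser/my-app:1.0`.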
## Containers
.tiny[Docker containers (Based on an image by Docker - [Source](]
- A container is a runnable *instance* of an image.
- *Instance*, because you can start __multiple containers__ from the same image, like baking cookies from the same mold!
- You can create, start, stop, or delete containers using the `docker` command (`docker create`, `docker start`, `docker stop`, `docker rm`).
- Containers can be connected to one or more [*networks*](#networking) → separate them for security via isolation, or connect them so they work together
- Attach storage to the container via [**volumes**](#volumes) → like plugging in an external hard drive to keep changed files after the container is removed.
- You can `commit` the current state of a container to a new image.
.dangerbox[If you `start` a container from an image and anything is modified inside it, all changes will be lost when you `stop` and `rm` (remove) it.]
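A sketch of the container lifecycle with real `docker` subcommands; the helper names and the `web-1`/`web-2` container names are hypothetical. Creating an image from a container's current state is done with `docker commit` (`docker save` instead exports an existing image to a `.tar` archive), which is one way to keep changes before a container is removed.

```shell
# Hypothetical helper: two containers (instances) from the same image,
# then stop and restart one of them.
lifecycle_demo() {
  local image="$1"
  docker run -d --name web-1 "$image"   # create + start the first instance
  docker run -d --name web-2 "$image"   # a second, independent instance of the same image
  docker stop web-1                     # stopped: its file changes are still kept
  docker start web-1                    # the same container, with the same changes
}

# Hypothetical helper: save a container's current state as a new image,
# then remove the container (its own changes are gone after `docker rm`).
snapshot_and_remove() {
  local container="$1" new_image="$2"
  docker stop "$container"
  docker commit "$container" "$new_image"
  docker rm "$container"
}
```

Note that `docker commit` does not include data stored in volumes mounted inside the container.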
## Volumes (1/4) - What are they?
.tiny[Docker volumes (Image by Docker - [Source](]
- Without volumes, containers contain both application code and state (no separation)
- When the container is deleted, so are any changes made since its instantiation from its base image
- A volume acts like a *mount point* for a container
- It "injects" a link to a folder from the host machine into the container's file structure
- That becomes a shared folder between host and container
- On Linux, you can also use a `tmpfs` mount to create a memory-based volume, storing files in RAM instead of on disk
- Fast! But volatile too: good for caching files, for example
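A minimal sketch of both kinds of mounts described above, using real `docker run` flags (`-v` and `--tmpfs`). Docker itself calls a host folder mounted this way a *bind mount*. The helper name, the `/cache` path and the example arguments are hypothetical; `--tmpfs` only works when Docker runs on Linux.

```shell
# Hypothetical helper: start a container with
#   - a host folder ($host_dir) shared with the container at $container_dir
#   - an in-memory tmpfs mount at /cache, whose contents vanish when the container stops
run_with_volumes() {
  local name="$1" image="$2" host_dir="$3" container_dir="$4"
  docker run -d --name "$name" \
    -v "$host_dir:$container_dir" \
    --tmpfs /cache \
    "$image"
}
```

Usage: `run_with_volumes db my-db-image "$(pwd)/data" /var/lib/data`. The host path must be absolute (hence `$(pwd)`); with `-v`, a relative name is treated as a Docker-managed named volume instead.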
- .good[Sharing data across different containers and machines]
- Good for fault-tolerant applications: if one containerized "clone" of your application crashes, another can take over transparently, because they share the same data, or *state*
- .good[Access control]
- Volumes are **bidirectional** by default: changes made by the host to files inside the volume folder are also propagated inside the corresponding folder in the container (and vice versa)
- *Read-only* volumes (the `ro` option) let the container read files in the volume, but not change them
- .good[Very useful for backups]
- Mount volumes only on the folders within the container that hold your application state (e.g. the folders with database files, uploaded images, and so on)
- Ignore the rest of the operating system in each container, because all dependencies are handled by the image: great space savings
- To back up, instead of making an image of the entire container, you just copy the volume folders on the host
- To go back in time, just replace the volume folders' content with your backup and start a new container with those volumes mounted.
- .bad[Slow I/O on non-Linux Operating Systems]
- Reading from and writing to volume folders can be quite slow on non-Linux operating systems. Watch out if your app needs intensive I/O.
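Because application state lives in ordinary host folders, backing it up needs no Docker commands at all. A minimal sketch with `tar`, assuming the container is stopped while you copy (so files are not changing), and that you may need `sudo` if the container wrote files as `root`; the function names are hypothetical.

```shell
# Hypothetical helper: archive the contents of a host folder that is mounted as a volume.
backup_volume() {
  local volume_dir="$1"            # host folder mounted into the container
  local archive="$2"               # e.g. backup.tar.gz, stored outside volume_dir
  tar -czf "$archive" -C "$volume_dir" .
}

# Hypothetical helper: go back in time by replacing the folder's contents with a backup.
restore_volume() {
  local archive="$1" volume_dir="$2"
  rm -rf "${volume_dir:?}"         # :? aborts if the path is empty, instead of deleting the wrong thing
  mkdir -p "$volume_dir"
  tar -xzf "$archive" -C "$volume_dir"
}
```

After restoring, start a new container with the folder mounted. To give a container read-only access to such a folder, append `:ro` to the mount, e.g. `-v "$(pwd)/data:/data:ro"`.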
.dangerbox[Volumes are mounted when a container is booted, **replacing the contents** of the folder in the container with the contents of the volume folder from the host]
1. If you need to initialize your app automatically on first startup (e.g. create default admin users), you cannot do it during image creation; use a startup script inside the container instead.
- This is because your application state (e.g. some database files) is saved in a folder somewhere in the container. If you initialize those files while building the image, everything appears to work well **without volumes**.
2. When that folder is later mounted as a volume, its contents **will be replaced** with the contents of the host folder that is mounted
- .red[Down the 🚽 goes your initialization], as the host folder is most likely empty when it is first mounted into the container!
3. Possible generic solution: create an empty marker file in your data folder after a successful initialization, and have your startup code check for it. If the file is missing, run the initialization. Since the marker is written to the volume folder on the host, it survives container restarts and re-creations, so initialization should only happen on the first startup.
- Alternatively, some web frameworks also provide support for database *migration* and *seeding*, that you can run on application startup.
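Solution 3 above as a minimal shell sketch, meant to run in the container's startup (entrypoint) script. `DATA_DIR`, the `.initialized` marker name and `init_app` are hypothetical placeholders for your own data folder and initialization steps.

```shell
# Folder mounted as a volume; hypothetical default, override with the DATA_DIR variable.
DATA_DIR="${DATA_DIR:-/var/lib/myapp}"

# Hypothetical initialization: replace with your own steps
# (create the database, default admin users, ...).
init_app() {
  echo "initializing application state in $DATA_DIR"
}

# Run init_app only if the marker file is missing, i.e. on the first startup
# with an empty volume. The marker is written inside the volume, so it is
# kept on the host across container restarts and re-creations.
ensure_initialized() {
  local marker="$DATA_DIR/.initialized"
  if [ ! -f "$marker" ]; then
    init_app || return 1    # no marker on failure, so the next startup retries
    touch "$marker"
  fi
}
```

The entrypoint script would then end with `ensure_initialized && exec "$@"`, so the container's main command only starts once the data folder is ready.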
## Networking
- Docker containers can live on the same network as other VMs, bare-metal machines, or containers
- All types of machines can communicate transparently
- Built-in networking modes (drivers) include `bridge`, `host`, `overlay`, `macvlan` and `none`
- `none` disables networking completely: no communication
- We will cover the two most common ones, `bridge` and `host`.
.footnote[.tiny[Source: [](]]
## Networking (`bridge` mode)
.tiny[Docker Networking ([Bridge mode)](]
- Containers running on the Bob host can find other containers also running on Bob by name (running `ping solr` on the `my-app` container will return the IP of `solr`, as assigned by Docker). Name resolution between containers requires a user-defined `bridge` network; the default `bridge` network does not provide it.
- Alice cannot find a container by name (`ping solr` on Alice's machine will return Host Not Found).
- Containers can find Alice by name (`ping alice` will return Alice's IP, as given by the Internet gateway) and access the Internet.
- Containers running on the Bob host cannot find any containers running on Alice.
- Multiple `bridge` networks can be created: containers on the same `bridge` network can communicate, while different `bridge` networks are isolated from each other (for separation).
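The `bridge` scenario above can be reproduced with real `docker network create` and `docker run --network` commands. A minimal sketch; the helper name, network name and image arguments are hypothetical, and the last command assumes `ping` is installed in the `my-app` image.

```shell
# Hypothetical helper: put two containers on the same user-defined bridge network,
# so that they can reach each other by container name.
connect_app_and_search() {
  local network="$1" app_image="$2" search_image="$3"
  docker network create --driver bridge "$network"
  docker run -d --name solr --network "$network" "$search_image"
  docker run -d --name my-app --network "$network" "$app_image"
  # from inside my-app, the name "solr" resolves to the solr container's IP
  docker exec my-app ping -c 1 solr
}
```

Usage: `connect_app_and_search bob-net my-app-image solr-image`. A container started without `--network bob-net` cannot reach either of them.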
## Networking (`host` mode)
.tiny[Docker Networking ([Host mode)](]
- Only available on Linux
- Containers on the Bob host use the host's network directly.
- No name resolution among containers
- `my-app` running on Bob will not get any response if it runs `ping mysql`
- DNS entries (such as `bob.lan <ip>`) must be added to the name resolution of the physical network (e.g. its DNS server or each machine's hosts file)
- A container's open ports are opened directly on the host (`-p` port mappings are ignored in this mode).
- Alice can access a container on Bob via Bob's IP + the port of the container she wants
- Only one program (and thus, only one container) can be listening on a port of each host. Beware of conflicts!
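Running a container in `host` mode takes a single flag. A minimal sketch (Linux hosts only); the helper name and arguments are hypothetical.

```shell
# Hypothetical helper: run a container that shares the host's network stack.
# No -p mapping: whatever port the program inside listens on is opened
# directly on the host, so only one container can use a given port.
run_on_host_network() {
  local name="$1" image="$2"
  docker run -d --name "$name" --network host "$image"
}
```

If the program inside listens on port 8080, Alice reaches it at Bob's address, e.g. `curl http://bob.lan:8080/`, assuming the `bob.lan` DNS entry mentioned above exists. A second program trying to listen on port 8080 on Bob would conflict with it.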
## Dockerfiles
- Dockerfiles are files containing the sequence of steps required to build an image.
- They are typically named `Dockerfile` without any extension.
```dockerfile
# Start with a base image of Ubuntu 18.04, then:
FROM ubuntu:18.04
# Copy the current folder on the host to /app in the container
COPY . /app
# Run `make` (compilation, etc.) on /app to build the app in the container
RUN make /app
# Set the default command to run when the container starts (runs the installed app)
CMD python /app/
```
To build an image from a `Dockerfile` in the current directory, you run:
```shell
docker build .
```
- More complicated `Dockerfile` examples can be found [here]( and [here]( if you are curious.
- [Best practices for writing Dockerfiles](
- [`docker build` command reference](
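`docker build .` leaves the new image identified only by its ID. A minimal sketch that also names (tags) it with `-t`, then starts a container from it with `--rm`, so the container is deleted when it exits; the helper name and the `my-app:dev` tag are hypothetical.

```shell
# Hypothetical helper: build an image from the Dockerfile in $context_dir,
# tag it, and run it once. The second command only runs if the build succeeded.
build_and_run() {
  local context_dir="$1"    # folder with the Dockerfile and the files COPY needs
  local tag="$2"            # e.g. my-app:dev
  docker build -t "$tag" "$context_dir" && docker run --rm "$tag"
}
```

Usage: `build_and_run . my-app:dev`. Running it again after editing only your application files reuses the cached `FROM` layer; the `COPY` step and every step after it are rebuilt.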
## Installation
- An installation guide for Docker, including how to use a container for local PHP development, is available [here](/teaching/howto/local_dev_with_docker).
.warningbox[Remember to turn on virtualization support (Intel VT-x or AMD-V) in your BIOS/UEFI (usually by pressing Delete or F2 before Windows starts) in order to run virtualization software such as Docker or a VM hypervisor. See more [here](]
.tiny[Oops, I forgot to [turn on Virtualization](!]
## Example Container (Apache Web Server + PHP)
- The command that you use to start your server, now explained:
```shell
docker run \
  -d \
  -p 8080:8080 \
  -it \
  --name=php \
  -v "$(pwd)/html:/var/www/html" \
  <image>
```
- `docker run`: the command for creating and running a container
- `-d`: run in *detached* mode: the container runs in the background and your command line is free again (starting it automatically on system startup is a separate setting, a restart policy such as `--restart unless-stopped`)
- `-p 8080:8080`: bind port 8080 of the container, which is running the Apache + PHP server, to port 8080 of the host (the format is `HOST:CONTAINER`). This is what allows you to type `localhost:8080` in the browser and have the container respond
- `-it`: keep the container's standard input open (`-i`) and allocate a pseudo-TTY (`-t`) for the container process
- `--name=php`: name of the container to create
- `-v "$(pwd)/html:/var/www/html"`: create a volume mapping `[current folder]/html` on the host to `/var/www/html` (the default Apache htdocs location) in the container
- `<image>`: name of the image to base the container on (it has Apache and PHP pre-installed)
- [`docker run` command reference](
- [What is a TTY?](
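Once the `php` container from the command above exists, a few real `docker` subcommands cover day-to-day use. A minimal sketch; the helper names are hypothetical.

```shell
# Hypothetical helpers for the container named "php".
php_status()  { docker ps --all --filter "name=php"; }  # running or stopped? (--all includes stopped ones)
php_logs()    { docker logs --tail 50 php; }            # last 50 lines of Apache/PHP output
php_restart() { docker restart php; }                   # files in ./html are untouched: they live on the host
```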
## References
- *Using Docker: Developing and Deploying Software with Containers*
Mouat, A. (2016).
O'Reilly Media, Inc.
- *Docker Overview*
Docker Docs
- *Docker Networking*
Docker Docs
- *Most common Docker commands*
[Docker Cheat Sheet](