class: middle, center
[a container runtime for software deployments]
## Concepts and Introduction
by [João Rocha da Silva](https://silvae86.github.io), based on ['Using Docker: Developing and Deploying Software with Containers'](https://www.oreilly.com/library/view/using-docker/9781491915752/) by *Adrian Mouat* and [other sources](#references).
class: middle, center
.indexpill[[Virtual Machines vs. 'bare metal'](#virtual-machines-vs-bare-metal)]
.indexpill[[VMs vs Containers](#vms-vs-containers)]
.indexpill[[Example container (Web Server)](#example)]
## Virtual Machines
### What are they?
- A virtual machine is software that *simulates* a physical computer's hardware and software components
### Host vs. Guest
- Several **virtual** machines can run on a single **physical**, or *bare-metal*, machine
- Virtual machines are typically called "guests"
- The physical machines where they run are called "hosts"
## Virtual machines vs. 'bare-metal' - Advantages
- .good[Portability and hardware-agnosticism]
- The same VM can run in computers with very different hardware and software configurations
- The [*hypervisor*](#hypervisor) provides an abstraction layer between the virtual and physical hardware configurations
- .good[Faster disaster recovery]
- Virtual machines can be backed up and restored simply by copying and pasting their *virtual* hard drives, which are simple files on the host's file system.
- Improved security for shared hardware machines: several users can have full administration privileges inside their own VMs, without any access to the host
- VMs make it easier to set up a *multi-tenant* environment, where resources are shared among the various VMs, which can even belong to different people.
## Virtual machines vs. 'bare-metal' - Disadvantages
- .bad[High resource consumption]
- Each VM virtualises everything, including a full operating system, so its RAM usage is the same as that of a 'bare-metal' machine. The host needs a lot of RAM to run several VMs at the same time.
- Running many VMs on the same host can slow down even powerful servers, because hard drive access is split among many concurrent, random accesses. This is especially hard on mechanical hard drives, less so on SSDs.
- .bad[Lack of access to some low-level functions]
- If your application needs direct access to some low-level / hardware capabilities (such as 3D acceleration), those may be unavailable.
- .bad[Backups need to include the entire virtual machine]
- The files are very large: a VM's *virtual hard drive* can take as much space as the entire hard drive of a 'bare-metal' machine (~hundreds of GB **each**!).
## The hypervisor
- Software that powers the virtual machines
- It provides virtual networking and storage layers (virtual network cards and virtual hard drives)
- Controls what share of the host's CPU and RAM is given to each guest
.tiny[[VMware Workstation by VMware](https://www.vmware.com/)]
.tiny[[VirtualBox by Oracle](https://www.virtualbox.org/) - (Open source hypervisor)]
## Why Docker? (1/2)
- Practically any application can be containerized
- Containers share the host Kernel, without virtualizing an OS for every application, saving a LOT of resources when compared to using Virtual Machines
- Ensures portability of execution environment on any machine
- Application, pre-requisites and dependencies are packaged together
- From a single container on your laptop or thousands in the cloud, it is all the same technology
- .good[**Loose coupling**]
- Containers are highly self-sufficient and encapsulated, and can be updated individually without upsetting others
- Containers apply aggressive constraints and isolations to processes without any configuration required on the part of the user.
.tiny[.footnote[Source: "Orientation and setup", by [Docker](https://docs.docker.com/get-started/)]]
## Why Docker? (2/2)
- .good[**Elasticity for the cloud**]
- In the cloud, computational resources should be purchased as needed
- Too many users for too little computational power → poor system performance
- Too many resources for too few users → waste of money
- Docker makes it easier to scale applications up to meet peak loads, and then scale down during quiet periods
- Spin up more or fewer containers (*replicas* of the application) across a datacenter, in response to application load
- .good[**Separation of code from state**]
- Separates code+infrastructure (*application logic*) configuration from Data (*application state*)
- Backups only need to worry about the data, as code and infrastructure can be built on-the-fly
- .good[**Native CPU scheduling**]
- On Linux, containers are seen by the Kernel as independent processes, so they can be efficiently managed by the CPU scheduler
## VMs vs Containers (Cont'd)
.imgmd2[![Container Engine Architectural Diagram](/teaching/slides/docker/basics/container-engine.png)]
.imgmd2[![Virtual Machines Architectural Diagram](/teaching/slides/docker/basics/vms.png)]
- Containers share the host's OS and kernel, and do not need to virtualize the Operating System → saving memory
.footnote[.tiny[Images from [*Using Docker: Developing and Deploying Software with Containers*](#references)]]
.tiny[An overview of the Docker architecture (Image by Docker - [Source](https://docs.docker.com/get-started/overview/))]
.tiny[Docker Images (Based on an image by Docker - [Source](https://docs.docker.com/storage/volumes/))]
- Read-only **templates** with instructions for creating a Docker container
- Images are built using [Dockerfiles](#dockerfile), written as a sequence of steps to go from the *base image* to your final image. Every step in a Dockerfile creates a new *layer*.
- You can `pull` them from an image registry, e.g. [Docker Hub](https://hub.docker.com/search?q=&type=image&image_filter=official), or build and `push` your own to Docker Hub to publish your work.
- Often based on other images, e.g. you can start from an `ubuntu` image (*base image*) and install additional libraries, resulting in a new image.
- Steps to go from one image to another are called *layers*, because an image is like an onion: made up of several successive sets of changes.
- When images are rebuilt, only the modified layers (and those that follow them) are rebuilt; the unchanged ones, starting with the base image, are recovered from the *cache*. This makes image building much more efficient than building a VM using, say, [Vagrant](https://www.vagrantup.com/).
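To look at the layers of an image yourself, here is a minimal sketch. It assumes Docker is installed; `inspect_image_layers` is a made-up helper name, not part of Docker.

```shell
# Sketch only: inspect_image_layers is a hypothetical helper name.
inspect_image_layers() {
  # Download the official ubuntu:18.04 image from Docker Hub.
  docker pull ubuntu:18.04
  # List the image's layers, newest first, one row per build step.
  docker history ubuntu:18.04
}
```

Call `inspect_image_layers` in a terminal; each row printed by `docker history` corresponds to one layer.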
.tiny[Docker containers (Based on an image by Docker - [Source](https://docs.docker.com/storage/volumes/))]
- A container is a runnable *instance* of an image.
- It is an *instance* because you can start __multiple containers__ from the same image, like baking cookies from the same mold!
- You can create, start, stop, move, or delete containers using the `docker` command (e.g. `docker start`, `docker stop`, `docker rm`).
- Containers can be connected to one or more [*networks*](#networking) → separate them for security via isolation, or connect them so they work together
- Attach storage to the container via [**volumes**](#volumes) → like plugging in an external hard drive to keep changed files after the container is shut down.
- You can `commit` a new image from the current state of a container.
.dangerbox[If you `start` a container from an image and anything is modified inside the container, all changes will be lost when you `stop` and `rm` (remove) it.]
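The warning above can be tried out with a short sketch. The names `scratchpad` and `scratchpad-snapshot` are made up for illustration; `docker commit` is the subcommand that records a container's current state as a new image.

```shell
# Sketch only: container and image names are hypothetical.
demo_ephemeral_changes() {
  # Start a container from the ubuntu image and create a file inside it.
  docker run --name scratchpad ubuntu touch /created-inside-container
  # Optionally keep the container's current state as a new image.
  docker commit scratchpad scratchpad-snapshot
  # Removing the container discards every change that was not committed
  # or stored in a volume.
  docker rm scratchpad
  # A fresh container from the original image does not have the file,
  # so this ls is expected to report that it does not exist.
  docker run --rm ubuntu ls /created-inside-container
}
```

Running `demo_ephemeral_changes` should end with an error from `ls`, while a container started from `scratchpad-snapshot` would still contain the file.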
## Volumes (1/4) - What are they?
.tiny[Docker volumes (Image by Docker - [Source](https://docs.docker.com/storage/volumes/))]
- Without volumes, containers contain both application code and state (no separation)
- When the container is deleted, so are any changes made since its instantiation from its base image
- A volume acts like a *mount point* for a container
- It "injects" a link to a folder from the host machine into the container's file structure
- That becomes a shared folder between host and container
- You can also use `tmpfs` on Linux to create a memory-based mount that uses RAM as a virtual file system
- Fast!! But volatile too, good for caching files, for example
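A memory-backed folder can be requested when the container is started. This is a minimal sketch, assuming a Linux host; the container name `cache-demo`, the `/cache` path and the choice of the `nginx` image are placeholders.

```shell
# Sketch only: names and paths are placeholders. --tmpfs works on Linux hosts.
run_with_ram_cache() {
  # Mount an in-memory file system at /cache inside the container.
  # Anything written there disappears when the container stops.
  docker run -d --name cache-demo --tmpfs /cache nginx
}
```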
## Volumes (2/4) - Advantages and Disadvantages
.tiny[Docker volumes (Image by Docker - [Source](https://docs.docker.com/storage/volumes/))]
- .good[Sharing data across different containers and machines]
- Good for fault-tolerant applications: if one containerized "clone" of your application crashes, another can take over transparently, because they share the same data, or *state*
- .good[Access control]
- Volumes are **bidirectional** by default: changes made by the host to files inside the volume folder are also propagated to the corresponding folder in the container (and vice-versa)
- `readonly` volumes will allow the container to read files in the volume, but not change them
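The difference between a default and a read-only mount can be sketched as follows. The `./data` folder and the function names are assumptions made for this example; the `:ro` suffix on `-v` is what makes the mount read-only.

```shell
# Sketch only: ./data and the function names are hypothetical.
mount_read_write() {
  # Default mount: the file created in the container shows up in ./data on the host.
  docker run --rm -v "$(pwd)/data:/data" ubuntu touch /data/from-container
}

mount_read_only() {
  # The :ro suffix makes /data read-only inside the container,
  # so this touch is expected to fail.
  docker run --rm -v "$(pwd)/data:/data:ro" ubuntu touch /data/from-container
}
```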
## Volumes (3/4) - Advantages and Disadvantages Part 2
- .good[Very useful for backups]
- You link only the folders within the container that have your application state (say, the folders where you have database files, uploaded images, and any other)
- Ignore the rest of the operating system in each container, because all dependencies are handled by the image---great space savings
- To back up, instead of making an image of the entire container, you just copy the volume folders in the host
- To go back in time, just replace the volume folders' content with your backup and start a new container with those volumes mounted.
- .bad[Slow I/O on non-Linux Operating Systems]
- Reads and writes to/from volume folders can be quite slow on non-Linux operating systems. Watch out if you need intensive I/O in your app.
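The backup and restore steps described above need nothing more than ordinary file tools on the host. A minimal sketch using `tar`; the function names are made up, and it assumes the container using the volume is stopped while the copy is made.

```shell
# backup_volume VOLUME_DIR ARCHIVE
# Archive the contents of a volume folder on the host.
backup_volume() {
  tar -czf "$2" -C "$1" .
}

# restore_volume ARCHIVE VOLUME_DIR
# Replace the folder's contents with the contents of the backup.
restore_volume() {
  [ -n "$2" ] || return 1   # refuse to run with an empty target path
  rm -rf "$2"
  mkdir -p "$2"
  tar -xzf "$1" -C "$2"
}
```

For example, `backup_volume ./html /backups/html.tar.gz` followed later by `restore_volume /backups/html.tar.gz ./html`, then start a new container with `./html` mounted.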
## Volumes (4/4) - Initialization of containerized applications .tiny[from "Stuff I learned the hard way, page 3519"]
.dangerbox[Volumes are mounted when a container is booted, **replacing the contents** of the folder in the container with the contents of the volume folder from the host]
1. If you need to initialize your app automatically on first startup (e.g. create default admin users), you cannot do it during image creation; use a startup script inside the container instead.
- This is because our application state (e.g. some database files) is saved in a folder somewhere in the container. If you initialize those files during the image building process, apparently everything works well **without volumes**.
2. When that folder is later mounted as a volume, its contents **will be replaced** with the contents of the host's folder that is mounted
- .red[Down the 🚽 goes your initialization], as the mounted volume's folder is most likely empty when it is mounted by the host in the container on first startup!
3. Possible generic solution: Create a dummy file in your data folder after a successful initialization. Let your initialization code check if the file is present. If it is not, then you need to re-initialize. Changes will be propagated to the volume's folder on the host, so this should only happen on first bootup.
- Alternatively, some web frameworks also provide support for database *migration* and *seeding*, which you can run on application startup.
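The marker-file idea from step 3 could look like the sketch below. Everything named here is an assumption for illustration: the `.initialized` marker, the `users.txt` file and the body of `initialize_app` stand in for whatever your application really needs on first boot.

```shell
# Placeholder for the real first-run work (create admin users, seed data...).
initialize_app() {
  echo "admin" > "$1/users.txt"
}

# ensure_initialized DATA_DIR
# Run the initialization only if the marker file is not in the data folder yet.
ensure_initialized() {
  data_dir="$1"
  marker="$data_dir/.initialized"
  if [ -f "$marker" ]; then
    echo "already initialized"
  else
    mkdir -p "$data_dir"
    initialize_app "$data_dir" || return 1
    # The marker lives in the mounted folder, so it survives container removal.
    touch "$marker"
    echo "initialized"
  fi
}
```

A startup script inside the container would call `ensure_initialized` with the folder that is mounted as a volume, and then start the application itself.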
- Docker containers can live on the same network as other VMs, bare-metal machines, or containers
- All types of machines can communicate transparently
- 5 networking modes: `bridge`, `host`, `overlay`, `macvlan` and `none`
- `none` disables networking completely: no communication
- We will cover the two most common ones, `bridge` and `host`.
## Networking (`bridge` mode)
.imgmd[!["Docker Networking - Bridge Mode"](/teaching/slides/docker/basics/docker-networks-bridge.png)]
.tiny[Docker Networking ([Bridge mode](https://docs.docker.com/network/bridge/))]
- Containers running on Bob host can find other containers also running on Bob by name (running `ping solr` on `my-app` container will return the IP of `solr`, as given by the Docker DHCP server).
- Alice cannot find a container by name (`ping solr` on Alice's machine will return Host Not Found).
- Containers can find Alice by name (`ping alice` will return Alice's IP, as given by the Internet Gateway) and access the Internet.
- Containers running in the Bob host cannot find any containers running on Alice
- Multiple `bridge` networks can be created, and containers will only communicate within the same `bridge` network (for separation).
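A sketch of two containers sharing one `bridge` network, reusing the `solr` and `my-app` names from the bullets above. The network name `app-net` and the image name `my-app-image` are made up, and the last step assumes that `ping` is installed in that image.

```shell
# Sketch only: app-net and my-app-image are hypothetical names.
start_on_shared_bridge() {
  # Create a separate bridge network for these containers.
  docker network create --driver bridge app-net
  # Attach both containers to it.
  docker run -d --name solr --network app-net solr
  docker run -d --name my-app --network app-net my-app-image
  # From inside my-app, the other container can be reached by its name.
  docker exec my-app ping -c 1 solr
}
```

A container attached to a different `bridge` network would not be able to reach `solr` this way.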
## Networking (`host` mode)
.imgmd[!["Docker Networking - Host Mode"](/teaching/slides/docker/basics/docker-networks-host.png)]
.tiny[Docker Networking ([Host mode](https://docs.docker.com/network/host/))]
- Only available on Linux
- Containers on Bob host can access the host network directly.
- No name resolution among containers
- `my-app` running on Bob will not get any response if it runs `ping mysql`
- DNS entries (such as `bob.lan <ip>`) must be added at the physical network host resolution level
- Containers can bind their open ports to ports on the host.
- Alice can access a container on Bob via Bob's IP + port of the container they want
- Only one program (and thus, only one container) can be listening on a port of each host. Beware of conflicts!
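For comparison, here is a sketch of starting the same web server in `host` mode and in `bridge` mode with a published port. The container names are made up, and `nginx` is used only because it listens on port 80 by default.

```shell
# Sketch only: container names are hypothetical.

# Host mode (Linux only): the container uses the host's network directly,
# so the server answers on port 80 of the host itself.
run_in_host_mode() {
  docker run -d --name web-host --network host nginx
}

# Bridge mode: a port has to be published explicitly.
# Here port 8080 of the host is forwarded to port 80 of the container.
run_with_published_port() {
  docker run -d --name web-bridge -p 8080:80 nginx
}
```

In the first case a second container that also wants port 80 would conflict; in the second case it could simply be published on a different host port.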
- Dockerfiles are files containing the sequence of steps required to build an image.
- They are typically named `Dockerfile` without any extension.
```dockerfile
# Start with a base image of Ubuntu 18.04, then:
FROM ubuntu:18.04
# Copy current folder on the host to /app on the container
COPY . /app
# run `make` (compilation, etc) on /app to build the app on the container
RUN make /app
# Set default command when container boots up (runs installed app)
CMD python /app/app.py
```
To build an image from a `Dockerfile` in the current directory, you run:
```shell
docker build .
```
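In practice it helps to give the image a name while building, so that it can be started afterwards. A minimal sketch, where `my-app` is a made-up image name and the current directory is assumed to contain a `Dockerfile`:

```shell
# Sketch only: my-app is a hypothetical image name.
build_and_run() {
  # -t names (tags) the image built from the Dockerfile in the current folder.
  docker build -t my-app .
  # Start a container from the new image; --rm removes it when it exits.
  docker run --rm my-app
}
```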
- More complicated `Dockerfile` examples can be found [here](https://github.com/silvae86/feup-bdad-corrector/blob/master/Dockerfile) and [here](https://github.com/feup-infolab/dendro/blob/master/Dockerfile) if you are curious.
- [Best practices for writing Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
- [`docker build` command reference](https://docs.docker.com/engine/reference/commandline/build/)
- Installation guide for Docker and how to use a container for local PHP development available [here](/teaching/howto/local_dev_with_docker).
.warningbox[Remember to turn on Virtualization Support (or VT-x) in your BIOS/UEFI (press Delete/F2 before Windows starts) in order to run virtualization apps like Docker or a VM Hypervisor. See more [here](https://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/sect-Virtualization-Troubleshooting-Enabling_Intel_VT_and_AMD_V_virtualization_hardware_extensions_in_BIOS.html).]
.imglg[!["Virtualization Off - Docker Error"](/teaching/slides/docker/basics/docker-no-virtualization.png)]
.tiny[Oops, I forgot to [turn on Virtualization](https://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/sect-Virtualization-Troubleshooting-Enabling_Intel_VT_and_AMD_V_virtualization_hardware_extensions_in_BIOS.html)!]
## Example Container (Apache Web Server + PHP)
- The command that you use to start your server, now explained:
```shell
# docker run          run is the command for running a container
# -d                  run in detached mode (without this, the container
#                     will stop when you close the command line,
#                     instead of running in the background)
# -p 8080:8080        bind port 8080 of the container, which is running
#                     the Apache+PHP server, to port 8080 of the host.
#                     This is what allows you to type localhost:8080
#                     in the browser and have the container respond
# -it                 keep STDIN open (-i) and allocate a tty (-t)
#                     for the container process
# --name=php          name of the container to create
# -v host:container   create a volume to map [current folder]/html on
#                     the host to /var/www/html (default Apache htdocs
#                     location) on the container
# "$IMAGE"            name of the image to base the container on
#                     (has Apache and PHP). IMAGE is a placeholder
#                     variable: set it to that image's name
docker run \
  -d \
  -p 8080:8080 \
  -it \
  --name=php \
  -v "$(pwd)/html:/var/www/html" \
  "$IMAGE"
```
- [`docker run` command reference](https://docs.docker.com/engine/reference/run/)
- [What is a TTY?](https://www.howtogeek.com/428174/what-is-a-tty-on-linux-and-how-to-use-the-tty-command/)
- *Using Docker: Developing and Deploying Software with Containers*
Mouat, A. (2016).
O'Reilly Media, Inc.
- *Docker Overview*
- *Docker Networking*
- *Most common Docker commands*
[Docker Cheat Sheet](https://github.com/wsargent/docker-cheat-sheet)