A container is a program that isolates another program’s entire runtime, including all the files and conditions needed to support it, so that the program can run independently of factors outside the container environment.
Containers are similar to virtual machines (VMs) in the sense that they package entire runtimes. However, unlike VMs, containers don’t run their own operating systems. Instead, containers rely on a common container engine, like Docker, to provide the interfaces necessary to use host system resources. This means that individual containers are much lighter and more agile than VMs, since they don’t need to store and run entire operating systems of their own. Containers share the host operating system kernel and may have read-only access to the host machine’s files.
Rather than a hypervisor running a virtual machine running an operating system, containerization involves a container engine running a container image running a set of processes.
How a container runs
Containers are programs, so they are either running or not running. When not running, the container is just a file or directory in storage, waiting to be run. This file (or set of files) is variously called an “image” or a “repository”, though the terminology can get confusing and may differ depending on the container engine. When the container image or repository is executed, the container engine reads the image metadata and passes a copy of the image file(s) to the host operating system kernel, requesting that it spawn a process to execute the container image program on the host machine. When spawning the container process, the container engine may take extra steps to isolate, secure, and monitor it.
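The read-metadata-then-spawn sequence described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real container engine is implemented: the `config.json` file name, its `cmd` and `env` keys, and the helper names `run_image` and `make_demo_image` are all assumptions made for the example, and no isolation is performed.

```python
import json
import os
import subprocess
import sys
from pathlib import Path


def run_image(image_dir: Path) -> subprocess.CompletedProcess:
    """Toy 'engine': read image metadata, then ask the OS to spawn a process."""
    # Step 1: read the image metadata (hypothetical file name and keys).
    config = json.loads((image_dir / "config.json").read_text())

    # Step 2: ask the host OS to spawn the process the image describes.
    # A real engine would also isolate, secure, and monitor the process;
    # this sketch only sets the working directory and extra environment.
    return subprocess.run(
        config["cmd"],
        cwd=image_dir,
        env={**os.environ, **config.get("env", {})},
        capture_output=True,
        text=True,
        check=True,
    )


def make_demo_image(image_dir: Path) -> None:
    """Write a minimal, made-up 'image' consisting of metadata only."""
    config = {
        "cmd": [sys.executable, "-c", "import os; print(os.environ['GREETING'])"],
        "env": {"GREETING": "hello from the image"},
    }
    (image_dir / "config.json").write_text(json.dumps(config))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        make_demo_image(Path(tmp))
        print(run_image(Path(tmp)).stdout.strip())
```

While the directory sits on disk, nothing is running; the "container" only exists as a process for the duration of the `subprocess.run` call.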
A container image (“image” is the common parlance, though it often actually refers to a “repository”) is a set of files in a standardized format which is downloaded from a central container index server (known as the registry server) and can be used to spawn containers. The container image provides the files, metadata, and instructions which the container engine uses to spawn the container process on the host operating system.
Layers and tags
The term “image” is often used interchangeably with “repository”, which refers to a package of “image layers”. The layers are a nested set of diffs, similar to commits in a Git repository. The diffs represent changes to the original image file. Any individual layer can likely be run as a container, provided its diff doesn’t break the image. To run the latest version of a given image, all historical layers of that image are needed.
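To make the idea of layers as diffs concrete, here is a minimal sketch that models each layer as a dictionary of changes and replays them in order. The `Layer` type, the `flatten` function, and the convention of using `None` to mark a deleted file are assumptions for illustration, not the format any particular engine uses.

```python
from typing import Optional

# One layer: path -> new file content, or None meaning "deleted in this layer".
Layer = dict[str, Optional[bytes]]


def flatten(layers: list[Layer]) -> dict[str, bytes]:
    """Replay every layer's diff, oldest first, to get the final file set."""
    filesystem: dict[str, bytes] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                filesystem.pop(path, None)  # this layer removed the file
            else:
                filesystem[path] = content  # this layer added or changed it
    return filesystem


base: Layer = {"/bin/app": b"v1", "/etc/motd": b"hello"}
upgrade: Layer = {"/bin/app": b"v2"}
cleanup: Layer = {"/etc/motd": None}

latest = flatten([base, upgrade, cleanup])
```

Here `latest` contains only `/bin/app` with the `v2` content. Dropping `base` from the list would lose the starting point the later diffs build on, which is why every historical layer is needed to run the latest version.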
Certain notable layers may be labeled as tags, which are often used to designate distinct versions of a given repository. This is also similar to git.
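Following the description above, a tag can be pictured as a named pointer into the layer history. The sketch below uses made-up layer names, tag names, and a hypothetical `layers_for_tag` helper; real engines resolve tags through their own metadata rather than a list index.

```python
# Layer history of one hypothetical repository, oldest first.
layers = ["base-os", "add-runtime", "app-1.0", "bugfix", "app-2.0"]

# Tags label notable layers by position, much like Git tags label commits.
tags = {"1.0": 2, "2.0": 4, "latest": 4}


def layers_for_tag(tag: str) -> list[str]:
    """Return the tagged layer plus every layer that came before it."""
    tip = tags[tag]
    return layers[: tip + 1]
```

For example, `layers_for_tag("1.0")` yields the first three layers, while `"2.0"` and `"latest"` both resolve to the full history.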
Container engines define the container image format much like compilers define the language they compile. Container image formats have not historically been standardized, but the OCI (Open Container Initiative) standard is becoming widely used. The OCI image format provides a JSON manifest for the image metadata and a bundle of .tar files for the image layers. The container engine reads the metadata and unpacks the .tar files, in sequence, to construct the filesystem from which the container’s processes run on the host OS kernel.
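The manifest-plus-archives arrangement can be imitated with Python’s standard `json` and `tarfile` modules. The manifest below is a deliberately simplified, hypothetical one (a `manifest.json` with a plain list of layer file names), not the real OCI schema, and the helper names are invented for this sketch. Files are read into a dictionary rather than extracted to disk.

```python
import io
import json
import tarfile
from pathlib import Path


def write_layer(path: Path, files: dict[str, bytes]) -> None:
    """Write one layer as a .tar archive containing the given files."""
    with tarfile.open(path, "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def build_demo_image(image_dir: Path) -> None:
    """Create two layer archives and a simplified manifest listing them."""
    write_layer(image_dir / "layer0.tar",
                {"app/main.txt": b"v1", "app/readme.txt": b"docs"})
    write_layer(image_dir / "layer1.tar", {"app/main.txt": b"v2"})
    manifest = {"layers": ["layer0.tar", "layer1.tar"]}  # hypothetical schema
    (image_dir / "manifest.json").write_text(json.dumps(manifest))


def assemble(image_dir: Path) -> dict[str, bytes]:
    """Read the manifest, then apply each layer archive in the listed order."""
    manifest = json.loads((image_dir / "manifest.json").read_text())
    filesystem: dict[str, bytes] = {}
    for layer_name in manifest["layers"]:
        with tarfile.open(image_dir / layer_name, "r") as tar:
            for member in tar.getmembers():
                if member.isfile():
                    # Later layers overwrite files from earlier ones.
                    filesystem[member.name] = tar.extractfile(member).read()
    return filesystem
```

Calling `build_demo_image` on an empty directory and then `assemble` on the same directory returns both files, with `app/main.txt` holding the content from the second layer because it was applied last.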