Chapter 1

Introduction

History of K8S

2003-2004: Google introduced the Borg system, which started as a small project to manage its new search engine. It was later used heavily to manage internal distributed systems and jobs.

2013: Google followed Borg with Omega, a flexible and scalable scheduler for large clusters.

2014: Google introduced Kubernetes, and major players (IBM, Docker, Red Hat, Microsoft) joined the project.

2015: Kubernetes 1.0 was released, and Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation (CNCF).

2016: Kubernetes went mainstream. The Helm package manager and `minikube` were released, and Windows support was added to Kubernetes.

2017: Kubernetes reached v1.7 and was widely adopted by the industry. IBM and Google introduced the `Istio` service mesh.

2018: The industry recognized the power of Kubernetes, and the adoption rate increased.

2019: The journey continues...

Linux Kernel Architecture

At the top is the user, or application, space. This is where the user applications are executed. Below the user space is the kernel space.

There is also the GNU C Library (glibc). This provides the system call interface that connects to the kernel and provides the mechanism to transition between the user-space application and the kernel. This is important because the kernel and user application occupy different protected address spaces. And while each user-space process occupies its own virtual address space, the kernel occupies a single address space.

The Linux kernel can be further divided into three broad levels:

At the top is the system call interface, which implements the basic functions such as read and write.
Below the system call interface is the kernel code, which can be more accurately defined as the architecture-independent kernel code. This code is common to all of the processor architectures supported by Linux.
Below this is the architecture-dependent code, which forms what is more commonly called a BSP (Board Support Package). This code serves as the processor and platform-specific code for the given architecture.

The Linux kernel implements a number of important architectural attributes. At a high level, and at lower levels, the kernel is layered into a number of distinct subsystems.

Linux Namespaces

Namespaces are a feature of the Linux kernel that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set. The feature works by giving each set of processes the same namespace for these resources, with the names in each set referring to distinct resources. Examples of resource names that can exist in multiple spaces, so that the named resources are partitioned, are process IDs, hostnames, user IDs, file names, and some names associated with network access and interprocess communication.

Namespaces are a fundamental aspect of containers on Linux.

Namespace	Constant	Isolates
Cgroup	CLONE_NEWCGROUP	Cgroup root directory
IPC	CLONE_NEWIPC	System V IPC, POSIX message queues
Network	CLONE_NEWNET	Network devices, stacks, ports, etc.
Mount	CLONE_NEWNS	Mount points
PID	CLONE_NEWPID	Process IDs
User	CLONE_NEWUSER	User and group IDs
UTS	CLONE_NEWUTS	Hostname and NIS domain name

The kernel assigns each process a symbolic link per namespace kind in /proc/<pid>/ns/. The inode number pointed to by this symlink is the same for each process in this namespace. This uniquely identifies each namespace by the inode number pointed to by one of its symlinks.

Reading the symlink via readlink returns a string containing the namespace kind name and the inode number of the namespace.
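The symlinks described above can be inspected from any shell on a Linux host. The following is a minimal sketch, assuming `/proc` is mounted; the variable names `uts_link` and `child_link` are only illustrative. The printed value has the form `uts:[<inode number>]`, and the number differs from host to host.

```shell
#!/bin/sh
# Read the UTS namespace symlink of the current shell ($$ is the shell's PID).
uts_link=$(readlink "/proc/$$/ns/uts")
echo "UTS namespace of this shell: ${uts_link}"

# List every namespace kind the kernel exposes for this process.
ls -l "/proc/$$/ns/"

# /proc/self resolves to the readlink child process here. The child inherits
# the shell's namespaces, so both strings should carry the same inode number.
child_link=$(readlink /proc/self/ns/uts)
if [ "$uts_link" = "$child_link" ]; then
    echo "parent and child share the UTS namespace"
fi
```

Comparing these strings for two PIDs is a quick way to check whether two processes live in the same namespace.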

CGroups

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

Resource limiting

Groups can be set to not exceed a configured memory limit

Prioritization

Some groups may get a larger share of CPU utilization or disk I/O throughput

Accounting

Measures a group’s resource usage, which may be used, for example, for billing purposes

Control

Groups of processes can be frozen, checkpointed, and restarted
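To see which control groups a process belongs to, the kernel exposes the membership under `/proc`. The following is a minimal sketch, assuming a Linux host with cgroups enabled and the conventional `/sys/fs/cgroup` mount point; the variable name `cgroup_info` is only illustrative, and the output depends on whether the host uses cgroup v1 or v2.

```shell
#!/bin/sh
# Each line has the form hierarchy-ID:controller-list:cgroup-path.
# On a cgroup v2 (unified) host there is a single line starting with "0::".
cgroup_info=$(cat "/proc/$$/cgroup")
echo "$cgroup_info"

# The cgroup.controllers file exists only at the root of a v2 hierarchy.
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "cgroup v2 controllers: $(cat /sys/fs/cgroup/cgroup.controllers)"
else
    echo "no cgroup v2 hierarchy at /sys/fs/cgroup; listing its contents instead"
    ls /sys/fs/cgroup 2>/dev/null || true
fi
```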

You can read and explore more about cgroups in this post

Container from scratch

Using namespaces, we can start a process that is isolated from the other processes running on the system.
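A quick way to see this isolation, before building a full root filesystem, is to change the hostname inside a new UTS namespace and confirm that the host keeps its own. The following is a sketch under assumptions: it needs `unshare` from util-linux, a `hostname` command, and a kernel that allows unprivileged user namespaces (some distributions disable them). The name `demo-container` and the variable `inner` are hypothetical.

```shell
#!/bin/sh
inner=""
# Probe first: creating a user namespace can be forbidden on this host.
if unshare --user --map-root-user --uts true 2>/dev/null; then
    # Inside the new UTS namespace, set and print a different hostname.
    inner=$(unshare --user --map-root-user --uts \
        sh -c 'hostname demo-container; hostname')
    echo "hostname inside the namespace: ${inner}"
    echo "hostname on the host:          $(hostname)"
else
    echo "unprivileged user namespaces are not available on this host"
fi
```

When the probe succeeds, the first line shows the new name while the host's hostname is unchanged, because the change is confined to the new UTS namespace.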

Create root File System

Create directory to store rootfs contents

$ mkdir -p /root/busybox/rootfs
$ CONTAINER_ROOT=/root/busybox/rootfs
$ cd ${CONTAINER_ROOT}

Download busybox binary

$ wget https://busybox.net/downloads/binaries/1.28.1-defconfig-multiarch/busybox-x86_64

Create needed directories and symlinks

$ mv busybox-x86_64 busybox
$ chmod 755 busybox
$ mkdir bin
$ mkdir proc
$ mkdir sys
$ mkdir tmp
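$ # Link every applet name to the busybox binary; /busybox resolves once we chroot into the rootfs
$ for i in $(./busybox --list)
do
   ln -s /busybox "bin/$i"
done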

Start Container

Start a shell in a new container

$ unshare --mount --uts --ipc --net --pid --fork --user --map-root-user chroot ${CONTAINER_ROOT} /bin/sh

Mount essential kernel structures

$ mount -t proc none /proc
$ mount -t sysfs none /sys
$ mount -t tmpfs none /tmp

Configure networking

From the host system, create a veth pair and then move one end into the container

$ sudo ip link add vethlocal type veth peer name vethNS
$ sudo ip link set vethlocal up
$ sudo ip link set vethNS up
$ sudo ps -ef | grep '/bin/sh'
$ sudo ip link set vethNS netns <pid of /bin/sh>

From the container, execute `ip link`

What is Docker

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

In a way, Docker is a bit like a virtual machine. But unlike a virtual machine, rather than creating a whole virtual operating system, Docker allows applications to use the same Linux kernel as the system that they’re running on and only requires applications be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application.

Kubernetes

Pet vs Cattle.

In the pets service model, each pet server is given a loving name like zeus, ares, hades, poseidon, or athena. They are “unique, lovingly hand-raised, and cared for, and when they get sick, you nurse them back to health”. You scale these up by making them bigger, and when they are unavailable, everyone notices.

In the cattle service model, the servers are given identification numbers like web-01, web-02, web-03, web-04, and web-05, much the same way cattle are given numbers tagged to their ear. Each server is “almost identical to each other” and “when one gets sick, you replace it with another one”. You scale these by creating more of them, and when one is unavailable, no one notices.

Kubernetes is a portable, extensible open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

Google open-sourced the Kubernetes project in 2014. Kubernetes builds upon a decade and a half of experience that Google has with running production workloads at scale, combined with best-of-breed ideas and practices from the community.

Kubernetes Architecture

Container runtime

Docker, rkt, containerd, or any OCI-compliant runtime, which downloads images, configures networking, mounts volumes, and assists with container lifecycle management.

kubelet

Responsible for instructing the container runtime to start, stop, or modify a container.

kube-proxy

Manages service IPs and iptables rules.

kube-apiserver

The API server interacts with all other components in the cluster. All client interactions happen via the API server.

kube-scheduler

Responsible for scheduling workloads on worker nodes (formerly called minions) based on resource constraints.

kube-controller-manager

Responsible for running the controllers, each of which monitors cluster state in a reconciliation loop. We will discuss the different controllers later in this course.

etcd

Persistent store that holds all configuration and cluster state.

cloud-controller-manager

Cloud-vendor-specific controller. The cloud vendor is responsible for developing this program.
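The reconciliation loop mentioned under kube-controller-manager can be pictured with a toy script. This is not Kubernetes code, only a sketch of the control-loop idea: compare a desired state with an observed state and act until they match. The variables `desired`, `actual`, and `steps` and their values are hypothetical.

```shell
#!/bin/sh
# Toy reconciliation loop illustrating the idea, not a real controller.
desired=3   # hypothetical desired replica count
actual=1    # hypothetical observed replica count
steps=0

while [ "$actual" -ne "$desired" ]; do
    if [ "$actual" -lt "$desired" ]; then
        actual=$((actual + 1))
        echo "fewer replicas than desired: starting one (now ${actual})"
    else
        actual=$((actual - 1))
        echo "more replicas than desired: stopping one (now ${actual})"
    fi
    steps=$((steps + 1))
done
echo "converged on ${actual} replicas after ${steps} steps"
```

A real controller never exits. It keeps watching the API server and repeats the comparison whenever the observed state drifts from the desired state.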

Container Networking

We need to access the container from the outside world, and containers running on different hosts have to communicate with each other.

Here we will see how to do this with bridging.

Traditional networking

Create a veth pair on Host.

$ sudo ip link add veth0 type veth peer name veth1
$ sudo ip link show

Create a network namespace

$ sudo ip netns add bash-nw-namespace
$ sudo ip netns show

Connect one end to namespace

$ sudo ip link set veth1 netns bash-nw-namespace
$ sudo ip link list

Resulting network

Create a Bridge interface

$ sudo brctl addbr cbr0

Add an external interface to bridge

$ sudo brctl addif cbr0 enp0s9
$ sudo brctl show

Connect other end to a switch

$ sudo brctl addif cbr0 veth0
$ sudo ip link set veth0 up
$ sudo ip link set cbr0 up
$ sudo brctl show

Resulting network

Assign IP to interface

$ sudo ip netns exec bash-nw-namespace bash
$ sudo ip addr add 192.168.56.10/24 dev veth1
$ sudo ip link set lo up
$ sudo ip link set dev veth1 up
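The steps of this section can be collected into one script. The sketch below is not the original procedure verbatim: it swaps the `brctl` commands for their iproute2 equivalents (`ip link add ... type bridge`, `ip link set ... master`) and uses `ip netns exec` non-interactively. The names `DRY_RUN`, `run`, and `setup_bridge_network` are hypothetical helpers, and `enp0s9` is the external interface name from the text, which differs per host. By default the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN=${DRY_RUN:-1}
NS=bash-nw-namespace
BRIDGE=cbr0
UPLINK=enp0s9   # external interface from the text; adjust for your host

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_bridge_network() {
    run ip link add veth0 type veth peer name veth1
    run ip netns add "$NS"
    run ip link set veth1 netns "$NS"
    run ip link add name "$BRIDGE" type bridge    # like: brctl addbr
    run ip link set "$UPLINK" master "$BRIDGE"    # like: brctl addif
    run ip link set veth0 master "$BRIDGE"
    run ip link set "$BRIDGE" up
    run ip link set veth0 up
    # Configure the namespace end without opening an interactive shell.
    run ip netns exec "$NS" ip addr add 192.168.56.10/24 dev veth1
    run ip netns exec "$NS" ip link set lo up
    run ip netns exec "$NS" ip link set dev veth1 up
}

setup_bridge_network
```

Reviewing the printed plan first is a cheap safeguard, since attaching the external interface to a bridge can interrupt connectivity on that interface.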

Access container IP from outside

Besides bridging, we can opt for other networking solutions.

Later we will see how the Weave Net and Calico plugins work. You can read a bit more about Docker networking basics in the blog post below

Docker networking
