Chapter 1

Introduction

History of K8S

  • 2003-2004: Google introduced the Borg system, which started as a small project to manage the new search engine and later became heavily used for managing internal distributed systems and jobs
  • 2013: Google moved from Borg to Omega - a flexible and scalable scheduler for large clusters
  • 2014: Google introduced Kubernetes, and big players (IBM, Docker, RedHat, Microsoft) joined the project
  • 2015: Kubernetes 1.0 was released, and Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation (CNCF)
  • 2016: Kubernetes went mainstream, the Helm package manager was introduced, and `minikube` was released. Windows support was added to k8s
  • 2017: Kubernetes reached v1.7 and was widely adopted by the industry. IBM and Google introduced the `Istio` service mesh.
  • 2018: The industry recognized the power of k8s and the adoption rate increased
  • 2019: The journey continues...

    Linux Kernel Architecture


    At the top is the user, or application, space. This is where the user applications are executed. Below the user space is the kernel space.

    There is also the GNU C Library (glibc). This provides the system call interface that connects to the kernel and provides the mechanism to transition between the user-space application and the kernel. This is important because the kernel and user application occupy different protected address spaces. And while each user-space process occupies its own virtual address space, the kernel occupies a single address space.

    The Linux kernel can be further divided into three broad levels.

    • At the top is the system call interface, which implements the basic functions such as read and write.
    • Below the system call interface is the kernel code, which can be more accurately defined as the architecture-independent kernel code. This code is common to all of the processor architectures supported by Linux.
    • Below this is the architecture-dependent code, which forms what is more commonly called a BSP (Board Support Package). This code serves as the processor and platform-specific code for the given architecture.

    The Linux kernel implements a number of important architectural attributes. At a high level, and at lower levels, the kernel is layered into a number of distinct subsystems.

    Linux Namespaces

    Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set. The feature works by giving the same names to these resources in different sets of processes, with those names referring to distinct underlying resources. Examples of resources that can be partitioned this way are process IDs, hostnames, user IDs, file names, and names associated with network access and interprocess communication.

    Namespaces are a fundamental aspect of containers on Linux.

    Namespace   Constant          Isolates
    ---------   ---------------   ------------------------------------
    Cgroup      CLONE_NEWCGROUP   Cgroup root directory
    IPC         CLONE_NEWIPC      System V IPC, POSIX message queues
    Network     CLONE_NEWNET      Network devices, stacks, ports, etc.
    Mount       CLONE_NEWNS       Mount points
    PID         CLONE_NEWPID      Process IDs
    User        CLONE_NEWUSER     User and group IDs
    UTS         CLONE_NEWUTS      Hostname and NIS domain name

    The kernel exposes each process's namespaces as symbolic links in /proc/<pid>/ns/, one per namespace kind. The inode number behind a given symlink is the same for every process in that namespace, so the inode number uniquely identifies the namespace.

    Reading the symlink via readlink returns a string containing the namespace kind name and the inode number of the namespace.
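For example (a quick sketch using the current shell's own PID):

```shell
# List all namespace symlinks of the current shell ($$ expands to its PID).
ls -l /proc/$$/ns/

# readlink prints the namespace kind and inode number, e.g. pid:[4026531836];
# two processes share a namespace exactly when these inode numbers match.
readlink /proc/$$/ns/pid
readlink /proc/$$/ns/net
```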

    cgroups

    cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

    Resource limiting

    Groups can be set to not exceed a configured memory limit

    Prioritization

    Some groups may get a larger share of CPU utilization or disk I/O throughput

    Accounting

    Measures a group’s resource usage, which may be used, for example, for billing or monitoring

    Control

    Freezing groups of processes, their checkpointing and restarting

    You can read and explore more about cgroups in this post
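A read-only way to see cgroups in action is to ask which group the current process belongs to; setting limits needs root, so those steps are shown only as comments (the group name `demo` is hypothetical):

```shell
# Show which cgroup(s) the current process belongs to.
# On a cgroup v2 (unified) system this prints a single line like "0::/...".
cat /proc/self/cgroup

# With root, a memory limit could be set like this on cgroup v2:
#   mkdir /sys/fs/cgroup/demo
#   echo 100M > /sys/fs/cgroup/demo/memory.max
#   echo <pid> > /sys/fs/cgroup/demo/cgroup.procs
```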

    Container from scratch

    Using namespaces, we can start a process that is completely isolated from other processes running on the system.

    Create root File System

    Create directory to store rootfs contents

    $ mkdir -p /root/busybox/rootfs
    $ CONTAINER_ROOT=/root/busybox/rootfs
    $ cd ${CONTAINER_ROOT}

    Download busybox binary

    $ wget https://busybox.net/downloads/binaries/1.28.1-defconfig-multiarch/busybox-x86_64
    $ mv busybox-x86_64 busybox
    $ chmod 755 busybox
    $ mkdir bin
    $ mkdir proc
    $ mkdir sys
    $ mkdir tmp
    $ for i in $(./busybox --list)
    do
       ln -s /busybox bin/$i
    done
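The symlink loop works because busybox is a multi-call binary: it picks the applet to run from the name it was invoked under (argv[0]). A self-contained sketch of the same trick in a scratch directory (the applet names here are just examples):

```shell
# busybox dispatches on argv[0], so a symlink named "ls" pointing at the
# busybox binary behaves like ls. Sketch the layout in a scratch directory.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/bin"
for applet in sh ls cat; do
    # Inside the chroot these links resolve to /busybox at the rootfs top.
    ln -s /busybox "$ROOT/bin/$applet"
done
ls -l "$ROOT/bin"
```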

    Start Container

    Start a shell in the new container

    $ unshare --mount --uts --ipc --net --pid --fork --user --map-root-user chroot ${CONTAINER_ROOT} /bin/sh
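A quick way to see the PID isolation at work, without a rootfs: fork a shell into fresh user and PID namespaces and print its own PID (a sketch; unprivileged unshare may be disabled on hardened or sandboxed systems, so it falls back gracefully):

```shell
# Inside a new PID namespace, the first forked process sees itself as PID 1.
# User namespaces let an unprivileged user try this, although seccomp-filtered
# sandboxes may reject unprivileged unshare.
RESULT=$(unshare --user --map-root-user --pid --fork sh -c 'echo "PID inside: $$"' 2>/dev/null \
    || echo "unprivileged user namespaces unavailable")
echo "$RESULT"
```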

    Mount essential kernel structures

    $ mount -t proc none /proc
    $ mount -t sysfs none /sys
    $ mount -t tmpfs none /tmp
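Mounting proc matters because tools like ps and top read process information from /proc. On any Linux host you can see what such a mount entry looks like:

```shell
# /proc/mounts shows how proc-type filesystems appear; inside the container
# the freshly mounted /proc gains an entry like this.
grep -w proc /proc/mounts
```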

    Configure networking

    From the host system, create a veth pair and then move one end of it into the container's network namespace

    $ sudo ip link add vethlocal type veth peer name vethNS
    $ sudo ip link set vethlocal up
    $ sudo ip link set vethNS up
    $ sudo ps -ef | grep '/bin/sh'
    $ sudo ip link set vethNS netns <pid of /bin/sh>

    What is Docker

    Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

    In a way, Docker is a bit like a virtual machine. But unlike a virtual machine, rather than creating a whole virtual operating system, Docker allows applications to use the same Linux kernel as the system that they’re running on and only requires applications be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application.


    Kubernetes

    Pets vs. Cattle

    In the pets service model, each pet server is given a loving name like zeus, ares, hades, poseidon, and athena. They are “unique, lovingly hand-raised, and cared for, and when they get sick, you nurse them back to health”. You scale these up by making them bigger, and when they are unavailable, everyone notices.

    In the cattle service model, the servers are given identification numbers like web-01, web-02, web-03, web-04, and web-05, much the same way cattle are given numbers tagged to their ear. Each server is “almost identical to each other” and “when one gets sick, you replace it with another one”. You scale these by creating more of them, and when one is unavailable, no one notices.

    Kubernetes is a portable, extensible open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

    Google open-sourced the Kubernetes project in 2014. Kubernetes builds upon a decade and a half of experience that Google has with running production workloads at scale, combined with best-of-breed ideas and practices from the community.

    Read More here


    Kubernetes Architecture


    Container runtime

    Docker, rkt, containerd, or any OCI-compliant runtime, which downloads images, configures networking, mounts volumes, and assists with container lifecycle management.

    kubelet

    Responsible for instructing the container runtime to start, stop, or modify a container

    kube-proxy

    Manages service IPs and iptables rules

    kube-apiserver

    The API server interacts with all other components in the cluster. All client interactions happen via the API server.

    kube-scheduler

    Responsible for scheduling workloads on minions (worker nodes) based on resource constraints

    kube-controller-manager

    Responsible for monitoring different containers in a reconciliation loop. We will discuss the different controllers later in this course.

    etcd

    Persistent store for all cluster configuration and state

    cloud-controller-manager

    Cloud-vendor-specific controller; the cloud vendor is responsible for developing this program

    Container Networking

    We need to access containers from the outside world, and containers running on different hosts have to communicate with each other.

    Here we will see how we can do this with bridging.

    Traditional networking


    Create a veth pair on Host.

    $ sudo ip link add veth0 type veth peer name veth1
    $ sudo ip link show

    Create a network namespace

    $ sudo ip netns add bash-nw-namespace
    $ sudo ip netns show

    Connect one end to namespace

    $ sudo ip link set veth1 netns bash-nw-namespace
    $ sudo ip link list

    Resulting network


    Create a Bridge interface

    $ sudo brctl addbr cbr0

    Add an external interface to bridge

    $ sudo brctl addif cbr0 enp0s9
    $ sudo brctl show

    Connect other end to a switch

    $ sudo brctl addif cbr0 veth0
    $ sudo brctl show

    Resulting network


    Assign IP to interface

    $ sudo ip netns exec bash-nw-namespace bash
    $ sudo ip addr add 192.168.56.10/24 dev veth1
    $ sudo ip link set lo up
    $ sudo ip link set dev veth1 up

    Access container IP from outside
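With 192.168.56.10/24 assigned to veth1 in the namespace and the bridge attached to an external interface, the address becomes reachable from the host side. A guarded sketch (it only succeeds if the steps above were actually performed):

```shell
# Ping the address assigned to veth1 inside the namespace. Guarded so the
# sketch degrades gracefully where the bridge setup is absent.
if command -v ping >/dev/null 2>&1 \
   && ping -c 1 -W 1 192.168.56.10 >/dev/null 2>&1; then
    RESULT="container reachable"
else
    RESULT="container not reachable (expected unless the steps above were run)"
fi
echo "$RESULT"
```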

    Like bridging, we can opt for other networking solutions.

    Later we will see how the Weave Network and Calico plugins work. You may read a bit more about Docker networking basics in the blog post below

    Docker networking