Kubernetes has become the de facto platform for orchestrating (deploying, scaling and managing) containerized applications. In the world of microservices, it has become a common choice for enterprises. In this article, we’ll look at the different Kubernetes components and how they work with each other. Along the way, we’ll get familiar with the high-level architecture of Kubernetes.
We’ll take a top-down approach to understand the various components of Kubernetes. To begin with, let’s understand what a cluster is in the Kubernetes world.
What is a Kubernetes cluster?
In very simple terms, when we successfully deploy Kubernetes, we get a cluster. In other words, when we run Kubernetes, we are running a cluster. A Kubernetes cluster consists of one or more machines (physical or virtual).
The following section covers the Kubernetes components and the high-level architecture.
A Kubernetes cluster consists of two main sets of components: the control plane and the worker nodes.
Control Plane (also known as Master)
The control plane coordinates and manages the worker nodes and the Pods in the cluster. It ensures that the worker nodes are up and running the objects they are responsible for. The control plane also maintains a record of all objects in the Kubernetes cluster, monitors their state and responds to cluster events. At any point, Kubernetes works with two states: the ideal state (the state that we want the k8s cluster to be in, which the Kubernetes documentation calls the desired state) and the actual state (the state that the k8s cluster is actually in at that point of time). The control plane continuously compares the actual state with the ideal state; whenever they differ, it works to bring the actual state back in line with the ideal state.
Please note that both the ideal state and the actual state can change. The ideal state changes because we, as developers or administrators, want to change it; for example, we deploy a new pod, or we scale a particular pod up or down. The actual state of a k8s cluster changes in unforeseen or unexpected scenarios; for example, a worker node goes down or a pod crashes.
It is also important to note that control plane components can run on any machine in the cluster. However, for the sake of better management, isolation of responsibilities and simplicity, setup scripts typically start all control plane components on the same machine, and this machine does not run application workloads/pods. These components can therefore run on a single master node. To achieve high availability, we can replicate them across multiple master nodes.
What are the components of the control plane/master?
The API server is the front end of the Kubernetes control plane, and kube-apiserver is its implementation. It exposes the APIs used to create, configure, query and modify the state of the cluster and its objects. For each request, it authenticates and authorizes the client for the requested operation, validates the request and then serves it. Users and command line interfaces (CLIs) talk to the API server, and through it work with pods, services, nodes etc. We can access these APIs via the kubectl CLI or any REST client. The API server scales horizontally, i.e. if needed, we can run several instances and balance traffic between them.
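To make the REST access mentioned above a little more concrete, here is a minimal sketch of how resource URLs are laid out on the API server: core resources such as pods and nodes sit under /api/v1, while resources in named API groups (for example apps) sit under /apis/<group>/<version>. The helper name resource_path and its parameters are made up for this illustration and are not part of any Kubernetes client library.

```python
def resource_path(resource, namespace=None, name=None, group="", version="v1"):
    """Build the REST path for a Kubernetes resource (illustrative helper only)."""
    # Core-group resources live under /api/<version>;
    # resources in a named group live under /apis/<group>/<version>.
    prefix = f"/apis/{group}/{version}" if group else f"/api/{version}"
    parts = [prefix]
    if namespace:
        parts.append(f"namespaces/{namespace}")
    parts.append(resource)
    if name:
        parts.append(name)
    return "/".join(parts)


pods_path = resource_path("pods", namespace="default")
deployment_path = resource_path("deployments", namespace="default", name="web", group="apps")
nodes_path = resource_path("nodes")  # nodes are cluster-scoped, so no namespace

print(pods_path)        # /api/v1/namespaces/default/pods
print(deployment_path)  # /apis/apps/v1/namespaces/default/deployments/web
print(nodes_path)       # /api/v1/nodes
```

A REST client would send a GET request to one of these paths on the API server's address, with credentials that the server can authenticate and authorize.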
etcd is an open source, strongly consistent, fault-tolerant, distributed key-value store. Kubernetes stores its configuration data and all information about the state of the cluster and its objects in the etcd database.
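The idea of keeping cluster state in a key-value store can be sketched with a small in-memory stand-in. This is not etcd and is not distributed; the class ToyKeyValueStore, its methods, and the exact key layout under /registry are assumptions made for illustration only.

```python
import json


class ToyKeyValueStore:
    """In-memory stand-in for a key-value store; not etcd, not distributed."""

    def __init__(self):
        self._data = {}
        self.revision = 0  # bumped on every write

    def put(self, key, value):
        """Store a JSON-serializable value under key and return the new revision."""
        self.revision += 1
        self._data[key] = json.dumps(value)
        return self.revision

    def get(self, key):
        """Return the decoded value stored under key."""
        return json.loads(self._data[key])

    def list_prefix(self, prefix):
        """Return all keys that start with prefix, sorted."""
        return sorted(k for k in self._data if k.startswith(prefix))


store = ToyKeyValueStore()
# Hypothetical keys: one entry per object, grouped by resource type and namespace.
store.put("/registry/pods/default/web-1", {"phase": "Running"})
store.put("/registry/pods/default/web-2", {"phase": "Pending"})
store.put("/registry/pods/kube-system/dns-1", {"phase": "Running"})

default_pod_keys = store.list_prefix("/registry/pods/default/")
print(default_pod_keys)
print(store.get("/registry/pods/default/web-2"))
print(store.revision)
```

Grouping keys by prefix is what makes it cheap to answer questions such as "list all pods in the default namespace".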
kube-scheduler is the control plane process responsible for assigning pods (and therefore their containers) to worker nodes in the Kubernetes cluster. It watches for new pods that have not yet been assigned to a node. It checks the resource needs (for example CPU, memory etc.) of each new/unscheduled pod and its containers, and filters the existing nodes based on these requirements. Nodes that meet the scheduling requirements for a Pod are called feasible nodes.
- It is possible that none of the nodes is suitable for scheduling a pod. In this case, the pod remains unscheduled until the scheduler finds an appropriate node and is able to place the pod on it.
- In the successful scenario, the scheduler finds one or more feasible worker nodes for a pod. It then runs a set of functions to score these feasible nodes, picks the node with the highest score and assigns the pod to it. Finally, kube-scheduler notifies the API server about this decision in a process called binding.
kube-scheduler considers many factors for these decisions. They include (but are not limited to) resource requirements, policies, and affinity specifications regarding geolocation, workloads etc. An example of affinity/anti-affinity is to co-locate, or not to co-locate, workloads on the same node. A real-life example of affinity is to co-locate web servers with the cache as much as possible. A real-life example of anti-affinity is to ensure that no two cache instances are placed on the same host.
Whenever the scheduler assigns a pod to a node through the API server, the kubelet for that node (explained later in this article) reads the pod spec and spins up the containers to satisfy that spec with the help of the container runtime.
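The filter-then-score flow described above can be sketched as follows. This is a toy model and not kube-scheduler's actual algorithm: the Node and PodRequest classes, the resource fields, and the scoring rule (prefer the node with the most CPU left over) are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_millis: int
    free_memory_mib: int


@dataclass
class PodRequest:
    name: str
    cpu_millis: int
    memory_mib: int


def feasible_nodes(pod, nodes):
    """Filtering step: keep only nodes with enough free CPU and memory."""
    return [
        n for n in nodes
        if n.free_cpu_millis >= pod.cpu_millis and n.free_memory_mib >= pod.memory_mib
    ]


def score(pod, node):
    """Scoring step (made-up rule): CPU that would remain free after placement."""
    return node.free_cpu_millis - pod.cpu_millis


def schedule(pod, nodes):
    """Return the name of the chosen node, or None if the pod stays unscheduled."""
    candidates = feasible_nodes(pod, nodes)
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(pod, n)).name


nodes = [
    Node("node-a", free_cpu_millis=500, free_memory_mib=1024),
    Node("node-b", free_cpu_millis=2000, free_memory_mib=4096),
    Node("node-c", free_cpu_millis=4000, free_memory_mib=512),
    Node("node-d", free_cpu_millis=3000, free_memory_mib=2048),
]

web = PodRequest("web", cpu_millis=1000, memory_mib=1024)
huge = PodRequest("huge", cpu_millis=8000, memory_mib=1024)

print(schedule(web, nodes))   # node-d (node-b is also feasible but scores lower)
print(schedule(huge, nodes))  # None (no feasible node, so the pod stays unscheduled)
```

Affinity and anti-affinity rules would fit into the same shape, either as extra filters or as extra terms in the score.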
In order to understand kube-controller-manager, let’s first look at what a control loop is. Wikipedia defines a control loop as follows:
"A control loop is the fundamental building block of industrial control systems. It consists of all the physical components and control functions necessary to automatically adjust the value of a measured process variable (PV) to equal the value of a desired set-point (SP)."
Source: https://en.wikipedia.org/wiki/Control_loop
In Kubernetes, we have a set of controllers that work like control loops. Each controller process typically takes responsibility for one resource type and watches over its actual state in an infinite loop. Whenever it finds that the actual state differs from the ideal state, it makes changes to move the actual state towards the ideal state. Examples of controllers available in Kubernetes are the node controller, endpoints controller, replication controller, and service account & token controllers. These controllers communicate with kube-apiserver in order to watch over their resources and to make the required changes (create, update, delete) to match the actual state with the ideal state.
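As a rough illustration of such a control loop, the sketch below keeps a list of running pod names in line with a wanted replica count. It is a simplification under stated assumptions: the function names are hypothetical, the "cluster" is just a Python list, and a real controller keeps running and reacts to changes reported by kube-apiserver instead of stopping once the states match.

```python
import itertools

_pod_ids = itertools.count(1)  # source of unique pod names for the sketch


def reconcile_once(ideal_replicas, running_pods):
    """Make one correction towards the ideal state; return True if something changed."""
    if len(running_pods) < ideal_replicas:
        running_pods.append(f"pod-{next(_pod_ids)}")  # too few pods: create one
        return True
    if len(running_pods) > ideal_replicas:
        running_pods.pop()  # too many pods: delete one
        return True
    return False  # actual state already matches the ideal state


def run_control_loop(ideal_replicas, running_pods, max_iterations=100):
    """Repeat reconcile_once until the states match (bounded for the example)."""
    for _ in range(max_iterations):
        if not reconcile_once(ideal_replicas, running_pods):
            break
    return running_pods


pods = run_control_loop(3, [])
print(pods)  # ['pod-1', 'pod-2', 'pod-3']

pods.remove("pod-2")       # actual state changes unexpectedly: a pod crashes
run_control_loop(3, pods)
print(pods)  # ['pod-1', 'pod-3', 'pod-4']

run_control_loop(1, pods)  # ideal state changes: we scale down to one replica
print(pods)  # ['pod-1']
```

Note that the loop never needs to know why the two states differ; it only looks at the difference and corrects it.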
kube-controller-manager is the control plane component that manages these controllers.
"Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process."
Source: https://kubernetes.io/docs/concepts/overview/components/#kube-controller-manager
The cloud-controller-manager allows us to link our cluster with the cloud infrastructure using the specific cloud provider’s API. It runs various controllers specific to the cloud provider. It is structured using a plugin mechanism that allows different cloud providers to integrate their platforms with Kubernetes. Because the cloud-controller-manager is concerned with integrating the cluster with its cloud infrastructure, a cluster that runs on-premises, on a local PC (learning environment) or as a local test cluster does not run this component.
While the kube-controller-manager contains control loops that concern themselves with core k8s resources, the cloud-controller-manager reconciles the actual state in the cloud provider’s infrastructure with the ideal state, using the cloud provider’s API to do so.
For example, if we would like to expose a workload to accept requests from outside the cluster, we can set up a load balancer. The service controller (a type of cloud controller) interacts with the cloud provider’s APIs to provision a load balancer in the cloud provider’s infrastructure and configure it to route traffic to the pods. Other cloud controllers include the node controller (responsible for creating Node objects when new servers are created in the cloud infrastructure) and the route controller (responsible for configuring routes in the cloud so that containers on different nodes in the Kubernetes cluster can communicate with each other).
A Kubernetes cluster must have at least one node (also known as a worker node or compute machine); a production-grade cluster typically has several. Pods are hosted, run and scaled on these worker nodes. When we need to scale up the cluster capacity, we add more worker nodes to the cluster. A node contains the services necessary to run pods.
"A Pod (as in a pod of whales or pea pod) is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers."
Source: https://kubernetes.io/docs/concepts/workloads/pods/
What are the components of a Kubernetes worker node?
A node (also called a worker node) is where Kubernetes schedules and runs the pods. There must be at least one worker node in a k8s cluster, although a cluster typically contains more, depending on the capacity requirement. Planning multiple nodes in a k8s cluster helps in sharing the application workload and in achieving high availability. The following are the components of a node:
An instance of kubelet runs on every worker node in the cluster and is the primary node agent. The kubelet works in terms of a PodSpec (a YAML or JSON spec describing a pod). It keeps watching the API server for Pods that kube-scheduler assigns to its node. Whenever kube-scheduler assigns a pod to a node, the kubelet for that node reads the PodSpec and instructs the container runtime to spin up containers that comply with that spec.
It is important to note that even though kubelet primarily works with PodSpecs from the API server, pod manifests can also be provided in other ways: (i) through the local file system, (ii) through HTTP requests that the kubelet makes to a remote endpoint, or (iii) through HTTP connections that the kubelet listens on.
Another interesting point to note here is that kubelet typically does not run as a container. The kubelet, along with the container runtime, is installed and run directly on the worker nodes.
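The kubelet's view of the cluster can be sketched as "only the pods bound to my node". In the sketch below, pods are plain dictionaries shaped like a trimmed-down Pod object, where spec.nodeName is the field that records the node a pod has been assigned to. The functions pods_for_node and sync_node, the node names and the image names are hypothetical and only stand in for the kubelet's behaviour.

```python
def pods_for_node(all_pods, node_name):
    """Return the pods whose spec.nodeName matches the given node."""
    return [p for p in all_pods if p["spec"].get("nodeName") == node_name]


def sync_node(all_pods, node_name, running):
    """Start containers for newly assigned pods.

    `running` maps pod name -> list of started container names and is
    updated in place. Returns the names of the pods started in this pass.
    """
    started = []
    for pod in pods_for_node(all_pods, node_name):
        pod_name = pod["metadata"]["name"]
        if pod_name not in running:
            # A real kubelet would ask the container runtime to do this.
            running[pod_name] = [c["name"] for c in pod["spec"]["containers"]]
            started.append(pod_name)
    return started


cluster_pods = [
    {"metadata": {"name": "web"},
     "spec": {"nodeName": "node-1", "containers": [{"name": "app", "image": "example/app:1.0"}]}},
    {"metadata": {"name": "cache"},
     "spec": {"nodeName": "node-2", "containers": [{"name": "redis", "image": "example/cache:1.0"}]}},
    {"metadata": {"name": "pending"},  # not scheduled yet, so no nodeName
     "spec": {"containers": [{"name": "job", "image": "example/job:1.0"}]}},
]

running_on_node_1 = {}
first_pass = sync_node(cluster_pods, "node-1", running_on_node_1)
second_pass = sync_node(cluster_pods, "node-1", running_on_node_1)

print(first_pass)         # ['web']
print(second_pass)        # [] (nothing new to start)
print(running_on_node_1)  # {'web': ['app']}
```

The unscheduled pod is ignored by every node until the scheduler binds it to one.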
kube-proxy is a network proxy that typically runs as a container in a Pod on each worker node in the Kubernetes cluster. The kube-proxy pod lives in the kube-system namespace. Typically, a client application connects to a Pod through a Kubernetes Service resource. A Service provides a stable virtual IP (VIP) address that in turn routes to the appropriate backend Pods. This way, the client doesn’t need to worry about Pod IPs, which are allocated dynamically and can change due to scaling up/down etc.
kube-proxy manages the rules for forwarding traffic from the virtual IP address of a k8s Service object to the appropriate backend pod. It keeps watching kube-apiserver for Service and endpoint resources (the endpoints controller in the control plane manages the endpoint resources). Whenever there is a change in these resources, kube-proxy updates the rules in iptables.
Please note that while iptables is the most common packet routing method, kube-proxy also supports other methods such as user space (outdated) and IPVS (IP Virtual Server).
When a client sends traffic to a k8s Service, the Linux kernel routes it to an appropriate Pod according to the iptables rules set by kube-proxy. It’s worth noting that when load balancing across more than one pod in iptables mode, the backend Pod is selected at random rather than through a configurable load-balancing algorithm. To use a specific algorithm (for example round robin or least connections), we need to use IPVS mode.
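The random selection in iptables mode can be illustrated with a little arithmetic. The sketch assumes the commonly seen rule layout in which rules are evaluated in order and, for n backends, the first rule matches with probability 1/n, the next with 1/(n-1), and so on until the last rule always matches. The function names are hypothetical, and the code only models the probabilities; it does not touch iptables.

```python
import random
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n backends: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(n):
    """Overall chance that each backend is picked when rules are tried in order."""
    result = []
    not_matched_yet = Fraction(1)
    for p in rule_probabilities(n):
        result.append(not_matched_yet * p)
        not_matched_yet *= 1 - p
    return result


def pick_backend(backends, rng):
    """Walk the rules in order and return the first backend whose rule matches."""
    for backend, p in zip(backends, rule_probabilities(len(backends))):
        if rng.random() < p:
            return backend
    return backends[-1]  # not reached in practice: the last rule has probability 1


print([str(p) for p in rule_probabilities(3)])       # ['1/3', '1/2', '1']
print([str(p) for p in effective_probabilities(3)])  # ['1/3', '1/3', '1/3']

rng = random.Random(0)
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
picks = [pick_backend(backends, rng) for _ in range(3000)]
print({b: picks.count(b) for b in backends})  # roughly 1000 each
```

Although the per-rule probabilities differ, every backend ends up with the same overall chance of being chosen. The choice is uniform, but there is no notion of connection counts or ordering as there would be with an algorithm such as least connections.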
The container runtime is another important component, and it is installed on both the worker and master nodes.
One of the primary goals of the otherwise extremely complex k8s platform is to run containers, and the container runtime is the software component responsible for running them. Kubernetes has published the CRI (Container Runtime Interface) specification, which is implemented by different container runtime providers. This way, the Kubernetes project remains container runtime neutral and does not have to maintain/support every container runtime on the market. Instead, each container runtime provider ensures compliance with the CRI specification, which makes its runtime compatible with Kubernetes. This also gives us the flexibility to choose the container runtime we prefer. Popular container runtimes include CRI-O and containerd.
As noted earlier, whenever kube-scheduler assigns a pod to a node, the kubelet for that node reads the PodSpec and instructs the container runtime, via the Container Runtime Interface (CRI), to start containers that comply with that spec. The container runtime pulls the images if they are not already present on the node and then starts the containers.
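The benefit of programming against an interface instead of a specific runtime can be sketched as below. This is a heavily simplified, hypothetical stand-in: the real CRI is a gRPC API with many more operations, and the class and method names here (ContainerRuntime, FakeRuntime, run_pod and so on) are invented for the example.

```python
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Hypothetical, simplified runtime interface (not the real CRI)."""

    @abstractmethod
    def image_present(self, image): ...

    @abstractmethod
    def pull_image(self, image): ...

    @abstractmethod
    def start_container(self, pod_name, container_name, image): ...


class FakeRuntime(ContainerRuntime):
    """Records what it was asked to do instead of running real containers."""

    def __init__(self, cached_images=()):
        self.images = set(cached_images)
        self.log = []

    def image_present(self, image):
        return image in self.images

    def pull_image(self, image):
        self.images.add(image)
        self.log.append(("pull", image))

    def start_container(self, pod_name, container_name, image):
        self.log.append(("start", f"{pod_name}/{container_name}"))


def run_pod(runtime, pod):
    """What the node agent does with a pod spec: pull missing images, then start."""
    pod_name = pod["metadata"]["name"]
    for container in pod["spec"]["containers"]:
        if not runtime.image_present(container["image"]):
            runtime.pull_image(container["image"])
        runtime.start_container(pod_name, container["name"], container["image"])


pod = {
    "metadata": {"name": "web"},
    "spec": {"containers": [
        {"name": "app", "image": "example/app:1.0"},
        {"name": "sidecar", "image": "example/logger:1.0"},
    ]},
}

runtime = FakeRuntime(cached_images={"example/logger:1.0"})
run_pod(runtime, pod)
print(runtime.log)
# [('pull', 'example/app:1.0'), ('start', 'web/app'), ('start', 'web/sidecar')]
```

Because run_pod depends only on the interface, any other class that implements the same three methods could be swapped in without changing it, which is the same reason Kubernetes can work with different container runtimes.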
Pods are the smallest deployable units of computing that one can create and manage in Kubernetes. A pod generally contains one container; however, it’s possible to have more than one container in a single pod. The container(s) inside the pod share storage and network resources. Pods are ephemeral in nature; however, they can be configured to run stateful applications as well.
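To show what "more than one container sharing storage" looks like in a pod spec, here is a small sketch that builds a two-container Pod manifest as a Python dictionary and prints it as JSON. The pod, container and volume names and the images are made up for the example, and the helper undeclared_volume_mounts is a hypothetical sanity check, not a Kubernetes API.

```python
import json

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-with-sidecar"},
    "spec": {
        # An emptyDir volume lives as long as the pod and is visible to
        # every container that mounts it.
        "volumes": [{"name": "shared-logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "web",
                "image": "nginx:1.25",
                "volumeMounts": [{"name": "shared-logs", "mountPath": "/var/log/nginx"}],
            },
            {
                "name": "log-reader",
                "image": "busybox:1.36",
                "command": ["sh", "-c", "tail -F /logs/access.log"],
                "volumeMounts": [{"name": "shared-logs", "mountPath": "/logs"}],
            },
        ],
    },
}


def undeclared_volume_mounts(manifest):
    """Return (container, volume) pairs that mount a volume the pod does not declare."""
    declared = {v["name"] for v in manifest["spec"].get("volumes", [])}
    return [
        (c["name"], m["name"])
        for c in manifest["spec"]["containers"]
        for m in c.get("volumeMounts", [])
        if m["name"] not in declared
    ]


manifest_json = json.dumps(pod_manifest, indent=2)
print(manifest_json)
print(undeclared_volume_mounts(pod_manifest))  # []
```

Both containers mount the same volume at different paths, so files written by one are visible to the other. Since they also share the pod's network resources, they can reach each other on localhost.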
When I wanted to learn about Kubernetes components and its high-level architecture, I went through multiple tutorials and blogs to get a basic idea of the components and how they all work together. I have documented what I learned in this article so that I can refer back to it, and I hope it helps readers as well.