When we talk about exposing a Kubernetes application to the external world, there are two main ways to do so: the service load balancer or ingress.
We are going to focus on the service load balancer here. Kubernetes does not provide a load balancer component directly, so it is up to users to integrate one into their respective K8s deployments. There are multiple ways to achieve this.
The first way is to host the LB outside the k8s cluster, which means the LB lifecycle is not managed by Kubernetes directly. Integration of the Kubernetes cluster with the provider's infrastructure is accomplished through the CCM or the load balancer spec APIs. In this approach, there is a clear demarcation between external networking and internal cluster networking, which allows better control over ingress traffic and provides an additional layer of security. This is also the de facto way for public-cloud providers to deploy load balancers. As a matter of fact, this is the very reason why this service type is called an external load balancer in Kubernetes.
The second way is to offer the load balancer as a service inside the Kubernetes cluster itself. This approach eases LB life-cycle management but makes it a little more challenging to manage external and internal networking together. On-prem users who wish to have all services and features packaged inside the k8s cluster, or who want to deploy a relatively small cluster, prefer this way. Cost is another factor, as managing external LB boxes/software might incur additional costs.
Readers of our previous blog series will be well aware of how LoxiLB is deployed outside the cluster to manage LB services, but we have come across many requests from users who would love to run LoxiLB inside the cluster, be it for ease of management, limited resources or deployment architecture.
As the famous Tony Robbins quote goes, "The challenge of resourcefulness lies in turning limitations into opportunities." So here we are with a blog about running the LoxiLB load balancer in in-cluster mode.
With in-cluster mode support, LoxiLB joins the ranks of a select few load balancers that can support either mode seamlessly. For starters, LoxiLB is a completely new take on proxy-less load balancing (using eBPF) which replaces traditional frameworks like ipvs/iptables. Unlike other load balancers, which usually simply rebadge these frameworks, loxilb is built from the ground up and performs much better than traditional/legacy load balancers.
This blog will explain how one can set up a 4-node K3s cluster with the flannel CNI, run LoxiLB (loxilb-lb) as a DaemonSet and kube-loxilb as a Deployment. For this blog, we also deploy LoxiLB in a special peering mode (loxilb-peer) as a DaemonSet, which runs on the worker nodes and connects with the LoxiLB (loxilb-lb) instances to exchange connectivity info. Popular CNIs such as Calico provide their own BGP implementations (though not enabled by default), while some CNIs don't provide one at all. Hence, loxilb-peer is an optional component for cases where such options are not available or where users want an optimized BGP implementation. It can even run side by side with other BGP implementations if need be.
Design considerations for in-cluster LB
In order to provide in-cluster service load balancing, we deploy three LoxiLB components. The first is loxilb-lb, running in the master node(s), which takes care of the actual service load balancing. The reason for running it in the master node(s) is to stay in line with the master and worker nodes' roles: master nodes are usually meant to run control-plane applications, while worker nodes run application workloads. Also, master nodes are recommended to be deployed in multiples to ensure high availability. loxilb-lb runs in all master nodes and hence ensures high availability for the service load balancer function. Having said that, one should have absolutely no problem running it on any node; this can easily be achieved by tinkering with labels and pod affinity, as sketched below.
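For example, a hypothetical variation (not used in this blog) could label the chosen nodes and replace the master-node affinity of the loxilb-lb DaemonSet shown later with a plain nodeSelector:
# Hypothetical: first label the nodes where loxilb-lb should run, e.g.
#   kubectl label node worker1 loxilb-lb-node=yes
# then, in the loxilb-lb DaemonSet pod spec, use the following instead of the nodeAffinity block:
      nodeSelector:
        loxilb-lb-node: "yes"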
The second component is loxilb-peer, which runs on the worker nodes. This component is non-intrusive and, together with LoxiLB, creates a BGP mesh to ensure service IP and endpoint reachability to/from the LoxiLB instances.
Last but not least is kube-loxilb, which provides the Kubernetes service load-balancer spec interface and implementation. It is now additionally responsible for auto-configuring/managing the BGP mesh as well as arbitrating role selection for the different loxilb-lb pods.
Finally, readers might wonder how loxilb-lb pods get their hands on ingress packets in the presence of iptables/CNI rules etc. loxilb-lb uses eBPF to intercept packets much earlier than Linux kernel processing and is thereby able to act on ingress packets as per the configured LB rules. Also, loxilb-lb is able to work on system interfaces directly (please check the yaml file definitions) and hence does not need multiple interface assignments using multus etc.
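As an optional sanity check (our own suggestion, not a step from the original setup), once loxilb-lb is up you can confirm that its eBPF programs are attached on a node, assuming the bpftool package is installed there:
$ sudo bpftool net show
# The tc section should list loxilb's eBPF programs on the interfaces it manages.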
Bring up the Kubernetes Cluster
We will use the Vagrant tool to quickly spin up a complete test topology in less than 5 minutes. The following Vagrantfile is used to set up the K3s cluster:
# -*- mode: ruby -*-
# vi: set ft=ruby :

workers = (ENV['WORKERS'] || "2").to_i
#box_name = (ENV['VAGRANT_BOX'] || "ubuntu/focal64")
box_name = (ENV['VAGRANT_BOX'] || "sysnet4admin/Ubuntu-k8s")
box_version = "0.7.1"

Vagrant.configure("2") do |config|
  config.vm.box = "#{box_name}"
  config.vm.box_version = "#{box_version}"

  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
  end

  config.vm.define "master1" do |master|
    master.vm.hostname = 'master1'
    master.vm.network :private_network, ip: "192.168.80.10", :netmask => "255.255.255.0"
    master.vm.network :private_network, ip: "192.168.90.10", :netmask => "255.255.255.0"
    master.vm.provision :shell, :path => "master1.sh"
    master.vm.provider :virtualbox do |vbox|
      vbox.customize ["modifyvm", :id, "--memory", 8192]
      vbox.customize ["modifyvm", :id, "--cpus", 4]
    end
  end

  config.vm.define "master2" do |master|
    master.vm.hostname = 'master2'
    master.vm.network :private_network, ip: "192.168.80.11", :netmask => "255.255.255.0"
    master.vm.network :private_network, ip: "192.168.90.11", :netmask => "255.255.255.0"
    master.vm.provision :shell, :path => "master2.sh"
    master.vm.provider :virtualbox do |vbox|
      vbox.customize ["modifyvm", :id, "--memory", 8192]
      vbox.customize ["modifyvm", :id, "--cpus", 4]
    end
  end

  (1..workers).each do |node_number|
    config.vm.define "worker#{node_number}" do |worker|
      worker.vm.hostname = "worker#{node_number}"
      ip = node_number + 100
      worker.vm.network :private_network, ip: "192.168.80.#{ip}", :netmask => "255.255.255.0"
      worker.vm.provision :shell, :path => "worker.sh"
      worker.vm.provider :virtualbox do |vbox|
        vbox.customize ["modifyvm", :id, "--memory", 4096]
        vbox.customize ["modifyvm", :id, "--cpus", 2]
      end
    end
  end
end
The scripts master1.sh, master2.sh and worker.sh can be found here. The cluster can then be set up with a single vagrant command:
$ vagrant up
...
...
...
worker2: [INFO] systemd: Starting k3s-agent
worker2: Cluster is ready
Deploy kube-loxilb
In this blog, we will connect with an external client over BGP. We need to specify the external service IP pool and the BGP AS number in kube-loxilb.yml. Add the config below under the container args in this yaml file:
        args:
          - --cidrPools=defaultPool=123.123.123.1/24
          - --setBGP=64512
          - --setRoles
Here, --cidrPools defines the pool from which external service IPs are allocated, --setBGP sets the local BGP AS number used by the loxilb instances, and --setRoles lets kube-loxilb arbitrate role selection among the loxilb-lb pods, as mentioned earlier. Now apply the modified yaml file:
vagrant@master1:~$ sudo kubectl apply -f /vagrant/kube-loxilb.yml
serviceaccount/kube-loxilb created
clusterrole.rbac.authorization.k8s.io/kube-loxilb created
clusterrolebinding.rbac.authorization.k8s.io/kube-loxilb created
deployment.apps/kube-loxilb created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-957fdf8bc-vmndm 1/1 Running 0 10m
kube-system coredns-77ccd57875-2md2m 1/1 Running 0 10m
kube-system metrics-server-648b5df564-44wnc 1/1 Running 0 10m
kube-system loxilb-lb-7v8qm 1/1 Running 0 4m2s
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 28s
Get LoxiLB Up and Running
Once the cluster is all set, it is time to run LoxiLB as a DaemonSet. Below is the yaml file we used for this blog:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: loxilb-lb
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: loxilb-app
  template:
    metadata:
      name: loxilb-lb
      labels:
        app: loxilb-app
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: Exists
      - key: "node-role.kubernetes.io/control-plane"
        operator: Exists
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "node-role.kubernetes.io/master"
                operator: Exists
              - key: "node-role.kubernetes.io/control-plane"
                operator: Exists
      containers:
      - name: loxilb-app
        image: "ghcr.io/loxilb-io/loxilb:latest"
        imagePullPolicy: Always
        command: [ "/root/loxilb-io/loxilb/loxilb", "--bgp", "--egr-hooks", "--blacklist=cni[0-9a-z]|veth.|flannel." ]
        ports:
        - containerPort: 11111
        - containerPort: 179
        securityContext:
          privileged: true
          capabilities:
            add:
            - SYS_ADMIN
---
apiVersion: v1
kind: Service
metadata:
  name: loxilb-lb-service
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    app: loxilb-app
  ports:
  - name: loxilb-app
    port: 11111
    targetPort: 11111
    protocol: TCP
  - name: loxilb-app-bgp
    port: 179
    targetPort: 179
    protocol: TCP
If we just look at this line in the yaml file:
command: [ "/root/loxilb-io/loxilb/loxilb", "--bgp", "--egr-hooks", "--blacklist=cni[0-9a-z]|veth.|flannel." ]
Argument "--bgp" indicates that loxilb will be running with bgp instance and will be advertising the service IP to the external peer or loxilb-peer.
Argument "--egr-hooks" is required for those cases in which workloads can be scheduled in the master nodes. No need to mention this argument when you are managing the workload scheduling to worker nodes.
Argument "--blacklist=cni[0-9a-z]|veth.|flannel." is mandatory for running in in-cluster mode. As loxilb attaches it's ebpf programs on all the interfaces but since we running it in the default namespace then all the interfaces including CNI interfaces will be exposed and loxilb will attach it's ebpf program in those interfaces which is definitely not desired. So, user needs to mention a regex for excluding all those interfaces. The regex in the given example will exclude the flannel interfaces. "--blacklist=cali.|tunl.|vxlan[.]calico|veth.|cni[0-9a-z]" regex must be used with calico CNI.
Apply the loxilb.yaml file on the master node to create the loxilb-lb DaemonSet and "loxilb-lb-service":
$ vagrant ssh master1
Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-52-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
* Strictly confined Kubernetes makes edge and IoT secure. Learn how MicroK8s
just raised the bar for easy, resilient and secure K8s cluster deployment.
https://ubuntu.com/engage/secure-kubernetes-at-the-edge
Last login: Sat Mar 20 18:04:46 2021 from 10.0.2.2
vagrant@master1:~$ sudo kubectl apply -f /vagrant/loxilb.yaml
daemonset.apps/loxilb-lb created
service/loxilb-lb-service created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-77ccd57875-dwrsm 1/1 Running 0 129m
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 39m
kube-system local-path-provisioner-957fdf8bc-72kcx 1/1 Running 0 129m
kube-system loxilb-lb-9s5qw 1/1 Running 0 19m
kube-system loxilb-lb-sk9cd 1/1 Running 0 19m
kube-system metrics-server-648b5df564-mfg2j 1/1 Running 0 129m
Deploy loxilb-peer
LoxiLB peer mode (loxilb-peer) can be deployed with the yaml file below, which we used for this blog:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: loxilb-peer
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: loxilb-peer-app
  template:
    metadata:
      name: loxilb-peer
      labels:
        app: loxilb-peer-app
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "node-role.kubernetes.io/master"
                operator: DoesNotExist
              - key: "node-role.kubernetes.io/control-plane"
                operator: DoesNotExist
      containers:
      - name: loxilb-peer-app
        image: "ghcr.io/loxilb-io/loxilb:latest"
        imagePullPolicy: Always
        command: [ "/root/loxilb-io/loxilb/loxilb", "--peer" ]
        ports:
        - containerPort: 11111
        - containerPort: 179
        securityContext:
          privileged: true
          capabilities:
            add:
            - SYS_ADMIN
---
apiVersion: v1
kind: Service
metadata:
  name: loxilb-peer-service
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    app: loxilb-peer-app
  ports:
  - name: loxilb-peer-app
    port: 11111
    targetPort: 11111
    protocol: TCP
  - name: loxilb-peer-bgp
    port: 179
    targetPort: 179
    protocol: TCP
If you wish to use the CNI's BGP speakers instead, that is totally fine; there is no need to deploy loxilb-peer.yml. Just remove "--bgp" from loxilb.yaml as below and then apply it:
      - name: loxilb-app
        image: "ghcr.io/loxilb-io/loxilb:latest"
        imagePullPolicy: Always
        command: [ "/root/loxilb-io/loxilb/loxilb" ]
Apply the loxilb-peer.yml file to create "loxilb-peer-service":
vagrant@master1:~$ sudo kubectl apply -f /vagrant/loxilb-peer.yml
daemonset.apps/loxilb-peer created
service/loxilb-peer-service created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-77ccd57875-dwrsm 1/1 Running 0 154m
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 64m
kube-system local-path-provisioner-957fdf8bc-72kcx 1/1 Running 0 154m
kube-system loxilb-lb-9s5qw 1/1 Running 0 44m
kube-system loxilb-lb-sk9cd 1/1 Running 0 44m
kube-system loxilb-peer-8bh9b 1/1 Running 0 105s
kube-system loxilb-peer-f5fmt 1/1 Running 0 105s
kube-system metrics-server-648b5df564-mfg2j 1/1 Running 0 154m
Let's verify the BGP (auto) configuration in the LoxiLB instances:
vagrant@master1:~$ sudo kubectl exec -it loxilb-lb-9s5qw -n kube-system -- bash
root@master1:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.1 65101 00:34:38 Establ | 1 0
192.168.80.11 64512 00:34:46 Establ | 0 0
192.168.80.101 64512 00:03:58 Establ | 0 0
192.168.80.102 64512 00:04:03 Establ | 0 0
root@master1:/# gobgp global policy
Import policy:
Default: ACCEPT
Export policy:
Default: ACCEPT
Name set-next-hop-self-gpolicy:
StatementName set-next-hop-self-gstmt:
Conditions:
Actions:
Nexthop: self
vagrant@master1:~$ sudo kubectl exec -it loxilb-lb-sk9cd -n kube-system -- bash
root@master2:/# gobgp global
AS: 64512
Router-ID: 192.168.80.11
Listening Port: 179, Addresses: 0.0.0.0
root@master2:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.1 65101 00:36:18 Establ | 1 0
192.168.80.10 64512 00:36:51 Establ | 0 0
192.168.80.101 64512 00:06:04 Establ | 0 0
192.168.80.102 64512 00:06:06 Establ | 0 0
root@master2:/# gobgp global policy
Import policy:
Default: ACCEPT
Export policy:
Default: ACCEPT
Name set-next-hop-self-gpolicy:
StatementName set-next-hop-self-gstmt:
Conditions:
Actions:
Nexthop: self
BGP configuration in the LoxiLB peer pods:
vagrant@master1:~$ sudo kubectl exec -it loxilb-peer-8bh9b -n kube-system -- bash
root@worker1:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.10 64512 00:10:35 Establ | 0 0
192.168.80.11 64512 00:10:36 Establ | 0 0
192.168.80.102 64512 00:10:38 Establ | 0 0
vagrant@master1:~$ sudo kubectl exec -it loxilb-peer-f5fmt -n kube-system -- bash
root@worker2:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.10 64512 00:11:14 Establ | 0 0
192.168.80.11 64512 00:11:12 Establ | 0 0
192.168.80.101 64512 00:11:12 Establ | 0 0
Deploy Services
Create TCP, UDP and SCTP services in Kubernetes:
vagrant@master1:~$ sudo kubectl apply -f /vagrant/nginx.yml
service/nginx-lb1 created
pod/nginx-test created
vagrant@master1:~$ sudo kubectl apply -f /vagrant/udp.yml
service/udp-lb1 created
pod/udp-test created
vagrant@master1:~$ sudo kubectl apply -f /vagrant/sctp.yml
service/sctp-lb1 created
pod/sctp-test created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-test 1/1 Running 0 19m
default sctp-test 1/1 Running 0 32s
default udp-test 1/1 Running 0 113s
kube-system coredns-77ccd57875-dwrsm 1/1 Running 0 3h2m
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 60m
kube-system local-path-provisioner-957fdf8bc-72kcx 1/1 Running 0 3h2m
kube-system loxilb-lb-9s5qw 1/1 Running 0 72m
kube-system loxilb-lb-sk9cd 1/1 Running 0 72m
kube-system loxilb-peer-8bh9b 1/1 Running 0 29m
kube-system loxilb-peer-f5fmt 1/1 Running 0 29m
kube-system metrics-server-648b5df564-mfg2j 1/1 Running 0 3h2m
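The exact yaml files used above are available with the rest of the scripts linked at the end of this blog. As an illustration, a minimal sketch of what nginx.yml could look like is given below; the label, image and loadBalancerClass values are assumptions, not the exact file contents:
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb1
spec:
  loadBalancerClass: loxilb.io/loxilb   # assumption: lets kube-loxilb pick up this service
  selector:
    what: nginx-test
  ports:
    - port: 55002
      targetPort: 80
  type: LoadBalancer
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  labels:
    what: nginx-test
spec:
  containers:
    - name: nginx-test
      image: nginx:stable
      ports:
        - containerPort: 80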
Let's verify the services in the Kubernetes cluster:
vagrant@master1:~$ sudo kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 14m
nginx-lb1 LoadBalancer 10.43.91.80 123.123.123.1 55002:32694/TCP 3m11s
sctp-lb1 LoadBalancer 10.43.149.41 123.123.123.1 55004:31402/SCTP 3m57s
udp-lb1 LoadBalancer 10.43.149.142 123.123.123.1 55003:30165/UDP 3m18s
In LoxiLB:
vagrant@master1:~$ sudo kubectl exec -it loxilb-lb-9s5qw -n kube-system -- bash
root@master1:/# loxicmd get lb
|  EXTERNAL IP  | PORT  | PROTOCOL | BLOCK | SELECT |  MODE   | # OF ENDPOINTS | MONITOR |
|---------------|-------|----------|-------|--------|---------|----------------|---------|
| 123.123.123.1 | 55002 | tcp      |     0 | rr     | fullnat |              1 | Off     |
| 123.123.123.1 | 55003 | udp      |     0 | rr     | fullnat |              1 | On      |
| 123.123.123.1 | 55004 | sctp     |     0 | rr     | fullnat |              1 | On      |
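For an additional optional check, the BGP RIB inside a loxilb-lb pod can be inspected to confirm that the external service IP is being announced (the exact output will depend on your setup):
root@master1:/# gobgp global rib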
LoxiLB instances announce the service IPs to their configured peers. Since the client is running a BGP server and the worker nodes are running the LoxiLB BGP peer service, they will install all the advertised routes. We can verify this on the client and the worker nodes.
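For reference, the external client in this setup runs a BGP daemon (bird, as seen from the proto bird entries in the route output below). A minimal bird2 peering stanza towards one of the loxilb-lb nodes might look like the sketch below; this is an assumed illustration, not the exact client configuration used:
# /etc/bird/bird.conf (fragment, assumed for illustration)
protocol bgp loxilb_master1 {
    local 192.168.80.1 as 65101;
    neighbor 192.168.80.10 as 64512;
    ipv4 {
        import all;
        export all;
    };
}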
In the Client:
$ ip route
default via 192.168.20.1 dev eno1 proto static metric 100
123.123.123.1 via 192.168.80.10 dev vboxnet2 proto bird metric 32
169.254.0.0/16 dev eno1 scope link metric 1000
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.20.0/24 dev eno1 proto kernel scope link src 192.168.20.55 metric 100
192.168.80.0/24 dev vboxnet2 proto kernel scope link src 192.168.80.1
192.168.90.0/24 dev vboxnet0 proto kernel scope link src 192.168.90.1
In the Worker node:
vagrant@worker2:~$ ip route
default via 10.0.2.2 dev eth0
default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
123.123.123.1 via 192.168.80.10 dev eth1 proto bgp
192.168.80.0/24 dev eth1 proto kernel scope link src 192.168.80.102
Time to validate the results
Let's verify that the client can access the TCP service via the external service IP:
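From the external client, a quick command-line check would look something like the sketch below (the ports are the ones allocated in the service listing above; the SCTP check assumes a netcat variant with SCTP support, such as ncat):
# TCP service - should return the default nginx welcome page
$ curl http://123.123.123.1:55002
# UDP service
$ nc -u 123.123.123.1 55003
# SCTP service
$ nc --sctp 123.123.123.1 55004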
Conclusion
Hopefully, this blog gives readers a good idea of how to deploy LoxiLB inside a Kubernetes cluster, along with some interesting tidbits about in-cluster LB-based services in K8s. If you like our work, please don't forget to support us by heading over to our GitHub page and giving us a star. You can also reach us through our Slack channel to share your valuable feedback and ideas.
Note: Want to try it out yourself? All the scripts and configurations used for this blog are available here. Download all the scripts to a folder and follow the steps below:
$ ./config.sh
$ ./validation.sh
# Cleanup
$ ./rmconfig.sh