Updated: Feb 22
As we introduced LoxiLB to our community, our team received many requests and ideas. Two of them were hyperscale load balancing and support for L3DSR.
In this blog, we explain how LoxiLB can be deployed as a hyperscale load balancer to support Anycast services with high availability in L3DSR mode.
Before we begin, let's touch upon the basics of Anycast services and DSR.
What is Anycast?
Anycast is a network addressing and routing method in which a single IP address is shared by multiple servers. It is typically used to deliver content faster and more efficiently, to balance the workload across servers, and to provide high availability: if one instance goes down, client requests are not affected because the other instances are ready to serve.
What is DSR?
DSR is short for Direct Server Return. It is a method whereby traffic hits the load balancer on the way in but does not traverse the load balancer on the way out. DSR brings several advantages, e.g. serving low-latency applications, preserving the load balancer's bandwidth, and keeping the client's IP address visible for security or authentication in certain applications.
DSR can be implemented in two modes:
L2DSR (Layer-2 DSR) Mode: The endpoints are in the same network and directly reachable. This mode is often called MAC translation mode because only the destination MAC is changed to the endpoint server's MAC and the rest of the packet remains the same.
L3DSR (Layer-3 DSR) Mode: The endpoints are not in the same network and may be a few hops away. In this mode, an IP-over-IP tunnel is created with the endpoints. The packet is encapsulated with an outer IP header and routed to the endpoint.
In both modes, the return packet does not pass through the load balancer. Instead, it goes directly to the client. In this blog, we focus on L3DSR mode.
What can LoxiLB do?
With the growing demand for services, hyperscale computing and high availability have become paramount requirements. Both imply running multiple load balancer instances. These instances may run in active-active or active-standby mode, depending on the network architecture. Stateful load balancer high availability can be achieved by tracking and syncing connections across all load balancer instances (see this post).
But there are scenarios where connection tracking is not an option, such as DSR mode, where responses from an endpoint are not likely to traverse back through the load balancer. In that case, the way to provide high availability is to make sure that every load balancer instance always chooses the same endpoint for a particular session. In other words, if all the load balancer instances can guarantee "application stickiness" for a session, then session synchronization is not required. Load balancers use different techniques for choosing endpoints. In this case, we will use "Maglev consistent hashing" for endpoint selection, which allows us to maintain application stickiness in a multi-clustered environment.
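To make the stickiness argument concrete, here is a minimal Python sketch of a Maglev-style lookup table. It is an illustration only, not LoxiLB's implementation: the function names, the SHA-256 based hash, the table size of 251, the fields in the flow key, and the documentation-range IP addresses are all assumptions made for this example. The point it shows is that two load balancer instances that build the table from the same endpoint list pick the same endpoint for the same flow without exchanging any state.

```python
import hashlib


def _hash(data: str, salt: str) -> int:
    """Stable 64-bit hash (assumption: any deterministic hash would do here)."""
    digest = hashlib.sha256(f"{salt}:{data}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def build_maglev_table(backends, size=251):
    """Build a Maglev-style lookup table mapping each slot to a backend.

    `size` must be prime and larger than the number of backends so that
    every backend's preference sequence visits every slot.
    """
    # Sort so that every instance fills the table in the same turn order,
    # regardless of the order in which the endpoints were configured.
    backends = sorted(backends)
    if not backends or size <= len(backends) or not _is_prime(size):
        raise ValueError("need at least one backend and a prime size > len(backends)")

    offsets = [_hash(b, "offset") % size for b in backends]
    skips = [_hash(b, "skip") % (size - 1) + 1 for b in backends]
    next_pos = [0] * len(backends)   # how far each backend has walked its preferences
    table = [None] * size
    filled = 0

    while True:
        # Backends take turns; each claims its next preferred free slot.
        for i, backend in enumerate(backends):
            while True:
                slot = (offsets[i] + next_pos[i] * skips[i]) % size
                next_pos[i] += 1
                if table[slot] is None:
                    break
            table[slot] = backend
            filled += 1
            if filled == size:
                return table


def pick_endpoint(table, src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Map a flow to an endpoint via the lookup table (hypothetical flow key)."""
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}"
    return table[_hash(key, "flow") % len(table)]


if __name__ == "__main__":
    endpoints = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]  # hypothetical addresses
    llb1_table = build_maglev_table(endpoints)
    llb2_table = build_maglev_table(endpoints)
    flow = ("198.51.100.7", 57538, "203.0.113.1", 8080)
    # Both instances choose the same endpoint without sharing connection state.
    print(pick_endpoint(llb1_table, *flow) == pick_endpoint(llb2_table, *flow))
```

Because the table depends only on the endpoint list, a flow that fails over from one instance to another still lands on the same endpoint, which is why no session synchronization is needed in this mode.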
Let's have a look at the topology for the demo:
LoxiLB maps the destination IP address (VIP) of incoming requests to one of the endpoint addresses based on a per-session consistent hash. Because the endpoint returns the response directly without involving the load balancer, we must preserve the client's IP address. We also can't do DNAT (changing the destination IP address), because the endpoint needs to use the VIP as the source IP address for its responses.
All seems fine except for the fact that the VIP belongs to the load balancer, so intermediate nodes between the load balancer and the endpoint would not deliver requests with the VIP as the destination to the endpoint. So, there is a need for tunneling or encapsulation between the LB node and the endpoints. LoxiLB supports L3DSR with an IP-over-IP tunnel between itself and the endpoints. When the packet egresses from the load balancer, it is encapsulated in an IP-over-IP tunnel, preserving the client's source IP and the VIP in the inner header. The outer header has the chosen endpoint as its destination IP and the load balancer's tunnel interface as its source IP. Now the intermediate nodes don't need to deal with the VIP, and the request is delivered to the chosen endpoint.
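The encapsulation step can be sketched in a few lines of Python. This is a simplified illustration of IP-in-IP (IP protocol number 4), not code taken from LoxiLB: the helper names are made up for this example, IP options and fragmentation are ignored, and the addresses are hypothetical documentation-range values. It shows that the original packet, with the client as source and the VIP as destination, travels unmodified behind a new outer header addressed to the endpoint.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IP-in-IP encapsulation
IPPROTO_TCP = 6


def ipv4_checksum(header: bytes) -> int:
    """Standard ones'-complement checksum over an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, protocol: int, payload_len: int, ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header without options."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,          # version 4, header length 5 * 4 bytes
        0,                     # DSCP/ECN
        20 + payload_len,      # total length
        0, 0,                  # identification, flags/fragment offset
        ttl,
        protocol,
        0,                     # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def ipip_encapsulate(inner_packet: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """Prepend an outer IPv4 header; the inner packet is left untouched."""
    outer = ipv4_header(tunnel_src, tunnel_dst, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet


if __name__ == "__main__":
    client, vip = "198.51.100.7", "203.0.113.1"        # hypothetical addresses
    lb_tunnel, endpoint = "192.0.2.1", "192.0.2.11"    # hypothetical addresses
    payload = b"hi"
    inner = ipv4_header(client, vip, IPPROTO_TCP, len(payload)) + payload
    outer = ipip_encapsulate(inner, lb_tunnel, endpoint)
    print("outer dst:", socket.inet_ntoa(outer[16:20]))
    print("inner src/dst preserved:", outer[20:] == inner)
```

Intermediate routers only look at the outer header, so they never need a route for the VIP.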
Create IP-over-IP tunnel in llb1 for ep1:
# Underlying interface
$ip addr add 22.214.171.124/24 dev ellb1ep1
# Create tunnel interface
$ip link add name ipip11 type ipip local 126.96.36.199 remote 188.8.131.52
$ip link set dev ipip11 up
$ip addr add 184.108.40.206/24 dev ipip11
# Add route for the endpoint
$ip route add 220.127.116.11/24 via 18.104.22.168
NOTE: The route for the endpoint must be added with the next hop, as shown above.
"ip route add <ep> dev <tunnel> will not work properly."
An IP-over-IP tunnel interface must also be created on every endpoint. In addition, all endpoints must be configured with the VIP on the loopback interface; otherwise they will reject the client's request after decapsulating the incoming packet.
Create an IP-over-IP tunnel in endpoint ep1 for llb1:
$ip addr add 22.214.171.124/24 dev eep1llb1
$ip link add name ipip11 type ipip local 126.96.36.199 remote 188.8.131.52
$ip link set dev ipip11 up
$ip addr add 184.108.40.206/24 dev ipip11
$hexec ep1 ip addr add 220.127.116.11/32 dev lo
$hexec ep1 ip addr add 18.104.22.168/32 dev lo
$sysctl -w net.ipv4.conf.all.arp_ignore=3
$sysctl -w net.ipv4.conf.all.arp_announce=2
$sysctl -w net.ipv4.conf.all.rp_filter=2
$sysctl -w net.ipv4.conf.ipip11.rp_filter=0
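Why the VIP has to live on the endpoint's loopback can be illustrated with a small Python model of what happens after decapsulation. This is a deliberately simplified sketch of the accept decision, not the kernel's actual logic: the function names, the set of local addresses, and the documentation-range IPs are assumptions for this example, and header checksums are left at zero for brevity.

```python
import socket
import struct

IPPROTO_IPIP = 4
IPPROTO_TCP = 6


def make_header(src: str, dst: str, protocol: int, payload_len: int) -> bytes:
    """Minimal 20-byte IPv4 header for the demo (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len, 0, 0, 64, protocol, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def decapsulate_ipip(packet: bytes) -> bytes:
    """Strip the outer header of an IP-in-IP packet and return the inner packet."""
    header_len = (packet[0] & 0x0F) * 4
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    return packet[header_len:]


def endpoint_accepts(inner: bytes, local_addrs: set) -> bool:
    """Accept the inner packet only if its destination is a local address."""
    return socket.inet_ntoa(inner[16:20]) in local_addrs


def reply_addresses(inner: bytes) -> tuple:
    """Source and destination of the reply: the VIP answers the client directly."""
    return socket.inet_ntoa(inner[16:20]), socket.inet_ntoa(inner[12:16])


if __name__ == "__main__":
    client, vip = "198.51.100.7", "203.0.113.1"        # hypothetical addresses
    lb_tunnel, endpoint = "192.0.2.1", "192.0.2.11"    # hypothetical addresses
    inner = make_header(client, vip, IPPROTO_TCP, 2) + b"hi"
    outer = make_header(lb_tunnel, endpoint, IPPROTO_IPIP, len(inner)) + inner

    decapsulated = decapsulate_ipip(outer)
    print("without VIP on lo:", endpoint_accepts(decapsulated, {endpoint}))
    print("with VIP on lo:   ", endpoint_accepts(decapsulated, {endpoint, vip}))
    print("reply src -> dst: ", reply_addresses(decapsulated))
```

The reply uses the VIP as its source and the client as its destination, so it can be routed back without passing through the load balancer.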
After creating the tunnels on all loxilb instances and endpoints, configure the LB rule on all loxilb instances as follows:
loxicmd create lb 22.214.171.124 --select=hash --tcp=8080:8080 --endpoints=126.96.36.199:1,188.8.131.52:1,184.108.40.206:1 --mode=dsr --bgp
All the configurations used for this setup can be found here.
Let's have a look at some packet dumps:
17:02:14.994082 IP 220.127.116.11.57538 > 18.104.22.168.8080: Flags [P.], seq 716235073:716235075, ack 2910196143, win 498, options [nop,nop,TS val 662647407 ecr 4290270063], length 2: HTTP
17:02:14.994293 IP 22.214.171.124.8080 > 126.96.36.199.57538: Flags [.], ack 2, win 490, options [nop,nop,TS val 4292968451 ecr 662647407], length 0
17:02:15.265042 IP 188.8.131.52.57538 > 184.108.40.206.8080: Flags [P.], seq 2:4, ack 1, win 498, options [nop,nop,TS val 662647678 ecr 4292968451], length 2: HTTP
17:02:15.265208 IP 220.127.116.11.8080 > 18.104.22.168.57538: Flags [.], ack 4, win 490, options [nop,nop,TS val 4292968722 ecr 662647678], length 0
08:02:15.265117 IP 22.214.171.124 > 126.96.36.199: IP 188.8.131.52.57538 > 184.108.40.206.8080: Flags [P.], seq 2:4, ack 1, win 498, options [nop,nop,TS val 662647678 ecr 4292968451], length 2: HTTP (ipip-proto-4)
08:02:15.521859 IP 220.127.116.11 > 18.104.22.168: IP 22.214.171.124.57538 > 126.96.36.199.8080: Flags [P.], seq 4:6, ack 1, win 498, options [nop,nop,TS val 662647935 ecr 4292968722], length 2: HTTP (ipip-proto-4)
17:16:28.532316 IP 188.8.131.52 > 184.108.40.206: IP 220.127.116.11.57538 > 18.104.22.168.8080: Flags [P.], seq 2:4, ack 1, win 498, options [nop,nop,TS val 663500945 ecr 4293821805], length 2: HTTP (ipip-proto-4)
17:16:28.532316 IP 22.214.171.124 > 126.96.36.199: IP 188.8.131.52.57538 > 184.108.40.206.8080: Flags [P.], seq 2:4, ack 1, win 498, options [nop,nop,TS val 663500945 ecr 4293821805], length 2: HTTP (ipip-proto-4)
17:16:28.532360 IP 220.127.116.11.8080 > 18.104.22.168.57538: Flags [.], ack 4, win 490, options [nop,nop,TS val 4293821989 ecr 663500945], length 0
17:16:28.532365 IP 22.214.171.124.8080 > 126.96.36.199.57538: Flags [.], ack 4, win 490, options [nop,nop,TS val 4293821989 ecr 663500945], length 0
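When reading dumps like these, it helps to separate the outer (tunnel) addresses from the inner (client and VIP) addresses. The following Python helper is a small sketch for that purpose; the regular expression and the function name are our own and only target the line shape seen in the encapsulated lines above, so other tcpdump output formats may not match.

```python
import re

# Matches lines of the form:
#   <time> IP <outer_src> > <outer_dst>: IP <inner_src>.<sport> > <inner_dst>.<dport>: ...
IPIP_LINE = re.compile(
    r"IP (?P<outer_src>\d+(?:\.\d+){3}) > (?P<outer_dst>\d+(?:\.\d+){3}): "
    r"IP (?P<inner_src>\d+(?:\.\d+){3})\.(?P<sport>\d+) > "
    r"(?P<inner_dst>\d+(?:\.\d+){3})\.(?P<dport>\d+):"
)


def parse_ipip_line(line: str):
    """Return outer and inner addresses of an encapsulated line, or None."""
    match = IPIP_LINE.search(line)
    if match is None:
        return None  # not an IP-in-IP line in the expected format
    fields = match.groupdict()
    fields["sport"] = int(fields["sport"])
    fields["dport"] = int(fields["dport"])
    return fields


if __name__ == "__main__":
    sample = (
        "08:02:15.265117 IP 22.214.171.124 > 126.96.36.199: "
        "IP 188.8.131.52.57538 > 184.108.40.206.8080: Flags [P.], seq 2:4, "
        "ack 1, win 498, length 2: HTTP (ipip-proto-4)"
    )
    print(parse_ipip_line(sample))
```

Lines without an inner header, such as the direct replies from the endpoint to the client, return None, which makes it easy to tell tunneled requests apart from direct responses.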
Demo video showcasing high availability:
We hope you liked our blog. For more information, please visit our GitHub page.