OpenShift 4 in an Air Gap (disconnected) environment (Part 1 — prerequisites)
The journey
Recently I completed an OpenShift 4 installation in a completely isolated (air-gapped) environment.
After talking with my peers and the people who helped me with it, I realized that the journey I took to get the installation working (and eventually the cluster running) is one worth telling.
This is the first part of that journey; here we will go over the preparation and the infrastructure setup so that the installation itself will be easy to run.
Thanks
First I would like to thank Anoel Yakoubov and Ron Meshulam for helping me with the process; I wouldn't have been able to complete it without you guys.
Infrastructure
Before we begin the walkthrough I would like to talk about the end design of the cluster, which the following diagram explains:
The installation process (which I explain in more detail in Part 2) needs the infrastructure to be set up in advance, consisting of:
- A deployment server, referred to as the "bastion" server, which runs the DHCP, TFTP and HTTP servers needed for a PXE installation.
- DNS Server -> a DNS server with all the records pre-configured (I will get into more detail later in this tutorial).
- Load Balancer -> 2 load balancers in an Active/Passive setup (I will use HAProxy and Keepalived in my example).
The same load balancers will also serve our applications, so we will direct the required application ports to the worker nodes as well.
NOTE!!
The bastion server will also run the container registry. Since the registry is a major component of the installation and this is a disconnected (air-gapped) environment, we will handle the registry during the installation part (Part 2).
DNS Server
There are a number of different scenarios:
- The IT DHCP server provides random IP addresses for our OCP 4 VMs along with additional options such as DNS; all relevant records are managed in the IT DNS, which also resolves Internet addresses.
- The IT DHCP server provides random (but reserved) IP addresses for our OCP 4 VMs and additional options such as DNS, but we manage our own DNS zone for the OCP 4 cluster. In this case we need to ask the IT DNS administrator to configure a slave zone on the IT DNS server, with our DNS server acting as the master for the zone. All resolution of Internet addresses is still done by the IT DNS servers.
- The IT administrator provides us with a dedicated VLAN and we manage the DNS and the DHCP for this VLAN.
Our DNS server is the master for our DNS zone and the organization DNS holds it as a slave zone. All other DNS queries are resolved by the organization DNS servers.
In this tutorial I will focus on scenario #3, which requires the most configuration on our side but covers the configuration for scenarios #1 and #2 as well (you can take only the parts relevant to your environment).
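For reference, the matching slave-zone definition on the organization's DNS side (assuming it also runs BIND, and using our DNS server's address 192.168.1.1 from the rest of this tutorial) would look roughly like this:
zone "example.com" in {
    type slave;
    masters { 192.168.1.1; };
    file "slaves/example.com.zone";
};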
DNS install
NOTE
Our cluster domain is built as <cluster name>.<base domain>; for this tutorial we will call our cluster "ocp4" and our base domain will be "example.com", so the API will be reachable at api.ocp4.example.com and applications will live under *.apps.ocp4.example.com.
In this tutorial we are making an effort to preserve resources and "keep it simple", so in our environment the bastion server will host the DNS as well.
If you have access to another DNS server where you can perform administrative actions, you can skip ahead to the configuration section. If not, install the DNS server on the bastion:
$ dnf install bind
Next we want to define our zone, make sure queries are forwarded to the IT DNS servers, and allow queries on the address we assigned to the server:
$ cat >> /etc/named.conf << EOF
zone "example.com" in {
    type master;
    file "example.com.zone";
};
EOF
In addition, make sure you set the server's IP address in listen-on and the IP addresses of the IT DNS servers in forwarders:
$ vi /etc/named.conf
options {
....
listen-on port 53 { 192.168.1.1; };
....
forwarders {
    192.168.2.1;
    192.168.2.2;
};
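Depending on the defaults in your named.conf, you may also need to relax allow-query so that the cluster nodes (or any client) can query this server; for example, inside the same options block:
allow-query { localhost; 192.168.1.0/24; };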
This will enable our environment to resolve all of our DNS requests.
Now we need to create the zone file we referenced in the configuration.
The default directory for BIND zone files is /var/named, so to avoid mistakes make sure you write the zone file with exactly the name specified in named.conf:
$ cat > /var/named/example.com.zone << 'EOF'
$TTL 14400
@ 1D IN SOA ns.example.com. hostmaster.example.com. (
        2020022306 ; serial
        3H ; refresh
        15 ; retry
        1w ; expire
        3h ; nxdomain ttl
)
        IN NS ns.example.com.
$ORIGIN example.com.
ns IN A 192.168.1.1
hostmaster IN A 192.168.1.1
ntp IN A 192.168.2.1
registry IN A 192.168.1.1
bastion IN A 192.168.1.1
haproxy-01 IN A 192.168.1.2
haproxy-02 IN A 192.168.1.3
vip-01 IN A 192.168.1.4
ocp4 IN A 192.168.1.4
bootstrap IN A 192.168.1.5
master-01 IN A 192.168.1.6
master-02 IN A 192.168.1.7
master-03 IN A 192.168.1.8
worker-01 IN A 192.168.1.9
worker-02 IN A 192.168.1.10
worker-03 IN A 192.168.1.11
$ORIGIN ocp4.example.com.
control-plane-0 IN A 192.168.1.6
control-plane-1 IN A 192.168.1.7
control-plane-2 IN A 192.168.1.8
etcd-0 IN A 192.168.1.6
etcd-1 IN A 192.168.1.7
etcd-2 IN A 192.168.1.8
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-0
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-1
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-2
ocp4-bootstrap IN A 192.168.1.5
bootstrap-0 IN A 192.168.1.5
api IN A 192.168.1.4
api-int IN A 192.168.1.4
$ORIGIN apps.ocp4.example.com.
* IN A 192.168.1.4
EOF
DNS zone file
Without going into too much DNS configuration: we are basically setting our domain with the $ORIGIN directive and, from that point on, adding the relevant records.
All of the records are used at some point during the installation, so double-check them before continuing.
Make sure the serial number at the top is the current date followed by a two-digit revision (and bump it on every change), then restart the named service.
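Before restarting, you can optionally validate both the main configuration and the zone file with the check tools that ship with BIND:
$ named-checkconf
$ named-checkzone example.com /var/named/example.com.zone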
$ systemctl restart named
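After the restart, a quick sanity check with dig (from the bind-utils package) should return the addresses we defined above. Keep in mind that other machines will only be able to query this server once the dns service (port 53) is allowed through firewalld on the bastion.
$ dig +short api.ocp4.example.com @192.168.1.1
192.168.1.4
$ dig +short bootstrap.example.com @192.168.1.1
192.168.1.5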
Load Balancer (highly available)
A typical deployment of OpenShift Container Platform (like the one in this tutorial) has multiple masters and workers. In this configuration there is no single point of failure for the cluster, unless only a single load balancer (HAProxy) server is configured to load balance cluster traffic.
HAProxy load balances TCP connections to a pool of masters and workers. The following describes running a second HAProxy server alongside the first, which turns the environment into a highly available setup using Keepalived. Keepalived is routing software written in C that establishes a floating virtual IP address, using the Virtual Router Redundancy Protocol (VRRP), which can move between the nodes of the cluster. For more information about Keepalived see http://www.keepalived.org
The following image describes the high availability architecture:
haproxy
For HAProxy we will run the same commands on both servers so that the configurations are identical. First, install the package:
$ dnf install haproxy
Now that the package is installed, let's replace the configuration with the following content:
$ cat > /etc/haproxy/haproxy.cfg << EOF
# Global settings
#---------------------------------------------------------------------
global
    maxconn     20000
    log         /dev/log local0 info
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    user        haproxy
    group       haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          300s
    timeout server          300s
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 20000

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /

frontend openshift-app-https
    bind *:443
    default_backend openshift-app-https
    mode tcp
    option tcplog

backend openshift-app-https
    balance roundrobin
    mode tcp
    server worker-01 192.168.1.9:443 check
    server worker-02 192.168.1.10:443 check
    server worker-03 192.168.1.11:443 check

frontend openshift-app-http
    bind *:80
    default_backend openshift-app-http
    mode tcp
    option tcplog

backend openshift-app-http
    balance roundrobin
    mode tcp
    server worker-01 192.168.1.9:80 check
    server worker-02 192.168.1.10:80 check
    server worker-03 192.168.1.11:80 check

frontend master-api
    bind *:6443
    default_backend master-api-be
    mode tcp
    option tcplog

backend master-api-be
    balance roundrobin
    mode tcp
    server bootstrap 192.168.1.5:6443 check
    server master-01 192.168.1.6:6443 check
    server master-02 192.168.1.7:6443 check
    server master-03 192.168.1.8:6443 check

frontend master-api-2
    bind *:22623
    default_backend master-api-2-be
    mode tcp
    option tcplog

backend master-api-2-be
    balance roundrobin
    mode tcp
    server bootstrap 192.168.1.5:22623 check
    server master-01 192.168.1.6:22623 check
    server master-02 192.168.1.7:22623 check
    server master-03 192.168.1.8:22623 check
EOF
Note that the bootstrap server is needed only during the installation; once all the cluster nodes are up (I will elaborate on this in Part 2) we need to remove the bootstrap entries from the load balancer and shut the bootstrap server down.
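Before starting the service, it is worth letting HAProxy validate the configuration file syntax:
$ haproxy -c -f /etc/haproxy/haproxy.cfg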
Next we need to open the relevant ports in firewalld.
First, get your firewalld default zone:
$ export FIREWALLD_DEFAULT_ZONE=`firewall-cmd --get-default-zone`
$ echo ${FIREWALLD_DEFAULT_ZONE}
public
Open the relevant ports and services:
$ firewall-cmd --add-port 22623/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 6443/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service https --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service http --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 9000/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
Now let's reload and list the open ports:
$ firewall-cmd --reload
$ firewall-cmd --list-ports
If SELinux is enabled, starting HAProxy will fail because port 22623 is not in the list of ports HAProxy is allowed to bind to. To avoid that, we need to put SELinux in permissive mode, start HAProxy, and then generate an SELinux module for it.
Switch to permissive mode:
$ setenforce 0
Start the haproxy service:
$ systemctl start haproxy
$ systemctl enable haproxy
Now generate the SELinux module using audit2allow and apply it (on RHEL 8 the audit2allow tool comes from the policycoreutils-python-utils package):
$ dnf install -y policycoreutils-python
$ cat /var/log/audit/audit.log | audit2allow -M haproxy
$ semodule -i haproxy.pp
Now we can switch SELinux back to enforcing:
$ setenforce 1
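As an alternative to generating a custom module, recent selinux-policy versions ship a boolean that lets HAProxy bind to and connect on arbitrary ports; if it exists on your system, enabling it achieves the same result:
$ setsebool -P haproxy_connect_any 1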
keepalived
For the Keepalived part there are a few small differences between our two servers. First, let's install it on both of them:
# dnf install -y keepalived
Determine the interface for use with the services:
$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:50:56:a1:ab:11 brd ff:ff:ff:ff:ff:ff
Generate a random external password for Keepalived’s AUTH_PASS:
$ uuidgen
3e879f74-6b0b-4097-89a3-c2fccdf522ef
Now let's start with server number 1 (haproxy-01), which will also be our MASTER server.
Its Keepalived configuration should look as follows:
$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
    router_id ovp_vrrp
}
vrrp_script haproxy_check {
    script "killall -0 haproxy"
    interval 2
    weight 2
}
vrrp_instance OCP_LB {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.168.1.4
    }
    track_script {
        haproxy_check
    }
    authentication {
        auth_type PASS
        auth_pass 3e879f74-6b0b-4097-89a3-c2fccdf522ef
    }
}
EOF
On haproxy-02 the file should look the same except for two lines:
$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
    router_id ovp_vrrp
}
vrrp_script haproxy_check {
    script "killall -0 haproxy"
    interval 2
    weight 2
}
vrrp_instance OCP_LB {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 98
    virtual_ipaddress {
        192.168.1.4
    }
    track_script {
        haproxy_check
    }
    authentication {
        auth_type PASS
        auth_pass 3e879f74-6b0b-4097-89a3-c2fccdf522ef
    }
}
EOF
The state of this server should be BACKUP and its priority should be lower than the master's to avoid a split-brain, so we gave it 98.
If you are running a firewall (by means of firewalld or iptables), you must allow VRRP traffic to pass between the keepalived nodes. To configure the firewall to allow the VRRP traffic with firewalld, run the following commands:
$ firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
$ firewall-cmd --reload
Next, start the service on both nodes:
$ systemctl enable keepalived; systemctl start keepalived
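Once both nodes are up, the virtual IP should be held by haproxy-01. A quick way to confirm this (interface name and addresses as assumed above) is to look for the VIP on the interface and hit the HAProxy stats page through it:
$ ip -4 addr show eth0 | grep 192.168.1.4
$ curl -s -o /dev/null -w '%{http_code}\n' http://192.168.1.4:9000/
200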
PXE Installation
The objective of a PXE installation is to enable "boot from LAN" for servers. We are going to use this technology to deploy RHCOS to our servers (the servers should have no OS installed and the boot order should be "HD" first, then "LAN"; that way each server boots from the network only until a deployment succeeds, and from then on boots from its disk).
DHCPD
A PXE setup starts with a DHCP server, so we are going to deploy and configure one on our bastion server (on RHEL 8 the package is named dhcp-server):
$ dnf install dhcp
Next, let's configure the dhcpd.conf file to hand out addresses on our network segment (VLAN).
NOTE!!
At this point you need to know in advance which IP range you have been allocated to work with, or in some cases the specific IPs for your servers, which will only work in reservation mode.
You also need to make sure this is the only DHCP server in its VLAN and that the "ip helper" on the switches/routers points to your DHCP server's IP address.
The following is a good example of a dhcpd.conf file:
$ cat > /etc/dhcp/dhcpd.conf << EOF
#
# VLAN ...(192.168.1.0/24)
#
subnet 192.168.1.0 netmask 255.255.255.0 {
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.1.255;
    option routers 192.168.1.254;
    option domain-name "example.com";
    option ntp-servers ntp.example.com;
    option domain-name-servers 192.168.2.1, 192.168.2.2;
    option time-offset 1;
    next-server bastion.example.com;
    filename "pxelinux.0";
}
group openshift4 {
    host master-01 {
        hardware ethernet 18:66:da:cc:aa:02;
        fixed-address 192.168.1.6;
        option host-name "master-01.example.com";
    }
    ....
}
EOF
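dhcpd has a built-in test mode, so you can validate the file before starting the service:
$ dhcpd -t -cf /etc/dhcp/dhcpd.conf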
Now we need to start the DHCP service and make sure it runs at boot time:
$ systemctl start dhcpd && systemctl enable dhcpd
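If firewalld is active on the bastion, the DHCP traffic also has to be allowed; assuming the same default zone as before:
$ firewall-cmd --add-service dhcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --reload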
tftp Server
The TFTP server can be installed with the following command; xinetd is needed as well, since it will be the one launching the TFTP service:
# dnf install tftp tftp-server xinetd -y
Next, let's configure the TFTP server to run when the xinetd service starts:
$ cat > /etc/xinetd.d/tftp << EOF
# default: off
# description: The tftp server serves files using the trivial file transfer \
#       protocol. The tftp protocol is often used to boot diskless \
#       workstations, download configuration files to network-aware printers, \
#       and to start the installation process for some operating systems.
service tftp
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = root
        server                  = /usr/sbin/in.tftpd
        server_args             = -c -s /var/lib/tftpboot
        disable                 = no
        per_source              = 11
        cps                     = 100 2
        flags                   = IPv4
}
EOF
We made two modifications to the original file:
- set disable to no.
- added the -c option to server_args, in case you need to upload files to the TFTP server from a client.
Next, we want to enable a Linux network boot from our PXE server. To achieve that we will install the syslinux-tftpboot package, which provides the PXE boot files (such as pxelinux.0):
$ dnf install -y syslinux-tftpboot
$ systemctl start xinetd && systemctl enable xinetd
For the firewall we need to open UDP port 69:
$ firewall-cmd --add-port 69/udp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
For SELinux, ONLY if you want clients to be able to write to the TFTP server, you need to set the following booleans:
$ setsebool -P tftp_anon_write 1
$ setsebool -P tftp_home_dir 1
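To verify that the TFTP service answers, you can fetch a file with the tftp client we installed earlier, assuming pxelinux.0 ended up in /var/lib/tftpboot (if your distribution places the syslinux files elsewhere, copy pxelinux.0 there first):
$ tftp 192.168.1.1 -c get pxelinux.0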
httpd
In our scenario we want the bare-metal image of Red Hat CoreOS and the Ignition files to be available over HTTP.
We will install Apache httpd and use its public directory to publish the files:
$ dnf install httpd
Now we will create the directories we are going to use during the installation:
$ mkdir /var/www/html/pub
$ mkdir /var/www/html/pub/{pxe,ign}
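Finally, the web server needs to be running (and reachable through firewalld) before the nodes can fetch anything from it; on the bastion that means something like:
$ systemctl start httpd && systemctl enable httpd
$ firewall-cmd --add-service http --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --reload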
Once all of these steps are completed we are ready to continue to the installation (in Part 2).