OpenShift 4 in an Air Gap (disconnected) environment (Part 1 — prerequisites)

The journey

Recently I completed an OpenShift 4 installation in a completely isolated (air-gapped) environment.
After talking with my peers and the people who helped me with it, I found that the journey I took to get the installation working (and eventually the cluster running) is one worth telling.
This is the first part of that journey; here we will go over the preparation and the infrastructure setup so the installation itself will be easy to run.

Thanks

First I would like to thank Anoel Yakoubov and Ron Meshulam for helping me with the process; I wouldn't have been able to complete it without you guys.

Infrastructure

First, before we begin the walkthrough, I would like to talk about the end design of how the cluster should look; the following diagram explains it:

The installation process (which I explain in more detail in Part 2) needs the infrastructure to be set up in advance, which consists of:

  1. A deployment server, referred to as the “bastion” server, which will run a DHCP, a TFTP and an HTTP server for a PXE installation.
  2. DNS Server -> A DNS server with all the records pre-configured (I will go into more detail later in this tutorial).
  3. Load Balancer -> Two load balancers in an Active/Passive setup (I will use HAproxy and Keepalived in my example).
    The same load balancers will also serve our applications, so we will direct traffic to the worker nodes as well on the required ports.

NOTE!!

The bastion server will also run the container registry. Since the registry is a major component of the installation and because this is a disconnected (air-gapped) environment, we will handle the registry during the installation part (Part 2).

DNS Server

There are a number of different scenarios:

  1. The IT DHCP server provides random IP addresses for our OCP 4 VMs and additional options such as DNS; all relevant records are managed in the IT DNS, which also resolves Internet addresses.
  2. The IT DHCP server provides random (but reserved) IP addresses for our OCP 4 VMs and additional options such as DNS, but we manage our own DNS zone for the OCP 4 cluster. In this case we need to ask the IT DNS administrator to configure a slave zone on the IT DNS server, with our DNS server as the master for our zone. All resolution of Internet addresses is still done by the IT DNS servers.
  3. The IT administrator provides us with a dedicated VLAN and we manage both the DNS and the DHCP for this VLAN.
    Our DNS server is the master for our DNS zone and the organization DNS holds it as a slave zone. All other DNS queries are resolved by the organization DNS servers.

In this tutorial I will focus on scenario #3, which requires the most configuration management on our side but contains the configuration for scenarios #1 & #2 as well (basically you can take only the parts relevant to your environment).

DNS install

NOTE

Our cluster domain will consist of <cluster name>.<domain prefix>, so for our tutorial we will call our cluster “ocp4” and our domain will be “example.com”.

In this tutorial we are making an effort to preserve resources and “Keep It Simple”, so in our environment the bastion server can host the DNS as well.
If you have access to another DNS server where you can perform administrative actions, you can skip ahead to the configuration section. If not, we will install the DNS server on our bastion server:

$ yum install bind

Next we want to make sure the DNS server forwards to the IT DNS and that it allows queries on the address we assigned to it:

$ cat >> /etc/named.conf << EOF
zone "example.com" in {
type master;
file "example.com.zone";
};
EOF

Also make sure you set the server's IP address in listen-on and the IP addresses of the IT DNS servers in forwarders:

$ vi /etc/named.conf
options {
....
listen-on port 53 { 192.168.1.1; };
....
forwarders {
192.168.2.1;
192.168.2.2;
};
....
};

This will enable our environment to resolve all of our DNS requests.
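Before restarting named it is worth letting BIND validate the file. A minimal sketch, assuming the bind package (which ships named-checkconf) is installed; the check is skipped otherwise:

```shell
# named-checkconf parses /etc/named.conf and prints the offending line
# on a syntax error; it is silent on success.
RESULT="skipped"
if command -v named-checkconf >/dev/null 2>&1; then
    if named-checkconf /etc/named.conf; then RESULT="ok"; else RESULT="error"; fi
fi
echo "named.conf check: $RESULT"
```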

Now we need to create the zone (which, as we mentioned in the configuration, is a file).
The default file location for bind is /var/named, so to avoid mistakes make sure you write the zone file exactly where we specified it in named.conf:

$ cat > /var/named/example.com.zone << EOF
$TTL 14400
@ 1D IN SOA ns.example.com. hostmaster.example.com. (
2020022306 ; serial
3H ; refresh
15 ; retry
1w ; expire
3h ; nxdomain ttl
)
IN NS ns.example.com.
$ORIGIN example.com.
ns IN A 192.168.1.1
hostmaster IN A 192.168.1.1
ntp IN A 192.168.2.1
registry IN A 192.168.1.1
bastion IN A 192.168.1.1
haproxy-01 IN A 192.168.1.2
haproxy-02 IN A 192.168.1.3
vip-01 IN A 192.168.1.4
ocp4 IN A 192.168.1.4
bootstrap IN A 192.168.1.5
master-01 IN A 192.168.1.6
master-02 IN A 192.168.1.7
master-03 IN A 192.168.1.8
worker-01 IN A 192.168.1.9
worker-02 IN A 192.168.1.10
worker-03 IN A 192.168.1.11
$ORIGIN ocp4.example.com.
control-plane-0 IN A 192.168.1.6
control-plane-1 IN A 192.168.1.7
control-plane-2 IN A 192.168.1.8
etcd-0 IN A 192.168.1.6
etcd-1 IN A 192.168.1.7
etcd-2 IN A 192.168.1.8
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-0
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-1
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-2
ocp4-bootstrap IN A 192.168.1.5
bootstrap-0 IN A 192.168.1.5
api IN A 192.168.1.4
api-int IN A 192.168.1.4
$ORIGIN apps.ocp4.example.com.
* IN A 192.168.1.4
EOF

DNS zone file

Without going into too much DNS configuration: we are basically setting our domain prefix with the $ORIGIN directive and, from that point on, adding the relevant records.
All of the records are used at some point during the installation, so make sure you double-check them before we continue.

Make sure the serial number at the top follows the date-plus-two-digit-revision convention (2020022306 is the sixth change on 2020-02-23) and restart the named service.

$ systemctl restart named
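The serial in the zone above follows the common YYYYMMDDnn convention: the date plus a two-digit revision. A small sketch that builds today's serial, plus a couple of optional dig spot-checks (dig comes from bind-utils; the queries are only meaningful once named is answering on 192.168.1.1):

```shell
# Build a YYYYMMDDnn zone serial for today, revision 01.
SERIAL="$(date +%Y%m%d)01"
echo "zone serial: $SERIAL"

# Optional spot-checks once named is up (skipped if dig is missing);
# +time/+tries keep the queries from hanging on an unreachable server.
if command -v dig >/dev/null 2>&1; then
    dig +short +time=1 +tries=1 api.ocp4.example.com @192.168.1.1
    dig +short +time=1 +tries=1 -t srv _etcd-server-ssl._tcp.ocp4.example.com @192.168.1.1
fi
```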

Load Balancer (highly available)

A typical deployment of OpenShift Container Platform (just like in our tutorial) has multiple masters and workers. In this configuration there is no single point of failure for the cluster, unless only a single load balancer (haproxy) server is configured to load balance cluster traffic.

HAproxy load balances port socket connections to a pool of masters and workers. The following discusses the process of adding a second HAproxy server to an existing OpenShift deployment. This configures the environment into a highly available cluster using Keepalived. Keepalived is routing software written in C that establishes a floating virtual IP address using Virtual Router Redundancy Protocol (VRRP) that can belong to any node in a cluster. For more information regarding Keepalived: http://www.keepalived.org

The following image describes the High Availability architecture:

haproxy

For haproxy we will run the same commands on both servers to make sure the configurations are identical. But first we will install the package:

$ yum install haproxy
$ cat > /etc/haproxy/haproxy.cfg << EOF
# Global settings
#---------------------------------------------------------------------
global
maxconn 20000
log /dev/log local0 info
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 300s
timeout server 300s
timeout http-keep-alive 10s
timeout check 10s
maxconn 20000

listen stats
bind :9000
mode http
stats enable
stats uri /

frontend openshift-app-https
bind *:443
default_backend openshift-app-https
mode tcp
option tcplog

backend openshift-app-https
balance source
mode tcp
server worker-01 192.168.1.9:443 check
server worker-02 192.168.1.10:443 check
server worker-03 192.168.1.11:443 check

frontend openshift-app-http
bind *:80
default_backend openshift-app-http
mode tcp
option tcplog

backend openshift-app-http
balance source
mode tcp
server worker-01 192.168.1.9:80 check
server worker-02 192.168.1.10:80 check
server worker-03 192.168.1.11:80 check

frontend master-api
bind *:6443
default_backend master-api-be
mode tcp
option tcplog

backend master-api-be
balance roundrobin
mode tcp
server bootstrap 192.168.1.5:6443 check
server master-01 192.168.1.6:6443 check
server master-02 192.168.1.7:6443 check
server master-03 192.168.1.8:6443 check

frontend master-api-2
bind *:22623
default_backend master-api-2-be
mode tcp
option tcplog

backend master-api-2-be
balance roundrobin
mode tcp
server bootstrap 192.168.1.5:22623 check
server master-01 192.168.1.6:22623 check
server master-02 192.168.1.7:22623 check
server master-03 192.168.1.8:22623 check
EOF
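Before starting the service you can have HAProxy validate the configuration file itself. A minimal sketch (skipped if haproxy is not installed yet):

```shell
# 'haproxy -c' parses the configuration and reports any errors without
# starting the daemon.
RESULT="skipped"
if command -v haproxy >/dev/null 2>&1; then
    if haproxy -c -f /etc/haproxy/haproxy.cfg; then RESULT="valid"; else RESULT="invalid"; fi
fi
echo "haproxy.cfg check: $RESULT"
```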

Note that the bootstrap server is only needed for the installation phase; once all the servers are booted (I will elaborate on this in Part 2) we need to remove the bootstrap entries from the load balancer and shut the bootstrap server down.

Next we need to open the ports in firewalld:

First, get your firewalld zone:

$ export FIREWALLD_DEFAULT_ZONE=$(firewall-cmd --get-default-zone)
$ echo ${FIREWALLD_DEFAULT_ZONE}
public

Open the relevant ports:

$ firewall-cmd --add-port 22623/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 6443/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service https --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service http --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 9000/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}

Now let's reload and list the open ports:

$ firewall-cmd --reload
$ firewall-cmd --list-ports

If you have SELinux enabled, starting haproxy will fail because port 22623 is not in the allowed port list for haproxy. To avoid that we need to put SELinux in permissive mode, start haproxy, and then generate an SELinux module for it:

Switch to permissive mode:

$ setenforce 0

Start and enable the haproxy service:

$ systemctl start haproxy
$ systemctl enable haproxy

Now generate the SELinux module using audit2allow and apply it:

$ yum install -y policycoreutils-python
$ cat /var/log/audit/audit.log | audit2allow -M haproxy
$ semodule -i haproxy.pp

Now we can switch SELinux back to “enforcing”:

$ setenforce 1
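As an alternative to the generated module, many RHEL policy versions ship a haproxy_connect_any boolean that lets HAProxy bind and connect on arbitrary ports; whether your policy includes it is an assumption you can verify with getsebool first. A sketch:

```shell
# Enable the boolean only if it exists in the loaded policy; -P makes
# the change persistent across reboots.
APPLIED="no"
if command -v getsebool >/dev/null 2>&1 \
        && getsebool haproxy_connect_any >/dev/null 2>&1; then
    setsebool -P haproxy_connect_any 1 && APPLIED="yes"
fi
echo "haproxy_connect_any applied: $APPLIED"
```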

keepalived

For the keepalived part there are a few small differences between our two (2) servers. First, let's install it on both servers:

$ yum install -y keepalived

Determine the interface for use with the services:

$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
link/ether 00:50:56:a1:ab:11 brd ff:ff:ff:ff:ff:ff

Generate a random password for Keepalived's AUTH_PASS:

$ uuidgen
3e879f74-6b0b-4097-89a3-c2fccdf522ef

Now let's start with server number 1 (haproxy-01), which will also be our MASTER server.

The keepalived configuration should look as follows:

$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
router_id ovp_vrrp
}

vrrp_script haproxy_check {
script "killall -0 haproxy"
interval 2
weight 2
}

vrrp_instance OCP_LB {
state MASTER
interface eth0
virtual_router_id 51
priority 100
virtual_ipaddress {
192.168.1.4
}
track_script {
haproxy_check
}
authentication {
auth_type PASS
auth_pass 3e879f74-6b0b-4097-89a3-c2fccdf522ef
}
}
EOF

On haproxy-02 the file should look the same except for two (2) lines:

$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
router_id ovp_vrrp
}

vrrp_script haproxy_check {
script "killall -0 haproxy"
interval 2
weight 2
}

vrrp_instance OCP_LB {
state BACKUP
interface eth0
virtual_router_id 51
priority 98
virtual_ipaddress {
192.168.1.4
}
track_script {
haproxy_check
}
authentication {
auth_type PASS
auth_pass 3e879f74-6b0b-4097-89a3-c2fccdf522ef
}
}
EOF

The state of this server should be BACKUP and its priority should be lower than the master's to avoid a split brain, which is why we gave it 98.

If you are running a firewall (by means of firewalld or iptables), you must allow VRRP traffic to pass between the keepalived nodes. To configure the firewall to allow the VRRP traffic with firewalld, run the following commands:

$ firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
$ firewall-cmd --reload

Next, start the service on both nodes:

$ systemctl enable keepalived; systemctl start keepalived
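Once keepalived is running on both nodes you can check which one currently owns the VIP; the holder lists 192.168.1.4 on eth0 while the BACKUP node does not. A quick sketch:

```shell
# Look for the VIP among this node's IPv4 addresses; grep -q just sets
# the exit status, which we record in HOLDS.
HOLDS="no"
ip -4 addr show 2>/dev/null | grep -qF '192.168.1.4' && HOLDS="yes"
echo "this node holds the VIP: $HOLDS"
```

Stopping haproxy on the MASTER node should move the VIP to haproxy-02 within a few seconds, thanks to the haproxy_check track script.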

PXE Server

The objective of a PXE server is to enable “boot from LAN” for servers. We are going to use this technology to deploy RHCOS to our servers (the servers should have no OS, and the boot order should be “HD” first, then “LAN”; that ensures a single boot on a successful deployment).

DHCPD

A PXE setup must start with a DHCP server, so we are going to deploy and configure one on our bastion server:

$ yum install dhcp

Next let's configure the dhcpd.conf file to hand out addresses on our network segment (VLAN).

NOTE!!

At this point you need to know in advance the IP range you were given to work with, or in some cases the IPs for your servers, which will work only in reservation mode.
You also need to make sure this is the only DHCP server in its VLAN and that the “IP helper” on the switches/routers points to your DHCP server's address.

The following is a good example of a dhcpd.conf file:

$ cat > /etc/dhcp/dhcpd.conf << EOF
#
# VLAN ...(192.168.1.0/24)
#
subnet 192.168.1.0 netmask 255.255.255.0 {
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option routers 192.168.1.254;
option domain-name "example.com";
option ntp-servers ntp.example.com;
option domain-name-servers 192.168.2.1, 192.168.2.2;
option time-offset 1;
next-server bastion.example.com;
filename "pxelinux.0";
}
group openshift4 {
host master-01 {
hardware ethernet 18:66:da:cc:aa:02;
fixed-address 192.168.1.6;
option host-name "master-01.example.com";
}
....
}
EOF

Now we need to start dhcpd and make sure it runs at boot time:

$ systemctl start dhcpd && systemctl enable dhcpd
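dhcpd can test-parse its configuration before you start the service, which catches typos early. A sketch (skipped when the dhcp package is not installed):

```shell
# 'dhcpd -t' parses the file named by -cf and exits non-zero on a
# syntax error, without binding to any interface.
RESULT="skipped"
if command -v dhcpd >/dev/null 2>&1; then
    if dhcpd -t -cf /etc/dhcp/dhcpd.conf; then RESULT="valid"; else RESULT="invalid"; fi
fi
echo "dhcpd.conf check: $RESULT"
```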

TFTP Server

The TFTP server can be installed with the following command; the xinetd package is necessary as well:

$ yum install tftp tftp-server xinetd -y

Next let's configure the TFTP server to run when the xinetd service starts:

$ cat > /etc/xinetd.d/tftp << EOF
# default: off
# description: The tftp server serves files using the trivial file transfer \
# protocol. The tftp protocol is often used to boot diskless \
# workstations, download configuration files to network-aware printers, \
# and to start the installation process for some operating systems.
service tftp
{
socket_type = dgram
protocol = udp
wait = yes
user = root
server = /usr/sbin/in.tftpd
server_args = -c -s /var/lib/tftpboot
disable = no
per_source = 11
cps = 100 2
flags = IPv4
}
EOF

We made two (2) modifications to the original file:

  1. Set disable to no.
  2. Added the -c option to server_args, which is needed if you want to upload files to the TFTP server from a client.

Next we want to enable a Linux boot from our PXE server. To achieve that we will install the syslinux-tftpboot package and start xinetd:

$ yum install -y syslinux-tftpboot
$ systemctl start xinetd && systemctl enable xinetd

For the firewall we need to open UDP port 69:

$ firewall-cmd --add-port 69/udp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --reload

For SELinux, ONLY if you want clients to be able to write to the server, set the following booleans:

$ setsebool -P tftp_anon_write 1
$ setsebool -P tftp_home_dir 1
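With xinetd running and UDP 69 open, you can confirm the TFTP server actually serves pxelinux.0 by fetching it back. A sketch using curl's tftp:// support (an assumption: your curl build includes the TFTP protocol); run it from any host on the VLAN:

```shell
# A successful TFTP transfer writes the file and exits 0; the short
# connect timeout keeps the check from hanging off-network.
STATUS="skipped"
if command -v curl >/dev/null 2>&1; then
    if curl -s --connect-timeout 3 -o /tmp/pxelinux.0 \
            tftp://bastion.example.com/pxelinux.0; then
        STATUS="ok"
    else
        STATUS="failed"
    fi
fi
echo "tftp fetch: $STATUS"
```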

httpd

In our scenario we want the bare-metal image of Red Hat CoreOS and the Ignition files to be available over HTTP.
We will install Apache httpd and use its public directory to publish the files:

$ yum install httpd

Now we will create the directories we are going to use during the installation, then start the service:

$ mkdir -p /var/www/html/pub/{pxe,ign}
$ systemctl start httpd && systemctl enable httpd
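Once httpd is up and running, you can verify the pub/ directory answers over HTTP. A quick sketch using the bastion name from our zone file:

```shell
# A HEAD request is enough to prove the directory is served; -s keeps
# curl quiet and the timeout keeps the check fast when off-network.
STATUS="skipped"
if command -v curl >/dev/null 2>&1; then
    if curl -sI --connect-timeout 3 http://bastion.example.com/pub/ >/dev/null; then
        STATUS="reachable"
    else
        STATUS="unreachable"
    fi
fi
echo "http check: $STATUS"
```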

Once all the steps are completed we are ready to continue to the installation (in Part 2).

Open Source contributor for the past 15 years