OpenShift 4 in an Air Gap (disconnected) environment (Part 1 — prerequisites)

The journey

Recently I completed an OpenShift 4 installation in a completely isolated (air-gapped) environment.
After talking with my peers and the people who helped me with it, I found that the journey I took to get the installation working (and eventually the cluster running) is one worth telling.
This is the first part of that journey; here we will go over the preparation and the infrastructure setup so the installation itself will be easy to run.

Thanks

First I would like to thank Anoel Yakoubov and Ron Meshulam for helping me with the process; I wouldn't have been able to complete it without you guys.

Infrastructure

First, before we begin the walkthrough, I would like to talk about the end design of how the cluster should look; the following diagram explains it:

The installation process (which I explain in more detail in Part 2) needs the infrastructure to be set up in advance, which consists of:

  1. A deployment server, referred to as the “bastion” server, which will run a DHCP, a TFTP and an HTTP server for a PXE installation.
  2. DNS Server -> A DNS server with all the records pre-configured (I will go into more detail later in this tutorial).
  3. Load Balancer -> Two load balancers in an Active/Passive setup (I will use HAproxy and Keepalived in my example).
    The same load balancers will also serve our applications, so we will direct traffic to the worker nodes as well on the required ports.

NOTE!!

The bastion server will also run the container registry. Since the registry is a major component of the installation and because this is a disconnected (air-gapped) environment, we will handle the registry during the installation part (Part 2).

DNS Server

There are a number of different scenarios:

  1. The IT DHCP server provides random IP addresses for our OCP 4 VMs and additional options such as DNS; all relevant records are managed in the IT DNS, which also resolves Internet addresses.
  2. The IT DHCP server provides random (but reserved) IP addresses for our OCP 4 VMs and additional options such as DNS, but we manage our own DNS zone for the OCP 4 cluster. In this case we need to ask the IT DNS administrator to configure a slave zone on the IT DNS server, with our DNS server as the master for our zone. All resolution of Internet addresses is still done by the IT DNS servers.
  3. The IT administrator provides us with a dedicated VLAN and we manage both the DNS and the DHCP for this VLAN.
    Our DNS server is the master for our DNS zone and the organization DNS holds it as a slave zone. All other DNS queries are resolved by the organization DNS servers.

In this tutorial I will focus on scenario #3, which requires the most configuration management on our side but contains the configuration for scenarios #1 & #2 as well (basically you can take only the parts relevant to your environment).

DNS install

NOTE

Our cluster domain will consist of <cluster name>.<domain prefix>, so for our tutorial we will call our cluster “ocp4” and our domain will be “example.com”.

In this tutorial we are making an effort to preserve resources and “Keep It Simple”, so in our environment the bastion server can host the DNS as well.
If you have access to another DNS server where you can perform administrative actions, you can skip ahead to the configuration section. If not, we will install the DNS server on our bastion server:

$ yum install bind

Next we want to make sure the DNS server forwards to the IT DNS and that it allows queries on the address we assigned to it:

$ cat >> /etc/named.conf << EOF
zone "example.com" in {
type master;
file "example.com.zone";
};
EOF

Also make sure you set the server's IP address in listen-on and the IP addresses of the IT DNS servers in forwarders:

$ vi /etc/named.conf
options {
....
listen-on port 53 { 192.168.1.1; };
....
forwarders {
192.168.2.1;
192.168.2.2;
};
....
};

This will enable our environment to resolve all of our DNS requests.
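Before restarting named it is worth letting BIND validate the file. A minimal sketch, assuming the bind package (which ships named-checkconf) is installed; the check is skipped otherwise:

```shell
# named-checkconf parses /etc/named.conf and prints the offending line
# on a syntax error; it is silent on success.
RESULT="skipped"
if command -v named-checkconf >/dev/null 2>&1; then
    if named-checkconf /etc/named.conf; then RESULT="ok"; else RESULT="error"; fi
fi
echo "named.conf check: $RESULT"
```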

Now we need to create the zone (which, as we mentioned in the configuration, is a file).
The default file location for bind is /var/named, so to avoid mistakes make sure you write the zone file exactly where we specified it in named.conf:

$ cat > /var/named/example.com.zone << EOF
$TTL 14400
@ 1D IN SOA ns.example.com. hostmaster.example.com. (
2020022306 ; serial
3H ; refresh
15 ; retry
1w ; expire
3h ; nxdomain ttl
)
IN NS ns.example.com.
$ORIGIN example.com.
ns IN A 192.168.1.1
hostmaster IN A 192.168.1.1
ntp IN A 192.168.2.1
registry IN A 192.168.1.1
bastion IN A 192.168.1.1
haproxy-01 IN A 192.168.1.2
haproxy-02 IN A 192.168.1.3
vip-01 IN A 192.168.1.4
ocp4 IN A 192.168.1.4
bootstrap IN A 192.168.1.5
master-01 IN A 192.168.1.6
master-02 IN A 192.168.1.7
master-03 IN A 192.168.1.8
worker-01 IN A 192.168.1.9
worker-02 IN A 192.168.1.10
worker-03 IN A 192.168.1.11
$ORIGIN ocp4.example.com.
control-plane-0 IN A 192.168.1.6
control-plane-1 IN A 192.168.1.7
control-plane-2 IN A 192.168.1.8
etcd-0 IN A 192.168.1.6
etcd-1 IN A 192.168.1.7
etcd-2 IN A 192.168.1.8
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-0
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-1
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-2
ocp4-bootstrap IN A 192.168.1.5
bootstrap-0 IN A 192.168.1.5
api IN A 192.168.1.4
api-int IN A 192.168.1.4
$ORIGIN apps.ocp4.example.com.
* IN A 192.168.1.4
EOF

DNS zone file

Without going into too much DNS configuration: we are basically setting our domain prefix with the $ORIGIN directive and, from that point on, adding the relevant records.
All of the records are used at some point during the installation, so make sure you double-check them before we continue.

Make sure the serial number at the top follows the date-plus-two-digit-revision convention (2020022306 is the sixth change on 2020-02-23) and restart the named service.

$ systemctl restart named
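The serial in the zone above follows the common YYYYMMDDnn convention: the date plus a two-digit revision. A small sketch that builds today's serial, plus a couple of optional dig spot-checks (dig comes from bind-utils; the queries are only meaningful once named is answering on 192.168.1.1):

```shell
# Build a YYYYMMDDnn zone serial for today, revision 01.
SERIAL="$(date +%Y%m%d)01"
echo "zone serial: $SERIAL"

# Optional spot-checks once named is up (skipped if dig is missing);
# +time/+tries keep the queries from hanging on an unreachable server.
if command -v dig >/dev/null 2>&1; then
    dig +short +time=1 +tries=1 api.ocp4.example.com @192.168.1.1
    dig +short +time=1 +tries=1 -t srv _etcd-server-ssl._tcp.ocp4.example.com @192.168.1.1
fi
```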

Load Balancer (highly available)

A typical deployment of OpenShift Container Platform (just like in our tutorial) has multiple masters and workers. In this configuration there is no single point of failure for the cluster, unless only a single load balancer (haproxy) server is configured to load balance cluster traffic.

HAproxy load balances port socket connections to a pool of masters and workers. The following discusses the process of adding a second HAproxy server to an existing OpenShift deployment. This configures the environment into a highly available cluster using Keepalived. Keepalived is routing software written in C that establishes a floating virtual IP address using Virtual Router Redundancy Protocol (VRRP) that can belong to any node in a cluster. For more information regarding Keepalived: http://www.keepalived.org

The following image describes the High Availability architecture:

haproxy

For haproxy we will run the same commands on both servers to make sure the configurations are identical. But first we will install the package:

$ yum install haproxy
$ cat > /etc/haproxy/haproxy.cfg << EOF
# Global settings
#---------------------------------------------------------------------
global
maxconn 20000
log /dev/log local0 info
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 300s
timeout server 300s
timeout http-keep-alive 10s
timeout check 10s
maxconn 20000

listen stats
bind :9000
mode http
stats enable
stats uri /

frontend openshift-app-https
bind *:443
default_backend openshift-app-https
mode tcp
option tcplog

backend openshift-app-https
balance source
mode tcp
server worker-01 192.168.1.9:443 check
server worker-02 192.168.1.10:443 check
server worker-03 192.168.1.11:443 check

frontend openshift-app-http
bind *:80
default_backend openshift-app-http
mode tcp
option tcplog

backend openshift-app-http
balance source
mode tcp
server worker-01 192.168.1.9:80 check
server worker-02 192.168.1.10:80 check
server worker-03 192.168.1.11:80 check

frontend master-api
bind *:6443
default_backend master-api-be
mode tcp
option tcplog

backend master-api-be
balance roundrobin
mode tcp
server bootstrap 192.168.1.5:6443 check
server master-01 192.168.1.6:6443 check
server master-02 192.168.1.7:6443 check
server master-03 192.168.1.8:6443 check

frontend master-api-2
bind *:22623
default_backend master-api-2-be
mode tcp
option tcplog

backend master-api-2-be
balance roundrobin
mode tcp
server bootstrap 192.168.1.5:22623 check
server master-01 192.168.1.6:22623 check
server master-02 192.168.1.7:22623 check
server master-03 192.168.1.8:22623 check
EOF
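Before starting the service you can have HAProxy validate the configuration file itself. A minimal sketch (skipped if haproxy is not installed yet):

```shell
# 'haproxy -c' parses the configuration and reports any errors without
# starting the daemon.
RESULT="skipped"
if command -v haproxy >/dev/null 2>&1; then
    if haproxy -c -f /etc/haproxy/haproxy.cfg; then RESULT="valid"; else RESULT="invalid"; fi
fi
echo "haproxy.cfg check: $RESULT"
```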

Note that the bootstrap server is only needed for the installation phase; once all the servers are booted (I will elaborate on this in Part 2) we need to remove the bootstrap entries from the load balancer and shut the bootstrap server down.

Next we need to open the ports in firewalld:

First, get your firewalld zone:

$ export FIREWALLD_DEFAULT_ZONE=$(firewall-cmd --get-default-zone)
$ echo ${FIREWALLD_DEFAULT_ZONE}
public

Open the relevant ports:

$ firewall-cmd --add-port 22623/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 6443/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service https --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service http --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 9000/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}

Now let's reload and list the open ports:

$ firewall-cmd --reload
$ firewall-cmd --list-ports

If you have SELinux enabled, starting haproxy will fail because port 22623 is not in the allowed port list for haproxy. To avoid that we need to put SELinux in permissive mode, start haproxy, and then generate an SELinux module for it:

Switch to permissive mode:

$ setenforce 0

Start and enable the haproxy service:

$ systemctl start haproxy
$ systemctl enable haproxy

Now generate the SELinux module using audit2allow and apply it:

$ yum install -y policycoreutils-python
$ cat /var/log/audit/audit.log | audit2allow -M haproxy
$ semodule -i haproxy.pp

Now we can switch SELinux back to “enforcing”:

$ setenforce 1
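As an alternative to the generated module, many RHEL policy versions ship a haproxy_connect_any boolean that lets HAProxy bind and connect on arbitrary ports; whether your policy includes it is an assumption you can verify with getsebool first. A sketch:

```shell
# Enable the boolean only if it exists in the loaded policy; -P makes
# the change persistent across reboots.
APPLIED="no"
if command -v getsebool >/dev/null 2>&1 \
        && getsebool haproxy_connect_any >/dev/null 2>&1; then
    setsebool -P haproxy_connect_any 1 && APPLIED="yes"
fi
echo "haproxy_connect_any applied: $APPLIED"
```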

keepalived

For the keepalived part there are a few small differences between our two (2) servers. First, let's install it on both servers:

$ yum install -y keepalived

Determine the interface for use with the services:

$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
link/ether 00:50:56:a1:ab:11 brd ff:ff:ff:ff:ff:ff

Generate a random password for Keepalived's AUTH_PASS:

$ uuidgen
3e879f74-6b0b-4097-89a3-c2fccdf522ef

Now let's start with server number 1 (haproxy-01), which will also be our MASTER server.

The keepalived configuration should look as follows:

$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
router_id ovp_vrrp
}

vrrp_script haproxy_check {
script "killall -0 haproxy"
interval 2
weight 2
}

vrrp_instance OCP_LB {
state MASTER
interface eth0
virtual_router_id 51
priority 100
virtual_ipaddress {
192.168.1.4
}
track_script {
haproxy_check
}
authentication {
auth_type PASS
auth_pass 3e879f74-6b0b-4097-89a3-c2fccdf522ef
}
}
EOF

On haproxy-02 the file should look the same except for two (2) lines:

$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
router_id ovp_vrrp
}

vrrp_script haproxy_check {
script "killall -0 haproxy"
interval 2
weight 2
}

vrrp_instance OCP_LB {
state BACKUP
interface eth0
virtual_router_id 51
priority 98
virtual_ipaddress {
192.168.1.4
}
track_script {
haproxy_check
}
authentication {
auth_type PASS
auth_pass 3e879f74-6b0b-4097-89a3-c2fccdf522ef
}
}
EOF

The state of this server should be BACKUP and its priority should be lower than the master's to avoid a split brain, which is why we gave it 98.

If you are running a firewall (by means of firewalld or iptables), you must allow VRRP traffic to pass between the keepalived nodes. To configure the firewall to allow the VRRP traffic with firewalld, run the following commands:

$ firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
$ firewall-cmd --reload

Next, start the service on both nodes:

$ systemctl enable keepalived; systemctl start keepalived
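Once keepalived is running on both nodes you can check which one currently owns the VIP; the holder lists 192.168.1.4 on eth0 while the BACKUP node does not. A quick sketch:

```shell
# Look for the VIP among this node's IPv4 addresses; grep -q just sets
# the exit status, which we record in HOLDS.
HOLDS="no"
ip -4 addr show 2>/dev/null | grep -qF '192.168.1.4' && HOLDS="yes"
echo "this node holds the VIP: $HOLDS"
```

Stopping haproxy on the MASTER node should move the VIP to haproxy-02 within a few seconds, thanks to the haproxy_check track script.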

PXE Server

The objective of a PXE server is to enable “boot from LAN” for servers. We are going to use this technology to deploy RHCOS to our servers (the servers should have no OS, and the boot order should be “HD” first, then “LAN”; that ensures a single boot on a successful deployment).

DHCPD

A PXE setup must start with a DHCP server, so we are going to deploy and configure one on our bastion server:

$ yum install dhcp

Next let's configure the dhcpd.conf file to hand out addresses on our network segment (VLAN).

NOTE!!

At this point you need to know in advance the IP range you were given to work with, or in some cases the IPs for your servers, which will work only in reservation mode.
You also need to make sure this is the only DHCP server in its VLAN and that the “IP helper” on the switches/routers points to your DHCP server's address.

The following is a good example of a dhcpd.conf file:

$ cat > /etc/dhcp/dhcpd.conf << EOF
#
# VLAN ...(192.168.1.0/24)
#
subnet 192.168.1.0 netmask 255.255.255.0 {
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option routers 192.168.1.254;
option domain-name "example.com";
option ntp-servers ntp.example.com;
option domain-name-servers 192.168.2.1, 192.168.2.2;
option time-offset 1;
next-server bastion.example.com;
filename "pxelinux.0";
}
group openshift4 {
host master-01 {
hardware ethernet 18:66:da:cc:aa:02;
fixed-address 192.168.1.6;
option host-name "master-01.example.com";
}
....
}
EOF

Now we need to start dhcpd and make sure it runs at boot time:

$ systemctl start dhcpd && systemctl enable dhcpd
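dhcpd can test-parse its configuration before you start the service, which catches typos early. A sketch (skipped when the dhcp package is not installed):

```shell
# 'dhcpd -t' parses the file named by -cf and exits non-zero on a
# syntax error, without binding to any interface.
RESULT="skipped"
if command -v dhcpd >/dev/null 2>&1; then
    if dhcpd -t -cf /etc/dhcp/dhcpd.conf; then RESULT="valid"; else RESULT="invalid"; fi
fi
echo "dhcpd.conf check: $RESULT"
```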

TFTP Server

The TFTP server can be installed with the following command; the xinetd package is necessary as well:

$ yum install tftp tftp-server xinetd -y

Next let's configure the TFTP server to run when the xinetd service starts:

$ cat > /etc/xinetd.d/tftp << EOF
# default: off
# description: The tftp server serves files using the trivial file transfer \
# protocol. The tftp protocol is often used to boot diskless \
# workstations, download configuration files to network-aware printers, \
# and to start the installation process for some operating systems.
service tftp
{
socket_type = dgram
protocol = udp
wait = yes
user = root
server = /usr/sbin/in.tftpd
server_args = -c -s /var/lib/tftpboot
disable = no
per_source = 11
cps = 100 2
flags = IPv4
}
EOF

We made two (2) modifications to the original file:

  1. Set disable to no.
  2. Added the -c option to server_args, which is needed if you want to upload files to the TFTP server from a client.

Next we want to enable a Linux boot from our PXE server. To achieve that we will install the syslinux-tftpboot package and start xinetd:

$ yum install -y syslinux-tftpboot
$ systemctl start xinetd && systemctl enable xinetd

For the firewall we need to open UDP port 69:

$ firewall-cmd --add-port 69/udp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --reload

For SELinux, ONLY if you want clients to be able to write to the server, set the following booleans:

$ setsebool -P tftp_anon_write 1
$ setsebool -P tftp_home_dir 1
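With xinetd running and UDP 69 open, you can confirm the TFTP server actually serves pxelinux.0 by fetching it back. A sketch using curl's tftp:// support (an assumption: your curl build includes the TFTP protocol); run it from any host on the VLAN:

```shell
# A successful TFTP transfer writes the file and exits 0; the short
# connect timeout keeps the check from hanging off-network.
STATUS="skipped"
if command -v curl >/dev/null 2>&1; then
    if curl -s --connect-timeout 3 -o /tmp/pxelinux.0 \
            tftp://bastion.example.com/pxelinux.0; then
        STATUS="ok"
    else
        STATUS="failed"
    fi
fi
echo "tftp fetch: $STATUS"
```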

httpd

In our scenario we want the bare-metal image of Red Hat CoreOS and the Ignition files to be available over HTTP.
We will install Apache httpd and use its public directory to publish the files:

$ yum install httpd

Now we will create the directories we are going to use during the installation, then start the service:

$ mkdir -p /var/www/html/pub/{pxe,ign}
$ systemctl start httpd && systemctl enable httpd
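Once httpd is up and running, you can verify the pub/ directory answers over HTTP. A quick sketch using the bastion name from our zone file:

```shell
# A HEAD request is enough to prove the directory is served; -s keeps
# curl quiet and the timeout keeps the check fast when off-network.
STATUS="skipped"
if command -v curl >/dev/null 2>&1; then
    if curl -sI --connect-timeout 3 http://bastion.example.com/pub/ >/dev/null; then
        STATUS="reachable"
    else
        STATUS="unreachable"
    fi
fi
echo "http check: $STATUS"
```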

Once all the steps are completed we are ready to continue to the installation (in Part 2).

Open Source contributor for the past 15 years