Deploying OKD in a Disconnected (Air-Gapped) Environment

Oren Oichman
29 min read · May 22, 2020


About This Tutorial

I know there are a lot of IT folks out there who want to showcase the benefits of working with OpenShift 4, as opposed to working with bare Kubernetes and building everything around it, and who have to work in a disconnected environment.
Well… this one is for you (from scratch to running).

What is OKD

OKD is a version of OpenShift, but it is only a snapshot in time of the current running upstream release of OpenShift… no SLA, no support, and most of all… no upgrade guarantee. That makes it great for playing with the product itself, maybe developing a few things (Operators, etc…), but definitely NOT for your production environment.

In General

Basically, we are going to deploy the tools we need on a connected server (preferably CentOS 7/8), bundle them all together, and then deploy everything we need in our disconnected environment.
The steps are:

  1. Setting the Server
  2. Building and running the registry
  3. Mirroring the packages that we need
  4. Tar (or 7zip) the files we need
  5. Getting everything to the disconnected environment
  6. Make sure the infrastructure is ready for the deployment
  7. Deploying the OKD cluster

Preparations

tools

On our CentOS system in the connected environment, first install the necessary tools:

$ yum install -y jq ftp openssl irssi p7zip curl wget tftp buildah telnet podman httpd-tools tcpdump nmap net-tools screen tmux bind-utils nfs-utils sg3_utils nmap-ncat rlwrap uucp openldap-clients xorg-x11-xauth wireshark unix2dos unixODBC policycoreutils-python-utils vim-*

I know you don't need some of these tools, but I prefer having them in case you need them for testing or advanced administration.

Registry

First let's create a base directory for the registry on the external server.
For the purpose of this document I will refer to this server as "external".

On the external server run the following commands:

$ mkdir /opt/registry
$ export REGISTRY_BASE="/opt/registry"

Now let's create the directories we need for the registry and for everything we will want to take to the internal server:

$ mkdir -p ${REGISTRY_BASE}/{auth,certs,data,downloads}
$ mkdir -p ${REGISTRY_BASE}/downloads/{images,tools,secrets}

A simple but tricky part: we want to call the registry by the same name we will use in the internal LAN, but we probably do not want to expose our internal domain on an external server, so we will use the hostname rather than the FQDN.

We will edit the /etc/hosts file of the external server and add the "registry" record to it:

$ vi /etc/hosts
127.0.0.1 registry

From now on our registry will be named “registry”.

Now let's start a screen (or tmux) session, so that if we get disconnected our session will continue:

$ screen -S ocp
$ export REGISTRY_BASE="/opt/registry"

Our registry will need to work over SSL, so we have 2 choices: the long way (with a certificate request) or the short way (a self-signed certificate).
I will pick the short way, because other than the sync itself we are not going to use this certificate (if you do plan to keep using it, I would prefer the long way).

$ cd ${REGISTRY_BASE}/certs/
$ cat >csr_answer.txt << EOF
[req]
default_bits = 4096
prompt = no
default_md = sha256
distinguished_name = dn
[ dn ]
C=US
ST=New York
L=New York
O=MyOrg
OU=MyOU
emailAddress=me@working.me
CN = registry
EOF

Change the values under the dn section as you see fit (here it does not really matter except for the CN).
Now let's generate the self-signed certificate:

$ openssl req -newkey rsa:4096 -nodes -sha256 -keyout domain.key -x509 -days 1825 -out domain.crt -config <( cat csr_answer.txt )

The output of this command will be 2 new files which we will use for our registry’s SSL certificate:

$ ls -al
total 20
drwxr-xr-x. 2 root root 4096 Jan 8 13:49 .
drwxr-xr-x. 7 root root 4096 Jan 8 09:57 ..
-rw-r--r--. 1 root root 175 Jan 8 13:48 csr_answer.txt
-rw-r--r--. 1 root root 1972 Jan 8 13:49 domain.crt
-rw-r--r--. 1 root root 3272 Jan 8 13:49 domain.key

Generate a username and password (the registry requires bcrypt-formatted passwords) for access to your registry.

$ htpasswd -bBc ${REGISTRY_BASE}/auth/htpasswd admin admin

This tutorial assumes that you are running SELinux and firewalld on a secured server.
At this point we will make sure the port is open with the firewall-cmd tool:

$ FIREWALLD_DEFAULT_ZONE=`firewall-cmd --get-default-zone`
$ echo ${FIREWALLD_DEFAULT_ZONE}
public

My output is "public", but it can be "dmz", "internal", or another zone for you.

Make sure to open port 5000 on your host, as this is the default port for the registry.

$ firewall-cmd --add-port=5000/tcp --zone=${FIREWALLD_DEFAULT_ZONE} --permanent
$ firewall-cmd --reload

Now you're ready to run the container. Here I specify the directories I want to mount inside the container, that it should listen on port 5000, and that it should run in daemon mode.
I recommend you put this in a shell script under ${REGISTRY_BASE}/downloads/tools so it will be easy to run it again on the internal server:

$ echo 'podman run --name my-registry -d -p 5000:5000 \
-v ${REGISTRY_BASE}/data:/var/lib/registry:z \
-v ${REGISTRY_BASE}/auth:/auth:z -e "REGISTRY_AUTH=htpasswd" \
-e "REGISTRY_AUTH_HTPASSWD_REALM=Registry" \
-e "REGISTRY_HTTP_SECRET=ALongRandomSecretForRegistry" \
-e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
-v ${REGISTRY_BASE}/certs:/certs:z \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
-e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
docker.io/library/registry:2' > ${REGISTRY_BASE}/downloads/tools/start_registry.sh

I am using "echo" with single quotes here instead of a "cat" heredoc because I want to preserve the variables, unexpanded, within the command.
The reason is to allow us to select a different base directory for our internal registry.

Now change the file permissions and run it:

$ chmod a+x ${REGISTRY_BASE}/downloads/tools/start_registry.sh
$ ${REGISTRY_BASE}/downloads/tools/start_registry.sh

Make sure the external server trusts the certificate so it can validate the registry:

$ cp ${REGISTRY_BASE}/certs/domain.crt /etc/pki/ca-trust/source/anchors/
$ update-ca-trust extract

Verify connectivity to your registry with curl. Provide it the username and password you created.

$ curl -u admin:admin https://registry:5000/v2/_catalog 
{"repositories":[]}

This should return an "empty" catalog for now.

All we need to do now is generate a pull-secret JSON file for our registry, because we are protecting it with a username and password (admin/admin).

First we will generate a base64 encoding of the username and password:

$ REG_SECRET=`echo -n 'admin:admin' | base64 -w0`

Now we will write it into a JSON file in the format the oc command expects to read (the content is identical to a .docker/config.json file):

$ cd ${REGISTRY_BASE}/downloads/secrets/
$ echo '{ "auths": {}}' | jq '.auths += {"registry:5000": {"auth": "REG_SECRET","email": "me@working.me"}}' | sed "s/REG_SECRET/$REG_SECRET/" | jq -c .> pull-secret-registry.json

To make sure the file is in correct JSON format, pipe it to jq:

$ cat pull-secret-registry.json | jq
{
  "auths": {
    "registry:5000": {
      "auth": "YWRtaW46YWRtaW4=",
      "email": "me@working.me"
    }
  }
}

syncing the repository

In order to start the sync we first need to download the "oc" client and use it for mirroring the content.
To download the latest oc client and openshift-install binaries, you need to use an existing version of the oc client.

Download the 4.4 version of the oc client from the OKD releases page. Example:

$ cd ${REGISTRY_BASE}/downloads/tools/
$ wget https://github.com/openshift/okd/releases/download/4.4.0-0.okd-2020-01-28-022517/openshift-client-linux-4.4.0-0.okd-2020-01-28-022517.tar.gz -P ${REGISTRY_BASE}/downloads/tools/

Extract the okd version of the oc client:

$ tar -zxvf openshift-client-linux-4.4.0-0.okd-2020-01-28-022517.tar.gz

The latest releases are available on https://origin-release.svc.ci.openshift.org/.

$ ./oc adm release extract --tools registry.svc.ci.openshift.org/origin/release:4.4

This downloads two tar.gz files, one containing a new version of the oc client and one containing the openshift-install binary.

Extract the new version of the oc client and kubectl, move them to /opt/okd/bin, link them with alternatives, and verify the version:

$ rm -f kubectl oc openshift-install-linux-4.4.0-0.okd-2020-04-16-024151.tar.gz README.md release.txt sha256sum.txt
$ rm -f openshift-client-linux-4.4.0-0.okd-2020-01-28-022517.tar.gz
$ rm -f openshift-install-*
$ tar -zxvf openshift-client-*
$ mkdir /opt/okd/
$ mkdir /opt/okd/bin
$ cp oc kubectl /opt/okd/bin
$ alternatives --install /usr/bin/oc oc /opt/okd/bin/oc 10
$ alternatives --install /usr/bin/kubectl kubectl /opt/okd/bin/kubectl 10

Now we have the latest OKD version of "oc".

mirroring

The mirroring process is very similar to an OpenShift 4 mirrored deployment; what we need now is to sync the repository.

First let's export our variables:

$ export LOCAL_REGISTRY='registry:5000'
$ export OCP_RELEASE="4.4"
$ export LOCAL_REPO='origin/release'
$ export PRODUCT_REPO='origin'
$ export LOCAL_SECRET_JSON="${REGISTRY_BASE}/downloads/secrets/pull-secret-registry.json"
$ export RELEASE_NAME="release"

Now we can start the sync process to our local registry:

$ oc adm -a ${LOCAL_SECRET_JSON} release mirror \
--from=registry.svc.ci.openshift.org/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE} \
--to=${LOCAL_REGISTRY}/${LOCAL_REPO} \
--to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPO}:${OCP_RELEASE} \
2>&1 | tee ${REGISTRY_BASE}/downloads/secrets/mirror-output.txt

Keep the output file; it will come in handy for the internal deployment.
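
At the end of that output oc also prints an "imageContentSources" snippet that is meant to go into install-config.yaml; we will append it to the file later on. Based on the registry name and repository we used above it should look roughly like this (a sketch; your exact "source" entries may differ, so always take them from your own mirror-output.txt):

imageContentSources:
- mirrors:
  - registry:5000/origin/release
  source: registry.svc.ci.openshift.org/origin/release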

openshift-install

For the last step we need to generate the openshift-install binary so it will look for our registry during the deployment:

$ cd ${REGISTRY_BASE}/downloads/tools/
$ oc adm -a ${LOCAL_SECRET_JSON} release extract --command=openshift-install "${LOCAL_REGISTRY}/${LOCAL_REPO}:${OCP_RELEASE}"

Now we can create our install-config.yaml file, which will be needed for the installation process. The reason we are doing it now is to save a few typos and to make sure we take everything we need from the internet to our air-gapped environment.

NOTE!!!

The file name must be "install-config.yaml".
This is the file our installation command expects to read from.
This is how the file should look:

$ cd ${REGISTRY_BASE}/downloads/tools
$ cat > install-config.yaml << EOF
apiVersion: v1
baseDomain: example.com
controlPlane:
  name: master
  hyperthreading: Disabled
  replicas: 3
compute:
- name: worker
  hyperthreading: Disabled
  replicas: 3
metadata:
  name: test-cluster
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 172.18.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: ''
sshKey: ''
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <...base-64-encoded CA certificate>
  -----END CERTIFICATE-----
EOF

That is all we need for now; the rest we will generate from the output of our mirroring command, from our internal CA certificate, and from our SSH public key.

Save the registry

Now that the tools are set, we can save the registry image so it will be available in the disconnected environment:

$ podman stop my-registry

$ podman rm --force my-registry
$ podman save docker.io/library/registry:2 -o ${REGISTRY_BASE}/downloads/images/registry.tar

IMAGES

To download the images we need to go to the Fedora CoreOS website at https://getfedora.org/en/coreos/ and select the download tab at the top right of the screen.

Then click on the "Bare Metal & Virtualized" tab in the middle and download the following files (I am running a PXE/network install in my example):

the ISO, the PXE artifacts (kernel and initramfs), and the raw.xz file.

Make sure you save all the images under "${REGISTRY_BASE}/downloads/images/". You can use wget as shown in the example:

$ wget https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/31.20200323.3.2/x86_64/fedora-coreos-31.20200323.3.2-live.x86_64.iso -P ${REGISTRY_BASE}/downloads/images/

Generating the 7zip files

Now that we have all the files in place we can generate an archive from the files and transport it to our internal bastion Server.

It is much easier to move the archive if we split it with 7zip.

Go to a directory with at least 15 GB of storage available:

$ cd /home

Now create the 7z archive files:

$ 7za a -t7z -v1g -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on okd-registry.7z ${REGISTRY_BASE}

That will generate files of 1 GB size containing the registry directory.

You should see something like this:

$ ls -al
total 7167720
drwxr-xr-x. 3 root root 206 Apr 16 15:03 .
dr-xr-xr-x. 17 root root 224 Apr 16 09:59 ..
-rw-r--r--. 1 root root 1073741824 Apr 16 15:09 okd-registry.7z.001
-rw-r--r--. 1 root root 1073741824 Apr 16 14:41 okd-registry.7z.002
-rw-r--r--. 1 root root 1073741824 Apr 16 14:47 okd-registry.7z.003
-rw-r--r--. 1 root root 1073741824 Apr 16 14:52 okd-registry.7z.004
-rw-r--r--. 1 root root 1073741824 Apr 16 14:58 okd-registry.7z.005
-rw-r--r--. 1 root root 1073741824 Apr 16 15:03 okd-registry.7z.006
-rw-r--r--. 1 root root 897293356 Apr 16 15:09 okd-registry.7z.007
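
Before copying the parts over, it is worth testing that the archive set is complete and readable (a quick check; 7za automatically picks up the remaining .002, .003, … parts when pointed at the first one):

$ 7za t okd-registry.7z.001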

Disconnected Environment

In our disconnected environment we will deploy a bastion server with DHCP, Apache, and TFTP for PXE boot, and another 2 servers to act as a load balancer with HAproxy and keepalived.

The order will be :

  1. DNS zone
  2. HAproxy with keepalived.
  3. The registry deployment
  4. PXE Configuration
  5. Complete The OKD installation.

DNS

In our deployment the bastion server will also be the DNS server.
You will need to make sure ALL the records I am adding here are present in your DNS server (A and SRV).

NOTE
Our cluster domain will consist of <cluster name>.<domain prefix>, so for this tutorial we will call our cluster "ocp4" and our domain will be "example.com".

In this tutorial we are making an effort to preserve resources and "Keep It Simple", so for our environment the bastion server can hold the DNS as well.
If you have access to another DNS server where you can run administrative actions, you can skip ahead to the zone configuration. If not, we will install the DNS server on our bastion server:

$ yum install bind

Next we want to add our zone, make sure our DNS forwards to the IT DNS servers, and allow queries on the address we assigned to it:

$ cat >> /etc/named.conf << EOF
zone "example.com" in {
        type master;
        file "example.com.zone";
};
EOF

Also make sure you set the server's IP address in listen-on and the IP addresses of the IT DNS servers in forwarders:

$ vi /etc/named.conf
options {
        ....
        listen-on port 53 { 192.168.1.1; };
        ....
        forwarders {
                192.168.2.1;
                192.168.2.2;
        };
        ....
};

This will enable our environment to resolve all of our DNS requests.

Now we need to create the zone (which, as mentioned in the configuration, is a file).
The default file location for bind is under /var/named, so to avoid mistakes make sure you write the zone file exactly where we specified it in the named.conf file:

$ cat > /var/named/example.com.zone << EOF
$TTL 14400
@ 1D IN SOA ns.example.com. hostmaster.example.com. (
2020022306 ; serial
3H ; refresh
15 ; retry
1w ; expire
3h ; nxdomain ttl
)
IN NS ns.example.com.
$ORIGIN example.com.
ns IN A 192.168.1.1
hostmaster IN A 192.168.1.1
ntp IN A 192.168.2.1
registry IN A 192.168.1.1
bastion IN A 192.168.1.1
haproxy-01 IN A 192.168.1.2
haproxy-02 IN A 192.168.1.3
vip-01 IN A 192.168.1.4
ocp4 IN A 192.168.1.4
bootstrap IN A 192.168.1.5
master-01 IN A 192.168.1.6
master-02 IN A 192.168.1.7
master-03 IN A 192.168.1.8
worker-01 IN A 192.168.1.9
worker-02 IN A 192.168.1.10
worker-03 IN A 192.168.1.11
$ORIGIN ocp4.example.com.
control-plane-0 IN A 192.168.1.6
control-plane-1 IN A 192.168.1.7
control-plane-2 IN A 192.168.1.8
etcd-0 IN A 192.168.1.6
etcd-1 IN A 192.168.1.7
etcd-2 IN A 192.168.1.8
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-0
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-1
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-2
ocp4-bootstrap IN A 192.168.1.5
bootstrap-0 IN A 192.168.1.5
api IN A 192.168.1.4
api-int IN A 192.168.1.4
$ORIGIN apps.ocp4.example.com.
* IN A 192.168.1.4
EOF

Without going into too much DNS configuration: we are basically setting our domain prefix with the $ORIGIN pointer and, from that point on, adding the relevant records.
All the records are used at some point during the installation, so make sure you double-check them before we continue.

Make sure the serial number at the top follows the date-plus-index convention (YYYYMMDDNN, e.g. 2020022306), then restart the named service:

$ systemctl restart named
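
Before moving on, open DNS in the local firewall, enable the service on boot, and run a quick resolution test against the new zone (a minimal sanity check; the expected answer comes straight from the zone file above, assuming the bastion listens on 192.168.1.1 as configured):

$ firewall-cmd --add-service=dns --permanent
$ firewall-cmd --reload
$ systemctl enable named
$ dig +short @192.168.1.1 api.ocp4.example.com
192.168.1.4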

HAproxy and keepalived.

For the rest of this tutorial I will assume that your servers are already deployed with RHEL/CentOS 8 and have access to the ISO disks (or to a yum repo based on the ISO).

Load Balancer (highly available)
A typical deployment of OpenShift Container Platform (just like in our tutorial) has multiple masters and workers. In this configuration there is no single point of failure for the cluster, unless there is only a single load balancer (haproxy) server configured to load balance cluster traffic.

HAproxy load balances port socket connections to a pool of masters and workers. The following discusses the process of adding a second HAproxy server to an existing OpenShift deployment. This configures the environment into a highly available cluster using Keepalived. Keepalived is routing software written in C that establishes a floating virtual IP address using Virtual Router Redundancy Protocol (VRRP) that can belong to any node in a cluster. For more information regarding Keepalived: http://www.keepalived.org

The high-availability architecture consists of the two HAproxy servers sharing a single virtual IP, managed by Keepalived, which serves as the entry point to the cluster.

haproxy

For haproxy we will run the same commands on both servers to make sure the configurations are identical. But first we will install the package:

$ yum install haproxy
$ cat > /etc/haproxy/haproxy.cfg << EOF
# Global settings
#---------------------------------------------------------------------
global
    maxconn     20000
    log         /dev/log local0 info
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          300s
    timeout server          300s
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 20000

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /

frontend openshift-app-https
    bind *:443
    default_backend openshift-app-https
    mode tcp
    option tcplog

backend openshift-app-https
    balance source
    mode tcp
    server worker-01 192.168.1.9:443 check
    server worker-02 192.168.1.10:443 check
    server worker-03 192.168.1.11:443 check

frontend openshift-app-http
    bind *:80
    default_backend openshift-app-http
    mode tcp
    option tcplog

backend openshift-app-http
    balance source
    mode tcp
    server worker-01 192.168.1.9:80 check
    server worker-02 192.168.1.10:80 check
    server worker-03 192.168.1.11:80 check

frontend master-api
    bind *:6443
    default_backend master-api-be
    mode tcp
    option tcplog

backend master-api-be
    balance roundrobin
    mode tcp
    server bootstrap 192.168.1.5:6443 check
    server master-01 192.168.1.6:6443 check
    server master-02 192.168.1.7:6443 check
    server master-03 192.168.1.8:6443 check

frontend master-api-2
    bind *:22623
    default_backend master-api-2-be
    mode tcp
    option tcplog

backend master-api-2-be
    balance roundrobin
    mode tcp
    server bootstrap 192.168.1.5:22623 check
    server master-01 192.168.1.6:22623 check
    server master-02 192.168.1.7:22623 check
    server master-03 192.168.1.8:22623 check
EOF

Note that the bootstrap server is only needed for the installation phase; once all the servers are booted (I will elaborate on this in part 2) we need to remove the bootstrap server from the load balancer and shut it down.

Next we will need to open the ports on our firewalld:

First get your firewalld zone:

$ export FIREWALLD_DEFAULT_ZONE=`firewall-cmd --get-default-zone`
$ echo ${FIREWALLD_DEFAULT_ZONE}
public

Open the relevant ports :

$ firewall-cmd --add-port 22623/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 6443/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service https --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service http --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-port 9000/tcp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}

Now lets reload and see the ports:

$ firewall-cmd --reload
$ firewall-cmd --list-ports

If you have SELinux enabled, haproxy will fail to start because port 22623 is not in its list of allowed ports. To avoid that we need to put SELinux in permissive mode, start haproxy, and then generate an SELinux module for it:

Switch to permissive mode:

$ setenforce 0

Start the haproxy service:

$ systemctl start haproxy
$ systemctl enable haproxy

Now generate the SELinux module using audit2allow and apply it:

$ yum install -y policycoreutils-python
$ cat /var/log/audit/audit.log | audit2allow -M haproxy
$ semodule -i haproxy.pp

Now we can switch SELinux back to "enforcing":

$ setenforce 1
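
A quick sanity check that haproxy is still running under enforcing mode and listening on its frontends (a minimal sketch; port 9000 is the stats page we configured above, so a 200 response means the service is serving):

$ ss -tlnp | grep haproxy
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9000/
200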

keepalived

For the keepalived part there are a few small differences between our two (2) servers. First let's install it on both servers:

# yum install -y keepalived

Determine the interface for use with the services:

$ ip link show
1: lo: LOOPBACK,UP,LOWER_UP mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
link/ether 00:50:56:a1:ab:11 brd ff:ff:ff:ff:ff:ff

Generate a random external password for Keepalived’s AUTH_PASS:

$ uuidgen
3e879f74-6b0b-4097-89a3-c2fccdf522ef

Set it into a variable (use the same value on both haproxy servers, since the auth_pass must match):

$ export KA_PASS=`uuidgen`
$ echo $KA_PASS
591ab0e4-f84d-420c-bc28-36c24edd381f

Now let's start with server number 1 (haproxy-01), which will also be our MASTER server.

The keepalived configuration should look as follows.

Set the virtual IP:

$ export MY_VIP=192.168.1.4

and now add it to keepalived.conf:

$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
   router_id ovp_vrrp
}

vrrp_script haproxy_check {
   script "killall -0 haproxy"
   interval 2
   weight 2
}

vrrp_instance OCP_LB {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        ${MY_VIP}
    }
    track_script {
        haproxy_check
    }
    authentication {
        auth_type PASS
        auth_pass ${KA_PASS}
    }
}
EOF

On haproxy-02 the file should look the same except for two (2) lines:

$ cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
   router_id ovp_vrrp
}

vrrp_script haproxy_check {
   script "killall -0 haproxy"
   interval 2
   weight 2
}

vrrp_instance OCP_LB {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 98
    virtual_ipaddress {
        192.168.1.4
    }
    track_script {
        haproxy_check
    }
    authentication {
        auth_type PASS
        auth_pass ${KA_PASS}
    }
}
EOF

The state of this server should be BACKUP and its priority should be lower than the master's to avoid split brain, so we gave it 98.

If you are running a firewall (by means of firewalld or iptables), you must allow VRRP traffic to pass between the keepalived nodes. To configure the firewall to allow the VRRP traffic with firewalld, run the following commands:

$ firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
$ firewall-cmd --reload

Next, start the service on both nodes:

$ systemctl enable keepalived && systemctl start keepalived
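
On haproxy-01 (the MASTER) you should now see the virtual IP attached to the interface (a quick check, assuming eth0 and the VIP we exported above; stopping haproxy on this node should make the address move over to haproxy-02 within a few seconds). The output should contain something like:

$ ip addr show eth0 | grep 192.168.1.4
    inet 192.168.1.4/32 scope global eth0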

Bastion

First let's make sure we have all the tools we need, just like on the external server (some packages come from EPEL, so you can obtain them manually or just skip them):

$ yum install -y jq ftp openssl weechat p7zip curl wget tftp telnet podman httpd-tools tcpdump nmap net-tools screen tmux bind-utils nfs-utils sg3_utils nmap-ncat rlwrap uucp openldap-clients xorg-x11-xauth wireshark unix2dos unixODBC policycoreutils-python-utils vim-*

The registry

Now, to deploy the registry, all we need to do is go to a directory where we have 15 GB of storage available and unzip the archive:

$ 7za x okd-registry.7z.001

The command will create a directory named "registry" with all the content from the external server. I placed it under /opt, which is the same location as on the external server, so I will export the REGISTRY_BASE variable again:

$ export REGISTRY_BASE=/opt/registry

Before I restart the registry I will regenerate the SSL certificate, this time signed by an organizational CA (I will create a custom CA in my example).

$ cd ${REGISTRY_BASE}/certs/

Generate answer files for the CA and for the certificate :

$ cat > csr_answer.txt << EOF
[req]
default_bits = 4096
prompt = no
default_md = sha256
x509_extensions = req_ext
req_extensions = req_ext
distinguished_name = dn

[ dn ]
C=US
ST=New York
L=New York
O=MyOrg
OU=MyOrgUnit
emailAddress=me@working.me
CN = registry

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = registry
DNS.2 = registry.example.com
EOF

And for the CA:

$ cat > csr_ca.txt << EOF
[req]
default_bits = 4096
prompt = no
default_md = sha256
distinguished_name = dn
x509_extensions = usr_cert

[ dn ]
C=US
ST=New York
L=New York
O=MyOrg
OU=MyOU
emailAddress=me@working.me
CN = server.example.com

[ usr_cert ]
basicConstraints=CA:TRUE
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
EOF

Now we will generate the RSA keys, the CA, and the certificate request:

$ openssl genrsa -out ca.key 4096
$ openssl req -new -x509 -key ca.key -days 1825 -out ca.crt -config <( cat csr_ca.txt )
$ openssl genrsa -out domain.key 4096
$ openssl req -new -key domain.key -out domain.csr -config <( cat csr_answer.txt )

Sign the CSR
Now comes the tricky part: we need to tell the CA to use the alt_names we set up in the answer file, and we need to tell it which section holds those values, so we are going to add 2 more arguments for this purpose.

$ openssl x509 -req -in domain.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out domain.crt -days 730 -extensions 'req_ext' -extfile <(cat csr_answer.txt)

As you can see there are 2 more arguments:

  1. -extensions : the section of the config file with the X509v3 extensions to add
  2. -extfile : the configuration file with the X509v3 extensions to add

Only when we team up those 2 options does our CA sign the certificate with our alternative DNS names.

Certificate bundle

In some cases it is good practice to join the certificate and the CA into a single file (not all servers have a CA configuration option).

$ mv domain.crt domain-certonly.crt
$ cat domain-certonly.crt ca.crt > domain.crt

Testing the Certificate

Now all that is left to do is to test our certificate :

$ openssl x509 -in domain.crt -noout -text | grep DNS
DNS:registry, DNS:registry.example.com

Now update the server to trust the CA:

$ cp ca.crt /etc/pki/ca-trust/source/anchors/registry.crt
$ update-ca-trust extract

To start the registry we first need to load it:

$ podman load -i ${REGISTRY_BASE}/downloads/images/registry.tar

NOTE
In some cases you will need to re-tag the image (for example, if the image ID is 708bc6af7e5e):

$ podman tag 708bc6af7e5e docker.io/library/registry:2

Make sure the firewall is open:

$ export FIREWALLD_DEFAULT_ZONE=`firewall-cmd --get-default-zone`
$ firewall-cmd --add-port=5000/tcp --zone=${FIREWALLD_DEFAULT_ZONE} --permanent
$ firewall-cmd --reload

And start it :

$ ${REGISTRY_BASE}/downloads/tools/start_registry.sh

Now make sure the registry is running :

$ podman ps --format json | jq '.[0]' | jq '.Names'+'.Status'
"my-registryUp 1 minute ago"

For a final touch make sure you see the repository we just synced :

$ curl -u admin:admin https://registry:5000/v2/_catalog 
{"repositories":["origin/release"]}

PXE

The objective of a PXE server is to enable "boot from LAN" for servers. We are going to use this technology to deploy Fedora CoreOS to our servers (the servers should have no OS installed, and the boot order should be "HD" first and then "LAN"; that ensures a single network boot in a successful deployment).

DHCPD

A PXE server must start with a DHCP server so we are going to deploy and configure one on our bastion Server:

$ yum install dhcp

Next let's configure the dhcpd.conf file to serve addresses to our network segment (VLAN).

NOTE!!
At this point you will need to know in advance the allowed IP range you have been given to work with, or in some cases the IPs of your servers, which will work only in reservation mode.
You also need to make sure that this is the only DHCP server in its VLAN and that the "ip helper" on the switches/routers is directed to your DHCP server's IP address.

The following is a good example of a dhcpd.conf file:

$ cat > /etc/dhcp/dhcpd.conf << EOF
#
# VLAN ...(192.168.1.0/24)
#
subnet 192.168.1.0 netmask 255.255.255.0 {
  option subnet-mask 255.255.255.0;
  option broadcast-address 192.168.1.255;
  option routers 192.168.1.254;
  option domain-name "example.com";
  option ntp-servers ntp.example.com;
  option domain-name-servers 192.168.2.1, 192.168.2.2;
  option time-offset 1;
  next-server bastion.example.com;
  filename "pxelinux.0";
}

group openshift4 {
  host master-01 {
    hardware ethernet 18:66:da:cc:aa:02;
    fixed-address 192.168.1.6;
    option host-name "master-01.example.com";
  }
  ....
}
EOF
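
Before handing out leases it is worth validating the configuration and starting the service (a short sketch; dhcpd -t only parses the file, and the "dhcp" firewalld service opens UDP port 67):

$ dhcpd -t -cf /etc/dhcp/dhcpd.conf
$ systemctl enable --now dhcpd
$ firewall-cmd --add-service=dhcp --permanent
$ firewall-cmd --reload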

tftp Server
The TFTP server can be installed using the following command; xinetd is required to run it.

In order to install run the following command :

yum install tftp tftp-server xinetd -y

Next, let's configure the tftp server to run when the xinetd service starts:

$ cat > /etc/xinetd.d/tftp << EOF
# default: off
# description: The tftp server serves files using the trivial file transfer \
# protocol. The tftp protocol is often used to boot diskless \
# workstations, download configuration files to network-aware printers, \
# and to start the installation process for some operating systems.
service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -c -s /var/lib/tftpboot
        disable         = no
        per_source      = 11
        cps             = 100 2
        flags           = IPv4
}
EOF

We made 2 modifications to the original file:

  1. Set disable to no.
  2. Added the -c option to server_args, which is needed if you want to upload files to the TFTP server from a client.

Next we want to enable a Linux boot from our PXE server. To achieve that we will install the syslinux-tftpboot package:

$ yum install -y syslinux-tftpboot
$ systemctl start xinetd && systemctl enable xinetd

For the firewall we need to open UDP port 69 (the tftp service):

$ export FIREWALLD_DEFAULT_ZONE=`firewall-cmd --get-default-zone`
$ echo ${FIREWALLD_DEFAULT_ZONE}
public
$ firewall-cmd --add-service=tftp --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --reload

For SELinux, ONLY if you want clients to be able to write to the TFTP server, you need to set the following booleans:

$ setsebool -P tftp_anon_write 1
$ setsebool -P tftp_home_dir 1
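
A quick way to confirm the TFTP side works end to end is to pull pxelinux.0 back through the tftp client we installed earlier (a minimal check, assuming syslinux-tftpboot placed pxelinux.0 under /var/lib/tftpboot):

$ cd /tmp
$ tftp 127.0.0.1 -c get pxelinux.0
$ ls -l pxelinux.0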

httpd
In our scenario we want the bare-metal image of Fedora CoreOS and the ignition files to be available over HTTP.
We will install Apache httpd and use its public directory to publish the files:

$ yum install httpd -y

Now we will create the directories we are going to use in the installation :

$ mkdir /var/www/html/pub
$ mkdir /var/www/html/pub/{pxe,ign}

Let's copy the kernel and the initramfs to the tftpboot directory and the raw.xz file to the pxe directory:

$ mkdir /var/lib/tftpboot/coreos/
$ cp ${REGISTRY_BASE}/downloads/images/fedora-coreos-31.20200323.3.2-live-initramfs.x86_64.img /var/lib/tftpboot/coreos/fcos-initramfs.x86_64.img
$ cp ${REGISTRY_BASE}/downloads/images/fedora-coreos-31.20200323.3.2-live-kernel-x86_64 /var/lib/tftpboot/coreos/fcos-kernel-x86_64
$ cp ${REGISTRY_BASE}/downloads/images/fedora-coreos-31.20200323.3.2-metal.x86_64.raw.xz /var/www/html/pub/pxe/fedora-coreos-metal.x86_64.raw.xz
$ cp ${REGISTRY_BASE}/downloads/images/fedora-coreos-31.20200323.3.2-metal.x86_64.raw.xz.sig /var/www/html/pub/pxe/fedora-coreos-metal.x86_64.raw.xz.sig

For SELinux, make sure the raw.xz file has the httpd context it needs so Apache is allowed to serve it:

$ cd /var/www/html/pub/pxe/
$ ls -lZ fedora-coreos-metal.x86_64.raw.xz
-rw-r--r--. root root unconfined_u:object_r:httpd_sys_content_t:s0 fedora-coreos-metal.x86_64.raw.xz

If for any reason it is not, you need to modify it:

$ semanage fcontext -a -t httpd_sys_content_t "/var/www/html(/.*)?"
$ restorecon -R -v /var/www/html

And last but not least… start the service:

$ systemctl start httpd && systemctl enable httpd

And the firewall:

$ export FIREWALLD_DEFAULT_ZONE=`firewall-cmd --get-default-zone`
$ echo ${FIREWALLD_DEFAULT_ZONE}
public
$ firewall-cmd --add-service=http --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --add-service=https --permanent --zone=${FIREWALLD_DEFAULT_ZONE}
$ firewall-cmd --reload

Machine by MAC address

Now we need to create (or obtain, if it is a bare metal installation) 7 machines which will be split into:

3 masters

3 workers

1 bootstrap server

Once the machines are created, obtain their MAC addresses and add them to the dhcpd.conf file (see the example above); then we will link them to the appropriate server type.

In the tftpboot directory we will create the necessary files to make sure each server boots with its own type.
Go to the /var/lib/tftpboot/ directory, create a new directory named pxelinux.cfg (PXELINUX looks for its configuration there), and switch into it:

$ cd /var/lib/tftpboot/
$ mkdir pxelinux.cfg
$ cd pxelinux.cfg

Now, inside pxelinux.cfg, we will create 3 files, one for each type:

$ touch master worker bootstrap

And now we will update the files accordingly.
For the bootstrap :

$ cat > bootstrap << EOF
DEFAULT pxeboot
TIMEOUT 5
PROMPT 0
LABEL pxeboot
KERNEL coreos/fcos-kernel-x86_64
APPEND ip=dhcp rd.neednet=1 initrd=coreos/fcos-initramfs.x86_64.img coreos.inst=yes console=tty0 console=ttyS0 coreos.inst.insecure=yes coreos.inst.install_dev=/dev/sda coreos.inst.image_url=http://bastion.example.com/pub/pxe/fedora-coreos-metal.x86_64.raw.xz coreos.inst.ignition_url=http://bastion.example.com/pub/ign/bootstrap.ign
EOF

For the master :

$ cat > master << EOF
DEFAULT pxeboot
TIMEOUT 5
PROMPT 0
LABEL pxeboot
KERNEL coreos/fcos-kernel-x86_64
APPEND ip=dhcp rd.neednet=1 initrd=coreos/fcos-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst=yes coreos.inst.insecure=yes coreos.inst.install_dev=sda coreos.inst.image_url=http://bastion.example.com/pub/pxe/fedora-coreos-metal.x86_64.raw.xz coreos.inst.ignition_url=http://bastion.example.com/pub/ign/master.ign
EOF

And for the worker :

$ cat > worker << EOF
DEFAULT pxeboot
TIMEOUT 5
PROMPT 0
LABEL pxeboot
KERNEL coreos/fcos-kernel-x86_64
APPEND ip=dhcp rd.neednet=1 initrd=coreos/fcos-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst=yes coreos.inst.insecure=yes coreos.inst.install_dev=sda coreos.inst.image_url=http://bastion.example.com/pub/pxe/fedora-coreos-metal.x86_64.raw.xz coreos.inst.ignition_url=http://bastion.example.com/pub/ign/worker.ign
EOF

NOTE
Do not worry about the ign (ignition) files; we will create them in a minute.

Our last act here is to link each MAC address to the right profile file, using a file name that starts with 01- and uses dashes "-" instead of ":" in the MAC address.
I created a small bash function that does the work :

$ mac-pxe-update() {
ln -s $1 $(echo "$2" | sed 's/^/01-/g' | sed 's/:/-/g')
}

Now run the command where the machine type is the first argument and the mac is the second :

$ mac-pxe-update bootstrap <BOOTSTRAP MAC> #(and so on for all the MACS)
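
For example, using the master-01 MAC address from the dhcpd.conf above, the function creates a symlink named 01-18-66-da-cc-aa-02 pointing at the master profile, which is exactly the file name PXELINUX looks up at boot time:

$ mac-pxe-update master 18:66:da:cc:aa:02
$ ls -l 01-18-66-da-cc-aa-02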

The Installation

If everything is working up until this point then we are good to go and start the installation.

I recommend creating a new user rather than using root, mostly because it is best practice, and more specifically because the rest of the installation process does not require root privileges, so there is no need to use it.

$ useradd okd

Before we continue, I suggest making sure that our newly created user will be able to access everything it needs.
For that we will use Linux ACLs and a tool called alternatives to make sure everything we need is available.

ACL
The first thing we need to do is make sure our user "okd" has access to the directories (and sub-directories) we created for our infrastructure (read part 1 for more information).

So first let's grant read, write, and execute permissions on our download directory and make sure they will be the default for future directories:

$ setfacl -m u:okd:rwx -R ${REGISTRY_BASE}
$ setfacl -d -m u:okd:rwx -R ${REGISTRY_BASE}
$ setfacl -m u:okd:rwx -R /var/www/html/pub
$ setfacl -d -m u:okd:rwx -R /var/www/html/pub

Just like we did on the external server, we will put all the installation binaries in one directory and use alternatives to link them into /usr/bin:

$ mkdir /opt/okd/
$ mkdir /opt/okd/bin
$ cd ${REGISTRY_BASE}/downloads/tools/
$ cp oc kubectl openshift-install /opt/okd/bin
$ alternatives --install /usr/bin/oc oc /opt/okd/bin/oc 10
$ alternatives --install /usr/bin/kubectl kubectl /opt/okd/bin/kubectl 10
$ alternatives --install /usr/bin/openshift-install openshift-install /opt/okd/bin/openshift-install 10

Before we continue , let’s make sure that we are using the right openshift-install binary:

$ openshift-install version
openshift-install 4.4.0-0.okd-2020-04-16-052218
built from commit e0b9dedd751543fbc01066a3049ff000e60b1459
release image registry:5000/origin/release@sha256:c31e6e53dba66c80bcbb7e4f376fefed812260da4dec3bf66ed1e3de3ed28c62

If your result looks like mine, we are good to go.

Bash auto completion
To make our life easier, the tools are deployed with a set of templates that enable bash auto completion.
To generate the bash auto completion scripts, run the following commands:

$ yum install -y bash-completion.noarch bash-completion-extras.noarch
$ oc completion bash > /etc/bash_completion.d/oc
$ openshift-install completion bash > /etc/bash_completion.d/openshift-install
$ source ~/.bashrc

Install config

To get the installation running we need to switch to our new user and start a screen session:

$ su - okd
$ screen -S okd

Generate SSH key
One of the keys we need to add to the installation template is the public SSH key of the user that will be able to log in as the "core" user on our cluster servers.
To generate the key, run the ssh-keygen command:

$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

Now, under our new user's home directory, we will create a new directory (name it whatever you want; I used okd4) and switch to it:

$ mkdir okd4
$ cd okd4

Let's set a few variables that will come in handy.
If we want, we can capture the repository name into an environment variable which we will use later on:

$ INTERNAL_REPO_NAME=`curl -u admin:admin https://registry:5000/v2/_catalog | jq .repositories | grep origin`
#(or export INTERNAL_REPO_NAME="origin/release" )

You should see your repository in the variable :

$ echo $INTERNAL_REPO_NAME
"origin/release"

Optional
If you have other repositories and you see a comma at the end of the output, you can remove it with the following command:

$ INTERNAL_REPO_NAME=`echo ${INTERNAL_REPO_NAME} | sed 's/\,//'`

If you remember (and you do), we created a template for our install-config.yaml file in the "${REGISTRY_BASE}/downloads/tools/" directory, so let's copy it from there:

$ export REGISTRY_BASE=/opt/registry
$ cp ${REGISTRY_BASE}/downloads/tools/install-config.yaml ~/okd4/

If you are lazy like me and want to create the file quickly, just run the following command.

Create the install-config.yaml skeleton:

$ export CLUSTER_NAME="test-cluster"
$ export CLUSTER_DOMAIN="example.com"
$ cat > install-config.yaml << EOF
apiVersion: v1
baseDomain: ${CLUSTER_DOMAIN}
controlPlane:
  name: master
  hyperthreading: Disabled
  replicas: 3
compute:
- name: worker
  hyperthreading: Disabled
  replicas: 3
metadata:
  name: ${CLUSTER_NAME}
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 172.18.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
EOF

Now add the registry pull secret:

$ REG_SECRET=`echo -n 'admin:admin' | base64 -w0`
$ echo -n "pullSecret: '" >> install-config.yaml && echo '{ "auths": {}}' | jq '.auths += {"registry:5000": {"auth": "REG_SECRET","email": "me@working.me"}}' | sed "s/REG_SECRET/$REG_SECRET/" | jq -c . | sed "s/$/\'/g" >> install-config.yaml

Attach the SSH key:

$ echo -n "sshKey: '" >> install-config.yaml && cat ~/.ssh/id_rsa.pub | sed "s/$/\'/g" >> install-config.yaml

Add the registry CA:
(make sure you obtain the registry CA certificate and save it, PEM/base64 encoded, in a file named ca.crt)

$ echo "additionalTrustBundle: |" >> install-config.yaml $ cat ${REGISTRY_BASE}/certs/ca.crt | sed 's/^/\ \ \ \ \ /g' >> install-config.yaml

And finally, add the "imageContentSources" section:

$ cat ${REGISTRY_BASE}/downloads/secrets/mirror-output.txt | grep -A7 imageContentSources >> install-config.yaml

NOTE
After each command you can run cat on the file to see the result:

$ cat install-config.yaml

backup

There is a very good chance you will need to run this installation more than once. To save time, keep a backup of your install-config.yaml file in your home directory:

$ cp install-config.yaml ~/install-config.yaml.bck

The Installation begins

Generate the Kubernetes manifests for the cluster:

$ openshift-install create manifests --dir=./

Modify the manifests/cluster-scheduler-02-config.yml Kubernetes manifest file to prevent Pods from being scheduled on the control plane machines:
A. Open the manifests/cluster-scheduler-02-config.yml file.
B. Locate the mastersSchedulable parameter and set its value to false.
C. Save and exit the file.

Our next step is to generate the ignition files for the bootstrap, the masters, and the workers. To do that we run openshift-install with the following arguments:

$ openshift-install create ignition-configs --dir=./

After running this command you will see the auth folder, the metadata.json file, and the bootstrap.ign, master.ign, and worker.ign ignition files in the working directory.

Next we need to copy our ignition files to our Apache directory so they will be available over HTTP during the installation (we only need the .ign files; make sure they are readable by all):

$ cp *.ign /var/www/html/pub/ign/
$ chmod a+r /var/www/html/pub/ign/*.ign
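
It is worth confirming that the ignition files are actually reachable at the URLs we placed in the PXE profiles (a quick check from the bastion or any machine on the VLAN; a 200 response is what we want):

$ curl -sI http://bastion.example.com/pub/ign/bootstrap.ign | head -1
HTTP/1.1 200 OK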

PXE-Less Install

In case your PXE configuration does not work, or your environment does not allow you to run PXE, here are the steps you should take:

Boot the server from the live CD.

Make sure that the boot order is HD first and then CD-ROM, and wipe (or provide) a clean disk.

Once you boot the server, let the live CD boot until you reach the prompt.

Once it is booted, set up the IP address, or if your DHCP is running make sure you have connectivity (a ping to the default gateway should be sufficient).

Copy the ignition file to the running OS with curl:

$ curl -LO http://bastion/pub/ign/bootstrap.ign

And download the image:

$ curl -LO http://bastion/pub/pxe/fedora-coreos-metal.x86_64.raw.xz

Once both files are in your working directory, run the coreos-installer command:

$ sudo coreos-installer install /dev/sda -f fedora-coreos-metal.x86_64.raw.xz -i bootstrap.ign --insecure

Repeat this step for all the servers, each with its matching ignition file.

Booting the Servers
Make sure you are in BIOS mode and the boot order is HD first and then PXE (with the HD wiped).
Next, boot the machines in the following order:

  1. bootstrap
  2. masters
  3. workers

Bootstrap
A good and quick installation depends on the bootstrap server.
Once its boot has completed, wait 5 minutes and try a telnet connection on port 22623 from the bastion to the bootstrap:

$ telnet bootstrap 22623

Only if your connection is successful can we continue and boot the masters. If it is not, log in to the server and run journalctl:

$ ssh core@bootstrap "journalctl -xe"

and look for errors. Make sure the DNS is correct, the registry is up and running (and accessible from the other servers), and that there are no firewall/SELinux issues.
90% of the time, once the bootstrap is up and running the other servers should have no issue during their boot process.

Testing the registry
This is a very important point: make sure you are able to access your registry from the bootstrap server. This test will save you a lot of time (and frustration) later on.

$ curl -u admin:admin https://registry:5000/v2/_catalog

If everything goes well we can continue with the installation, so first we need to exit the bootstrap server:

$ exit

Openshift-install (continue)

Now we will first wait for the bootstrap stage to complete:

$ openshift-install --dir=./ wait-for bootstrap-complete --log-level debug

We will see output similar to this:

INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.example.com:6443...
INFO API v1.13.4+b626c2fe1 up
INFO Waiting up to 30m0s for the bootstrap-complete event…

You can follow the installation process on the bootstrap server.
I suggest you look for errors in the journal; they can be very informative and help you understand what went wrong:

$ ssh core@bootstrap "journalctl -xe"

After bootstrap process is complete, remove the bootstrap machine from the load balancer.

IMPORTANT
You must remove the bootstrap machine from the load balancer at this
point. You can also remove or reformat the machine itself !!!

Logging into the cluster

$ export KUBECONFIG=/home/okd/okd4/auth/kubeconfig
$ oc whoami
system:admin

Approving the CSRs for your machines

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself.
Confirm that the cluster recognizes the machines:

$ oc get node
NAME STATUS ROLES AGE VERSION
master-0 Ready master 63m v1.13.4+b626c2fe1
master-1 Ready master 63m v1.13.4+b626c2fe1
master-2 Ready master 64m v1.13.4+b626c2fe1
worker-0 NotReady worker 76s v1.13.4+b626c2fe1
worker-1 NotReady worker 70s v1.13.4+b626c2fe1

NOTE

If you only see the masters, that means you need to approve the CSRs for the worker nodes. Once you approve them, you will see the workers in a "NotReady" state at the beginning.
This is normal behavior.

Let's list the CSRs:

$ oc get csr
NAME AGE REQUESTOR CONDITION
csr-8b2br 15m system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending

If the CSRs were not approved automatically, then once all of the pending CSRs for the machines you added are in Pending status, approve the CSRs for your cluster machines.
To approve them individually, run the following command for each valid CSR:

$ oc adm certificate approve <csr_name>

If all the CSRs are valid, approve them all by running the following command:

$ oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

You may need to repeat this process several times until you see all the nodes as available.
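
If you prefer not to babysit this stage, a small loop that approves any pending CSRs every 30 seconds works well (a rough sketch; stop it with Ctrl+C once all the nodes show up as Ready):

$ while true; do
    oc get csr -ojson | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs --no-run-if-empty oc adm certificate approve
    sleep 30
  done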

Initial Operator configuration

$ watch -n5 oc get clusteroperators

In this phase you have to wait up to 15 minutes for all operators to reach the Available=True state.

Completing installation on user-provisioned infrastructure
After you complete the Operator configuration, you can finish installing the cluster on infrastructure that you provide.

Confirm that all cluster components are online

Completing the installation

Now, to complete the installation, run:

$ openshift-install --dir=./ wait-for install-complete 2>&1 | tee install-complete

This will output your console URL, along with the kubeadmin user and credentials to log in.

If you have any questions feel free to respond / leave a comment.
You can find me on LinkedIn at: https://www.linkedin.com/in/orenoichman
Or Twitter at: https://twitter.com/ooichman

Written by Oren Oichman

Open Source contributor for the past 15 years
