Building a Custom Ansible-Based Operator for OpenShift 4

Oren Oichman
Apr 1, 2020

About the Tutorial

For a long time, building a custom operator has been a difficult task for those of us who wanted to turn routine requests into an “on demand” procedure, so that customers can serve themselves without bouncing each request back and forth between themselves and the cluster admin.

As you may know, OpenShift 4 deployment is based on Operators. That made me look deeper into them, and I found that there is a very simple way to write an Operator ourselves with a simple Ansible role (I am referring to a very basic operator).

For this tutorial I expect you to have at least a basic knowledge and understanding of the Ansible k8s module, and to have used it before in several playbooks and roles.

External Sources

Before you go through this tutorial, I want to encourage you to check the following external resources. Although I try to be as self-explanatory as I can, the following links provide a deeper and more comprehensive understanding of the topics, and they helped me a lot in building this tutorial:

  1. about the operator-sdk : https://github.com/operator-framework/operator-sdk
  2. Deploying the operator-sdk : https://github.com/operator-framework/operator-sdk/blob/master/doc/user/install-operator-sdk.md
  3. Ansible operator tutorial : https://learn.openshift.com/ansibleop/ansible-operator-overview/
  4. Ansible Kubernetes module : https://docs.ansible.com/ansible/latest/modules/k8s_module.html

operator-sdk

This project is a component of the Operator Framework, an open source toolkit to manage Kubernetes native applications, called Operators, in an effective, automated, and scalable way. Read more in the introduction blog post.

Operators make it easy to manage complex stateful applications on top of Kubernetes. However writing an operator today can be difficult because of challenges such as using low level APIs, writing boilerplate, and a lack of modularity which leads to duplication.

The Operator SDK is a framework that uses the controller-runtime library to make writing operators easier by providing:

  • High level APIs and abstractions to write the operational logic more intuitively
  • Tools for scaffolding and code generation to bootstrap a new project fast
  • Extensions to cover common operator use cases

NOTE!!

In its current version, the operators generated by the operator-sdk (the CRD in particular) only work with OpenShift 4. To change the CRD to fit OpenShift 3.11, please find examples and modify the CRD accordingly.

Deploying

To get started we need to download the “operator-sdk” binary which will generate the template we need for the operator.

Download the release binary

Set the release version variable:

$ export RELEASE_VERSION=v0.16.0

Now download the GNU binary

$ curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu

Verify the downloaded release binary

$ curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu.asc

To verify a release binary using the provided asc files, place the binary and corresponding asc file into the same directory and use the corresponding command:

$ gpg --verify operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu.asc

If you do not have the maintainers public key on your machine, you will get an error message similar to this:

$ gpg --verify operator-sdk-${RELEASE_VERSION}-x86_64-apple-darwin.asc
gpg: assuming signed data in 'operator-sdk-${RELEASE_VERSION}-x86_64-apple-darwin'
gpg: Signature made Fri Apr 5 20:03:22 2019 CEST
gpg: using RSA key <KEY_ID>
gpg: Can't check signature: No public key

To download the key, use the following command, replacing $KEY_ID with the RSA key string provided in the output of the previous command:

$ gpg --recv-key "$KEY_ID"

You’ll need to specify a key server if one hasn’t been configured. For example:

$ gpg --keyserver keyserver.ubuntu.com --recv-key "$KEY_ID"

Now you should be able to verify the binary.

Install the release binary in your PATH

$ chmod +x operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
$ sudo mkdir -p /usr/local/bin/
$ sudo cp operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu /usr/local/bin/operator-sdk
$ rm -f operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu

Workflow

The SDK provides workflows to develop operators in Go, Ansible, or Helm. In this tutorial we will be focusing on Ansible.

The following workflow is for a new Ansible operator:

  1. Create a new operator project using the SDK Command Line Interface (CLI)
  2. Write the reconciling logic for your object using Ansible playbooks and roles
  3. Use the SDK CLI to build and generate the operator deployment manifests
  4. Optionally add additional CRDs using the SDK CLI and repeat steps 2 and 3

First, let’s create a project with our new tool. We’ll be building a Memcached Ansible Operator for the remainder of this tutorial:

$ operator-sdk new memcached-operator --type=ansible --api-version=cache.example.com/v1alpha1 --kind=Memcached

Now let’s look at the directory tree of our new project:

$ tree memcached-operator
memcached-operator
├── build
│   ├── Dockerfile
│   └── test-framework
│       ├── ansible-test.sh
│       └── Dockerfile
├── deploy
│   ├── crds
│   │   ├── cache_v1alpha1_memcached_crd.yaml
│   │   └── cache_v1alpha1_memcached_cr.yaml
│   ├── operator.yaml
│   ├── role_binding.yaml
│   ├── role.yaml
│   └── service_account.yaml
├── molecule
│   ├── default
│   │   ├── asserts.yml
│   │   ├── molecule.yml
│   │   ├── playbook.yml
│   │   └── prepare.yml
│   ├── test-cluster
│   │   ├── molecule.yml
│   │   └── playbook.yml
│   └── test-local
│       ├── molecule.yml
│       ├── playbook.yml
│       └── prepare.yml
├── roles
│   └── memcached
│       ├── defaults
│       │   └── main.yml
│       ├── files
│       ├── handlers
│       │   └── main.yml
│       ├── meta
│       │   └── main.yml
│       ├── README.md
│       ├── tasks
│       │   └── main.yml
│       ├── templates
│       └── vars
│           └── main.yml
└── watches.yaml
17 directories, 25 files

Now let’s change into the project directory:

$ cd memcached-operator

To speed development of our Operator up, we can reuse an existing Role. We will install a Role from Ansible Galaxy into our Operator:

dymurray.memcached_operator_role (galaxy.ansible.com)

Run the following command to install the Ansible Role inside the project, then list the roles directory:

$ ansible-galaxy install dymurray.memcached_operator_role -p ./roles
$ ls roles/
dymurray.memcached_operator_role memcached

Removing the unneeded Role

Since we’ll be reusing the logic from ‘dymurray.memcached_operator_role’, we can safely delete the placeholder Role generated by the ‘operator-sdk new’ command we ran previously.

$ rm -rf ./roles/memcached

Correcting the Watches File

By default, the memcached-operator watches Memcached resource events, as shown in watches.yaml, and executes the generated Ansible Role ‘memcached’.

Since we have swapped out the original Role for one from Ansible Galaxy, let’s change the watches file to reflect this:

---
- version: v1alpha1
  group: cache.example.com
  kind: Memcached
  role: /opt/ansible/roles/dymurray.memcached_operator_role
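
To make the mapping concrete, here is a minimal Python sketch of how an entry like the one above could be matched against an incoming resource. It is only an illustration: the `role_for` helper and the in-memory `WATCHES` list are hypothetical names and not part of the operator-sdk, which performs this matching internally.

```python
# Hypothetical, simplified model of watches.yaml: one entry per watched kind.
WATCHES = [
    {
        "version": "v1alpha1",
        "group": "cache.example.com",
        "kind": "Memcached",
        "role": "/opt/ansible/roles/dymurray.memcached_operator_role",
    },
]


def role_for(api_version, kind, watches=WATCHES):
    """Return the role path mapped to a resource, or None if it is not watched."""
    # "cache.example.com/v1alpha1" -> group "cache.example.com", version "v1alpha1"
    group, _, version = api_version.rpartition("/")
    for entry in watches:
        if (entry["group"], entry["version"], entry["kind"]) == (group, version, kind):
            return entry["role"]
    return None


print(role_for("cache.example.com/v1alpha1", "Memcached"))
```

A resource whose group, version, and kind match no entry simply has no role to run, which is why the watches file has to be updated whenever the role path changes.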

Build and run the Operator

Before running the Operator, Kubernetes needs to know about the new custom resource definition the Operator will be watching:

$ oc create -f deploy/crds/cache_v1alpha1_memcached_crd.yaml

By running this command, we are creating a new resource type, memcached, on the cluster. We will give our Operator work to do by creating and modifying resources of this type.

Ways to Run an Operator

Once the CRD is registered, there are two ways to run the Operator:

  • As a pod inside an OpenShift cluster
  • As a Go program outside the cluster using operator-sdk

For the sake of this tutorial, we will run the Operator as a pod inside an OpenShift cluster. If you are interested in learning more about running the Operator outside the cluster using operator-sdk, see the operator-sdk project linked under External Sources.

Building

Running as a pod inside an OpenShift cluster is preferred for production use.

Let’s build the memcached-operator image:

$ operator-sdk build memcached-operator:v0.0.1 --image-builder buildah

As you can see, I am using “buildah” as the container builder, so we need to make sure the buildah and podman packages are already installed.

Modifying the Operator deploy manifest

Kubernetes deployment manifests are generated by ‘operator-sdk new’ in deploy/operator.yaml. We need to make a few changes to this file.

  • image placeholder 'REPLACE_IMAGE' should be set to the previously-built image.
  • imagePullPolicy from 'Always' to 'Never' since we aren't pushing our image to a registry.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: memcached-operator
spec:
  # [...]
      containers:
        - name: memcached-operator
          # Replace 'REPLACE_IMAGE' with the built image name
          image: REPLACE_IMAGE
          # Replace 'Always' with 'Never'
          imagePullPolicy: Always
          # [...]

The commands below will change the Deployment ‘image’ and ‘imagePullPolicy’ respectively.

$ sed -i 's|{{ REPLACE_IMAGE }}|memcached-operator:v0.0.1|g' deploy/operator.yaml
$ sed -i "s|{{ pull_policy\|default('Always') }}|Never|g" deploy/operator.yaml
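
If you want to see exactly what those two substitutions do, the following Python sketch applies the same replacements to a string. It assumes, as the sed patterns imply, that the generated file contains the templated placeholders `{{ REPLACE_IMAGE }}` and `{{ pull_policy|default('Always') }}`. The `patch_manifest` function name and the two-line sample text are made up for illustration and are not the full generated file.

```python
def patch_manifest(text, image, pull_policy="Never"):
    """Replace the image and pull-policy placeholders in an operator.yaml string."""
    text = text.replace("{{ REPLACE_IMAGE }}", image)
    text = text.replace("{{ pull_policy|default('Always') }}", pull_policy)
    return text


# Assumed excerpt of the generated deploy/operator.yaml (not the full file).
sample = (
    'image: "{{ REPLACE_IMAGE }}"\n'
    "imagePullPolicy: \"{{ pull_policy|default('Always') }}\"\n"
)

print(patch_manifest(sample, "memcached-operator:v0.0.1"))
```

Plain string replacement is enough here because both placeholders are fixed literals; no regular expression is needed.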

Creating the Operator from deploy manifests

Now, we are ready to deploy the memcached-operator:

Create a Project for the Operator to run in

$ oc new-project tutorial

Create Service Account for Operator to run as

$ oc create -f deploy/service_account.yaml

Create OpenShift Role specifying Operator Permissions

$ oc create -f deploy/role.yaml

Create OpenShift Role Binding assigning Permissions to Service Account

$ oc create -f deploy/role_binding.yaml

Create Operator Deployment Object

$ oc create -f deploy/operator.yaml

Note: role.yaml and role_binding.yaml define the Operator’s RBAC permissions. Creating these resources requires elevated permissions.

Verify that the memcached-operator is running:

$ oc get deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
memcached-operator 1 1 1 1 1m

Using the Operator

Now that we have deployed our Operator, let’s create a CR and deploy an instance of memcached.

There is a sample CR in the scaffolding created as part of the Operator SDK:

apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: example-memcached
spec:
  # Add fields here
  size: 3

Let’s go ahead and apply this in our Tutorial project to deploy 3 memcached pods, using our Operator:

# deploy/crds/cache_v1alpha1_memcached_cr.yaml
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: example-memcached
spec:
  size: 3

Now run the “oc create” command:

$ oc create -f deploy/crds/cache_v1alpha1_memcached_cr.yaml

Ensure that the memcached-operator creates the deployment for the CR:

$ oc get deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
memcached-operator 1 1 1 1 2m
example-memcached 3 3 3 3 1m

Custom Variables

To pass ‘extra vars’ to the Playbooks/Roles being run by the Operator, you can embed key-value pairs in the ‘spec’ section of the Custom Resource (CR).

This is equivalent to how --extra-vars can be passed into the ansible-playbook command.

The CR snippet below shows two ‘extra vars’ (message and newParameter) being passed in via spec. Passing 'extra vars' through the CR allows for customization of Ansible logic based on the contents of each CR instance.

# Sample CR definition where some
# 'extra vars' are passed via the spec
apiVersion: "app.example.com/v1alpha1"
kind: "Database"
metadata:
  name: "example"
spec:
  message: "Hello world 2"
  newParameter: "newParam"

Accessing CR Fields

Now that you’ve passed ‘extra vars’ to your Playbook through the CR spec, we need to read them from the Ansible logic that makes up your Operator.

Variables passed in through the CR spec are made available at the top-level to be read from Jinja templates. For the CR example above, we could read the vars ‘message’ and ‘newParameter’ from a Playbook like so:

- debug:
    msg: "message value from CR spec: {{ message }}"

- debug:
    msg: "newParameter value from CR spec: {{ new_parameter }}"

Did you notice anything strange about the snippet above? The ‘newParameter’ variable that we set on our CR spec was accessed as ‘new_parameter’. Keep this automatic conversion from camelCase to snake_case in mind, as it will happen to all ‘extra vars’ passed into the CR spec.
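
To get a feel for the renaming, here is a small Python approximation of the camelCase to snake_case conversion. It is a simplified sketch, not the Operator's actual implementation: the `to_snake_case` name is hypothetical, and the real conversion may treat acronyms, digits, and other edge cases differently.

```python
import re


def to_snake_case(name):
    """Approximate camelCase -> snake_case conversion (simplified)."""
    # Insert an underscore before each uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


print(to_snake_case("newParameter"))  # new_parameter
print(to_snake_case("message"))       # message
```

Names that are already lowercase pass through unchanged, which is why ‘message’ can be read under the same name in both the CR and the Playbook.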

Refer to the next section for further info on reaching into the JSON structure exposed in the Ansible Operator runtime environment.

JSON Structure

When a reconciliation job runs, the content of the associated CR is made available as variables in the Ansible runtime environment.

The JSON below is an example of what gets passed into ansible-runner (the Ansible Operator runtime).

Note that vars added to the ‘spec’ section of the CR (‘message’ and ‘new_parameter’) are placed at the top-level of this structure for easy access.

{
  "meta": {
    "name": "<cr-name>",
    "namespace": "<cr-namespace>"
  },
  "message": "Hello world 2",
  "new_parameter": "newParam",
  "_app_example_com_database": {
    <Full CR>
  }
}
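
The shape of that structure can be reproduced with a short Python sketch. This is an assumption-laden illustration rather than the Operator's real code: `build_extra_vars` and `to_snake_case` are hypothetical helpers, the key for the full CR is derived here as the API group plus the lowercased kind to match the example above, and the ‘tutorial’ namespace is assumed from the project created earlier.

```python
import re


def to_snake_case(name):
    """Simplified camelCase -> snake_case conversion (approximation)."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def build_extra_vars(cr):
    """Build a dict shaped like the variables described for a reconciliation run."""
    group = cr["apiVersion"].split("/")[0]
    # e.g. "app.example.com" + "Database" -> "_app_example_com_database"
    full_cr_key = "_" + group.replace(".", "_") + "_" + cr["kind"].lower()
    extra_vars = {
        "meta": {
            "name": cr["metadata"]["name"],
            "namespace": cr["metadata"]["namespace"],
        },
        full_cr_key: cr,
    }
    # Spec fields are lifted to the top level under snake_case names.
    for key, value in cr.get("spec", {}).items():
        extra_vars[to_snake_case(key)] = value
    return extra_vars


cr = {
    "apiVersion": "app.example.com/v1alpha1",
    "kind": "Database",
    "metadata": {"name": "example", "namespace": "tutorial"},  # namespace assumed
    "spec": {"message": "Hello world 2", "newParameter": "newParam"},
}
extra_vars = build_extra_vars(cr)
print(sorted(extra_vars))
```

Keeping the full CR under its own key means the original camelCase field names remain reachable even though the top-level copies are renamed.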

Accessing CR metadata

The meta fields provide the CR 'name' and 'namespace' associated with a reconciliation job. These and other nested fields can be accessed with dot notation in Ansible.

- debug:
    msg: "name: {{ meta.name }}, namespace: {{ meta.namespace }}"

That is it

Have FUN !!!
