Building Custom Ansible based Operator for OpenShift 4

About the Tutorial

For a long time, building a custom operator has been a difficult task for those of us who wanted to offer our workloads "on demand", so that customers could self-consume what they need without ping-ponging requests between themselves and the cluster admin.

As you may know, the OpenShift 4 deployment is based on Operators, which made me look deeper into the topic, and I found out there is a very simple way of writing an Operator ourselves with a simple Ansible role (I am referring to a very basic operator).

For this tutorial I expect you to come with a basic (or better) knowledge and understanding of the Ansible k8s module, and to have used it before in several playbooks and roles.

External Sources

Before you go through this tutorial I want to encourage you to check the following external resources. Though I am trying to be as self-explanatory as I can, the following links provide a deeper and more comprehensive understanding of the topics and helped me a lot in building this tutorial:

  1. About the operator-sdk
  2. Deploying the operator-sdk
  3. Ansible operator tutorial
  4. Ansible Kubernetes module


This project is a component of the Operator Framework, an open source toolkit to manage Kubernetes native applications, called Operators, in an effective, automated, and scalable way. Read more in the introduction blog post.

Operators make it easy to manage complex stateful applications on top of Kubernetes. However, writing an operator today can be difficult because of challenges such as using low-level APIs, writing boilerplate, and a lack of modularity, which leads to duplication.

The Operator SDK is a framework that uses the controller-runtime library to make writing operators easier by providing:

  • High level APIs and abstractions to write the operational logic more intuitively
  • Tools for scaffolding and code generation to bootstrap a new project fast
  • Extensions to cover common operator use cases

In its current version, the operators generated by the operator-sdk (the CRD in particular) only work with OpenShift 4. To make the CRD fit OpenShift 3.11, please find examples and modify the CRD accordingly.

To get started we need to download the “operator-sdk” binary which will generate the template we need for the operator.

Set the release version variable:

$ export RELEASE_VERSION=v0.16.0

Now download the GNU/Linux binary:

$ curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
$ curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu.asc

To verify a release binary using the provided asc files, place the binary and corresponding asc file into the same directory and use the corresponding command:

$ gpg --verify operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu.asc

If you do not have the maintainers public key on your machine, you will get an error message similar to this:

$ gpg --verify operator-sdk-${RELEASE_VERSION}-x86_64-apple-darwin.asc
gpg: assuming signed data in 'operator-sdk-${RELEASE_VERSION}-x86_64-apple-darwin'
gpg: Signature made Fri Apr 5 20:03:22 2019 CEST
gpg:                using RSA key <KEY_ID>
gpg: Can't check signature: No public key

To download the key, use the following command, replacing $KEY_ID with the RSA key string provided in the output of the previous command:

$ gpg --recv-key "$KEY_ID"

You’ll need to specify a key server if one hasn’t been configured. For example:

$ gpg --keyserver keyserver.ubuntu.com --recv-key "$KEY_ID"

Now you should be able to verify the binary.

$ chmod +x operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
$ sudo mkdir -p /usr/local/bin/
$ sudo cp operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu /usr/local/bin/operator-sdk
$ rm -f operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu


The SDK provides workflows to develop operators in Go, Ansible, or Helm. In this tutorial we will be focusing on Ansible.

The following workflow is for a new Ansible operator:

  1. Create a new operator project using the SDK Command Line Interface (CLI)
  2. Write the reconciling logic for your object using Ansible playbooks and roles
  3. Use the SDK CLI to build and generate the operator deployment manifests
  4. Optionally add additional CRDs using the SDK CLI and repeat steps 2 and 3

First, let's create a project with our new tool. We'll be building a Memcached Ansible Operator for the remainder of this tutorial:

$ operator-sdk new memcached-operator --api-version=cache.example.com/v1alpha1 --kind=Memcached --type=ansible

Now let's look at the tree of our new project:

$ tree memcached-operator
memcached-operator
├── build
│   ├── Dockerfile
│   └── test-framework
│       ├──
│       └── Dockerfile
├── deploy
│   ├── crds
│   │   ├── cache_v1alpha1_memcached_crd.yaml
│   │   └── cache_v1alpha1_memcached_cr.yaml
│   ├── operator.yaml
│   ├── role_binding.yaml
│   ├── role.yaml
│   └── service_account.yaml
├── molecule
│   ├── default
│   │   ├── asserts.yml
│   │   ├── molecule.yml
│   │   ├── playbook.yml
│   │   └── prepare.yml
│   ├── test-cluster
│   │   ├── molecule.yml
│   │   └── playbook.yml
│   └── test-local
│       ├── molecule.yml
│       ├── playbook.yml
│       └── prepare.yml
├── roles
│   └── memcached
│       ├── defaults
│       │   └── main.yml
│       ├── files
│       ├── handlers
│       │   └── main.yml
│       ├── meta
│       │   └── main.yml
│       ├──
│       ├── tasks
│       │   └── main.yml
│       ├── templates
│       └── vars
│           └── main.yml
└── watches.yaml

17 directories, 25 files

Now let's change into the project directory:

$ cd memcached-operator

To speed up development of our Operator, we can reuse an existing Role from Ansible Galaxy: dymurray.memcached_operator_role.

Run the following to install the Ansible Role inside the project:

$ ansible-galaxy install dymurray.memcached_operator_role -p ./roles
$ ls roles/
dymurray.memcached_operator_role  memcached

Since we’ll be reusing the logic from ‘dymurray.memcached_operator_role’, we can safely delete the placeholder Role generated by the ‘operator-sdk new’ command we ran previously.

$ rm -rf ./roles/memcached

By default, the memcached-operator watches Memcached resource events, as shown in watches.yaml, and executes the Ansible Role ‘memcached’.
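For reference, the scaffolded watches.yaml looks roughly like this (a sketch based on the operator-sdk v0.16 scaffold; the group value assumes the cache.example.com/v1alpha1 API used when creating the project):

```yaml
# Approximate content of the generated watches.yaml (shown for orientation)
- version: v1alpha1
  group: cache.example.com
  kind: Memcached
  role: /opt/ansible/roles/memcached
```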

Since we have swapped out the original Role for one from Ansible Galaxy, lets change the Watches file to reflect this:

---
- version: v1alpha1
  group: cache.example.com
  kind: Memcached
  role: /opt/ansible/roles/dymurray.memcached_operator_role

Before running the Operator, Kubernetes needs to know about the new custom resource definition the Operator will be watching

$ oc create -f deploy/crds/cache_v1alpha1_memcached_crd.yaml

By running this command, we are creating a new resource type, memcached, on the cluster. We will give our Operator work to do by creating and modifying resources of this type.
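For reference, the generated CRD in deploy/crds/cache_v1alpha1_memcached_crd.yaml looks roughly like this (a sketch based on the operator-sdk v0.16 scaffold; the names assume the cache.example.com group):

```yaml
# Approximate CRD generated by operator-sdk (shown for orientation only)
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: memcacheds.cache.example.com
spec:
  group: cache.example.com
  names:
    kind: Memcached
    listKind: MemcachedList
    plural: memcacheds
    singular: memcached
  scope: Namespaced
  subresources:
    status: {}
  version: v1alpha1
```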

Once the CRD is registered, there are two ways to run the Operator:

  • As a pod inside an OpenShift cluster
  • As a Go program outside the cluster, using operator-sdk

For the sake of this tutorial, we will run the Operator as a pod inside an OpenShift cluster; if you are interested, the operator-sdk documentation also covers running the Operator locally, outside the cluster.

Running as a pod inside an OpenShift cluster is preferred for production use.

Let’s build the memcached-operator image:

$ operator-sdk build memcached-operator:v0.0.1 --image-builder buildah

As you can see, I am using “buildah” as the container builder, so we need to make sure the buildah and podman packages are already installed.

Kubernetes deployment manifests are generated by ‘operator-sdk new’ in deploy/operator.yaml. We need to make a few changes to this file.

  • The image placeholder '{{ REPLACE_IMAGE }}' should be set to the previously-built image.
  • The imagePullPolicy should change from 'Always' to 'Never', since we aren't pushing our image to a registry.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: memcached-operator
# [...]
      containers:
        - name: memcached-operator
          # Replace '{{ REPLACE_IMAGE }}' with the built image name
          image: "{{ REPLACE_IMAGE }}"
          # Replace 'Always' with 'Never'
          imagePullPolicy: "{{ pull_policy|default('Always') }}"
# [...]

The commands below will change the Deployment ‘image’ and ‘imagePullPolicy’ respectively.

$ sed -i 's|{{ REPLACE_IMAGE }}|memcached-operator:v0.0.1|g' deploy/operator.yaml
$ sed -i "s|{{ pull_policy\|default('Always') }}|Never|g" deploy/operator.yaml
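If you want to see what these substitutions do without touching the real manifest, here is a small local demo. The file content below is a hypothetical fragment, not the full scaffolded operator.yaml:

```shell
# Create a miniature, hypothetical stand-in for deploy/operator.yaml
cat > /tmp/operator-demo.yaml <<'EOF'
      containers:
        - name: memcached-operator
          image: "{{ REPLACE_IMAGE }}"
          imagePullPolicy: "{{ pull_policy|default('Always') }}"
EOF

# The same substitutions as above, applied to the demo file
# (the escaped \| is a literal pipe, since | is used as the sed delimiter)
sed -i 's|{{ REPLACE_IMAGE }}|memcached-operator:v0.0.1|g' /tmp/operator-demo.yaml
sed -i "s|{{ pull_policy\|default('Always') }}|Never|g" /tmp/operator-demo.yaml
cat /tmp/operator-demo.yaml
```

After running it, the image line reads memcached-operator:v0.0.1 and the pull policy reads Never.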

Creating the Operator from deploy manifests

Now, we are ready to deploy the memcached-operator:

Create a Project for the Operator to run in

$ oc new-project tutorial

Create Service Account for Operator to run as

$ oc create -f deploy/service_account.yaml

Create OpenShift Role specifying Operator Permissions

$ oc create -f deploy/role.yaml

Create OpenShift Role Binding assigning Permissions to Service Account

$ oc create -f deploy/role_binding.yaml

Create Operator Deployment Object

$ oc create -f deploy/operator.yaml

Note: role.yaml and role_binding.yaml describe cluster-wide resources. Creating these requires elevated permissions.

Verify that the memcached-operator is running:

$ oc get deployment
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
memcached-operator   1         1         1            1           1m

Now that we have deployed our Operator, let’s create a CR and deploy an instance of memcached.

There is a sample CR in the scaffolding created as part of the Operator SDK:

apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: example-memcached
spec:
  # Add fields here
  size: 3

Let’s go ahead and apply this in our Tutorial project to deploy 3 memcached pods, using our Operator:

# deploy/crds/cache_v1alpha1_memcached_cr.yaml
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: example-memcached
spec:
  size: 3

Now run the “oc create” command:

$ oc create -f deploy/crds/cache_v1alpha1_memcached_cr.yaml

Ensure that the memcached-operator creates the deployment for the CR:

$ oc get deployment
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
memcached-operator   1         1         1            1           2m
example-memcached    3         3         3            3           1m

Custom Variables

To pass ‘extra vars’ to the Playbooks/Roles being run by the Operator, you can embed key-value pairs in the ‘spec’ section of the Custom Resource (CR).

This is equivalent to how --extra-vars can be passed into the ansible-playbook command.

The CR snippet below shows two ‘extra vars’ (message and newParameter) being passed in via spec. Passing 'extra vars' through the CR allows for customization of Ansible logic based on the contents of each CR instance.

# Sample CR definition where some
# 'extra vars' are passed via the spec
apiVersion: "app.example.com/v1alpha1"
kind: "Database"
metadata:
  name: "example"
spec:
  message: "Hello world 2"
  newParameter: "newParam"

Accessing CR Fields

Now that you’ve passed ‘extra vars’ to your Playbook through the CR spec, we need to read them from the Ansible logic that makes up your Operator.

Variables passed in through the CR spec are made available at the top-level to be read from Jinja templates. For the CR example above, we could read the vars ‘message’ and ‘newParameter’ from a Playbook like so:

- debug:
    msg: "message value from CR spec: {{ message }}"

- debug:
    msg: "newParameter value from CR spec: {{ new_parameter }}"

Did you notice anything strange about the snippet above? The ‘newParameter’ variable that we set on our CR spec was accessed as ‘new_parameter’. Keep this automatic conversion from camelCase to snake_case in mind, as it will happen to all ‘extra vars’ passed into the CR spec.
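The conversion itself can be mimicked with a throwaway one-liner (an illustration only, not part of the Operator):

```shell
# Mimic the camelCase -> snake_case conversion applied to CR spec keys:
# prefix every uppercase letter with '_' and lowercase the result
echo "newParameter" | sed 's/[A-Z]/_&/g' | tr '[:upper:]' '[:lower:]'
# prints: new_parameter
```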

Refer to the next section for further info on reaching into the JSON structure exposed in the Ansible Operator runtime environment.

When a reconciliation job runs, the content of the associated CR is made available as variables in the Ansible runtime environment.

The JSON below is an example of what gets passed into ansible-runner (the Ansible Operator runtime).

Note that vars added to the ‘spec’ section of the CR (‘message’ and ‘new_parameter’) are placed at the top-level of this structure for easy access.

{
  "meta": {
    "name": "<cr-name>",
    "namespace": "<cr-namespace>"
  },
  "message": "Hello world 2",
  "new_parameter": "newParam",
  "_app_example_com_database": {
    <Full CR>
  }
}
The meta fields provide the CR 'name' and 'namespace' associated with a reconciliation job. These and other nested fields can be accessed with dot notation in Ansible.

- debug:
    msg: "name: {{ meta.name }}, namespace: {{ meta.namespace }}"
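Deeper fields of the full CR can be reached the same way. For example, assuming the Database CR from the ‘extra vars’ section (a hypothetical snippet):

```yaml
# Hypothetical task: read a field out of the full CR structure
# that the Operator runtime exposes under _app_example_com_database
- debug:
    msg: "message from the full CR spec: {{ _app_example_com_database.spec.message }}"
```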

That is it

Have FUN !!!




Open Source contributor for the past 15 years

Oren Oichman
