Kubernetes#

Introduction#

A Helm chart for deploying recruIT on a Kubernetes cluster is available in the main repository's OCI registry and in the MIRACUM charts repository. The chart can be used to deploy the application as well as all dependencies required for it to run (OHDSI WebAPI, OHDSI Atlas, HAPI FHIR server). The chart also includes MailHog, a mock mail server for testing email notifications.

Using the default values provided with the chart, all dependencies are installed and all services are configured to use them.
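
For reference, you can inspect the chart before installing it, either directly from the OCI registry or via the MIRACUM charts repository (the repository URL below is an assumption; check the MIRACUM charts README if it does not resolve):

# show the chart's default values straight from the OCI registry
helm show values oci://ghcr.io/miracum/recruit/charts/recruit

# or add the MIRACUM charts repository and search it (assumed URL)
helm repo add miracum https://miracum.github.io/charts
helm search repo miracum/recruit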

Setup#

  1. Set up a Kubernetes cluster using your cloud provider of choice or in a local environment using minikube, KinD, or k3d.

  2. Install kubectl and helm (a quick verification sketch follows this list).
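
A minimal sketch of the local route, assuming KinD, kubectl, and helm are already on your PATH:

# create a simple single-node cluster (the multi-node example below replaces this)
kind create cluster

# verify the client tooling
kubectl version --client
helm version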

Installation#

Deploy recruIT to a namespace called recruit by running

helm install -n recruit \
  --create-namespace \
  --render-subchart-notes \
  --set ohdsi.cdmInitJob.enabled=true \
  recruit oci://ghcr.io/miracum/recruit/charts/recruit

As a quick check to make sure everything is running correctly, you can use the following to check the readiness of all services:

$ helm test -n recruit recruit

NAME: recruit
LAST DEPLOYED: Wed May  4 21:45:06 2022
NAMESPACE: recruit
STATUS: deployed
REVISION: 1
TEST SUITE:     recruit-fhirserver-test-endpoints
Last Started:   Wed May  4 22:14:23 2022
Last Completed: Wed May  4 22:14:39 2022
Phase:          Succeeded
TEST SUITE:     recruit-ohdsi-test-connection
Last Started:   Wed May  4 22:14:39 2022
Last Completed: Wed May  4 22:14:43 2022
Phase:          Succeeded
TEST SUITE:     recruit-test-health-probes
Last Started:   Wed May  4 22:14:43 2022
Last Completed: Wed May  4 22:14:49 2022
Phase:          Succeeded
NOTES:
1. Get the screening list URL by running these commands:
  http://recruit-list.127.0.0.1.nip.io/

Example installation of the recruIT chart with ingress support using KinD#

This section demonstrates how to install recruIT on your local machine using KinD, showcasing the following advanced features:

  • create a multi-node Kubernetes cluster to demonstrate topology-zone aware pod spreading for high-availability deployments
  • expose all user-facing services behind the NGINX ingress controller on a https://nip.io domain resolved to localhost
  • enable and enforce the restricted Pod Security Standard to demonstrate security best practices followed by all components
  • pre-load the OMOP CDM database with SynPUF-based sample data

First, create a new cluster with Ingress support:

cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  PodSecurity: true
nodes:
  - role: control-plane
    image: docker.io/kindest/node:v1.26.0@sha256:45aa9ecb5f3800932e9e35e9a45c61324d656cf5bc5dd0d6adfc1b0f8168ec5f
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
    labels:
      topology.kubernetes.io/zone: a
  - role: worker
    image: docker.io/kindest/node:v1.26.0@sha256:45aa9ecb5f3800932e9e35e9a45c61324d656cf5bc5dd0d6adfc1b0f8168ec5f
    labels:
      topology.kubernetes.io/zone: b
  - role: worker
    image: docker.io/kindest/node:v1.26.0@sha256:45aa9ecb5f3800932e9e35e9a45c61324d656cf5bc5dd0d6adfc1b0f8168ec5f
    labels:
      topology.kubernetes.io/zone: c
EOF

Install the NGINX ingress controller:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.5.1/deploy/static/provider/kind/deploy.yaml

Wait until it's ready to process requests by running

kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s

Create a namespace for the new installation and label it to enable and enforce the restricted Pod Security Standard:

kubectl create namespace recruit
kubectl label namespace recruit pod-security.kubernetes.io/enforce=restricted
kubectl label namespace recruit pod-security.kubernetes.io/enforce-version=v1.26

Save the following as values-kind-recruit.yaml, or clone this repository and reference the file as -f docs/_snippets/values-kind-recruit.yaml. The ohdsi.cdmInitJob.extraEnv option SETUP_SYNPUF=true means the OMOP database will be initialized with SynPUF 1K sample patient data.

Documentation for all available chart options

You can find a complete description of all available chart configuration options here: https://github.com/miracum/charts/blob/master/charts/recruit/README.md#configuration

values-kind-recruit.yaml
list:
  resources:
    requests:
      memory: "128Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
  ingress:
    enabled: true
    hosts:
      - host: recruit-list.127.0.0.1.nip.io
        paths: ["/"]

fhirserver:
  resources:
    requests:
      memory: "3Gi"
      cpu: "2500m"
    limits:
      memory: "3Gi"
  postgresql:
    auth:
      postgresPassword: fhir
  ingress:
    enabled: true
    hosts:
      - host: recruit-fhir-server.127.0.0.1.nip.io
        paths: ["/"]

query:
  resources:
    requests:
      memory: "1Gi"
      cpu: "1000m"
    limits:
      memory: "1Gi"
  webAPI:
    dataSource: "SynPUF-CDMV5"
  omop:
    resultsSchema: synpuf_results
    cdmSchema: synpuf_cdm
  cohortSelectorLabels:
    - "recruIT"

notify:
  resources:
    requests:
      memory: "1Gi"
      cpu: "1000m"
    limits:
      memory: "1Gi"
  rules:
    schedules:
      everyMorning: "0 0 8 1/1 * ? *"
    trials:
      - acronym: "*"
        subscriptions:
          - email: "everything@example.com"
      - acronym: "SAMPLE"
        accessibleBy:
          users:
            - "user1"
            - "user.two@example.com"
        subscriptions:
          - email: "everyMorning@example.com"
            notify: "everyMorning"

mailhog:
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "64Mi"
  ingress:
    enabled: true
    hosts:
      - host: recruit-mailhog.127.0.0.1.nip.io
        paths:
          - path: "/"
            pathType: Prefix

ohdsi:
  atlas:
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "64Mi"
  webApi:
    resources:
      requests:
        memory: "4Gi"
        cpu: "250m"
      limits:
        memory: "4Gi"
  postgresql:
    auth:
      postgresPassword: ohdsi
    primary:
      resources:
        limits:
          memory: 4Gi
          cpu: 2500m
        requests:
          memory: 256Mi
          cpu: 250m
  ingress:
    enabled: true
    hosts:
      - host: recruit-ohdsi.127.0.0.1.nip.io
  cdmInitJob:
    enabled: false
    ttlSecondsAfterFinished: ""
    extraEnv:
      - name: SETUP_SYNPUF
        value: "true"
  achilles:
    schemas:
      cdm: "synpuf_cdm"
      vocab: "synpuf_cdm"
      res: "synpuf_results"
    sourceName: "SynPUF-CDMV5"
  loadCohortDefinitionsJob:
    enabled: false
    cohortDefinitions:
      - |
        {
          "name": "A sample cohort",
          "description": "[acronym=SAMPLE] [recruIT] Sample Cohort containing only female patients older than 90 years.",
          "expressionType": "SIMPLE_EXPRESSION",
          "expression": {
            "ConceptSets": [],
            "PrimaryCriteria": {
              "CriteriaList": [
                {
                  "ObservationPeriod": {
                    "First": true
                  }
                }
              ],
              "ObservationWindow": {
                "PriorDays": 0,
                "PostDays": 0
              },
              "PrimaryCriteriaLimit": {
                "Type": "First"
              }
            },
            "QualifiedLimit": {
              "Type": "First"
            },
            "ExpressionLimit": {
              "Type": "First"
            },
            "InclusionRules": [
              {
                "name": "Older than 18",
                "expression": {
                  "Type": "ALL",
                  "CriteriaList": [],
                  "DemographicCriteriaList": [
                    {
                      "Age": {
                        "Value": 90,
                        "Op": "gt"
                      },
                      "Gender": [
                        {
                          "CONCEPT_CODE": "F",
                          "CONCEPT_ID": 8532,
                          "CONCEPT_NAME": "FEMALE",
                          "DOMAIN_ID": "Gender",
                          "INVALID_REASON_CAPTION": "Unknown",
                          "STANDARD_CONCEPT_CAPTION": "Unknown",
                          "VOCABULARY_ID": "Gender"
                        }
                      ]
                    }
                  ],
                  "Groups": []
                }
              }
            ],
            "CensoringCriteria": [],
            "CollapseSettings": {
              "CollapseType": "ERA",
              "EraPad": 0
            },
            "CensorWindow": {},
            "cdmVersionRange": ">=5.0.0"
          }
        }

And finally, run

helm install -n recruit \
  --render-subchart-notes \
  -f values-kind-recruit.yaml \
  --set ohdsi.cdmInitJob.enabled=true \
  --set ohdsi.loadCohortDefinitionsJob.enabled=true \
  recruit oci://ghcr.io/miracum/recruit/charts/recruit

CDM init job

The included CDM initialization job is currently not idempotent and may cause problems if run multiple times. Once the job has completed, set ohdsi.cdmInitJob.enabled=false for any subsequent changes to the chart configuration. Similarly, set ohdsi.loadCohortDefinitionsJob.enabled=false to avoid creating duplicate cohort definitions.
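
For example, once the initial installation has finished, later configuration changes can be applied with both jobs disabled (a sketch based on the install command above):

helm upgrade -n recruit \
  --render-subchart-notes \
  -f values-kind-recruit.yaml \
  --set ohdsi.cdmInitJob.enabled=false \
  --set ohdsi.loadCohortDefinitionsJob.enabled=false \
  recruit oci://ghcr.io/miracum/recruit/charts/recruit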

The application stack is now deployed. You can wait for the OMOP CDM init job to be done by running the following. This may take quite some time to complete.

kubectl wait job \
  --namespace=recruit \
  --for=condition=Complete \
  --selector=app.kubernetes.io/component=cdm-init \
  --timeout=1h

At this point, all externally exposed services should be accessible:

Service                   Ingress URL
OHDSI Atlas               http://recruit-ohdsi.127.0.0.1.nip.io/atlas/
recruIT Screening List    http://recruit-list.127.0.0.1.nip.io/
HAPI FHIR Server          http://recruit-fhir-server.127.0.0.1.nip.io/
MailHog                   http://recruit-mailhog.127.0.0.1.nip.io/
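
As a quick smoke test, you can check that the ingress endpoints respond from your local machine; the /fhir/metadata path is the standard FHIR CapabilityStatement endpoint also used elsewhere in this guide:

curl -s -o /dev/null -w "%{http_code}\n" http://recruit-list.127.0.0.1.nip.io/
curl -s -o /dev/null -w "%{http_code}\n" http://recruit-ohdsi.127.0.0.1.nip.io/atlas/
curl -s http://recruit-fhir-server.127.0.0.1.nip.io/fhir/metadata | head -c 200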

The values-kind-recruit.yaml used to install the chart automatically loaded a sample cohort defined in the ohdsi.loadCohortDefinitionsJob.cohortDefinitions setting. If the CDM init job completed and the query module ran at least once, you should see a notification email at http://recruit-mailhog.127.0.0.1.nip.io/:

Notification Email for the SAMPLE study displayed in MailHog

and the corresponding screening list is accessible at http://recruit-list.127.0.0.1.nip.io/:

Screening list for the SAMPLE study

To create additional studies, follow the Creating your first study guide using Atlas at http://recruit-ohdsi.127.0.0.1.nip.io/atlas/. Be sure to use [recruIT] as the special label instead of [UC1], since the values above override query.cohortSelectorLabels[0]=recruIT.

Metrics#

All modules expose metrics in Prometheus format (see Observability). The chart makes it easy to scrape these metrics by integrating with the widely used Prometheus Operator:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install --create-namespace -n monitoring kube-prometheus-stack prometheus-community/kube-prometheus-stack

You can now update your release by combining the values-kind-recruit.yaml from above with the following:

values-kind-recruit-enable-servicemonitors.yaml
list:
  metrics:
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack

query:
  metrics:
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack

notify:
  metrics:
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack

fhirserver:
  metrics:
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack

ohdsi:
  webApi:
    metrics:
      serviceMonitor:
        enabled: true
        additionalLabels:
          release: kube-prometheus-stack

Then upgrade the release:

helm upgrade -n recruit \
  -f values-kind-recruit.yaml \
  -f values-kind-recruit-enable-servicemonitors.yaml \
  recruit oci://ghcr.io/miracum/recruit/charts/recruit

Opening the Grafana instance included with the kube-prometheus-stack chart will allow you to query the exposed metrics:

Grafana Explore view of some metrics for the list module
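
To reach Grafana from your local machine, a port-forward is usually enough; the service name below assumes the kube-prometheus-stack release name used above:

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80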

High-Availability#

The FHIR server, the screening list, and the notification module support running with multiple replicas to ensure high availability if individual components fail. Scaling up the notification module requires a backend database for persistence to avoid sending duplicate emails: setting notify.ha.enabled=true and postgresql.enabled=true in the values deploys an integrated PostgreSQL database for the notification module. See the options under the notify.ha.database key for specifying a custom database to use.

The snippet below configures the release to run multiple replicas of every service that supports it, enables PodDisruptionBudget resources, and uses pod topology spread constraints to spread the pods across node topology zones.

For information on setting up recruIT with highly-available PostgreSQL clusters provided by CloudNativePG, see below.

values-kind-recruit-ha.yaml
notify:
  replicaCount: 2
  podDisruptionBudget:
    enabled: true
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: recruit
          # note that this label depends on the name of the chart release
          # this assumes the chart is deployed with a name of `recruit`
          app.kubernetes.io/instance: recruit
          app.kubernetes.io/component: notify
  ha:
    enabled: true

list:
  replicaCount: 2
  podDisruptionBudget:
    enabled: true
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: recruit
          app.kubernetes.io/instance: recruit
          app.kubernetes.io/component: list

postgresql:
  enabled: true
  auth:
    postgresPassword: recruit-notify-ha

ohdsi:
  atlas:
    replicaCount: 2
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: ohdsi
            app.kubernetes.io/instance: recruit
            app.kubernetes.io/component: atlas

fhirserver:
  replicaCount: 2
  podDisruptionBudget:
    enabled: true
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: fhirserver
          app.kubernetes.io/instance: recruit

fhir-pseudonymizer:
  replicaCount: 2
  podDisruptionBudget:
    enabled: true
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: fhir-pseudonymizer
          app.kubernetes.io/instance: recruit
  vfps:
    enabled: true
    replicaCount: 2
    podDisruptionBudget:
      enabled: true
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: vfps
            app.kubernetes.io/instance: recruit
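
Apply the high-availability configuration by layering it on top of the values used for the initial installation, for example:

helm upgrade -n recruit \
  -f values-kind-recruit.yaml \
  -f values-kind-recruit-ha.yaml \
  recruit oci://ghcr.io/miracum/recruit/charts/recruit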

Service mesh integration#

The application can be integrated with a service mesh, both for observability and to secure service-to-service communication via mTLS.

Linkerd#

The following values-kind-recruit-linkerd.yaml shows how to configure the chart release to set Linkerd's linkerd.io/inject: enabled annotation on all service pods (excluding pods created by Jobs):

values-kind-recruit-linkerd.yaml
podAnnotations:
  linkerd.io/inject: "enabled"

postgresql:
  primary:
    service:
      annotations:
        config.linkerd.io/opaque-ports: "5432"

ohdsi:
  postgresql:
    primary:
      service:
        annotations:
          config.linkerd.io/opaque-ports: "5432"
  atlas:
    podAnnotations:
      linkerd.io/inject: "enabled"
  webApi:
    podAnnotations:
      linkerd.io/inject: "enabled"

fhirserver:
  postgresql:
    primary:
      service:
        annotations:
          config.linkerd.io/opaque-ports: "5432"
  podAnnotations:
    linkerd.io/inject: "enabled"

mailhog:
  automountServiceAccountToken: true
  podAnnotations:
    linkerd.io/inject: "enabled"
  service:
    annotations:
      config.linkerd.io/opaque-ports: "1025"

Linkerd dashboard view of the recruIT deployment

You can also set the linkerd.io/inject: enabled annotation on the recruit namespace (see https://linkerd.io/2.11/features/proxy-injection/), but you will then have to manually add a linkerd.io/inject: disabled annotation to the OHDSI Achilles CronJob and the init job.
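
A sketch of the namespace-wide approach; note that linkerd.io/inject is an annotation, not a label:

kubectl annotate namespace recruit linkerd.io/inject=enabled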

Istio#

Add a namespace label to instruct Istio to automatically inject Envoy sidecar proxies when you deploy your application later:

kubectl label namespace recruit istio-injection=enabled

To disable sidecar proxy injection for the Achilles and OMOP CDM init job, see the following values.yaml:

values-kind-recruit-istio.yaml
# ohdsi:
#   cdmInitJob:
#     podAnnotations:
#       sidecar.istio.io/inject: "false"
#   achilles:
#     podAnnotations:
#       sidecar.istio.io/inject: "false"

mailhog:
  automountServiceAccountToken: true
  ingress:
    annotations:
      kubernetes.io/ingress.class: istio

list:
  ingress:
    annotations:
      kubernetes.io/ingress.class: istio

fhirserver:
  ingress:
    annotations:
      kubernetes.io/ingress.class: istio

ohdsi:
  ingress:
    annotations:
      kubernetes.io/ingress.class: istio
    hosts:
      - host: recruit-ohdsi.127.0.0.1.nip.io
        pathType: Prefix
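
After upgrading the release with these values, you can verify that the Envoy sidecars were injected by checking the container count of the pods (meshed pods should report an additional istio-proxy container):

kubectl get pods -n recruit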

Kiali dashboard view of the recruIT deployment

Zero-trust networking#

To limit the communication between the components you can deploy Kubernetes NetworkPolicy resources. Because the details of a deployment can differ significantly (external databases, dependencies spread across several namespaces, etc.), no generic NetworkPolicy resources are included in the Helm chart. Instead, the following policies and explanations should provide a starting point for customization.

The policies are based on these assumptions:

  1. the recruit application is deployed in a namespace called recruit
  2. the OHDSI stack is deployed in a namespace called ohdsi
  3. the SMTP server is running on a host outside the cluster at IP 192.0.2.1 and port 1025
  4. the Prometheus monitoring stack is deployed in a namespace called monitoring

You can use https://editor.cilium.io/ to visualize and edit individual policies, or https://orca.tufin.io/netpol/# to have the entire set of policies explained.

recruit-network-policies.yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fhir-server-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: recruit
      app.kubernetes.io/name: fhirserver
  ingress:
    # all modules are allowed to communicate with
    # the FHIR server
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/component: list
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/component: notify
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/component: query
      ports:
        - port: http
    - from:
        # allow the FHIR server to be scraped by the Prometheus stack
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: kube-prometheus-stack-prometheus
      ports:
        - port: metrics
    # allow the FHIR server to be accessed via the NGINX Ingress
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
      ports:
        - port: http
  egress:
    # for subscriptions to work, the FHIR server must be allowed to
    # initiate connections to the notify module
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/component: notify
      ports:
        - port: http
    # allow the server access to its own database
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/component: primary
              app.kubernetes.io/name: fhir-server-postgres
      ports:
        - port: tcp-postgresql
    # allow DNS lookups
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: list-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: recruit
      app.kubernetes.io/component: list
  ingress:
    - from:
        # allow the list module to be scraped by the Prometheus stack
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: kube-prometheus-stack-prometheus
        # allow the list module to be accessed via the NGINX Ingress
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
      ports:
        - port: http
  egress:
    # allow the list module to initiate connections to the FHIR server
    # for querying screening lists
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/name: fhirserver
      ports:
        - port: http
    # allow DNS lookups
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: query-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: recruit
      app.kubernetes.io/component: query
  ingress:
    # allow the query module to be scraped by the Prometheus stack
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: kube-prometheus-stack-prometheus
      ports:
        - port: http-metrics
  egress:
    # allow the query module to initiate connections to the FHIR server
    # to transmit FHIR resources
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/name: fhirserver
      ports:
        - port: http
    # allow the query module to initiate connections to the OHDSI WebAPI
    # in the ohdsi namespace
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ohdsi
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: ohdsi
              app.kubernetes.io/component: webapi
      ports:
        - port: http
    # allow the query module to initiate connections to the OHDSI PostgreSQL DB
    # in the ohdsi namespace
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ohdsi
          podSelector:
            matchLabels:
              app.kubernetes.io/name: postgresql
              app.kubernetes.io/instance: ohdsi
              app.kubernetes.io/component: primary
      ports:
        - port: tcp-postgresql
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: notify-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: recruit
      app.kubernetes.io/component: notify
  ingress:
    # allow the notify module to be scraped by the Prometheus stack
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: kube-prometheus-stack-prometheus
      ports:
        - port: http-metrics
    # allow the notify module to receive subscription invocations from the FHIR server
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/name: fhirserver
      ports:
        - port: http
  egress:
    # allow the notify module to initiate connections to the FHIR server
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/name: fhirserver
      ports:
        - port: http
    # allow the notify module to access the SMTP server at
    # 192.0.2.1. The `32` subnet prefix length limits egress
    # to just this one address
    - to:
        - ipBlock:
            cidr: 192.0.2.1/32
      ports:
        - protocol: TCP
          port: 1025
    # allow the notify module to initiate connections to its PostgreSQL db
    # in case of HA
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: recruit-postgres
              app.kubernetes.io/instance: recruit
              app.kubernetes.io/component: primary
      ports:
        - port: tcp-postgresql
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
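
After adjusting the namespaces, selectors, and the SMTP address to your environment, apply the policies to the recruit namespace:

kubectl apply -n recruit -f recruit-network-policies.yaml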

Distributed Tracing#

All services support distributed tracing based on OpenTelemetry.

For testing, you can install the Jaeger operator to prepare your cluster for tracing.

# Cert-Manager is required by the Jaeger Operator
# See <https://cert-manager.io/docs/installation/> for details.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml

kubectl wait --namespace cert-manager \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/instance=cert-manager \
  --timeout=5m

kubectl create namespace observability
kubectl create -n observability -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.38.0/jaeger-operator.yaml

kubectl wait --namespace observability \
  --for=condition=ready pod \
  --selector=name=jaeger-operator \
  --timeout=5m

cat <<EOF | kubectl apply -n observability -f -
# simple, all-in-one Jaeger installation. Not suitable for production use.
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
EOF

The following values enable tracing for the query, list, and notify modules, the HAPI FHIR server, and the OHDSI WebAPI:

values-kind-recruit-tracing.yaml
query:
  extraEnv:
    - name: JAVA_TOOL_OPTIONS
      value: "-javaagent:/app/opentelemetry-javaagent.jar"
    - name: OTEL_METRICS_EXPORTER
      value: "none"
    - name: OTEL_LOGS_EXPORTER
      value: "none"
    - name: OTEL_TRACES_EXPORTER
      value: "jaeger"
    - name: OTEL_SERVICE_NAME
      value: "recruit-query"
    - name: OTEL_EXPORTER_JAEGER_ENDPOINT
      value: "http://simplest-collector.observability.svc:14250"

list:
  extraEnv:
    - name: TRACING_ENABLED
      value: "true"
    - name: OTEL_TRACES_EXPORTER
      value: "jaeger"
    - name: OTEL_SERVICE_NAME
      value: "recruit-list"
    - name: OTEL_EXPORTER_JAEGER_AGENT_HOST
      value: "simplest-agent.observability.svc"

notify:
  extraEnv:
    - name: JAVA_TOOL_OPTIONS
      value: "-javaagent:/app/opentelemetry-javaagent.jar"
    - name: OTEL_METRICS_EXPORTER
      value: "none"
    - name: OTEL_LOGS_EXPORTER
      value: "none"
    - name: OTEL_TRACES_EXPORTER
      value: "jaeger"
    - name: OTEL_SERVICE_NAME
      value: "recruit-notify"
    - name: OTEL_EXPORTER_JAEGER_ENDPOINT
      value: "http://simplest-collector.observability.svc:14250"

fhirserver:
  extraEnv:
    # the recruit tool relies on the FHIR server subscription mechanism to create notifications.
    # if you overwrite `fhirserver.extraEnv`, make sure to keep this setting enabled.
    - name: HAPI_FHIR_SUBSCRIPTION_RESTHOOK_ENABLED
      value: "true"
    - name: SPRING_FLYWAY_BASELINE_ON_MIGRATE
      value: "true"
    # OTel options
    - name: JAVA_TOOL_OPTIONS
      value: "-javaagent:/app/opentelemetry-javaagent.jar"
    - name: OTEL_METRICS_EXPORTER
      value: "none"
    - name: OTEL_LOGS_EXPORTER
      value: "none"
    - name: OTEL_TRACES_EXPORTER
      value: "jaeger"
    - name: OTEL_SERVICE_NAME
      value: "recruit-hapi-fhir-server"
    - name: OTEL_EXPORTER_JAEGER_ENDPOINT
      value: "http://simplest-collector.observability.svc:14250"

fhir-pseudonymizer:
  extraEnv:
    - name: Tracing__Enabled
      value: "true"
    - name: Tracing__ServiceName
      value: "recruit-fhir-pseudonymizer"
    - name: Tracing__Jaeger__AgentHost
      value: "simplest-agent.observability.svc"
  vfps:
    extraEnv:
      - name: Tracing__IsEnabled
        value: "true"
      - name: Tracing__ServiceName
        value: "recruit-vfps"
      - name: Tracing__Jaeger__AgentHost
        value: "simplest-agent.observability.svc"

ohdsi:
  webApi:
    tracing:
      enabled: true
      jaeger:
        protocol: "grpc"
        endpoint: http://simplest-collector.observability.svc:14250
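
To browse the collected traces, you can port-forward the Jaeger query UI; the simplest-query service name and port 16686 are assumptions based on the all-in-one Jaeger instance created above:

kubectl port-forward -n observability svc/simplest-query 16686:16686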

Jaeger Trace Graph view of a single scheduled run of the query module

Jaeger Trace timeline for interacting with the screening list

Screening List De-Pseudonymization#

Info

Requires version 9.3.0 or later of the recruIT Helm chart.

You can optionally deploy both the FHIR Pseudonymizer and Vfps as a pseudonym service backend to allow for de-pseudonymizing patient and visit identifiers stored in OMOP or the FHIR server prior to displaying them on the screening list.

The background is detailed in De-Pseudonymization.

The following values.yaml enables the included FHIR Pseudonymizer and Vfps as the pseudonym service backend. When Vfps is installed, it uses its own PostgreSQL database, which starts out empty and does not contain any pre-defined namespaces or pseudonyms. It is up to the user to pseudonymize the resources stored inside the FHIR server used by the screening list.

values-kind-recruit-de-pseudonymization.yaml
list:
  dePseudonymization:
    enabled: true

fhir-pseudonymizer:
  enabled: true
  auth:
    apiKey:
      # enable requiring an API key placed in the `x-api-key` header to
      # authenticate against the fhir-pseudonymizer's `/fhir/$de-pseudonymize`
      # endpoint.
      enabled: true
      # the API key required to be set when the list module invokes
      # the FHIR Pseudonymizer's `$de-pseudonymize` endpoint.
      # Note: instead of storing the key in plaintext in the values.yaml,
      #       you might want to leverage the `existingSecret` option instead.
      key: "demo-secret-api-key"
  # the values below are the default values defined in <https://github.com/miracum/charts/blob/master/charts/recruit/values.yaml>
  pseudonymizationService: Vfps
  vfps:
    enabled: true
    postgresql:
      enabled: true
      auth:
        database: vfps
        postgresPassword: vfps

CloudNativePG for HA databases#

Install the CloudNativePG operator first by following the official documentation site:

kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.18/releases/cnpg-1.18.0.yaml

Next, create PostgreSQL clusters and pre-configured users for OHDSI, the HAPI FHIR server, the Vfps pseudonymization service, and the notify module:

cnpg-clusters.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: recruit-ohdsi-db-app-user
type: kubernetes.io/basic-auth
stringData:
  password: recruit-ohdsi
  username: ohdsi
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: recruit-ohdsi-db
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  replicationSlots:
    highAvailability:
      enabled: true
  storage:
    size: 64Gi
  bootstrap:
    initdb:
      database: ohdsi
      owner: ohdsi
      secret:
        name: recruit-ohdsi-db-app-user
---
apiVersion: v1
kind: Secret
metadata:
  name: recruit-fhir-server-db-app-user
type: kubernetes.io/basic-auth
stringData:
  password: recruit-fhir-server
  username: fhir_server_user
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: recruit-fhir-server-db
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  replicationSlots:
    highAvailability:
      enabled: true
  storage:
    size: 64Gi
  bootstrap:
    initdb:
      database: fhir_server
      owner: fhir_server_user
      secret:
        name: recruit-fhir-server-db-app-user
---
apiVersion: v1
kind: Secret
metadata:
  name: vfps-db-app-user
type: kubernetes.io/basic-auth
stringData:
  password: vfps
  username: vfps_user
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: vfps-db
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  replicationSlots:
    highAvailability:
      enabled: true
  storage:
    size: 64Gi
  bootstrap:
    initdb:
      database: vfps
      owner: vfps_user
      secret:
        name: vfps-db-app-user
---
apiVersion: v1
kind: Secret
metadata:
  name: recruit-notify-db-app-user
type: kubernetes.io/basic-auth
stringData:
  password: notify
  username: notify_user
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: recruit-notify-db
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  replicationSlots:
    highAvailability:
      enabled: true
  storage:
    size: 64Gi
  bootstrap:
    initdb:
      database: notify_jobstore
      owner: notify_user
      secret:
        name: recruit-notify-db-app-user

Apply the manifests:

kubectl apply -f cnpg-clusters.yaml

Finally, install the recruIT chart using the following updated values.yaml:

values-kind-recruit-with-cnpg.yaml
ohdsi:
  postgresql:
    enabled: false
  webApi:
    db:
      host: "recruit-ohdsi-db-rw"
      port: 5432
      database: "ohdsi"
      username: "ohdsi"
      password: ""
      existingSecret: "recruit-ohdsi-db-app-user"
      existingSecretKey: "password"
      schema: "ohdsi"

fhirserver:
  postgresql:
    enabled: false
  externalDatabase:
    host: "recruit-fhir-server-db-rw"
    port: 5432
    database: "fhir_server"
    user: "fhir_server_user"
    password: ""
    existingSecret: "recruit-fhir-server-db-app-user"
    existingSecretKey: "password"

notify:
  ha:
    enabled: true
    database:
      host: "recruit-notify-db-rw"
      port: 5432
      username: "notify_user"
      password: ""
      name: "notify_jobstore"
      existingSecret:
        name: "recruit-notify-db-app-user"
        key: "password"

postgresql:
  enabled: false

fhir-pseudonymizer:
  enabled: true
  vfps:
    postgresql:
      enabled: false
    database:
      host: "vfps-db-rw"
      port: 5432
      database: "vfps"
      username: "vfps_user"
      password: ""
      existingSecret: "vfps-db-app-user"
      existingSecretKey: "password"
      schema: "vfps"

Running the query module using Argo Workflows#

By default, the query module runs on a dedicated schedule. As of version 10.1.0, the module can also be configured to run as a one-shot container. This is useful when integrating with existing containerized workflows, e.g. using Airflow or Argo Workflows.

Below you can find an example of running the query module as part of a larger workflow:

query-argo-workflow.yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/argoproj/argo-workflows/v3.4.3/api/jsonschema/schema.json
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: recruit-query-workflow-
spec:
  entrypoint: full-run
  templates:
    - name: omop-cdm-etl
      container:
        image: docker.io/docker/whalesay@sha256:178598e51a26abbc958b8a2e48825c90bc22e641de3d31e18aaf55f3258ba93b
        command: [cowsay]
        args: ["Running ETL Job from source to the OMOP CDM database"]
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 65532
          runAsGroup: 65532
          seccompProfile:
            type: RuntimeDefault
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
          privileged: false
          runAsNonRoot: true

    - name: ohdsi-achilles
      # run for at most 1 hour before timing out to make sure the query module will run eventually
      activeDeadlineSeconds: "3600"
      container:
        image: docker.io/ohdsi/broadsea-achilles:sha-bccd396@sha256:a881063aff6200d0d368ec30eb633381465fb8aa15e7d7138b7d48b6256a6feb
        env:
          - name: ACHILLES_DB_URI
            value: >-
              postgresql://broadsea-atlasdb:5432/postgres?ApplicationName=recruit-ohdsi-achilles
          - name: ACHILLES_DB_USERNAME
            value: postgres
          - name: ACHILLES_DB_PASSWORD
            valueFrom:
              secretKeyRef:
                name: recruit-ohdsi-webapi-db-secret
                key: postgres-password
          - name: ACHILLES_CDM_SCHEMA
            value: demo_cdm
          - name: ACHILLES_VOCAB_SCHEMA
            value: demo_cdm
          - name: ACHILLES_RES_SCHEMA
            value: demo_cdm_results
          - name: ACHILLES_CDM_VERSION
            value: "5.3"
          - name: ACHILLES_SOURCE
            value: EUNOMIA
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
          privileged: false
          runAsNonRoot: true
          runAsUser: 10001
          runAsGroup: 10001
          readOnlyRootFilesystem: true
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
          - name: achilles-workspace-volume
            mountPath: /opt/achilles/workspace
          - name: r-tempdir-volume
            mountPath: /tmp
      volumes:
        - name: achilles-workspace-volume
          emptyDir: {}
        - name: r-tempdir-volume
          emptyDir: {}

    - name: recruit-query
      container:
        image: ghcr.io/miracum/recruit/query:v10.1.12 # x-release-please-version
        env:
          - name: QUERY_RUN_ONCE_AND_EXIT
            value: "true"
          - name: QUERY_SCHEDULE_ENABLED
            value: "false"
          - name: QUERY_SELECTOR_MATCHLABELS
            value: ""
          - name: FHIR_URL
            value: http://recruit-fhirserver:8080/fhir
          - name: OMOP_JDBCURL
            value: >-
              jdbc:postgresql://broadsea-atlasdb:5432/postgres?ApplicationName=recruit-query
          - name: OMOP_USERNAME
            value: postgres
          - name: OMOP_PASSWORD
            valueFrom:
              secretKeyRef:
                name: recruit-ohdsi-webapi-db-secret
                key: postgres-password
          - name: OMOP_CDMSCHEMA
            value: demo_cdm
          - name: OMOP_RESULTSSCHEMA
            value: demo_cdm_results
          - name: QUERY_WEBAPI_BASE_URL
            value: http://recruit-ohdsi-webapi:8080/WebAPI
          - name: ATLAS_DATASOURCE
            value: EUNOMIA
          - name: MANAGEMENT_ENDPOINT_HEALTH_PROBES_ADD_ADDITIONAL_PATHS
            value: "true"
          - name: MANAGEMENT_SERVER_PORT
            value: "8081"
          - name: CAMEL_HEALTH_ENABLED
            value: "false"
          - name: QUERY_WEBAPI_COHORT_CACHE_SCHEMA
            value: webapi
        securityContext:
          privileged: false
          capabilities:
            drop:
              - ALL
          runAsNonRoot: true
          runAsUser: 65532
          runAsGroup: 65532
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
          - name: tmp-volume
            mountPath: /tmp
      volumes:
        - name: tmp-volume
          emptyDir: {}

    - name: full-run
      dag:
        tasks:
          - name: run-omop-cdm-etl
            template: omop-cdm-etl
          - name: run-ohdsi-achilles
            depends: run-omop-cdm-etl
            template: ohdsi-achilles
          - name: run-recruit-query
            # doesn't really matter whether the achilles job failed or succeeded
            depends: "run-omop-cdm-etl && (run-ohdsi-achilles.Succeeded || run-ohdsi-achilles.Failed)"
            template: recruit-query

You can run this workflow against the integration test setup of the recruIT Helm chart:

kubectl create namespace recruit

helm repo add argo https://argoproj.github.io/argo-helm
helm upgrade --install \
  --create-namespace \
  --namespace=argo-workflows \
  -f tests/chaos/argo-workflows-values.yaml \
  argo-workflows argo/argo-workflows

helm upgrade --install \
  --namespace=recruit \
  -f charts/recruit/values-integrationtest.yaml \
  --set query.enabled=false \
  recruit charts/recruit/

argo submit -n recruit --wait --log docs/_snippets/k8s/query-argo-workflow.yaml