Container health checks
Open Forms is deployed as a collection of containers. The container runtime or orchestrator (like Docker or Kubernetes) can check whether each container is running as expected and act when that’s not the case, for example by restarting the container or removing it from the pool that serves traffic.
Health checks are responsible for detecting anomalies and reporting that a container is not running as expected. They can take different forms, for example:
running a script and checking the exit code of the process
making an HTTP request to an endpoint which responds with a success or error status code
opening a TCP connection to a particular port
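As a rough illustration of the three forms listed above, the following Python sketch implements each one with the standard library only. The function names are hypothetical and are not part of Open Forms or maykin-common.

```python
import http.client
import socket
import subprocess


def check_exit_code(command: list[str], timeout: float = 3.0) -> bool:
    """Run a command and treat exit code 0 as healthy."""
    try:
        result = subprocess.run(command, timeout=timeout, capture_output=True)
    except (OSError, subprocess.TimeoutExpired):
        # The command could not be started or did not finish in time.
        return False
    return result.returncode == 0


def check_http(url: str, timeout: float = 3.0) -> bool:
    """Make a GET request and treat a 2xx status code as healthy."""
    import urllib.request

    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Covers connection errors, 4xx/5xx responses (HTTPError) and bad URLs.
        return False


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Open a TCP connection and treat a successful connect as healthy."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Each function returns a boolean so that a wrapper script can translate the result into an exit code for the container runtime.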
This section of the documentation describes the recommended health checks that Open Forms provides, and the health checks to implement for containers of third party software typically used in an Open Forms deployment. You can incorporate these in your infrastructure code (like Helm charts).
You can find code examples of these health checks in our docker-compose.yml on GitHub.
Open Forms containers
HTTP service
The Open Forms web service listens on port 8000 inside the container and accepts HTTP traffic. Three endpoints are exposed for health checks.
http://localhost:8000/_healthz/livez/
The liveness endpoint - checks that HTTP requests can be handled. Suitable for liveness (and readiness) probes. This is the check with the lowest overhead.
http://localhost:8000/_healthz/
Endpoint that checks connections with the database, caches, database migration state…
Suitable for the startup probe. The most expensive check to run, as it checks all dependencies of the application.
http://localhost:8000/_healthz/readyz/
The readiness endpoint - checks that requests can be handled and tests that the default cache (used for sessions) and the database connection function. Slightly more expensive than the liveness check, but a good candidate for the readiness probe.
Tip
Ensure the ALLOWED_HOSTS environment variable contains localhost. See
Environment configuration reference for more details.
Tip
The maykin-common executable is available in the container and can be
used to perform the health checks, as an alternative to HTTP probes.
maykin-common health-check \
--endpoint=http://localhost:8000/_healthz/livez/ \
--timeout=3
Added in version 3.5.0.
Celery workers
The Celery Worker service is responsible for picking up and executing background tasks scheduled by the web service or Celery beat.
The worker creates and updates an event loop liveness file at
/app/tmp/celery_worker_event_loop.live, which is touched every minute. Additionally,
when the worker is ready to accept tasks, it creates the
/app/tmp/celery_worker.ready file and removes it when the worker shuts down.
The worker liveness can be checked with the maykin-common CLI:
maykin-common worker-health-check \
--broker redis://redis:6379/0 \
--liveness-file /app/tmp/celery_worker_event_loop.live \
--worker-name celery@docker
Caution
Adapt the --broker and --worker-name options to your environment.
--broker must match the value of the CELERY_BROKER setting.
--worker-name should not be necessary, as it is taken from the CELERY_WORKER_NAME environment variable if set, and otherwise falls back to celery@<hostname>, where the hostname of the container is used.
If pings are failing, you may need to provide the worker name(s) explicitly.
Tip
You can also use the health checks for readiness in rolling deployments on Kubernetes, so that old pods are only stopped when the new versions are confirmed to be ready.
maykin-common worker-health-check \
--skip-ping \
--skip-event-loop-liveness \
--no-skip-readiness \
--readiness-file /app/tmp/celery_worker.ready
Added in version 3.5.0.
Celery beat
The Open Forms Beat service is responsible for periodically scheduling background
tasks. It touches a liveness file at /app/tmp/celery_beat.live in
the container every time a task is successfully scheduled.
Liveness and readiness can be checked with the maykin-common CLI:
maykin-common beat-health-check \
--file=/app/tmp/celery_beat.live \
--max-age=120
The CELERY_BEAT_SCHEDULE setting (in openforms.conf.base) contains some tasks
that run every minute (60 seconds). A --max-age of 120 seconds spans two such
intervals, which should account for some time skew.
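The freshness rule can be illustrated with a short Python sketch. This is not the implementation of beat-health-check, only an assumption about how a check of this kind could compare a file's modification time with a maximum age. The name is_fresh is hypothetical.

```python
import os
import time


def is_fresh(path: str, max_age: float) -> bool:
    """Return True if the file exists and was modified within max_age seconds."""
    try:
        modified_at = os.stat(path).st_mtime
    except OSError:
        # A missing or unreadable file counts as unhealthy.
        return False
    return (time.time() - modified_at) <= max_age
```

With tasks scheduled every 60 seconds, a file touched 90 seconds ago still passes with a maximum age of 120, while a file touched 130 seconds ago does not.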
Tip
On Kubernetes, you can use this same check for the startup probe, but with a
larger --max-age, or delay the probe by about 10 seconds to allow the application
some time to load and initialize.
Added in version 3.5.0.
Celery flower
Celery Flower is a web-app which binds to port 5555 by default. You can use the
generic HTTP health check utility from maykin-common, or set up an equivalent
HTTP probe:
maykin-common health-check \
--endpoint=http://localhost:5555/ \
--timeout=3
Added in version 3.5.0.
Third party containers
Redis/Valkey
The Redis and Valkey container images include a command line utility (redis-cli and
valkey-cli respectively) which has a ping command to test connectivity to the server:
redis-cli ping
The command exits with exit code 0 on success and exit code 1 on failure.
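Where the CLI is not available in the probing environment, a similar check can be sketched in Python by sending an inline PING over a plain TCP connection and expecting a +PONG reply. This is a minimal sketch that assumes a server without authentication or TLS. The function name redis_ping and its default host and port are illustrative assumptions.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 3.0) -> bool:
    """Send an inline PING and report whether the server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timed out or reset.
        return False
    # An error reply (for example when authentication is required) starts with "-".
    return reply.startswith(b"+PONG")
```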
nginx
nginx proxies HTTP traffic from the browser/client to the backend service. It also serves static assets directly. The nginx config needs to be extended with a location handler for the health checks. Take care to namespace this path so that it does not collide with, and thereby mask, the identifier of a form.
Example nginx configuration snippet:
location = /_healthz/livez/ {
    access_log off;
    # set the response content type (add_header would add a second Content-Type header)
    default_type text/plain;
    # block outside traffic
    allow 127.0.0.1;
    allow ::1;
    deny all;
    return 200 "ok\n";
}
We recommend this cheap check for both the liveness and readiness checks.
You can then wire up an HTTP probe or curl script to make a GET call to
http://localhost:8080/_healthz/livez/. Note the port number - often the nginx
unprivileged image will be used, which binds to 8080 by default, but check your
specific environment to confirm.
Smart readiness probe
You may want to consider proxying to the backend-service for the readiness check.
Warning
This can lead to cascading failures: first your backend-service becomes unavailable, which makes nginx unavailable, and possibly other dependent services after it.
Tip
Even if the backend is not available, nginx may still be performing useful work by serving static files.
Example nginx configuration snippet:
location = /_healthz/readyz/ {
    access_log off;
    # block outside traffic
    allow 127.0.0.1;
    allow ::1;
    deny all;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $server_name;
    proxy_set_header X-Scheme $scheme;
    # proxy to the readiness endpoint of the web service
    proxy_pass http://web:8000/_healthz/readyz/;
}
ClamAV
ClamAV includes a healthcheck definition in its container image. On Kubernetes, this definition is not used; probes are used instead.
The health check is implemented in clamdcheck.sh, exiting with a non-zero exit code
when there are problems. It pings the default port (3310).
Note
On initial deploy, ClamAV will have to download the virus definitions database, which can take a long time. The built-in healthcheck only starts after 6 minutes. It’s recommended to configure a startup probe on Kubernetes.
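The idea behind a startup probe, tolerating failed checks for a bounded period while a slow service initializes, can be sketched in Python as follows. The function wait_until_healthy and its parameters are hypothetical, and the default interval is illustrative rather than a recommendation.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    max_wait: float,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it succeeds or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        if check():
            return True
        if time.monotonic() + interval > deadline:
            # The next attempt would fall outside the startup budget.
            return False
        sleep(interval)
```

On Kubernetes the same budget is expressed declaratively: a startup probe tolerates failureThreshold failed attempts spaced periodSeconds apart before the container is restarted.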
PostgreSQL
Warning
Running the database as a container can bring certain scaling and disaster recovery challenges. We only provide this check for the sake of completeness.
PostgreSQL container images typically include the pg_isready binary, which tests
the database connection (accepting traffic on the specified host and port). It has a
non-zero exit code when the database is not ready.
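The following Python sketch wraps pg_isready and maps its documented exit statuses to readable descriptions, which can help when logging probe failures. The function names and the default host and port are assumptions for illustration, so adapt them to your environment.

```python
import subprocess

# Exit statuses documented for pg_isready.
PG_ISREADY_STATUS = {
    0: "accepting connections",
    1: "rejecting connections (for example during startup)",
    2: "no response to the connection attempt",
    3: "no attempt made (for example invalid parameters)",
}


def describe_pg_isready(returncode: int) -> str:
    """Translate a pg_isready exit status into a short description."""
    return PG_ISREADY_STATUS.get(returncode, f"unexpected exit status {returncode}")


def postgres_is_ready(host: str = "localhost", port: int = 5432, timeout: int = 3) -> bool:
    """Run pg_isready and treat only exit status 0 as ready."""
    try:
        result = subprocess.run(
            ["pg_isready", "--host", host, "--port", str(port), "--timeout", str(timeout)],
            capture_output=True,
            timeout=timeout + 2,
        )
    except (OSError, subprocess.TimeoutExpired):
        # pg_isready is not installed or did not finish in time.
        return False
    return result.returncode == 0
```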