Authentik deployment checklist

Purpose

This runbook is the operator path for deploying, verifying, and handing off Authentik as the homelab identity provider.

It covers:

  • Preflight checks: secrets, Swarm state, storage, and network readiness.
  • Deployment execution using the canonical Ansible playbook.
  • Service convergence and health verification.
  • Ingress and functional smoke tests against the live endpoint.
  • Post-deploy hardening, evidence capture, and rollback guidance.
  • Day-1 troubleshooting for common failure modes.

Scope

  • Stack name: authentik
  • Canonical playbook: ansible/playbooks/docker/deploy_authentik.yml
  • Stack template: ansible/templates/stacks/authentik.stack.yml
  • Target manager: swarm-manager-1 (10.0.0.211)
  • Public URL: https://sso.castaldifamily.com
  • Data root: /mnt/homelab/apps/authentik
  • Services deployed: authentik-postgres, authentik-redis, authentik-server, authentik-worker

Important

This stack uses absolute bind mounts. The deploy playbook requires all data directories to exist before deployment. If any path is missing, the preflight asserts fail safe: the playbook aborts rather than bootstrapping an empty installation over existing data.


Deployment flow

The six phases run in order: Phase 1 Preflight → Phase 2 Validation run → Phase 3 Deploy → Phase 4 Convergence → Phase 5 Ingress checks → Phase 6 Handoff.

Phase 1 — Preflight checklist

Complete all items in this phase before running any playbook command.

1.1 Change window and ownership

  • Deployment owner is assigned.
  • Rollback owner is assigned.
  • Maintenance window is confirmed.
  • No active cluster incidents in the latest Swarm audit (outputs/swarm_audit_*.md).

1.2 Control node readiness

Run from the ansible/ directory with the virtual environment active.

# Activate the Python virtual environment
source /home/chester/homelab/.venv/bin/activate

# Confirm ansible-core version (must be >= 2.18.0; reported as "ansible [core X.Y.Z]")
ansible --version

# Confirm SSH access to all Swarm managers
ansible swarm_managers -i inventory/hosts.ini -m ping
  • ansible-core version is 2.18.0 or higher.
  • All Swarm managers return pong.
  • Vault password is available (.vault_pass file present or ANSIBLE_VAULT_PASSWORD_FILE set).

1.3 Secrets readiness

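The deploy playbook asserts that vault_authentik_secret_key and vault_authentik_postgres_password are both defined, non-empty, and not placeholder strings. Verify them first; this command prints only the length of the decrypted value, not the secret itself: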

ansible -i inventory/hosts.ini localhost \
  -m ansible.builtin.debug \
  -a "msg={{ vault_authentik_secret_key | length }}" \
  -e "@group_vars/all.yml" \
  -e "@group_vars/vault/all.yml" \
  --vault-password-file .vault_pass

Repeat for vault_authentik_postgres_password.

  • vault_authentik_secret_key decrypts to a non-empty, non-placeholder value.
  • vault_authentik_postgres_password decrypts to a non-empty, non-placeholder value.
  • Neither value is any of: change-me, changeme, your-random-secret, your-db-password.
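The same rules can be checked in a script before a run. A minimal sketch, assuming the decrypted value is already in a shell variable; check_authentik_secret is a hypothetical helper, and its placeholder list simply mirrors the one above rather than the playbook's own assert:

```shell
# Hypothetical helper: mirrors the preflight rules listed above
# (non-empty, not a known placeholder). It is not the playbook's own assert.
check_authentik_secret() {
  local value="$1"
  # Reject empty values
  if [ -z "$value" ]; then
    echo "FAIL: value is empty" >&2
    return 1
  fi
  # Reject the placeholder strings named in this checklist (exact match)
  case "$value" in
    change-me|changeme|your-random-secret|your-db-password)
      echo "FAIL: value is a placeholder" >&2
      return 1
      ;;
  esac
  return 0
}
```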

1.4 Swarm cluster state

# Confirm target manager is active and is control-plane
ssh chester@10.0.0.211 \
  "docker info --format '{{.Swarm.LocalNodeState}}|{{.Swarm.ControlAvailable}}'"
# Expected output: active|true

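# Confirm all managers are active
# {% raw %}...{% endraw %} stops Ansible from templating Docker's Go-template braces
ansible swarm_managers -i inventory/hosts.ini \
  -m ansible.builtin.command \
  -a "docker info --format '{% raw %}{{.Swarm.LocalNodeState}}{% endraw %}'"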
  • swarm-manager-1 returns active|true.
  • All three managers return active.
  • No node shows inactive, pending, or error.
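To script the expected active|true result for swarm-manager-1, a small parser is enough. This is a sketch; check_swarm_manager_state is a hypothetical name, and it only interprets the docker info format string used above:

```shell
# Hypothetical helper: validates the "<LocalNodeState>|<ControlAvailable>"
# string printed by the docker info command above.
check_swarm_manager_state() {
  local state="${1%%|*}"    # text before the first "|"
  local control="${1#*|}"   # text after the first "|"
  if [ "$state" != "active" ]; then
    echo "FAIL: LocalNodeState is '$state', expected 'active'" >&2
    return 1
  fi
  if [ "$control" != "true" ]; then
    echo "FAIL: ControlAvailable is '$control', expected 'true'" >&2
    return 1
  fi
  echo "OK: active manager"
}

# Usage:
#   check_swarm_manager_state "$(ssh chester@10.0.0.211 \
#     "docker info --format '{{.Swarm.LocalNodeState}}|{{.Swarm.ControlAvailable}}'")"
```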

1.5 External overlay network

Authentik requires proxy-net to exist before stack deploy.

ssh chester@10.0.0.211 \
  "docker network ls --filter name=proxy-net --format '{{.Name}}|{{.Driver}}|{{.Scope}}'"
# Expected: proxy-net|overlay|swarm
  • proxy-net exists with overlay driver and swarm scope.
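The name filter on docker network ls matches all or part of a network name, so a stricter check compares whole lines. A sketch with a hypothetical has_proxy_net helper, assuming the Name|Driver|Scope format used above:

```shell
# Hypothetical helper: succeeds only if one line of the
# '{{.Name}}|{{.Driver}}|{{.Scope}}' output is exactly proxy-net|overlay|swarm.
# Similarly named networks (substring matches from the filter) are ignored.
has_proxy_net() {
  printf '%s\n' "$1" | grep -qxF 'proxy-net|overlay|swarm'
}

# Usage:
#   has_proxy_net "$(ssh chester@10.0.0.211 \
#     "docker network ls --filter name=proxy-net --format '{{.Name}}|{{.Driver}}|{{.Scope}}'")" \
#     && echo "proxy-net OK" || echo "proxy-net missing or wrong driver/scope"
```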

Warning

If proxy-net is missing, create it before continuing:

ssh chester@10.0.0.211 \
  "docker network create --driver overlay --attachable proxy-net"

1.6 Persistent data paths

All bind-mount paths must exist on swarm-manager-1 before deploying. If any is missing, the playbook fails safe and aborts.

ssh chester@10.0.0.211 "for d in \
  /mnt/homelab/apps/authentik \
  /mnt/homelab/apps/authentik/data \
  /mnt/homelab/apps/authentik/data/database \
  /mnt/homelab/apps/authentik/data/redis \
  /mnt/homelab/apps/authentik/data/media \
  /mnt/homelab/apps/authentik/data/config \
  /mnt/homelab/apps/authentik/data/blueprints; do
    [ -d \"\$d\" ] && echo \"OK  \$d\" || echo \"MISSING  \$d\"
done"
  • All 7 paths return OK.
  • If any path is MISSING, create or restore from backup before proceeding.
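The remote loop above can also be kept as a reusable function that takes the data root as a parameter, which makes it easy to test against a scratch directory. A sketch; check_authentik_paths is hypothetical and checks the same seven paths:

```shell
# Hypothetical helper: checks the same seven bind-mount paths as the loop
# above under a configurable data root. Prints OK/MISSING per path and
# returns non-zero if any path is missing.
check_authentik_paths() {
  local root="${1:-/mnt/homelab/apps/authentik}"
  local missing=0 d
  for d in \
    "$root" \
    "$root/data" \
    "$root/data/database" \
    "$root/data/redis" \
    "$root/data/media" \
    "$root/data/config" \
    "$root/data/blueprints"; do
    if [ -d "$d" ]; then
      echo "OK       $d"
    else
      echo "MISSING  $d"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage from a bash control node (ships the function definition over SSH;
# assumes the remote shell accepts POSIX function syntax):
#   ssh chester@10.0.0.211 "$(declare -f check_authentik_paths); check_authentik_paths"
```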

To create paths for a fresh install (no existing data to protect):

ssh chester@10.0.0.211 "sudo mkdir -p \
  /mnt/homelab/apps/authentik/data/database \
  /mnt/homelab/apps/authentik/data/redis \
  /mnt/homelab/apps/authentik/data/media \
  /mnt/homelab/apps/authentik/data/config \
  /mnt/homelab/apps/authentik/data/blueprints"

Warning

Do not create missing paths if you are restoring an existing Authentik install. Restore from backup first to avoid initialising an empty database over pre-existing data.


Phase 2 — Validation-only run

Run the playbook in validation mode to confirm all asserts pass before changing anything on the cluster.

cd /home/chester/homelab/ansible

ansible-playbook \
  -i inventory/hosts.ini \
  playbooks/docker/deploy_authentik.yml \
  -e "stack_validate_only=true" \
  --vault-password-file .vault_pass
  • Playbook completes with 0 failed tasks.
  • Secrets assertion tasks pass (no FAILED on assert blocks).
  • Swarm manager state assertion passes.
  • Data path assertions pass for all 7 required directories.

Stop here if any assert fails. Diagnose using the Troubleshooting matrix below, then re-run validation before proceeding.
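The "0 failed tasks" criterion can be checked mechanically from the PLAY RECAP lines at the end of the run. A minimal sketch, assuming uncoloured output captured to a file; recap_is_clean is a hypothetical helper that also treats unreachable hosts and a missing recap as failures:

```shell
# Hypothetical helper: reads ansible-playbook output on stdin and succeeds
# only if a PLAY RECAP host line is present and every host line reports
# failed=0 and unreachable=0. Assumes uncoloured output.
recap_is_clean() {
  awk '
    / : ok=[0-9]+/ {
      seen = 1
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) bad = 1
      }
    }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}

# Usage (from the ansible/ directory):
#   ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_authentik.yml \
#     -e "stack_validate_only=true" --vault-password-file .vault_pass > validate.log 2>&1
#   recap_is_clean < validate.log && echo "validation clean"
```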


Phase 3 — Deployment execution

Run the standard deploy. All playbook output should be captured for the evidence record.

cd /home/chester/homelab/ansible

ansible-playbook \
  -i inventory/hosts.ini \
  playbooks/docker/deploy_authentik.yml \
  --vault-password-file .vault_pass \
  2>&1 | tee ../outputs/authentik_deploy_$(date +%Y%m%dT%H%M%S).log
  • Playbook completes without FAILED tasks.
  • Deployment result block is printed confirming stack name, manager, and URL.
  • Log file is saved to outputs/ with a timestamp.
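One caveat with the tee pipeline above: a pipeline's exit status is that of its last command, so a calling script sees tee's status rather than ansible-playbook's. In bash, set -o pipefail changes this; the sketch below, using a hypothetical run_logged helper, records the command's status in a temporary file so it also works under plain sh:

```shell
# Sketch: run a command, copy its combined stdout/stderr to a log with tee,
# and return the command's own exit status instead of tee's. bash users can
# get the same effect with `set -o pipefail`; the temp file keeps this POSIX.
run_logged() {
  local log="$1" status_file rc
  shift
  status_file=$(mktemp)
  { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log"
  rc=$(cat "$status_file")
  rm -f "$status_file"
  return "$rc"
}

# Usage (from the ansible/ directory):
#   run_logged "../outputs/authentik_deploy_$(date +%Y%m%dT%H%M%S).log" \
#     ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_authentik.yml \
#     --vault-password-file .vault_pass || echo "deploy failed; see log"
```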

Expected deployment result output:

"Authentik deployment complete."
"Stack     : authentik"
"Manager   : swarm-manager-1 (10.0.0.211)"
"URL       : https://sso.castaldifamily.com"
"Data root : /mnt/homelab/apps/authentik"
"Services  : authentik-postgres, authentik-redis, authentik-server, authentik-worker"

Phase 4 — Service convergence and health

Verify that all four services are running, stable, and healthy.

4.1 Service replica status

ssh chester@10.0.0.211 \
  "docker service ls --filter label=com.docker.stack.namespace=authentik"

Expected replica counts:

Service                        Expected replicas
authentik_authentik-postgres   1/1
authentik_authentik-redis      1/1
authentik_authentik-server     1/1
authentik_authentik-worker     1/1
  • All four services show 1/1 replicas.
  • No service shows 0/1 or a failure count.
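The replica table can be checked with a short filter over docker service ls output. A sketch, assuming the output is requested with --format '{{.Name}} {{.Replicas}}'; check_replicas is a hypothetical helper:

```shell
# Hypothetical helper: reads "<name> <replicas>" lines on stdin and fails
# unless each of the four expected services is listed with 1/1 replicas.
check_replicas() {
  awk '
    { seen[$1] = $2 }
    END {
      n = split("authentik_authentik-postgres authentik_authentik-redis authentik_authentik-server authentik_authentik-worker", want, " ")
      bad = 0
      for (i = 1; i <= n; i++) {
        s = want[i]
        if (!(s in seen))          { print "MISSING   " s; bad = 1 }
        else if (seen[s] != "1/1") { print "NOT READY " s " " seen[s]; bad = 1 }
        else                       { print "OK        " s }
      }
      exit bad
    }
  '
}

# Usage:
#   ssh chester@10.0.0.211 "docker service ls \
#     --filter label=com.docker.stack.namespace=authentik \
#     --format '{{.Name}} {{.Replicas}}'" | check_replicas
```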

4.2 Service placement

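All four services must be pinned to swarm-manager-1. The command below checks authentik-server; repeat it for authentik_authentik-worker, authentik_authentik-postgres, and authentik_authentik-redis.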

ssh chester@10.0.0.211 \
  "docker service ps authentik_authentik-server --filter desired-state=running --format '{{.Node}} {{.CurrentState}}'"
# Expected: swarm-manager-1   Running ...
  • authentik-server task is running on swarm-manager-1.
  • authentik-worker task is running on swarm-manager-1.
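To apply the same placement test to every service's running tasks, the "{{.Node}} {{.CurrentState}}" lines can be checked in a loop. A sketch; check_placement is a hypothetical helper:

```shell
# Hypothetical helper: reads "<node> <current state>" lines on stdin and
# fails if there are no running tasks or any task is on another node.
check_placement() {
  local expected="${1:-swarm-manager-1}"
  awk -v want="$expected" '
    BEGIN { n = 0; bad = 0 }
    NF {
      n++
      if ($1 != want) { print "WRONG NODE: " $0; bad = 1 }
    }
    END {
      if (n == 0) { print "NO RUNNING TASKS"; exit 1 }
      exit bad
    }
  '
}

# Usage: check every service, not just the server
#   for s in postgres redis server worker; do
#     ssh chester@10.0.0.211 "docker service ps authentik_authentik-$s \
#       --filter desired-state=running --format '{{.Node}} {{.CurrentState}}'" \
#       | check_placement swarm-manager-1 || echo "placement check failed: $s"
#   done
```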

4.3 Container health checks

# postgres health (pg_isready)
ssh chester@10.0.0.211 \
  "docker ps --filter name=authentik_authentik-postgres --format '{{.Status}}'"
# Expected: Up ... (healthy)

# redis health (redis-cli ping)
ssh chester@10.0.0.211 \
  "docker ps --filter name=authentik_authentik-redis --format '{{.Status}}'"
# Expected: Up ... (healthy)
  • authentik-postgres container shows (healthy).
  • authentik-redis container shows (healthy).
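Health status strings can be interpreted the same way in a script. A sketch with a hypothetical is_healthy helper; it accepts only a running container whose status contains (healthy), so (health: starting) and (unhealthy) both fail:

```shell
# Hypothetical helper: succeeds only for a `docker ps --format '{{.Status}}'`
# value such as "Up 3 minutes (healthy)". "(health: starting)", "(unhealthy)",
# exited containers, and empty output (no matching container) all fail.
is_healthy() {
  case "$1" in
    Up*"(healthy)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   for s in postgres redis; do
#     st=$(ssh chester@10.0.0.211 "docker ps --filter name=authentik_authentik-$s --format '{{.Status}}'")
#     is_healthy "$st" && echo "OK $s" || echo "NOT HEALTHY $s: $st"
#   done
```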

4.4 Critical startup log checks

# Check server startup for migration and database connectivity
ssh chester@10.0.0.211 \
  "docker service logs authentik_authentik-server --since 10m --no-task-ids 2>&1 | tail -40"

# Check worker for job queue connectivity
ssh chester@10.0.0.211 \
  "docker service logs authentik_authentik-worker --since 10m --no-task-ids 2>&1 | tail -40"
  • No FATAL or ERROR messages relating to database connection in server logs.
  • No FATAL or ERROR messages relating to Redis connection in server or worker logs.
  • Database migration messages complete without errors.
  • No restart loops: startup messages do not repeat two or more times within the log window.
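A rough automated pass over the same logs, as a sketch only: scan_startup_logs is hypothetical, and its keyword lists are an assumption rather than a catalogue of Authentik's actual error messages, so it complements reading the logs instead of replacing it:

```shell
# Hypothetical helper: prints log lines (stdin) that mention fatal/error
# together with a database or Redis keyword, and returns non-zero if any
# are found. The keyword lists are an assumption, not Authentik's catalogue.
scan_startup_logs() {
  if grep -iE 'fatal|error' | grep -iE 'postgres|database|redis'; then
    return 1
  fi
  return 0
}

# Usage:
#   ssh chester@10.0.0.211 "docker service logs authentik_authentik-server \
#     --since 10m --no-task-ids 2>&1" | scan_startup_logs || echo "review the lines above"
```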

4.5 Resource limits in effect

Service             Memory limit   CPU limit
authentik-postgres  1 GiB          0.75
authentik-redis     512 MiB        0.50
authentik-server    2 GiB          1.0
authentik-worker    1 GiB          0.75

ssh chester@10.0.0.211 \
  "docker service inspect authentik_authentik-server \
   --format '{{.Spec.TaskTemplate.Resources.Limits.MemoryBytes}}'"
# Expected: 2147483648 (2 GiB)
  • Resource limits are present and match the table above.
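To compare every service against the table rather than only the server's memory, the table values need converting into the units docker service inspect reports. A sketch, assuming Docker's binary memory units and the NanoCPUs field (1 CPU = 1000000000); to_bytes and to_nanocpus are hypothetical helpers:

```shell
# Hypothetical helpers: convert the table's values into what
# docker service inspect reports. Docker memory sizes are binary
# (1G = 1024^3 bytes); CPU limits are stored as NanoCPUs (1 CPU = 10^9).
# to_bytes accepts integer sizes only (e.g. 512M, 2G).
to_bytes() {
  local n="${1%[KkMmGg]}" unit="${1##*[0-9]}"
  case "$unit" in
    G|g) echo $(( n * 1024 * 1024 * 1024 )) ;;
    M|m) echo $(( n * 1024 * 1024 )) ;;
    K|k) echo $(( n * 1024 )) ;;
    "")  echo "$n" ;;
    *)   return 1 ;;
  esac
}

to_nanocpus() {
  # awk handles fractional CPU counts such as 0.75
  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000000000 }'
}

# Usage: compare one service against its table row (authentik-server: 2 GiB, 1.0)
#   actual=$(ssh chester@10.0.0.211 "docker service inspect authentik_authentik-server \
#     --format '{{.Spec.TaskTemplate.Resources.Limits.MemoryBytes}} {{.Spec.TaskTemplate.Resources.Limits.NanoCPUs}}'")
#   [ "$actual" = "$(to_bytes 2G) $(to_nanocpus 1.0)" ] && echo OK || echo "MISMATCH: $actual"
```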

Phase 5 — Ingress and functional verification

5.1 Traefik route registration

Traefik routes are published via traefik-kop. Verify the route is active before testing the public endpoint.

# Check the Traefik router for the authentik rule, status, and @file middlewares
curl -fsS http://10.0.0.151:8080/api/http/routers/authentik@docker \
  | python3 -m json.tool | grep -E '"rule"|"status"|@file'
# Expected: "rule": "Host(...sso.castaldifamily.com...)", "status": "enabled", plus the @file middleware names
  • Traefik router authentik@docker exists and is enabled.
  • Router rule matches Host(`sso.castaldifamily.com`).
  • Middlewares include security-headers@file and ratelimit-basic@file.
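The three router criteria can also be checked from the raw API response. A sketch using plain string matching, assuming the JSON returned by /api/http/routers/authentik@docker; check_router_json is a hypothetical helper and is not a JSON parser:

```shell
# Hypothetical helper: string-matches a Traefik router JSON document for
# status "enabled", a Host(`<host>`) rule, and the two @file middlewares.
check_router_json() {
  local json="$1" host="${2:-sso.castaldifamily.com}" m
  printf '%s' "$json" | grep -Eq '"status": *"enabled"' \
    || { echo "FAIL: router not enabled" >&2; return 1; }
  printf '%s' "$json" | grep -Fq "Host(\`$host\`)" \
    || { echo "FAIL: rule does not match $host" >&2; return 1; }
  for m in security-headers@file ratelimit-basic@file; do
    printf '%s' "$json" | grep -Fq "\"$m\"" \
      || { echo "FAIL: middleware $m missing" >&2; return 1; }
  done
  echo "OK"
}

# Usage:
#   check_router_json "$(curl -fsS http://10.0.0.151:8080/api/http/routers/authentik@docker)"
```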

5.2 HTTPS endpoint reachability

# TLS handshake and HTTP 200/302 response
curl -fsS -o /dev/null -w "%{http_code} %{ssl_verify_result}" \
  https://sso.castaldifamily.com
# Expected: 200 0  (or 302 0 for a redirect to login)
  • curl returns HTTP 200 or 302.
  • ssl_verify_result is 0 (certificate valid).
  • Response is not a Traefik 404 or 502.
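The two values printed by curl -w can be interpreted in a script. A sketch; check_endpoint_result is a hypothetical helper that accepts only HTTP 200 or 302 together with ssl_verify_result 0:

```shell
# Hypothetical helper: interprets the "<http_code> <ssl_verify_result>"
# string written by curl -w "%{http_code} %{ssl_verify_result}".
check_endpoint_result() {
  local code="${1%% *}" ssl="${1##* }"
  case "$code" in
    200|302) ;;
    *) echo "FAIL: HTTP $code" >&2; return 1 ;;
  esac
  [ "$ssl" = "0" ] || { echo "FAIL: ssl_verify_result=$ssl" >&2; return 1; }
  echo "OK: HTTP $code, certificate verified"
}

# Usage:
#   check_endpoint_result "$(curl -sS -o /dev/null \
#     -w '%{http_code} %{ssl_verify_result}' https://sso.castaldifamily.com)"
```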

5.3 Login page load

Open https://sso.castaldifamily.com in a browser.

  • Authentik login page loads without JavaScript errors.
  • Page title includes "authentik" or "Sign in".
  • No TLS certificate warning from the browser.

5.4 Admin UI readiness (if initial deploy)

Navigate to https://sso.castaldifamily.com/if/flow/initial-setup/

  • Initial setup flow is reachable on first-run bootstrap.
  • Skip this step if the instance already existed; do not re-run initial setup on an existing install.

Phase 6 — Post-deploy handoff

6.1 Monitoring integration

Authentik is referenced as the SSO provider in group_vars/all.yml:

monitoring:
  authentik_host: "https://sso.castaldifamily.com"
  • Uptime Kuma has a monitor for https://sso.castaldifamily.com.
  • Prometheus (or another health-check system) alerts when the running replica count of authentik_authentik-server drops below 1.

6.2 Backup verification

  • /mnt/homelab/apps/authentik/data/database is included in backup scope.
  • A manual backup snapshot was taken before or immediately after deploy.
  • Restore procedure is documented and tested (or explicitly deferred).

6.3 Secret rotation awareness

Secret                              Rotation procedure
vault_authentik_secret_key          Update vault → redeploy stack (running sessions are invalidated)
vault_authentik_postgres_password   Update vault AND the postgres user password → redeploy
  • Rotation procedure is known to the deployment owner.
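Rotating vault_authentik_secret_key needs a new random value. A sketch, assuming a 60-byte random value is acceptable (the length is an assumption, not an Authentik requirement stated in this runbook); generate_secret is a hypothetical helper that prints a single base64 line with no trailing newline, so it can be piped to ansible-vault encrypt_string --stdin-name without the value being typed into shell history:

```shell
# Sketch: print <bytes> (default 60) random bytes from /dev/urandom as one
# base64 line, with no trailing newline. 60 bytes encode to 80 characters.
generate_secret() {
  local bytes="${1:-60}"
  head -c "$bytes" /dev/urandom | base64 | tr -d '\n'
}

# Usage (from the ansible/ directory):
#   generate_secret | ansible-vault encrypt_string \
#     --stdin-name 'vault_authentik_secret_key' --vault-password-file .vault_pass
```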

6.4 Evidence capture

# Save service state snapshot
ssh chester@10.0.0.211 \
  "docker service ls --filter label=com.docker.stack.namespace=authentik" \
  > ../outputs/authentik_service_snapshot_$(date +%Y%m%dT%H%M%S).txt
  • Deploy log saved to outputs/authentik_deploy_<timestamp>.log.
  • Service state snapshot saved to outputs/authentik_service_snapshot_<timestamp>.txt.
  • Deployment timestamp and verification timestamp recorded in this checklist.
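Using one helper for evidence file names keeps every artifact's timestamp in the same format as the deploy log. A sketch; evidence_path is hypothetical and defaults to ../outputs, matching the relative paths used above when run from ansible/:

```shell
# Hypothetical helper: prints an evidence file path such as
# ../outputs/authentik_service_snapshot_20260314T122134.txt, using the same
# timestamp format as the deploy log.
evidence_path() {
  local prefix="$1" ext="${2:-txt}" dir="${3:-../outputs}"
  printf '%s/%s_%s.%s\n' "$dir" "$prefix" "$(date +%Y%m%dT%H%M%S)" "$ext"
}

# Usage (from the ansible/ directory):
#   snap=$(evidence_path authentik_service_snapshot)
#   ssh chester@10.0.0.211 \
#     "docker service ls --filter label=com.docker.stack.namespace=authentik" > "$snap"
```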

6.5 Deployment sign-off

Field                    Value
Deployment owner
Deployment timestamp
Verification timestamp
Endpoint verified        https://sso.castaldifamily.com
Final status             ☐ GREEN (all phases passed)

Rollback procedure

If deployment fails or causes instability, remove the stack and preserve data.

cd /home/chester/homelab/ansible

ansible-playbook \
  -i inventory/hosts.ini \
  playbooks/docker/deploy_authentik.yml \
  -e "authentik_deploy_state=absent" \
  --vault-password-file .vault_pass

Warning

authentik_deploy_state=absent removes the Swarm stack (containers, services, configs) but does not delete the bind-mount data directories. Data at /mnt/homelab/apps/authentik is preserved for re-deploy or restore.

  • Stack removed cleanly (docker stack ls shows no authentik entry).
  • Data directories still intact on swarm-manager-1.
  • Root cause identified before re-deploying.
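The first rollback check can be scripted against docker stack ls. A sketch with a hypothetical stack_absent helper; it only inspects the stack list and does not verify the data directories:

```shell
# Hypothetical helper: reads `docker stack ls --format '{{.Name}}'` output
# on stdin and fails if the named stack is still listed.
stack_absent() {
  local name="${1:-authentik}"
  if grep -qxF "$name"; then
    echo "FAIL: stack $name is still present" >&2
    return 1
  fi
  echo "OK: stack $name is not listed"
}

# Usage:
#   ssh chester@10.0.0.211 "docker stack ls --format '{{.Name}}'" | stack_absent authentik
```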

Troubleshooting matrix

Validation assert fails: secrets not defined or placeholder

Symptom: Playbook fails on Assert vault_authentik_secret_key is defined or Assert Authentik secrets are not placeholders.

Check:

# Prints the decrypted secret in plain text; run only on a trusted terminal
ansible -i inventory/hosts.ini localhost \
  -m ansible.builtin.debug \
  -a "var=vault_authentik_secret_key" \
  -e "@group_vars/all.yml" \
  -e "@group_vars/vault/all.yml" \
  --vault-password-file .vault_pass

Fix: Encrypt and store the correct value:

ansible-vault encrypt_string 'YOUR-KEY' \
  --name 'vault_authentik_secret_key' \
  --vault-password-file .vault_pass
# Paste output into group_vars/vault/all.yml

Validation assert fails: data paths missing

Symptom: Playbook fails on Assert required Authentik paths exist before deploy.

Check:

ssh chester@10.0.0.211 "ls -la /mnt/homelab/apps/authentik/"

Fix (fresh install only):

ssh chester@10.0.0.211 "sudo mkdir -p \
  /mnt/homelab/apps/authentik/data/{database,redis,media,config,blueprints}"

Fix (existing install): Restore from backup before creating directories.


Swarm assert fails: manager not active or not control plane

Symptom: Playbook fails on Assert target is an active Swarm manager.

Check:

ssh chester@10.0.0.211 "docker info --format '{{.Swarm.LocalNodeState}}|{{.Swarm.ControlAvailable}}'"
# Expected: active|true

Fix: Investigate Swarm manager health. Do not proceed until a healthy quorum manager is the deploy target.


Services not converging to 1/1

Symptom: docker service ls shows 0/1 or a service cycles through restarts.

Check:

ssh chester@10.0.0.211 \
  "docker service ps authentik_authentik-server --no-trunc"

Look for failure reasons in the Error column.

Common causes:

Cause                     Evidence                                            Fix
Secret key mismatch       cryptography error or key invalid in server logs    Re-check vault value, redeploy
Postgres not healthy yet  connection refused in server logs                   Wait for postgres (healthy), then check server
Redis not reachable       Redis connection error in server or worker logs     Confirm authentik-redis is 1/1 and healthy first
Missing bind-mount path   no such file or directory at container start        Create path, redeploy
Insufficient memory       OOM kill in the docker service ps Error column      Check node resources, adjust limits if needed
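The Postgres row above says to wait for postgres to report (healthy) before looking at the server again; a generic retry loop covers that. A sketch; retry and pg_healthy are hypothetical helpers, and the attempt count and delay in the usage are arbitrary:

```shell
# Sketch: run a check command until it succeeds or the attempts run out,
# sleeping between attempts. Attempt count and delay are caller-chosen.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check for the postgres row above:
#   pg_healthy() {
#     ssh chester@10.0.0.211 \
#       "docker ps --filter name=authentik_authentik-postgres --format '{{.Status}}'" \
#       | grep -qF '(healthy)'
#   }
#   retry 12 5 pg_healthy && echo "postgres healthy; now check the server"
```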

Traefik route not registered or 502 response

Symptom: curl https://sso.castaldifamily.com returns 502 Bad Gateway or connection refused.

Check:

# Confirm traefik-kop is running (Swarm stack)
ssh chester@10.0.0.211 \
  "docker service ls --filter name=traefik-kop"

# Check the server task is running (authentik-server serves HTTP on port 9000)
ssh chester@10.0.0.211 \
  "docker service ps authentik_authentik-server --filter desired-state=running"

Common causes:

  • traefik-kop is not running → deploy the monitoring stack first.
  • authentik-server is not listening on port 9000 → check the replica status and restart the service.
  • edge_routing.swarm.bind_ip is incorrect in group_vars/all.yml → verify it resolves to an active Swarm node.
  • Cloudflare DNS is not pointing to 10.0.0.151 → verify DNS record for sso.castaldifamily.com.

Database migration errors on first boot

Symptom: Server logs show migration errors or relation does not exist.

Check:

ssh chester@10.0.0.211 \
  "docker service logs authentik_authentik-server --since 5m 2>&1 | grep -iE 'migrat|error|fatal'"

Fix: Migrations run automatically on startup. If they fail:

  1. Check postgres is (healthy) and accepting connections.
  2. Check vault_authentik_postgres_password in vault matches the running postgres password.
  3. Restart the server service to trigger a re-run:
ssh chester@10.0.0.211 \
  "docker service update --force authentik_authentik-server"

Reference

Resource                 Location
Deploy playbook          ansible/playbooks/docker/deploy_authentik.yml
Stack template           ansible/templates/stacks/authentik.stack.yml
Shared variables         ansible/group_vars/all.yml
Vault secrets            ansible/group_vars/vault/all.yml
Authentik docs           https://goauthentik.io/docs
Authentik changelog      https://github.com/goauthentik/authentik/releases
Swarm cluster baseline   outputs/swarm_audit_20260314T122134.md