# Ansible quality gates

This document defines the quality standards, review checklist, and validation workflow for all Ansible code in this repository.

## Philosophy

Quality gates progress through three enforcement tiers:

- **Tier 1 (Advisory):** Visible via lint warnings; not blocking. Baseline cleanup phase.
- **Tier 2 (Mandatory — current):** Must pass for swarm-impacting changes. CI enforces.
- **Tier 3 (Fully blocking):** All rules enforced on every commit. Target: Phase 3 roadmap.

**Idempotency controls are Tier 2 (mandatory now) for all stack-impacting changes.** This means: `changed_when`, manager-state assertions, secret preflight asserts, bind-mount path asserts, and validate-only mode support are required, not advisory.

## Linting

### Configuration

The repository includes [.ansible-lint](../../.ansible-lint) configuration that enforces:

* **Moderate profile** — Balanced between permissive and strict
* **Advisory rules** — No blocking on known patterns (e.g., raw commands in bootstrap playbooks)
* **Warnings** — Experimental syntax and risky permissions are flagged but not blocked

### Running lint checks

```bash
# Lint all playbooks and roles
cd /home/chester/homelab/ansible
ansible-lint

# Lint specific playbook
ansible-lint playbooks/onboarding/generic_host.yml

# Lint entire role
ansible-lint roles/monitoring_stack/
```

### Installing ansible-lint

```bash
# On control node (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y python3-pip
pip3 install ansible-lint

# Verify installation
ansible-lint --version
```

## Quality checklist

Use this checklist when creating or reviewing playbooks and roles:

### Security

* [ ] **No SSH bypasses** — `StrictHostKeyChecking=no` is forbidden
* [ ] **Host key checking enabled** — `ansible.cfg` must have `host_key_checking = True`
* [ ] **Secrets vaulted** — No plaintext passwords in defaults, vars, or playbooks
* [ ] **Secrets validated** — Roles requiring secrets include `assert` tasks to fail fast
* [ ] **File permissions explicit** — All `file`, `copy`, `template` tasks specify `mode`
* [ ] **No root by default** — Use `become: true` only when necessary

### Idempotency

* [x] **Changed semantics** — All `command`/`shell` tasks include `changed_when` (**mandatory**)
* [x] **Error handling** — All `command`/`shell` tasks include `failed_when` or `ignore_errors` (**mandatory**)
* [x] **Check mode safe** — Playbooks can run with `--check` without errors (**mandatory**)
* [x] **Replay safe** — Running twice produces no changes on second run (**mandatory**; PR evidence required)
* [x] **Manager assertion** — Swarm manager checks use exact equality (`== 'active|true'`), not substring search (**mandatory**)
* [x] **Absent idempotency** — Stack removal checks existence first; no false `changed` when already absent (**mandatory**)
* [x] **Validate-only mode** — All stack deploy playbooks support `stack_validate_only=true` (**mandatory**)

### Modularity

* [ ] **Roles over monoliths** — Multi-task logic belongs in roles, not massive playbooks
* [ ] **Builtin modules first** — Prefer `ansible.builtin.*` over `command`/`shell`/`raw`
* [ ] **Bootstrap exception** — `raw` commands are acceptable only for pre-Python tasks
* [ ] **Variables separated** — Environment-specific values live in `group_vars`, not role defaults

### Maintainability

* [ ] **Task names descriptive** — Each task has a clear, action-oriented name
* [ ] **Tags applied** — Logical grouping with tags (e.g., `setup`, `security`, `monitoring`)
* [ ] **Documentation inline** — Complex logic includes comments explaining "why"
* [ ] **Handlers for services** — Service restarts use handlers, not inline tasks

## Mandatory pre-deploy gate (effective now — blocking for all stack changes)

> [!IMPORTANT]
> All steps below MUST pass before merging any pull request that touches
> `ansible/templates/stacks/`, `ansible/playbooks/docker/deploy_*.yml`,
> or `ansible/roles/swarm_stack_deploy/`.
> The Gitea CI workflow (`.gitea/workflows/stack-idempotency.yml`) runs
> stages 1–3 automatically on every PR. The two-run idempotency proof
> (step 7 below) must be performed manually and included as PR evidence.

For any swarm-impacting change, all checks below must pass before deployment:

```bash
cd /home/chester/homelab/ansible

# 1) Inventory parse gate
ansible-inventory -i inventory/hosts.ini --graph

# 2) Connectivity gate
ansible -i inventory/hosts.ini swarm_hosts -m ping

# 3) Swarm control-plane gate
ansible -i inventory/hosts.ini swarm_managers -m shell -a "docker info 2>/dev/null | grep -E 'Swarm:|Is Manager:'"

# 4) Playbook syntax gate
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml --syntax-check

# 5) Control node sanity gate
ansible-playbook -i inventory/hosts.ini playbooks/preflight/validate_control_node.yml

# 6) Validate-only preflight (no Swarm mutations — mandatory for stack changes)
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_.yml \
  -e "stack_validate_only=true" \
  --vault-password-file .vault_pass

# 7) TWO-RUN IDEMPOTENCY PROOF (required for stack PRs — attach output as evidence)
# Run 1: apply desired state
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_.yml \
  --vault-password-file .vault_pass \
  2>&1 | tee /tmp/run1.log

# Run 2: replay — MUST report changed=0 for stack tasks
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_.yml \
  --vault-password-file .vault_pass \
  2>&1 | tee /tmp/run2.log

# Verify: second run must show changed=0 for deploy/reconcile tasks
grep -E 'changed=[^0]' /tmp/run2.log && echo 'IDEMPOTENCY FAIL' || echo 'IDEMPOTENCY PASS'
```

## PR evidence pack (required for stack-impacting changes)

For any PR that modifies a stack template, deploy playbook, or the `swarm_stack_deploy` role, attach the following to the PR description:

````
### Idempotency evidence

**Stack:**
**Date:** YYYY-MM-DD
**Operator:** @username

**Run 1 summary:**
```
PLAY RECAP ***
swarm-manager-1 : ok=N changed=N ...
```

**Run 2 summary (must show changed=0 for stack tasks):**
```
PLAY RECAP ***
swarm-manager-1 : ok=N changed=0 ...
```

**Validate-only passed:** yes/no
**Lint passed:** yes/no (CI enforced)
**Syntax check passed:** yes/no (CI enforced)
````

> [!IMPORTANT]
> A PR that cannot demonstrate changed=0 on the second run MUST NOT be merged.

Before committing changes, always run syntax checks:

```bash
cd /home/chester/homelab/ansible

# Check specific playbook
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml --syntax-check

# Preflight validation (control node sanity)
ansible-playbook -i inventory/hosts.ini playbooks/preflight/validate_control_node.yml
```

## Idempotency testing

High-risk playbooks (those modifying system state) should be tested for idempotency:

```bash
# Run playbook twice; second run should report "changed=0"
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml
```

## Review process

### Pre-commit (developer)

1. Run inventory parse gate and connectivity gate
2. Run syntax check on modified playbooks
3. Run ansible-lint on modified playbooks/roles (**Tier 2: mandatory for stack files**)
4. For stack changes, run validate-only preflight
5. For stack changes, run idempotency proof (two-run) and collect evidence
6. Ensure required secrets are provided via vault (no plaintext defaults)

### Pre-merge (reviewer)

1. Verify security checklist items are addressed
2. Spot-check modularity (no 500+ line playbooks)
3. Confirm environment-specific values are in inventory, not defaults
4. Confirm no root-level duplicate Ansible directories were introduced
5. **For stack changes: verify PR evidence pack is attached and shows changed=0 on second run**
6. For critical changes (security, networking), require idempotency proof

* **Weekly:** Triage Critical/High findings from drift reports
* **Biweekly:** Run preflight validation suite
* **Monthly:** Generate fresh standards-drift audit and review trends

## Roadmap

As baseline quality improves, the repository will:

1. **Phase 1 (current):** Mandatory idempotency gate for stack changes. Lint advisory for non-stack playbooks. Gitea CI blocks stack PRs on lint + syntax + preflight failures. `no-changed-when` promoted from skip to warn (visible everywhere).
2. **Phase 2 (3 months):** Mandatory lint for all new/modified playbooks. `no-changed-when` moved to blocking; bootstrap exceptions suppressed inline with `# noqa: no-changed-when` on specific tasks.
3. **Phase 3 (6 months):** Full baseline coverage, stricter profile. All remaining idempotency violations resolved. Two-run check automated in CI for eligible stacks.
4. **Phase 4 (12 months):** Fully blocking CI on every commit. Molecule/integration tests for multi-node Swarm scenarios.

## References

* [Ansible Best Practices](https://docs.ansible.com/ansible/latest/tips_tricks/ansible_tips_tricks.html)
* [ansible-lint documentation](https://ansible-lint.readthedocs.io/)
* [environment-constraints.md](./environment-constraints.md) — Infrastructure-specific rules
* [naming-conventions.md](./naming-conventions.md) — File and variable naming standards
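
## Appendix: checking the two-run log

The two-run idempotency proof in the pre-deploy gate ends with a grep over the whole of `/tmp/run2.log`. As a minimal sketch, the same check could be restricted to PLAY RECAP host lines so that task output containing the text `changed=` cannot trigger a false failure. `check_idempotent` is a hypothetical helper name, not something that exists in this repository, and the sketch assumes the log is uncolored plain text with recap lines in the `host : ok=N changed=N ...` shape shown in the evidence pack.

```bash
#!/usr/bin/env bash
# check_idempotent is a hypothetical helper, not part of this repository.
# It scans an ansible-playbook log for PLAY RECAP host lines that report
# a non-zero "changed" count.
check_idempotent() {
  local log="$1"

  # Refuse to report a pass when the log is missing or unreadable.
  if [ ! -r "$log" ]; then
    echo "cannot read log: $log" >&2
    return 2
  fi

  # Recap lines look like: "host : ok=4 changed=0 unreachable=0 failed=0".
  # Print every host line whose changed count starts with a digit 1-9.
  if grep -E '^[^[:space:]]+[[:space:]]*:.*changed=[1-9]' "$log"; then
    echo 'IDEMPOTENCY FAIL'
    return 1
  fi

  echo 'IDEMPOTENCY PASS'
}
```

After sourcing the function, `check_idempotent /tmp/run2.log` prints the offending host lines followed by `IDEMPOTENCY FAIL` and returns 1 when any host reports changes. It prints `IDEMPOTENCY PASS` and returns 0 otherwise, and returns 2 if the log cannot be read, so a missing log is never mistaken for a pass.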