Ansible quality gates
This document defines the quality standards, review checklist, and validation workflow for all Ansible code in this repository.
Philosophy
Quality gates progress through three enforcement tiers:
- Tier 1 (Advisory): Visible via lint warnings; not blocking. Baseline cleanup phase.
- Tier 2 (Mandatory — current): Must pass for swarm-impacting changes. CI enforces.
- Tier 3 (Fully blocking): All rules enforced on every commit. Target: Phase 4 of the roadmap.
Idempotency controls are Tier 2 (mandatory now) for all stack-impacting changes. This means: changed_when, manager-state assertions, secret preflight asserts, bind-mount path asserts, and validate-only mode support are required, not advisory.
Linting
Configuration
The repository includes .ansible-lint configuration that enforces:
- Moderate profile — Balanced between permissive and strict
- Advisory rules — No blocking on known patterns (e.g., raw commands in bootstrap playbooks)
- Warnings — Experimental syntax and risky permissions are flagged but not blocked
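A configuration matching the three points above might look like the sketch below. The `profile` and `warn_list` keys are standard ansible-lint options, but the rule selection shown here is an assumption for illustration; the repository's actual .ansible-lint file may differ.

```yaml
# .ansible-lint (illustrative sketch, not the repository's actual file)
profile: moderate

# Rules listed here are reported as warnings and do not fail the run.
warn_list:
  - experimental            # experimental rules and syntax checks
  - risky-file-permissions  # file/copy/template tasks without an explicit mode
  - no-changed-when         # command/shell/raw tasks without changed_when
```

Moving a rule out of `warn_list` makes it blocking, which is one way the tier promotion described under Philosophy could be carried out.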
Running lint checks
# Lint all playbooks and roles
cd /home/chester/homelab/ansible
ansible-lint
# Lint specific playbook
ansible-lint playbooks/onboarding/generic_host.yml
# Lint entire role
ansible-lint roles/monitoring_stack/
Installing ansible-lint
# On control node (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y python3-pip
pip3 install ansible-lint
# Verify installation
ansible-lint --version
Quality checklist
Use this checklist when creating or reviewing playbooks and roles:
Security
- No SSH bypasses — StrictHostKeyChecking=no is forbidden
- Host key checking enabled — ansible.cfg must have host_key_checking = True
- Secrets vaulted — No plaintext passwords in defaults, vars, or playbooks
- Secrets validated — Roles requiring secrets include assert tasks to fail fast
- File permissions explicit — All file, copy, template tasks specify mode
- No root by default — Use become: true only when necessary
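The "Secrets validated", "File permissions explicit", and "No root by default" items can be illustrated with a short task sketch. The variable `example_api_token`, the template name, and the destination path are hypothetical and stand in for whatever a real role requires.

```yaml
# Hypothetical role tasks; names and paths are placeholders.
- name: Fail fast when the vaulted secret is missing or empty
  ansible.builtin.assert:
    that:
      - example_api_token is defined
      - example_api_token | length > 0
    fail_msg: "example_api_token must be supplied via vault"
    quiet: true

- name: Render service configuration with explicit ownership and mode
  ansible.builtin.template:
    src: example.conf.j2
    dest: /etc/example/example.conf
    owner: root
    group: root
    mode: "0640"
  become: true  # escalate only on the task that needs it
```

Placing the assert at the top of the role means a missing secret stops the run before any host state is touched.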
Idempotency
- Changed semantics — All command/shell tasks include changed_when (mandatory)
- Error handling — All command/shell tasks include failed_when or ignore_errors (mandatory)
- Check mode safe — Playbooks can run with --check without errors (mandatory)
- Replay safe — Running twice produces no changes on second run (mandatory; PR evidence required)
- Manager assertion — Swarm manager checks use exact equality (== 'active|true'), not substring search (mandatory)
- Absent idempotency — Stack removal checks existence first; no false changed when already absent (mandatory)
- Validate-only mode — All stack deploy playbooks support stack_validate_only=true (mandatory)
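A minimal sketch of how several of these items could fit together is shown below. It assumes the manager check formats `docker info` output as `LocalNodeState|ControlAvailable`, which is one plausible reading of the `'active|true'` comparison above; the variable `example_stack_name` is hypothetical, and the repository's own tasks may be structured differently.

```yaml
- name: Read Swarm state (read-only, so it also runs in check mode)
  ansible.builtin.command:
    argv:
      - docker
      - info
      - --format
      - "{% raw %}{{ .Swarm.LocalNodeState }}|{{ .Swarm.ControlAvailable }}{% endraw %}"
  register: swarm_state
  changed_when: false
  failed_when: swarm_state.rc != 0
  check_mode: false

- name: Assert this host is an active Swarm manager (exact match)
  ansible.builtin.assert:
    that:
      - swarm_state.stdout | trim == 'active|true'
    fail_msg: "Expected 'active|true', got '{{ swarm_state.stdout | trim }}'"

- name: List deployed stacks
  ansible.builtin.command:
    argv:
      - docker
      - stack
      - ls
      - --format
      - "{% raw %}{{ .Name }}{% endraw %}"
  register: stack_list
  changed_when: false
  check_mode: false

- name: Remove the stack only when it exists
  ansible.builtin.command:
    argv:
      - docker
      - stack
      - rm
      - "{{ example_stack_name }}"
  when: example_stack_name in stack_list.stdout_lines
  changed_when: true
```

Because the removal task is skipped when the stack is already absent, a replay reports no change, and the `check_mode: false` setting on the two read-only commands keeps their registered results available under `--check`.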
Modularity
- Roles over monoliths — Multi-task logic belongs in roles, not massive playbooks
- Builtin modules first — Prefer ansible.builtin.* over command/shell/raw
- Bootstrap exception — raw commands are acceptable only for pre-Python tasks
- Variables separated — Environment-specific values live in group_vars, not role defaults
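As an illustration of the "Builtin modules first" and "Bootstrap exception" items, the sketch below contrasts a pre-Python raw task with the module-based form used once Python is available. The package names are placeholders, and the inline `noqa` comment follows the suppression style the Roadmap describes for bootstrap tasks.

```yaml
# Bootstrap exception: no Python on the target yet, so modules cannot run.
- name: Install Python on a freshly provisioned host
  ansible.builtin.raw: test -e /usr/bin/python3 || (apt-get update && apt-get install -y python3)  # noqa: no-changed-when
  become: true

# Everywhere else: a builtin module that is idempotent and check-mode aware.
- name: Install chrony
  ansible.builtin.apt:
    name: chrony
    state: present
    update_cache: true
  become: true
```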
Maintainability
- Task names descriptive — Each task has a clear, action-oriented name
- Tags applied — Logical grouping with tags (e.g., setup, security, monitoring)
- Documentation inline — Complex logic includes comments explaining "why"
- Handlers for services — Service restarts use handlers, not inline tasks
Mandatory pre-deploy gate (effective now — blocking for all stack changes)
Important
All steps below MUST pass before merging any pull request that touches ansible/templates/stacks/, ansible/playbooks/docker/deploy_*.yml, or ansible/roles/swarm_stack_deploy/. The Gitea CI workflow (.gitea/workflows/stack-idempotency.yml) runs stages 1–3 automatically on every PR. The two-run idempotency proof (step 7 below) must be performed manually and included as PR evidence.
For any swarm-impacting change, all checks below must pass before deployment:
cd /home/chester/homelab/ansible
# 1) Inventory parse gate
ansible-inventory -i inventory/hosts.ini --graph
# 2) Connectivity gate
ansible -i inventory/hosts.ini swarm_hosts -m ping
# 3) Swarm control-plane gate
ansible -i inventory/hosts.ini swarm_managers -m shell -a "docker info 2>/dev/null | grep -E 'Swarm:|Is Manager:'"
# 4) Playbook syntax gate
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml --syntax-check
# 5) Control node sanity gate
ansible-playbook -i inventory/hosts.ini playbooks/preflight/validate_control_node.yml
# 6) Validate-only preflight (no Swarm mutations — mandatory for stack changes)
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_<service>.yml \
-e "stack_validate_only=true" \
--vault-password-file .vault_pass
# 7) TWO-RUN IDEMPOTENCY PROOF (required for stack PRs — attach output as evidence)
# Run 1: apply desired state
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_<service>.yml \
--vault-password-file .vault_pass \
2>&1 | tee /tmp/run1.log
# Run 2: replay — MUST report changed=0 for stack tasks
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_<service>.yml \
--vault-password-file .vault_pass \
2>&1 | tee /tmp/run2.log
# Verify: second run must show changed=0 for deploy/reconcile tasks
grep -E 'changed=[^0]' /tmp/run2.log && echo 'IDEMPOTENCY FAIL' || echo 'IDEMPOTENCY PASS'
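Step 6 relies on each deploy playbook honouring `stack_validate_only`. One possible shape for that support is sketched below; the variable `example_bind_mount_path` and the task layout are assumptions for illustration, and the swarm_stack_deploy role may implement the same idea differently.

```yaml
- name: Inspect the bind-mount source path
  ansible.builtin.stat:
    path: "{{ example_bind_mount_path }}"
  register: bind_mount_stat

- name: Assert the bind-mount source path is an existing directory
  ansible.builtin.assert:
    that:
      - bind_mount_stat.stat.exists
      - bind_mount_stat.stat.isdir | default(false)
    fail_msg: "Bind-mount path {{ example_bind_mount_path }} is missing or not a directory"

- name: End the play before any Swarm mutation in validate-only mode
  ansible.builtin.meta: end_play
  when: stack_validate_only | default(false) | bool

# Deploy and reconcile tasks would follow here; they are reached only
# when stack_validate_only is false or unset.
```

The `| bool` filter matters because `-e "stack_validate_only=true"` arrives as the string "true" rather than a boolean.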
PR evidence pack (required for stack-impacting changes)
For any PR that modifies a stack template, deploy playbook, or the
swarm_stack_deploy role, attach the following to the PR description:
### Idempotency evidence
**Stack:** <service>
**Date:** YYYY-MM-DD
**Operator:** @username
**Run 1 summary:**
PLAY RECAP *** swarm-manager-1 : ok=N changed=N ...
**Run 2 summary (must show changed=0 for stack tasks):**
PLAY RECAP *** swarm-manager-1 : ok=N changed=0 ...
**Validate-only passed:** yes/no
**Lint passed:** yes/no (CI enforced)
**Syntax check passed:** yes/no (CI enforced)
Important
A PR that cannot demonstrate changed=0 on the second run MUST NOT be merged.
Before committing changes, always run syntax checks:
cd /home/chester/homelab/ansible
# Check specific playbook
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml --syntax-check
# Preflight validation (control node sanity)
ansible-playbook -i inventory/hosts.ini playbooks/preflight/validate_control_node.yml
Idempotency testing
High-risk playbooks (those modifying system state) should be tested for idempotency:
# Run playbook twice; second run should report "changed=0"
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml
Review process
Pre-commit (developer)
- Run inventory parse gate and connectivity gate
- Run syntax check on modified playbooks
- Run ansible-lint on modified playbooks/roles (Tier 2: mandatory for stack files)
- For stack changes, run validate-only preflight
- For stack changes, run idempotency proof (two-run) and collect evidence
- Ensure required secrets are provided via vault (no plaintext defaults)
Pre-merge (reviewer)
- Verify security checklist items are addressed
- Spot-check modularity (no 500+ line playbooks)
- Confirm environment-specific values are in inventory, not defaults
- Confirm no root-level duplicate Ansible directories were introduced
- For stack changes: verify PR evidence pack is attached and shows changed=0 on second run
- For critical changes (security, networking), require idempotency proof
- Weekly: Triage Critical/High findings from drift reports
- Biweekly: Run preflight validation suite
- Monthly: Generate fresh standards-drift audit and review trends
Roadmap
As baseline quality improves, enforcement will tighten in phases:
- Phase 1 (current): Mandatory idempotency gate for stack changes. Lint advisory for non-stack playbooks. Gitea CI blocks stack PRs on lint + syntax + preflight failures. no-changed-when promoted from skip to warn (visible everywhere).
- Phase 2 (3 months): Mandatory lint for all new/modified playbooks. no-changed-when moved to blocking; bootstrap exceptions suppressed inline with # noqa: no-changed-when on specific tasks.
- Phase 3 (6 months): Full baseline coverage, stricter profile. All remaining idempotency violations resolved. Two-run check automated in CI for eligible stacks.
- Phase 4 (12 months): Fully blocking CI on every commit. Molecule/integration tests for multi-node Swarm scenarios.
References
- Ansible Best Practices
- ansible-lint documentation
- environment-constraints.md — Infrastructure-specific rules
- naming-conventions.md — File and variable naming standards