# Ansible quality gates

This document defines the quality standards, review checklist, and validation workflow for all Ansible code in this repository.

## Philosophy

Quality gates progress through three enforcement tiers:

- **Tier 1 (Advisory):** Visible via lint warnings; not blocking. Baseline cleanup phase.
- **Tier 2 (Mandatory — current):** Must pass for swarm-impacting changes. CI enforces.
- **Tier 3 (Fully blocking):** All rules enforced on every commit. Target: Phase 4 of the roadmap.

**Idempotency controls are Tier 2 (mandatory now) for all stack-impacting changes.**
This means `changed_when`, manager-state assertions, secret preflight asserts,
bind-mount path asserts, and validate-only mode support are required, not advisory.

## Linting

### Configuration

The repository includes [.ansible-lint](../../.ansible-lint) configuration that enforces:

* **Moderate profile** — Balanced between permissive and strict
* **Advisory rules** — No blocking on known patterns (e.g., raw commands in bootstrap playbooks)
* **Warnings** — Experimental syntax and risky permissions are flagged but not blocked

### Running lint checks

```bash
# Lint all playbooks and roles
cd /home/chester/homelab/ansible
ansible-lint

# Lint specific playbook
ansible-lint playbooks/onboarding/generic_host.yml

# Lint entire role
ansible-lint roles/monitoring_stack/
```

### Installing ansible-lint

```bash
# On control node (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y python3-pip
# Note: releases that mark the system Python as externally managed (PEP 668)
# reject this command; install into a virtual environment or with pipx there.
pip3 install ansible-lint

# Verify installation
ansible-lint --version
```

## Quality checklist

Use this checklist when creating or reviewing playbooks and roles:

### Security

* [ ] **No SSH bypasses** — `StrictHostKeyChecking=no` is forbidden
* [ ] **Host key checking enabled** — `ansible.cfg` must have `host_key_checking = True`
* [ ] **Secrets vaulted** — No plaintext passwords in defaults, vars, or playbooks
* [ ] **Secrets validated** — Roles requiring secrets include `assert` tasks to fail fast
* [ ] **File permissions explicit** — All `file`, `copy`, `template` tasks specify `mode`
* [ ] **No root by default** — Use `become: true` only when necessary

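The first two Security items can be checked mechanically. The following is a minimal sketch, not tooling that exists in this repository: the function names `find_ssh_bypasses` and `host_key_checking_enabled` are hypothetical, and the patterns assume the option is written as `StrictHostKeyChecking=no` (or with a space) and that `ansible.cfg` uses the `host_key_checking = True` form quoted in the checklist.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository) for two Security checklist items.

# Print every line under the given directory that disables SSH host key checking.
# Exit status follows grep: 0 if at least one bypass is found, 1 if none.
find_ssh_bypasses() {
  local dir="${1:-.}"
  grep -rnEi 'StrictHostKeyChecking[= ]+no' "$dir"
}

# Succeed only if the given config file sets host_key_checking to True.
host_key_checking_enabled() {
  local cfg="${1:-ansible.cfg}"
  grep -Eiq '^[[:space:]]*host_key_checking[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$cfg"
}

# Example usage from the Ansible directory:
#   find_ssh_bypasses . && echo 'FAIL: SSH host key bypass found'
#   host_key_checking_enabled ansible.cfg || echo 'FAIL: host_key_checking is not True'
```

Point `find_ssh_bypasses` at the Ansible tree rather than the whole repository, since documentation such as this page mentions the forbidden option by name and would be reported as a match.
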
### Idempotency

* [x] **Changed semantics** — All `command`/`shell` tasks include `changed_when` (**mandatory**)
* [x] **Error handling** — All `command`/`shell` tasks include `failed_when` or `ignore_errors` (**mandatory**)
* [x] **Check mode safe** — Playbooks can run with `--check` without errors (**mandatory**)
* [x] **Replay safe** — Running twice produces no changes on second run (**mandatory**; PR evidence required)
* [x] **Manager assertion** — Swarm manager checks use exact equality (`== 'active|true'`), not substring search (**mandatory**)
* [x] **Absent idempotency** — Stack removal checks existence first; no false `changed` when already absent (**mandatory**)
* [x] **Validate-only mode** — All stack deploy playbooks support `stack_validate_only=true` (**mandatory**)

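The manager-assertion rule exists because a substring search accepts strings that merely contain `active|true`. The sketch below contrasts the two comparisons in plain shell. The helper names are hypothetical, the `docker info` format string in the comment is an assumption about how the state string is produced (the checklist only shows the compared value), and `inactive|true` is used purely to illustrate the string-matching problem.

```bash
#!/usr/bin/env bash
# Hypothetical helpers contrasting exact equality with substring search.

# Exact match: succeeds only for the literal string 'active|true'.
is_active_manager() {
  [ "$1" = 'active|true' ]
}

# Substring match, shown only to illustrate the failure mode.
is_active_manager_substring() {
  case "$1" in
    *'active|true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed way to obtain the state string on a Swarm node (requires Docker):
#   state="$(docker info --format '{{.Swarm.LocalNodeState}}|{{.Swarm.ControlAvailable}}')"

# Compare both checks on three sample values.
for state in 'active|true' 'inactive|true' 'active|false'; do
  if is_active_manager "$state"; then exact=yes; else exact=no; fi
  if is_active_manager_substring "$state"; then substr=yes; else substr=no; fi
  printf '%-14s exact=%s substring=%s\n' "$state" "$exact" "$substr"
done
```

For `inactive|true` the exact check reports `no` while the substring check reports `yes`, which is the false positive the rule is meant to prevent.
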
### Modularity

* [ ] **Roles over monoliths** — Multi-task logic belongs in roles, not massive playbooks
* [ ] **Builtin modules first** — Prefer `ansible.builtin.*` over `command`/`shell`/`raw`
* [ ] **Bootstrap exception** — `raw` commands are acceptable only for pre-Python tasks
* [ ] **Variables separated** — Environment-specific values live in `group_vars`, not role defaults

### Maintainability

* [ ] **Task names descriptive** — Each task has a clear, action-oriented name
* [ ] **Tags applied** — Logical grouping with tags (e.g., `setup`, `security`, `monitoring`)
* [ ] **Documentation inline** — Complex logic includes comments explaining "why"
* [ ] **Handlers for services** — Service restarts use handlers, not inline tasks

## Mandatory pre-deploy gate (effective now — blocking for all stack changes)

> [!IMPORTANT]
> All steps below MUST pass before merging any pull request that touches
> `ansible/templates/stacks/`, `ansible/playbooks/docker/deploy_*.yml`,
> or `ansible/roles/swarm_stack_deploy/`.
> The Gitea CI workflow (`.gitea/workflows/stack-idempotency.yml`) runs
> stages 1–3 automatically on every PR. The two-run idempotency proof
> (step 7 below) must be performed manually and included as PR evidence.

For any swarm-impacting change, all checks below must pass before deployment:

```bash
cd /home/chester/homelab/ansible

# 1) Inventory parse gate
ansible-inventory -i inventory/hosts.ini --graph

# 2) Connectivity gate
ansible -i inventory/hosts.ini swarm_hosts -m ping

# 3) Swarm control-plane gate
ansible -i inventory/hosts.ini swarm_managers -m shell -a "docker info 2>/dev/null | grep -E 'Swarm:|Is Manager:'"

# 4) Playbook syntax gate
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml --syntax-check

# 5) Control node sanity gate
ansible-playbook -i inventory/hosts.ini playbooks/preflight/validate_control_node.yml

# 6) Validate-only preflight (no Swarm mutations — mandatory for stack changes)
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_<service>.yml \
  -e "stack_validate_only=true" \
  --vault-password-file .vault_pass

# 7) TWO-RUN IDEMPOTENCY PROOF (required for stack PRs — attach output as evidence)
# Run 1: apply desired state
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_<service>.yml \
  --vault-password-file .vault_pass \
  2>&1 | tee /tmp/run1.log

# Run 2: replay — MUST report changed=0 for stack tasks
ansible-playbook -i inventory/hosts.ini playbooks/docker/deploy_<service>.yml \
  --vault-password-file .vault_pass \
  2>&1 | tee /tmp/run2.log

# Verify: second run must show changed=0 for deploy/reconcile tasks
grep -E 'changed=[^0]' /tmp/run2.log && echo 'IDEMPOTENCY FAIL' || echo 'IDEMPOTENCY PASS'
```

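The closing `grep` one-liner prints a verdict but always exits 0, because whichever `echo` runs last succeeds. If the result needs to drive a script, one option is a helper that sets its exit status. The sketch below is a hypothetical helper, not part of this repository. It assumes the standard recap layout (`host : ok=N changed=N ...`) and counts every task, which is stricter than the "stack tasks" wording of the gate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail when an ansible-playbook log reports changes in its recap.

# Sum the changed=N counters that appear after the "PLAY RECAP" banner.
recap_changed_total() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && match($0, /changed=[0-9]+/) {
      # "changed=" is 8 characters long; the digits follow it.
      total += substr($0, RSTART + 8, RLENGTH - 8)
    }
    END { print total + 0 }
  ' "$1"
}

# Return 0 when the replay log shows no changes, 1 when it shows changes,
# and 2 when the log has no recap at all (for example, the run aborted early).
assert_idempotent() {
  local log="$1" total
  if ! grep -q '^PLAY RECAP' "$log"; then
    echo "IDEMPOTENCY UNKNOWN: no PLAY RECAP in $log" >&2
    return 2
  fi
  total="$(recap_changed_total "$log")"
  if [ "$total" -eq 0 ]; then
    echo 'IDEMPOTENCY PASS'
  else
    echo "IDEMPOTENCY FAIL: changed=$total" >&2
    return 1
  fi
}

# Example usage after the second run:
#   assert_idempotent /tmp/run2.log
```

Treating a missing recap as its own status avoids reporting a pass for a run that never reached the end of the play.
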
## PR evidence pack (required for stack-impacting changes)

For any PR that modifies a stack template, deploy playbook, or the
`swarm_stack_deploy` role, attach the following to the PR description:

````markdown
### Idempotency evidence

**Stack:** <service>
**Date:** YYYY-MM-DD
**Operator:** @username

**Run 1 summary:**
```
PLAY RECAP ***
swarm-manager-1 : ok=N changed=N ...
```

**Run 2 summary (must show changed=0 for stack tasks):**
```
PLAY RECAP ***
swarm-manager-1 : ok=N changed=0 ...
```

**Validate-only passed:** yes/no
**Lint passed:** yes/no (CI enforced)
**Syntax check passed:** yes/no (CI enforced)
````

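The two recap summaries in the template can be pulled from the run logs instead of being copied by hand. This is a minimal sketch assuming the logs were written to `/tmp/run1.log` and `/tmp/run2.log` as in the gate commands; `print_recap` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PLAY RECAP section of an ansible-playbook log.

# Emit the "PLAY RECAP" banner and every non-empty line that follows it.
print_recap() {
  awk '/^PLAY RECAP/ { in_recap = 1 } in_recap && NF' "$1"
}

# Example: produce the two summaries for the PR description.
#   printf '**Run 1 summary:**\n'; print_recap /tmp/run1.log
#   printf '**Run 2 summary (must show changed=0 for stack tasks):**\n'; print_recap /tmp/run2.log
```
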
> [!IMPORTANT]
> A PR that cannot demonstrate changed=0 on the second run MUST NOT be merged.

## Syntax validation

Before committing changes, always run syntax checks:

```bash
cd /home/chester/homelab/ansible

# Check specific playbook
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml --syntax-check

# Preflight validation (control node sanity)
ansible-playbook -i inventory/hosts.ini playbooks/preflight/validate_control_node.yml
```

## Idempotency testing

High-risk playbooks (those modifying system state) should be tested for idempotency:

```bash
# Run playbook twice; second run should report "changed=0"
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml
ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml
```

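When the two runs are scripted, it helps to stop if the first run fails and to keep a log of each run. The sketch below is one way to do that under stated assumptions: `two_run` is a hypothetical wrapper, the log file names mirror those used in the pre-deploy gate, and output is redirected to a file and then printed, because a `cmd | tee` pipeline reports the exit status of `tee`, not of `cmd`, unless `pipefail` is set.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run the same command twice, keep a log of each run,
# and stop early if the first run fails.

# Usage: two_run LOG_DIR COMMAND [ARGS...]
# Returns 1 if run 1 fails, 2 if run 2 fails, 0 otherwise.
two_run() {
  local log_dir="$1" rc
  shift

  # Run 1: apply desired state.
  rc=0
  "$@" >"$log_dir/run1.log" 2>&1 || rc=$?
  cat "$log_dir/run1.log"
  [ "$rc" -eq 0 ] || return 1

  # Run 2: replay; this is the log to inspect for changed=0.
  rc=0
  "$@" >"$log_dir/run2.log" 2>&1 || rc=$?
  cat "$log_dir/run2.log"
  [ "$rc" -eq 0 ] || return 2
}

# Example (the playbook path is a placeholder, as elsewhere in this document):
#   two_run /tmp ansible-playbook -i inventory/hosts.ini playbooks/your-playbook.yml
```
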
## Review process

### Pre-commit (developer)

1. Run inventory parse gate and connectivity gate
2. Run syntax check on modified playbooks
3. Run ansible-lint on modified playbooks/roles (**Tier 2: mandatory for stack files**)
4. For stack changes, run validate-only preflight
5. For stack changes, run idempotency proof (two-run) and collect evidence
6. Ensure required secrets are provided via vault (no plaintext defaults)

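Several of these steps apply only to stack changes, so it can help to decide that mechanically from the list of modified files. The sketch below encodes the three gated paths named in the pre-deploy gate. The function names are hypothetical, and the `git diff` invocation in the comment assumes a checkout whose main branch is called `main`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether a repository-relative path is stack-impacting.

# Succeed if the path falls under one of the three gated locations.
is_stack_impacting() {
  case "$1" in
    ansible/templates/stacks/*) return 0 ;;
    ansible/playbooks/docker/deploy_*.yml) return 0 ;;
    ansible/roles/swarm_stack_deploy/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read paths on stdin (one per line) and print only the stack-impacting ones.
filter_stack_impacting() {
  local path
  while IFS= read -r path; do
    if is_stack_impacting "$path"; then
      printf '%s\n' "$path"
    fi
  done
}

# Example (assumes a git checkout and a main branch named "main"):
#   git diff --name-only main...HEAD | filter_stack_impacting
```

An empty result means none of the modified files fall under the gated paths, so steps 4 and 5 do not apply to that change.
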
### Pre-merge (reviewer)

1. Verify security checklist items are addressed
2. Spot-check modularity (no 500+ line playbooks)
3. Confirm environment-specific values are in inventory, not defaults
4. Confirm no root-level duplicate Ansible directories were introduced
5. **For stack changes: verify PR evidence pack is attached and shows changed=0 on second run**
6. For critical changes (security, networking), require idempotency proof

### Recurring checks

* **Weekly:** Triage Critical/High findings from drift reports
* **Biweekly:** Run preflight validation suite
* **Monthly:** Generate fresh standards-drift audit and review trends

## Roadmap

As baseline quality improves, enforcement will tighten in four phases:

1. **Phase 1 (current):** Mandatory idempotency gate for stack changes. Lint advisory for
   non-stack playbooks. Gitea CI blocks stack PRs on lint + syntax + preflight failures.
   `no-changed-when` promoted from skip to warn (visible everywhere).
2. **Phase 2 (3 months):** Mandatory lint for all new/modified playbooks.
   `no-changed-when` moved to blocking; bootstrap exceptions suppressed inline with
   `# noqa: no-changed-when` on specific tasks.
3. **Phase 3 (6 months):** Full baseline coverage, stricter profile. All remaining
   idempotency violations resolved. Two-run check automated in CI for eligible stacks.
4. **Phase 4 (12 months):** Fully blocking CI on every commit. Molecule/integration
   tests for multi-node Swarm scenarios.

## References

* [Ansible Best Practices](https://docs.ansible.com/ansible/latest/tips_tricks/ansible_tips_tricks.html)
* [ansible-lint documentation](https://ansible-lint.readthedocs.io/)
* [environment-constraints.md](./environment-constraints.md) — Infrastructure-specific rules
* [naming-conventions.md](./naming-conventions.md) — File and variable naming standards