# Ansible Control Node Setup: Path to Production Readiness

## Overview

Transform **Watchtower** (Raspberry Pi 5) into a production-ready Ansible control node capable of managing the entire homelab infrastructure. This guide builds the foundational runtime environment required to execute automation against Heimdall, Waldorf, and Watchtower itself.

**Control Node:** Watchtower (10.0.0.200), Raspberry Pi 5, ARM Cortex-A76, 16GB RAM

**Managed Nodes:** Heimdall (10.0.0.151), Waldorf (10.0.0.251), Watchtower (localhost)

**End State:** Fully configured Ansible environment with validated connectivity, encrypted secrets, and role scaffolding.

**Estimated Time to Complete:** 2-3 hours (first-time setup) | 45-60 minutes (experienced operator)

---

## Time Breakdown by Phase

| Phase | Description | Time Estimate |
|-------|-------------|---------------|
| **Phase 1** | Control Node Foundation | 20-30 minutes |
| **Phase 2** | Ansible Project Configuration | 25-35 minutes |
| **Phase 3** | Validation & First Automation | 15-25 minutes |
| **Phase 4** | Role Scaffolding & Developer Experience | 20-30 minutes |
| **Phase 5** | Final Verification & Documentation | 15-20 minutes |
| **Total** | End-to-End Setup | **2-3 hours** |

---

## Prerequisites

- [ ] SSH access to Watchtower as `chester` (or your primary user)
- [ ] Watchtower has network access to Heimdall, Waldorf, and TerraMaster NAS
- [ ] Git repository cloned to Watchtower at `/home/chester/homelab` (or similar)
- [ ] Sudo privileges on Watchtower
- [ ] Basic understanding of YAML syntax
- [ ] VSCode with Remote-SSH extension (optional, but recommended)

---

## Phase 1: Control Node Foundation (Watchtower Setup)

**Estimated Time:** 20-30 minutes

### Step 1: Install Ansible Toolchain

**Time:** 10-15 minutes (depends on network speed)

Connect to Watchtower via SSH and install the complete Ansible stack:

```bash
# SSH to Watchtower
ssh chester@10.0.0.200

# Update package index
sudo apt update

# Install Ansible core components
sudo apt install -y ansible ansible-lint sshpass python3-pip python3-venv git

# Verify Ansible installation
ansible --version
# Expected: ansible [core 2.x.x] or newer

# Install Python API libraries
pip3 install proxmoxer requests --break-system-packages

# Verify ansible-lint
ansible-lint --version
# Expected: ansible-lint 6.x.x or newer
```

**Why these tools:**

- `ansible`: Execution engine
- `ansible-lint`: Code quality enforcement (aligns with `.ansible-lint` configuration)
- `sshpass`: Enables password-based initial SSH key deployment
- `proxmoxer`: Required for Proxmox API automation (future state)
- `python3-pip`: Package manager for Python libraries

---

### Step 2: Generate SSH Keys (ED25519)

**Time:** 2-3 minutes

Create the SSH key pair that Ansible will use for node authentication:

```bash
# Generate ED25519 key (modern, secure, fast)
ssh-keygen -t ed25519 -C "ansible@watchtower" -f ~/.ssh/id_ed25519 -N ""

# Set proper permissions
chmod 600 ~/.ssh/id_ed25519
chmod 644 ~/.ssh/id_ed25519.pub

# Verify key creation
ls -lh ~/.ssh/id_ed25519*
# Expected: Two files (private key and .pub)

# Display public key (for manual distribution if needed)
cat ~/.ssh/id_ed25519.pub
```

**Security Note:** The private key (`id_ed25519`) never leaves Watchtower. Only the `.pub` file is distributed to managed nodes.

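
The permission step in Step 2 can be double-checked with a small helper. This is a minimal sketch, not part of the original setup: `check_key_mode` is a hypothetical function name, and it assumes GNU `stat` (the variant shipped on Debian-based systems that use `apt`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a file's octal permission mode to an expected value.
# Assumes GNU coreutils `stat` (the -c '%a' format flag).
check_key_mode() {
  local file="$1" expected="$2" actual
  actual="$(stat -c '%a' "$file")" || return 2   # file missing or unreadable
  if [ "$actual" = "$expected" ]; then
    echo "OK   $file ($actual)"
  else
    echo "FAIL $file (mode $actual, expected $expected)"
    return 1
  fi
}

# Usage on Watchtower (uncomment after running ssh-keygen):
# check_key_mode ~/.ssh/id_ed25519 600
# check_key_mode ~/.ssh/id_ed25519.pub 644
```
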

---

### Step 3: Distribute SSH Key to Managed Nodes

**Time:** 3-5 minutes

Deploy the public key to all nodes (including localhost for self-management):

```bash
# Deploy to Heimdall
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151

# Deploy to Waldorf
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251

# Deploy to localhost (Watchtower managing itself)
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost

# Test passwordless authentication
ssh -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname && exit"
# Expected: heimdall

ssh -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname && exit"
# Expected: waldorf

ssh -i ~/.ssh/id_ed25519 chester@localhost "hostname && exit"
# Expected: watchtower
```

**Troubleshooting:** If `ssh-copy-id` fails, ensure:

1. You can SSH to the target with password authentication first
2. The target user has a `~/.ssh` directory with proper permissions (`700`)
3. The firewall allows SSH (port 22)

---

### Step 4: Configure Passwordless Sudo (If Required)

**Time:** 5-7 minutes (per node)

If Ansible tasks require privilege escalation without password prompts:

```bash
# On EACH node (Heimdall, Waldorf, Watchtower), run:
sudo visudo

# Add this line (replace 'chester' with your username):
chester ALL=(ALL) NOPASSWD: ALL

# Save and exit (:wq in vi)
```

**Alternative (More Secure):** Use Ansible Vault to encrypt the sudo password and configure `ansible_become_pass` in inventory. See Step 7 below.

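
For the Vault-based alternative, the encrypted password still has to be mapped onto Ansible's `ansible_become_pass` variable. A minimal sketch, assuming the `vault_sudo_password` variable that Step 7 defines and a plain-text companion file named `vars.yml` (the file name and the `ANSIBLE_DIR` override are assumptions; YAML files under `group_vars/all/` apply to every host):

```bash
#!/usr/bin/env bash
# Project root used throughout this guide; override ANSIBLE_DIR to try the sketch elsewhere.
ANSIBLE_DIR="${ANSIBLE_DIR:-$HOME/homelab/ansible}"
VARS_FILE="$ANSIBLE_DIR/inventory/group_vars/all/vars.yml"   # assumed file name

mkdir -p "$(dirname "$VARS_FILE")"

# Creates (or overwrites) vars.yml. The quoted heredoc delimiter keeps the
# Jinja2 braces literal instead of letting the shell interpret them.
cat > "$VARS_FILE" <<'EOF'
---
# Plain-text variables that point at encrypted values in vault.yml
ansible_become_pass: "{{ vault_sudo_password }}"
EOF

echo "Wrote $VARS_FILE"
```

With this mapping in place, the `NOPASSWD` sudoers entry is not needed, provided `vault.yml` holds the real sudo password for the `chester` account on each node.
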

---

## Phase 2: Ansible Project Configuration

**Estimated Time:** 25-35 minutes

### Step 5: Create ansible.cfg

**Time:** 5-7 minutes

Navigate to the homelab repository on Watchtower and create the main configuration file:

```bash
cd ~/homelab/ansible

cat > ansible.cfg <<'EOF'
[defaults]
# Inventory Configuration
inventory = ./inventory/hosts.yml
host_key_checking = False

# SSH Behavior
remote_user = chester
private_key_file = ~/.ssh/id_ed25519
timeout = 30
forks = 3

# Output & Logging
stdout_callback = yaml
display_skipped_hosts = False
display_ok_hosts = True
log_path = ./ansible.log

# Vault Configuration
vault_password_file = ./.vault_pass

# Role Path
roles_path = ./roles

# Retry Configuration
retry_files_enabled = False

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
EOF

# Verify syntax
ansible-config dump --only-changed
```

**Key Decisions:**

- `host_key_checking = False`: Simplifies homelab automation (acceptable for trusted private network)
- `vault_password_file`: Points to `.vault_pass` (created in Step 7)
- `forks = 3`: Limits parallel execution (prevents overwhelming Pi resources)
- `pipelining = True`: Performance optimization (fewer SSH operations per task)

---

### Step 6: Create Inventory Structure

**Time:** 8-10 minutes

Define the three-node infrastructure with hybrid grouping (hardware type + function):

```bash
# Create inventory directory
mkdir -p ~/homelab/ansible/inventory/group_vars/all

# Create main inventory file
cat > ~/homelab/ansible/inventory/hosts.yml <<'EOF'
---
# Homelab Infrastructure Inventory
# Control Node: Watchtower (10.0.0.200)

all:
  vars:
    ansible_user: chester
    ansible_ssh_private_key_file: ~/.ssh/id_ed25519

  children:
    # --- Hardware Hierarchy ---
    proxmox_vms:
      hosts:
        heimdall:
          ansible_host: 10.0.0.151

    physical_servers:
      hosts:
        waldorf:
          ansible_host: 10.0.0.251
          # GPU passthrough capability
          gpu_enabled: true
          gpu_type: nvidia

    raspberry_pi:
      hosts:
        watchtower:
          ansible_host: localhost
          ansible_connection: local

    # --- Functional Hierarchy ---
    infrastructure:
      hosts:
        heimdall:
        watchtower:

    media_servers:
      hosts:
        waldorf:

    docker_hosts:
      hosts:
        heimdall:
        waldorf:
        watchtower:
EOF

# Validate inventory
ansible-inventory --list
ansible-inventory --graph
```

**Inventory Design:**

- **Hardware groups** (`proxmox_vms`, `physical_servers`, `raspberry_pi`): Target based on architecture
- **Functional groups** (`infrastructure`, `media_servers`, `docker_hosts`): Target based on role
- **Localhost optimization**: Watchtower uses `ansible_connection: local` (no SSH overhead)

---

### Step 7: Initialize Ansible Vault

**Time:** 12-15 minutes

Create encrypted storage for sensitive variables (passwords, API keys, tokens):

```bash
cd ~/homelab/ansible

# Create vault password file (single quotes keep the shell from interpreting special characters)
echo 'YourSecureVaultPassword123!' > .vault_pass
chmod 600 .vault_pass

# CRITICAL: Add to .gitignore
echo ".vault_pass" >> ../.gitignore

# Create encrypted variable file
cat > inventory/group_vars/all/vault.yml <<'EOF'
---
# Encrypted Secrets (Ansible Vault)
# Edit with: ansible-vault edit inventory/group_vars/all/vault.yml

vault_sudo_password: "YourSudoPasswordHere"
vault_nfs_password: ""          # If NFS requires auth
vault_proxmox_api_token: ""     # For future Proxmox automation
vault_gitea_token: ""           # For Git automation
EOF

# Encrypt the file
ansible-vault encrypt inventory/group_vars/all/vault.yml

# Verify encryption
cat inventory/group_vars/all/vault.yml
# Expected: $ANSIBLE_VAULT... (encrypted content)

# Test decryption
ansible-vault view inventory/group_vars/all/vault.yml
# Expected: Original YAML content
```

**Usage Pattern:**

1. Store all secrets in `vault.yml` with `vault_` prefix
2. Reference them from inventory or playbook variables, for example: `ansible_become_pass: "{{ vault_sudo_password }}"`
3.
   Edit encrypted file: `ansible-vault edit inventory/group_vars/all/vault.yml`

---

## Phase 3: Validation & First Automation

**Estimated Time:** 15-25 minutes

### Step 8: Create Connectivity Validation Playbook

**Time:** 8-10 minutes

Build a simple playbook to prove the entire stack works:

```bash
cd ~/homelab/ansible
mkdir -p playbooks

cat > playbooks/validate-connectivity.yml <<'EOF'
---
- name: Ansible Environment Validation
  hosts: all
  gather_facts: true

  tasks:
    - name: Test ping module
      ansible.builtin.ping:

    - name: Display node facts
      ansible.builtin.debug:
        msg: |
          Hostname: {{ ansible_hostname }}
          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
          Architecture: {{ ansible_architecture }}
          Python: {{ ansible_python_version }}

    - name: Test privilege escalation
      ansible.builtin.command:
        cmd: whoami
      become: true
      register: sudo_test
      changed_when: false
      check_mode: false  # Read-only; runs under --check so the assert below has output

    - name: Verify sudo worked
      ansible.builtin.assert:
        that:
          - sudo_test.stdout == "root"
        success_msg: "Privilege escalation: PASS"
        fail_msg: "Privilege escalation: FAIL"

    - name: Check NFS mount (infrastructure nodes only)
      ansible.builtin.stat:
        path: /mnt/appdata
      register: nfs_mount
      when: inventory_hostname in groups['infrastructure']

    - name: Display NFS status
      ansible.builtin.debug:
        msg: "NFS mount exists: {{ nfs_mount.stat.exists | default(false) }}"
      when: inventory_hostname in groups['infrastructure']
EOF

# Validate playbook syntax
ansible-playbook playbooks/validate-connectivity.yml --syntax-check

# Lint check (must pass with zero errors)
ansible-lint playbooks/validate-connectivity.yml
```

---

### Step 9: Execute Validation Playbook

**Time:** 7-15 minutes (includes troubleshooting)

Run the playbook to confirm end-to-end functionality:

```bash
cd ~/homelab/ansible

# Dry-run first (check mode)
ansible-playbook playbooks/validate-connectivity.yml --check

# Full execution
ansible-playbook playbooks/validate-connectivity.yml

# Expected output summary (fact gathering counts as one "ok" task):
# heimdall   : ok=7  changed=0  unreachable=0  failed=0  skipped=0
# waldorf    : ok=5  changed=0  unreachable=0  failed=0  skipped=2
# watchtower : ok=7  changed=0  unreachable=0  failed=0  skipped=0
# waldorf skips the two NFS tasks because it is not in the infrastructure group.
```

**Success Criteria:**

- All hosts return `ok` status (no `unreachable` or `failed`)
- Sudo test shows "Privilege escalation: PASS"
- Facts display correct OS/architecture for each node

**Troubleshooting:**

- `unreachable`: Check SSH keys, network connectivity, firewall
- `failed` on sudo: Verify passwordless sudo or Vault configuration
- Lint errors: Fix YAML indentation, task naming, FQCN usage

---

## Phase 4: Role Scaffolding & Developer Experience

**Estimated Time:** 20-30 minutes

### Step 10: Create Standard Role Directory Structure

**Time:** 5-7 minutes

Generate the skeleton for reusable Ansible roles:

```bash
cd ~/homelab/ansible

# Create roles directory
mkdir -p roles

# Generate a sample role placeholder (follows .ansible-standards.md)
cat > roles/setup-docker.yml <<'EOF'
---
# Placeholder: This will become a proper role with:
# - roles/setup-docker/tasks/main.yml
# - roles/setup-docker/defaults/main.yml
# - roles/setup-docker/handlers/main.yml
# - roles/setup-docker/meta/main.yml
#
# For now, just structural validation.
EOF

# Create external roles directory (for Ansible Galaxy)
mkdir -p roles/external

# Confirm the .ansible-lint exclusion (already configured)
grep -q "roles/external" .ansible-lint && echo "✅ Lint exclusion exists"
```

**Next Steps (Future):**

- Install Galaxy roles: `ansible-galaxy install geerlingguy.docker -p roles/external/`
- Create custom roles following the patterns in `.ansible-standards.md`
- Use `molecule` for role testing (not part of the Step 1 toolchain; install it separately before use)

---

### Step 11: Configure VSCode Remote Development

**Time:** 15-20 minutes (includes extension installation)

Connect VSCode from your Windows workstation to Watchtower for seamless editing:

**On Windows:**

1. Install **Remote - SSH** extension in VSCode
2. Open Command Palette (`Ctrl+Shift+P`) → "Remote-SSH: Connect to Host"
3. Enter: `chester@10.0.0.200`
4.
   VSCode opens a new window connected to Watchtower
5. Navigate to `/home/chester/homelab/ansible`
6. Install extensions **on the remote** (VSCode will prompt):
   - Ansible (by Red Hat)
   - YAML (by Red Hat)

**Verify:**

- Open `ansible.cfg` → Syntax highlighting works
- Open `playbooks/validate-connectivity.yml` → Ansible linting shows in Problems panel
- Terminal in VSCode → Runs commands directly on Watchtower

---

## Phase 5: Final Verification & Documentation

**Estimated Time:** 15-20 minutes

### Step 12: Execute Full Environment Test

**Time:** 10-12 minutes

Run comprehensive checks to certify the environment:

```bash
cd ~/homelab/ansible

# 1. Lint all playbooks
ansible-lint playbooks/*.yml

# 2. Configuration dump
ansible-config dump --only-changed

# 3. Inventory validation
ansible-inventory --list --yaml

# 4. Ad-hoc ping test
ansible all -m ping

# 5. Fact gathering test
ansible all -m setup -a "filter=ansible_distribution*"

# 6. Vault operations test
ansible-vault view inventory/group_vars/all/vault.yml
ansible-vault edit inventory/group_vars/all/vault.yml
# Add a test variable, save, exit

# 7. Privilege escalation test
ansible all -m command -a "whoami" --become

# 8. Full playbook run
ansible-playbook playbooks/validate-connectivity.yml
```

**Success State:**

- ✅ Zero lint errors
- ✅ All nodes respond to `ping`
- ✅ Facts gathered from all hosts
- ✅ Vault encrypt/decrypt cycle works
- ✅ Sudo escalation succeeds
- ✅ Validation playbook completes with no failures

---

### Step 13: Update Repository Documentation

**Time:** 5-8 minutes

Document the new Ansible capabilities:

````bash
cd ~/homelab

# Update main README
cat >> README.md <<'EOF'

## Ansible Automation

**Control Node:** Watchtower (10.0.0.200)
**Managed Nodes:** Heimdall, Waldorf, Watchtower

**Quick Start:**
```bash
# SSH to control node
ssh chester@10.0.0.200

# Run validation
cd ~/homelab/ansible
ansible-playbook playbooks/validate-connectivity.yml

# Ad-hoc commands
ansible all -m ping
ansible docker_hosts -m command -a "docker --version"
```

**Configuration:**
- Inventory: `ansible/inventory/hosts.yml`
- Main Config: `ansible/ansible.cfg`
- Secrets: `ansible/inventory/group_vars/all/vault.yml` (encrypted)

**Standards:** See `ansible/.ansible-standards.md` for architectural patterns.
EOF

# Commit the changes (README.md and .gitignore live at the repository root)
git add ansible/ README.md .gitignore
git commit -m "feat(ansible): complete control node setup on Watchtower

- Install ansible-core, ansible-lint, proxmoxer
- Generate ED25519 SSH keys and distribute to nodes
- Create ansible.cfg with vault integration
- Build YAML inventory with hardware + functional grouping
- Initialize Ansible Vault for secret management
- Create validate-connectivity.yml playbook
- Verify end-to-end automation capability

Environment tested and production-ready."

git push origin main
````

---

## Maintenance & Next Steps

### Ongoing Operations

**Update Ansible:**

```bash
# --only-upgrade limits the upgrade to the named packages
sudo apt update && sudo apt install --only-upgrade ansible ansible-lint
```

**Rotate Vault Password:**

```bash
ansible-vault rekey inventory/group_vars/all/vault.yml
# Update .vault_pass file with new password
```

**Add New Managed Node:**

1. Deploy SSH key: `ssh-copy-id -i ~/.ssh/id_ed25519.pub user@new-host`
2.
   Add to `inventory/hosts.yml`
3. Test: `ansible new-host -m ping`

### Future Enhancements

1. **Proxmox Automation:**
   - Create playbook to manage VM creation/deletion via Proxmox API
   - Use `proxmoxer` library (already installed)

2. **Docker Stack Management:**
   - Ansible role to deploy Compose stacks (replacing manual Git pulls)
   - Integration with Komodo API for automated deployments

3. **System Maintenance:**
   - Scheduled playbook for OS updates (`apt update && apt upgrade`)
   - NFS mount validation and auto-remediation
   - Log rotation and backup verification

4. **CI/CD Integration:**
   - Gitea webhook triggers Ansible playbook runs
   - Automated testing via Molecule in Docker containers

---

## Troubleshooting

### Issue: "Host key verification failed"

**Cause:** SSH strict host checking is enabled despite `ansible.cfg` setting.

**Fix:**

```bash
# Remove stale known_hosts entries for the managed nodes
ssh-keygen -R 10.0.0.151
ssh-keygen -R 10.0.0.251

# Force disable in SSH config (homelab subnet only)
cat >> ~/.ssh/config <<'EOF'
Host 10.0.0.*
    StrictHostKeyChecking no
EOF
```

---

### Issue: Vault password file missing or unreadable

**Cause:** `.vault_pass` does not exist, holds the wrong password, or `ansible.cfg` does not point to it.

**Fix:**

```bash
# Recreate the vault password file
echo 'YourSecureVaultPassword123!' > ~/homelab/ansible/.vault_pass
chmod 600 ~/homelab/ansible/.vault_pass

# Verify ansible.cfg points to it
grep vault_password_file ~/homelab/ansible/ansible.cfg
```

---

### Issue: Lint errors on playbook execution

**Cause:** Code violates `.ansible-lint` safety profile rules.

**Fix:**

```bash
# Run linter to see specific violations
ansible-lint playbooks/YOUR_PLAYBOOK.yml

# Common fixes:
# - Use FQCN: ansible.builtin.command instead of 'command'
# - Add 'name:' to all tasks
# - Use changed_when/failed_when for shell/command tasks
# - Add check_mode support for idempotency testing
```

---

## Summary Checklist

- [ ] Ansible toolchain installed on Watchtower
- [ ] ED25519 SSH keys generated and distributed
- [ ] Passwordless sudo configured (or Vault password set)
- [ ] `ansible.cfg` created and validated
- [ ] Inventory file with all three nodes defined
- [ ] Ansible Vault initialized and `.vault_pass` secured
- [ ] Validation playbook created and linted
- [ ] First playbook run successful (all hosts green)
- [ ] VSCode Remote-SSH connected to Watchtower
- [ ] Repository documentation updated
- [ ] Git commit pushed to Gitea

**Environment Status:** 🟢 **PRODUCTION READY**

---

**Document Version:** 1.0

**Last Updated:** April 12, 2026

**Author:** FrankGPT (Ansible Architect Mode)

**Review Cycle:** Quarterly or after infrastructure changes