---
name: security-ansible-hardening
description: "MEDIUM: Ansible security hardening - SSH configuration, sudo security, and host-level security controls. Phase 3 of security hardening."
---

# [ROLE]

You are an **Infrastructure Security Engineer** specializing in Ansible automation security and Linux host hardening. Your goal is to secure Ansible automation workflows and managed hosts without disrupting operations.

# [GOAL]

Harden Ansible security posture by:

1. Implementing secure SSH configuration (host key checking)
2. Configuring least-privilege sudo access
3. Enabling host-level firewalls (UFW)
4. Securing Ansible Vault password files
5. Implementing fail2ban for brute-force protection

# [INPUT CONTEXT]

1. **Environment**: Multi-node homelab managed via Ansible
2. **Current State**:
   - SSH host key checking disabled
   - Passwordless sudo without restrictions
   - No host firewalls (UFW disabled)
   - Vault password file permissions not verified
3. **Managed Nodes**: Proxmox (root), Docker nodes (chester user), Raspberry Pi (chester user)

# [FINDINGS TO ADDRESS]

## 🟠 Ansible Configuration Security

1. `ansible/ansible.cfg:34` - `host_key_checking = False`
2. `ansible/ansible.cfg:35` - `StrictHostKeyChecking=no`
3. `ansible/ansible.cfg:30` - `become_ask_pass = False`
4. `ansible/ansible.cfg:11` - Vault password file permissions not enforced

## 🟡 Host Security Controls

1. `ansible/group_vars/all.yml:29` - UFW disabled
2. `ansible/group_vars/all.yml:30` - fail2ban disabled
3. No SSH key rotation policy
4. No sudo command restrictions

# [NON-NEGOTIABLES]

- **Gradual Rollout**: Enable security controls one node at a time
- **Maintain Access**: Never lock yourself out during SSH hardening
- **Test Playbooks**: Validate all changes with `--check` mode first
- **Document Exceptions**: Some settings (like Proxmox root access) may have valid reasons

# [WORKFLOW]

## Gate 0 — Current State Assessment

Run these validation commands:

```bash
# Check vault password file permissions
ls -la ansible/vault/.vault_pass

# Check SSH key distribution
ansible all -m shell -a "ls -la ~/.ssh/authorized_keys"

# Check sudo configuration
ansible all -b -m shell -a "grep -r NOPASSWD /etc/sudoers*"

# Check firewall status
ansible all -b -m shell -a "ufw status"
```

Create an inventory of the current security posture.

**Required confirmation**: `ASSESSMENT COMPLETE: nodes evaluated`

## Step 1 — Vault Password File Security

### Current Risk

The vault password file may have insecure permissions that allow other users to read it.

### Remediation

```yaml
# Add to ansible/playbooks/secure-vault-file.yml
---
- name: Secure Ansible Vault password file
  hosts: localhost
  # Facts are required: ansible_user_id (used below) is a gathered fact
  gather_facts: true
  tasks:
    - name: Check vault password file exists
      ansible.builtin.stat:
        path: "{{ playbook_dir }}/../vault/.vault_pass"
      register: vault_pass_file

    - name: Ensure vault password file has secure permissions
      ansible.builtin.file:
        path: "{{ playbook_dir }}/../vault/.vault_pass"
        mode: '0600'
        owner: "{{ ansible_user_id }}"
      when: vault_pass_file.stat.exists

    - name: Verify vault directory permissions
      ansible.builtin.file:
        path: "{{ playbook_dir }}/../vault"
        mode: '0700'
        state: directory
```

## Step 2 — SSH Host Key Management

### Phase 2a: Populate known_hosts

Before enabling strict host key checking, populate known_hosts for all managed nodes.
```yaml
# ansible/playbooks/populate-known-hosts.yml
---
- name: Populate SSH known_hosts for all managed nodes
  hosts: localhost
  gather_facts: false
  vars:
    ansible_connection: local
  tasks:
    # ssh-keyscan trusts whatever key the host presents on first contact;
    # compare fingerprints out-of-band before relying on them.
    - name: Scan SSH host keys
      ansible.builtin.shell: |
        ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts 2>/dev/null
      loop: "{{ groups['all'] | map('extract', hostvars, 'ansible_host') | list }}"
      changed_when: false

    # Note: -H salts each hashed entry, so lines appended by repeated runs
    # differ and sort -u only removes byte-identical duplicates.
    - name: Remove duplicate entries
      ansible.builtin.shell: |
        sort -u ~/.ssh/known_hosts > ~/.ssh/known_hosts.tmp
        mv ~/.ssh/known_hosts.tmp ~/.ssh/known_hosts
        chmod 600 ~/.ssh/known_hosts
      changed_when: false
```

### Phase 2b: Enable Host Key Checking

After known_hosts is populated, update ansible.cfg. Keep comments on their own lines: in ansible.cfg, `#` only starts a comment at the beginning of a line, so a trailing `# ...` would become part of the value.

```ini
# ansible/ansible.cfg
[defaults]
# Changed from False
host_key_checking = True

[ssh_connection]
# Remove -o StrictHostKeyChecking=no
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=~/.ssh/known_hosts
```

### Phase 2c: Verification

```bash
# Test connection to all hosts
ansible all -m ping

# Should succeed without warnings
```

## Step 3 — Sudo Security Configuration

### Current Risk

With `become_ask_pass = False`, Ansible never prompts for a sudo password, which assumes every node grants the automation user unrestricted NOPASSWD sudo.
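
To see which nodes actually depend on that assumption, a read-only audit along the following lines may help. This is a minimal sketch, not part of the existing repository: the file name `audit-sudo.yml` is hypothetical, and the check assumes that `sudo -n -l` prints a blanket rule in the usual `NOPASSWD: ALL` form.

```yaml
# ansible/playbooks/audit-sudo.yml (hypothetical file name)
---
- name: Audit sudo rules for the Ansible connection user (read-only)
  hosts: all
  gather_facts: false
  become: false
  tasks:
    - name: List sudo rules without prompting for a password
      ansible.builtin.command: sudo -n -l
      register: sudo_rules
      changed_when: false
      # Nodes that require a password, or have no sudo installed, report nothing
      failed_when: false

    - name: Flag nodes with a blanket NOPASSWD rule
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} grants unrestricted NOPASSWD sudo"
      # Assumption: sudo prints the rule as "... NOPASSWD: ALL"
      when: "'NOPASSWD: ALL' in (sudo_rules.stdout | default(''))"
```

Running it against a single node first (`--limit watchtower`) keeps it in line with the gradual-rollout rule.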
### Recommended Approach

Create restricted sudoers files for automation. Be aware that Ansible's `become` runs modules through a shell wrapper rather than invoking these binaries directly, so a command allowlist can break most privileged tasks. Validate on the test node before removing unrestricted access.

```yaml
# ansible/playbooks/configure-sudo-security.yml
---
- name: Configure secure sudo for Ansible automation
  hosts: all
  become: true
  tasks:
    - name: Create ansible-automation sudoers file
      ansible.builtin.copy:
        dest: /etc/sudoers.d/50-ansible-automation
        content: |
          # Ansible automation - restricted sudo commands
          # User: {{ ansible_user }}

          # Package management
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg

          # Service management
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/systemctl

          # Docker operations
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/docker

          # File operations in managed paths only
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/mkdir -p /mnt/appdata/*
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/chown -R * /mnt/appdata/*

          # UFW firewall
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/sbin/ufw
        mode: '0440'
        validate: 'visudo -cf %s'

    - name: Remove unrestricted sudo access
      ansible.builtin.lineinfile:
        path: /etc/sudoers.d/90-cloud-init-users
        regexp: '^{{ ansible_user }}\s+ALL=\(ALL\)\s+NOPASSWD:\s+ALL$'
        state: absent
        validate: 'visudo -cf %s'
      when: ansible_distribution == "Ubuntu"
```

### Alternative: Keep Unrestricted but Add Logging

If restricted sudo is too limiting:

```yaml
# Enable sudo logging
- name: Enable sudo command logging
  ansible.builtin.lineinfile:
    path: /etc/sudoers
    line: 'Defaults log_output'
    validate: 'visudo -cf %s'
```

## Step 4 — Host Firewall Configuration

### Phase 4a: Create UFW Role

```yaml
# ansible/roles/ufw_baseline/tasks/main.yml
---
- name: Install UFW
  ansible.builtin.apt:
    name: ufw
    state: present
    update_cache: yes

- name: Set UFW default policies
  community.general.ufw:
    direction: "{{ item.direction }}"
    policy: "{{ item.policy }}"
  loop:
    - { direction: 'incoming', policy: 'deny' }
    - { direction: 'outgoing', policy: 'allow' }
    - { direction: 'routed', policy: 'allow' }

# Must run before the firewall is enabled
- name: Allow SSH (prevent lockout)
  community.general.ufw:
    rule: allow
    port: '22'
    proto: tcp
    comment: 'SSH access'

- name: Allow service-specific ports
  community.general.ufw:
    rule: allow
    port: "{{ item.port }}"
    proto: "{{ item.proto }}"
    comment: "{{ item.comment }}"
  loop: "{{ ufw_allowed_ports | default([]) }}"

# Opt-in per host: stays disabled unless ufw_enable_firewall is true
- name: Enable UFW
  community.general.ufw:
    state: enabled
  when: ufw_enable_firewall | default(false)
```

### Phase 4b: Define Per-Node Firewall Rules

```yaml
# ansible/inventory/host_vars/heimdall.yml
ufw_allowed_ports:
  - { port: '80', proto: 'tcp', comment: 'HTTP - Traefik' }
  - { port: '443', proto: 'tcp', comment: 'HTTPS - Traefik' }
  - { port: '9120', proto: 'tcp', comment: 'Komodo Core' }
  - { port: '2377', proto: 'tcp', comment: 'Docker Swarm (if used)' }

ufw_enable_firewall: true
```

### Phase 4c: Gradual Rollout

Test on one node first:

```bash
# Test on watchtower (non-critical node)
ansible watchtower -m include_role -a name=ufw_baseline --check

# Apply if check succeeds
ansible watchtower -m include_role -a name=ufw_baseline

# Verify SSH still works
ansible watchtower -m ping

# Roll out to other nodes
ansible docker_nodes -m include_role -a name=ufw_baseline
```

## Step 5 — Fail2ban Configuration

### Basic Fail2ban Role

```yaml
# ansible/roles/fail2ban/tasks/main.yml
---
- name: Install fail2ban
  ansible.builtin.apt:
    name: fail2ban
    state: present

# The notify below requires a "Restart fail2ban" handler in
# ansible/roles/fail2ban/handlers/main.yml
- name: Configure fail2ban for SSH
  ansible.builtin.copy:
    dest: /etc/fail2ban/jail.local
    content: |
      [DEFAULT]
      bantime = 1h
      findtime = 10m
      maxretry = 5

      [sshd]
      enabled = true
      port = ssh
      logpath = /var/log/auth.log
    mode: '0644'
  notify: Restart fail2ban

- name: Ensure fail2ban is running
  ansible.builtin.systemd:
    name: fail2ban
    state: started
    enabled: yes
```

## Gate 1 — Pre-Deployment Testing

Run all playbooks in check mode:

```bash
ansible-playbook ansible/playbooks/secure-vault-file.yml --check
ansible-playbook ansible/playbooks/populate-known-hosts.yml --check
ansible-playbook ansible/playbooks/configure-sudo-security.yml --check
ansible all -m include_role -a name=ufw_baseline --check
ansible all -m include_role -a name=fail2ban --check
```

**Required confirmation**: `CHECKS PASSED: Ready for deployment`

## Step 6 — Phased Deployment

Deploy in this order:

1. **Local security** (vault file, known_hosts)
2. **Test node** (watchtower) - full hardening
3. **Docker nodes** (heimdall, waldorf) - after validating watchtower
4. **Proxmox** (pve01) - last, as it's most critical

# [OUTPUT FORMAT]

## Security Hardening Plan

```markdown
## Phase 1: Ansible Controller Security
- [ ] Secure vault password file (chmod 600)
- [ ] Populate SSH known_hosts
- [ ] Enable host key checking in ansible.cfg
- [ ] Test: `ansible all -m ping`

## Phase 2: Sudo Hardening
- [ ] Create restricted sudoers on watchtower (test node)
- [ ] Validate Ansible operations still work
- [ ] Roll out to remaining nodes
- [ ] Document sudo command allowlist

## Phase 3: Host Firewalls
- [ ] Deploy UFW role to watchtower
- [ ] Verify SSH access maintained
- [ ] Verify Docker services accessible
- [ ] Roll out to docker_nodes group
- [ ] Configure Proxmox firewall separately (PVE-specific)

## Phase 4: Intrusion Detection
- [ ] Deploy fail2ban to all nodes
- [ ] Configure SSH jail
- [ ] Test ban/unban procedures
- [ ] Set up alerting (optional)
```

## Rollback Procedures

```markdown
### If locked out after UFW enable:
1. Access via Proxmox console (for VMs/LXC)
2. Run: `sudo ufw disable`
3. Fix rule, re-enable

### If sudo restrictions break Ansible:
1. SSH to node manually
2. `sudo visudo -f /etc/sudoers.d/50-ansible-automation`
3. Add required commands or remove file
```

# [VALIDATION CHECKLIST]

After each phase:

```bash
# Connectivity test
ansible all -m ping

# Privilege escalation test
ansible all -b -m shell -a "whoami"

# Service verification
ansible docker_nodes -b -m shell -a "docker ps"

# Firewall status
ansible all -b -m shell -a "ufw status numbered"
```

# [SUCCESS CRITERIA]

- [ ] SSH host key checking enabled without connection failures
- [ ] Sudo access restricted and logged
- [ ] UFW enabled on all Docker nodes with service-specific rules
- [ ] Fail2ban active and monitoring SSH
- [ ] Vault password file secured (600 permissions)
- [ ] All Ansible playbooks execute successfully
- [ ] No SSH lockouts occurred
- [ ] Documentation updated with security procedures
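
Several of the criteria above (UFW enabled, fail2ban monitoring SSH, vault file at 600) can be checked mechanically instead of by eye. The playbook below is a minimal sketch under stated assumptions: the file name `verify-hardening.yml` is hypothetical, the `docker_nodes` group and vault path are taken from the commands and playbooks earlier in this document, and the jail name `sshd` matches the `jail.local` from Step 5.

```yaml
# ansible/playbooks/verify-hardening.yml (hypothetical file name)
---
- name: Verify firewall and fail2ban state on Docker nodes
  hosts: docker_nodes
  become: true
  gather_facts: false
  tasks:
    - name: Read UFW status
      ansible.builtin.command: ufw status
      register: ufw_status
      changed_when: false

    - name: Assert UFW is active
      ansible.builtin.assert:
        that:
          - "'Status: active' in ufw_status.stdout"
        fail_msg: "UFW is not active on {{ inventory_hostname }}"

    # A non-zero exit code fails the task, e.g. when the jail is not loaded
    - name: Query the fail2ban sshd jail
      ansible.builtin.command: fail2ban-client status sshd
      changed_when: false

- name: Verify the vault password file on the controller
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Stat vault password file
      ansible.builtin.stat:
        path: "{{ playbook_dir }}/../vault/.vault_pass"
      register: vault_pass_file

    - name: Assert the file exists with mode 0600
      ansible.builtin.assert:
        that:
          - vault_pass_file.stat.exists
          - vault_pass_file.stat.mode == '0600'
```

The playbook only reads state, so it should be safe to rerun after each deployment phase. The remaining criteria (no lockouts, documentation updated) still need a manual check.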