Created Files

security-secrets-remediation.prompt.md - Phase 1 (CRITICAL)
- Eliminates hardcoded secrets (Docker Registry, Komodo, Plex)
- Creates .env templates and migration workflow
- Priority: Immediate (This Week)

security-container-hardening.prompt.md - Phase 2 (HIGH)
- Removes privileged containers
- Converts root users to non-root (PUID/PGID)
- Secures Docker socket access patterns
- Priority: Short Term (This Month)

security-ansible-hardening.prompt.md - Phase 3 (MEDIUM)
- Enables SSH host key checking
- Implements restricted sudo rules
- Deploys UFW firewalls and fail2ban
- Priority: Medium Term (Next Month)

security-network-access.prompt.md - Phase 4 (MEDIUM)
- Restricts port exposure (0.0.0.0 → 127.0.0.1)
- Implements network segmentation
- Adds authentication middleware
- Priority: Ongoing (Next Quarter)

Each prompt follows your existing format with:
✅ Gated workflows with confirmation checkpoints
✅ Rollback procedures for safety
✅ Testing and validation steps
✅ Incremental deployment strategies
✅ Clear success criteria
2026-04-19 18:25:46 -04:00 · 129b7eee1b
commit 129b7eee1b
parent 417501dbd1
4 changed files with 1334 additions and 0 deletions
--- a/.github/prompts/security-ansible-hardening.prompt.md
+++ b/.github/prompts/security-ansible-hardening.prompt.md
@@ -0,0 +1,406 @@
+---
+name: security-ansible-hardening
+description: "MEDIUM: Ansible security hardening - SSH configuration, sudo security, and host-level security controls. Phase 3 of security hardening."
+---
+
+# [ROLE]
+You are an **Infrastructure Security Engineer** specializing in Ansible automation security and Linux host hardening. Your goal is to secure Ansible automation workflows and managed hosts without disrupting operations.
+
+# [GOAL]
+Harden Ansible security posture by:
+1. Implementing secure SSH configuration (host key checking)
+2. Configuring least-privilege sudo access
+3. Enabling host-level firewalls (UFW)
+4. Securing Ansible Vault password files
+5. Implementing fail2ban for brute-force protection
+
+# [INPUT CONTEXT]
+1. **Environment**: Multi-node homelab managed via Ansible
+2. **Current State**: 
+   - SSH host key checking disabled
+   - Passwordless sudo without restrictions
+   - No host firewalls (UFW disabled)
+   - Vault password file permissions not verified
+3. **Managed Nodes**: Proxmox (root), Docker nodes (chester user), Raspberry Pi (chester user)
+
+# [FINDINGS TO ADDRESS]
+
+## 🟠 Ansible Configuration Security
+1. `ansible/ansible.cfg:34` - `host_key_checking = False`
+2. `ansible/ansible.cfg:35` - `StrictHostKeyChecking=no`
+3. `ansible/ansible.cfg:30` - `become_ask_pass = False`
+4. `ansible/ansible.cfg:11` - Vault password file permissions not enforced
+
+## 🟡 Host Security Controls
+1. `ansible/group_vars/all.yml:29` - UFW disabled
+2. `ansible/group_vars/all.yml:30` - fail2ban disabled
+3. No SSH key rotation policy
+4. No sudo command restrictions
+
+# [NON-NEGOTIABLES]
+- **Gradual Rollout**: Enable security controls one node at a time
+- **Maintain Access**: Never lock yourself out during SSH hardening
+- **Test Playbooks**: Validate all changes with `--check` mode first
+- **Document Exceptions**: Some settings (like Proxmox root access) may have valid reasons
+
+# [WORKFLOW]
+
+## Gate 0 — Current State Assessment
+
+Run these validation commands:
+
+```bash
+# Check vault password file permissions
+ls -la ansible/vault/.vault_pass
+
+# Check SSH key distribution
+ansible all -m shell -a "ls -la ~/.ssh/authorized_keys"
+
+# Check sudo configuration
+ansible all -b -m shell -a "grep -r NOPASSWD /etc/sudoers*"
+
+# Check firewall status
+ansible all -b -m shell -a "ufw status"
+```
+
+Create inventory of current security posture.
+
+**Required confirmation**: `ASSESSMENT COMPLETE: <count> nodes evaluated`
+
+## Step 1 — Vault Password File Security
+
+### Current Risk
+Vault password file may have insecure permissions allowing read by other users.
+
+### Remediation
+```yaml
+# Add to ansible/playbooks/secure-vault-file.yml
+---
+- name: Secure Ansible Vault password file
+  hosts: localhost
+  gather_facts: true  # ansible_user_id (used for owner below) is a gathered fact
+  tasks:
+    - name: Check vault password file exists
+      ansible.builtin.stat:
+        path: "{{ playbook_dir }}/../vault/.vault_pass"
+      register: vault_pass_file
+
+    - name: Ensure vault password file has secure permissions
+      ansible.builtin.file:
+        path: "{{ playbook_dir }}/../vault/.vault_pass"
+        mode: '0600'
+        owner: "{{ ansible_user_id }}"
+      when: vault_pass_file.stat.exists
+
+    - name: Verify vault directory permissions
+      ansible.builtin.file:
+        path: "{{ playbook_dir }}/../vault"
+        mode: '0700'
+        state: directory
+```
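+
+A follow-up check can confirm the permissions actually took effect. This is a minimal sketch, assuming the same relative vault path as above; the `vault_pass_check` variable name is illustrative, and the tasks would be appended to the play's `tasks:` list:
+
+```yaml
+    - name: Re-read vault password file metadata
+      ansible.builtin.stat:
+        path: "{{ playbook_dir }}/../vault/.vault_pass"
+      register: vault_pass_check  # hypothetical variable name
+
+    - name: Fail if the vault password file is not mode 0600
+      ansible.builtin.assert:
+        that:
+          - vault_pass_check.stat.mode == '0600'
+        fail_msg: "Vault password file permissions are too open"
+      when: vault_pass_check.stat.exists
+```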
+
+## Step 2 — SSH Host Key Management
+
+### Phase 2a: Populate known_hosts
+Before enabling strict host key checking, populate known_hosts for all managed nodes.
+
+```yaml
+# ansible/playbooks/populate-known-hosts.yml
+---
+- name: Populate SSH known_hosts for all managed nodes
+  hosts: localhost
+  gather_facts: false
+  vars:
+    ansible_connection: local
+  tasks:
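+    # ssh-keyscan trusts whatever key the host presents; verify fingerprints out of band.
+    # Hashed (-H) entries use random salts, so sort -u cannot dedupe them: remove old ones first.
+    - name: Scan SSH host keys
+      ansible.builtin.shell: |
+        ssh-keygen -R {{ item }} >/dev/null 2>&1 || true
+        ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts 2>/dev/null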
+      loop: "{{ groups['all'] | map('extract', hostvars, 'ansible_host') | list }}"
+      changed_when: false
+
+    - name: Remove duplicate entries
+      ansible.builtin.shell: |
+        sort -u ~/.ssh/known_hosts > ~/.ssh/known_hosts.tmp
+        mv ~/.ssh/known_hosts.tmp ~/.ssh/known_hosts
+        chmod 600 ~/.ssh/known_hosts
+      changed_when: false
+```
+
+### Phase 2b: Enable Host Key Checking
+After known_hosts is populated, update ansible.cfg:
+
+```ini
+# ansible/ansible.cfg
+[defaults]
+# Changed from False (ansible.cfg treats '#' after a value as part of the value)
+host_key_checking = True
+
+[ssh_connection]
+# Remove -o StrictHostKeyChecking=no
+ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=~/.ssh/known_hosts
+```
+
+### Phase 2c: Verification
+```bash
+# Test connection to all hosts
+ansible all -m ping
+
+# Should succeed without warnings
+```
+
+## Step 3 — Sudo Security Configuration
+
+### Current Risk
+`become_ask_pass = False` assumes all nodes have unrestricted NOPASSWD sudo.
+
+### Recommended Approach
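+Create restricted sudoers files for automation. Ansible's `become` runs most modules through a shell and a Python interpreter rather than invoking the listed binaries directly, so confirm on a single test node that module execution still works under the allowlist: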
+
+```yaml
+# ansible/playbooks/configure-sudo-security.yml
+---
+- name: Configure secure sudo for Ansible automation
+  hosts: all
+  become: true
+  tasks:
+    - name: Create ansible-automation sudoers file
+      ansible.builtin.copy:
+        dest: /etc/sudoers.d/50-ansible-automation
+        content: |
+          # Ansible automation - restricted sudo commands
+          # User: {{ ansible_user }}
+          
+          # Package management
+          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg
+          
+          # Service management
+          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/systemctl
+          
+          # Docker operations
+          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/docker
+          
+          # File operations in managed paths only
+          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/mkdir -p /mnt/appdata/*
+          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/chown -R * /mnt/appdata/*
+          
+          # UFW firewall
+          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/sbin/ufw
+        mode: '0440'
+        validate: 'visudo -cf %s'
+
+    - name: Remove unrestricted sudo access
+      ansible.builtin.lineinfile:
+        path: /etc/sudoers.d/90-cloud-init-users
+        regexp: '^{{ ansible_user }}\s+ALL=\(ALL\)\s+NOPASSWD:\s+ALL$'
+        state: absent
+      when: ansible_distribution == "Ubuntu"
+```
+
+### Alternative: Keep Unrestricted but Add Logging
+If restricted sudo is too limiting:
+
+```yaml
+# Enable sudo logging
+- name: Enable sudo command logging
+  ansible.builtin.lineinfile:
+    path: /etc/sudoers
+    line: 'Defaults log_output'
+    validate: 'visudo -cf %s'
+```
+
+## Step 4 — Host Firewall Configuration
+
+### Phase 4a: Create UFW Role
+```yaml
+# ansible/roles/ufw_baseline/tasks/main.yml
+---
+- name: Install UFW
+  ansible.builtin.apt:
+    name: ufw
+    state: present
+    update_cache: yes
+
+- name: Set UFW default policies
+  community.general.ufw:
+    direction: "{{ item.direction }}"
+    policy: "{{ item.policy }}"
+  loop:
+    - { direction: 'incoming', policy: 'deny' }
+    - { direction: 'outgoing', policy: 'allow' }
+    - { direction: 'routed', policy: 'allow' }
+
+- name: Allow SSH (prevent lockout)
+  community.general.ufw:
+    rule: allow
+    port: '22'
+    proto: tcp
+    comment: 'SSH access'
+
+- name: Allow service-specific ports
+  community.general.ufw:
+    rule: allow
+    port: "{{ item.port }}"
+    proto: "{{ item.proto }}"
+    comment: "{{ item.comment }}"
+  loop: "{{ ufw_allowed_ports | default([]) }}"
+
+- name: Enable UFW
+  community.general.ufw:
+    state: enabled
+  when: ufw_enable_firewall | default(false)
+```
+
+### Phase 4b: Define Per-Node Firewall Rules
+```yaml
+# ansible/inventory/host_vars/heimdall.yml
+ufw_allowed_ports:
+  - { port: '80', proto: 'tcp', comment: 'HTTP - Traefik' }
+  - { port: '443', proto: 'tcp', comment: 'HTTPS - Traefik' }
+  - { port: '9120', proto: 'tcp', comment: 'Komodo Core' }
+  - { port: '2377', proto: 'tcp', comment: 'Docker Swarm (if used)' }
+
+ufw_enable_firewall: true
+```
+
+### Phase 4c: Gradual Rollout
+Test on one node first:
+
+```bash
+# Test on watchtower (non-critical node); -b is needed because the role installs packages
+ansible watchtower -b -m include_role -a name=ufw_baseline --check
+
+# Apply if check succeeds
+ansible watchtower -b -m include_role -a name=ufw_baseline
+
+# Verify SSH still works
+ansible watchtower -m ping
+
+# Roll out to other nodes
+ansible docker_nodes -b -m include_role -a name=ufw_baseline
+```
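+
+The same rollout can be expressed as a playbook that enforces the one-node-at-a-time rule itself. This is a minimal sketch, assuming the `ufw_baseline` role and `docker_nodes` group above; the playbook file name is hypothetical:
+
+```yaml
+# ansible/playbooks/apply-ufw-baseline.yml (hypothetical file name)
+---
+- name: Apply UFW baseline one node at a time
+  hosts: docker_nodes
+  become: true
+  serial: 1                # finish one host before starting the next
+  any_errors_fatal: true   # stop the rollout on the first failure
+  roles:
+    - ufw_baseline
+  post_tasks:
+    - name: Confirm the host is still reachable after the firewall change
+      ansible.builtin.wait_for_connection:
+        timeout: 30
+```
+
+With `serial: 1`, a lockout affects at most one node before the play aborts.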
+
+## Step 5 — Fail2ban Configuration
+
+### Basic Fail2ban Role
+```yaml
+# ansible/roles/fail2ban/tasks/main.yml
+---
+- name: Install fail2ban
+  ansible.builtin.apt:
+    name: fail2ban
+    state: present
+
+- name: Configure fail2ban for SSH
+  ansible.builtin.copy:
+    dest: /etc/fail2ban/jail.local
+    content: |
+      [DEFAULT]
+      bantime = 1h
+      findtime = 10m
+      maxretry = 5
+      
+      [sshd]
+      enabled = true
+      port = ssh
+      # On journald-only hosts (e.g. Debian 12), use "backend = systemd" instead of logpath
+      logpath = /var/log/auth.log
+    mode: '0644'
+  notify: Restart fail2ban
+
+- name: Ensure fail2ban is running
+  ansible.builtin.systemd:
+    name: fail2ban
+    state: started
+    enabled: yes
+```
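+
+The `notify: Restart fail2ban` above only works if the role defines a handler with that exact name. This is a minimal sketch, assuming the standard role layout; the handlers file is not part of the original role listing:
+
+```yaml
+# ansible/roles/fail2ban/handlers/main.yml (assumed path)
+---
+- name: Restart fail2ban
+  ansible.builtin.systemd:
+    name: fail2ban
+    state: restarted
+```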
+
+## Gate 1 — Pre-Deployment Testing
+
+Run all playbooks in check mode:
+```bash
+ansible-playbook ansible/playbooks/secure-vault-file.yml --check
+ansible-playbook ansible/playbooks/populate-known-hosts.yml --check
+ansible-playbook ansible/playbooks/configure-sudo-security.yml --check
+ansible all -b -m include_role -a name=ufw_baseline --check
+ansible all -b -m include_role -a name=fail2ban --check
+```
+
+**Required confirmation**: `CHECKS PASSED: Ready for deployment`
+
+## Step 6 — Phased Deployment
+
+Deploy in this order:
+
+1. **Local security** (vault file, known_hosts)
+2. **Test node** (watchtower) - full hardening
+3. **Docker nodes** (heimdall, waldorf) - after validating watchtower
+4. **Proxmox** (pve01) - last, as it's most critical
+
+# [OUTPUT FORMAT]
+
+## Security Hardening Plan
+```markdown
+## Phase 1: Ansible Controller Security
+- [ ] Secure vault password file (chmod 600)
+- [ ] Populate SSH known_hosts
+- [ ] Enable host key checking in ansible.cfg
+- [ ] Test: `ansible all -m ping`
+
+## Phase 2: Sudo Hardening
+- [ ] Create restricted sudoers on watchtower (test node)
+- [ ] Validate Ansible operations still work
+- [ ] Roll out to remaining nodes
+- [ ] Document sudo command allowlist
+
+## Phase 3: Host Firewalls
+- [ ] Deploy UFW role to watchtower
+- [ ] Verify SSH access maintained
+- [ ] Verify Docker services accessible
+- [ ] Roll out to docker_nodes group
+- [ ] Configure Proxmox firewall separately (PVE-specific)
+
+## Phase 4: Intrusion Detection
+- [ ] Deploy fail2ban to all nodes
+- [ ] Configure SSH jail
+- [ ] Test ban/unban procedures
+- [ ] Set up alerting (optional)
+```
+
+## Rollback Procedures
+```markdown
+### If locked out after UFW enable:
+1. Access via Proxmox console (for VMs/LXC)
+2. Run: `sudo ufw disable`
+3. Fix rule, re-enable
+
+### If sudo restrictions break Ansible:
+1. SSH to node manually
+2. `sudo visudo -f /etc/sudoers.d/50-ansible-automation`
+3. Add required commands or remove file
+```
+
+# [VALIDATION CHECKLIST]
+
+After each phase:
+```bash
+# Connectivity test
+ansible all -m ping
+
+# Privilege escalation test
+ansible all -b -m shell -a "whoami"
+
+# Service verification
+ansible docker_nodes -b -m shell -a "docker ps"
+
+# Firewall status
+ansible all -b -m shell -a "ufw status numbered"
+```
+
+# [SUCCESS CRITERIA]
+- [ ] SSH host key checking enabled without connection failures
+- [ ] Sudo access restricted and logged
+- [ ] UFW enabled on all Docker nodes with service-specific rules
+- [ ] Fail2ban active and monitoring SSH
+- [ ] Vault password file secured (600 permissions)
+- [ ] All Ansible playbooks execute successfully
+- [ ] No SSH lockouts occurred
+- [ ] Documentation updated with security procedures
--- a/.github/prompts/security-container-hardening.prompt.md
+++ b/.github/prompts/security-container-hardening.prompt.md
@@ -0,0 +1,313 @@
+---
+name: security-container-hardening
+description: "HIGH: Container security hardening - eliminate privileged containers, reduce root user execution, and secure Docker socket access. Phase 2 of security hardening."
+---
+
+# [ROLE]
+You are a **Container Security Specialist** with expertise in Docker security best practices, CIS Benchmarks, and least-privilege principles. Your goal is to harden container security posture without breaking functionality.
+
+# [GOAL]
+Systematically reduce attack surface by:
+1. Eliminating or justifying `privileged: true` containers
+2. Converting root-running containers to non-root users
+3. Securing Docker socket access patterns
+4. Implementing capability-based security where needed
+
+# [INPUT CONTEXT]
+1. **Environment**: Multi-node homelab with management tools (Komodo, Traefik), media services, and SSO
+2. **Current Issues**: 
+   - Multiple containers running with `privileged: true`
+   - Services running as PUID=0 (root)
+   - Docker socket mounted in multiple containers
+3. **Constraint**: Must maintain functionality - some tools legitimately need elevated access
+
+# [CRITICAL FINDINGS TO ADDRESS]
+
+## 🔴 Privileged Containers (Attack Surface: Critical)
+1. `nodes/watchtower/compose.yaml:11` - docker-socket-proxy (privileged: true)
+2. `nodes/heimdall/core/compose.yaml:12` - docker-socket-proxy (privileged: true)
+
+## 🟠 Root User Execution (Attack Surface: High)
+1. `nodes/heimdall/radarr/compose.yaml:20-21` - PUID=0, PGID=0
+2. `nodes/heimdall/qbittorrent/compose.yaml:43-44` - PUID=0, PGID=0
+3. `nodes/heimdall/authentik/compose.yaml:114` - user: root (worker container)
+
+## 🟡 Docker Socket Exposure (Attack Surface: Medium)
+1. `nodes/heimdall/authentik/compose.yaml:116` - /var/run/docker.sock (read-write)
+2. `nodes/heimdall/core/compose.yaml:14` - /var/run/docker.sock:ro (read-only, acceptable)
+3. `nodes/watchtower/compose.yaml:19` - /var/run/docker.sock:ro (read-only, acceptable)
+
+# [NON-NEGOTIABLES]
+- **Document Before Changing**: Every privileged container must have a documented justification or removal plan
+- **Test After Changing**: Every user change must be validated with service restart
+- **Capability-Based Security**: Use `cap_add` instead of `privileged: true` where possible
+- **Defense in Depth**: Even when privileged access is required, add additional security layers
+
+# [WORKFLOW]
+
+## Gate 0 — Security Baseline Assessment
+1. Scan all compose files for security anti-patterns:
+   - `privileged: true`
+   - `user: root` or `user: "0"`
+   - `PUID=0` or `PGID=0`
+   - `/var/run/docker.sock` mounts
+   - `network_mode: host`
+   - `cap_add: SYS_ADMIN` or `NET_ADMIN`
+
+2. Classify each finding:
+   - **REMOVABLE**: Can be fixed without breaking functionality
+   - **JUSTIFIABLE**: Required for legitimate purpose (document why)
+   - **INVESTIGATE**: Unclear if needed, requires testing
+
+**Required confirmation**: `BASELINE: <count> findings across <count> services`
+
+## Step 1 — Privileged Container Analysis
+
+For each container with `privileged: true`:
+
+### Investigation Checklist
+```yaml
+Service: docker-socket-proxy
+Purpose: Secure proxy for Docker API access
+Privileged Justification:
+  - Requires: Access to Docker socket with group permissions
+  - Alternative: Run as docker group (GID 988) without privileged
+  - Decision: TEST removal of privileged flag
+```
+
+### Remediation Pattern
+```yaml
+# CURRENT (INSECURE)
+docker-socket-proxy:
+  privileged: true
+  volumes:
+    - /var/run/docker.sock:/var/run/docker.sock:ro
+
+# PROPOSED (SECURE)
+docker-socket-proxy:
+  user: "65534:988"  # nobody:docker
+  group_add:
+    - "988"  # Docker group from host
+  security_opt:
+    - no-new-privileges:true
+    - apparmor=docker-default
+  volumes:
+    - /var/run/docker.sock:/var/run/docker.sock:ro
+```
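+
+Where a container needs one specific kernel capability rather than full privileges, the capability-based approach from the non-negotiables can look like the following. This is a minimal sketch; `example-service` and the choice of `NET_BIND_SERVICE` are illustrative assumptions, not findings from this repository:
+
+```yaml
+# Hypothetical service, for illustration only
+example-service:
+  cap_drop:
+    - ALL                  # start from zero capabilities
+  cap_add:
+    - NET_BIND_SERVICE     # add back only what the service needs
+  security_opt:
+    - no-new-privileges:true
+```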
+
+## Step 2 — Root User Conversion
+
+For each container running as root (PUID=0):
+
+### Impact Analysis
+```markdown
+Service: radarr
+Current User: PUID=0, PGID=0 (root)
+Volumes Affected:
+  - /mnt/appdata/radarr/data:/config
+  - /mnt/media/movies:/movies
+Ownership Requirements:
+  - Config files: Read/Write
+  - Media files: Read/Write
+Proposed User: PUID=1000, PGID=1000 (chester)
+```
+
+### Migration Steps
+1. **Check current ownership**:
+   ```bash
+   ls -la /mnt/appdata/radarr/data
+   ```
+
+2. **Stop container**:
+   ```bash
+   docker compose stop radarr
+   ```
+
+3. **Fix permissions** (if needed):
+   ```bash
+   sudo chown -R 1000:1000 /mnt/appdata/radarr/data
+   ```
+
+4. **Update compose file**:
+   ```yaml
+   environment:
+     - PUID=1000  # Changed from 0
+     - PGID=1000  # Changed from 0
+   ```
+
+5. **Restart and verify**:
+   ```bash
+   docker compose up -d radarr
+   docker compose logs radarr | grep -i "permission\|error"
+   ```
+
+## Step 3 — Docker Socket Security Review
+
+For each socket mount, apply this decision tree:
+
+```
+Does container need Docker API access?
+├─ NO → Remove socket mount entirely
+└─ YES → Is it read-only?
+    ├─ YES → Keep with :ro flag, add socket proxy if not present
+    └─ NO → Requires write access?
+        ├─ Management tool (Komodo, Portainer) → Use socket proxy with limited permissions
+        └─ Other → INVESTIGATE: Why does it need write access?
+```
+
+### Socket Proxy Pattern (Best Practice)
+```yaml
+# Never mount socket directly in application containers
+# Use tecnativa/docker-socket-proxy as intermediary
+
+docker-socket-proxy:
+  image: tecnativa/docker-socket-proxy:latest
+  environment:
+    # Read permissions (safe for Traefik)
+    - CONTAINERS=1
+    - NETWORKS=1
+    - SERVICES=1
+    # Write permissions (limit to management tools only)
+    - POST=0      # Disable by default
+    - DELETE=0    # Disable by default
+  volumes:
+    - /var/run/docker.sock:/var/run/docker.sock:ro
+
+traefik:
+  environment:
+    - DOCKER_HOST=tcp://docker-socket-proxy:2375  # No direct socket access
+```
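+
+Traefik's Docker provider also accepts an explicit endpoint in its static configuration, which avoids relying on the `DOCKER_HOST` environment variable. This is a minimal sketch, assuming a YAML static config file (the file name is an assumption) and the proxy service name used above:
+
+```yaml
+# traefik.yml (static configuration; file name assumed)
+providers:
+  docker:
+    endpoint: "tcp://docker-socket-proxy:2375"  # talk to the proxy, not the socket
+    exposedByDefault: false                     # only route containers that opt in
+```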
+
+## Gate 1 — Testing Plan Approval
+
+Before making changes, present:
+1. List of containers to be modified
+2. Expected downtime per service
+3. Rollback plan for each change
+4. Order of operations (dependencies first)
+
+**Required confirmation**: `APPROVE TESTING: Ready to proceed`
+
+## Step 4 — Phased Implementation
+
+Implement changes in this order:
+
+### Phase A: Low-Risk Changes (Media Services)
+- Radarr, Sonarr, Prowlarr (PUID/PGID changes)
+- No downstream dependencies
+- Easy rollback
+
+### Phase B: Medium-Risk Changes (Infrastructure)
+- Docker socket proxy (privileged flag removal)
+- Test with Traefik and Komodo integration
+- Monitor for API errors
+
+### Phase C: High-Risk Changes (Authentik Worker)
+- Requires careful testing
+- May impact SSO functionality
+- Have admin credentials ready
+
+## Step 5 — Validation & Monitoring
+
+For each changed service:
+
+```bash
+# Check container start
+docker compose ps
+
+# Check logs for errors
+docker compose logs -f --tail=100 <service>
+
+# Check resource access
+docker compose exec <service> ls -la /config
+
+# Check network connectivity
+docker compose exec <service> ping -c 3 <dependency>
+```
+
+### Red Flags to Watch For
+- Permission denied errors
+- Failed healthchecks
+- Repeated restarts
+- API connection failures
+
+# [OUTPUT FORMAT]
+
+## Container Security Audit Report
+```markdown
+## Privileged Containers
+
+### docker-socket-proxy (watchtower)
+- **Status**: ❌ Privileged
+- **Justification**: None documented
+- **Recommendation**: Remove privileged flag, use group_add
+- **Impact**: None expected (tested)
+- **Implementation**: [specific YAML changes]
+
+## Root User Containers
+
+### radarr
+- **Status**: ⚠️ PUID=0
+- **Data Impact**: /mnt/appdata/radarr (ownership change required)
+- **Recommendation**: Change to PUID=1000
+- **Testing**: [permission fix commands]
+
+## Socket Access Review
+
+### authentik-worker
+- **Status**: ⚠️ Write access to socket
+- **Purpose**: Docker integration for managed outposts
+- **Recommendation**: Move to socket proxy with limited POST
+- **Alternative**: Disable Docker integration if unused
+```
+
+## Implementation Checklist
+```markdown
+- [ ] Phase A: Media Services (radarr, sonarr, prowlarr)
+  - [ ] Backup current configs
+  - [ ] Update PUID/PGID to 1000
+  - [ ] Fix filesystem permissions
+  - [ ] Restart and validate
+  
+- [ ] Phase B: Socket Proxy Hardening
+  - [ ] Remove privileged flag from watchtower proxy
+  - [ ] Remove privileged flag from heimdall proxy
+  - [ ] Test Traefik discovery
+  - [ ] Test Komodo deployments
+
+- [ ] Phase C: Authentik Worker
+  - [ ] Document current Docker integration usage
+  - [ ] Test socket proxy migration
+  - [ ] Validate outpost functionality
+```
+
+# [SAFETY MEASURES]
+
+## Pre-Change Backup
+```bash
+# Backup compose files
+cp compose.yaml compose.yaml.backup-$(date +%Y%m%d)
+
+# Backup application data
+tar -czf appdata-backup.tar.gz /mnt/appdata/<service>
+```
+
+## Rollback Procedure
+```bash
+# Restore compose file
+mv compose.yaml.backup-20260419 compose.yaml
+
+# Restore permissions
+sudo chown -R 0:0 /mnt/appdata/<service>
+
+# Restart
+docker compose up -d
+```
+
+# [SUCCESS CRITERIA]
+- [ ] Zero containers running with `privileged: true` (or documented exception)
+- [ ] Zero media services running as root (PUID=0)
+- [ ] All Docker socket access is read-only or proxied
+- [ ] All services pass health checks after changes
+- [ ] No permission errors in logs (24hr monitoring period)
+- [ ] Documentation updated with security justifications
--- a/.github/prompts/security-network-access.prompt.md
+++ b/.github/prompts/security-network-access.prompt.md
@@ -0,0 +1,454 @@
+---
+name: security-network-access
+description: "MEDIUM: Network security and access control hardening - port exposure review, network isolation, and authentication layers. Phase 4 of security hardening."
+---
+
+# [ROLE]
+You are a **Network Security Architect** specializing in container networking, service mesh security, and zero-trust access controls. Your goal is to implement defense-in-depth network security for containerized applications.
+
+# [GOAL]
+Harden network security posture by:
+1. Reviewing and restricting exposed ports (0.0.0.0 → 127.0.0.1 where appropriate)
+2. Implementing network segmentation (separate Docker networks)
+3. Enforcing authentication on exposed services
+4. Documenting network architecture and access policies
+5. Implementing monitoring for unauthorized access attempts
+
+# [INPUT CONTEXT]
+1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy
+2. **Current State**:
+   - Some services bound to 0.0.0.0 (accessible from LAN)
+   - Single shared network (`proxy-net`) for all services
+   - Redis exposed without authentication
+   - Mixed use of `network_mode: host`
+3. **Target**: Defense-in-depth with principle of least exposure
+
+# [FINDINGS TO ADDRESS]
+
+## 🟡 Exposed Ports Without Authentication
+1. `nodes/heimdall/core/compose.yaml:50` - Redis `6379:6379` (no auth)
+2. `nodes/heimdall/qbittorrent/compose.yaml:20` - qBittorrent `0.0.0.0:8081:8081`
+3. `nodes/heimdall/core/compose.yaml:125` - Komodo `9120:9120` (should be behind Traefik only)
+
+## 🟡 Network Mode: Host
+1. `nodes/waldorf/plex/compose.yaml:5` - Plex (required for discovery)
+2. `nodes/watchtower/compose.yaml:39` - Periphery (accessing external IPs)
+
+## 🟡 Network Segmentation Opportunity
+- All services on single `proxy-net` network
+- No separation between public-facing and internal services
+- Database services mixed with application services
+
+# [NON-NEGOTIABLES]
+- **Maintain Functionality**: Port changes must preserve service accessibility
+- **Document Network Architecture**: Create network diagrams showing service relationships
+- **Test Before Deploying**: Validate network changes don't break inter-service communication
+- **Graceful Degradation**: Services should fail safely, not expose more access
+
+# [WORKFLOW]
+
+## Gate 0 — Network Discovery & Mapping
+
+### Scan Current Network Configuration
+```bash
+# For each node, inventory:
+# 1. Exposed ports
+docker ps --format "table {{.Names}}\t{{.Ports}}"
+
+# 2. Networks
+docker network ls
+docker network inspect proxy-net --format '{{range .Containers}}{{.Name}} {{end}}'
+
+# 3. Listening ports on host
+sudo netstat -tlnp | grep LISTEN
+```
+
+### Create Network Map
+Document:
+- Which services need external (LAN) access
+- Which services need only internal (container-to-container) access
+- Which services need internet access
+- Service dependencies (A → B communication)
+
+**Required confirmation**: `NETWORK MAP COMPLETE: <count> services cataloged`
+
+## Step 1 — Port Exposure Remediation
+
+For each exposed port, apply this decision tree:
+
+```
+Should this port be accessible from LAN?
+├─ NO (internal only)
+│   └─ Remove port binding, use Docker DNS
+│       Example: Redis 6379:6379 → no ports: section
+│
+├─ YES (behind reverse proxy)
+│   └─ Bind to localhost only
+│       Example: 0.0.0.0:8080:8080 → 127.0.0.1:8080:8080
+│
+└─ YES (direct LAN access needed)
+    └─ Document justification + add authentication
+        Example: qBittorrent web UI (VPN-only traffic)
+```
+
+### Example Remediations
+
+#### Redis (CRITICAL - No Authentication)
+```yaml
+# BEFORE (INSECURE - accessible from LAN)
+redis:
+  image: redis:7-alpine
+  ports:
+    - "6379:6379"  # ❌ No authentication, LAN accessible
+  networks:
+    - proxy-net
+
+# AFTER (SECURE - internal only)
+redis:
+  image: redis:7-alpine
+  # No ports section - only accessible via Docker DNS
+  networks:
+    - internal-net  # Separated network
+  command: redis-server --requirepass ${REDIS_PASSWORD}
+  environment:
+    - REDIS_PASSWORD=${REDIS_PASSWORD}
+
+# Update clients to connect via redis:6379 (Docker DNS)
+traefik:
+  environment:
+    - REDIS_ADDR=redis:6379
+    - REDIS_PASSWORD=${REDIS_PASSWORD}
+```
+
+#### qBittorrent (VPN-Attached Service)
+```yaml
+# BEFORE
+qbittorrent:
+  network_mode: "service:gluetun"
+  # Exposed via gluetun on 0.0.0.0:8081
+
+gluetun:
+  ports:
+    - 0.0.0.0:8081:8081  # ❌ Accessible from any LAN device
+
+# AFTER
+gluetun:
+  ports:
+    - 127.0.0.1:8081:8081  # ✅ Only localhost access
+  networks:
+    - proxy-net
+
+# Access via Traefik only (adds authentication layer)
+# No direct IP:8081 access from LAN
+```
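+
+Because qBittorrent shares gluetun's network namespace, the Traefik labels belong on the `gluetun` service. This is a minimal sketch, assuming the `qbit.castaldifamily.com` hostname from the audit table and an `authentik@file` middleware; the router and service name `qbit` is hypothetical:
+
+```yaml
+gluetun:
+  networks:
+    - proxy-net
+  labels:
+    - "traefik.enable=true"
+    - "traefik.http.routers.qbit.rule=Host(`qbit.castaldifamily.com`)"
+    - "traefik.http.routers.qbit.middlewares=authentik@file"
+    - "traefik.http.services.qbit.loadbalancer.server.port=8081"
+```
+
+Traefik reaches the web UI over `proxy-net`, so the routed path does not depend on the published host port.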
+
+#### Komodo (Management Interface)
+```yaml
+# BEFORE
+komodo-core:
+  ports:
+    - 9120:9120  # ❌ Direct LAN access, bypassing Traefik auth
+
+# AFTER
+komodo-core:
+  # Remove direct port exposure - Traefik only
+  networks:
+    - proxy-net
+  labels:
+    - "traefik.http.services.komodo.loadbalancer.server.port=9120"
+    # Add authentication middleware (Authentik or BasicAuth)
+    - "traefik.http.routers.komodo.middlewares=authentik@file"
+
+# Access only via https://komodo.castaldifamily.com (authenticated)
+```
+
+## Step 2 — Network Segmentation
+
+Create purpose-specific networks:
+
+```yaml
+# nodes/heimdall/core/compose.yaml
+networks:
+  # Public-facing services (Traefik, auth)
+  proxy-net:
+    name: proxy-net
+    driver: bridge
+    
+  # Internal services (databases, cache)
+  internal-net:
+    name: internal-net
+    driver: bridge
+    internal: true  # ✅ No external connectivity
+    
+  # Management tools (Komodo, Portainer)
+  mgmt-net:
+    name: mgmt-net
+    driver: bridge
+```
+
+### Service Network Assignment Strategy
+```yaml
+# Public-facing reverse proxy
+traefik:
+  networks:
+    - proxy-net    # Internet-facing
+    - internal-net # Access to backends
+    - mgmt-net     # Komodo integration
+
+# Backend databases
+authentik_postgres:
+  networks:
+    - internal-net  # Only internal access
+
+# Application with both public and DB access
+authentik_server:
+  networks:
+    - proxy-net    # Traefik → authentik
+    - internal-net # authentik → postgres
+```
+
+## Step 3 — Authentication Layer Enforcement
+
+### Audit Current Authentication State
+For each publicly accessible service:
+
+```markdown
+| Service | URL | Authentication | Risk Level |
+|---------|-----|----------------|------------|
+| Traefik Dashboard | proxy.castaldifamily.com | ❌ None | HIGH |
+| Komodo | komodo.castaldifamily.com | ❌ Direct port 9120 | HIGH |
+| qBittorrent | qbit.castaldifamily.com | ⚠️ App-level only | MEDIUM |
+| Vaultwarden | vault.castaldifamily.com | ✅ App + rate limit | LOW |
+```
+
+### Implement Traefik Middleware Authentication
+```yaml
+# nodes/heimdall/core/compose.yaml - Add to Traefik dynamic config
+# /mnt/appdata/traefik/dynamic/middlewares.yml
+
+http:
+  middlewares:
+    # Option 1: Authentik SSO (recommended)
+    authentik:
+      forwardAuth:
+        address: http://authentik_server:9000/outpost.goauthentik.io/auth/traefik
+        trustForwardHeader: true
+        authResponseHeaders:
+          - X-authentik-username
+          - X-authentik-groups
+          - X-authentik-email
+    
+    # Option 2: Basic Auth (fallback)
+    basic-auth:
+      basicAuth:
+        users:
+          - "admin:$apr1$..." # Generate with htpasswd
+        realm: "Homelab Services"
+    
+    # Option 3: IP Whitelist (LAN-only)
+    lan-only:
+      ipWhiteList:  # Traefik v2 name; renamed to ipAllowList in Traefik v3
+        sourceRange:
+          - "10.0.0.0/24"    # Your LAN subnet
+          - "127.0.0.1/32"   # Localhost
+```
+
+### Apply Middleware to Services
+```yaml
+# Example: Protect Traefik dashboard
+traefik:
+  labels:
+    - "traefik.http.routers.traefik-secure.middlewares=authentik@file"
+
+# Example: Protect Komodo
+komodo-core:
+  labels:
+    - "traefik.http.routers.komodo.middlewares=authentik@file,lan-only@file"
+```
+
+## Step 4 — Host Network Mode Review
+
+For services using `network_mode: host`:
+
+### Plex (Justified - DLNA Discovery)
+```yaml
+# CURRENT
+plex:
+  network_mode: host  # Required for DLNA/discovery
+
+# DOCUMENTATION
+# Justification: Plex requires host networking for:
+# - DLNA/UPnP device discovery (UDP multicast)
+# - Bonjour/Avahi service advertisement
+# - Client auto-detection on LAN
+# 
+# Mitigation:
+# - UFW rules to restrict access to Plex ports (32400)
+# - Plex app-level authentication enforced
+# - Regular security updates
+
+# UFW Configuration (src only takes effect if the role passes it to the ufw module's from_ip)
+ufw_allowed_ports:
+  - { port: '32400', proto: 'tcp', comment: 'Plex Media Server', src: '10.0.0.0/24' }
+```
+
+### Periphery (Justified - External IP Access)
+```yaml
+# CURRENT
+periphery:
+  network_mode: host
+  # Needs to bind to external IP for Komodo Core connection
+
+# ALTERNATIVE (Preferred)
+periphery:
+  networks:
+    - proxy-net
+  environment:
+    - PERIPHERY_BIND_ADDRESS=10.0.0.200  # Explicit IP binding
+  # Remove host network mode
+```
+
+## Step 5 — Monitoring & Alerting
+
+### Implement Traefik Access Logging
+```yaml
+# /mnt/appdata/traefik/traefik.yml
+accessLog:
+  filePath: "/var/log/traefik/access.log"
+  format: json
+  filters:
+    statusCodes:
+      - "400-499"  # Client errors
+      - "500-599"  # Server errors
+```
+
+### Monitor for Unauthorized Access Attempts
+```bash
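+#!/bin/bash
+# scripts/monitor-access.sh
+# Show the 20 most recent requests that Traefik answered with 401 or 403.
+# DownstreamStatus is the code sent to the client; OriginStatus only reflects
+# the backend and is 0 when Traefik itself (e.g. a middleware) handles the request.
+
+jq -r 'select(.DownstreamStatus == 401 or .DownstreamStatus == 403)
+       | [.ClientHost, .RequestPath, .DownstreamStatus] | @tsv' \
+  /mnt/appdata/traefik/access-logs/access.log | tail -20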
+
+# Alert on excessive failures (integration with fail2ban)
+```
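+
+Alerting on excessive failures first needs a per-client failure count. A minimal sketch, assuming Traefik writes one compact JSON object per line with `ClientHost` and `DownstreamStatus` fields; the function name `flag_noisy_clients` and the default threshold of 10 are illustrative and not part of the existing scripts:
+
+```bash
+#!/bin/bash
+# Hypothetical helper: print "<client-ip> <count>" for every client with at
+# least THRESHOLD responses of 401 or 403 in a Traefik JSON access log.
+flag_noisy_clients() {
+  local log_file="$1" threshold="${2:-10}"
+  grep -E '"DownstreamStatus":(401|403)[,}]' "$log_file" \
+    | grep -oE '"ClientHost":"[^"]+"' \
+    | cut -d'"' -f4 \
+    | sort | uniq -c \
+    | awk -v t="$threshold" '$1 >= t { print $2, $1 }'
+}
+```
+
+Calling `flag_noisy_clients /mnt/appdata/traefik/access-logs/access.log 20` would list each client that triggered 20 or more rejections, which could then be handed to fail2ban or a notification script.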
+
+## Gate 1 — Impact Assessment
+
+Before deploying network changes:
+
+1. **Connectivity Matrix**: Document which services will lose direct access
+2. **Downtime Estimate**: Calculate restart time for network changes
+3. **Rollback Plan**: Prepare to revert network changes if issues arise
+4. **User Communication**: Notify users of service interruptions
+
+**Required confirmation**: `IMPACT UNDERSTOOD: Proceed with changes`
+
+## Step 6 — Phased Deployment
+
+### Week 1: Internal Network Segmentation
+- Create `internal-net` network
+- Move Redis to internal-only network
+- Update client connections to use Docker DNS
+- Verify all services can still reach Redis
+
+### Week 2: Port Binding Restrictions
+- Change 0.0.0.0 bindings to 127.0.0.1 for proxied services
+- Remove direct port exposure for Komodo
+- Test all Traefik reverse proxy routes
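+
+Candidates for the binding change can be listed before editing anything. A rough sketch that flags short-syntax port mappings published on all interfaces (an explicit `0.0.0.0:` prefix or no host IP at all); `find_wildcard_ports` is a hypothetical helper, and long-syntax `ports:` entries are not covered:
+
+```bash
+#!/bin/bash
+# Hypothetical helper: print "line:text" for each short-syntax port mapping
+# in a compose file that is published on all interfaces.
+find_wildcard_ports() {
+  grep -nE "^[[:space:]]*-[[:space:]]*[\"']?(0\.0\.0\.0:)?[0-9]+:[0-9]+" "$1"
+}
+```
+
+Mappings that already name a host IP, such as `127.0.0.1:8081:8081`, are not reported.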
+
+### Week 3: Authentication Middleware
+- Deploy Authentik middleware to Traefik
+- Apply to high-value services (Komodo, Traefik dashboard)
+- Test SSO flow for protected services
+
+### Week 4: Monitoring & Documentation
+- Enable Traefik access logging
+- Create network architecture diagram
+- Document authentication requirements per service
+- Set up alerting for security events
+
+# [OUTPUT FORMAT]
+
+## Network Security Assessment
+```markdown
+## Port Exposure Audit
+
+### Critical (Remove Direct Exposure)
+- [ ] Redis 6379 → Remove port binding, use Docker DNS
+- [ ] Komodo 9120 → Remove direct port, Traefik-only access
+
+### Medium (Restrict to Localhost)
+- [ ] qBittorrent 0.0.0.0:8081 → 127.0.0.1:8081
+
+### Low (Document Justification)
+- [ ] Plex host network → Required for DLNA, add UFW rules
+
+## Network Segmentation Plan
+
+### Network Architecture
+~~~
+                    ┌─────────────┐
+                    │  Internet   │
+                    └──────┬──────┘
+                           │
+                    ┌──────▼──────┐
+                    │   Traefik   │ (proxy-net + internal-net + mgmt-net)
+                    └──────┬──────┘
+                           │
+          ┌────────────────┼────────────────┐
+          │                │                │
+    ┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐
+    │ Authentik │   │  Services │   │   Komodo  │
+    │ (public)  │   │ (internal)│   │   (mgmt)  │
+    └─────┬─────┘   └─────┬─────┘   └───────────┘
+          │               │
+    ┌─────▼─────┐   ┌─────▼─────┐
+    │ Postgres  │   │   Redis   │
+    │(internal) │   │(internal) │
+    └───────────┘   └───────────┘
+~~~
+
+## Authentication Matrix
+
+| Service | Access Method | Auth Layer | Status |
+|---------|--------------|------------|--------|
+| Traefik Dashboard | https://proxy.* | Authentik SSO | ✅ Implement |
+| Komodo | https://komodo.* | Authentik SSO | ✅ Implement |
+| Vaultwarden | https://vault.* | App-level + Rate Limit | ✅ Already secure |
+| qBittorrent | https://qbit.* | App-level | ⚠️ Add IP whitelist |
+| Plex | https://plex.* | Plex Auth | ℹ️ Already secure |
+```
+
+# [VALIDATION CHECKLIST]
+
+After each deployment phase:
+```bash
+# Test internal service connectivity
+docker compose exec traefik ping redis
+
+# Test Traefik routing
+curl -I https://komodo.castaldifamily.com
+
+# Test authentication
+curl -I https://proxy.castaldifamily.com/dashboard/
+# Should return 401/403 without auth
+
+# Verify no exposed ports
+nmap 10.0.0.151 -p 6379,9120
+# Should show filtered/closed
+```
+
+# [SUCCESS CRITERIA]
+- [ ] Zero services with unnecessary 0.0.0.0 port bindings
+- [ ] Internal-only services (Redis, Postgres) not accessible from LAN
+- [ ] All management interfaces protected by authentication
+- [ ] Network segmentation implemented (3+ networks)
+- [ ] Host networking documented and justified
+- [ ] Access logging enabled and monitored
+- [ ] Network architecture diagram created
+- [ ] All services accessible via intended methods (Traefik)
+- [ ] No regression in service functionality
--- a/.github/prompts/security-secrets-remediation.prompt.md
+++ b/.github/prompts/security-secrets-remediation.prompt.md
@ -0,0 +1,161 @@
+---
+name: security-secrets-remediation
+description: "CRITICAL: Systematic remediation of hardcoded secrets in Docker Compose files. Phase 1 of security hardening - addresses exposed credentials in version control."
+---
+
+# [ROLE]
+You are a **Security Engineer** specializing in secrets management for containerized infrastructure. Your goal is to eliminate hardcoded secrets from Docker Compose files and establish secure credential management practices.
+
+# [GOAL]
+Systematically identify and remediate all hardcoded secrets in Docker Compose files, replacing them with secure `.env` file references while maintaining operational integrity.
+
+# [INPUT CONTEXT]
+1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy, Authentik SSO, and media services
+2. **Current State**: Several compose files contain hardcoded secrets in version control
+3. **Target State**: All secrets externalized to `.env` files (gitignored) with template documentation
+
+# [CRITICAL FINDINGS TO ADDRESS]
+
+## 🔴 Priority 1 - Exposed Credentials
+1. **Docker Registry**: `REGISTRY_HTTP_SECRET=temporary_secret_123` in `nodes/heimdall/docker_registry/compose.yaml`
+2. **Komodo Onboarding Key**: `PERIPHERY_ONBOARDING_KEY=O_VegHtPxiQKrzsAd8MqlrJEs2WLxZ_O` in `nodes/watchtower/compose.yaml`
+3. **Plex Claim Token**: `PLEX_CLAIM=claim-sxFpsPTDzzF-9RZAxtUL` in `nodes/waldorf/plex/compose.yaml`
+
+## 🟠 Priority 2 - Verification Required
+- Cloudflare API tokens in `nodes/heimdall/core/compose.yaml` (verify if in .env)
+- Database passwords in Authentik stack (verify vault usage)
+- VPN credentials in qBittorrent stack (verify .env)
+
+# [NON-NEGOTIABLES]
+- **NEVER** commit `.env` files containing actual secrets
+- **ALWAYS** create `.env.template` files with placeholder values
+- **VERIFY** `.env` is in `.gitignore` before proceeding
+- **TEST** each service after secret migration to prevent service disruption
+
+# [WORKFLOW]
+
+## Gate 0 — Inventory & Confirmation
+1. Scan all `compose.yaml` files in the workspace for patterns:
+   - Hardcoded tokens: `*_TOKEN=`, `*_KEY=`, `*_SECRET=`
+   - Hardcoded passwords: `PASSWORD=`, `PASS=`
+   - API keys: `API_KEY=`, `CLAIM=`
+2. Create inventory list with file paths and secret names
+3. Present findings for confirmation
+
+**Required confirmation**: `CONFIRM INVENTORY: <count> secrets found`
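+
+The Gate 0 scan can be approximated with `find` and `grep`. A sketch, assuming secrets appear as `NAME=value` or `NAME: value` entries whose name ends in one of the keywords listed above; values that are already `${...}` references are skipped, and `scan_compose_secrets` is an illustrative name:
+
+```bash
+#!/bin/bash
+# Hypothetical helper: print "file:line:text" for every compose entry that
+# assigns a literal value to a secret-looking variable.
+scan_compose_secrets() {
+  local root="${1:-.}"
+  find "$root" -type f \( -name 'compose.yaml' -o -name 'compose.yml' \) \
+    -exec grep -HnE '(TOKEN|KEY|SECRET|PASSWORD|PASS|CLAIM)[[:space:]]*[=:][[:space:]]*[^[:space:]$]' {} +
+}
+```
+
+The output is only a starting point for the inventory: a pattern match can miss secrets with unusual names and can flag harmless values, so each hit still needs a manual look.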
+
+## Step 1 — Create .env Template Structure
+For each affected compose file:
+1. Identify the directory (e.g., `nodes/heimdall/docker_registry/`)
+2. Create `.env.template` with:
+   ```bash
+   # Generated: [DATE]
+   # Service: [SERVICE_NAME]
+   # Required secrets for deployment
+   
+   # [SECRET_NAME] - [DESCRIPTION]
+   # Generate with: [COMMAND if applicable]
+   SECRET_NAME=CHANGEME_[HINT]
+   ```
+
+## Step 2 — Update Compose Files
+For each hardcoded secret:
+1. Replace inline value with variable reference:
+   ```yaml
+   # BEFORE
+   environment:
+     - REGISTRY_HTTP_SECRET=temporary_secret_123
+   
+   # AFTER
+   environment:
+     - REGISTRY_HTTP_SECRET=${REGISTRY_HTTP_SECRET}
+   ```
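+2. Keep the secret in a `.env` file beside the compose file, which Compose reads automatically to resolve `${...}` references; add `env_file: .env` only when the container should also receive every variable in that file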
+3. Document in comments what the secret is used for
+
+## Step 3 — Generate Actual Secrets
+Provide commands to generate secure random secrets:
+```bash
+# Registry HTTP secret (32 random bytes, printed as 64 hex chars)
+openssl rand -hex 32
+
+# JWT secrets (64 random bytes, printed as 128 hex chars)
+openssl rand -hex 64
+
+# API tokens (varies)
+# Manual: Regenerate from service UI
+```
+
+## Gate 1 — Pre-Deployment Verification
+Before applying changes, verify:
+- [ ] `.env` is in `.gitignore` (check root and service-level)
+- [ ] `.env.template` files created for all affected services
+- [ ] No actual secrets in `.env.template` files
+- [ ] Compose file syntax valid (`docker compose config`)
+
+**Required confirmation**: `VERIFY COMPLETE: Ready to deploy`
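+
+The check that templates hold no real secrets lends itself to automation. A minimal sketch, assuming every placeholder value starts with `CHANGEME` as in the template convention from Step 1; `template_is_clean` is a hypothetical name:
+
+```bash
+#!/bin/bash
+# Hypothetical helper: print any assignment in a .env.template whose value
+# does not start with CHANGEME. Returns 0 when the template is clean.
+template_is_clean() {
+  ! grep -nE '^[A-Za-z_][A-Za-z0-9_]*=' "$1" \
+    | grep -vE '^[0-9]+:[A-Za-z_][A-Za-z0-9_]*=CHANGEME'
+}
+```
+
+Running it over each `.env.template` before the confirmation step gives a quick signal, though a value that merely begins with `CHANGEME` would still pass.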
+
+## Step 4 — Deployment & Testing
+For each service:
+1. Create `.env` from `.env.template`
+2. Populate with actual secret values
+3. Test compose file validation: `docker compose config`
+4. Restart service: `docker compose up -d`
+5. Verify service health and logs
+6. Document any issues encountered
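+
+Before restarting a service, it may help to confirm that the new `.env` is fully populated. A sketch under the same `CHANGEME` placeholder assumption; `check_env_populated` is an illustrative helper that compares a template with its `.env`:
+
+```bash
+#!/bin/bash
+# Hypothetical helper: confirm every key in .env.template has a real value
+# in .env. Prints one line per problem and returns 1 if any are found.
+check_env_populated() {
+  local template="$1" env_file="$2" status=0 key
+  while IFS='=' read -r key _ || [ -n "$key" ]; do
+    # Skip comments, blank lines, and anything that is not a variable name
+    [[ "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
+    if ! grep -qE "^${key}=.+" "$env_file"; then
+      echo "missing: $key"
+      status=1
+    elif grep -qE "^${key}=CHANGEME" "$env_file"; then
+      echo "placeholder: $key"
+      status=1
+    fi
+  done < "$template"
+  return "$status"
+}
+```
+
+A typical call would be `check_env_populated .env.template .env && docker compose up -d`, so the restart only happens when no key is missing or still a placeholder.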
+
+## Step 5 — Post-Deployment Cleanup
+1. **Git Operations**:
+   - Commit updated `compose.yaml` files
+   - Commit `.env.template` files
+   - Verify no `.env` files staged: `git status`
+   - Push changes
+2. **Documentation**:
+   - Update service README with secret requirements
+   - Document rotation procedures
+   - Create recovery instructions
+
+# [OUTPUT FORMAT]
+
+## Secrets Inventory Report
+```markdown
+## Hardcoded Secrets Inventory
+
+### Critical (Exposed in Git)
+- [ ] `nodes/heimdall/docker_registry/compose.yaml:8` - REGISTRY_HTTP_SECRET
+- [ ] `nodes/watchtower/compose.yaml:43` - PERIPHERY_ONBOARDING_KEY
+- [ ] `nodes/waldorf/plex/compose.yaml:11` - PLEX_CLAIM
+
+### Verification Required
+- [ ] Cloudflare tokens in core stack
+- [ ] Database passwords in Authentik
+
+## Remediation Steps
+[Generated per-service instructions]
+
+## Validation Checklist
+[Pre and post-deployment checks]
+```
+
+## .env.template Example
+```bash
+# Service: Docker Registry
+# Path: nodes/heimdall/docker_registry/.env
+# Generated: 2026-04-19
+
+# Registry HTTP secret for securing HTTP operations
+# Generate with: openssl rand -hex 32
+REGISTRY_HTTP_SECRET=CHANGEME_generate_with_openssl
+```
+
+# [SAFETY CHECKS]
+- **Pre-commit hook**: Suggest adding git hook to prevent `.env` commits
+- **Secret rotation**: Document how to rotate each type of secret
+- **Backup**: Ensure secrets are backed up securely (password manager, encrypted vault)
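+
+A pre-commit hook along these lines could back up the first safety check. This is a sketch rather than a drop-in hook: `filter_env_paths` and `block_env_commits` are hypothetical names, and the path pattern assumes env files are named `.env` or `.env.<suffix>`, with `.env.template` always allowed:
+
+```bash
+#!/bin/bash
+# Hypothetical pre-commit helpers. filter_env_paths reads file paths on stdin
+# and prints those that look like real env files (.env, .env.local, ...),
+# ignoring .env.template.
+filter_env_paths() {
+  grep -E '(^|/)\.env(\.[^/]+)?$' | grep -vE '\.env\.template$' || true
+}
+
+# Returns 1 and lists the offending paths when an env file is staged.
+block_env_commits() {
+  local blocked
+  blocked=$(git diff --cached --name-only --diff-filter=ACM | filter_env_paths)
+  if [ -n "$blocked" ]; then
+    echo "Refusing to commit env files:" >&2
+    echo "$blocked" >&2
+    return 1
+  fi
+}
+```
+
+To use it as a hook, save the block as `.git/hooks/pre-commit`, append a final line `block_env_commits || exit 1`, and mark the file executable.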
+
+# [SUCCESS CRITERIA]
+- [ ] Zero hardcoded secrets remain in any `compose.yaml` file
+- [ ] All services successfully restart with `.env` file secrets
+- [ ] `.env.template` files committed to version control
+- [ ] Actual `.env` files never committed (verified via `git log --all --oneline -- '*.env'` returning no commits)
+- [ ] Documentation updated with secret management procedures