Created Files

security-secrets-remediation.prompt.md - Phase 1 (CRITICAL)
- Eliminates hardcoded secrets (Docker Registry, Komodo, Plex)
- Creates .env templates and migration workflow
- Priority: Immediate (This Week)

security-container-hardening.prompt.md - Phase 2 (HIGH)
- Removes privileged containers
- Converts root users to non-root (PUID/PGID)
- Secures Docker socket access patterns
- Priority: Short Term (This Month)

security-ansible-hardening.prompt.md - Phase 3 (MEDIUM)
- Enables SSH host key checking
- Implements restricted sudo rules
- Deploys UFW firewalls and fail2ban
- Priority: Medium Term (Next Month)

security-network-access.prompt.md - Phase 4 (MEDIUM)
- Restricts port exposure (0.0.0.0 → 127.0.0.1)
- Implements network segmentation
- Adds authentication middleware
- Priority: Ongoing (Next Quarter)

Each prompt follows your existing format with:
✅ Gated workflows with confirmation checkpoints
✅ Rollback procedures for safety
✅ Testing and validation steps
✅ Incremental deployment strategies
✅ Clear success criteria
2026-04-19 18:25:46 -04:00 · 129b7eee1b
commit 129b7eee1b
parent 417501dbd1
4 changed files with 1334 additions and 0 deletions
--- a/.github/prompts/security-ansible-hardening.prompt.md
+++ b/.github/prompts/security-ansible-hardening.prompt.md
@@ -0,0 +1,406 @@
 ---
 name: security-ansible-hardening
 description: "MEDIUM: Ansible security hardening - SSH configuration, sudo security, and host-level security controls. Phase 3 of security hardening."
 ---
 # [ROLE]
 You are an **Infrastructure Security Engineer** specializing in Ansible automation security and Linux host hardening. Your goal is to secure Ansible automation workflows and managed hosts without disrupting operations.
 # [GOAL]
 Harden Ansible security posture by:
 1. Implementing secure SSH configuration (host key checking)
 2. Configuring least-privilege sudo access
 3. Enabling host-level firewalls (UFW)
 4. Securing Ansible Vault password files
 5. Implementing fail2ban for brute-force protection
 # [INPUT CONTEXT]
 1. **Environment**: Multi-node homelab managed via Ansible
 2. **Current State**: 
   - SSH host key checking disabled
   - Passwordless sudo without restrictions
   - No host firewalls (UFW disabled)
   - Vault password file permissions not verified
 3. **Managed Nodes**: Proxmox (root), Docker nodes (chester user), Raspberry Pi (chester user)
 # [FINDINGS TO ADDRESS]
 ## 🟠 Ansible Configuration Security
 1. `ansible/ansible.cfg:34` - `host_key_checking = False`
 2. `ansible/ansible.cfg:35` - `StrictHostKeyChecking=no`
 3. `ansible/ansible.cfg:30` - `become_ask_pass = False`
 4. `ansible/ansible.cfg:11` - Vault password file permissions not enforced
 ## 🟡 Host Security Controls
 1. `ansible/group_vars/all.yml:29` - UFW disabled
 2. `ansible/group_vars/all.yml:30` - fail2ban disabled
 3. No SSH key rotation policy
 4. No sudo command restrictions
 # [NON-NEGOTIABLES]
 - **Gradual Rollout**: Enable security controls one node at a time
 - **Maintain Access**: Never lock yourself out during SSH hardening
 - **Test Playbooks**: Validate all changes with `--check` mode first
 - **Document Exceptions**: Some settings (like Proxmox root access) may have valid reasons
 # [WORKFLOW]
 ## Gate 0 — Current State Assessment
 Run these validation commands:
 ```bash
 # Check vault password file permissions
 ls -la ansible/vault/.vault_pass
 # Check SSH key distribution
 ansible all -m shell -a "ls -la ~/.ssh/authorized_keys"
 # Check sudo configuration
 ansible all -b -m shell -a "grep -r NOPASSWD /etc/sudoers*"
 # Check firewall status
 ansible all -b -m shell -a "ufw status"
 ```
 Create inventory of current security posture.
 **Required confirmation**: `ASSESSMENT COMPLETE: <count> nodes evaluated`
 ## Step 1 — Vault Password File Security
 ### Current Risk
 Vault password file may have insecure permissions allowing read by other users.
 ### Remediation
 ```yaml
 # Add to ansible/playbooks/secure-vault-file.yml
 ---
 - name: Secure Ansible Vault password file
   hosts: localhost
   # Facts are required: ansible_user_id (used below) comes from fact gathering
   gather_facts: true
   tasks:
     - name: Check vault password file exists
       ansible.builtin.stat:
         path: "{{ playbook_dir }}/../vault/.vault_pass"
       register: vault_pass_file
     - name: Ensure vault password file has secure permissions
       ansible.builtin.file:
         path: "{{ playbook_dir }}/../vault/.vault_pass"
         mode: '0600'
         owner: "{{ ansible_user_id }}"
       when: vault_pass_file.stat.exists
     - name: Verify vault directory permissions
       ansible.builtin.file:
         path: "{{ playbook_dir }}/../vault"
         mode: '0700'
         state: directory
 ```
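As a quick check before and after running the playbook above, the permission bits can be inspected directly. The sketch below is a hypothetical helper (`is_owner_only` is not part of the repository) and assumes GNU `stat` on the Ansible controller; it succeeds only when group and others have no access to the file.

```shell
# Hypothetical helper: succeed when group and others have no permissions.
# Assumes GNU coreutils stat (-c '%a' prints the octal mode, e.g. 600).
is_owner_only() {
  mode=$(stat -c '%a' "$1") || return 2
  # Mask the group/other bits; the leading 0 makes the shell read the mode as octal
  [ $(( 0$mode & 077 )) -eq 0 ]
}
```

For example, `is_owner_only ansible/vault/.vault_pass || echo "vault password file is readable by others"` would flag a file left at `0644`.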
 ## Step 2 — SSH Host Key Management
 ### Phase 2a: Populate known_hosts
 Before enabling strict host key checking, populate known_hosts for all managed nodes.
 ```yaml
 # ansible/playbooks/populate-known-hosts.yml
 ---
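 - name: Populate SSH known_hosts for all managed nodes
   hosts: localhost
   gather_facts: false
   vars:
     ansible_connection: local
   tasks:
     # ssh-keyscan trusts whatever key the host presents; verify fingerprints out of band
     - name: Scan SSH host keys
       ansible.builtin.shell: |
         # Drop any existing entry first: hashed (-H) entries use random salts,
         # so a later sort -u cannot deduplicate them
         ssh-keygen -R {{ item }} >/dev/null 2>&1 || true
         ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts 2>/dev/null
       loop: "{{ groups['all'] | map('extract', hostvars, 'ansible_host') | list }}"
       changed_when: false
     - name: Remove exact duplicate lines and restrict permissions
       ansible.builtin.shell: |
         sort -u ~/.ssh/known_hosts > ~/.ssh/known_hosts.tmp
         mv ~/.ssh/known_hosts.tmp ~/.ssh/known_hosts
         chmod 600 ~/.ssh/known_hosts
       changed_when: false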
 ```
 ### Phase 2b: Enable Host Key Checking
 After known_hosts is populated, update ansible.cfg:
 ```ini
 # ansible/ansible.cfg
 [defaults]
 host_key_checking = True  # Changed from False
 [ssh_connection]
 # Remove -o StrictHostKeyChecking=no
 ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=~/.ssh/known_hosts
 ```
 ### Phase 2c: Verification
 ```bash
 # Test connection to all hosts
 ansible all -m ping
 # Should succeed without warnings
 ```
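Beyond the ping test, it can help to confirm that no active line in `ansible.cfg` still disables host key checking. The following is a minimal sketch; `insecure_ssh_settings` is a hypothetical helper name, and it assumes comment lines start with `#` or `;` as in standard INI files.

```shell
# Hypothetical helper: print active lines that disable host key checking.
# Returns 1 when any such line is found, 0 otherwise.
insecure_ssh_settings() {
  # $1 = path to ansible.cfg; comment lines (# or ;) are skipped first
  if grep -Ev '^[[:space:]]*[#;]' "$1" |
     grep -Ei 'host_key_checking[[:space:]]*=[[:space:]]*false|StrictHostKeyChecking=no'; then
    return 1
  fi
  return 0
}
```

Running `insecure_ssh_settings ansible/ansible.cfg` prints any offending lines, so an empty output with exit status 0 indicates Phase 2b was applied.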
 ## Step 3 — Sudo Security Configuration
 ### Current Risk
 `become_ask_pass = False` assumes all nodes have unrestricted NOPASSWD sudo.
 ### Recommended Approach
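 Create restricted sudoers files for automation. Note that Ansible runs most modules through a shell and a temporary Python file rather than invoking `apt` or `systemctl` directly, so a per-command allowlist can break `become` for ordinary module tasks; validate on the test node before removing unrestricted access: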
 ```yaml
 # ansible/playbooks/configure-sudo-security.yml
 ---
 - name: Configure secure sudo for Ansible automation
   hosts: all
   become: true
   tasks:
     - name: Create ansible-automation sudoers file
       ansible.builtin.copy:
         dest: /etc/sudoers.d/50-ansible-automation
         content: |
           # Ansible automation - restricted sudo commands
           # User: {{ ansible_user }}
           # Package management
           {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg
           # Service management
           {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/systemctl
           # Docker operations
           {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/docker
           # File operations in managed paths only
           {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/mkdir -p /mnt/appdata/*
           {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/chown -R * /mnt/appdata/*
           # UFW firewall
           {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/sbin/ufw
         mode: '0440'
         validate: 'visudo -cf %s'
     - name: Remove unrestricted sudo access
       ansible.builtin.lineinfile:
         path: /etc/sudoers.d/90-cloud-init-users
         regexp: '^{{ ansible_user }}\s+ALL=\(ALL\)\s+NOPASSWD:\s+ALL$'
         state: absent
       when: ansible_distribution == "Ubuntu"
 ```
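To see which rules still grant unrestricted passwordless sudo after the playbook runs, the sudoers files can be filtered directly. This is a rough sketch: `find_unrestricted_nopasswd` is a hypothetical helper, and it only recognises rules whose command list is exactly `ALL` at the end of the line.

```shell
# Hypothetical helper: list sudoers rules granting passwordless access to ALL commands.
find_unrestricted_nopasswd() {
  # Arguments are sudoers files; the [^#]* prefix excludes commented-out rules
  grep -E '^[^#]*NOPASSWD:[[:space:]]*ALL[[:space:]]*$' "$@"
}
```

Run as root on a node, `find_unrestricted_nopasswd /etc/sudoers /etc/sudoers.d/*` should print nothing for the automation user once the restricted file is in place.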
 ### Alternative: Keep Unrestricted but Add Logging
 If restricted sudo is too limiting:
 ```yaml
 # Enable sudo logging
 - name: Enable sudo command logging
   ansible.builtin.lineinfile:
     path: /etc/sudoers
     line: 'Defaults log_output'
     validate: 'visudo -cf %s'
 ```
 ## Step 4 — Host Firewall Configuration
 ### Phase 4a: Create UFW Role
 ```yaml
 # ansible/roles/ufw_baseline/tasks/main.yml
 ---
 - name: Install UFW
   ansible.builtin.apt:
     name: ufw
     state: present
     update_cache: yes
 - name: Set UFW default policies
   community.general.ufw:
     direction: "{{ item.direction }}"
     policy: "{{ item.policy }}"
   loop:
     - { direction: 'incoming', policy: 'deny' }
     - { direction: 'outgoing', policy: 'allow' }
     - { direction: 'routed', policy: 'allow' }
 - name: Allow SSH (prevent lockout)
   community.general.ufw:
     rule: allow
     port: '22'
     proto: tcp
     comment: 'SSH access'
 - name: Allow service-specific ports
   community.general.ufw:
     rule: allow
     port: "{{ item.port }}"
     proto: "{{ item.proto }}"
     # Optional per-rule source restriction (e.g. src: '10.0.0.0/24'); defaults to any
     from_ip: "{{ item.src | default('any') }}"
     comment: "{{ item.comment }}"
   loop: "{{ ufw_allowed_ports | default([]) }}"
 - name: Enable UFW
   community.general.ufw:
     state: enabled
   when: ufw_enable_firewall | default(false)
 ```
 ### Phase 4b: Define Per-Node Firewall Rules
 ```yaml
 # ansible/inventory/host_vars/heimdall.yml
 ufw_allowed_ports:
  - { port: '80', proto: 'tcp', comment: 'HTTP - Traefik' }
  - { port: '443', proto: 'tcp', comment: 'HTTPS - Traefik' }
  - { port: '9120', proto: 'tcp', comment: 'Komodo Core' }
  - { port: '2377', proto: 'tcp', comment: 'Docker Swarm (if used)' }
 ufw_enable_firewall: true
 ```
 ### Phase 4c: Gradual Rollout
 Test on one node first:
 ```bash
 # Test on watchtower (non-critical node); -b is needed because the role installs packages
 ansible watchtower -b -m include_role -a name=ufw_baseline --check
 # Apply if check succeeds
 ansible watchtower -b -m include_role -a name=ufw_baseline
 # Verify SSH still works
 ansible watchtower -m ping
 # Roll out to other nodes
 ansible docker_nodes -b -m include_role -a name=ufw_baseline
 ```
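Because a missing SSH rule is the main lockout risk during this rollout, the rule list can be checked before moving to the next node. The sketch below reads `ufw status` output on stdin; `has_ssh_allow_rule` is a hypothetical helper, and it assumes the usual column layout in which each rule line starts with the port (rules are only listed while UFW is active).

```shell
# Hypothetical helper: read `ufw status` output on stdin and succeed
# only when an ALLOW rule for the SSH port is present.
has_ssh_allow_rule() {
  # $1 = SSH port (defaults to 22)
  grep -Eq "^${1:-22}(/tcp)?[[:space:]]+ALLOW"
}
```

One possible use is `ansible watchtower -b -m command -a "ufw status" | has_ssh_allow_rule || echo "no SSH rule found"`.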
 ## Step 5 — Fail2ban Configuration
 ### Basic Fail2ban Role
 ```yaml
 # ansible/roles/fail2ban/tasks/main.yml
 ---
 - name: Install fail2ban
   ansible.builtin.apt:
     name: fail2ban
     state: present
 - name: Configure fail2ban for SSH
   ansible.builtin.copy:
     dest: /etc/fail2ban/jail.local
     content: |
       [DEFAULT]
       bantime = 1h
       findtime = 10m
       maxretry = 5
       [sshd]
       enabled = true
       port = ssh
       logpath = /var/log/auth.log
     mode: '0644'
   # Requires a "Restart fail2ban" handler in ansible/roles/fail2ban/handlers/main.yml
   notify: Restart fail2ban
 - name: Ensure fail2ban is running
   ansible.builtin.systemd:
     name: fail2ban
     state: started
     enabled: yes
 ```
 ## Gate 1 — Pre-Deployment Testing
 Run all playbooks in check mode:
 ```bash
 ansible-playbook ansible/playbooks/secure-vault-file.yml --check
 ansible-playbook ansible/playbooks/populate-known-hosts.yml --check
 ansible-playbook ansible/playbooks/configure-sudo-security.yml --check
 ansible all -b -m include_role -a name=ufw_baseline --check
 ansible all -b -m include_role -a name=fail2ban --check
 ```
 **Required confirmation**: `CHECKS PASSED: Ready for deployment`
 ## Step 6 — Phased Deployment
 Deploy in this order:
 1. **Local security** (vault file, known_hosts)
 2. **Test node** (watchtower) - full hardening
 3. **Docker nodes** (heimdall, waldorf) - after validating watchtower
 4. **Proxmox** (pve01) - last, as it's most critical
 # [OUTPUT FORMAT]
 ## Security Hardening Plan
 ```markdown
 ## Phase 1: Ansible Controller Security
 - [ ] Secure vault password file (chmod 600)
 - [ ] Populate SSH known_hosts
 - [ ] Enable host key checking in ansible.cfg
 - [ ] Test: `ansible all -m ping`
 ## Phase 2: Sudo Hardening
 - [ ] Create restricted sudoers on watchtower (test node)
 - [ ] Validate Ansible operations still work
 - [ ] Roll out to remaining nodes
 - [ ] Document sudo command allowlist
 ## Phase 3: Host Firewalls
 - [ ] Deploy UFW role to watchtower
 - [ ] Verify SSH access maintained
 - [ ] Verify Docker services accessible
 - [ ] Roll out to docker_nodes group
 - [ ] Configure Proxmox firewall separately (PVE-specific)
 ## Phase 4: Intrusion Detection
 - [ ] Deploy fail2ban to all nodes
 - [ ] Configure SSH jail
 - [ ] Test ban/unban procedures
 - [ ] Set up alerting (optional)
 ```
 ## Rollback Procedures
 ```markdown
 ### If locked out after UFW enable:
 1. Access via Proxmox console (for VMs/LXC)
 2. Run: `sudo ufw disable`
 3. Fix rule, re-enable
 ### If sudo restrictions break Ansible:
 1. SSH to node manually
 2. `sudo visudo -f /etc/sudoers.d/50-ansible-automation`
 3. Add required commands or remove file
 ```
 # [VALIDATION CHECKLIST]
 After each phase:
 ```bash
 # Connectivity test
 ansible all -m ping
 # Privilege escalation test
 ansible all -b -m shell -a "whoami"
 # Service verification
 ansible docker_nodes -b -m shell -a "docker ps"
 # Firewall status
 ansible all -b -m shell -a "ufw status numbered"
 ```
 # [SUCCESS CRITERIA]
 - [ ] SSH host key checking enabled without connection failures
 - [ ] Sudo access restricted and logged
 - [ ] UFW enabled on all Docker nodes with service-specific rules
 - [ ] Fail2ban active and monitoring SSH
 - [ ] Vault password file secured (600 permissions)
 - [ ] All Ansible playbooks execute successfully
 - [ ] No SSH lockouts occurred
 - [ ] Documentation updated with security procedures
--- a/.github/prompts/security-container-hardening.prompt.md
+++ b/.github/prompts/security-container-hardening.prompt.md
@@ -0,0 +1,313 @@
 ---
 name: security-container-hardening
 description: "HIGH: Container security hardening - eliminate privileged containers, reduce root user execution, and secure Docker socket access. Phase 2 of security hardening."
 ---
 # [ROLE]
 You are a **Container Security Specialist** with expertise in Docker security best practices, CIS Benchmarks, and least-privilege principles. Your goal is to harden container security posture without breaking functionality.
 # [GOAL]
 Systematically reduce attack surface by:
 1. Eliminating or justifying `privileged: true` containers
 2. Converting root-running containers to non-root users
 3. Securing Docker socket access patterns
 4. Implementing capability-based security where needed
 # [INPUT CONTEXT]
 1. **Environment**: Multi-node homelab with management tools (Komodo, Traefik), media services, and SSO
 2. **Current Issues**: 
   - Multiple containers running with `privileged: true`
   - Services running as PUID=0 (root)
   - Docker socket mounted in multiple containers
 3. **Constraint**: Must maintain functionality - some tools legitimately need elevated access
 # [CRITICAL FINDINGS TO ADDRESS]
 ## 🔴 Privileged Containers (Attack Surface: Critical)
 1. `nodes/watchtower/compose.yaml:11` - docker-socket-proxy (privileged: true)
 2. `nodes/heimdall/core/compose.yaml:12` - docker-socket-proxy (privileged: true)
 ## 🟠 Root User Execution (Attack Surface: High)
 1. `nodes/heimdall/radarr/compose.yaml:20-21` - PUID=0, PGID=0
 2. `nodes/heimdall/qbittorrent/compose.yaml:43-44` - PUID=0, PGID=0
 3. `nodes/heimdall/authentik/compose.yaml:114` - user: root (worker container)
 ## 🟡 Docker Socket Exposure (Attack Surface: Medium)
 1. `nodes/heimdall/authentik/compose.yaml:116` - /var/run/docker.sock (read-write)
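 2. `nodes/heimdall/core/compose.yaml:14` - /var/run/docker.sock:ro (read-only mount, acceptable behind the socket proxy; `:ro` alone does not block Docker API write calls)
 3. `nodes/watchtower/compose.yaml:19` - /var/run/docker.sock:ro (read-only mount, acceptable behind the socket proxy; `:ro` alone does not block Docker API write calls)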
 # [NON-NEGOTIABLES]
 - **Document Before Changing**: Every privileged container must have a documented justification or removal plan
 - **Test After Changing**: Every user change must be validated with service restart
 - **Capability-Based Security**: Use `cap_add` instead of `privileged: true` where possible
 - **Defense in Depth**: Even when privileged access is required, add additional security layers
 # [WORKFLOW]
 ## Gate 0 — Security Baseline Assessment
 1. Scan all compose files for security anti-patterns:
   - `privileged: true`
   - `user: root` or `user: "0"`
   - `PUID=0` or `PGID=0`
   - `/var/run/docker.sock` mounts
   - `network_mode: host`
   - `cap_add: SYS_ADMIN` or `NET_ADMIN`
 2. Classify each finding:
   - **REMOVABLE**: Can be fixed without breaking functionality
   - **JUSTIFIABLE**: Required for legitimate purpose (document why)
   - **INVESTIGATE**: Unclear if needed, requires testing
 **Required confirmation**: `BASELINE: <count> findings across <count> services`
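The scan in item 1 can be approximated with a recursive `grep`. This is a rough sketch rather than a full YAML-aware audit: `scan_compose_antipatterns` is a hypothetical helper, the file names it searches are assumptions, and it will also report matches inside comments.

```shell
# Hypothetical helper: print file:line:text for each anti-pattern listed above.
scan_compose_antipatterns() {
  # $1 = directory containing compose files (searched recursively)
  grep -rnE \
    --include='compose.yaml' --include='compose.yml' --include='docker-compose.yml' \
    -e 'privileged:[[:space:]]*true' \
    -e 'user:[[:space:]]*"?(root|0)"?' \
    -e 'P[UG]ID=0([^0-9]|$)' \
    -e '/var/run/docker\.sock' \
    -e 'network_mode:[[:space:]]*"?host' \
    -e '(SYS_ADMIN|NET_ADMIN)' \
    "$1"
}
```

For instance, `scan_compose_antipatterns nodes/ | wc -l` gives a first estimate of the finding count for the baseline confirmation; each hit still needs manual classification.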
 ## Step 1 — Privileged Container Analysis
 For each container with `privileged: true`:
 ### Investigation Checklist
 ```yaml
 Service: docker-socket-proxy
 Purpose: Secure proxy for Docker API access
 Privileged Justification:
  - Requires: Access to Docker socket with group permissions
  - Alternative: Run as docker group (GID 988) without privileged
  - Decision: TEST removal of privileged flag
 ```
 ### Remediation Pattern
 ```yaml
 # CURRENT (INSECURE)
 docker-socket-proxy:
  privileged: true
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
 # PROPOSED (SECURE)
 docker-socket-proxy:
  user: "65534:988"  # nobody:docker
  group_add:
    - "988"  # Docker group from host
  security_opt:
    - no-new-privileges:true
    - apparmor=docker-default
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
 ```
 ## Step 2 — Root User Conversion
 For each container running as root (PUID=0):
 ### Impact Analysis
 ```markdown
 Service: radarr
 Current User: PUID=0, PGID=0 (root)
 Volumes Affected:
  - /mnt/appdata/radarr/data:/config
  - /mnt/media/movies:/movies
 Ownership Requirements:
  - Config files: Read/Write
  - Media files: Read/Write
 Proposed User: PUID=1000, PGID=1000 (chester)
 ```
 ### Migration Steps
 1. **Check current ownership**:
   ```bash
   ls -la /mnt/appdata/radarr/data
   ```
 2. **Stop container**:
   ```bash
   docker compose stop radarr
   ```
 3. **Fix permissions** (if needed):
   ```bash
   sudo chown -R 1000:1000 /mnt/appdata/radarr/data
   ```
 4. **Update compose file**:
   ```yaml
   environment:
     - PUID=1000  # Changed from 0
     - PGID=1000  # Changed from 0
   ```
 5. **Restart and verify**:
   ```bash
   docker compose up -d radarr
   docker compose logs radarr | grep -i "permission\|error"
   ```
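After the ownership fix in step 3, any files that were missed can be listed before the container is restarted. The sketch below is a hypothetical helper (`list_foreign_owned` is not an existing script) and assumes a `find` that supports `-uid` and `-gid`, as GNU and BSD `find` do.

```shell
# Hypothetical helper: list paths not owned by the expected UID or GID.
list_foreign_owned() {
  # $1 = directory, $2 = expected UID, $3 = expected GID
  find "$1" \( ! -uid "$2" -o ! -gid "$3" \) -print
}
```

An empty result from `list_foreign_owned /mnt/appdata/radarr/data 1000 1000` suggests the new PUID/PGID will be able to read and write everything under the config volume.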
 ## Step 3 — Docker Socket Security Review
 For each socket mount, apply this decision tree:
 ```
 Does container need Docker API access?
 ├─ NO → Remove socket mount entirely
 └─ YES → Is it read-only?
    ├─ YES → Keep with :ro flag, add socket proxy if not present
    └─ NO → Requires write access?
        ├─ Management tool (Komodo, Portainer) → Use socket proxy with limited permissions
        └─ Other → INVESTIGATE: Why does it need write access?
 ```
 ### Socket Proxy Pattern (Best Practice)
 ```yaml
 # Never mount socket directly in application containers
 # Use tecnativa/docker-socket-proxy as intermediary
 docker-socket-proxy:
  image: tecnativa/docker-socket-proxy:latest
  environment:
    # Read permissions (safe for Traefik)
    - CONTAINERS=1
    - NETWORKS=1
    - SERVICES=1
    # Write permissions (limit to management tools only)
    - POST=0      # Disable by default
    - DELETE=0    # Disable by default
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
 traefik:
  environment:
    - DOCKER_HOST=tcp://docker-socket-proxy:2375  # No direct socket access
 ```
 ## Gate 1 — Testing Plan Approval
 Before making changes, present:
 1. List of containers to be modified
 2. Expected downtime per service
 3. Rollback plan for each change
 4. Order of operations (dependencies first)
 **Required confirmation**: `APPROVE TESTING: Ready to proceed`
 ## Step 4 — Phased Implementation
 Implement changes in this order:
 ### Phase A: Low-Risk Changes (Media Services)
 - Radarr, Sonarr, Prowlarr (PUID/PGID changes)
 - No downstream dependencies
 - Easy rollback
 ### Phase B: Medium-Risk Changes (Infrastructure)
 - Docker socket proxy (privileged flag removal)
 - Test with Traefik and Komodo integration
 - Monitor for API errors
 ### Phase C: High-Risk Changes (Authentik Worker)
 - Requires careful testing
 - May impact SSO functionality
 - Have admin credentials ready
 ## Step 5 — Validation & Monitoring
 For each changed service:
 ```bash
 # Check container start
 docker compose ps
 # Check logs for errors
 docker compose logs -f --tail=100 <service>
 # Check resource access
 docker compose exec <service> ls -la /config
 # Check network connectivity
 docker compose exec <service> ping -c 3 <dependency>
 ```
 ### Red Flags to Watch For
 - Permission denied errors
 - Failed healthchecks
 - Repeated restarts
 - API connection failures
 # [OUTPUT FORMAT]
 ## Container Security Audit Report
 ```markdown
 ## Privileged Containers
 ### docker-socket-proxy (watchtower)
 - **Status**: ❌ Privileged
 - **Justification**: None documented
 - **Recommendation**: Remove privileged flag, use group_add
 - **Impact**: None expected (tested)
 - **Implementation**: [specific YAML changes]
 ## Root User Containers
 ### radarr
 - **Status**: ⚠️ PUID=0
 - **Data Impact**: /mnt/appdata/radarr (ownership change required)
 - **Recommendation**: Change to PUID=1000
 - **Testing**: [permission fix commands]
 ## Socket Access Review
 ### authentik-worker
 - **Status**: ⚠️ Write access to socket
 - **Purpose**: Docker integration for managed outposts
 - **Recommendation**: Move to socket proxy with limited POST
 - **Alternative**: Disable Docker integration if unused
 ```
 ## Implementation Checklist
 ```markdown
 - [ ] Phase A: Media Services (radarr, sonarr, prowlarr)
  - [ ] Backup current configs
  - [ ] Update PUID/PGID to 1000
  - [ ] Fix filesystem permissions
  - [ ] Restart and validate
 - [ ] Phase B: Socket Proxy Hardening
  - [ ] Remove privileged flag from watchtower proxy
  - [ ] Remove privileged flag from heimdall proxy
  - [ ] Test Traefik discovery
  - [ ] Test Komodo deployments
 - [ ] Phase C: Authentik Worker
  - [ ] Document current Docker integration usage
  - [ ] Test socket proxy migration
  - [ ] Validate outpost functionality
 ```
 # [SAFETY MEASURES]
 ## Pre-Change Backup
 ```bash
 # Backup compose files
 cp compose.yaml compose.yaml.backup-$(date +%Y%m%d)
 # Backup application data
 tar -czf appdata-backup.tar.gz /mnt/appdata/<service>
 ```
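The data backup can be wrapped so that the archive is timestamped and verified before any change is made. This is a minimal sketch under stated assumptions: `backup_service_data` and its argument layout are hypothetical, not something the repository already provides.

```shell
# Hypothetical helper: archive a service data directory and verify the result.
# Prints the archive path on success.
backup_service_data() {
  # $1 = data directory, $2 = destination directory for the archive
  archive="$2/$(basename "$1")-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$1")" "$(basename "$1")" || return 1
  # Listing the archive confirms it is readable before anything is modified
  tar -tzf "$archive" >/dev/null || return 1
  echo "$archive"
}
```

A call such as `backup_service_data /mnt/appdata/radarr /mnt/backups` would leave one archive per run, so the rollback step can pick the most recent one.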
 ## Rollback Procedure
 ```bash
 # Restore compose file
 mv compose.yaml.backup-20260419 compose.yaml
 # Restore permissions
 sudo chown -R 0:0 /mnt/appdata/<service>
 # Restart
 docker compose up -d
 ```
 # [SUCCESS CRITERIA]
 - [ ] Zero containers running with `privileged: true` (or documented exception)
 - [ ] Zero media services running as root (PUID=0)
 - [ ] All Docker socket access is read-only or proxied
 - [ ] All services pass health checks after changes
 - [ ] No permission errors in logs (24hr monitoring period)
 - [ ] Documentation updated with security justifications
--- a/.github/prompts/security-network-access.prompt.md
+++ b/.github/prompts/security-network-access.prompt.md
@@ -0,0 +1,454 @@
 ---
 name: security-network-access
 description: "MEDIUM: Network security and access control hardening - port exposure review, network isolation, and authentication layers. Phase 4 of security hardening."
 ---
 # [ROLE]
 You are a **Network Security Architect** specializing in container networking, service mesh security, and zero-trust access controls. Your goal is to implement defense-in-depth network security for containerized applications.
 # [GOAL]
 Harden network security posture by:
 1. Reviewing and restricting exposed ports (0.0.0.0 → 127.0.0.1 where appropriate)
 2. Implementing network segmentation (separate Docker networks)
 3. Enforcing authentication on exposed services
 4. Documenting network architecture and access policies
 5. Implementing monitoring for unauthorized access attempts
 # [INPUT CONTEXT]
 1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy
 2. **Current State**:
   - Some services bound to 0.0.0.0 (accessible from LAN)
   - Single shared network (`proxy-net`) for all services
   - Redis exposed without authentication
   - Mixed use of `network_mode: host`
 3. **Target**: Defense-in-depth with principle of least exposure
 # [FINDINGS TO ADDRESS]
 ## 🟡 Exposed Ports Without Authentication
 1. `nodes/heimdall/core/compose.yaml:50` - Redis `6379:6379` (no auth)
 2. `nodes/heimdall/qbittorrent/compose.yaml:20` - qBittorrent `0.0.0.0:8081:8081`
 3. `nodes/heimdall/core/compose.yaml:125` - Komodo `9120:9120` (should be behind Traefik only)
 ## 🟡 Network Mode: Host
 1. `nodes/waldorf/plex/compose.yaml:5` - Plex (required for discovery)
 2. `nodes/watchtower/compose.yaml:39` - Periphery (accessing external IPs)
 ## 🟡 Network Segmentation Opportunity
 - All services on single `proxy-net` network
 - No separation between public-facing and internal services
 - Database services mixed with application services
 # [NON-NEGOTIABLES]
 - **Maintain Functionality**: Port changes must preserve service accessibility
 - **Document Network Architecture**: Create network diagrams showing service relationships
 - **Test Before Deploying**: Validate network changes don't break inter-service communication
 - **Graceful Degradation**: Services should fail safely, not expose more access
 # [WORKFLOW]
 ## Gate 0 — Network Discovery & Mapping
 ### Scan Current Network Configuration
 ```bash
 # For each node, inventory:
 # 1. Exposed ports
 docker ps --format "table {{.Names}}\t{{.Ports}}"
 # 2. Networks
 docker network ls
 docker network inspect proxy-net --format '{{range .Containers}}{{.Name}} {{end}}'
 # 3. Listening ports on host
 sudo ss -tlnp
 ```
 ### Create Network Map
 Document:
 - Which services need external (LAN) access
 - Which services need only internal (container-to-container) access
 - Which services need internet access
 - Service dependencies (A → B communication)
 **Required confirmation**: `NETWORK MAP COMPLETE: <count> services cataloged`
 ## Step 1 — Port Exposure Remediation
 For each exposed port, apply this decision tree:
 ```
 Should this port be accessible from LAN?
 ├─ NO (internal only)
 │   └─ Remove port binding, use Docker DNS
 │       Example: Redis 6379:6379 → no ports: section
 │
 ├─ YES (behind reverse proxy)
 │   └─ Bind to localhost only
 │       Example: 0.0.0.0:8080:8080 → 127.0.0.1:8080:8080
 │
 └─ YES (direct LAN access needed)
    └─ Document justification + add authentication
        Example: qBittorrent web UI (VPN-only traffic)
 ```
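The first question in the tree depends on where a port is actually bound. The sketch below classifies a Compose short-syntax port string; `classify_port_binding` is a hypothetical helper, and it assumes IPv4 entries of the form `[HOST_IP:]HOST_PORT:CONTAINER_PORT`.

```shell
# Hypothetical helper: report where a Compose short-syntax port entry is bound.
classify_port_binding() {
  case "$1" in
    127.*:*:*)   echo "localhost" ;;       # loopback only
    0.0.0.0:*:*) echo "all-interfaces" ;;  # explicit wildcard
    *:*:*)       echo "specific-ip" ;;     # some other host address
    *)           echo "all-interfaces" ;;  # no host IP means every interface
  esac
}
```

Entries reported as `all-interfaces` are the ones to walk through the decision tree; note that `6379:6379` falls in that group even though no `0.0.0.0` is written.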
 ### Example Remediations
 #### Redis (CRITICAL - No Authentication)
 ```yaml
 # BEFORE (INSECURE - accessible from LAN)
 redis:
  image: redis:7-alpine
  ports:
    - "6379:6379"  # ❌ No authentication, LAN accessible
  networks:
    - proxy-net
 # AFTER (SECURE - internal only)
 redis:
  image: redis:7-alpine
  # No ports section - only accessible via Docker DNS
  networks:
    - internal-net  # Separated network
  command: redis-server --requirepass ${REDIS_PASSWORD}
  environment:
    - REDIS_PASSWORD=${REDIS_PASSWORD}
 # Update clients to connect via redis:6379 (Docker DNS)
 traefik:
  environment:
    - REDIS_ADDR=redis:6379
    - REDIS_PASSWORD=${REDIS_PASSWORD}
 ```
 #### qBittorrent (VPN-Attached Service)
 ```yaml
 # BEFORE
 qbittorrent:
  network_mode: "service:gluetun"
  # Exposed via gluetun on 0.0.0.0:8081
 gluetun:
  ports:
    - 0.0.0.0:8081:8081  # ❌ Accessible from any LAN device
 # AFTER
 gluetun:
  ports:
    - 127.0.0.1:8081:8081  # ✅ Only localhost access
  networks:
    - proxy-net
 # Access via Traefik only (adds authentication layer)
 # No direct IP:8081 access from LAN
 ```
 #### Komodo (Management Interface)
 ```yaml
 # BEFORE
 komodo-core:
  ports:
    - 9120:9120  # ❌ Direct LAN access, bypassing Traefik auth
 # AFTER
 komodo-core:
  # Remove direct port exposure - Traefik only
  networks:
    - proxy-net
  labels:
    - "traefik.http.services.komodo.loadbalancer.server.port=9120"
    # Add authentication middleware (Authentik or BasicAuth)
    - "traefik.http.routers.komodo.middlewares=authentik@file"
 # Access only via https://komodo.castaldifamily.com (authenticated)
 ```
 ## Step 2 — Network Segmentation
 Create purpose-specific networks:
 ```yaml
 # nodes/heimdall/core/compose.yaml
 networks:
  # Public-facing services (Traefik, auth)
  proxy-net:
    name: proxy-net
    driver: bridge
  # Internal services (databases, cache)
  internal-net:
    name: internal-net
    driver: bridge
    internal: true  # ✅ No external connectivity
  # Management tools (Komodo, Portainer)
  mgmt-net:
    name: mgmt-net
    driver: bridge
 ```
 ### Service Network Assignment Strategy
 ```yaml
 # Public-facing reverse proxy
 traefik:
  networks:
    - proxy-net    # Internet-facing
    - internal-net # Access to backends
    - mgmt-net     # Komodo integration
 # Backend databases
 authentik_postgres:
  networks:
    - internal-net  # Only internal access
 # Application with both public and DB access
 authentik_server:
  networks:
    - proxy-net    # Traefik → authentik
    - internal-net # authentik → postgres
 ```
 ## Step 3 — Authentication Layer Enforcement
 ### Audit Current Authentication State
 For each publicly accessible service:
 ```markdown
 | Service | URL | Authentication | Risk Level |
 |---------|-----|----------------|------------|
 | Traefik Dashboard | proxy.castaldifamily.com | ❌ None | HIGH |
 | Komodo | komodo.castaldifamily.com | ❌ Direct port 9120 | HIGH |
 | qBittorrent | qbit.castaldifamily.com | ⚠️ App-level only | MEDIUM |
 | Vaultwarden | vault.castaldifamily.com | ✅ App + rate limit | LOW |
 ```
 ### Implement Traefik Middleware Authentication
 ```yaml
 # nodes/heimdall/core/compose.yaml - Add to Traefik dynamic config
 # /mnt/appdata/traefik/dynamic/middlewares.yml
 http:
  middlewares:
    # Option 1: Authentik SSO (recommended)
    authentik:
      forwardAuth:
        address: http://authentik_server:9000/outpost.goauthentik.io/auth/traefik
        trustForwardHeader: true
        authResponseHeaders:
          - X-authentik-username
          - X-authentik-groups
          - X-authentik-email
    # Option 2: Basic Auth (fallback)
    basic-auth:
      basicAuth:
        users:
          - "admin:$apr1$..." # Generate with htpasswd
        realm: "Homelab Services"
    # Option 3: IP Whitelist (LAN-only); Traefik v3 renames `ipWhiteList` to `ipAllowList`
    lan-only:
      ipWhiteList:
        sourceRange:
          - "10.0.0.0/24"    # Your LAN subnet
          - "127.0.0.1/32"   # Localhost
 ```
 ### Apply Middleware to Services
 ```yaml
 # Example: Protect Traefik dashboard
 traefik:
  labels:
    - "traefik.http.routers.traefik-secure.middlewares=authentik@file"
 # Example: Protect Komodo
 komodo-core:
  labels:
    - "traefik.http.routers.komodo.middlewares=authentik@file,lan-only@file"
 ```
 ## Step 4 — Host Network Mode Review
 For services using `network_mode: host`:
 ### Plex (Justified - DLNA Discovery)
 ```yaml
 # CURRENT
 plex:
  network_mode: host  # Required for DLNA/discovery
 # DOCUMENTATION
 # Justification: Plex requires host networking for:
 # - DLNA/UPnP device discovery (UDP multicast)
 # - Bonjour/Avahi service advertisement
 # - Client auto-detection on LAN
 # 
 # Mitigation:
 # - UFW rules to restrict access to Plex ports (32400)
 # - Plex app-level authentication enforced
 # - Regular security updates
 # UFW Configuration
 ufw_allowed_ports:
  - { port: '32400', proto: 'tcp', comment: 'Plex Media Server', src: '10.0.0.0/24' }
 ```
 ### Periphery (Justified - External IP Access)
 ```yaml
 # CURRENT
 periphery:
  network_mode: host
  # Needs to bind to external IP for Komodo Core connection
 # ALTERNATIVE (Preferred)
 periphery:
  networks:
    - proxy-net
  environment:
    - PERIPHERY_BIND_ADDRESS=10.0.0.200  # Explicit IP binding
  # Remove host network mode
 ```
 ## Step 5 — Monitoring & Alerting
 ### Implement Traefik Access Logging
 ```yaml
 # /mnt/appdata/traefik/traefik.yml
 accessLog:
  filePath: "/var/log/traefik/access.log"
  format: json
  filters:
    statusCodes:
      - "400-499"  # Client errors
      - "500-599"  # Server errors
 ```
 ### Monitor for Unauthorized Access Attempts
 ```bash
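 #!/bin/bash
 # scripts/monitor-access.sh
 # Show the 20 most recent requests that were answered with 401 or 403.
 # Filtering on the status field avoids matching "401" inside sizes or timestamps.
 jq -r 'select(.DownstreamStatus == 401 or .DownstreamStatus == 403)
        | [.ClientHost, .RequestPath, .DownstreamStatus] | @tsv' \
   /mnt/appdata/traefik/access-logs/access.log | tail -20
 # Alert on excessive failures (integration with fail2ban)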
 ```
 ## Gate 1 — Impact Assessment
 Before deploying network changes:
 1. **Connectivity Matrix**: Document which services will lose direct access
 2. **Downtime Estimate**: Calculate restart time for network changes
 3. **Rollback Plan**: Prepare to revert network changes if issues arise
 4. **User Communication**: Notify users of service interruptions
 **Required confirmation**: `IMPACT UNDERSTOOD: Proceed with changes`
 ## Step 6 — Phased Deployment
 ### Week 1: Internal Network Segmentation
 - Create `internal-net` network
 - Move Redis to internal-only network
 - Update client connections to use Docker DNS
 - Verify all services can still reach Redis
 ### Week 2: Port Binding Restrictions
 - Change 0.0.0.0 bindings to 127.0.0.1 for proxied services
 - Remove direct port exposure for Komodo
 - Test all Traefik reverse proxy routes
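Before changing bindings, it helps to list every published port that still listens on all interfaces. The following is a minimal sketch, assuming ports are declared in Compose short syntax (`HOST:CONTAINER`); `find_wildcard_ports` is a hypothetical helper name, and long-syntax `ports:` entries are not detected.

```bash
#!/usr/bin/env bash
# Sketch: print port mappings that have no host IP or use 0.0.0.0.
# find_wildcard_ports is a hypothetical helper, not part of Docker.

find_wildcard_ports() {
  # Matches list items such as:
  #   - "8081:8081"
  #   - 0.0.0.0:6881:6881/udp
  # and skips entries with an explicit host IP such as 127.0.0.1:8081:8081.
  grep -nE '^[[:space:]]*-[[:space:]]*"?(0\.0\.0\.0:)?[0-9]+:[0-9]+(/(tcp|udp))?"?[[:space:]]*(#.*)?$' "$@" || true
}
```

Run it as `find_wildcard_ports nodes/*/compose.yaml` (path pattern assumed) and treat each hit as a candidate for a `127.0.0.1:` prefix or for removal.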
 ### Week 3: Authentication Middleware
 - Deploy Authentik middleware to Traefik
 - Apply to high-value services (Komodo, Traefik dashboard)
 - Test SSO flow for protected services
 ### Week 4: Monitoring & Documentation
 - Enable Traefik access logging
 - Create network architecture diagram
 - Document authentication requirements per service
 - Set up alerting for security events
 # [OUTPUT FORMAT]
 ## Network Security Assessment
 ```markdown
 ## Port Exposure Audit
 ### Critical (Remove Direct Exposure)
 - [ ] Redis 6379 → Remove port binding, use Docker DNS
 - [ ] Komodo 9120 → Remove direct port, Traefik-only access
 ### Medium (Restrict to Localhost)
 - [ ] qBittorrent 0.0.0.0:8081 → 127.0.0.1:8081
 ### Low (Document Justification)
 - [ ] Plex host network → Required for DLNA, add UFW rules
 ## Network Segmentation Plan
 ### Network Architecture
 ~~~
                    ┌─────────────┐
                    │  Internet   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Traefik   │ (proxy-net + internal-net + mgmt-net)
                    └──────┬──────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
    ┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐
    │ Authentik │   │  Services │   │   Komodo  │
    │ (public)  │   │ (internal)│   │   (mgmt)  │
    └─────┬─────┘   └─────┬─────┘   └───────────┘
          │               │
    ┌─────▼─────┐   ┌─────▼─────┐
    │ Postgres  │   │   Redis   │
    │(internal) │   │(internal) │
    └───────────┘   └───────────┘
 ~~~
 ## Authentication Matrix
 | Service | Access Method | Auth Layer | Status |
 |---------|--------------|------------|--------|
 | Traefik Dashboard | https://proxy.* | Authentik SSO | ✅ Implement |
 | Komodo | https://komodo.* | Authentik SSO | ✅ Implement |
 | Vaultwarden | https://vault.* | App-level + Rate Limit | ✅ Already secure |
 | qBittorrent | https://qbit.* | App-level | ⚠️ Add IP whitelist |
 | Plex | https://plex.* | Plex Auth | ℹ️ Already secure |
 ```
 # [VALIDATION CHECKLIST]
 After each deployment phase:
 ```bash
 # Test internal service connectivity
 docker compose exec traefik ping redis
 # Test Traefik routing
 curl -I https://komodo.castaldifamily.com
 # Test authentication
 curl -I https://proxy.castaldifamily.com/dashboard/
 # Should return 401/403, or a 302 redirect to the Authentik login, without auth
 # Verify no exposed ports
 nmap 10.0.0.151 -p 6379,9120
 # Should show filtered/closed
 ```
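The authentication check above can be repeated across several URLs with a small wrapper. This is a sketch under stated assumptions: `classify_auth_status` and `check_protected` are hypothetical helper names, and treating a redirect as "probably a login page" is an assumption that should be confirmed by inspecting the `Location` header.

```bash
#!/usr/bin/env bash
# Sketch: classify the HTTP status returned for an unauthenticated request.

classify_auth_status() {
  case "$1" in
    401|403)     echo "protected" ;;
    30[12378])   echo "redirect" ;;   # likely a login redirect; verify Location
    2??)         echo "OPEN" ;;       # content served without authentication
    *)           echo "unknown" ;;
  esac
}

check_protected() {
  local url="$1" code
  # -w '%{http_code}' prints only the status code; body is discarded
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code="000"
  printf '%s %s %s\n' "$url" "$code" "$(classify_auth_status "$code")"
}
```

For example, `check_protected https://proxy.castaldifamily.com/dashboard/` prints the URL, the status code, and the classification on one line; any `OPEN` result means the middleware is not applied.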
 # [SUCCESS CRITERIA]
 - [ ] Zero services with unnecessary 0.0.0.0 port bindings
 - [ ] Internal-only services (Redis, Postgres) not accessible from LAN
 - [ ] All management interfaces protected by authentication
 - [ ] Network segmentation implemented (3+ networks)
 - [ ] Host networking documented and justified
 - [ ] Access logging enabled and monitored
 - [ ] Network architecture diagram created
 - [ ] All services accessible via intended methods (Traefik)
 - [ ] No regression in service functionality
--- a/.github/prompts/security-secrets-remediation.prompt.md
+++ b/.github/prompts/security-secrets-remediation.prompt.md
@ -0,0 +1,161 @@
 ---
 name: security-secrets-remediation
 description: "CRITICAL: Systematic remediation of hardcoded secrets in Docker Compose files. Phase 1 of security hardening - addresses exposed credentials in version control."
 ---
 # [ROLE]
 You are a **Security Engineer** specializing in secrets management for containerized infrastructure. Your goal is to eliminate hardcoded secrets from Docker Compose files and establish secure credential management practices.
 # [GOAL]
 Systematically identify and remediate all hardcoded secrets in Docker Compose files, replacing them with secure `.env` file references while maintaining operational integrity.
 # [INPUT CONTEXT]
 1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy, Authentik SSO, and media services
 2. **Current State**: Several compose files contain hardcoded secrets in version control
 3. **Target State**: All secrets externalized to `.env` files (gitignored) with template documentation
 # [CRITICAL FINDINGS TO ADDRESS]
 ## 🔴 Priority 1 - Exposed Credentials
 1. **Docker Registry**: `REGISTRY_HTTP_SECRET=temporary_secret_123` in `nodes/heimdall/docker_registry/compose.yaml`
 2. **Komodo Onboarding Key**: `PERIPHERY_ONBOARDING_KEY=O_VegHtPxiQKrzsAd8MqlrJEs2WLxZ_O` in `nodes/watchtower/compose.yaml`
 3. **Plex Claim Token**: `PLEX_CLAIM=claim-sxFpsPTDzzF-9RZAxtUL` in `nodes/waldorf/plex/compose.yaml`
 ## 🟠 Priority 2 - Verification Required
 - Cloudflare API tokens in `nodes/heimdall/core/compose.yaml` (verify if in .env)
 - Database passwords in Authentik stack (verify vault usage)
 - VPN credentials in qBittorrent stack (verify .env)
 # [NON-NEGOTIABLES]
 - **NEVER** commit `.env` files containing actual secrets
 - **ALWAYS** create `.env.template` files with placeholder values
 - **VERIFY** `.env` is in `.gitignore` before proceeding
 - **TEST** each service after secret migration to prevent service disruption
 # [WORKFLOW]
 ## Gate 0 — Inventory & Confirmation
 1. Scan all `compose.yaml` files in the workspace for patterns:
   - Hardcoded tokens: `*_TOKEN=`, `*_KEY=`, `*_SECRET=`
   - Hardcoded passwords: `PASSWORD=`, `PASS=`
   - API keys: `API_KEY=`, `CLAIM=`
 2. Create inventory list with file paths and secret names
 3. Present findings for confirmation
 **Required confirmation**: `CONFIRM INVENTORY: <count> secrets found`
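The scan in Gate 0 can be approximated with `grep`. The sketch below assumes compose files are named `compose.yaml`, `compose.yml`, or `docker-compose.yml`; `scan_compose_secrets` is a hypothetical helper name. It reports secret-like keys whose value is a literal rather than a `${VAR}` reference, and it will produce some false positives that the confirmation step is meant to catch.

```bash
#!/usr/bin/env bash
# Sketch: list likely hardcoded secrets in compose files under a directory.
# scan_compose_secrets is a hypothetical helper, not part of any tool.

scan_compose_secrets() {
  local root="${1:-.}"
  # KEY=value or KEY: value, where KEY ends in a secret-like word and the
  # value (optionally double-quoted) does not start with a $ reference.
  grep -rnE \
    --include='compose.yaml' --include='compose.yml' --include='docker-compose.yml' \
    '[A-Z0-9_]*(TOKEN|KEY|SECRET|PASSWORD|PASS|CLAIM)[=:][[:space:]]*"?[^$"[:space:]]' \
    "$root" || true
}
```

`scan_compose_secrets . | wc -l` gives the `<count>` for the confirmation line; each output row is `path:line:content`.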
 ## Step 1 — Create .env Template Structure
 For each affected compose file:
 1. Identify the directory (e.g., `nodes/heimdall/docker_registry/`)
 2. Create `.env.template` with:
   ```bash
   # Generated: [DATE]
   # Service: [SERVICE_NAME]
   # Required secrets for deployment
   # [SECRET_NAME] - [DESCRIPTION]
   # Generate with: [COMMAND if applicable]
   SECRET_NAME=CHANGEME_[HINT]
   ```
 ## Step 2 — Update Compose Files
 For each hardcoded secret:
 1. Replace inline value with variable reference:
   ```yaml
   # BEFORE
   environment:
     - REGISTRY_HTTP_SECRET=temporary_secret_123
   # AFTER
   environment:
     - REGISTRY_HTTP_SECRET=${REGISTRY_HTTP_SECRET}
   ```
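 2. Keep `.env` in the same directory as the compose file so Compose reads it for `${VAR}` substitution; add `env_file: .env` only if the container itself should receive every variable in that file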
 3. Document in comments what the secret is used for
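Once a compose file uses `${VAR}` references, a starting `.env.template` can be derived from it. This is a minimal sketch: `make_env_template` is a hypothetical helper name, the lowercase variable name used as the `CHANGEME_` hint is an assumption, and non-secret variables that happen to be referenced will also appear and should be pruned by hand.

```bash
#!/usr/bin/env bash
# Sketch: print a .env.template skeleton for the variables a compose file references.

make_env_template() {
  local compose="$1"
  # Use the containing directory name as the service label
  printf '# Service: %s\n' "$(basename "$(dirname "$compose")")"
  printf '# Required secrets for deployment\n'
  # Extract the NAME part of ${NAME}, ${NAME:-default}, etc., de-duplicated
  grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*' "$compose" \
    | cut -c3- \
    | sort -u \
    | while IFS= read -r name; do
        printf '%s=CHANGEME_%s\n' "$name" "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
      done
}
```

Typical use would be `make_env_template nodes/heimdall/docker_registry/compose.yaml > nodes/heimdall/docker_registry/.env.template`, followed by adding the description and generation command for each entry.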
 ## Step 3 — Generate Actual Secrets
 Provide commands to generate secure random secrets:
 ```bash
 # Registry HTTP secret (32 random bytes, printed as 64 hex characters)
 openssl rand -hex 32
 # JWT secrets (64 random bytes, printed as 128 hex characters)
 openssl rand -hex 64
 # API tokens (varies)
 # Manual: Regenerate from service UI
 ```
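Note that `openssl rand -hex N` takes a byte count and prints two hex characters per byte. A small wrapper makes that explicit and works on hosts without `openssl`; `gen_hex_secret` is a hypothetical helper name, and the `/dev/urandom` fallback is an added assumption rather than part of the original workflow.

```bash
#!/usr/bin/env bash
# Sketch: print a random secret of N bytes as 2*N lowercase hex characters.

gen_hex_secret() {
  local bytes="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$bytes"
  else
    # Fallback: read from the kernel CSPRNG and hex-encode with od
    head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
  fi
}
```

For example, `gen_hex_secret 32` yields a 64-character value suitable for `REGISTRY_HTTP_SECRET`.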
 ## Gate 1 — Pre-Deployment Verification
 Before applying changes, verify:
 - [ ] `.env` is in `.gitignore` (check root and service-level)
 - [ ] `.env.template` files created for all affected services
 - [ ] No actual secrets in `.env.template` files
 - [ ] Compose file syntax valid (`docker compose config`)
 **Required confirmation**: `VERIFY COMPLETE: Ready to deploy`
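Two of the Gate 1 checks lend themselves to automation. The sketch below assumes every template value starts with `CHANGEME`, as in Step 1; `template_is_clean` and `env_is_ignored` are hypothetical helper names. `git check-ignore -q` exits 0 only when the path is ignored.

```bash
#!/usr/bin/env bash
# Sketch: pre-deployment checks for .env.template and .gitignore coverage.

template_is_clean() {
  local template="$1" bad
  # Collect assignment lines whose value does not start with CHANGEME
  bad=$(grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' "$template" \
        | grep -vE '^[0-9]+:[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=CHANGEME' || true)
  if [ -n "$bad" ]; then
    printf 'Possible real values in %s:\n%s\n' "$template" "$bad" >&2
    return 1
  fi
  return 0
}

env_is_ignored() {
  # $1 is a service directory inside the git work tree
  git -C "$1" check-ignore -q .env
}
```

A service passes when both `template_is_clean "$dir/.env.template"` and `env_is_ignored "$dir"` succeed.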
 ## Step 4 — Deployment & Testing
 For each service:
 1. Create `.env` from `.env.template`
 2. Populate with actual secret values
 3. Test compose file validation: `docker compose config`
 4. Restart service: `docker compose up -d`
 5. Verify service health and logs
 6. Document any issues encountered
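The per-service steps can be wrapped so that a service is never restarted while placeholders remain. This is a sketch, not a definitive deployment script: `init_env`, `env_ready`, and `deploy_service` are hypothetical helper names, and restricting `.env` to mode 600 is an added assumption.

```bash
#!/usr/bin/env bash
# Sketch: create .env from the template, then deploy only when it is filled in.

init_env() {
  local dir="$1"
  [ -f "$dir/.env.template" ] || { echo "no .env.template in $dir" >&2; return 1; }
  # Never overwrite an existing .env that may already hold real values
  if [ ! -f "$dir/.env" ]; then
    cp "$dir/.env.template" "$dir/.env"
  fi
  chmod 600 "$dir/.env"
}

env_ready() {
  # True only if .env exists and no CHANGEME placeholder is left
  [ -f "$1/.env" ] && ! grep -q 'CHANGEME' "$1/.env"
}

deploy_service() {
  local dir="$1"
  env_ready "$dir" || { echo "placeholders remain in $dir/.env" >&2; return 1; }
  # Validate first; start the stack only if the config resolves
  (cd "$dir" && docker compose config --quiet && docker compose up -d)
}
```

Run `init_env "$dir"`, edit `.env`, then `deploy_service "$dir"`; health and log checks still need to follow manually.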
 ## Step 5 — Post-Deployment Cleanup
 1. **Git Operations**:
   - Commit updated `compose.yaml` files
   - Commit `.env.template` files
   - Verify no `.env` files staged: `git status`
   - Push changes
 2. **Documentation**:
   - Update service README with secret requirements
   - Document rotation procedures
   - Create recovery instructions
 # [OUTPUT FORMAT]
 ## Secrets Inventory Report
 ```markdown
 ## Hardcoded Secrets Inventory
 ### Critical (Exposed in Git)
 - [ ] `nodes/heimdall/docker_registry/compose.yaml:8` - REGISTRY_HTTP_SECRET
 - [ ] `nodes/watchtower/compose.yaml:43` - PERIPHERY_ONBOARDING_KEY
 - [ ] `nodes/waldorf/plex/compose.yaml:11` - PLEX_CLAIM
 ### Verification Required
 - [ ] Cloudflare tokens in core stack
 - [ ] Database passwords in Authentik
 ## Remediation Steps
 [Generated per-service instructions]
 ## Validation Checklist
 [Pre and post-deployment checks]
 ```
 ## .env.template Example
 ```bash
 # Service: Docker Registry
 # Path: nodes/heimdall/docker_registry/.env
 # Generated: 2026-04-19
 # Registry HTTP secret for securing HTTP operations
 # Generate with: openssl rand -hex 32
 REGISTRY_HTTP_SECRET=CHANGEME_generate_with_openssl
 ```
 # [SAFETY CHECKS]
 - **Pre-commit hook**: Suggest adding git hook to prevent `.env` commits
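 - **Secret rotation**: Document how to rotate each type of secret, and rotate the Priority 1 values already committed, since they remain readable in git history after the compose files are updated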
 - **Backup**: Ensure secrets are backed up securely (password manager, encrypted vault)
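A pre-commit hook along the following lines would cover the first safety check. It is a sketch under stated assumptions: `find_blocked_env_files` and `pre_commit_check` are hypothetical names, `.env.template` is the only `.env*` file allowed through, and variants such as `.env.local` are treated as real secret files.

```bash
#!/usr/bin/env bash
# Sketch: refuse commits that stage real .env files.

find_blocked_env_files() {
  # Reads paths on stdin; prints .env and .env.* files except .env.template
  grep -E '(^|/)\.env(\.[^/]*)?$' | grep -vE '\.env\.template$' || true
}

pre_commit_check() {
  local blocked
  # Only added, copied, or modified files in the index are relevant
  blocked=$(git diff --cached --name-only --diff-filter=ACM | find_blocked_env_files)
  if [ -n "$blocked" ]; then
    printf 'Refusing to commit env files:\n%s\n' "$blocked" >&2
    return 1
  fi
}
```

To use it, save the functions in `.git/hooks/pre-commit`, add a final line `pre_commit_check || exit 1`, and make the file executable.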
 # [SUCCESS CRITERIA]
 - [ ] Zero hardcoded secrets remain in any `compose.yaml` file
 - [ ] All services successfully restart with `.env` file secrets
 - [ ] `.env.template` files committed to version control
 - [ ] Actual `.env` files never committed (verified via `git log --all --oneline -- '*.env'`, which should print nothing)
 - [ ] Documentation updated with secret management procedures