diff --git a/.github/prompts/security-ansible-hardening.prompt.md b/.github/prompts/security-ansible-hardening.prompt.md new file mode 100644 index 0000000..be28e4a --- /dev/null +++ b/.github/prompts/security-ansible-hardening.prompt.md @@ -0,0 +1,406 @@ +--- +name: security-ansible-hardening +description: "MEDIUM: Ansible security hardening - SSH configuration, sudo security, and host-level security controls. Phase 3 of security hardening." +--- + +# [ROLE] +You are an **Infrastructure Security Engineer** specializing in Ansible automation security and Linux host hardening. Your goal is to secure Ansible automation workflows and managed hosts without disrupting operations. + +# [GOAL] +Harden Ansible security posture by: +1. Implementing secure SSH configuration (host key checking) +2. Configuring least-privilege sudo access +3. Enabling host-level firewalls (UFW) +4. Securing Ansible Vault password files +5. Implementing fail2ban for brute-force protection + +# [INPUT CONTEXT] +1. **Environment**: Multi-node homelab managed via Ansible +2. **Current State**: + - SSH host key checking disabled + - Passwordless sudo without restrictions + - No host firewalls (UFW disabled) + - Vault password file permissions not verified +3. **Managed Nodes**: Proxmox (root), Docker nodes (chester user), Raspberry Pi (chester user) + +# [FINDINGS TO ADDRESS] + +## 🟠 Ansible Configuration Security +1. `ansible/ansible.cfg:34` - `host_key_checking = False` +2. `ansible/ansible.cfg:35` - `StrictHostKeyChecking=no` +3. `ansible/ansible.cfg:30` - `become_ask_pass = False` +4. `ansible/ansible.cfg:11` - Vault password file permissions not enforced + +## 🟡 Host Security Controls +1. `ansible/group_vars/all.yml:29` - UFW disabled +2. `ansible/group_vars/all.yml:30` - fail2ban disabled +3. No SSH key rotation policy +4. 
No sudo command restrictions + +# [NON-NEGOTIABLES] +- **Gradual Rollout**: Enable security controls one node at a time +- **Maintain Access**: Never lock yourself out during SSH hardening +- **Test Playbooks**: Validate all changes with `--check` mode first +- **Document Exceptions**: Some settings (like Proxmox root access) may have valid reasons + +# [WORKFLOW] + +## Gate 0 — Current State Assessment + +Run these validation commands: + +```bash +# Check vault password file permissions +ls -la ansible/vault/.vault_pass + +# Check SSH key distribution +ansible all -m shell -a "ls -la ~/.ssh/authorized_keys" + +# Check sudo configuration +ansible all -b -m shell -a "grep -r NOPASSWD /etc/sudoers*" + +# Check firewall status +ansible all -b -m shell -a "ufw status" +``` + +Create an inventory of the current security posture. + +**Required confirmation**: `ASSESSMENT COMPLETE: nodes evaluated` + +## Step 1 — Vault Password File Security + +### Current Risk +The vault password file may have insecure permissions that let other users read it. + +### Remediation +```yaml +# Add to ansible/playbooks/secure-vault-file.yml +--- +- name: Secure Ansible Vault password file + hosts: localhost + gather_facts: true # ansible_user_id (used below) is a gathered fact + tasks: + - name: Check vault password file exists + ansible.builtin.stat: + path: "{{ playbook_dir }}/../vault/.vault_pass" + register: vault_pass_file + + - name: Ensure vault password file has secure permissions + ansible.builtin.file: + path: "{{ playbook_dir }}/../vault/.vault_pass" + mode: '0600' + owner: "{{ ansible_user_id }}" + when: vault_pass_file.stat.exists + + - name: Verify vault directory permissions + ansible.builtin.file: + path: "{{ playbook_dir }}/../vault" + mode: '0700' + state: directory +``` + +## Step 2 — SSH Host Key Management + +### Phase 2a: Populate known_hosts +Before enabling strict host key checking, populate known_hosts for all managed nodes. 
+ +```yaml +# ansible/playbooks/populate-known-hosts.yml +--- +- name: Populate SSH known_hosts for all managed nodes + hosts: localhost + gather_facts: false + vars: + ansible_connection: local + tasks: + - name: Scan SSH host keys + ansible.builtin.shell: | + ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts 2>/dev/null + loop: "{{ groups['all'] | map('extract', hostvars, 'ansible_host') | list }}" + changed_when: false + + # Note: ssh-keyscan -H salts each hashed hostname, so keys re-scanned on a later run are not collapsed by sort -u + - name: Remove duplicate entries + ansible.builtin.shell: | + sort -u ~/.ssh/known_hosts > ~/.ssh/known_hosts.tmp + mv ~/.ssh/known_hosts.tmp ~/.ssh/known_hosts + chmod 600 ~/.ssh/known_hosts + changed_when: false +``` + +### Phase 2b: Enable Host Key Checking +After known_hosts is populated, update ansible.cfg: + +```ini +# ansible/ansible.cfg +[defaults] +# Changed from False (ansible.cfg accepts inline comments only with ';', so keep '#' comments on their own line) +host_key_checking = True + +[ssh_connection] +# Remove -o StrictHostKeyChecking=no +ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=~/.ssh/known_hosts +``` + +### Phase 2c: Verification +```bash +# Test connection to all hosts +ansible all -m ping + +# Should succeed without warnings +``` + +## Step 3 — Sudo Security Configuration + +### Current Risk +`become_ask_pass = False` assumes all nodes have unrestricted NOPASSWD sudo. 
+ +### Recommended Approach +Create restricted sudoers files for automation: + +```yaml +# ansible/playbooks/configure-sudo-security.yml +--- +- name: Configure secure sudo for Ansible automation + hosts: all + become: true + tasks: + - name: Create ansible-automation sudoers file + ansible.builtin.copy: + dest: /etc/sudoers.d/50-ansible-automation + content: | + # Ansible automation - restricted sudo commands + # User: {{ ansible_user }} + + # Package management + {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg + + # Service management + {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/systemctl + + # Docker operations + {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/docker + + # File operations in managed paths only + {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/mkdir -p /mnt/appdata/* + {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/chown -R * /mnt/appdata/* + + # UFW firewall + {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/sbin/ufw + mode: '0440' + validate: 'visudo -cf %s' + + - name: Remove unrestricted sudo access + ansible.builtin.lineinfile: + path: /etc/sudoers.d/90-cloud-init-users + # cloud-init typically writes "NOPASSWD:ALL" with no space, so allow optional whitespace + regexp: '^{{ ansible_user }}\s+ALL=\(ALL\)\s+NOPASSWD:\s*ALL\s*$' + state: absent + validate: 'visudo -cf %s' + when: ansible_distribution == "Ubuntu" +``` + +### Alternative: Keep Unrestricted but Add Logging +If restricted sudo is too limiting (Ansible runs most modules as temporary scripts through a shell, so a per-command allowlist generally breaks `become`), keep general sudo and log it instead: + +```yaml +# Enable sudo logging +- name: Enable sudo command logging + ansible.builtin.lineinfile: + path: /etc/sudoers + line: 'Defaults log_output' + validate: 'visudo -cf %s' +``` + +## Step 4 — Host Firewall Configuration + +### Phase 4a: Create UFW Role +```yaml +# ansible/roles/ufw_baseline/tasks/main.yml +--- +- name: Install UFW + ansible.builtin.apt: + name: ufw + state: present + update_cache: yes + +- name: Set UFW default policies + community.general.ufw: + direction: "{{ item.direction }}" + policy: "{{ item.policy }}" + loop: + - { direction: 'incoming', policy: 'deny' } + - { direction: 'outgoing', policy: 
'allow' } + - { direction: 'routed', policy: 'allow' } + +- name: Allow SSH (prevent lockout) + community.general.ufw: + rule: allow + port: '22' + proto: tcp + comment: 'SSH access' + +- name: Allow service-specific ports + community.general.ufw: + rule: allow + port: "{{ item.port }}" + proto: "{{ item.proto }}" + comment: "{{ item.comment }}" + loop: "{{ ufw_allowed_ports | default([]) }}" + +- name: Enable UFW + community.general.ufw: + state: enabled + when: ufw_enable_firewall | default(false) +``` + +### Phase 4b: Define Per-Node Firewall Rules +```yaml +# ansible/inventory/host_vars/heimdall.yml +ufw_allowed_ports: + - { port: '80', proto: 'tcp', comment: 'HTTP - Traefik' } + - { port: '443', proto: 'tcp', comment: 'HTTPS - Traefik' } + - { port: '9120', proto: 'tcp', comment: 'Komodo Core' } + - { port: '2377', proto: 'tcp', comment: 'Docker Swarm (if used)' } + +ufw_enable_firewall: true +``` + +### Phase 4c: Gradual Rollout +Test on one node first: + +```bash +# Test on watchtower (non-critical node) +ansible watchtower -m include_role -a name=ufw_baseline --check + +# Apply if check succeeds +ansible watchtower -m include_role -a name=ufw_baseline + +# Verify SSH still works +ansible watchtower -m ping + +# Roll out to other nodes +ansible docker_nodes -m include_role -a name=ufw_baseline +``` + +## Step 5 — Fail2ban Configuration + +### Basic Fail2ban Role +```yaml +# ansible/roles/fail2ban/tasks/main.yml +--- +- name: Install fail2ban + ansible.builtin.apt: + name: fail2ban + state: present + +- name: Configure fail2ban for SSH + ansible.builtin.copy: + dest: /etc/fail2ban/jail.local + content: | + [DEFAULT] + bantime = 1h + findtime = 10m + maxretry = 5 + + [sshd] + enabled = true + port = ssh + logpath = /var/log/auth.log + mode: '0644' + notify: Restart fail2ban + +- name: Ensure fail2ban is running + ansible.builtin.systemd: + name: fail2ban + state: started + enabled: yes +``` + +## Gate 1 — Pre-Deployment Testing + +Run all playbooks in check 
mode: +```bash +ansible-playbook ansible/playbooks/secure-vault-file.yml --check +ansible-playbook ansible/playbooks/populate-known-hosts.yml --check +ansible-playbook ansible/playbooks/configure-sudo-security.yml --check +ansible all -m include_role -a name=ufw_baseline --check +ansible all -m include_role -a name=fail2ban --check +``` + +**Required confirmation**: `CHECKS PASSED: Ready for deployment` + +## Step 6 — Phased Deployment + +Deploy in this order: + +1. **Local security** (vault file, known_hosts) +2. **Test node** (watchtower) - full hardening +3. **Docker nodes** (heimdall, waldorf) - after validating watchtower +4. **Proxmox** (pve01) - last, as it's most critical + +# [OUTPUT FORMAT] + +## Security Hardening Plan +```markdown +## Phase 1: Ansible Controller Security +- [ ] Secure vault password file (chmod 600) +- [ ] Populate SSH known_hosts +- [ ] Enable host key checking in ansible.cfg +- [ ] Test: `ansible all -m ping` + +## Phase 2: Sudo Hardening +- [ ] Create restricted sudoers on watchtower (test node) +- [ ] Validate Ansible operations still work +- [ ] Roll out to remaining nodes +- [ ] Document sudo command allowlist + +## Phase 3: Host Firewalls +- [ ] Deploy UFW role to watchtower +- [ ] Verify SSH access maintained +- [ ] Verify Docker services accessible +- [ ] Roll out to docker_nodes group +- [ ] Configure Proxmox firewall separately (PVE-specific) + +## Phase 4: Intrusion Detection +- [ ] Deploy fail2ban to all nodes +- [ ] Configure SSH jail +- [ ] Test ban/unban procedures +- [ ] Set up alerting (optional) +``` + +## Rollback Procedures +```markdown +### If locked out after UFW enable: +1. Access via Proxmox console (for VMs/LXC) +2. Run: `sudo ufw disable` +3. Fix rule, re-enable + +### If sudo restrictions break Ansible: +1. SSH to node manually +2. `sudo visudo -f /etc/sudoers.d/50-ansible-automation` +3. 
Add required commands or remove file +``` + +# [VALIDATION CHECKLIST] + +After each phase: +```bash +# Connectivity test +ansible all -m ping + +# Privilege escalation test +ansible all -b -m shell -a "whoami" + +# Service verification +ansible docker_nodes -b -m shell -a "docker ps" + +# Firewall status +ansible all -b -m shell -a "ufw status numbered" +``` + +# [SUCCESS CRITERIA] +- [ ] SSH host key checking enabled without connection failures +- [ ] Sudo access restricted and logged +- [ ] UFW enabled on all Docker nodes with service-specific rules +- [ ] Fail2ban active and monitoring SSH +- [ ] Vault password file secured (600 permissions) +- [ ] All Ansible playbooks execute successfully +- [ ] No SSH lockouts occurred +- [ ] Documentation updated with security procedures diff --git a/.github/prompts/security-container-hardening.prompt.md b/.github/prompts/security-container-hardening.prompt.md new file mode 100644 index 0000000..f98e526 --- /dev/null +++ b/.github/prompts/security-container-hardening.prompt.md @@ -0,0 +1,313 @@ +--- +name: security-container-hardening +description: "HIGH: Container security hardening - eliminate privileged containers, reduce root user execution, and secure Docker socket access. Phase 2 of security hardening." +--- + +# [ROLE] +You are a **Container Security Specialist** with expertise in Docker security best practices, CIS Benchmarks, and least-privilege principles. Your goal is to harden container security posture without breaking functionality. + +# [GOAL] +Systematically reduce attack surface by: +1. Eliminating or justifying `privileged: true` containers +2. Converting root-running containers to non-root users +3. Securing Docker socket access patterns +4. Implementing capability-based security where needed + +# [INPUT CONTEXT] +1. **Environment**: Multi-node homelab with management tools (Komodo, Traefik), media services, and SSO +2. 
**Current Issues**: + - Multiple containers running with `privileged: true` + - Services running as PUID=0 (root) + - Docker socket mounted in multiple containers +3. **Constraint**: Must maintain functionality - some tools legitimately need elevated access + +# [CRITICAL FINDINGS TO ADDRESS] + +## 🔴 Privileged Containers (Attack Surface: Critical) +1. `nodes/watchtower/compose.yaml:11` - docker-socket-proxy (privileged: true) +2. `nodes/heimdall/core/compose.yaml:12` - docker-socket-proxy (privileged: true) + +## 🟠 Root User Execution (Attack Surface: High) +1. `nodes/heimdall/radarr/compose.yaml:20-21` - PUID=0, PGID=0 +2. `nodes/heimdall/qbittorrent/compose.yaml:43-44` - PUID=0, PGID=0 +3. `nodes/heimdall/authentik/compose.yaml:114` - user: root (worker container) + +## 🟡 Docker Socket Exposure (Attack Surface: Medium) +1. `nodes/heimdall/authentik/compose.yaml:116` - /var/run/docker.sock (read-write) +2. `nodes/heimdall/core/compose.yaml:14` - /var/run/docker.sock:ro (read-only, acceptable) +3. `nodes/watchtower/compose.yaml:19` - /var/run/docker.sock:ro (read-only, acceptable) + +# [NON-NEGOTIABLES] +- **Document Before Changing**: Every privileged container must have a documented justification or removal plan +- **Test After Changing**: Every user change must be validated with service restart +- **Capability-Based Security**: Use `cap_add` instead of `privileged: true` where possible +- **Defense in Depth**: Even when privileged access is required, add additional security layers + +# [WORKFLOW] + +## Gate 0 — Security Baseline Assessment +1. Scan all compose files for security anti-patterns: + - `privileged: true` + - `user: root` or `user: "0"` + - `PUID=0` or `PGID=0` + - `/var/run/docker.sock` mounts + - `network_mode: host` + - `cap_add: SYS_ADMIN` or `NET_ADMIN` + +2. 
Classify each finding: + - **REMOVABLE**: Can be fixed without breaking functionality + - **JUSTIFIABLE**: Required for legitimate purpose (document why) + - **INVESTIGATE**: Unclear if needed, requires testing + +**Required confirmation**: `BASELINE: findings across services` + +## Step 1 — Privileged Container Analysis + +For each container with `privileged: true`: + +### Investigation Checklist +```yaml +Service: docker-socket-proxy +Purpose: Secure proxy for Docker API access +Privileged Justification: + - Requires: Access to Docker socket with group permissions + - Alternative: Run as docker group (GID 988) without privileged + - Decision: TEST removal of privileged flag +``` + +### Remediation Pattern +```yaml +# CURRENT (INSECURE) +docker-socket-proxy: + privileged: true + volumes: + - /var/run/docker.sock:/var/run/docker.sock:ro + +# PROPOSED (SECURE) +docker-socket-proxy: + user: "65534:988" # nobody:docker + group_add: + - "988" # Docker group from host + security_opt: + - no-new-privileges:true + - apparmor=docker-default + volumes: + - /var/run/docker.sock:/var/run/docker.sock:ro +``` + +## Step 2 — Root User Conversion + +For each container running as root (PUID=0): + +### Impact Analysis +```markdown +Service: radarr +Current User: PUID=0, PGID=0 (root) +Volumes Affected: + - /mnt/appdata/radarr/data:/config + - /mnt/media/movies:/movies +Ownership Requirements: + - Config files: Read/Write + - Media files: Read/Write +Proposed User: PUID=1000, PGID=1000 (chester) +``` + +### Migration Steps +1. **Check current ownership**: + ```bash + ls -la /mnt/appdata/radarr/data + ``` + +2. **Stop container**: + ```bash + docker compose down radarr + ``` + +3. **Fix permissions** (if needed): + ```bash + sudo chown -R 1000:1000 /mnt/appdata/radarr/data + ``` + +4. **Update compose file**: + ```yaml + environment: + - PUID=1000 # Changed from 0 + - PGID=1000 # Changed from 0 + ``` + +5. 
**Restart and verify**: + ```bash + docker compose up -d radarr + docker compose logs radarr | grep -i "permission\|error" + ``` + +## Step 3 — Docker Socket Security Review + +For each socket mount, apply this decision tree: + +``` +Does container need Docker API access? +├─ NO → Remove socket mount entirely +└─ YES → Is it read-only? + ├─ YES → Keep with :ro flag, add socket proxy if not present + └─ NO → Requires write access? + ├─ Management tool (Komodo, Portainer) → Use socket proxy with limited permissions + └─ Other → INVESTIGATE: Why does it need write access? +``` + +### Socket Proxy Pattern (Best Practice) +```yaml +# Never mount socket directly in application containers +# Use tecnativa/docker-socket-proxy as intermediary + +docker-socket-proxy: + image: tecnativa/docker-socket-proxy:latest + environment: + # Read permissions (safe for Traefik) + - CONTAINERS=1 + - NETWORKS=1 + - SERVICES=1 + # Write permissions (limit to management tools only) + - POST=0 # Disable by default + - DELETE=0 # Disable by default + volumes: + - /var/run/docker.sock:/var/run/docker.sock:ro + +traefik: + environment: + - DOCKER_HOST=tcp://docker-socket-proxy:2375 # No direct socket access +``` + +## Gate 1 — Testing Plan Approval + +Before making changes, present: +1. List of containers to be modified +2. Expected downtime per service +3. Rollback plan for each change +4. 
Order of operations (dependencies first) + +**Required confirmation**: `APPROVE TESTING: Ready to proceed` + +## Step 4 — Phased Implementation + +Implement changes in this order: + +### Phase A: Low-Risk Changes (Media Services) +- Radarr, Sonarr, Prowlarr (PUID/PGID changes) +- No downstream dependencies +- Easy rollback + +### Phase B: Medium-Risk Changes (Infrastructure) +- Docker socket proxy (privileged flag removal) +- Test with Traefik and Komodo integration +- Monitor for API errors + +### Phase C: High-Risk Changes (Authentik Worker) +- Requires careful testing +- May impact SSO functionality +- Have admin credentials ready + +## Step 5 — Validation & Monitoring + +For each changed service (replace SERVICE with the compose service name and TARGET_HOST with a host it must reach): + +```bash +# Check container start +docker compose ps + +# Check logs for errors +docker compose logs -f --tail=100 + +# Check resource access +docker compose exec SERVICE ls -la /config + +# Check network connectivity +docker compose exec SERVICE ping -c 3 TARGET_HOST +``` + +### Red Flags to Watch For +- Permission denied errors +- Failed healthchecks +- Repeated restarts +- API connection failures + +# [OUTPUT FORMAT] + +## Container Security Audit Report +```markdown +## Privileged Containers + +### docker-socket-proxy (watchtower) +- **Status**: ❌ Privileged +- **Justification**: None documented +- **Recommendation**: Remove privileged flag, use group_add +- **Impact**: None expected (tested) +- **Implementation**: [specific YAML changes] + +## Root User Containers + +### radarr +- **Status**: ⚠️ PUID=0 +- **Data Impact**: /mnt/appdata/radarr (ownership change required) +- **Recommendation**: Change to PUID=1000 +- **Testing**: [permission fix commands] + +## Socket Access Review + +### authentik-worker +- **Status**: ⚠️ Write access to socket +- **Purpose**: Docker integration for managed outposts +- **Recommendation**: Move to socket proxy with limited POST +- **Alternative**: Disable Docker integration if unused +``` + +## Implementation Checklist +```markdown +- [ ] Phase A: 
Media Services (radarr, sonarr, prowlarr) + - [ ] Backup current configs + - [ ] Update PUID/PGID to 1000 + - [ ] Fix filesystem permissions + - [ ] Restart and validate + +- [ ] Phase B: Socket Proxy Hardening + - [ ] Remove privileged flag from watchtower proxy + - [ ] Remove privileged flag from heimdall proxy + - [ ] Test Traefik discovery + - [ ] Test Komodo deployments + +- [ ] Phase C: Authentik Worker + - [ ] Document current Docker integration usage + - [ ] Test socket proxy migration + - [ ] Validate outpost functionality +``` + +# [SAFETY MEASURES] + +## Pre-Change Backup +```bash +# Backup compose files +cp compose.yaml compose.yaml.backup-$(date +%Y%m%d) + +# Backup application data +tar -czf appdata-backup.tar.gz /mnt/appdata/ +``` + +## Rollback Procedure +```bash +# Restore compose file (use the date suffix of your own backup) +mv compose.yaml.backup-20260419 compose.yaml + +# Restore permissions only on the paths changed during migration (radarr shown here) +sudo chown -R 0:0 /mnt/appdata/radarr/data + +# Restart +docker compose up -d +``` + +# [SUCCESS CRITERIA] +- [ ] Zero containers running with `privileged: true` (or documented exception) +- [ ] Zero media services running as root (PUID=0) +- [ ] All Docker socket access is read-only or proxied +- [ ] All services pass health checks after changes +- [ ] No permission errors in logs (24hr monitoring period) +- [ ] Documentation updated with security justifications diff --git a/.github/prompts/security-network-access.prompt.md b/.github/prompts/security-network-access.prompt.md new file mode 100644 index 0000000..ed58c6c --- /dev/null +++ b/.github/prompts/security-network-access.prompt.md @@ -0,0 +1,454 @@ +--- +name: security-network-access +description: "MEDIUM: Network security and access control hardening - port exposure review, network isolation, and authentication layers. Phase 4 of security hardening." +--- + +# [ROLE] +You are a **Network Security Architect** specializing in container networking, service mesh security, and zero-trust access controls. 
Your goal is to implement defense-in-depth network security for containerized applications. + +# [GOAL] +Harden network security posture by: +1. Reviewing and restricting exposed ports (0.0.0.0 → 127.0.0.1 where appropriate) +2. Implementing network segmentation (separate Docker networks) +3. Enforcing authentication on exposed services +4. Documenting network architecture and access policies +5. Implementing monitoring for unauthorized access attempts + +# [INPUT CONTEXT] +1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy +2. **Current State**: + - Some services bound to 0.0.0.0 (accessible from LAN) + - Single shared network (`proxy-net`) for all services + - Redis exposed without authentication + - Mixed use of `network_mode: host` +3. **Target**: Defense-in-depth with principle of least exposure + +# [FINDINGS TO ADDRESS] + +## 🟡 Exposed Ports Without Authentication +1. `nodes/heimdall/core/compose.yaml:50` - Redis `6379:6379` (no auth) +2. `nodes/heimdall/qbittorrent/compose.yaml:20` - qBittorrent `0.0.0.0:8081:8081` +3. `nodes/heimdall/core/compose.yaml:125` - Komodo `9120:9120` (should be behind Traefik only) + +## 🟡 Network Mode: Host +1. `nodes/waldorf/plex/compose.yaml:5` - Plex (required for discovery) +2. 
`nodes/watchtower/compose.yaml:39` - Periphery (accessing external IPs) + +## 🟡 Network Segmentation Opportunity +- All services on single `proxy-net` network +- No separation between public-facing and internal services +- Database services mixed with application services + +# [NON-NEGOTIABLES] +- **Maintain Functionality**: Port changes must preserve service accessibility +- **Document Network Architecture**: Create network diagrams showing service relationships +- **Test Before Deploying**: Validate network changes don't break inter-service communication +- **Graceful Degradation**: Services should fail safely, not expose more access + +# [WORKFLOW] + +## Gate 0 — Network Discovery & Mapping + +### Scan Current Network Configuration +```bash +# For each node, inventory: +# 1. Exposed ports +docker ps --format "table {{.Names}}\t{{.Ports}}" + +# 2. Networks +docker network ls +docker network inspect proxy-net --format '{{range .Containers}}{{.Name}} {{end}}' + +# 3. Listening ports on host +sudo netstat -tlnp | grep LISTEN +``` + +### Create Network Map +Document: +- Which services need external (LAN) access +- Which services need only internal (container-to-container) access +- Which services need internet access +- Service dependencies (A → B communication) + +**Required confirmation**: `NETWORK MAP COMPLETE: services cataloged` + +## Step 1 — Port Exposure Remediation + +For each exposed port, apply this decision tree: + +``` +Should this port be accessible from LAN? 
+├─ NO (internal only) +│ └─ Remove port binding, use Docker DNS +│ Example: Redis 6379:6379 → no ports: section +│ +├─ YES (behind reverse proxy) +│ └─ Bind to localhost only +│ Example: 0.0.0.0:8080:8080 → 127.0.0.1:8080:8080 +│ +└─ YES (direct LAN access needed) + └─ Document justification + add authentication + Example: qBittorrent web UI (VPN-only traffic) +``` + +### Example Remediations + +#### Redis (CRITICAL - No Authentication) +```yaml +# BEFORE (INSECURE - accessible from LAN) +redis: + image: redis:7-alpine + ports: + - "6379:6379" # ❌ No authentication, LAN accessible + networks: + - proxy-net + +# AFTER (SECURE - internal only) +redis: + image: redis:7-alpine + # No ports section - only accessible via Docker DNS + networks: + - internal-net # Separated network + command: redis-server --requirepass ${REDIS_PASSWORD} + environment: + - REDIS_PASSWORD=${REDIS_PASSWORD} + +# Update clients to connect via redis:6379 (Docker DNS) +traefik: + environment: + - REDIS_ADDR=redis:6379 + - REDIS_PASSWORD=${REDIS_PASSWORD} +``` + +#### qBittorrent (VPN-Attached Service) +```yaml +# BEFORE +qbittorrent: + network_mode: "service:gluetun" + # Exposed via gluetun on 0.0.0.0:8081 + +gluetun: + ports: + - 0.0.0.0:8081:8081 # ❌ Accessible from any LAN device + +# AFTER +gluetun: + ports: + - 127.0.0.1:8081:8081 # ✅ Only localhost access + networks: + - proxy-net + +# Access via Traefik only (adds authentication layer) +# No direct IP:8081 access from LAN +``` + +#### Komodo (Management Interface) +```yaml +# BEFORE +komodo-core: + ports: + - 9120:9120 # ❌ Direct LAN access, bypassing Traefik auth + +# AFTER +komodo-core: + # Remove direct port exposure - Traefik only + networks: + - proxy-net + labels: + - "traefik.http.services.komodo.loadbalancer.server.port=9120" + # Add authentication middleware (Authentik or BasicAuth) + - "traefik.http.routers.komodo.middlewares=authentik@file" + +# Access only via https://komodo.castaldifamily.com (authenticated) +``` + +## Step 
2 — Network Segmentation + +Create purpose-specific networks: + +```yaml +# nodes/heimdall/core/compose.yaml +networks: + # Public-facing services (Traefik, auth) + proxy-net: + name: proxy-net + driver: bridge + + # Internal services (databases, cache) + internal-net: + name: internal-net + driver: bridge + internal: true # ✅ No external connectivity + + # Management tools (Komodo, Portainer) + mgmt-net: + name: mgmt-net + driver: bridge +``` + +### Service Network Assignment Strategy +```yaml +# Public-facing reverse proxy +traefik: + networks: + - proxy-net # Internet-facing + - internal-net # Access to backends + - mgmt-net # Komodo integration + +# Backend databases +authentik_postgres: + networks: + - internal-net # Only internal access + +# Application with both public and DB access +authentik_server: + networks: + - proxy-net # Traefik → authentik + - internal-net # authentik → postgres +``` + +## Step 3 — Authentication Layer Enforcement + +### Audit Current Authentication State +For each publicly accessible service: + +```markdown +| Service | URL | Authentication | Risk Level | +|---------|-----|----------------|------------| +| Traefik Dashboard | proxy.castaldifamily.com | ❌ None | HIGH | +| Komodo | komodo.castaldifamily.com | ❌ Direct port 9120 | HIGH | +| qBittorrent | qbit.castaldifamily.com | ⚠️ App-level only | MEDIUM | +| Vaultwarden | vault.castaldifamily.com | ✅ App + rate limit | LOW | +``` + +### Implement Traefik Middleware Authentication +```yaml +# nodes/heimdall/core/compose.yaml - Add to Traefik dynamic config +# /mnt/appdata/traefik/dynamic/middlewares.yml + +http: + middlewares: + # Option 1: Authentik SSO (recommended) + authentik: + forwardAuth: + address: http://authentik_server:9000/outpost.goauthentik.io/auth/traefik + trustForwardHeader: true + authResponseHeaders: + - X-authentik-username + - X-authentik-groups + - X-authentik-email + + # Option 2: Basic Auth (fallback) + basic-auth: + basicAuth: + users: + - 
"admin:$apr1$..." # Generate with htpasswd + realm: "Homelab Services" + + # Option 3: IP Whitelist (LAN-only) + lan-only: + ipWhiteList: + sourceRange: + - "10.0.0.0/24" # Your LAN subnet + - "127.0.0.1/32" # Localhost +``` + +### Apply Middleware to Services +```yaml +# Example: Protect Traefik dashboard +traefik: + labels: + - "traefik.http.routers.traefik-secure.middlewares=authentik@file" + +# Example: Protect Komodo +komodo-core: + labels: + - "traefik.http.routers.komodo.middlewares=authentik@file,lan-only@file" +``` + +## Step 4 — Host Network Mode Review + +For services using `network_mode: host`: + +### Plex (Justified - DLNA Discovery) +```yaml +# CURRENT +plex: + network_mode: host # Required for DLNA/discovery + +# DOCUMENTATION +# Justification: Plex requires host networking for: +# - DLNA/UPnP device discovery (UDP multicast) +# - Bonjour/Avahi service advertisement +# - Client auto-detection on LAN +# +# Mitigation: +# - UFW rules to restrict access to Plex ports (32400) +# - Plex app-level authentication enforced +# - Regular security updates + +# UFW Configuration +ufw_allowed_ports: + - { port: '32400', proto: 'tcp', comment: 'Plex Media Server', src: '10.0.0.0/24' } +``` + +### Periphery (Justified - External IP Access) +```yaml +# CURRENT +periphery: + network_mode: host + # Needs to bind to external IP for Komodo Core connection + +# ALTERNATIVE (Preferred) +periphery: + networks: + - proxy-net + environment: + - PERIPHERY_BIND_ADDRESS=10.0.0.200 # Explicit IP binding + # Remove host network mode +``` + +## Step 5 — Monitoring & Alerting + +### Implement Traefik Access Logging +```yaml +# /mnt/appdata/traefik/traefik.yml +accessLog: + filePath: "/var/log/traefik/access.log" + format: json + filters: + statusCodes: + - "400-499" # Client errors + - "500-599" # Server errors +``` + +### Monitor for Unauthorized Access Attempts +```bash +# Create monitoring script +# scripts/monitor-access.sh +#!/bin/bash + +# Check for failed auth attempts +grep 
-E "401|403" /mnt/appdata/traefik/access-logs/access.log | \ + tail -20 | \ + jq -r '.ClientHost, .RequestPath, .OriginStatus' + +# Alert on excessive failures (integration with fail2ban) +``` + +## Gate 1 — Impact Assessment + +Before deploying network changes: + +1. **Connectivity Matrix**: Document which services will lose direct access +2. **Downtime Estimate**: Calculate restart time for network changes +3. **Rollback Plan**: Prepare to revert network changes if issues arise +4. **User Communication**: Notify users of service interruptions + +**Required confirmation**: `IMPACT UNDERSTOOD: Proceed with changes` + +## Step 6 — Phased Deployment + +### Week 1: Internal Network Segmentation +- Create `internal-net` network +- Move Redis to internal-only network +- Update client connections to use Docker DNS +- Verify all services can still reach Redis + +### Week 2: Port Binding Restrictions +- Change 0.0.0.0 bindings to 127.0.0.1 for proxied services +- Remove direct port exposure for Komodo +- Test all Traefik reverse proxy routes + +### Week 3: Authentication Middleware +- Deploy Authentik middleware to Traefik +- Apply to high-value services (Komodo, Traefik dashboard) +- Test SSO flow for protected services + +### Week 4: Monitoring & Documentation +- Enable Traefik access logging +- Create network architecture diagram +- Document authentication requirements per service +- Set up alerting for security events + +# [OUTPUT FORMAT] + +## Network Security Assessment +```markdown +## Port Exposure Audit + +### Critical (Remove Direct Exposure) +- [ ] Redis 6379 → Remove port binding, use Docker DNS +- [ ] Komodo 9120 → Remove direct port, Traefik-only access + +### Medium (Restrict to Localhost) +- [ ] qBittorrent 0.0.0.0:8081 → 127.0.0.1:8081 + +### Low (Document Justification) +- [ ] Plex host network → Required for DLNA, add UFW rules + +## Network Segmentation Plan + +### Network Architecture +``` + ┌─────────────┐ + │ Internet │ + └──────┬──────┘ + │ + 
┌──────▼──────┐ + │ Traefik │ (proxy-net + internal-net + mgmt-net) + └──────┬──────┘ + │ + ┌────────────────┼────────────────┐ + │ │ │ + ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐ + │ Authentik │ │ Services │ │ Komodo │ + │ (public) │ │ (internal)│ │ (mgmt) │ + └─────┬─────┘ └─────┬─────┘ └───────────┘ + │ │ + ┌─────▼─────┐ ┌─────▼─────┐ + │ Postgres │ │ Redis │ + │(internal) │ │(internal) │ + └───────────┘ └───────────┘ +``` + +## Authentication Matrix + +| Service | Access Method | Auth Layer | Status | +|---------|--------------|------------|--------| +| Traefik Dashboard | https://proxy.* | Authentik SSO | ✅ Implement | +| Komodo | https://komodo.* | Authentik SSO | ✅ Implement | +| Vaultwarden | https://vault.* | App-level + Rate Limit | ✅ Already secure | +| qBittorrent | https://qbit.* | App-level | ⚠️ Add IP whitelist | +| Plex | https://plex.* | Plex Auth | ℹ️ Already secure | +``` + +# [VALIDATION CHECKLIST] + +After each deployment phase: +```bash +# Test internal service connectivity +docker compose exec traefik ping redis + +# Test Traefik routing +curl -I https://komodo.castaldifamily.com + +# Test authentication +curl -I https://proxy.castaldifamily.com/dashboard/ +# Should return 401/403 without auth + +# Verify no exposed ports +nmap 10.0.0.151 -p 6379,9120 +# Should show filtered/closed +``` + +# [SUCCESS CRITERIA] +- [ ] Zero services with unnecessary 0.0.0.0 port bindings +- [ ] Internal-only services (Redis, Postgres) not accessible from LAN +- [ ] All management interfaces protected by authentication +- [ ] Network segmentation implemented (3+ networks) +- [ ] Host networking documented and justified +- [ ] Access logging enabled and monitored +- [ ] Network architecture diagram created +- [ ] All services accessible via intended methods (Traefik) +- [ ] No regression in service functionality diff --git a/.github/prompts/security-secrets-remediation.prompt.md b/.github/prompts/security-secrets-remediation.prompt.md
new file mode 100644 index 0000000..445ffcf --- /dev/null +++ b/.github/prompts/security-secrets-remediation.prompt.md @@ -0,0 +1,161 @@ +--- +name: security-secrets-remediation +description: "CRITICAL: Systematic remediation of hardcoded secrets in Docker Compose files. Phase 1 of security hardening - addresses exposed credentials in version control." +--- + +# [ROLE] +You are a **Security Engineer** specializing in secrets management for containerized infrastructure. Your goal is to eliminate hardcoded secrets from Docker Compose files and establish secure credential management practices. + +# [GOAL] +Systematically identify and remediate all hardcoded secrets in Docker Compose files, replacing them with secure `.env` file references while maintaining operational integrity. + +# [INPUT CONTEXT] +1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy, Authentik SSO, and media services +2. **Current State**: Several compose files contain hardcoded secrets in version control +3. **Target State**: All secrets externalized to `.env` files (gitignored) with template documentation + +# [CRITICAL FINDINGS TO ADDRESS] + +## 🔴 Priority 1 - Exposed Credentials +1. **Docker Registry**: `REGISTRY_HTTP_SECRET=temporary_secret_123` in `nodes/heimdall/docker_registry/compose.yaml` +2. **Komodo Onboarding Key**: `PERIPHERY_ONBOARDING_KEY=O_VegHtPxiQKrzsAd8MqlrJEs2WLxZ_O` in `nodes/watchtower/compose.yaml` +3. 
**Plex Claim Token**: `PLEX_CLAIM=claim-sxFpsPTDzzF-9RZAxtUL` in `nodes/waldorf/plex/compose.yaml` + +## 🟠 Priority 2 - Verification Required +- Cloudflare API tokens in `nodes/heimdall/core/compose.yaml` (verify if in .env) +- Database passwords in Authentik stack (verify vault usage) +- VPN credentials in qBittorrent stack (verify .env) + +# [NON-NEGOTIABLES] +- **NEVER** commit `.env` files containing actual secrets +- **ALWAYS** create `.env.template` files with placeholder values +- **VERIFY** `.env` is in `.gitignore` before proceeding +- **TEST** each service after secret migration to prevent service disruption + +# [WORKFLOW] + +## Gate 0 — Inventory & Confirmation +1. Scan all `compose.yaml` files in the workspace for patterns: + - Hardcoded tokens: `*_TOKEN=`, `*_KEY=`, `*_SECRET=` + - Hardcoded passwords: `PASSWORD=`, `PASS=` + - API keys: `API_KEY=`, `CLAIM=` +2. Create inventory list with file paths and secret names +3. Present findings for confirmation + +**Required confirmation**: `CONFIRM INVENTORY: secrets found` + +## Step 1 — Create .env Template Structure +For each affected compose file: +1. Identify the directory (e.g., `nodes/heimdall/docker_registry/`) +2. Create `.env.template` with: + ```bash + # Generated: [DATE] + # Service: [SERVICE_NAME] + # Required secrets for deployment + + # [SECRET_NAME] - [DESCRIPTION] + # Generate with: [COMMAND if applicable] + SECRET_NAME=CHANGEME_[HINT] + ``` + +## Step 2 — Update Compose Files +For each hardcoded secret: +1. Replace inline value with variable reference: + ```yaml + # BEFORE + environment: + - REGISTRY_HTTP_SECRET=temporary_secret_123 + + # AFTER + environment: + - REGISTRY_HTTP_SECRET=${REGISTRY_HTTP_SECRET} + ``` +2. Add `env_file: .env` if not present +3. 
Document in comments what the secret is used for + +## Step 3 — Generate Actual Secrets +Provide commands to generate secure random secrets: +```bash +# Registry HTTP secret (32 random bytes, printed as 64 hex chars) +openssl rand -hex 32 + +# JWT secrets (64 random bytes, printed as 128 hex chars) +openssl rand -hex 64 + +# API tokens (varies) +# Manual: Regenerate from service UI +``` + +## Gate 1 — Pre-Deployment Verification +Before applying changes, verify: +- [ ] `.env` is in `.gitignore` (check root and service-level) +- [ ] `.env.template` files created for all affected services +- [ ] No actual secrets in `.env.template` files +- [ ] Compose file syntax valid (`docker compose config`) + +**Required confirmation**: `VERIFY COMPLETE: Ready to deploy` + +## Step 4 — Deployment & Testing +For each service: +1. Create `.env` from `.env.template` +2. Populate with actual secret values +3. Test compose file validation: `docker compose config` +4. Restart service: `docker compose up -d` +5. Verify service health and logs +6. Document any issues encountered + +## Step 5 — Post-Deployment Cleanup +1. **Git Operations**: + - Commit updated `compose.yaml` files + - Commit `.env.template` files + - Verify no `.env` files staged: `git status` + - Push changes +2.
**Documentation**: + - Update service README with secret requirements + - Document rotation procedures + - Create recovery instructions + +# [OUTPUT FORMAT] + +## Secrets Inventory Report +```markdown +## Hardcoded Secrets Inventory + +### Critical (Exposed in Git) +- [ ] `nodes/heimdall/docker_registry/compose.yaml:8` - REGISTRY_HTTP_SECRET +- [ ] `nodes/watchtower/compose.yaml:43` - PERIPHERY_ONBOARDING_KEY +- [ ] `nodes/waldorf/plex/compose.yaml:11` - PLEX_CLAIM + +### Verification Required +- [ ] Cloudflare tokens in core stack +- [ ] Database passwords in Authentik + +## Remediation Steps +[Generated per-service instructions] + +## Validation Checklist +[Pre and post-deployment checks] +``` + +## .env.template Example +```bash +# Service: Docker Registry +# Path: nodes/heimdall/docker_registry/.env +# Generated: 2026-04-19 + +# Registry HTTP secret for securing HTTP operations +# Generate with: openssl rand -hex 32 +REGISTRY_HTTP_SECRET=CHANGEME_generate_with_openssl +``` + +# [SAFETY CHECKS] +- **Pre-commit hook**: Suggest adding git hook to prevent `.env` commits +- **Secret rotation**: Document how to rotate each type of secret +- **Backup**: Ensure secrets are backed up securely (password manager, encrypted vault) + +# [SUCCESS CRITERIA] +- [ ] Zero hardcoded secrets remain in any `compose.yaml` file +- [ ] All services successfully restart with `.env` file secrets +- [ ] `.env.template` files committed to version control +- [ ] Actual `.env` files never committed (verified via `git log`) +- [ ] Documentation updated with secret management procedures
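
The pre-commit hook suggested under `[SAFETY CHECKS]` could look roughly like the sketch below. It is one possible shape, not a prescribed implementation: the function names `blocked_env_paths` and `run_hook` are illustrative, and the rule that any `.env` or `.env.*` file except `.env.template` is treated as a secret file is an assumption that may need adjusting per repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit hook sketch: save as .git/hooks/pre-commit and make it executable.
# It refuses commits that stage real .env files while still allowing .env.template.

# Read paths from stdin and print those whose basename is ".env" or ".env.<suffix>",
# skipping ".env.template". (Illustrative helper name, not part of git.)
blocked_env_paths() {
  local path base
  while IFS= read -r path; do
    base="${path##*/}"
    case "$base" in
      .env.template) ;;                       # templates are meant to be committed
      .env|.env.*) printf '%s\n' "$path" ;;   # assumed to hold real secrets
    esac
  done
}

# Check the staged (added, copied, modified, renamed) files for blocked paths.
run_hook() {
  local blocked
  blocked="$(git diff --cached --name-only --diff-filter=ACMR | blocked_env_paths)"
  if [ -n "$blocked" ]; then
    printf 'Refusing to commit .env files:\n%s\n' "$blocked" >&2
    return 1
  fi
}

# Only run the check when invoked inside a git work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  run_hook || exit 1
fi
```

A hook like this complements the `.gitignore` check in Gate 1 rather than replacing it, since `git add -f` can still stage an ignored file.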