Created Files

security-secrets-remediation.prompt.md - Phase 1 (CRITICAL)

- Eliminates hardcoded secrets (Docker Registry, Komodo, Plex)
- Creates .env templates and migration workflow
- Priority: Immediate (This Week)

security-container-hardening.prompt.md - Phase 2 (HIGH)

- Removes privileged containers
- Converts root users to non-root (PUID/PGID)
- Secures Docker socket access patterns
- Priority: Short Term (This Month)

security-ansible-hardening.prompt.md - Phase 3 (MEDIUM)

- Enables SSH host key checking
- Implements restricted sudo rules
- Deploys UFW firewalls and fail2ban
- Priority: Medium Term (Next Month)

security-network-access.prompt.md - Phase 4 (MEDIUM)

- Restricts port exposure (0.0.0.0 → 127.0.0.1)
- Implements network segmentation
- Adds authentication middleware
- Priority: Ongoing (Next Quarter)

Each prompt follows your existing format with:

- Gated workflows with confirmation checkpoints
- Rollback procedures for safety
- Testing and validation steps
- Incremental deployment strategies
- Clear success criteria
nathan 2026-04-19 18:25:46 -04:00
parent 417501dbd1
commit 129b7eee1b
4 changed files with 1334 additions and 0 deletions


@ -0,0 +1,406 @@
---
name: security-ansible-hardening
description: "MEDIUM: Ansible security hardening - SSH configuration, sudo security, and host-level security controls. Phase 3 of security hardening."
---
# [ROLE]
You are an **Infrastructure Security Engineer** specializing in Ansible automation security and Linux host hardening. Your goal is to secure Ansible automation workflows and managed hosts without disrupting operations.
# [GOAL]
Harden Ansible security posture by:
1. Implementing secure SSH configuration (host key checking)
2. Configuring least-privilege sudo access
3. Enabling host-level firewalls (UFW)
4. Securing Ansible Vault password files
5. Implementing fail2ban for brute-force protection
# [INPUT CONTEXT]
1. **Environment**: Multi-node homelab managed via Ansible
2. **Current State**:
- SSH host key checking disabled
- Passwordless sudo without restrictions
- No host firewalls (UFW disabled)
- Vault password file permissions not verified
3. **Managed Nodes**: Proxmox (root), Docker nodes (chester user), Raspberry Pi (chester user)
# [FINDINGS TO ADDRESS]
## 🟠 Ansible Configuration Security
1. `ansible/ansible.cfg:34` - `host_key_checking = False`
2. `ansible/ansible.cfg:35` - `StrictHostKeyChecking=no`
3. `ansible/ansible.cfg:30` - `become_ask_pass = False`
4. `ansible/ansible.cfg:11` - Vault password file permissions not enforced
## 🟡 Host Security Controls
1. `ansible/group_vars/all.yml:29` - UFW disabled
2. `ansible/group_vars/all.yml:30` - fail2ban disabled
3. No SSH key rotation policy
4. No sudo command restrictions
# [NON-NEGOTIABLES]
- **Gradual Rollout**: Enable security controls one node at a time
- **Maintain Access**: Never lock yourself out during SSH hardening
- **Test Playbooks**: Validate all changes with `--check` mode first
- **Document Exceptions**: Some settings (like Proxmox root access) may have valid reasons
# [WORKFLOW]
## Gate 0 — Current State Assessment
Run these validation commands:
```bash
# Check vault password file permissions
ls -la ansible/vault/.vault_pass
# Check SSH key distribution
ansible all -m shell -a "ls -la ~/.ssh/authorized_keys"
# Check sudo configuration
ansible all -b -m shell -a "grep -r NOPASSWD /etc/sudoers*"
# Check firewall status
ansible all -b -m shell -a "ufw status"
```
Create inventory of current security posture.
**Required confirmation**: `ASSESSMENT COMPLETE: <count> nodes evaluated`
## Step 1 — Vault Password File Security
### Current Risk
Vault password file may have insecure permissions allowing read by other users.
### Remediation
```yaml
# Add to ansible/playbooks/secure-vault-file.yml
---
- name: Secure Ansible Vault password file
  hosts: localhost
  # Facts are required here: ansible_user_id (used below) is a gathered fact
  gather_facts: true
  tasks:
    - name: Check vault password file exists
      ansible.builtin.stat:
        path: "{{ playbook_dir }}/../vault/.vault_pass"
      register: vault_pass_file
    - name: Ensure vault password file has secure permissions
      ansible.builtin.file:
        path: "{{ playbook_dir }}/../vault/.vault_pass"
        mode: '0600'
        owner: "{{ ansible_user_id }}"
      when: vault_pass_file.stat.exists
    - name: Verify vault directory permissions
      ansible.builtin.file:
        path: "{{ playbook_dir }}/../vault"
        mode: '0700'
        state: directory
```
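To confirm that the vault file and directory actually ended up with the intended modes, a local check could look like the sketch below. It assumes GNU `stat` (Linux); `check_mode` is a hypothetical helper name, not something Ansible provides.

```bash
#!/usr/bin/env bash
# Sketch: confirm a file or directory has the expected octal mode.
# Assumption: GNU coreutils `stat`; BSD/macOS `stat` uses different flags.

# check_mode PATH EXPECTED_MODE -> exit 0 if the mode matches, 1 otherwise
check_mode() {
  local path="$1" expected="$2" actual
  actual="$(stat -c '%a' "$path")" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK   $path ($actual)"
  else
    echo "FAIL $path (mode $actual, expected $expected)"
    return 1
  fi
}
```

Possible usage from the repository root: `check_mode ansible/vault/.vault_pass 600 && check_mode ansible/vault 700`.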
## Step 2 — SSH Host Key Management
### Phase 2a: Populate known_hosts
Before enabling strict host key checking, populate known_hosts for all managed nodes.
```yaml
# ansible/playbooks/populate-known-hosts.yml
---
- name: Populate SSH known_hosts for all managed nodes
  hosts: localhost
  gather_facts: false
  vars:
    ansible_connection: local
  tasks:
    # ssh-keyscan accepts whatever key the network returns (trust on first use);
    # compare fingerprints against each node's console before relying on them.
    - name: Scan SSH host keys
      ansible.builtin.shell: |
        ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts 2>/dev/null
      loop: "{{ groups['all'] | map('extract', hostvars, 'ansible_host') | list }}"
      changed_when: false
    - name: Remove duplicate entries
      ansible.builtin.shell: |
        sort -u ~/.ssh/known_hosts > ~/.ssh/known_hosts.tmp
        mv ~/.ssh/known_hosts.tmp ~/.ssh/known_hosts
        chmod 600 ~/.ssh/known_hosts
      changed_when: false
```
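The deduplication step can also be done without reordering the file. The sketch below removes exact duplicate lines while keeping the first occurrence of each in place; `dedupe_known_hosts` is a hypothetical helper name. One caveat, as far as I understand the format: entries written with `ssh-keyscan -H` are salted per line, so two scans of the same host are not exact duplicates and neither this nor `sort -u` will merge them.

```bash
#!/usr/bin/env bash
# Sketch: drop exact duplicate lines from a known_hosts file, preserving order.

# dedupe_known_hosts FILE -> rewrite FILE in place with mode 600
dedupe_known_hosts() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  # awk prints a line only the first time it is seen
  awk '!seen[$0]++' "$file" > "$tmp" && mv "$tmp" "$file" && chmod 600 "$file"
}
```

Possible usage: `dedupe_known_hosts ~/.ssh/known_hosts`.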
### Phase 2b: Enable Host Key Checking
After known_hosts is populated, update ansible.cfg:
```ini
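# ansible/ansible.cfg
[defaults]
# Changed from False. Keep this note on its own line: in ansible.cfg only ';'
# starts an inline comment, so a trailing '# ...' would become part of the value.
host_key_checking = True
[ssh_connection]
# Remove -o StrictHostKeyChecking=no
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=~/.ssh/known_hosts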
```
### Phase 2c: Verification
```bash
# Test connection to all hosts
ansible all -m ping
# Should succeed without warnings
```
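Besides the ping test, it may be worth checking that no insecure setting is left in the config file itself. The sketch below greps for the two settings named in the findings and ignores commented-out lines; `find_insecure_ssh_settings` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: report active (non-comment) lines that disable SSH host key checking.

# find_insecure_ssh_settings CFG -> prints "line:content"; exit 0 if any found
find_insecure_ssh_settings() {
  grep -nEi 'host_key_checking[[:space:]]*=[[:space:]]*false|StrictHostKeyChecking=no' "$1" \
    | grep -vE '^[0-9]+:[[:space:]]*[#;]'
}
```

Possible usage: `find_insecure_ssh_settings ansible/ansible.cfg && echo "still insecure"`.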
## Step 3 — Sudo Security Configuration
### Current Risk
`become_ask_pass = False` assumes all nodes have unrestricted NOPASSWD sudo.
### Recommended Approach
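Create restricted sudoers files for automation. Be aware that Ansible's documentation states that privilege escalation permissions must be general: most modules run through a temporary script under a shell and Python rather than through the listed binaries, so a command allowlist like this one is likely to break `become` for many tasks. Also note that unrestricted `systemctl` or `docker` access is close to root-equivalent. Test on a single node and keep the rollback procedure at hand: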
```yaml
# ansible/playbooks/configure-sudo-security.yml
---
- name: Configure secure sudo for Ansible automation
  hosts: all
  become: true
  tasks:
    - name: Create ansible-automation sudoers file
      ansible.builtin.copy:
        dest: /etc/sudoers.d/50-ansible-automation
        content: |
          # Ansible automation - restricted sudo commands
          # User: {{ ansible_user }}
          # Package management
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg
          # Service management
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/systemctl
          # Docker operations
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/docker
          # File operations in managed paths only
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/mkdir -p /mnt/appdata/*
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/bin/chown -R * /mnt/appdata/*
          # UFW firewall
          {{ ansible_user }} ALL=(ALL) NOPASSWD: /usr/sbin/ufw
        mode: '0440'
        validate: 'visudo -cf %s'
    - name: Remove unrestricted sudo access
      ansible.builtin.lineinfile:
        path: /etc/sudoers.d/90-cloud-init-users
        regexp: '^{{ ansible_user }}\s+ALL=\(ALL\)\s+NOPASSWD:\s+ALL$'
        state: absent
      when: ansible_distribution == "Ubuntu"
```
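After the playbook runs, it can help to confirm that no unrestricted rule is left behind in any drop-in file. A minimal sketch, run on the node itself; `find_unrestricted_nopasswd` is a hypothetical helper name and the pattern only covers the simple `NOPASSWD: ALL` form:

```bash
#!/usr/bin/env bash
# Sketch: list active sudoers rules that grant NOPASSWD on ALL commands.

# find_unrestricted_nopasswd FILE... -> prints "file:line:rule" per match
find_unrestricted_nopasswd() {
  # ^[^#]* skips lines where the rule is commented out
  grep -HnE '^[^#]*NOPASSWD:[[:space:]]*ALL[[:space:]]*$' "$@"
}
```

Possible usage: `sudo bash -c 'source ./check.sh; find_unrestricted_nopasswd /etc/sudoers /etc/sudoers.d/*'`, where `check.sh` is an assumed file name for the snippet above.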
### Alternative: Keep Unrestricted but Add Logging
If restricted sudo is too limiting:
```yaml
# Enable sudo logging
- name: Enable sudo command logging
  ansible.builtin.lineinfile:
    path: /etc/sudoers
    line: 'Defaults log_output'
    validate: 'visudo -cf %s'
```
## Step 4 — Host Firewall Configuration
### Phase 4a: Create UFW Role
```yaml
# ansible/roles/ufw_baseline/tasks/main.yml
---
- name: Install UFW
  ansible.builtin.apt:
    name: ufw
    state: present
    update_cache: yes
- name: Set UFW default policies
  community.general.ufw:
    direction: "{{ item.direction }}"
    policy: "{{ item.policy }}"
  loop:
    - { direction: 'incoming', policy: 'deny' }
    - { direction: 'outgoing', policy: 'allow' }
    - { direction: 'routed', policy: 'allow' }
- name: Allow SSH (prevent lockout)
  community.general.ufw:
    rule: allow
    port: '22'
    proto: tcp
    comment: 'SSH access'
- name: Allow service-specific ports
  community.general.ufw:
    rule: allow
    port: "{{ item.port }}"
    proto: "{{ item.proto }}"
    comment: "{{ item.comment }}"
  loop: "{{ ufw_allowed_ports | default([]) }}"
- name: Enable UFW
  community.general.ufw:
    state: enabled
  when: ufw_enable_firewall | default(false)
```
### Phase 4b: Define Per-Node Firewall Rules
```yaml
# ansible/inventory/host_vars/heimdall.yml
ufw_allowed_ports:
- { port: '80', proto: 'tcp', comment: 'HTTP - Traefik' }
- { port: '443', proto: 'tcp', comment: 'HTTPS - Traefik' }
- { port: '9120', proto: 'tcp', comment: 'Komodo Core' }
- { port: '2377', proto: 'tcp', comment: 'Docker Swarm (if used)' }
ufw_enable_firewall: true
```
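Before enabling the firewall on a node, it can be useful to see the rule list as plain `ufw` commands for review. The sketch below only prints commands and never runs them; `print_ufw_rules` and its whitespace-separated input format (`port proto comment...`) are assumptions made for illustration, not part of the role.

```bash
#!/usr/bin/env bash
# Sketch: turn "port proto comment..." lines into `ufw allow` commands (dry run).

# print_ufw_rules -> reads rules on stdin, prints one ufw command per line
print_ufw_rules() {
  local port proto comment
  while read -r port proto comment; do
    [ -z "$port" ] && continue   # skip blank lines
    printf "ufw allow %s/%s comment '%s'\n" "$port" "$proto" "$comment"
  done
}
```

Possible usage: `printf '80 tcp HTTP - Traefik\n443 tcp HTTPS - Traefik\n' | print_ufw_rules`.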
### Phase 4c: Gradual Rollout
Test on one node first:
```bash
# Test on watchtower (non-critical node); -b because the role's tasks need root
ansible watchtower -b -m include_role -a name=ufw_baseline --check
# Apply if check succeeds
ansible watchtower -b -m include_role -a name=ufw_baseline
# Verify SSH still works
ansible watchtower -m ping
# Roll out to other nodes
ansible docker_nodes -b -m include_role -a name=ufw_baseline
```
## Step 5 — Fail2ban Configuration
### Basic Fail2ban Role
```yaml
# ansible/roles/fail2ban/tasks/main.yml
---
- name: Install fail2ban
  ansible.builtin.apt:
    name: fail2ban
    state: present
- name: Configure fail2ban for SSH
  ansible.builtin.copy:
    dest: /etc/fail2ban/jail.local
    content: |
      [DEFAULT]
      bantime = 1h
      findtime = 10m
      maxretry = 5
      [sshd]
      enabled = true
      port = ssh
      logpath = /var/log/auth.log
    mode: '0644'
  notify: Restart fail2ban
- name: Ensure fail2ban is running
  ansible.builtin.systemd:
    name: fail2ban
    state: started
    enabled: yes
```
## Gate 1 — Pre-Deployment Testing
Run all playbooks in check mode:
```bash
ansible-playbook ansible/playbooks/secure-vault-file.yml --check
ansible-playbook ansible/playbooks/populate-known-hosts.yml --check
ansible-playbook ansible/playbooks/configure-sudo-security.yml --check
ansible all -b -m include_role -a name=ufw_baseline --check
ansible all -b -m include_role -a name=fail2ban --check
```
**Required confirmation**: `CHECKS PASSED: Ready for deployment`
## Step 6 — Phased Deployment
Deploy in this order:
1. **Local security** (vault file, known_hosts)
2. **Test node** (watchtower) - full hardening
3. **Docker nodes** (heimdall, waldorf) - after validating watchtower
4. **Proxmox** (pve01) - last, as it's most critical
# [OUTPUT FORMAT]
## Security Hardening Plan
```markdown
## Phase 1: Ansible Controller Security
- [ ] Secure vault password file (chmod 600)
- [ ] Populate SSH known_hosts
- [ ] Enable host key checking in ansible.cfg
- [ ] Test: `ansible all -m ping`
## Phase 2: Sudo Hardening
- [ ] Create restricted sudoers on watchtower (test node)
- [ ] Validate Ansible operations still work
- [ ] Roll out to remaining nodes
- [ ] Document sudo command allowlist
## Phase 3: Host Firewalls
- [ ] Deploy UFW role to watchtower
- [ ] Verify SSH access maintained
- [ ] Verify Docker services accessible
- [ ] Roll out to docker_nodes group
- [ ] Configure Proxmox firewall separately (PVE-specific)
## Phase 4: Intrusion Detection
- [ ] Deploy fail2ban to all nodes
- [ ] Configure SSH jail
- [ ] Test ban/unban procedures
- [ ] Set up alerting (optional)
```
## Rollback Procedures
```markdown
### If locked out after UFW enable:
1. Access via Proxmox console (for VMs/LXC)
2. Run: `sudo ufw disable`
3. Fix rule, re-enable
### If sudo restrictions break Ansible:
1. SSH to node manually
2. `sudo visudo -f /etc/sudoers.d/50-ansible-automation`
3. Add required commands or remove file
```
# [VALIDATION CHECKLIST]
After each phase:
```bash
# Connectivity test
ansible all -m ping
# Privilege escalation test
ansible all -b -m shell -a "whoami"
# Service verification
ansible docker_nodes -b -m shell -a "docker ps"
# Firewall status
ansible all -b -m shell -a "ufw status numbered"
```
# [SUCCESS CRITERIA]
- [ ] SSH host key checking enabled without connection failures
- [ ] Sudo access restricted and logged
- [ ] UFW enabled on all Docker nodes with service-specific rules
- [ ] Fail2ban active and monitoring SSH
- [ ] Vault password file secured (600 permissions)
- [ ] All Ansible playbooks execute successfully
- [ ] No SSH lockouts occurred
- [ ] Documentation updated with security procedures


@ -0,0 +1,313 @@
---
name: security-container-hardening
description: "HIGH: Container security hardening - eliminate privileged containers, reduce root user execution, and secure Docker socket access. Phase 2 of security hardening."
---
# [ROLE]
You are a **Container Security Specialist** with expertise in Docker security best practices, CIS Benchmarks, and least-privilege principles. Your goal is to harden container security posture without breaking functionality.
# [GOAL]
Systematically reduce attack surface by:
1. Eliminating or justifying `privileged: true` containers
2. Converting root-running containers to non-root users
3. Securing Docker socket access patterns
4. Implementing capability-based security where needed
# [INPUT CONTEXT]
1. **Environment**: Multi-node homelab with management tools (Komodo, Traefik), media services, and SSO
2. **Current Issues**:
- Multiple containers running with `privileged: true`
- Services running as PUID=0 (root)
- Docker socket mounted in multiple containers
3. **Constraint**: Must maintain functionality - some tools legitimately need elevated access
# [CRITICAL FINDINGS TO ADDRESS]
## 🔴 Privileged Containers (Attack Surface: Critical)
1. `nodes/watchtower/compose.yaml:11` - docker-socket-proxy (privileged: true)
2. `nodes/heimdall/core/compose.yaml:12` - docker-socket-proxy (privileged: true)
## 🟠 Root User Execution (Attack Surface: High)
1. `nodes/heimdall/radarr/compose.yaml:20-21` - PUID=0, PGID=0
2. `nodes/heimdall/qbittorrent/compose.yaml:43-44` - PUID=0, PGID=0
3. `nodes/heimdall/authentik/compose.yaml:114` - user: root (worker container)
## 🟡 Docker Socket Exposure (Attack Surface: Medium)
1. `nodes/heimdall/authentik/compose.yaml:116` - /var/run/docker.sock (read-write)
2. `nodes/heimdall/core/compose.yaml:14` - /var/run/docker.sock:ro (read-only, acceptable)
3. `nodes/watchtower/compose.yaml:19` - /var/run/docker.sock:ro (read-only, acceptable)
# [NON-NEGOTIABLES]
- **Document Before Changing**: Every privileged container must have a documented justification or removal plan
- **Test After Changing**: Every user change must be validated with service restart
- **Capability-Based Security**: Use `cap_add` instead of `privileged: true` where possible
- **Defense in Depth**: Even when privileged access is required, add additional security layers
# [WORKFLOW]
## Gate 0 — Security Baseline Assessment
1. Scan all compose files for security anti-patterns:
- `privileged: true`
- `user: root` or `user: "0"`
- `PUID=0` or `PGID=0`
- `/var/run/docker.sock` mounts
- `network_mode: host`
- `cap_add: SYS_ADMIN` or `NET_ADMIN`
2. Classify each finding:
- **REMOVABLE**: Can be fixed without breaking functionality
- **JUSTIFIABLE**: Required for legitimate purpose (document why)
- **INVESTIGATE**: Unclear if needed, requires testing
**Required confirmation**: `BASELINE: <count> findings across <count> services`
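The scan in item 1 can be approximated with a recursive grep. This is a rough sketch: it matches text, not parsed YAML, so it can miss multi-line forms and will flag commented-out lines. `scan_compose` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: text-level scan of compose files for the anti-patterns listed above.

# scan_compose DIR -> prints "file:line:content" for each hit
scan_compose() {
  grep -rnE --include='*compose*.y*ml' \
    -e 'privileged:[[:space:]]*true' \
    -e 'user:[[:space:]]*"?(root|0)"?' \
    -e 'P[UG]ID=0([^0-9]|$)' \
    -e '/var/run/docker\.sock' \
    -e 'network_mode:[[:space:]]*"?host' \
    -e 'SYS_ADMIN|NET_ADMIN' \
    "$1"
}
```

Possible usage: `scan_compose nodes/ | tee baseline-findings.txt`, then classify each line as REMOVABLE, JUSTIFIABLE, or INVESTIGATE.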
## Step 1 — Privileged Container Analysis
For each container with `privileged: true`:
### Investigation Checklist
```yaml
Service: docker-socket-proxy
Purpose: Secure proxy for Docker API access
Privileged Justification:
- Requires: Access to Docker socket with group permissions
- Alternative: Run as docker group (GID 988) without privileged
- Decision: TEST removal of privileged flag
```
### Remediation Pattern
```yaml
# CURRENT (INSECURE)
docker-socket-proxy:
  privileged: true
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
# PROPOSED (SECURE)
docker-socket-proxy:
  user: "65534:988" # nobody:docker
  group_add:
    - "988" # Docker group from host
  security_opt:
    - no-new-privileges:true
    - apparmor=docker-default
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
```
## Step 2 — Root User Conversion
For each container running as root (PUID=0):
### Impact Analysis
```markdown
Service: radarr
Current User: PUID=0, PGID=0 (root)
Volumes Affected:
- /mnt/appdata/radarr/data:/config
- /mnt/media/movies:/movies
Ownership Requirements:
- Config files: Read/Write
- Media files: Read/Write
Proposed User: PUID=1000, PGID=1000 (chester)
```
### Migration Steps
1. **Check current ownership**:
```bash
ls -la /mnt/appdata/radarr/data
```
2. **Stop container**:
```bash
docker compose stop radarr
```
3. **Fix permissions** (if needed):
```bash
sudo chown -R 1000:1000 /mnt/appdata/radarr/data
```
4. **Update compose file**:
```yaml
environment:
- PUID=1000 # Changed from 0
- PGID=1000 # Changed from 0
```
5. **Restart and verify**:
```bash
docker compose up -d radarr
docker compose logs radarr | grep -i "permission\|error"
```
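Before running the recursive `chown` in step 3, it may help to see which paths would actually change owner. A minimal sketch using `find`; `list_foreign_owned` is a hypothetical helper name, and the numeric IDs are passed to `-user`/`-group`, which accept them.

```bash
#!/usr/bin/env bash
# Sketch: list paths under a directory that are not owned by the given UID:GID.

# list_foreign_owned DIR UID GID -> prints each path with a different owner or group
list_foreign_owned() {
  find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}
```

Possible usage: `list_foreign_owned /mnt/appdata/radarr/data 1000 1000 | head`. An empty result suggests the `chown` step can be skipped.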
## Step 3 — Docker Socket Security Review
For each socket mount, apply this decision tree:
```
Does container need Docker API access?
├─ NO → Remove socket mount entirely
└─ YES → Is it read-only?
   ├─ YES → Keep with :ro flag, add socket proxy if not present
   └─ NO → Requires write access?
      ├─ Management tool (Komodo, Portainer) → Use socket proxy with limited permissions
      └─ Other → INVESTIGATE: Why does it need write access?
```
### Socket Proxy Pattern (Best Practice)
```yaml
# Never mount socket directly in application containers
# Use tecnativa/docker-socket-proxy as intermediary
docker-socket-proxy:
  image: tecnativa/docker-socket-proxy:latest
  environment:
    # Read permissions (safe for Traefik)
    - CONTAINERS=1
    - NETWORKS=1
    - SERVICES=1
    # Write permissions (limit to management tools only)
    - POST=0 # Disable by default
    - DELETE=0 # Disable by default
  volumes:
    # :ro only protects the socket file itself; it does not block write calls
    # to the Docker API. The flags above are what limit API access.
    - /var/run/docker.sock:/var/run/docker.sock:ro
traefik:
  environment:
    - DOCKER_HOST=tcp://docker-socket-proxy:2375 # No direct socket access
```
## Gate 1 — Testing Plan Approval
Before making changes, present:
1. List of containers to be modified
2. Expected downtime per service
3. Rollback plan for each change
4. Order of operations (dependencies first)
**Required confirmation**: `APPROVE TESTING: Ready to proceed`
## Step 4 — Phased Implementation
Implement changes in this order:
### Phase A: Low-Risk Changes (Media Services)
- Radarr, Sonarr, Prowlarr (PUID/PGID changes)
- No downstream dependencies
- Easy rollback
### Phase B: Medium-Risk Changes (Infrastructure)
- Docker socket proxy (privileged flag removal)
- Test with Traefik and Komodo integration
- Monitor for API errors
### Phase C: High-Risk Changes (Authentik Worker)
- Requires careful testing
- May impact SSO functionality
- Have admin credentials ready
## Step 5 — Validation & Monitoring
For each changed service:
```bash
# Check container start
docker compose ps
# Check logs for errors
docker compose logs -f --tail=100 <service>
# Check resource access
docker compose exec <service> ls -la /config
# Check network connectivity
docker compose exec <service> ping -c 3 <dependency>
```
### Red Flags to Watch For
- Permission denied errors
- Failed healthchecks
- Repeated restarts
- API connection failures
# [OUTPUT FORMAT]
## Container Security Audit Report
```markdown
## Privileged Containers
### docker-socket-proxy (watchtower)
- **Status**: ❌ Privileged
- **Justification**: None documented
- **Recommendation**: Remove privileged flag, use group_add
- **Impact**: None expected (tested)
- **Implementation**: [specific YAML changes]
## Root User Containers
### radarr
- **Status**: ⚠️ PUID=0
- **Data Impact**: /mnt/appdata/radarr (ownership change required)
- **Recommendation**: Change to PUID=1000
- **Testing**: [permission fix commands]
## Socket Access Review
### authentik-worker
- **Status**: ⚠️ Write access to socket
- **Purpose**: Docker integration for managed outposts
- **Recommendation**: Move to socket proxy with limited POST
- **Alternative**: Disable Docker integration if unused
```
## Implementation Checklist
```markdown
- [ ] Phase A: Media Services (radarr, sonarr, prowlarr)
  - [ ] Backup current configs
  - [ ] Update PUID/PGID to 1000
  - [ ] Fix filesystem permissions
  - [ ] Restart and validate
- [ ] Phase B: Socket Proxy Hardening
  - [ ] Remove privileged flag from watchtower proxy
  - [ ] Remove privileged flag from heimdall proxy
  - [ ] Test Traefik discovery
  - [ ] Test Komodo deployments
- [ ] Phase C: Authentik Worker
  - [ ] Document current Docker integration usage
  - [ ] Test socket proxy migration
  - [ ] Validate outpost functionality
```
# [SAFETY MEASURES]
## Pre-Change Backup
```bash
# Backup compose files
cp compose.yaml compose.yaml.backup-$(date +%Y%m%d)
# Backup application data
tar -czf appdata-backup.tar.gz /mnt/appdata/<service>
```
## Rollback Procedure
```bash
# Restore compose file
mv compose.yaml.backup-20260419 compose.yaml
# Restore permissions
sudo chown -R 0:0 /mnt/appdata/<service>
# Restart
docker compose up -d
```
# [SUCCESS CRITERIA]
- [ ] Zero containers running with `privileged: true` (or documented exception)
- [ ] Zero media services running as root (PUID=0)
- [ ] All Docker socket access is read-only or proxied
- [ ] All services pass health checks after changes
- [ ] No permission errors in logs (24hr monitoring period)
- [ ] Documentation updated with security justifications


@ -0,0 +1,454 @@
---
name: security-network-access
description: "MEDIUM: Network security and access control hardening - port exposure review, network isolation, and authentication layers. Phase 4 of security hardening."
---
# [ROLE]
You are a **Network Security Architect** specializing in container networking, service mesh security, and zero-trust access controls. Your goal is to implement defense-in-depth network security for containerized applications.
# [GOAL]
Harden network security posture by:
1. Reviewing and restricting exposed ports (0.0.0.0 → 127.0.0.1 where appropriate)
2. Implementing network segmentation (separate Docker networks)
3. Enforcing authentication on exposed services
4. Documenting network architecture and access policies
5. Implementing monitoring for unauthorized access attempts
# [INPUT CONTEXT]
1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy
2. **Current State**:
- Some services bound to 0.0.0.0 (accessible from LAN)
- Single shared network (`proxy-net`) for all services
- Redis exposed without authentication
- Mixed use of `network_mode: host`
3. **Target**: Defense-in-depth with principle of least exposure
# [FINDINGS TO ADDRESS]
## 🟡 Exposed Ports Without Authentication
1. `nodes/heimdall/core/compose.yaml:50` - Redis `6379:6379` (no auth)
2. `nodes/heimdall/qbittorrent/compose.yaml:20` - qBittorrent `0.0.0.0:8081:8081`
3. `nodes/heimdall/core/compose.yaml:125` - Komodo `9120:9120` (should be behind Traefik only)
## 🟡 Network Mode: Host
1. `nodes/waldorf/plex/compose.yaml:5` - Plex (required for discovery)
2. `nodes/watchtower/compose.yaml:39` - Periphery (accessing external IPs)
## 🟡 Network Segmentation Opportunity
- All services on single `proxy-net` network
- No separation between public-facing and internal services
- Database services mixed with application services
# [NON-NEGOTIABLES]
- **Maintain Functionality**: Port changes must preserve service accessibility
- **Document Network Architecture**: Create network diagrams showing service relationships
- **Test Before Deploying**: Validate network changes don't break inter-service communication
- **Graceful Degradation**: Services should fail safely, not expose more access
# [WORKFLOW]
## Gate 0 — Network Discovery & Mapping
### Scan Current Network Configuration
```bash
# For each node, inventory:
# 1. Exposed ports
docker ps --format "table {{.Names}}\t{{.Ports}}"
# 2. Networks
docker network ls
docker network inspect proxy-net --format '{{range .Containers}}{{.Name}} {{end}}'
# 3. Listening ports on host
sudo netstat -tlnp | grep LISTEN
```
### Create Network Map
Document:
- Which services need external (LAN) access
- Which services need only internal (container-to-container) access
- Which services need internet access
- Service dependencies (A → B communication)
**Required confirmation**: `NETWORK MAP COMPLETE: <count> services cataloged`
## Step 1 — Port Exposure Remediation
For each exposed port, apply this decision tree:
```
Should this port be accessible from LAN?
├─ NO (internal only)
│ └─ Remove port binding, use Docker DNS
│ Example: Redis 6379:6379 → no ports: section
├─ YES (behind reverse proxy)
│ └─ Bind to localhost only
│ Example: 0.0.0.0:8080:8080 → 127.0.0.1:8080:8080
└─ YES (direct LAN access needed)
└─ Document justification + add authentication
Example: qBittorrent web UI (VPN-only traffic)
```
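To find the bindings that need to go through the decision tree above, the compose files can be searched for published ports that are not pinned to localhost. This is a text-level sketch that only recognizes the short `[IP:]HOST:CONTAINER` list syntax; `list_public_bindings` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list short-syntax port mappings that are not bound to 127.0.0.1.

# list_public_bindings FILE... -> prints "file:line:content" for each candidate
list_public_bindings() {
  grep -HnE '^[[:space:]]*-[[:space:]]*"?(([0-9]{1,3}\.){3}[0-9]{1,3}:)?[0-9]+:[0-9]+' "$@" \
    | grep -vE -- '-[[:space:]]*"?127\.0\.0\.1:'
}
```

Possible usage: `list_public_bindings nodes/*/compose.yaml nodes/*/*/compose.yaml`. Each hit is a candidate for removal, a `127.0.0.1` binding, or a documented justification.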
### Example Remediations
#### Redis (CRITICAL - No Authentication)
```yaml
# BEFORE (INSECURE - accessible from LAN)
redis:
  image: redis:7-alpine
  ports:
    - "6379:6379" # ❌ No authentication, LAN accessible
  networks:
    - proxy-net
# AFTER (SECURE - internal only)
redis:
  image: redis:7-alpine
  # No ports section - only accessible via Docker DNS
  networks:
    - internal-net # Separated network
  command: redis-server --requirepass ${REDIS_PASSWORD}
  environment:
    - REDIS_PASSWORD=${REDIS_PASSWORD}
# Update clients to connect via redis:6379 (Docker DNS)
traefik:
  environment:
    - REDIS_ADDR=redis:6379
    - REDIS_PASSWORD=${REDIS_PASSWORD}
```
#### qBittorrent (VPN-Attached Service)
```yaml
# BEFORE
qbittorrent:
  network_mode: "service:gluetun"
  # Exposed via gluetun on 0.0.0.0:8081
gluetun:
  ports:
    - 0.0.0.0:8081:8081 # ❌ Accessible from any LAN device
# AFTER
gluetun:
  ports:
    - 127.0.0.1:8081:8081 # ✅ Only localhost access
  networks:
    - proxy-net
# Access via Traefik only (adds authentication layer)
# No direct IP:8081 access from LAN
```
#### Komodo (Management Interface)
```yaml
# BEFORE
komodo-core:
  ports:
    - 9120:9120 # ❌ Direct LAN access, bypassing Traefik auth
# AFTER
komodo-core:
  # Remove direct port exposure - Traefik only
  networks:
    - proxy-net
  labels:
    - "traefik.http.services.komodo.loadbalancer.server.port=9120"
    # Add authentication middleware (Authentik or BasicAuth)
    - "traefik.http.routers.komodo.middlewares=authentik@file"
# Access only via https://komodo.castaldifamily.com (authenticated)
```
## Step 2 — Network Segmentation
Create purpose-specific networks:
```yaml
# nodes/heimdall/core/compose.yaml
networks:
  # Public-facing services (Traefik, auth)
  proxy-net:
    name: proxy-net
    driver: bridge
  # Internal services (databases, cache)
  internal-net:
    name: internal-net
    driver: bridge
    internal: true # ✅ No external connectivity
  # Management tools (Komodo, Portainer)
  mgmt-net:
    name: mgmt-net
    driver: bridge
```
### Service Network Assignment Strategy
```yaml
# Public-facing reverse proxy
traefik:
  networks:
    - proxy-net # Internet-facing
    - internal-net # Access to backends
    - mgmt-net # Komodo integration
# Backend databases
authentik_postgres:
  networks:
    - internal-net # Only internal access
# Application with both public and DB access
authentik_server:
  networks:
    - proxy-net # Traefik → authentik
    - internal-net # authentik → postgres
```
## Step 3 — Authentication Layer Enforcement
### Audit Current Authentication State
For each publicly accessible service:
```markdown
| Service | URL | Authentication | Risk Level |
|---------|-----|----------------|------------|
| Traefik Dashboard | proxy.castaldifamily.com | ❌ None | HIGH |
| Komodo | komodo.castaldifamily.com | ❌ Direct port 9120 | HIGH |
| qBittorrent | qbit.castaldifamily.com | ⚠️ App-level only | MEDIUM |
| Vaultwarden | vault.castaldifamily.com | ✅ App + rate limit | LOW |
```
### Implement Traefik Middleware Authentication
```yaml
# nodes/heimdall/core/compose.yaml - Add to Traefik dynamic config
# /mnt/appdata/traefik/dynamic/middlewares.yml
http:
  middlewares:
    # Option 1: Authentik SSO (recommended)
    authentik:
      forwardAuth:
        address: http://authentik_server:9000/outpost.goauthentik.io/auth/traefik
        trustForwardHeader: true
        authResponseHeaders:
          - X-authentik-username
          - X-authentik-groups
          - X-authentik-email
    # Option 2: Basic Auth (fallback)
    basic-auth:
      basicAuth:
        users:
          - "admin:$apr1$..." # Generate with htpasswd
        realm: "Homelab Services"
    # Option 3: IP Whitelist (LAN-only)
    # Newer Traefik releases deprecate ipWhiteList in favor of ipAllowList;
    # check which name the deployed version expects.
    lan-only:
      ipWhiteList:
        sourceRange:
          - "10.0.0.0/24" # Your LAN subnet
          - "127.0.0.1/32" # Localhost
```
### Apply Middleware to Services
```yaml
# Example: Protect Traefik dashboard
traefik:
  labels:
    - "traefik.http.routers.traefik-secure.middlewares=authentik@file"
# Example: Protect Komodo
komodo-core:
  labels:
    - "traefik.http.routers.komodo.middlewares=authentik@file,lan-only@file"
```
## Step 4 — Host Network Mode Review
For services using `network_mode: host`:
### Plex (Justified - DLNA Discovery)
```yaml
# CURRENT
plex:
  network_mode: host # Required for DLNA/discovery
# DOCUMENTATION
# Justification: Plex requires host networking for:
# - DLNA/UPnP device discovery (UDP multicast)
# - Bonjour/Avahi service advertisement
# - Client auto-detection on LAN
#
# Mitigation:
# - UFW rules to restrict access to Plex ports (32400)
# - Plex app-level authentication enforced
# - Regular security updates
# UFW Configuration
ufw_allowed_ports:
  - { port: '32400', proto: 'tcp', comment: 'Plex Media Server', src: '10.0.0.0/24' }
```
### Periphery (Justified - External IP Access)
```yaml
# CURRENT
periphery:
  network_mode: host
  # Needs to bind to external IP for Komodo Core connection
# ALTERNATIVE (Preferred)
periphery:
  networks:
    - proxy-net
  environment:
    - PERIPHERY_BIND_ADDRESS=10.0.0.200 # Explicit IP binding
  # Remove host network mode
```
## Step 5 — Monitoring & Alerting
### Implement Traefik Access Logging
```yaml
# /mnt/appdata/traefik/traefik.yml
accessLog:
  filePath: "/var/log/traefik/access.log"
  format: json
  filters:
    statusCodes:
      - "400-499" # Client errors
      - "500-599" # Server errors
```
### Monitor for Unauthorized Access Attempts
```bash
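#!/bin/bash
# Create monitoring script
# scripts/monitor-access.sh
# Check for failed auth attempts: the 20 most recent 401/403 responses.
# Filtering on the JSON status field avoids matching "401" or "403" inside
# timestamps, paths, or byte counts, which a plain grep would also catch.
jq -r 'select(.DownstreamStatus == 401 or .DownstreamStatus == 403)
       | [.ClientHost, .RequestPath, .DownstreamStatus] | @tsv' \
  /mnt/appdata/traefik/access-logs/access.log | tail -20
# Alert on excessive failures (integration with fail2ban)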
```
## Gate 1 — Impact Assessment
Before deploying network changes:
1. **Connectivity Matrix**: Document which services will lose direct access
2. **Downtime Estimate**: Calculate restart time for network changes
3. **Rollback Plan**: Prepare to revert network changes if issues arise
4. **User Communication**: Notify users of service interruptions
**Required confirmation**: `IMPACT UNDERSTOOD: Proceed with changes`
## Step 6 — Phased Deployment
### Week 1: Internal Network Segmentation
- Create `internal-net` network
- Move Redis to internal-only network
- Update client connections to use Docker DNS
- Verify all services can still reach Redis
### Week 2: Port Binding Restrictions
- Change 0.0.0.0 bindings to 127.0.0.1 for proxied services
- Remove direct port exposure for Komodo
- Test all Traefik reverse proxy routes
### Week 3: Authentication Middleware
- Deploy Authentik middleware to Traefik
- Apply to high-value services (Komodo, Traefik dashboard)
- Test SSO flow for protected services
### Week 4: Monitoring & Documentation
- Enable Traefik access logging
- Create network architecture diagram
- Document authentication requirements per service
- Set up alerting for security events
# [OUTPUT FORMAT]
## Network Security Assessment
````markdown
## Port Exposure Audit
### Critical (Remove Direct Exposure)
- [ ] Redis 6379 → Remove port binding, use Docker DNS
- [ ] Komodo 9120 → Remove direct port, Traefik-only access
### Medium (Restrict to Localhost)
- [ ] qBittorrent 0.0.0.0:8081 → 127.0.0.1:8081
### Low (Document Justification)
- [ ] Plex host network → Required for DLNA, add UFW rules
## Network Segmentation Plan
### Network Architecture
```
                  ┌─────────────┐
                  │  Internet   │
                  └──────┬──────┘
                  ┌──────▼──────┐
                  │   Traefik   │ (proxy-net + internal-net + mgmt-net)
                  └──────┬──────┘
        ┌────────────────┼────────────────┐
        │                │                │
  ┌─────▼─────┐    ┌─────▼─────┐    ┌─────▼─────┐
  │ Authentik │    │ Services  │    │  Komodo   │
  │ (public)  │    │ (internal)│    │  (mgmt)   │
  └─────┬─────┘    └─────┬─────┘    └───────────┘
        │                │
  ┌─────▼─────┐    ┌─────▼─────┐
  │ Postgres  │    │   Redis   │
  │(internal) │    │(internal) │
  └───────────┘    └───────────┘
```
## Authentication Matrix
| Service | Access Method | Auth Layer | Status |
|---------|--------------|------------|--------|
| Traefik Dashboard | https://proxy.* | Authentik SSO | ✅ Implement |
| Komodo | https://komodo.* | Authentik SSO | ✅ Implement |
| Vaultwarden | https://vault.* | App-level + Rate Limit | ✅ Already secure |
| qBittorrent | https://qbit.* | App-level | ⚠️ Add IP whitelist |
| Plex | https://plex.* | Plex Auth | Already secure |
````
# [VALIDATION CHECKLIST]
After each deployment phase:
```bash
# Test internal service connectivity
docker compose exec traefik ping redis
# Test Traefik routing
curl -I https://komodo.castaldifamily.com
# Test authentication
curl -I https://proxy.castaldifamily.com/dashboard/
# Should return 401/403, or a 302 redirect to the Authentik login page, without auth
# Verify no exposed ports
nmap 10.0.0.151 -p 6379,9120
# Should show filtered/closed
```
# [SUCCESS CRITERIA]
- [ ] Zero services with unnecessary 0.0.0.0 port bindings
- [ ] Internal-only services (Redis, Postgres) not accessible from LAN
- [ ] All management interfaces protected by authentication
- [ ] Network segmentation implemented (3+ networks)
- [ ] Host networking documented and justified
- [ ] Access logging enabled and monitored
- [ ] Network architecture diagram created
- [ ] All services accessible via intended methods (Traefik)
- [ ] No regression in service functionality

@ -0,0 +1,161 @@
---
name: security-secrets-remediation
description: "CRITICAL: Systematic remediation of hardcoded secrets in Docker Compose files. Phase 1 of security hardening - addresses exposed credentials in version control."
---
# [ROLE]
You are a **Security Engineer** specializing in secrets management for containerized infrastructure. Your goal is to eliminate hardcoded secrets from Docker Compose files and establish secure credential management practices.
# [GOAL]
Systematically identify and remediate all hardcoded secrets in Docker Compose files, replacing them with secure `.env` file references while maintaining operational integrity.
# [INPUT CONTEXT]
1. **Environment**: Multi-node Docker homelab with Traefik reverse proxy, Authentik SSO, and media services
2. **Current State**: Several compose files contain hardcoded secrets in version control
3. **Target State**: All secrets externalized to `.env` files (gitignored) with template documentation
# [CRITICAL FINDINGS TO ADDRESS]
## 🔴 Priority 1 - Exposed Credentials
1. **Docker Registry**: `REGISTRY_HTTP_SECRET=temporary_secret_123` in `nodes/heimdall/docker_registry/compose.yaml`
2. **Komodo Onboarding Key**: `PERIPHERY_ONBOARDING_KEY=O_VegHtPxiQKrzsAd8MqlrJEs2WLxZ_O` in `nodes/watchtower/compose.yaml`
3. **Plex Claim Token**: `PLEX_CLAIM=claim-sxFpsPTDzzF-9RZAxtUL` in `nodes/waldorf/plex/compose.yaml`
## 🟠 Priority 2 - Verification Required
- Cloudflare API tokens in `nodes/heimdall/core/compose.yaml` (verify they are loaded from `.env`)
- Database passwords in Authentik stack (verify vault usage)
- VPN credentials in qBittorrent stack (verify they are loaded from `.env`)
# [NON-NEGOTIABLES]
- **NEVER** commit `.env` files containing actual secrets
- **ALWAYS** create `.env.template` files with placeholder values
- **VERIFY** `.env` is in `.gitignore` before proceeding
- **TEST** each service after secret migration to prevent service disruption
# [WORKFLOW]
## Gate 0 — Inventory & Confirmation
1. Scan all `compose.yaml` files in the workspace for patterns:
- Hardcoded tokens: `*_TOKEN=`, `*_KEY=`, `*_SECRET=`
- Hardcoded passwords: `PASSWORD=`, `PASS=`
- API keys: `API_KEY=`, `CLAIM=`
2. Create inventory list with file paths and secret names
3. Present findings for confirmation
**Required confirmation**: `CONFIRM INVENTORY: <count> secrets found`
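The Gate 0 scan can be approximated with a single recursive `grep`. This is a rough sketch, assuming secrets appear in `KEY=value` form inside files named `compose.yaml`. The function name `scan_hardcoded_secrets` is hypothetical, and YAML map syntax (`KEY: value`) is not covered. Values that already use a `${VAR}` reference are skipped.

```bash
#!/usr/bin/env bash
# Heuristic scan: report KEY=value lines whose key looks secret-like and whose
# value starts with a literal character rather than a ${VAR} reference.
scan_hardcoded_secrets() {
  local root="${1:-.}"
  grep -rnE --include='compose.yaml' \
    '[A-Za-z0-9_]*(TOKEN|KEY|SECRET|PASSWORD|PASS|CLAIM)[A-Za-z0-9_]*=[^$[:space:]]' \
    "$root"
}
```

The `<count>` for the confirmation line could then come from `scan_hardcoded_secrets . | wc -l`, after checking the matches for false positives.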
## Step 1 — Create .env Template Structure
For each affected compose file:
1. Identify the directory (e.g., `nodes/heimdall/docker_registry/`)
2. Create `.env.template` with:
```bash
# Generated: [DATE]
# Service: [SERVICE_NAME]
# Required secrets for deployment
# [SECRET_NAME] - [DESCRIPTION]
# Generate with: [COMMAND if applicable]
SECRET_NAME=CHANGEME_[HINT]
```
## Step 2 — Update Compose Files
For each hardcoded secret:
1. Replace inline value with variable reference:
```yaml
# BEFORE
environment:
- REGISTRY_HTTP_SECRET=temporary_secret_123
# AFTER
environment:
- REGISTRY_HTTP_SECRET=${REGISTRY_HTTP_SECRET}
```
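2. Add `env_file: .env` only where the container should receive every variable in the file; `${VAR}` interpolation already reads `.env` from the project directory without it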
3. Document in comments what the secret is used for
## Step 3 — Generate Actual Secrets
Provide commands to generate secure random secrets:
```bash
# Registry HTTP secret (32 random bytes, printed as 64 hex chars)
openssl rand -hex 32
# JWT secrets (64 random bytes, printed as 128 hex chars)
openssl rand -hex 64
# API tokens (varies)
# Manual: Regenerate from service UI
```
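Generation and storage can be combined so that a secret never has to be copied by hand. This is a minimal sketch. The helper names `gen_hex_secret` and `write_secret` are hypothetical, and the `od` fallback is an assumption for hosts without `openssl`. Note that `openssl rand -hex N` prints N random bytes as 2N hex characters.

```bash
#!/usr/bin/env bash
# Print N random bytes as 2*N lowercase hex characters.
gen_hex_secret() {
  local bytes="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$bytes"
  else
    # Fallback when openssl is unavailable.
    od -An -N"$bytes" -tx1 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Append KEY=<random hex> to an env file and restrict the file to its owner.
write_secret() {
  local env_file="$1" key="$2" bytes="${3:-32}"
  printf '%s=%s\n' "$key" "$(gen_hex_secret "$bytes")" >> "$env_file"
  chmod 600 "$env_file"
}
```

For example, `write_secret nodes/heimdall/docker_registry/.env REGISTRY_HTTP_SECRET 32` would append a fresh 64-character value to that service's `.env`.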
## Gate 1 — Pre-Deployment Verification
Before applying changes, verify:
- [ ] `.env` is in `.gitignore` (check root and service-level)
- [ ] `.env.template` files created for all affected services
- [ ] No actual secrets in `.env.template` files
- [ ] Compose file syntax valid (`docker compose config`)
**Required confirmation**: `VERIFY COMPLETE: Ready to deploy`
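The first Gate 1 item can be checked mechanically with `git check-ignore`, which exits 0 when a path is ignored. This is a small sketch with a hypothetical function name, `check_env_ignored`. It also guards against an over-broad ignore pattern such as `.env*`, which would hide the template as well.

```bash
#!/usr/bin/env bash
# Succeeds only if git ignores .env in the given directory while the
# .env.template beside it stays trackable.
check_env_ignored() {
  local dir="${1:-.}"
  if ! git -C "$dir" check-ignore -q .env; then
    echo "FAIL: $dir/.env is not ignored" >&2
    return 1
  fi
  if git -C "$dir" check-ignore -q .env.template; then
    echo "FAIL: $dir/.env.template is ignored" >&2
    return 1
  fi
}
```

Calling it once per service directory, for example `check_env_ignored nodes/heimdall/docker_registry`, covers both root-level and service-level ignore rules, because git evaluates all applicable `.gitignore` files for the path.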
## Step 4 — Deployment & Testing
For each service:
1. Create `.env` from `.env.template`
2. Populate with actual secret values
3. Validate the compose file without printing the resolved secrets: `docker compose config --quiet`
4. Restart service: `docker compose up -d`
5. Verify service health and logs
6. Document any issues encountered
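Before restarting a service, it may help to confirm that the populated `.env` defines every key from its template and that no placeholder survived. This is a sketch under those assumptions. The function name `check_env_complete` is hypothetical, and the `CHANGEME` marker follows the template convention from Step 1.

```bash
#!/usr/bin/env bash
# Compare a populated .env against its .env.template: every key in the
# template must be defined, and no value may still start with CHANGEME.
check_env_complete() {
  local template="$1" env_file="$2" status=0 key
  while IFS='=' read -r key _ || [ -n "$key" ]; do
    # Skip blank lines and comments.
    case "$key" in ''|'#'*) continue ;; esac
    if ! grep -q "^${key}=" "$env_file"; then
      echo "missing: $key" >&2
      status=1
    fi
  done < "$template"
  if grep -q '=CHANGEME' "$env_file"; then
    echo "placeholder value left in $env_file" >&2
    status=1
  fi
  return "$status"
}
```

A typical call would be `check_env_complete .env.template .env && docker compose up -d`, so the restart only happens when the check passes.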
## Step 5 — Post-Deployment Cleanup
1. **Git Operations**:
- Commit updated `compose.yaml` files
- Commit `.env.template` files
- Verify no `.env` files staged: `git status`
- Push changes
2. **Documentation**:
- Update service README with secret requirements
- Document rotation procedures
- Create recovery instructions
# [OUTPUT FORMAT]
## Secrets Inventory Report
```markdown
## Hardcoded Secrets Inventory
### Critical (Exposed in Git)
- [ ] `nodes/heimdall/docker_registry/compose.yaml:8` - REGISTRY_HTTP_SECRET
- [ ] `nodes/watchtower/compose.yaml:43` - PERIPHERY_ONBOARDING_KEY
- [ ] `nodes/waldorf/plex/compose.yaml:11` - PLEX_CLAIM
### Verification Required
- [ ] Cloudflare tokens in core stack
- [ ] Database passwords in Authentik
## Remediation Steps
[Generated per-service instructions]
## Validation Checklist
[Pre and post-deployment checks]
```
## .env.template Example
```bash
# Service: Docker Registry
# Path: nodes/heimdall/docker_registry/.env
# Generated: 2026-04-19
# Registry HTTP secret for securing HTTP operations
# Generate with: openssl rand -hex 32
REGISTRY_HTTP_SECRET=CHANGEME_generate_with_openssl
```
# [SAFETY CHECKS]
- **Pre-commit hook**: Suggest adding git hook to prevent `.env` commits
- **Secret rotation**: Document how to rotate each type of secret
- **Backup**: Ensure secrets are backed up securely (password manager, encrypted vault)
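The pre-commit hook suggested above could look roughly like the following. This is a sketch, not a hardened hook. The function names `staged_env_files` and `block_staged_env` are hypothetical, and only files named exactly `.env` are blocked, so `.env.template` files still pass.

```bash
#!/usr/bin/env bash
# List staged paths whose file name is exactly .env (at any directory depth).
staged_env_files() {
  git -C "${1:-.}" diff --cached --name-only --diff-filter=ACMR \
    | grep -E '(^|/)\.env$'
}

# Return non-zero if any .env file is staged; a hook that propagates this
# status aborts the commit.
block_staged_env() {
  if staged_env_files "${1:-.}"; then
    echo "Refusing to commit: the .env files listed above are staged." >&2
    return 1
  fi
}
```

To use it as a hook, the functions would go in `.git/hooks/pre-commit`, followed by a final line `block_staged_env || exit 1`, and the file would need to be executable.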
# [SUCCESS CRITERIA]
- [ ] Zero hardcoded secrets remain in any `compose.yaml` file
- [ ] All services successfully restart with `.env` file secrets
- [ ] `.env.template` files committed to version control
- [ ] Actual `.env` files never committed (verified via `git log --all --oneline -- '*.env'` returning no commits)
- [ ] Documentation updated with secret management procedures