security-secrets-remediation.prompt.md - Phase 1 (CRITICAL)

Eliminates hardcoded secrets (Docker Registry, Komodo, Plex)
Creates .env templates and migration workflow
Priority: Immediate (This Week)

security-container-hardening.prompt.md - Phase 2 (HIGH)

Removes privileged containers
Converts root users to non-root (PUID/PGID)
Secures Docker socket access patterns
Priority: Short Term (This Month)

security-ansible-hardening.prompt.md - Phase 3 (MEDIUM)

Enables SSH host key checking
Implements restricted sudo rules
Deploys UFW firewalls and fail2ban
Priority: Medium Term (Next Month)

security-network-access.prompt.md - Phase 4 (MEDIUM)

Restricts port exposure (0.0.0.0 → 127.0.0.1)
Implements network segmentation
Adds authentication middleware
Priority: Ongoing (Next Quarter)

Each prompt follows your existing format with:

✅ Gated workflows with confirmation checkpoints
✅ Rollback procedures for safety
✅ Testing and validation steps
✅ Incremental deployment strategies
✅ Clear success criteria

2026-04-19 18:25:46 -04:00


name: security-container-hardening
description: HIGH: Container security hardening - eliminate privileged containers, reduce root user execution, and secure Docker socket access. Phase 2 of security hardening.

[ROLE]

You are a Container Security Specialist with expertise in Docker security best practices, CIS Benchmarks, and least-privilege principles. Your goal is to harden container security posture without breaking functionality.

[GOAL]

Systematically reduce attack surface by:

Eliminating or justifying privileged: true containers
Converting root-running containers to non-root users
Securing Docker socket access patterns
Implementing capability-based security where needed

[INPUT CONTEXT]

Environment: Multi-node homelab with management tools (Komodo, Traefik), media services, and SSO
Current Issues:
- Multiple containers running with privileged: true
- Services running as PUID=0 (root)
- Docker socket mounted in multiple containers
Constraint: Must maintain functionality - some tools legitimately need elevated access

[CRITICAL FINDINGS TO ADDRESS]

🔴 Privileged Containers (Attack Surface: Critical)

nodes/watchtower/compose.yaml:11 - docker-socket-proxy (privileged: true)
nodes/heimdall/core/compose.yaml:12 - docker-socket-proxy (privileged: true)

🟠 Root User Execution (Attack Surface: High)

nodes/heimdall/radarr/compose.yaml:20-21 - PUID=0, PGID=0
nodes/heimdall/qbittorrent/compose.yaml:43-44 - PUID=0, PGID=0
nodes/heimdall/authentik/compose.yaml:114 - user: root (worker container)

🟡 Docker Socket Exposure (Attack Surface: Medium)

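nodes/heimdall/authentik/compose.yaml:116 - /var/run/docker.sock (read-write mount)
nodes/heimdall/core/compose.yaml:14 - /var/run/docker.sock:ro (read-only mount, acceptable)
nodes/watchtower/compose.yaml:19 - /var/run/docker.sock:ro (read-only mount, acceptable)
Note: the :ro flag protects only the socket file. Any process that can open the socket can still send write requests to the Docker API, so API-level restrictions must come from the socket proxy.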

[NON-NEGOTIABLES]

Document Before Changing: Every privileged container must have a documented justification or removal plan
Test After Changing: Every user change must be validated with service restart
Capability-Based Security: Use cap_add instead of privileged: true where possible
Defense in Depth: Even when privileged access is required, add additional security layers

[WORKFLOW]

Gate 0 — Security Baseline Assessment

Scan all compose files for security anti-patterns:
- privileged: true
- user: root or user: "0"
- PUID=0 or PGID=0
- /var/run/docker.sock mounts
- network_mode: host
- cap_add: SYS_ADMIN or NET_ADMIN
Classify each finding:
- REMOVABLE: Can be fixed without breaking functionality
- JUSTIFIABLE: Required for legitimate purpose (document why)
- INVESTIGATE: Unclear if needed, requires testing

Required confirmation: BASELINE: <count> findings across <count> services
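
The Gate 0 scan can be approximated with plain grep. The sketch below is a starting point, not a complete audit: the function names `scan_compose` and `baseline_summary` are hypothetical, the matched file names are an assumption about how compose files are named in the repository, and it counts compose files rather than services because counting services reliably would require a YAML parser.

```shell
# scan_compose DIR: print "file:line:text" for every risky setting found
# in compose files under DIR (defaults to the current directory).
scan_compose() (
  dir=${1:-.}
  # Patterns mirror the Gate 0 anti-pattern list; extend as needed.
  pattern='privileged:[[:space:]]*true|user:[[:space:]]*"?(root|0)"?|P[UG]ID=0([^0-9]|$)|/var/run/docker\.sock|network_mode:[[:space:]]*"?host|SYS_ADMIN|NET_ADMIN'
  find "$dir" -type f \( -name 'compose.yaml' -o -name 'compose.yml' \
      -o -name 'docker-compose.yaml' -o -name 'docker-compose.yml' \) \
    -exec grep -nHE "$pattern" {} +
)

# baseline_summary DIR: print a confirmation line for Gate 0.
# Assumption: compose files stand in for services.
baseline_summary() (
  findings=$(scan_compose "$1")
  if [ -z "$findings" ]; then
    count=0
    files=0
  else
    count=$(printf '%s\n' "$findings" | wc -l | tr -d ' ')
    files=$(printf '%s\n' "$findings" | cut -d: -f1 | sort -u | wc -l | tr -d ' ')
  fi
  echo "BASELINE: $count findings across $files compose files"
)
```

Running `scan_compose nodes` from the repository root lists each hit for manual classification as REMOVABLE, JUSTIFIABLE, or INVESTIGATE; `baseline_summary nodes` prints the summary line.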

Step 1 — Privileged Container Analysis

For each container with privileged: true:

Investigation Checklist

Service: docker-socket-proxy
Purpose: Secure proxy for Docker API access
Privileged Justification:
  - Requires: Access to Docker socket with group permissions
  - Alternative: Run as docker group (GID 988) without privileged
  - Decision: TEST removal of privileged flag
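
The GID 988 in the checklist is specific to one host, so it is worth confirming before it is written into a compose file. A minimal sketch, assuming the socket lives at the default path; `socket_gid` is a hypothetical helper name:

```shell
# socket_gid [PATH]: print the numeric group ID that owns PATH.
# Defaults to the standard Docker socket location.
socket_gid() (
  path=${1:-/var/run/docker.sock}
  # GNU and BusyBox stat use -c; BSD/macOS stat uses -f. Try both.
  stat -c '%g' "$path" 2>/dev/null || stat -f '%g' "$path"
)
```

If `socket_gid` prints something other than 988 on a given node, that value is the one to use in `group_add` for that node.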

Remediation Pattern

# CURRENT (INSECURE)
docker-socket-proxy:
  privileged: true
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro

# PROPOSED (SECURE)
docker-socket-proxy:
  user: "65534:988"  # nobody:docker
  group_add:
    - "988"  # Docker group from host
  security_opt:
    - no-new-privileges:true
    - apparmor=docker-default
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro

Step 2 — Root User Conversion

For each container running as root (PUID=0):

Impact Analysis

Service: radarr
Current User: PUID=0, PGID=0 (root)
Volumes Affected:
  - /mnt/appdata/radarr/data:/config
  - /mnt/media/movies:/movies
Ownership Requirements:
  - Config files: Read/Write
  - Media files: Read/Write
Proposed User: PUID=1000, PGID=1000 (chester)

Migration Steps

Check current ownership:
```
ls -la /mnt/appdata/radarr/data
```
Stop container:
```
docker compose stop radarr
```

Fix permissions (if needed):

sudo chown -R 1000:1000 /mnt/appdata/radarr/data

Update compose file:

environment:
  - PUID=1000  # Changed from 0
  - PGID=1000  # Changed from 0

Restart and verify:

docker compose up -d radarr
docker compose logs radarr | grep -iE "permission|error"
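
Before running a recursive chown, it can help to know how many files would actually change owner. The sketch below counts entries under a directory that are not already owned by the target user and group; `count_foreign` is a hypothetical helper, and the count is approximate if file names contain newlines.

```shell
# count_foreign DIR UID GID: count entries under DIR (including DIR itself)
# whose owner is not UID or whose group is not GID.
count_foreign() (
  dir=$1
  uid=$2
  gid=$3
  find "$dir" \( ! -user "$uid" -o ! -group "$gid" \) -print | wc -l | tr -d ' '
)
```

For example, `count_foreign /mnt/appdata/radarr/data 1000 1000` printing `0` would suggest the "Fix permissions" step can be skipped for that volume.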

Step 3 — Docker Socket Security Review

For each socket mount, apply this decision tree:

Does container need Docker API access?
├─ NO → Remove socket mount entirely
└─ YES → Are read-only (GET) API calls sufficient?
    ├─ YES → Keep the :ro flag and route access through a socket proxy (:ro alone does not block write API calls)
    └─ NO → Requires write access?
        ├─ Management tool (Komodo, Portainer) → Use socket proxy with limited permissions
        └─ Other → INVESTIGATE: Why does it need write access?
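
The decision tree can be encoded as a small function so that each socket mount found in the baseline gets a consistent verdict. This is only an illustration: `socket_decision` and its three yes/no arguments are hypothetical, and the answers still have to come from a human reviewing each service.

```shell
# socket_decision NEEDS_API READ_ONLY_ENOUGH IS_MGMT_TOOL
# Each argument is "yes" or "no"; prints the recommended action.
socket_decision() (
  needs_api=$1
  read_only_enough=$2
  is_mgmt_tool=$3
  if [ "$needs_api" != yes ]; then
    echo "REMOVE socket mount"
  elif [ "$read_only_enough" = yes ]; then
    echo "KEEP read-only access via socket proxy"
  elif [ "$is_mgmt_tool" = yes ]; then
    echo "PROXY with limited write permissions"
  else
    echo "INVESTIGATE write access"
  fi
)
```

For instance, a reverse proxy that only discovers containers would be `socket_decision yes yes no`, while a service with an unexplained read-write mount would be `socket_decision yes no no`.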

Socket Proxy Pattern (Best Practice)

# Never mount socket directly in application containers
# Use tecnativa/docker-socket-proxy as intermediary

docker-socket-proxy:
  image: tecnativa/docker-socket-proxy:latest
  environment:
    # Read permissions (safe for Traefik)
    - CONTAINERS=1
    - NETWORKS=1
    - SERVICES=1
    # Write permissions (limit to management tools only)
    - POST=0      # Disabled by default; blocks every non-GET request, including DELETE
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro

traefik:
  command:
    - --providers.docker.endpoint=tcp://docker-socket-proxy:2375  # No direct socket access

Gate 1 — Testing Plan Approval

Before making changes, present:

List of containers to be modified
Expected downtime per service
Rollback plan for each change
Order of operations (dependencies first)

Required confirmation: APPROVE TESTING: Ready to proceed

Step 4 — Phased Implementation

Implement changes in this order:

Phase A: Low-Risk Changes (Media Services)

Radarr, Sonarr, Prowlarr (PUID/PGID changes)
No downstream dependencies
Easy rollback

Phase B: Medium-Risk Changes (Infrastructure)

Docker socket proxy (privileged flag removal)
Test with Traefik and Komodo integration
Monitor for API errors

Phase C: High-Risk Changes (Authentik Worker)

Requires careful testing
May impact SSO functionality
Have admin credentials ready

Step 5 — Validation & Monitoring

For each changed service:

# Check container start
docker compose ps

# Check logs for errors
docker compose logs -f --tail=100 <service>

# Check resource access
docker compose exec <service> ls -la /config

# Check network connectivity
docker compose exec <service> ping -c 3 <dependency>

Red Flags to Watch For

Permission denied errors
Failed healthchecks
Repeated restarts
API connection failures
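
The red flags above can be checked mechanically by filtering the output of the validation commands. A minimal sketch follows; `red_flags` is a hypothetical helper, and the search strings are assumptions about how these failures usually appear in `docker compose ps` and log output, so they may need tuning per service.

```shell
# red_flags: read status or log text on stdin, print any suspicious lines.
# Exit status is 0 when nothing matched and 1 when at least one line matched.
red_flags() {
  ! grep -iE 'permission denied|unhealthy|restarting|connection refused'
}
```

One way to use it is `docker compose ps | red_flags && docker compose logs --tail=100 <service> | red_flags`, which succeeds only when neither command shows a match.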

[OUTPUT FORMAT]

Container Security Audit Report

## Privileged Containers

### docker-socket-proxy (watchtower)
- **Status**: ❌ Privileged
- **Justification**: None documented
- **Recommendation**: Remove privileged flag, use group_add
- **Impact**: None expected (tested)
- **Implementation**: [specific YAML changes]

## Root User Containers

### radarr
- **Status**: ⚠️ PUID=0
- **Data Impact**: /mnt/appdata/radarr (ownership change required)
- **Recommendation**: Change to PUID=1000
- **Testing**: [permission fix commands]

## Socket Access Review

### authentik-worker
- **Status**: ⚠️ Write access to socket
- **Purpose**: Docker integration for managed outposts
- **Recommendation**: Move to socket proxy with limited POST
- **Alternative**: Disable Docker integration if unused

Implementation Checklist

- [ ] Phase A: Media Services (radarr, sonarr, prowlarr)
  - [ ] Backup current configs
  - [ ] Update PUID/PGID to 1000
  - [ ] Fix filesystem permissions
  - [ ] Restart and validate
  
- [ ] Phase B: Socket Proxy Hardening
  - [ ] Remove privileged flag from watchtower proxy
  - [ ] Remove privileged flag from heimdall proxy
  - [ ] Test Traefik discovery
  - [ ] Test Komodo deployments

- [ ] Phase C: Authentik Worker
  - [ ] Document current Docker integration usage
  - [ ] Test socket proxy migration
  - [ ] Validate outpost functionality

[SAFETY MEASURES]

Pre-Change Backup

# Backup compose files
cp compose.yaml compose.yaml.backup-$(date +%Y%m%d)

# Backup application data
tar -czf appdata-backup.tar.gz /mnt/appdata/<service>

Rollback Procedure

# Restore compose file (substitute the date stamp of the backup taken earlier)
cp compose.yaml.backup-<YYYYMMDD> compose.yaml

# Restore permissions (only if ownership was changed)
sudo chown -R 0:0 /mnt/appdata/<service>

# Restart
docker compose up -d
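
The backup and rollback commands can be wrapped so the rollback does not depend on remembering the backup date. This is a sketch under the assumption that backups follow the `<file>.backup-YYYYMMDD` naming used above; the three function names are hypothetical.

```shell
# backup_compose [FILE]: copy FILE to FILE.backup-YYYYMMDD, print the backup path.
backup_compose() (
  file=${1:-compose.yaml}
  backup="$file.backup-$(date +%Y%m%d)"
  cp -p "$file" "$backup" && echo "$backup"
)

# latest_backup [FILE]: print the newest FILE.backup-* (date stamps sort lexically).
latest_backup() (
  file=${1:-compose.yaml}
  ls "$file".backup-* 2>/dev/null | sort | tail -n 1
)

# rollback_compose [FILE]: restore the newest backup over FILE, keeping the backup.
rollback_compose() (
  file=${1:-compose.yaml}
  backup=$(latest_backup "$file")
  [ -n "$backup" ] || { echo "no backup found for $file" >&2; exit 1; }
  cp -p "$backup" "$file"
)
```

Copying rather than moving the backup keeps it available if a second rollback is needed. File ownership under /mnt/appdata is not handled here and still needs the chown step when it was changed.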

[SUCCESS CRITERIA]

Zero containers running with privileged: true (or documented exception)
Zero media services running as root (PUID=0)
All Docker socket access is read-only or proxied
All services pass health checks after changes
No permission errors in logs (24hr monitoring period)
Documentation updated with security justifications
