# Docker Environment Management Playbook
## Overview
The `manage_docker_environment.yml` playbook manages Docker hosts in your homelab. It covers installation, daemon configuration, container and network management, health monitoring, maintenance, and configuration backups.
## Target Hosts
- **Primary:** `docker_hosts` group (includes docker-01 at 10.0.0.251)
- Can be run against any host in the `ubuntu_lab` group
## Features
### 1. Docker Installation
- Installs Docker CE with all required components
- Includes Docker Compose plugin
- Installs Docker BuildKit
- Configures Docker service for auto-start
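
As a rough illustration, the installation steps above could be expressed with tasks like the following. This is a sketch only: it assumes Docker's official apt repository is already configured on the host, and the task names and exact package list are assumptions rather than the playbook's actual contents.

```yaml
# Hypothetical task sketch; assumes the Docker apt repository is already set up.
- name: Install Docker CE and its plugins
  ansible.builtin.apt:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-buildx-plugin    # BuildKit-based builder frontend
      - docker-compose-plugin   # provides `docker compose`
    state: present
    update_cache: true

- name: Enable Docker at boot and start it now
  ansible.builtin.service:
    name: docker
    state: started
    enabled: true
```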
### 2. Configuration Management
- Configures Docker daemon with logging limits
- Adds specified users to the docker group
- Sets up storage driver (overlay2)
- Creates custom Docker networks
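
For example, adding users to the docker group might look like the task below. It is a minimal sketch that assumes a `docker_users` list variable as described under Configuration Variables; the task name is illustrative.

```yaml
# Hypothetical task sketch; `docker_users` is the list documented below.
- name: Add users to the docker group
  ansible.builtin.user:
    name: "{{ item }}"
    groups: docker
    append: true   # keep the user's existing supplementary groups
  loop: "{{ docker_users | default([]) }}"
```

Setting `append: true` matters here: without it, the `user` module replaces the user's supplementary groups with the listed ones.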
### 3. Container Management
- Lists all running containers
- Creates standard networks (backend, frontend)
- Provides container inventory
### 4. Health Monitoring
- Checks Docker disk usage
- Identifies unhealthy containers
- Reports system status
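
The checks above could be implemented with read-only commands along these lines. This is a sketch under assumptions: the registered variable names are hypothetical, and the real playbook may use dedicated modules instead of `command`.

```yaml
# Hypothetical task sketch; variable names are illustrative.
- name: Check Docker disk usage
  ansible.builtin.command: docker system df
  register: docker_disk_usage
  changed_when: false   # read-only, so never report a change
  check_mode: false     # safe to run even under --check

- name: Find unhealthy containers
  ansible.builtin.command: docker ps --quiet --filter health=unhealthy
  register: unhealthy_containers
  changed_when: false
  check_mode: false

- name: Report status
  ansible.builtin.debug:
    msg:
      disk_usage: "{{ docker_disk_usage.stdout_lines }}"
      unhealthy_container_ids: "{{ unhealthy_containers.stdout_lines }}"
```

An empty `unhealthy_container_ids` list means no container with a health check is currently failing it; containers without a health check are not matched by the filter.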
### 5. Maintenance & Cleanup
- Removes stopped containers
- Prunes unused images
- Cleans up unused volumes
- Removes orphaned networks
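
One way to express these cleanup steps is the `community.docker.docker_prune` module, guarded by the `docker_cleanup_enabled` flag. The sketch below assumes the `community.docker` collection is installed on the control node; it is not necessarily how the playbook implements cleanup.

```yaml
# Hypothetical task sketch; requires the community.docker collection.
- name: Prune unused Docker resources
  community.docker.docker_prune:
    containers: true   # stopped containers
    images: true       # dangling images by default
    volumes: true      # volumes not used by any container
    networks: true     # networks not used by any container
  when: docker_cleanup_enabled | default(false) | bool
```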
### 6. Configuration Backup
- Backs up docker-compose files
- Creates timestamped copies in `/opt/docker-backups`
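
A backup of this kind might be sketched as follows. The source directory `/opt/docker` and the file patterns are assumptions made for illustration, and `ansible_date_time` requires fact gathering to be enabled.

```yaml
# Hypothetical task sketch; the source path and patterns are assumptions.
- name: Find compose files
  ansible.builtin.find:
    paths: /opt/docker
    patterns:
      - "docker-compose*.yml"
      - "docker-compose*.yaml"
    recurse: true
  register: compose_files

- name: Create a timestamped backup directory
  ansible.builtin.file:
    path: "/opt/docker-backups/{{ ansible_date_time.iso8601_basic_short }}"
    state: directory
    mode: "0750"

- name: Copy each compose file into the backup directory
  ansible.builtin.copy:
    src: "{{ item.path }}"
    # Flatten the source path into the file name so same-named files do not collide.
    dest: "/opt/docker-backups/{{ ansible_date_time.iso8601_basic_short }}/{{ item.path | regex_replace('^/', '') | replace('/', '_') }}"
    remote_src: true
    mode: "0640"
  loop: "{{ compose_files.files }}"
```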
## Usage
### Basic Execution
```bash
# Run all tasks
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml
# Check mode (dry run)
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml --check
# Run with specific tags
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml --tags "health,monitoring"
```
### Available Tags
| Tag | Description |
| :--- | :--- |
| `install` | Docker installation tasks |
| `setup` | Installation + configuration |
| `config` | Configuration management only |
| `containers` | Container management tasks |
| `management` | Container inventory and network setup |
| `health` | Health checks and monitoring |
| `monitoring` | Alias for `health` |
| `maintenance` | Cleanup and pruning tasks |
| `cleanup` | Alias for `maintenance` |
| `backup` | Configuration backup tasks |
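
Overlapping tags such as `setup` are typically achieved by attaching more than one tag to a task. The placeholder tasks below only illustrate that mechanism; they are not the playbook's real tasks.

```yaml
# Illustrative placeholders showing how one task can answer to several tags.
- name: Install Docker packages
  ansible.builtin.debug:
    msg: "installation tasks would run here"
  tags: [install, setup]   # selected by --tags install or --tags setup

- name: Configure the Docker daemon
  ansible.builtin.debug:
    msg: "configuration tasks would run here"
  tags: [config, setup]    # selected by --tags config or --tags setup
```

With this layout, `--tags setup` runs both tasks, while `--tags config` runs only the second.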
### Tag Combinations
```bash
# Install and configure Docker (first run)
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml --tags "install,config"
# Daily health check
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml --tags "health"
# Weekly maintenance
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml --tags "maintenance" \
  -e "docker_cleanup_enabled=true"
# Full system audit
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml --tags "containers,health"
```
## Configuration Variables
### Docker Users
```yaml
docker_users:
  - chester
  - additional_user
```
### Daemon Configuration
```yaml
docker_daemon_options:
  log-driver: "json-file"
  log-opts:
    max-size: "10m"
    max-file: "3"
  storage-driver: "overlay2"
  insecure-registries:
    - "registry.local:5000"
```
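
A variable like this is usually rendered to `/etc/docker/daemon.json` and followed by a service restart. The sketch below shows one way to do that; the task and handler names are hypothetical, and the handler belongs in the play's `handlers:` section.

```yaml
# Hypothetical sketch; task and handler names are illustrative.
# In tasks:
- name: Write the Docker daemon configuration
  ansible.builtin.copy:
    content: "{{ docker_daemon_options | to_nice_json }}\n"
    dest: /etc/docker/daemon.json
    owner: root
    group: root
    mode: "0644"
  notify: Restart docker

# In handlers:
- name: Restart docker
  ansible.builtin.service:
    name: docker
    state: restarted
```

Because the restart is a handler, Docker is restarted only when the rendered file actually changes.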
### Cleanup Settings
```yaml
# Enable cleanup tasks (default: false for safety)
docker_cleanup_enabled: true
# Remove images older than X days
docker_cleanup_older_than_days: 30
```
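
An age threshold like `docker_cleanup_older_than_days` can be mapped onto Docker's `until` prune filter, which takes a duration such as `720h`. The following is a sketch assuming the `community.docker` collection; whether the playbook uses this module or this filter is not confirmed by this document.

```yaml
# Hypothetical task sketch; converts days to hours for the `until` filter.
- name: Prune unused images older than the configured age
  community.docker.docker_prune:
    images: true
    images_filters:
      dangling: false   # also consider unused tagged images, not just dangling ones
      until: "{{ docker_cleanup_older_than_days | default(30) | int * 24 }}h"
  when: docker_cleanup_enabled | default(false) | bool
```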
## Examples
### First-Time Setup
```bash
# Install Docker on new host
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml \
  --limit docker-01 \
  --tags "install,config"
```
### Regular Maintenance Workflow
```bash
# 1. Check health status
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml \
  --tags "health"
# 2. Review disk usage, then run cleanup if needed
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml \
  --tags "maintenance" \
  -e "docker_cleanup_enabled=true"
# 3. Back up configurations
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml \
  --tags "backup"
```
### Add Custom Networks
```yaml
# In the playbook or as extra vars:
docker_networks:
  - name: web_tier
    driver: bridge
  - name: database_tier
    driver: bridge
    internal: true
```
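
A list in this shape can be consumed by looping over it with the `community.docker.docker_network` module. This is a minimal sketch assuming that collection is installed; the defaults shown for missing keys are assumptions.

```yaml
# Hypothetical task sketch; requires the community.docker collection.
- name: Create custom Docker networks
  community.docker.docker_network:
    name: "{{ item.name }}"
    driver: "{{ item.driver | default('bridge') }}"
    internal: "{{ item.internal | default(false) }}"   # true blocks external access
    state: present
  loop: "{{ docker_networks | default([]) }}"
```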
## Safety Features
- **Cleanup Disabled by Default:** Cleanup tasks require explicit enabling via `docker_cleanup_enabled=true`
- **Check Mode Compatible:** All tasks support `--check` for dry-run testing
- **Idempotent:** Can be run multiple times safely
- **Non-Destructive Monitoring:** Health checks don't modify system state
## Prerequisites
- Ubuntu/Debian-based system
- SSH access with sudo privileges
- Python 3 with pip available
- Internet connection for package downloads
## Post-Execution
After running the playbook:
1. **Verify Docker installation:**
```bash
ssh chester@10.0.0.251 "docker --version && docker compose version"
```
2. **Test Docker without sudo:**
```bash
ssh chester@10.0.0.251 "docker ps"
```
> [!NOTE]
> Users may need to log out and back in for group membership changes to take effect.
3. **Check Docker status:**
```bash
ssh chester@10.0.0.251 "sudo systemctl status docker"
```
## Troubleshooting
### Docker service won't start
```bash
# Check Docker daemon logs
ssh chester@10.0.0.251 "sudo journalctl -u docker -n 50"
# Validate daemon.json syntax (requires jq on the host)
ssh chester@10.0.0.251 "sudo jq . /etc/docker/daemon.json"
```
### Permission denied errors
```bash
# Verify group membership
ssh chester@10.0.0.251 "groups"
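# Open a shell with the docker group active (needs a TTY, hence -t).
# newgrp only affects the shell it starts; a fresh login also picks up the group.
ssh -t chester@10.0.0.251 "newgrp docker"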
```
### High disk usage
```bash
# Run cleanup manually
ansible-playbook -i inventory/hosts.ini playbooks/manage_docker_environment.yml \
  --tags "maintenance" \
  -e "docker_cleanup_enabled=true"
```
## Integration with Other Playbooks
This playbook works alongside:
- [init_swarm_cluster.yml](../../playbooks/init_swarm_cluster.yml) - Run Docker setup first
- [bootstrap_ai_workstation.yml](../../playbooks/bootstrap_ai_workstation.yml) - Can install Docker as dependency
## Next Steps
1. **Deploy Applications:** Create docker-compose files in `/opt/docker/`
2. **Set Up Monitoring:** Integrate with Prometheus/Grafana
3. **Automate Backups:** Schedule regular configuration backups
4. **Container Orchestration:** Consider Swarm or K3s for multi-host deployments
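
For the backup automation step, one option is a cron entry on the control node that runs the playbook with the `backup` tag. The sketch below is only an illustration: the repository path `/path/to/ansible` and the schedule are placeholders to adapt.

```yaml
# Hypothetical task sketch; the repository path and schedule are placeholders.
- name: Schedule a weekly Docker configuration backup
  ansible.builtin.cron:
    name: "docker configuration backup"
    weekday: "0"   # Sunday
    hour: "3"
    minute: "0"
    job: >-
      cd /path/to/ansible &&
      ansible-playbook -i inventory/hosts.ini
      playbooks/manage_docker_environment.yml --tags backup
  delegate_to: localhost
  run_once: true
```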