Compare commits

2 Commits

Author SHA1 Message Date
325c4b98a5 feat(documentation): add planning document standards for migration plans and implementation guides 2026-04-12 01:31:43 -04:00
2531cb4705 feat(documentation): enhance migration guides for Git-crypt and prompt distribution
- Update Git-crypt migration guide with detailed phase breakdown and time estimates
- Expand prompt distribution plan with implementation options and timelines
2026-04-12 01:31:32 -04:00
4 changed files with 1475 additions and 0 deletions

@ -0,0 +1,675 @@
---
description: "Planning Document Standards: Format and structure requirements for migration plans, implementation guides, and project roadmaps."
applyTo: "documentation/plans/**/*.md"
---
# Planning Document Standards
## Purpose
This document defines the structure, formatting, and content requirements for all planning documents stored in `documentation/plans/`. These standards ensure consistency, enable accurate time estimation, and provide clear execution pathways for infrastructure changes.
---
## Document Types
### Migration Plans
Files prefixed with `plan-` that detail transitioning from one state to another.
**Examples:**
- `plan-gitcryptMigration.md` — Implementing git-crypt for secret management
- `plan-ansibleSetup.md` — Setting up Ansible control node
- `plan-promptDistribution.md` — Centralizing prompt repository
### Implementation Guides
Step-by-step instructions for deploying new technologies or systems.
### Roadmaps
Long-term strategic plans spanning multiple phases or quarters.
---
## Required Document Structure
### 1. Header Section
Every plan must begin with:
```markdown
# [Descriptive Title]: [Specific Goal]
## Overview
[2-3 sentence summary of what this plan accomplishes and why it matters]
**[Primary Metric]:** [Value/Target]
**[Secondary Metric]:** [Value/Target] (optional)
**End State:** [Concrete description of success]
**Estimated Time to Complete:** X-Y hours (first-time setup) | A-B hours (experienced operator)
```
**Example:**
```markdown
# Ansible Control Node Setup: Path to Production Readiness
## Overview
Transform **Watchtower** (Raspberry Pi 5) into a production-ready Ansible control node capable of managing the entire homelab infrastructure.
**Control Node:** Watchtower (10.0.0.200) — Raspberry Pi 5, ARM Cortex-A76, 16GB RAM
**Managed Nodes:** Heimdall (10.0.0.151), Waldorf (10.0.0.251), Watchtower (localhost)
**End State:** Fully configured Ansible environment with validated connectivity, encrypted secrets, and role scaffolding.
**Estimated Time to Complete:** 2-3 hours (first-time setup) | 45-60 minutes (experienced operator)
```
---
### 2. Time Breakdown Table
**Required immediately after Overview section:**
```markdown
## Time Breakdown by Phase
| Phase | Description | Time Estimate |
|-------|-------------|---------------|
| **Phase 1** | [Phase Name] | XX-YY minutes |
| **Phase 2** | [Phase Name] | XX-YY minutes |
| **Phase 3** | [Phase Name] | XX-YY minutes |
| **Phase N** | [Phase Name] | XX-YY minutes |
| **Total** | End-to-End [Process Name] | **X-Y hours** |
```
**Guidelines:**
- Use **bold** for phase numbers and total row
- Time estimates should be ranges (min-max) in minutes or hours
- The Total row should equal the sum of the phase estimates and match the overall estimate in the Overview
- Include 3-6 phases typically (not too granular, not too broad)
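The summing guideline can be checked mechanically. The following is a minimal sketch, assuming every phase row uses the `XX-YY minutes` form from the template above; `sum_phase_minutes` is a hypothetical helper name, not part of any existing tooling in this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum the "XX-YY minutes" ranges from the
# "| **Phase N** | ... |" rows of a plan supplied on stdin.
sum_phase_minutes() {
  awk -F'|' '
    /^\| \*\*Phase / {
      # Field 4 is the Time Estimate column, e.g. " 20-30 minutes "
      if (match($4, /[0-9]+-[0-9]+/)) {
        split(substr($4, RSTART, RLENGTH), range, "-")
        low += range[1]; high += range[2]
      }
    }
    END { printf "%d-%d minutes\n", low, high }
  '
}

# Example: two phase rows; the Total row is ignored by the pattern.
sum_phase_minutes <<'EOF'
| **Phase 1** | Foundation | 20-30 minutes |
| **Phase 2** | Configuration | 25-35 minutes |
| **Total** | End-to-End | **45-65 minutes** |
EOF
# Prints: 45-65 minutes
```

Comparing the printed range with the Total row (and with the Overview estimate) catches tables whose phases were edited without updating the total. Rows that give estimates in hours would need a unit conversion that this sketch does not attempt.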
**For plans with multiple options:**
```markdown
## Time Breakdown by Implementation Option
| Option | Approach | Initial Setup Time | Ongoing Maintenance |
|--------|----------|-------------------|---------------------|
| **Option 1** | [Method] | X-Y hours | Z min/operation |
| **Option 2** | [Method] | X-Y hours | Z min/operation |
| **Option 3** | [Method] | X-Y hours | Z min/operation |
**Recommended:** Option [N] (detailed time breakdown below)
### Option [N]: Detailed Phase Breakdown
| Phase | Description | Time Estimate |
|-------|-------------|---------------|
[Standard phase table as above]
```
---
### 3. Prerequisites Section
**Required before Phase 1:**
```markdown
## Prerequisites
- [ ] [Requirement 1]
- [ ] [Requirement 2]
- [ ] [Requirement 3]
- [ ] [Requirement 4] (optional, but recommended)
```
**Guidelines:**
- Use checkboxes (`- [ ]`) for trackability
- List in order of validation (check access before checking software versions)
- Distinguish between required and optional prerequisites
- Include both technical and knowledge prerequisites
---
### 4. Phase Sections
**Each phase must include:**
```markdown
## Phase [N]: [Phase Name]
**Estimated Time:** XX-YY minutes
[1-2 sentence phase description explaining the goal]
### Step [N]: [Step Name]
**Time:** X-Y minutes
[Step instructions with code blocks, explanations, and verification steps]
**[Optional Section: Why/Security Note/Troubleshooting]:**
[Additional context if needed]
```
**Step-Level Requirements:**
- Each step must have an individual time estimate
- Include verification commands where applicable
- Add `# Expected: [output]` comments to commands
- Use consistent heading levels (## for Phase, ### for Step)
**Example:**
````markdown
## Phase 1: Control Node Foundation (Watchtower Setup)
**Estimated Time:** 20-30 minutes
### Step 1: Install Ansible Toolchain
**Time:** 10-15 minutes (depends on network speed)
Connect to Watchtower via SSH and install the complete Ansible stack:
```bash
# SSH to Watchtower
ssh chester@10.0.0.200
# Update package index
sudo apt update
# Verify Ansible installation
ansible --version
# Expected: ansible [core 2.x.x] or newer
```
**Why these tools:**
- `ansible`: Execution engine
- `ansible-lint`: Code quality enforcement
````
---
### 5. Troubleshooting Section
**Required before final checklist:**
````markdown
## Troubleshooting
### Issue: [Symptom Description]
**Cause:** [Root cause explanation]
**Fix:**
```bash
[Commands to resolve]
```
[Optional: Additional context or prevention tips]
````
**Guidelines:**
- Anticipate 3-5 common failure modes
- Use consistent format: Issue → Cause → Fix
- Include verification steps in the fix
- Order by likelihood (most common first)
---
### 6. Summary/Checklist Section
**Required at end of document:**
```markdown
## Summary Checklist
- [ ] [Major milestone 1]
- [ ] [Major milestone 2]
- [ ] [Major milestone 3]
- [ ] [Verification step]
- [ ] [Documentation updated]
- [ ] [Changes committed to Git]
**Environment Status:** 🟢 **PRODUCTION READY** | 🟡 **TESTING** | 🔴 **NOT READY**
```
**Optional sections:**
```markdown
## Migration Checklist
**Pre-Migration:**
- [ ] [Backup step]
- [ ] [Verification step]
**Migration Steps:**
- [ ] [Core steps from the plan]
**Post-Migration:**
- [ ] [Validation]
- [ ] [Documentation]
- [ ] [Cleanup]
```
---
### 7. Document Metadata (Footer)
**Required at very end:**
```markdown
---
**Document Version:** X.Y
**Last Updated:** [Month DD, YYYY]
**Author:** [Author Name/Tool]
**Review Cycle:** [Quarterly | Monthly | After infrastructure changes]
```
---
## Formatting Standards
### Time Estimates
**Range Format:**
- Use ranges for uncertainty: `10-15 minutes`, `2-3 hours`
- Larger range for first-time: `2-3 hours (first-time)`
- Smaller range for experienced: `45-60 minutes (experienced)`
**Never use:**
- Exact times without ranges: ❌ `15 minutes`
- Vague estimates: ❌ `about an hour`, `quickly`
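The range rule lends itself to a quick automated check. This is a sketch only, assuming estimates appear on lines that start with `**Time:**`, `**Estimated Time:**`, or `**Estimated Time to Complete:**` as in the templates in this document; `find_unranged_estimates` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical lint: print (with line numbers) every time-estimate line
# that lacks an N-M range. Exit status is non-zero when nothing is found,
# because that is how the final grep reports "no matching lines".
find_unranged_estimates() {
  grep -nE '^\*\*(Estimated )?Time( to Complete)?:\*\*' "$1" \
    | grep -vE '[0-9]+(\.[0-9]+)?-[0-9]+'
}
```

Running `find_unranged_estimates documentation/plans/plan-ansibleSetup.md` would list any estimate written as a single value or a vague phrase; no output means every estimate line contains a range.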
### Code Blocks
**Always specify language:**
````markdown
```bash
# Good: Language specified
```
````
````markdown
```
# Bad: Generic code block
```
````
**Include expected outputs:**
```bash
ansible --version
# Expected: ansible [core 2.x.x] or newer
```
**Add comments explaining why:**
```bash
# Generate ED25519 key (modern, secure, fast)
ssh-keygen -t ed25519 -C "ansible@watchtower"
```
### Emphasis & Highlighting
**Use bold for:**
- Phase names in tables
- **Important:** warnings or critical notes
- **Goal:** objective statements
- Key metrics in overview
**Use inline code for:**
- File paths: `ansible/ansible.cfg`
- Commands: `git-crypt unlock`
- Variables: `vault_password_file`
- Package names: `ansible-lint`
**Use INFO blocks for decisions:**
```markdown
**Key Decisions:**
- `host_key_checking = False`: Simplifies homelab automation (acceptable for trusted private network)
- `forks = 3`: Limits parallel execution (prevents overwhelming Pi resources)
```
### Lists & Tables
**Checkbox lists for trackable items:**
```markdown
- [ ] Ansible toolchain installed
- [ ] SSH keys distributed
- [ ] First playbook run successful
```
**Markdown tables for comparisons:**
```markdown
| Option | Pros | Cons | Time |
|--------|------|------|------|
| A | Fast | Complex | 1h |
| B | Simple | Slow | 3h |
```
### Links & References
**Internal links use relative paths:**
```markdown
See [ansible/.ansible-lint](ansible/.ansible-lint) for configuration.
```
**External links use full URLs:**
```markdown
Git-crypt Documentation: https://github.com/AGWA/git-crypt
```
**Related documentation section:**
```markdown
## Related Documentation
- [SECURITY_AUDIT_REPORT.md](../documentation/SECURITY_AUDIT_REPORT.md) - Security findings
- [SOP-001](../documentation/SOPs/SOP-001-Example.md) - Related procedure
```
---
## Optional Sections (Use When Applicable)
### Rollback Plan
For destructive or risky migrations:
````markdown
## Rollback Plan
If [technology] causes issues:
```bash
# 1. Stop service
# 2. Restore backup
# 3. Revert configuration
```
**Recovery Time:** Estimated X-Y minutes
````
### Migration Strategy (Multi-Week Plans)
For staged rollouts:
```markdown
## Migration Strategy
**Total Timeline:** [N] weeks (staged rollout) | [M] days (aggressive deployment)
### Week 1: [Phase Name]
**Time:** X-Y hours
[Objectives]
### Week 2: [Phase Name]
**Time:** X-Y hours
[Objectives]
```
### Security Considerations
For security-sensitive plans:
```markdown
## Security Considerations
### [Area of Concern]
- [Risk description]
- [Mitigation strategy]
- [Verification method]
```
### Maintenance & Next Steps
For ongoing operations:
````markdown
## Maintenance & Next Steps
### Ongoing Operations
**[Operation Name]:**
```bash
[Command]
```
### Future Enhancements
1. **[Enhancement Name]:**
   - [Description]
   - [Estimated effort]
````
---
## Success Criteria Template
Each plan should define measurable success criteria:
```markdown
## Success Criteria
- ✅ [Technical validation 1]
- ✅ [Technical validation 2]
- ✅ [Functional test passed]
- ✅ [Documentation updated]
- ✅ [Zero errors in logs]
**Estimated Migration Time:** [Same as in overview]
**Maintenance:** [Ongoing time commitment]
```
---
## File Naming Convention
**Pattern:** `plan-[technology][Action].md`
**Examples:**
- `plan-ansibleSetup.md` — Setup/installation guide
- `plan-gitcryptMigration.md` — Migration from one state to another
- `plan-promptDistribution.md` — Distribution/deployment plan
**Rules:**
- Use camelCase for multi-word components
- No spaces or special characters
- Prefix always `plan-`
- Suffix always `.md`
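The naming rules can be expressed as a single pattern. The sketch below assumes that `[technology]` is a lowercase run and `[Action]` is one or more capitalized words, which fits the three examples above but is an interpretation rather than a stated rule; `is_valid_plan_name` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical check for the plan-[technology][Action].md pattern.
# Assumption: lowercase technology, then one or more Capitalized words.
is_valid_plan_name() {
  local re='^plan-[a-z][a-z0-9]*([A-Z][a-z0-9]*)+\.md$'
  [[ "$1" =~ $re ]]
}

# Usage: succeeds silently for a conforming name, reports a bad one.
is_valid_plan_name "plan-ansibleSetup.md" || echo "rename needed"
is_valid_plan_name "plan-ansible setup.md" || echo "rename needed"
```

Only the second call prints `rename needed`, because the name contains a space and no capitalized action word.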
---
## Quality Checklist
Before committing a planning document, verify:
- [ ] Overview includes total time estimate
- [ ] Time breakdown table present and accurate
- [ ] Prerequisites listed with checkboxes
- [ ] Each phase has time estimate
- [ ] Each step has individual time estimate
- [ ] Code blocks specify language
- [ ] Expected outputs documented
- [ ] Troubleshooting section included
- [ ] Summary checklist present
- [ ] Document metadata complete
- [ ] All links are valid
- [ ] Formatting consistent with examples
- [ ] Success criteria defined
- [ ] Related documentation referenced
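Part of this checklist can be automated before a commit. The following is a rough sketch under the assumption that plans use the exact heading text from the templates in this document; `check_plan_sections` is a hypothetical function and covers only the structural items, not link validity or formatting.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit check: report which required headings are
# missing from a plan. Returns 1 if any heading is absent, 0 otherwise.
check_plan_sections() {
  local file="$1" missing=0 heading
  for heading in "## Overview" "## Time Breakdown" "## Prerequisites" \
                 "## Phase 1" "## Troubleshooting" "## Summary Checklist"; do
    # -F: treat the heading as a fixed string; -q: no output, status only
    if ! grep -qF -- "$heading" "$file"; then
      echo "MISSING: $heading"
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `check_plan_sections documentation/plans/plan-ansibleSetup.md` prints one `MISSING:` line per absent section, so it can gate a commit hook or a CI job.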
---
## Anti-Patterns (Common Mistakes)
❌ **Missing time estimates:**
```markdown
## Phase 1: Setup
### Step 1: Install tools
```
✅ **Correct format:**
```markdown
## Phase 1: Setup
**Estimated Time:** 20-30 minutes
### Step 1: Install tools
**Time:** 10-15 minutes
```
---
❌ **Vague success criteria:**
```markdown
- Everything works
- No errors
```
✅ **Specific validation:**
```markdown
- ✅ All nodes respond to `ansible all -m ping`
- ✅ Zero lint errors when running `ansible-lint playbooks/*.yml`
- ✅ Validation playbook completes with exit code 0
```
---
❌ **Commands without context:**
```bash
git-crypt unlock
cat nodes/waldorf/plex/.env.secrets
```
✅ **Commands with verification:**
```bash
# Unlock with the key
git-crypt unlock ~/homelab-secrets.key
# Verify decryption worked
cat nodes/waldorf/plex/.env.secrets
# Expected: plaintext secrets (not binary data)
```
---
❌ **Flat structure (no phases):**
```markdown
# Setup Guide
## Step 1
## Step 2
## Step 3
[... 20 more steps ...]
```
✅ **Hierarchical organization:**
```markdown
# Setup Guide
## Phase 1: Foundation (30-40 min)
### Step 1 (10-15 min)
### Step 2 (20-25 min)
## Phase 2: Configuration (20-30 min)
### Step 3 (10-15 min)
### Step 4 (10-15 min)
```
---
## Example Template
Use this as a starting point for new planning documents:
````markdown
# [Title]: [Goal]
## Overview
[Description]
**[Key Metric 1]:** [Value]
**[Key Metric 2]:** [Value]
**End State:** [Success description]
**Estimated Time to Complete:** X-Y hours (first-time) | A-B hours (experienced)
---
## Time Breakdown by Phase
| Phase | Description | Time Estimate |
|-------|-------------|---------------|
| **Phase 1** | [Name] | XX-YY minutes |
| **Phase 2** | [Name] | XX-YY minutes |
| **Total** | End-to-End | **X-Y hours** |
---
## Prerequisites
- [ ] [Requirement 1]
- [ ] [Requirement 2]
---
## Phase 1: [Name]
**Estimated Time:** XX-YY minutes
### Step 1: [Action]
**Time:** X-Y minutes
```bash
# Commands
```
---
## Phase 2: [Name]
**Estimated Time:** XX-YY minutes
### Step 2: [Action]
**Time:** X-Y minutes
---
## Troubleshooting
### Issue: [Problem]
**Cause:** [Explanation]
**Fix:**
```bash
# Solution
```
---
## Summary Checklist
- [ ] [Milestone 1]
- [ ] [Milestone 2]
**Environment Status:** 🟢 **PRODUCTION READY**
---
**Document Version:** 1.0
**Last Updated:** [Date]
**Author:** [Name]
**Review Cycle:** Quarterly
````
---
## Enforcement
This standard applies to:
- All files in `documentation/plans/`
- Migration guides
- Implementation roadmaps
- Multi-phase technical projects
**Exclusions:**
- Quick reference guides (use KBA format instead)
- Single-step procedures (use SOP format instead)
- Architecture diagrams (standalone)
---
**Document Version:** 1.0
**Last Updated:** April 12, 2026
**Author:** FrankGPT (Ansible Architect Mode)
**Review Cycle:** Quarterly or when planning format needs evolve

@ -0,0 +1,727 @@
# Ansible Control Node Setup: Path to Production Readiness
## Overview
Transform **Watchtower** (Raspberry Pi 5) into a production-ready Ansible control node capable of managing the entire homelab infrastructure. This guide builds the foundational runtime environment required to execute automation against Heimdall, Waldorf, and Watchtower itself.
**Control Node:** Watchtower (10.0.0.200) — Raspberry Pi 5, ARM Cortex-A76, 16GB RAM
**Managed Nodes:** Heimdall (10.0.0.151), Waldorf (10.0.0.251), Watchtower (localhost)
**End State:** Fully configured Ansible environment with validated connectivity, encrypted secrets, and role scaffolding.
**Estimated Time to Complete:** 2-3 hours (first-time setup) | 45-60 minutes (experienced operator)
---
## Time Breakdown by Phase
| Phase | Description | Time Estimate |
|-------|-------------|---------------|
| **Phase 1** | Control Node Foundation | 20-30 minutes |
| **Phase 2** | Ansible Project Configuration | 25-35 minutes |
| **Phase 3** | Validation & First Automation | 15-25 minutes |
| **Phase 4** | Role Scaffolding & Developer Experience | 20-30 minutes |
| **Phase 5** | Final Verification & Documentation | 15-20 minutes |
| **Total** | End-to-End Setup | **2-3 hours** |
---
## Prerequisites
- [ ] SSH access to Watchtower as `chester` (or your primary user)
- [ ] Watchtower has network access to Heimdall, Waldorf, and TerraMaster NAS
- [ ] Git repository cloned to Watchtower at `/home/chester/homelab` (or similar)
- [ ] Sudo privileges on Watchtower
- [ ] Basic understanding of YAML syntax
- [ ] VSCode with Remote-SSH extension (optional, but recommended)
---
## Phase 1: Control Node Foundation (Watchtower Setup)
**Estimated Time:** 20-30 minutes
### Step 1: Install Ansible Toolchain
**Time:** 10-15 minutes (depends on network speed)
Connect to Watchtower via SSH and install the complete Ansible stack:
```bash
# SSH to Watchtower
ssh chester@10.0.0.200
# Update package index
sudo apt update
# Install Ansible core components
sudo apt install -y ansible ansible-lint sshpass python3-pip python3-venv git
# Verify Ansible installation
ansible --version
# Expected: ansible [core 2.x.x] or newer
# Install Python API libraries
pip3 install proxmoxer requests --break-system-packages
# Verify ansible-lint
ansible-lint --version
# Expected: ansible-lint 6.x.x or newer
```
**Why these tools:**
- `ansible`: Execution engine
- `ansible-lint`: Code quality enforcement (aligns with `.ansible-lint` configuration)
- `sshpass`: Enables password-based initial SSH key deployment
- `proxmoxer`: Required for Proxmox API automation (future state)
- `python3-pip`: Package manager for Python libraries
---
### Step 2: Generate SSH Keys (ED25519)
**Time:** 2-3 minutes
Create the SSH key pair that Ansible will use for node authentication:
```bash
# Generate ED25519 key (modern, secure, fast)
ssh-keygen -t ed25519 -C "ansible@watchtower" -f ~/.ssh/id_ed25519 -N ""
# Set proper permissions
chmod 600 ~/.ssh/id_ed25519
chmod 644 ~/.ssh/id_ed25519.pub
# Verify key creation
ls -lh ~/.ssh/id_ed25519*
# Expected: Two files (private key and .pub)
# Display public key (for manual distribution if needed)
cat ~/.ssh/id_ed25519.pub
```
**Security Note:** The private key (`id_ed25519`) never leaves Watchtower. Only the `.pub` file is distributed to managed nodes.
---
### Step 3: Distribute SSH Key to Managed Nodes
**Time:** 3-5 minutes
Deploy the public key to all nodes (including localhost for self-management):
```bash
# Deploy to Heimdall
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
# Deploy to Waldorf
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
# Deploy to localhost (Watchtower managing itself)
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@localhost
# Test passwordless authentication
ssh -i ~/.ssh/id_ed25519 chester@10.0.0.151 "hostname && exit"
# Expected: heimdall
ssh -i ~/.ssh/id_ed25519 chester@10.0.0.251 "hostname && exit"
# Expected: waldorf
ssh -i ~/.ssh/id_ed25519 chester@localhost "hostname && exit"
# Expected: watchtower
```
**Troubleshooting:** If `ssh-copy-id` fails, ensure:
1. You can SSH to the target with password authentication first
2. The target user has a `~/.ssh` directory with proper permissions (`700`)
3. The firewall allows SSH (port 22)
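The deploy-and-verify commands in this step repeat per host, so they can be folded into a loop. This is a sketch, assuming the same user and addresses used above; `distribute_key` and the `RUNNER` preview switch are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Sketch: deploy and verify the key on each managed node in one pass.
# Set RUNNER=echo to preview the commands instead of executing them.
distribute_key() {
  local key="$HOME/.ssh/id_ed25519" host
  for host in 10.0.0.151 10.0.0.251 localhost; do
    ${RUNNER:-} ssh-copy-id -i "${key}.pub" "chester@${host}"
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so success here shows that key-based login works.
    ${RUNNER:-} ssh -i "$key" -o BatchMode=yes "chester@${host}" hostname
  done
}
```

Running `RUNNER=echo distribute_key` first prints the six commands without contacting any host, which is a cheap way to confirm the addresses before entering passwords.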
---
### Step 4: Configure Passwordless Sudo (If Required)
**Time:** 5-7 minutes (per node)
If Ansible tasks require privilege escalation without password prompts:
```bash
# On EACH node (Heimdall, Waldorf, Watchtower), open the sudoers file:
sudo visudo
# In the editor, add this line (replace 'chester' with your username):
#   chester ALL=(ALL) NOPASSWD: ALL
# Save and exit (:wq in vi); visudo validates the syntax before writing
```
**Alternative (More Secure):** Use Ansible Vault to encrypt the sudo password and configure `ansible_become_pass` in inventory. See Step 7 below.
---
## Phase 2: Ansible Project Configuration
**Estimated Time:** 25-35 minutes
### Step 5: Create ansible.cfg
**Time:** 5-7 minutes
Navigate to the homelab repository on Watchtower and create the main configuration file:
```bash
cd ~/homelab/ansible
cat > ansible.cfg <<'EOF'
[defaults]
# Inventory Configuration
inventory = ./inventory/hosts.yml
host_key_checking = False
# SSH Behavior
remote_user = chester
private_key_file = ~/.ssh/id_ed25519
timeout = 30
forks = 3
# Output & Logging
stdout_callback = yaml
display_skipped_hosts = False
display_ok_hosts = True
log_path = ./ansible.log
# Vault Configuration
vault_password_file = ./.vault_pass
# Role Path
roles_path = ./roles
# Retry Configuration
retry_files_enabled = False
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
EOF
# Verify syntax
ansible-config dump --only-changed
```
**Key Decisions:**
- `host_key_checking = False`: Simplifies homelab automation (acceptable for trusted private network)
- `vault_password_file`: Points to `.vault_pass` (created in Step 7)
- `forks = 3`: Limits parallel execution (prevents overwhelming Pi resources)
- `pipelining = True`: Performance optimization
---
### Step 6: Create Inventory Structure
**Time:** 8-10 minutes
Define the three-node infrastructure with hybrid grouping (hardware type + function):
```bash
# Create inventory directory
mkdir -p ~/homelab/ansible/inventory/group_vars/all
# Create main inventory file
cat > ~/homelab/ansible/inventory/hosts.yml <<'EOF'
---
# Homelab Infrastructure Inventory
# Control Node: Watchtower (10.0.0.200)
all:
  vars:
    ansible_user: chester
    ansible_ssh_private_key_file: ~/.ssh/id_ed25519
  children:
    # --- Hardware Hierarchy ---
    proxmox_vms:
      hosts:
        heimdall:
          ansible_host: 10.0.0.151
    physical_servers:
      hosts:
        waldorf:
          ansible_host: 10.0.0.251
          # GPU passthrough capability
          gpu_enabled: true
          gpu_type: nvidia
    raspberry_pi:
      hosts:
        watchtower:
          ansible_host: localhost
          ansible_connection: local
    # --- Functional Hierarchy ---
    infrastructure:
      hosts:
        heimdall:
        watchtower:
    media_servers:
      hosts:
        waldorf:
    docker_hosts:
      hosts:
        heimdall:
        waldorf:
        watchtower:
EOF
# Validate inventory
ansible-inventory --list
ansible-inventory --graph
```
**Inventory Design:**
- **Hardware groups** (`proxmox_vms`, `physical_servers`, `raspberry_pi`): Target based on architecture
- **Functional groups** (`infrastructure`, `media_servers`, `docker_hosts`): Target based on role
- **Localhost optimization**: Watchtower uses `ansible_connection: local` (no SSH overhead)
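To see how the two hierarchies combine, host patterns can be resolved without connecting to any node. The sketch below assumes it is run from `~/homelab/ansible` so that `ansible.cfg` and the inventory are picked up; `show_targets` is a hypothetical wrapper, while `--list-hosts` and the `:&` and `:!` pattern operators are standard Ansible features.

```bash
#!/usr/bin/env bash
# Sketch: show which hosts a pattern resolves to without connecting.
show_targets() {
  command -v ansible >/dev/null || { echo "ansible not found" >&2; return 127; }
  ansible "$1" --list-hosts
}

# Examples against the inventory defined in this step:
#   show_targets docker_hosts                      # all three nodes
#   show_targets 'infrastructure:&raspberry_pi'    # watchtower only
#   show_targets 'docker_hosts:!proxmox_vms'       # waldorf and watchtower
```

Checking a pattern this way before using it in a playbook's `hosts:` line helps avoid targeting more nodes than intended.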
---
### Step 7: Initialize Ansible Vault
**Time:** 12-15 minutes
Create encrypted storage for sensitive variables (passwords, API keys, tokens):
```bash
cd ~/homelab/ansible
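# Create vault password file (replace the placeholder with your own password;
# single quotes keep the shell from interpreting characters such as '!')
echo 'YourSecureVaultPassword123!' > .vault_pass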
chmod 600 .vault_pass
# CRITICAL: Add to .gitignore
echo ".vault_pass" >> ../.gitignore
# Create encrypted variable file
cat > inventory/group_vars/all/vault.yml <<'EOF'
---
# Encrypted Secrets (Ansible Vault)
# Edit with: ansible-vault edit inventory/group_vars/all/vault.yml
vault_sudo_password: "YourSudoPasswordHere"
vault_nfs_password: "" # If NFS requires auth
vault_proxmox_api_token: "" # For future Proxmox automation
vault_gitea_token: "" # For Git automation
EOF
# Encrypt the file
ansible-vault encrypt inventory/group_vars/all/vault.yml
# Verify encryption
cat inventory/group_vars/all/vault.yml
# Expected: $ANSIBLE_VAULT... (encrypted content)
# Test decryption
ansible-vault view inventory/group_vars/all/vault.yml
# Expected: Original YAML content
```
**Usage Pattern:**
1. Store all secrets in `vault.yml` with `vault_` prefix
2. Reference from inventory or group variables as: `ansible_become_pass: "{{ vault_sudo_password }}"`
3. Edit encrypted file: `ansible-vault edit inventory/group_vars/all/vault.yml`
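One common way to wire the vaulted value into privilege escalation is a plaintext variables file next to `vault.yml` that only references the encrypted variable. The sketch below assumes that layout; the file name `vars.yml` and the helper `write_become_vars` are illustrative choices, not something this guide prescribes.

```bash
#!/usr/bin/env bash
# Sketch: create a plaintext group_vars file that maps the vaulted sudo
# password onto Ansible's become password variable.
write_become_vars() {
  local dir="${1:-inventory/group_vars/all}"
  mkdir -p "$dir"
  cat > "$dir/vars.yml" <<'EOF'
---
# Plaintext file: it only references the encrypted value in vault.yml,
# so it contains no secret itself.
ansible_become_pass: "{{ vault_sudo_password }}"
EOF
}
```

Run `write_become_vars` from `~/homelab/ansible` to use the default path. With this in place, nodes without passwordless sudo can still escalate, because Ansible resolves `vault_sudo_password` from the encrypted file at run time.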
---
## Phase 3: Validation & First Automation
**Estimated Time:** 15-25 minutes
### Step 8: Create Connectivity Validation Playbook
**Time:** 8-10 minutes
Build a simple playbook to prove the entire stack works:
```bash
mkdir -p ~/homelab/ansible/playbooks
cat > ~/homelab/ansible/playbooks/validate-connectivity.yml <<'EOF'
---
- name: Ansible Environment Validation
  hosts: all
  gather_facts: true
  tasks:
    - name: Test ping module
      ansible.builtin.ping:
    - name: Display node facts
      ansible.builtin.debug:
        msg: |
          Hostname: {{ ansible_hostname }}
          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
          Architecture: {{ ansible_architecture }}
          Python: {{ ansible_python_version }}
    - name: Test privilege escalation
      ansible.builtin.command:
        cmd: whoami
      become: true
      # Read-only command: run it even under --check so the assert below has output
      check_mode: false
      register: sudo_test
      changed_when: false
    - name: Verify sudo worked
      ansible.builtin.assert:
        that:
          - sudo_test.stdout == "root"
        success_msg: "Privilege escalation: PASS"
        fail_msg: "Privilege escalation: FAIL"
    - name: Check NFS mount (infrastructure nodes only)
      ansible.builtin.stat:
        path: /mnt/appdata
      register: nfs_mount
      when: inventory_hostname in groups['infrastructure']
    - name: Display NFS status
      ansible.builtin.debug:
        msg: "NFS mount exists: {{ nfs_mount.stat.exists | default(false) }}"
      when: inventory_hostname in groups['infrastructure']
EOF
# Validate playbook syntax
ansible-playbook playbooks/validate-connectivity.yml --syntax-check
# Lint check (must pass with zero errors)
ansible-lint playbooks/validate-connectivity.yml
```
---
### Step 9: Execute Validation Playbook
**Time:** 7-15 minutes (includes troubleshooting)
Run the playbook to confirm end-to-end functionality:
```bash
cd ~/homelab/ansible
# Dry-run first (check mode)
ansible-playbook playbooks/validate-connectivity.yml --check
# Full execution
ansible-playbook playbooks/validate-connectivity.yml
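# Expected output summary (Gathering Facts counts as one ok task;
# waldorf skips the two NFS tasks because it is not in 'infrastructure'):
# heimdall : ok=7 changed=0 unreachable=0 failed=0 skipped=0
# waldorf : ok=5 changed=0 unreachable=0 failed=0 skipped=2
# watchtower : ok=7 changed=0 unreachable=0 failed=0 skipped=0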
```
**Success Criteria:**
- All hosts return `ok` status (no `unreachable` or `failed`)
- Sudo test shows "Privilege escalation: PASS"
- Facts display correct OS/architecture for each node
**Troubleshooting:**
- `unreachable`: Check SSH keys, network connectivity, firewall
- `failed` on sudo: Verify passwordless sudo or Vault configuration
- Lint errors: Fix YAML indentation, task naming, FQCN usage
---
## Phase 4: Role Scaffolding & Developer Experience
**Estimated Time:** 20-30 minutes
### Step 10: Create Standard Role Directory Structure
**Time:** 5-7 minutes
Generate the skeleton for reusable Ansible roles:
```bash
cd ~/homelab/ansible
# Create roles directory
mkdir -p roles
# Generate a sample role (follows .ansible-standards.md)
cat > roles/setup-docker.yml <<'EOF'
---
# Placeholder: This will become a proper role with:
# - roles/setup-docker/tasks/main.yml
# - roles/setup-docker/defaults/main.yml
# - roles/setup-docker/handlers/main.yml
# - roles/setup-docker/meta/main.yml
#
# For now, just structural validation.
EOF
# Create external roles directory (for Ansible Galaxy)
mkdir -p roles/external
# Update .ansible-lint exclusion (already configured)
grep -q "roles/external" .ansible-lint && echo "✅ Lint exclusion exists"
```
**Next Steps (Future):**
- Install Galaxy roles: `ansible-galaxy install geerlingguy.docker -p roles/external/`
- Create custom roles following the patterns in `.ansible-standards.md`
- Use `molecule` for role testing (note: Step 1 does not install it, so install it separately first)
---
### Step 11: Configure VSCode Remote Development
**Time:** 15-20 minutes (includes extension installation)
Connect VSCode from your Windows workstation to Watchtower for seamless editing:
**On Windows:**
1. Install **Remote - SSH** extension in VSCode
2. Open Command Palette (`Ctrl+Shift+P`) → "Remote-SSH: Connect to Host"
3. Enter: `chester@10.0.0.200`
4. VSCode opens a new window connected to Watchtower
5. Navigate to `/home/chester/homelab/ansible`
6. Install extensions **on the remote** (VSCode will prompt):
- Ansible (by Red Hat)
- YAML (by Red Hat)
**Verify:**
- Open `ansible.cfg` → Syntax highlighting works
- Open `playbooks/validate-connectivity.yml` → Ansible linting shows in Problems panel
- Terminal in VSCode → Runs commands directly on Watchtower
---
## Phase 5: Final Verification & Documentation
**Estimated Time:** 15-20 minutes
### Step 12: Execute Full Environment Test
**Time:** 10-12 minutes
Run comprehensive checks to certify the environment:
```bash
cd ~/homelab/ansible
# 1. Lint all playbooks
ansible-lint playbooks/*.yml
# 2. Configuration dump
ansible-config dump --only-changed
# 3. Inventory validation
ansible-inventory --list --yaml
# 4. Ad-hoc ping test
ansible all -m ping
# 5. Fact gathering test
ansible all -m setup -a "filter=ansible_distribution*"
# 6. Vault operations test
ansible-vault view inventory/group_vars/all/vault.yml
ansible-vault edit inventory/group_vars/all/vault.yml # Add a test variable, save, exit
# 7. Privilege escalation test
ansible all -m command -a "whoami" --become
# 8. Full playbook run
ansible-playbook playbooks/validate-connectivity.yml
```
**Success State:**
- ✅ Zero lint errors
- ✅ All nodes respond to `ping`
- ✅ Facts gathered from all hosts
- ✅ Vault encrypt/decrypt cycle works
- ✅ Sudo escalation succeeds
- ✅ Validation playbook completes with no failures
---
### Step 13: Update Repository Documentation
**Time:** 5-8 minutes
Document the new Ansible capabilities:
````bash
cd ~/homelab
# Update main README
cat >> README.md <<'EOF'
## Ansible Automation
**Control Node:** Watchtower (10.0.0.200)
**Managed Nodes:** Heimdall, Waldorf, Watchtower
**Quick Start:**
```bash
# SSH to control node
ssh chester@10.0.0.200
# Run validation
cd ~/homelab/ansible
ansible-playbook playbooks/validate-connectivity.yml
# Ad-hoc commands
ansible all -m ping
ansible docker_hosts -m command -a "docker --version"
```
**Configuration:**
- Inventory: `ansible/inventory/hosts.yml`
- Main Config: `ansible/ansible.cfg`
- Secrets: `ansible/inventory/group_vars/all/vault.yml` (encrypted)
**Standards:** See `ansible/.ansible-standards.md` for architectural patterns.
EOF
# Commit the changes (README.md and .gitignore were modified outside ansible/)
git add ansible/ README.md .gitignore
git commit -m "feat(ansible): complete control node setup on Watchtower

- Install ansible-core, ansible-lint, proxmoxer
- Generate ED25519 SSH keys and distribute to nodes
- Create ansible.cfg with vault integration
- Build YAML inventory with hardware + functional grouping
- Initialize Ansible Vault for secret management
- Create validate-connectivity.yml playbook
- Verify end-to-end automation capability

Environment tested and production-ready."
git push origin main
````
---
## Maintenance & Next Steps
### Ongoing Operations
**Update Ansible:**
```bash
# --only-upgrade limits the operation to the named packages
sudo apt update && sudo apt install --only-upgrade ansible ansible-lint
```
**Rotate Vault Password:**
```bash
ansible-vault rekey inventory/group_vars/all/vault.yml
# Update .vault_pass file with new password
```
**Add New Managed Node:**
1. Deploy SSH key: `ssh-copy-id -i ~/.ssh/id_ed25519.pub user@new-host`
2. Add to `inventory/hosts.yml`
3. Test: `ansible new-host -m ping`
### Future Enhancements
1. **Proxmox Automation:**
- Create playbook to manage VM creation/deletion via Proxmox API
- Use `proxmoxer` library (already installed)
2. **Docker Stack Management:**
- Ansible role to deploy Compose stacks (replacing manual Git pulls)
- Integration with Komodo API for automated deployments
3. **System Maintenance:**
- Scheduled playbook for OS updates (`apt update && upgrade`)
- NFS mount validation and auto-remediation
- Log rotation and backup verification
4. **CI/CD Integration:**
- Gitea webhook triggers Ansible playbook runs
- Automated testing via Molecule in Docker containers
---
## Troubleshooting
### Issue: "Host key verification failed"
**Cause:** SSH strict host checking is enabled despite `ansible.cfg` setting.
**Fix:**
```bash
# Remove only the stale entry for the affected host (keeps other known hosts)
ssh-keygen -R TARGET_IP
# Force disable in SSH config
cat >> ~/.ssh/config <<EOF
Host 10.0.0.*
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
EOF
```
---
### Issue: "Permission denied (publickey)"
**Cause:** SSH key not properly deployed to target node.
**Fix:**
```bash
# Re-deploy key manually
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@TARGET_IP
# Verify key is in authorized_keys
ssh chester@TARGET_IP "cat ~/.ssh/authorized_keys | grep ansible@watchtower"
```
---
### Issue: "Vault password file not found"
**Cause:** `.vault_pass` missing or wrong permissions.
**Fix:**
```bash
# Recreate vault password file
echo "YourVaultPassword" > ~/homelab/ansible/.vault_pass
chmod 600 ~/homelab/ansible/.vault_pass
# Verify ansible.cfg points to it
grep vault_password_file ~/homelab/ansible/ansible.cfg
```
---
### Issue: Lint errors on playbook execution
**Cause:** Code violates `.ansible-lint` safety profile rules.
**Fix:**
```bash
# Run linter to see specific violations
ansible-lint playbooks/YOUR_PLAYBOOK.yml
# Common fixes:
# - Use FQCN: ansible.builtin.command instead of 'command'
# - Add 'name:' to all tasks
# - Use changed_when/failed_when for shell/command tasks
# - Add check_mode support for idempotency testing
```
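Before running the full linter, a quick grep can surface the first item in the list of common fixes (short module names instead of FQCNs). This is a rough heuristic under stated assumptions, not a replacement for `ansible-lint`: the helper name `find_short_module_names` is hypothetical, and it only looks for three module names chosen for illustration (`command`, `shell`, `copy`).

```bash
# Hypothetical helper: list playbook lines that call command, shell, or copy
# by short name. Lines using ansible.builtin.<module> do not match because
# the module name must be the first token after the optional "- ".
find_short_module_names() {
    local playbook="$1"
    grep -nE '^[[:space:]]*(-[[:space:]]+)?(command|shell|copy):' "$playbook"
}
```

Each match is printed with its line number, so the offending tasks can be rewritten to the `ansible.builtin.` form before the playbook is linted again.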
---
## Summary Checklist
- [ ] Ansible toolchain installed on Watchtower
- [ ] ED25519 SSH keys generated and distributed
- [ ] Passwordless sudo configured (or Vault password set)
- [ ] `ansible.cfg` created and validated
- [ ] Inventory file with all three nodes defined
- [ ] Ansible Vault initialized and `.vault_pass` secured
- [ ] Validation playbook created and linted
- [ ] First playbook run successful (all hosts green)
- [ ] VSCode Remote-SSH connected to Watchtower
- [ ] Repository documentation updated
- [ ] Git commit pushed to Gitea
**Environment Status:** 🟢 **PRODUCTION READY**
---
**Document Version:** 1.0
**Last Updated:** April 12, 2026
**Author:** FrankGPT (Ansible Architect Mode)
**Review Cycle:** Quarterly or after infrastructure changes


@ -6,6 +6,21 @@ Implement Git-crypt to encrypt sensitive `.env` files in the homelab repository,
**Goal:** Zero workflow changes for Komodo, encrypted secrets in Git, transparent decryption on pull.
**Estimated Time to Complete:** 2-3 hours (first-time setup) | 1-1.5 hours (experienced operator)
---
## Time Breakdown by Phase
| Phase | Description | Time Estimate |
|-------|-------------|---------------|
| **Phase 1** | Local Setup (Workstation) | 30-40 minutes |
| **Phase 2** | Node Setup (Komodo Targets) | 25-35 minutes |
| **Phase 3** | Update Compose Files | 15-20 minutes |
| **Phase 4** | Testing & Validation | 30-40 minutes |
| **Phase 5** | Security Hardening | 20-30 minutes |
| **Total** | End-to-End Migration | **2-3 hours** |
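
The total can be cross-checked by summing the per-phase ranges. The snippet below is a small sketch for that arithmetic; `sum_ranges` is a hypothetical helper, and the inputs are the five phase estimates from the table.

```bash
# Hypothetical helper: sum "LOW-HIGH" minute ranges and print the combined
# range in minutes and in hours.
sum_ranges() {
    local low=0 high=0 range
    for range in "$@"; do
        low=$((low + ${range%-*}))    # part before the dash
        high=$((high + ${range#*-}))  # part after the dash
    done
    # Shell arithmetic is integer-only, so awk formats the fractional hours.
    awk -v l="$low" -v h="$high" \
        'BEGIN { printf "%d-%d minutes (%.2f-%.2f hours)\n", l, h, l / 60, h / 60 }'
}

sum_ranges 30-40 25-35 15-20 30-40 20-30
# prints: 120-165 minutes (2.00-2.75 hours)
```

The summed range of 2.00 to 2.75 hours sits inside the stated 2-3 hour total.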
---
## Prerequisites
@ -18,8 +33,10 @@ Implement Git-crypt to encrypt sensitive `.env` files in the homelab repository,
---
## Phase 1: Local Setup (Your Workstation)
**Estimated Time:** 30-40 minutes
### Step 1: Install Git-crypt
**Time:** 3-5 minutes
**Windows (via Git Bash):**
```bash
@ -43,6 +60,7 @@ git-crypt --version
---
### Step 2: Initialize Git-crypt in Repository
**Time:** 3-5 minutes
```bash
cd ~/homelab
@ -62,6 +80,7 @@ git-crypt export-key ~/homelab-secrets.key
---
### Step 3: Configure Encryption Rules
**Time:** 5-7 minutes
Create `.gitattributes` in repository root:
@ -86,6 +105,7 @@ git commit -m "chore(security): configure git-crypt encryption rules"
---
### Step 4: Update .gitignore
**Time:** 3-5 minutes
**Remove** the `.env.secrets` entry from `.gitignore`, since these files will now be committed in encrypted form:
@ -117,6 +137,7 @@ homelab-secrets.key
---
### Step 5: Create Encrypted Secret Files
**Time:** 8-10 minutes
**For Plex (Waldorf):**
@ -156,6 +177,7 @@ git-crypt status nodes/heimdall/core/.env.secrets
---
### Step 6: Test Encryption Locally
**Time:** 5-7 minutes
```bash
# Check encryption status
@ -186,6 +208,7 @@ git-crypt unlock ~/homelab-secrets.key
---
### Step 7: Commit Encrypted Secrets
**Time:** 3-5 minutes
```bash
# Stage encrypted files
@ -211,8 +234,10 @@ git push origin main
---
## Phase 2: Node Setup (Komodo Deployment Targets)
**Estimated Time:** 25-35 minutes
### Step 8: Distribute Key to Komodo Nodes
**Time:** 5-8 minutes
**SECURITY NOTE:** Transfer the key only over an encrypted channel such as `scp`. Never send it by email or Slack.
@ -237,6 +262,7 @@ scp ~/homelab-secrets.key chester@10.0.0.200:~/
---
### Step 9: Install Git-crypt on Nodes
**Time:** 10-15 minutes (across all 3 nodes)
**On each node (Heimdall, Waldorf, Watchtower):**
@ -255,6 +281,7 @@ git-crypt --version
---
### Step 10: Unlock Repositories on Nodes
**Time:** 10-12 minutes (across all 3 nodes)
**Critical:** Run the unlock inside the repository directories that Komodo deploys from, not in an arbitrary clone.
@ -313,8 +340,10 @@ chmod 600 ~/homelab-secrets.key
---
## Phase 3: Update Compose Files
**Estimated Time:** 15-20 minutes
### Step 11: Reference Encrypted Secret Files
**Time:** 10-12 minutes
**Example: Plex (Waldorf)**
@ -375,6 +404,7 @@ services:
---
### Step 12: Commit Compose Updates
**Time:** 5-8 minutes
```bash
git add nodes/waldorf/plex/compose.yaml
@ -392,8 +422,10 @@ git push origin main
---
## Phase 4: Testing & Validation
**Estimated Time:** 30-40 minutes
### Step 13: Test Automated Deployment
**Time:** 20-25 minutes (includes waiting for deployment)
**Trigger a deployment via Komodo:**
@ -418,6 +450,7 @@ docker logs plex --tail 50
---
### Step 14: Test Secret Rotation
**Time:** 10-15 minutes
**Scenario: Update Plex claim token**
@ -444,8 +477,10 @@ git push
---
## Phase 5: Security Hardening
**Estimated Time:** 20-30 minutes
### Step 15: Secure the Keys
**Time:** 12-15 minutes (across all nodes)
**On each node:**
@ -476,6 +511,7 @@ age -p ~/homelab-secrets.key > homelab-secrets.key.age
---
### Step 16: Document Key Access
**Time:** 8-15 minutes
Create `documentation/SECURITY_KEY_MANAGEMENT.md`:


@ -4,6 +4,31 @@
Create a centralized "llm-prompts" repository that automatically distributes prompt files, instructions, and knowledge bases to multiple consumer repositories whenever changes are pushed.
**Estimated Time to Complete (Option 1):** 1.5-2.5 hours (initial setup) | 30-45 minutes (experienced operator)
---
## Time Breakdown by Implementation Option
| Option | Approach | Initial Setup Time | Ongoing Maintenance |
|--------|----------|-------------------|---------------------|
| **Option 1** | Gitea Webhook + Sync Script | 1.5-2.5 hours | 5-10 min/change |
| **Option 2** | Git Submodules | 30-45 minutes | 2-5 min/update |
| **Option 3** | NFS Shared Directory | 15-20 minutes | 0 min (instant) |
**Recommended:** Option 1 (detailed time breakdown below)
### Option 1: Detailed Phase Breakdown
| Phase | Description | Time Estimate |
|-------|-------------|---------------|
| **Setup** | Create Central Repository | 15-20 minutes |
| **Automation** | Deploy Sync Script & Webhook | 40-60 minutes |
| **Testing** | Validate Single Target | 20-30 minutes |
| **Scaling** | Add Multiple Repositories | 15-25 minutes |
| **Documentation** | Security & Troubleshooting | 10-15 minutes |
| **Total** | End-to-End Implementation | **1.5-2.5 hours** |
## Architecture Overview
```mermaid
@ -21,10 +46,12 @@ graph LR
---
## Option 1: Gitea Webhook + Sync Script (Recommended)
**Total Time:** 1.5-2.5 hours (initial setup)
**Best for:** Your current stack (Gitea, automation-friendly)
### Step 1: Create Central Repository
**Time:** 15-20 minutes
```bash
# On Gitea: Create new repo "llm-prompts"
@ -47,6 +74,7 @@ git push -u origin main
```
### Step 2: Create Sync Script
**Time:** 20-25 minutes
Create `/mnt/appdata/scripts/sync-prompts.sh` on Heimdall:
@ -91,6 +119,7 @@ echo "✅ Sync complete"
```
### Step 3: Configure Gitea Webhook
**Time:** 5-8 minutes
1. **In Gitea UI** → llm-prompts repo → Settings → Webhooks
2. **Add Webhook:**
@ -100,6 +129,7 @@ echo "✅ Sync complete"
- Active: ✅
### Step 4: Choose Webhook Receiver
**Time:** 15-25 minutes (Option A) | 20-30 minutes (Option B)
**Option A: Simple HTTP Server (Quick)**
@ -153,6 +183,7 @@ Add to your Komodo deployment if you prefer centralized management.
---
## Option 2: Git Submodules (Traditional)
**Total Time:** 30-45 minutes (initial setup) | 2-5 minutes per update
**Best for:** Simple, standard Git workflow
@ -181,6 +212,7 @@ git submodule update --remote
---
## Option 3: NFS Shared Directory (Simplest)
**Total Time:** 15-20 minutes (one-time setup) | 0 minutes (instant sync)
**Best for:** Instant sync across all repos
@ -258,14 +290,17 @@ echo "✅ Validation passed"
---
## Migration Strategy
**Total Timeline:** 4 weeks (staged rollout) | 1 day (aggressive deployment)
### Week 1: Setup Central Repository
**Time:** 2-3 hours
- Create `llm-prompts` repo on Gitea
- Copy existing prompts/instructions/knowledge from homelab
- Test structure and validate files
- Document repository layout
### Week 2: Test Automation (Single Target)
**Time:** 1-2 hours + monitoring
- Deploy webhook listener container
- Create sync script in `/mnt/appdata/scripts/`
- Configure Gitea webhook
@ -273,12 +308,14 @@ echo "✅ Validation passed"
- Monitor logs and verify commits
### Week 3: Scale to Multiple Repos
**Time:** 30-60 minutes + validation
- Add additional target repositories to sync script
- Test with 2-3 repos
- Verify no conflicts or issues
- Document any edge cases
### Week 4: Production Rollout
**Time:** 1-2 hours (documentation + monitoring setup)
- Archive old `.github` directories in target repos
- Add monitoring/alerting for sync failures
- Create troubleshooting guide