BREAKING CHANGE: day0bootstrap.sh deprecated in favor of bootstrap.sh

- Add scripts/bootstrap.sh (488 lines): Unified entrypoint supporting multiple hardware types (Proxmox/Docker VMs/Pi)
- Create scripts/lib/ modular library system:
  - detection.sh: OS/hardware/container detection (362 lines)
  - fingerprint.sh: System fingerprinting and inventory (494 lines)
  - network.sh: IP configuration and VLAN placement (356 lines)
  - proxmox.sh: PVE post-install automation (453 lines)
  - validation.sh: Comprehensive pre-flight checks (510 lines)
- Add validation tools: validate-node.sh, onboarding.sh, pi_init.sh
- Deprecate scripts/day0bootstrap.sh with graceful redirect wrapper
- Document architecture in scripts/README.md (495 lines) and PROXMOX-COMPARISON.md
- Update SOP-002 with new bootstrap workflow
- Add nodes/watchtower/compose.yaml (Raspberry Pi 5 stack)

Migration: Existing day0bootstrap.sh users automatically redirected to new system after 5-second warning. No manual intervention required.

Ref: Infrastructure automation modernization per active-tasks.md

# Homelab Bootstrap & Utility Scripts

This directory contains the unified day0 onboarding infrastructure and operational utility scripts for homelab hardware provisioning.

## 🚀 Quick Start

**New Hardware Onboarding (Recommended):**
```bash
# Auto-detect hardware type and configure
./bootstrap.sh

# Preview changes without applying
./bootstrap.sh --dry-run

# Force specific hardware type
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
```

**Health Check on Existing Nodes:**
```bash
# Run comprehensive validation
./validate-node.sh

# JSON output for monitoring
./validate-node.sh --json
```

---

## 📁 Directory Structure

```
scripts/
├── bootstrap.sh          # ⭐ Unified day0 onboarding (NEW)
├── validate-node.sh      # ⭐ Standalone health check tool (NEW)
├── lib/                  # ⭐ Modular libraries (NEW)
│   ├── detection.sh      # OS/hardware auto-detection
│   ├── network.sh        # Network configuration & validation
│   ├── validation.sh     # Comprehensive health checks
│   ├── fingerprint.sh    # Hardware inventory collection
│   └── proxmox.sh        # Proxmox VE post-install (comprehensive)
├── day0bootstrap.sh      # 🔻 DEPRECATED - use bootstrap.sh
├── pi_init.sh            # 🔻 DEPRECATED - use bootstrap.sh
├── onboarding.sh         # 🔻 DEPRECATED - use bootstrap.sh
└── README.md             # This file
```

---

## 🛠️ Core Tools

### `bootstrap.sh`: Unified Day0 Onboarding

**Purpose:** Intelligent initialization for all homelab hardware types. Auto-detects the OS and hardware, then applies the appropriate configuration.

**Replaces:** `day0bootstrap.sh`, `pi_init.sh`, `onboarding.sh`

**Features:**
- ✅ Auto-detection (Proxmox, Docker VM, Raspberry Pi, physical servers, AI workstations)
- ✅ Static IP configuration via netplan
- ✅ Docker & Ansible installation (with Debian Trixie compatibility)
- ✅ SSH key generation (ED25519 preferred, RSA fallback)
- ✅ Comprehensive validation suite
- ✅ Hardware fingerprinting with YAML/JSON output
- ✅ Ansible inventory auto-discovery

**Usage:**
```bash
./bootstrap.sh [OPTIONS]

Options:
  --help                 Show detailed help
  --dry-run              Preview actions without changes
  --hardware-type TYPE   Override auto-detection
                         (proxmox|docker-vm|pi|physical-docker|ai-workstation)
  --skip-network         Skip network configuration
  --skip-validation      Skip post-bootstrap validation
  --output-json          Generate JSON output instead of YAML
  --target-ip IP         Set static IP address (default: auto-assigned)
  --gateway IP           Set gateway IP (default: 10.0.0.1)
  --dns IP               Set DNS server (default: 10.0.0.2)
```

**Examples:**
```bash
# Auto-detect and configure with defaults
./bootstrap.sh

# Preview what would be done (safe to run)
./bootstrap.sh --dry-run

# Force Proxmox mode with custom IP
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201

# Skip network config (already configured manually)
./bootstrap.sh --skip-network

# Generate JSON for monitoring integration
./bootstrap.sh --output-json
```

**Output Files:**
- `../ansible/archive/outputs/hardware-facts-{hostname}-{timestamp}.yml`
- `../ansible/archive/outputs/bootstrap-validation-{hostname}-{timestamp}.log`
- `../ansible/archive/inventory/discovered-hosts.yml` (auto-discovery)

**Workflow:**
1. **Detection** → Identify OS, hardware type, CPU, GPU
2. **Network Config** → Apply static IP via netplan (SSH will disconnect)
3. **Package Install** → Docker, Ansible, proxmoxer (as needed)
4. **SSH Keys** → Generate/verify ED25519 keys
5. **Validation** → Comprehensive health checks (disk, memory, network, etc.)
6. **Fingerprinting** → Generate hardware inventory YAML

**Network Warning:** Network configuration will disconnect SSH. Reconnect to the new IP after ~10 seconds.

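The reconnect after the network step can be scripted instead of timed by hand. The following is a minimal sketch, assuming a hypothetical helper named `retry_until` that is not part of `bootstrap.sh`. It re-runs any probe command until the probe succeeds or the attempt budget runs out. The user, IP, attempt count, and delay in the usage comment are example values only.

```bash
# Hypothetical helper (not part of bootstrap.sh): re-run a probe command
# until it succeeds or the attempt budget is exhausted.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0                  # probe succeeded
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"            # wait before the next attempt
    fi
    i=$((i + 1))
  done
  return 1                      # probe never succeeded
}

# Example (not executed here): poll the new address for up to about a minute.
# retry_until 12 5 ssh -o ConnectTimeout=5 chester@10.0.0.200 true
```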
---

### `validate-node.sh`: Standalone Health Check Tool

**Purpose:** Comprehensive validation suite for operational readiness. Can be run post-bootstrap or ad-hoc on any managed node.

**Features:**
- ✅ Disk space & performance checks
- ✅ Memory & swap validation
- ✅ Network routing & connectivity tests
- ✅ NFS client configuration
- ✅ Docker daemon health
- ✅ Proxmox API accessibility (if applicable)
- ✅ SSH security audit
- ✅ Time synchronization verification
- ✅ JSON output for monitoring integration

**Usage:**
```bash
./validate-node.sh [OPTIONS]

Options:
  --help            Show detailed help
  --json            Output results in JSON format
  --critical-only   Show only critical errors
  --verbose         Show detailed output for each check
```

**Examples:**
```bash
# Run all checks with standard output
./validate-node.sh

# Only show critical errors (useful in scripts)
./validate-node.sh --critical-only

# JSON output for monitoring/alerting
./validate-node.sh --json | jq '.validation.errors'

# Verbose mode with detection summary
./validate-node.sh --verbose
```

**Exit Codes:**
- `0` → All checks passed (or warnings only)
- `1` → Critical errors found
- `2` → Invalid usage

**Integration Example:**
```bash
# Use in deployment pipeline
if ./validate-node.sh --critical-only; then
  echo "Node healthy, proceeding with deployment"
else
  echo "Critical issues found, aborting"
  exit 1
fi
```

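The `if` form above treats every non-zero exit the same way. When a pipeline needs to tell a failed node apart from a mistyped flag, the three documented exit codes can be mapped explicitly. This is a sketch only: the function name `describe_validation_exit` and the status words are illustrative assumptions, not something `validate-node.sh` provides.

```bash
# Map the documented validate-node.sh exit codes to a short status word.
# The function name and status words are illustrative assumptions.
describe_validation_exit() {
  case "$1" in
    0) echo "healthy" ;;    # all checks passed, or warnings only
    1) echo "critical" ;;   # critical errors found
    2) echo "usage" ;;      # invalid usage
    *) echo "unknown" ;;    # anything outside the documented set
  esac
}

# Example (not executed here):
# ./validate-node.sh --critical-only
# echo "validate-node.sh: $(describe_validation_exit "$?")"
```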
---

## 📚 Library Components

### `lib/detection.sh`: System Detection

**Functions:**
- `detect_os_family()` → debian, ubuntu, raspbian
- `detect_os_version()` → trixie, bookworm, noble, etc.
- `detect_hardware_type()` → proxmox, docker-vm, pi, physical-docker, ai-workstation
- `detect_cpu_vendor()` → intel, amd, arm
- `detect_cpu_generation()` → Intel generation number (12, 13, 14, etc.)
- `detect_gpu()` → nvidia, amd, intel, none
- `detect_primary_interface()` → Network interface name
- `get_current_ip()` → Current IPv4 address

**Usage:**
```bash
source lib/detection.sh
HW_TYPE=$(detect_hardware_type)
echo "Hardware type: $HW_TYPE"
```

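A caller may want to guard against an unexpected value before branching on the detected type. The sketch below checks a string against the five hardware types listed for `detect_hardware_type()`. The helper name `is_known_hardware_type` is hypothetical and is not defined in `lib/detection.sh`.

```bash
# Hypothetical guard: accept only the hardware types this README documents.
is_known_hardware_type() {
  case "$1" in
    proxmox|docker-vm|pi|physical-docker|ai-workstation) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not executed here):
# source lib/detection.sh
# HW_TYPE=$(detect_hardware_type)
# is_known_hardware_type "$HW_TYPE" || { echo "Unexpected type: $HW_TYPE" >&2; exit 1; }
```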
---

### `lib/network.sh`: Network Configuration

**Functions:**
- `apply_static_ip()` → Configure netplan with static IP
- `configure_network_safe()` → Apply network changes with reconnection instructions
- `validate_connectivity()` → Test gateway, internet, DNS
- `check_nfs_accessibility()` → Test NFS server reachability
- `get_desired_vlan_ip()` → Determine VLAN IP based on hardware type (future)

**Usage:**
```bash
source lib/network.sh
configure_network_safe "10.0.0.151" "10.0.0.1" "10.0.0.2"
```

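Because a wrong address can leave a node unreachable, it can help to check the arguments before they reach `configure_network_safe`. The following sketch validates a dotted-quad IPv4 string. The helper `is_valid_ipv4` is an assumption for illustration; whether `lib/network.sh` performs a similar check internally is not stated here.

```bash
# Hypothetical pre-check: succeed only for a dotted-quad IPv4 address
# whose four octets are each in the range 0-255.
is_valid_ipv4() {
  local ip="$1" octet old_ifs
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;   # empty, stray characters, or misplaced dots
  esac
  old_ifs=$IFS
  IFS=.
  # Intentionally unquoted: split the address on dots into $1..$n.
  set -- $ip
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] || return 1
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Example (not executed here):
# is_valid_ipv4 "$TARGET_IP" && configure_network_safe "$TARGET_IP" "10.0.0.1" "10.0.0.2"
```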
---

### `lib/validation.sh`: Health Checks

**Functions:**
- `check_disk_space()` → Validate sufficient disk space
- `check_memory()` → Validate RAM meets hardware type requirements
- `check_network_routes()` → Validate routing configuration
- `check_docker_daemon()` → Validate Docker installation and health
- `check_proxmox_api()` → Validate Proxmox VE (if applicable)
- `run_validation_suite()` → Run all checks and return summary

**Severity Levels:**
- **Critical** → Blocks Ansible playbook execution
- **Warning** → Manual review recommended
- **Info** → Logged for reference

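One way to picture how the severity levels relate to the exit codes of `validate-node.sh` is a small tally in which only critical results produce a failing status. This is a sketch under assumptions: `record_result`, `validation_status`, and the two counters are hypothetical names, and the real `lib/validation.sh` may be organized differently.

```bash
# Hypothetical tally of check results by severity.
CRITICAL_COUNT=0
WARNING_COUNT=0

# Usage: record_result critical|warning|info "message"
record_result() {
  local severity="$1" message="$2"
  case "$severity" in
    critical) CRITICAL_COUNT=$((CRITICAL_COUNT + 1)) ;;
    warning)  WARNING_COUNT=$((WARNING_COUNT + 1)) ;;
    info)     ;;                                    # logged only
    *)        echo "unknown severity: $severity" >&2; return 2 ;;
  esac
  printf '[%s] %s\n' "$severity" "$message"
}

# Fail only when a critical result was recorded, so warnings alone
# do not block later steps.
validation_status() {
  if [ "$CRITICAL_COUNT" -gt 0 ]; then
    return 1
  fi
  return 0
}
```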
---

### `lib/fingerprint.sh`: Hardware Inventory

**Functions:**
- `collect_cpu_facts()` → CPU model, cores, frequency
- `collect_memory_facts()` → RAM and swap details
- `collect_disk_facts()` → Storage capacity and type (SSD/HDD)
- `collect_network_facts()` → Network interfaces and connectivity
- `generate_hardware_facts_yaml()` → Full YAML output
- `generate_ansible_inventory_snippet()` → Ansible inventory entry

**Output Format (YAML):**
```yaml
hostname:
os:
  family: "debian"
  version: "bookworm"
cpu:
  model: "Intel Core i7-12700"
  cores: 12
memory:
  total_gb: 32
storage:
  root:
    total_gb: 500
    type: "ssd"
```

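As an illustration of how a value such as `total_gb` could be derived, the sketch below converts a kibibyte count (the unit `/proc/meminfo` reports) into whole gigabytes and prints a `memory:` fragment in the shape shown above. Both helper names are hypothetical, and rounding to the nearest integer is an assumption; `collect_memory_facts()` may compute the figure differently.

```bash
# Convert a kibibyte count to whole GiB, rounded to the nearest integer.
kib_to_gib() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Print a memory fragment in the same shape as the YAML output format.
emit_memory_yaml() {
  printf 'memory:\n  total_gb: %s\n' "$(kib_to_gib "$1")"
}

# Example (not executed here), Linux only:
# emit_memory_yaml "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```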
---

### `lib/proxmox.sh`: Proxmox VE Post-Install

**Comprehensive Proxmox configuration (inspired by community-scripts/ProxmoxVE):**

**Repository Management:**
- `configure_all_pve_repos()` → Complete repository configuration
- `disable_pve_enterprise_repo()` → Disable subscription-required repository
- `enable_pve_no_subscription_repo()` → Enable free updates repository
- `configure_ceph_repos()` → Configure Ceph storage repositories
- `add_pvetest_repo()` → Add beta/test repository (disabled by default)

**Subscription Nag Removal:**
- `disable_subscription_nag()` → Remove subscription warnings from UI
  - Patches web UI JavaScript (proxmoxlib.js)
  - Patches mobile UI templates
  - Creates apt hook to persist after updates

**High Availability Management:**
- `disable_ha_services()` → Disable HA for single-node setups (saves resources)
- `enable_ha_services()` → Enable HA for clustered environments
- `check_ha_status()` → Check current HA service status

**System Maintenance:**
- `update_pve_system()` → Run full system update (apt dist-upgrade)
- `check_reboot_required()` → Check if reboot needed after updates

**Comprehensive Routine:**
- `run_proxmox_post_install()` → **All-in-one post-install configuration**
  - Repository fixes (PVE 8.x and 9.x compatible)
  - Subscription nag removal
  - Auto-detect single vs. cluster and adjust HA accordingly
  - System update checks
  - Reboot requirement detection

**Validation:**
- `validate_proxmox_config()` → Verify Proxmox configuration
- `get_pve_version()` → Get Proxmox VE version
- `is_pve_supported_version()` → Check if version is supported (8.0-8.9.x, 9.0-9.1.x)

**Usage:**
```bash
source lib/proxmox.sh

# Run complete post-install (auto mode)
run_proxmox_post_install "auto"

# Individual functions
disable_subscription_nag
disable_ha_services  # For single-node setups
validate_proxmox_config
```

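The supported range quoted for `is_pve_supported_version()` (8.0-8.9.x and 9.0-9.1.x) can be expressed as a check on the major and minor components. The sketch below is an independent illustration, not the library's implementation. The name `is_supported_pve_version` is hypothetical, and it assumes a `MAJOR.MINOR` or `MAJOR.MINOR.PATCH` version string.

```bash
# Hypothetical range check for the supported versions listed above.
is_supported_pve_version() {
  local version="$1" major rest minor
  case "$version" in
    [0-9]*.[0-9]*) ;;            # require at least MAJOR.MINOR
    *) return 1 ;;
  esac
  major=${version%%.*}           # text before the first dot
  rest=${version#*.}             # text after the first dot
  minor=${rest%%.*}              # text before the next dot, if any
  case "$major$minor" in
    *[!0-9]*) return 1 ;;        # reject non-numeric components
  esac
  case "$major" in
    8) [ "$minor" -le 9 ] ;;     # 8.0 through 8.9.x
    9) [ "$minor" -le 1 ] ;;     # 9.0 through 9.1.x
    *) return 1 ;;
  esac
}

# Example (not executed here):
# is_supported_pve_version "$(get_pve_version)" || echo "Unsupported PVE version" >&2
```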
**What It Does (Compared to Community Script):**
- ✅ Repository management (enterprise/no-subscription/Ceph/pvetest)
- ✅ Subscription nag removal (web + mobile UI)
- ✅ HA service management (auto-detect single vs. cluster)
- ✅ Supports PVE 8.x (.list format) and 9.x (.sources deb822 format)
- ✅ System update and reboot detection
- ✅ Automated (non-interactive) for use in bootstrap.sh

**Integration:**
When `bootstrap.sh` detects Proxmox hardware, it automatically runs `run_proxmox_post_install "auto"` after package installation, providing the same comprehensive post-install configuration as the community tteck/ProxmoxVE script.

---

## 🔻 Deprecated Scripts

**These scripts are deprecated and will be removed in a future release. Use `bootstrap.sh` instead.**

### `day0bootstrap.sh` → `bootstrap.sh --hardware-type pi`
**Original Purpose:** Debian Trixie bootstrap for Raspberry Pi
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning

### `pi_init.sh` → `bootstrap.sh --hardware-type pi`
**Original Purpose:** Raspberry Pi initialization
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning

### `onboarding.sh` → `bootstrap.sh`
**Original Purpose:** Generic onboarding + Proxmox integration
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning

**Migration Path:**
All legacy scripts now display a deprecation warning and automatically redirect to `bootstrap.sh` with appropriate flags. You have 6 months to update your workflows before these wrappers are removed.

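A redirect wrapper of this kind can be very small. The sketch below shows one possible shape, not the actual contents of the legacy scripts: the function name `redirect_to_bootstrap` and the `DEPRECATION_DELAY` override are assumptions, and the 5-second default follows the delay mentioned in the commit note.

```bash
# Hypothetical wrapper body for a deprecated script: warn on stderr,
# pause, then run the replacement command passed as arguments.
redirect_to_bootstrap() {
  local delay="${DEPRECATION_DELAY:-5}"
  echo "WARNING: this script is deprecated; use bootstrap.sh instead." >&2
  echo "Redirecting in ${delay} seconds..." >&2
  sleep "$delay"
  "$@"    # replacement command with the mapped flags
}

# Example (not executed here), as the last line of pi_init.sh:
# redirect_to_bootstrap "$(dirname "$0")/bootstrap.sh" --hardware-type pi "$@"
```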
---

## 🎯 Common Workflows

### New Raspberry Pi Control Node
```bash
# 1. Transfer scripts to Pi
scp -r scripts/ chester@10.0.0.200:~/

# 2. SSH to Pi
ssh chester@10.0.0.200

# 3. Run bootstrap
cd scripts
./bootstrap.sh  # Auto-detects Pi hardware

# 4. SSH will disconnect - reconnect to new IP
ssh chester@10.0.0.200
```

### New Proxmox Host
```bash
# On Proxmox console (pre-SSH setup)
curl -O https://git.castaldifamily.com/nathan/homelab/raw/branch/main/scripts/bootstrap.sh
chmod +x bootstrap.sh
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
```

### New Docker Swarm VM
```bash
# From control node (Watchtower)
scp -r scripts/ chester@10.0.0.211:~/
ssh chester@10.0.0.211
cd scripts
./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.211
```

### Health Check All Nodes (from Control Node)
```bash
# Create validation script
cat > validate-all.sh <<'EOF'
#!/bin/bash
for host in heimdall waldorf watchtower; do
  echo "=== $host ==="
  ssh $host "~/scripts/validate-node.sh --critical-only"
done
EOF

chmod +x validate-all.sh
./validate-all.sh
```

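The loop above prints each node's result but always exits with the status of the last host. A variant that remembers which hosts failed might look like the sketch below. The function name `validate_hosts` and the `RUNNER` override (defaulting to `ssh`) are assumptions added for illustration and for testing without real nodes.

```bash
# Hypothetical variant of validate-all.sh that reports failed hosts
# and returns non-zero if any host failed.
validate_hosts() {
  local runner="${RUNNER:-ssh}" host failed=""
  for host in "$@"; do
    echo "=== $host ==="
    if ! "$runner" "$host" "~/scripts/validate-node.sh --critical-only"; then
      failed="$failed $host"      # remember the host and keep going
    fi
  done
  if [ -n "$failed" ]; then
    echo "Failed:$failed" >&2
    return 1
  fi
  return 0
}

# Example (not executed here):
# validate_hosts heimdall waldorf watchtower
```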
---

## 🧪 Testing & Development

### Test Bootstrap in Dry-Run Mode
```bash
# Safe to run on production nodes
./bootstrap.sh --dry-run --hardware-type docker-vm
```

### Manually Test Libraries
```bash
# Source library and test functions
source lib/detection.sh
print_detection_summary

source lib/validation.sh
run_validation_suite
```

### Validate Script Syntax
```bash
# Check for syntax errors
bash -n bootstrap.sh
shellcheck bootstrap.sh lib/*.sh
```

---

## 📖 Documentation

**Primary References:**
- [SOP-002: Initial Infrastructure Deployment](../documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md)
- [Technical Runbook](../documentation/TECHNICAL_RUNBOOK.md)
- [Environment Constraints](../ansible/archive/documentation/standards/environment-constraints.md)

**Related Ansible Playbooks:**
- [generic_host.yml](../ansible/archive/playbooks/onboarding/generic_host.yml): Second-stage onboarding
- [proxmox_host.yml](../ansible/archive/playbooks/onboarding/proxmox_host.yml): Proxmox-specific config
- [gather_hardware_facts.yml](../ansible/archive/playbooks/preflight/gather_hardware_facts.yml): Ansible fact collection

---

## 🐛 Troubleshooting

**Bootstrap fails with "could not detect primary interface":**
- Check network cable/Wi-Fi connection
- Run manually: `ip -o link show` to see interfaces
- Override: `INTERFACE=eth0 ./bootstrap.sh`

**SSH disconnects and doesn't reconnect:**
- Wait 30 seconds and retry
- Check console access if available
- Verify IP in router DHCP leases

**Validation reports critical errors:**
- Review log: `cat ../ansible/archive/outputs/bootstrap-validation-*.log`
- Address critical issues (disk space, memory, etc.)
- Re-run validation: `./validate-node.sh`

**Docker installation fails on Debian Trixie:**
- Script automatically uses Bookworm repos as fallback
- Check logs: `journalctl -u docker`

**Hardware fingerprinting shows "unknown" values:**
- Some VMs may not expose full hardware info
- Run `sudo dmidecode --type system` for manual inspection

---

## 🔮 Future Enhancements

**Planned (Not Yet Implemented):**
- [ ] PXE boot + cloud-init provisioning (eliminate manual OS install)
- [ ] Active VLAN migration logic (10.0.0.x → 10.0.10.x/10.0.200.x)
- [ ] Proxmox VM auto-provisioning via API
- [ ] Node retirement/decommissioning automation
- [ ] Integration with Prometheus node exporter metrics
- [ ] Containerized control node for disaster recovery

**VLAN Placeholder:**
The `get_desired_vlan_ip()` function contains placeholders for future VLAN segmentation. The current implementation uses a flat 10.0.0.x network. See comments in `lib/network.sh` for the migration plan.

---

## 📝 Version History

- **v4.0.0** (2026-04-12): Initial unified bootstrap release
  - Consolidated 3 legacy scripts into single entry point
  - Added modular library architecture
  - Comprehensive validation and fingerprinting
  - Auto-discovery with Ansible inventory generation

---

**Questions?** See [documentation/TECHNICAL_RUNBOOK.md](../documentation/TECHNICAL_RUNBOOK.md) or review session memory in `/memories/session/plan.md`.

# scripts

Automation utilities and helper scripts for homelab infrastructure management.

---

## Inventory

| Script | Purpose | Status |
|--------|---------|--------|
| [onboarding.sh](onboarding.sh) | Bootstrap Ansible control node for Proxmox management | 🟡 **DRAFT** - Testing Required |

---

## onboarding.sh

**Purpose:** Automated setup of Ansible control node for Proxmox infrastructure management.

**What it does:**
1. Installs Ansible and Proxmoxer Python library
2. Detects or generates SSH keypair (ED25519 preferred, RSA fallback)
3. Copies public key to Proxmox server for passwordless authentication
4. Generates Ansible inventory file (`hosts.ini`) with Proxmox connection details

**Prerequisites:**
- Debian/Ubuntu-based system (uses `apt`)
- Network access to Proxmox server
- Initial SSH password for target Proxmox server

**Configuration:**
Edit the following variables at the top of the script:
```bash
PROXMOX_IP="192.168.1.100"    # Target Proxmox server IP
PROXMOX_USER="root"           # Proxmox SSH user
```

**Usage:**
```bash
cd ~/dev/homelab/scripts
chmod +x onboarding.sh
./onboarding.sh
```

**Verification:**
```bash
ansible proxmox_nodes -m ping -i hosts.ini
```

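For orientation, an inventory that satisfies the verification command needs at least a `proxmox_nodes` group containing the target host. The sketch below writes such a file in Ansible's INI format. It is an assumption about the minimal shape, using a hypothetical `write_inventory` helper; the file that `onboarding.sh` actually generates may carry more settings.

```bash
# Hypothetical sketch of step 4: write a minimal INI inventory.
# The group name matches the one used by the verification command.
write_inventory() {
  local file="$1" ip="$2" user="$3"
  cat >"$file" <<EOF
[proxmox_nodes]
${ip} ansible_user=${user}
EOF
}

# Example (not executed here):
# write_inventory hosts.ini "$PROXMOX_IP" "$PROXMOX_USER"
```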
---

## ⚠️ Development Status

| Script | Testing Status | Known Issues |
|--------|---------------|--------------|
| onboarding.sh | ❌ Untested in production | • Hardcoded Proxmox IP/user variables<br>• No error handling for failed SSH key copy<br>• Assumes Debian/Ubuntu package manager<br>• No validation of Proxmox connectivity |

**DO NOT USE IN PRODUCTION** until the following are addressed:

1. **Error Handling:** Add validation checks for each step
2. **Idempotency:** Verify script can be safely re-run
3. **Multi-OS Support:** Test on RHEL/Arch variants or add OS detection
4. **Interactive Mode:** Prompt for PROXMOX_IP/USER instead of manual editing
5. **Rollback:** Add cleanup mechanism for failed installations

---

## Contributing

When adding new scripts:
1. Update the **Inventory** table with script name and purpose
2. Document prerequisites, configuration, and usage
3. Mark status as 🟡 DRAFT until production-tested
4. Add to **Development Status** table with known issues