Homelab Bootstrap & Utility Scripts
This directory contains the unified day0 onboarding infrastructure and operational utility scripts for homelab hardware provisioning.
🚀 Quick Start
New Hardware Onboarding (Recommended):
# Auto-detect hardware type and configure
./bootstrap.sh
# Preview changes without applying
./bootstrap.sh --dry-run
# Force specific hardware type
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
Health Check on Existing Nodes:
# Run comprehensive validation
./validate-node.sh
# JSON output for monitoring
./validate-node.sh --json
📁 Directory Structure
scripts/
├── bootstrap.sh # ⭐ Unified day0 onboarding (NEW)
├── validate-node.sh # ⭐ Standalone health check tool (NEW)
├── lib/ # ⭐ Modular libraries (NEW)
│ ├── detection.sh # OS/hardware auto-detection
│ ├── network.sh # Network configuration & validation
│ ├── validation.sh # Comprehensive health checks
│ ├── fingerprint.sh # Hardware inventory collection
│ └── proxmox.sh # Proxmox VE post-install (comprehensive)
├── day0bootstrap.sh # 🔻 DEPRECATED - use bootstrap.sh
├── pi_init.sh # 🔻 DEPRECATED - use bootstrap.sh
├── onboarding.sh # 🔻 DEPRECATED - use bootstrap.sh
└── README.md # This file
🛠️ Core Tools
bootstrap.sh — Unified Day0 Onboarding
Purpose: Intelligent initialization for all homelab hardware types. Auto-detects OS, hardware, and applies appropriate configuration.
Replaces: day0bootstrap.sh, pi_init.sh, onboarding.sh
Features:
- ✅ Auto-detection (Proxmox, Docker VM, Raspberry Pi, physical servers, AI workstations)
- ✅ Static IP configuration via netplan
- ✅ Docker & Ansible installation (with Debian Trixie compatibility)
- ✅ SSH key generation (ED25519 preferred, RSA fallback)
- ✅ Comprehensive validation suite
- ✅ Hardware fingerprinting with YAML/JSON output
- ✅ Ansible inventory auto-discovery
Usage:
./bootstrap.sh [OPTIONS]
Options:
--help Show detailed help
--dry-run Preview actions without changes
--hardware-type TYPE Override auto-detection
(proxmox|docker-vm|pi|physical-docker|ai-workstation)
--skip-network Skip network configuration
--skip-validation Skip post-bootstrap validation
--output-json Generate JSON output instead of YAML
--target-ip IP Set static IP address (default: auto-assigned)
--gateway IP Set gateway IP (default: 10.0.0.1)
--dns IP Set DNS server (default: 10.0.0.2)
Examples:
# Auto-detect and configure with defaults
./bootstrap.sh
# Preview what would be done (safe to run)
./bootstrap.sh --dry-run
# Force Proxmox mode with custom IP
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
# Skip network config (already configured manually)
./bootstrap.sh --skip-network
# Generate JSON for monitoring integration
./bootstrap.sh --output-json
Output Files:
- `../ansible/archive/outputs/hardware-facts-{hostname}-{timestamp}.yml`
- `../ansible/archive/outputs/bootstrap-validation-{hostname}-{timestamp}.log`
- `../ansible/archive/inventory/discovered-hosts.yml` (auto-discovery)
Workflow:
- Detection → Identify OS, hardware type, CPU, GPU
- Network Config → Apply static IP via netplan (SSH will disconnect)
- Package Install → Docker, Ansible, proxmoxer (as needed)
- SSH Keys → Generate/verify ED25519 keys
- Validation → Comprehensive health checks (disk, memory, network, etc.)
- Fingerprinting → Generate hardware inventory YAML
Network Warning: Network configuration will disconnect SSH. Reconnect to new IP after ~10 seconds.
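The reconnect after the network change can be polled instead of timed by hand. The following is a minimal sketch, not part of bootstrap.sh: `retry_until` is a hypothetical helper, and the attempt count and delay are arbitrary illustration values.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (not executed here): poll the new address for roughly a minute.
# retry_until 12 5 ssh -o ConnectTimeout=3 -o BatchMode=yes chester@10.0.0.201 true
```

`BatchMode=yes` keeps ssh from prompting for a password, so the loop fails fast when key authentication is not yet in place.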
validate-node.sh — Standalone Health Check Tool
Purpose: Comprehensive validation suite for operational readiness. Can be run post-bootstrap or ad-hoc on any managed node.
Features:
- ✅ Disk space & performance checks
- ✅ Memory & swap validation
- ✅ Network routing & connectivity tests
- ✅ NFS client configuration
- ✅ Docker daemon health
- ✅ Proxmox API accessibility (if applicable)
- ✅ SSH security audit
- ✅ Time synchronization verification
- ✅ JSON output for monitoring integration
Usage:
./validate-node.sh [OPTIONS]
Options:
--help Show detailed help
--json Output results in JSON format
--critical-only Show only critical errors
--verbose Show detailed output for each check
Examples:
# Run all checks with standard output
./validate-node.sh
# Only show critical errors (useful in scripts)
./validate-node.sh --critical-only
# JSON output for monitoring/alerting
./validate-node.sh --json | jq '.validation.errors'
# Verbose mode with detection summary
./validate-node.sh --verbose
Exit Codes:
- `0` → All checks passed (or warnings only)
- `1` → Critical errors found
- `2` → Invalid usage
Integration Example:
# Use in deployment pipeline
if ./validate-node.sh --critical-only; then
echo "Node healthy, proceeding with deployment"
else
echo "Critical issues found, aborting"
exit 1
fi
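When a pipeline needs to tell a failed node apart from a mistyped flag, the three exit codes can be handled separately. This is a small sketch built only on the exit-code list above; the helper name `describe_validation_exit` and its labels are hypothetical.

```bash
# Hypothetical helper: translate a validate-node.sh exit code into a label
# (0 = passed or warnings only, 1 = critical errors, 2 = invalid usage).
describe_validation_exit() {
    case "$1" in
        0) echo "healthy" ;;
        1) echo "critical" ;;
        2) echo "usage-error" ;;
        *) echo "unknown" ;;
    esac
}

# Example (not executed here):
# ./validate-node.sh --critical-only
# status=$(describe_validation_exit "$?")
```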
📚 Library Components
lib/detection.sh — System Detection
Functions:
- `detect_os_family()` → debian, ubuntu, raspbian
- `detect_os_version()` → trixie, bookworm, noble, etc.
- `detect_hardware_type()` → proxmox, docker-vm, pi, physical-docker, ai-workstation
- `detect_cpu_vendor()` → intel, amd, arm
- `detect_cpu_generation()` → Intel generation number (12, 13, 14, etc.)
- `detect_gpu()` → nvidia, amd, intel, none
- `detect_primary_interface()` → Network interface name
- `get_current_ip()` → Current IPv4 address
Usage:
source lib/detection.sh
HW_TYPE=$(detect_hardware_type)
echo "Hardware type: $HW_TYPE"
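A caller may want to guard against an unexpected detection result before acting on it. The sketch below checks a value against the five hardware types this README documents; `is_known_hardware_type` is a hypothetical name and is not claimed to exist in lib/detection.sh.

```bash
# Hypothetical guard: succeed only for the hardware types listed in this README.
is_known_hardware_type() {
    case "$1" in
        proxmox|docker-vm|pi|physical-docker|ai-workstation) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here):
# HW_TYPE=$(detect_hardware_type)
# is_known_hardware_type "$HW_TYPE" || echo "Unrecognized type: $HW_TYPE" >&2
```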
lib/network.sh — Network Configuration
Functions:
- `apply_static_ip()` → Configure netplan with static IP
- `configure_network_safe()` → Apply network changes with reconnection instructions
- `validate_connectivity()` → Test gateway, internet, DNS
- `check_nfs_accessibility()` → Test NFS server reachability
- `get_desired_vlan_ip()` → Determine VLAN IP based on hardware type (future)
Usage:
source lib/network.sh
configure_network_safe "10.0.0.151" "10.0.0.1" "10.0.0.2"
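Because a wrong address can leave a node unreachable, it may be worth validating the arguments before calling `configure_network_safe`. A minimal sketch follows; `is_valid_ipv4` is a hypothetical helper, and whether lib/network.sh performs a similar check internally is not known from this README.

```bash
# Hypothetical helper: succeed only for a dotted-quad IPv4 address (0-255 per octet).
is_valid_ipv4() {
    local ip="$1" octet old_ifs
    # Reject characters other than digits and dots, and empty or doubled fields.
    case "$ip" in
        *[!0-9.]*|*..*|.*|*.) return 1 ;;
    esac
    # Split on dots into positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] || return 1
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Example (not executed here):
# is_valid_ipv4 "10.0.0.151" && configure_network_safe "10.0.0.151" "10.0.0.1" "10.0.0.2"
```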
lib/validation.sh — Health Checks
Functions:
- `check_disk_space()` → Validate sufficient disk space
- `check_memory()` → Validate RAM meets hardware type requirements
- `check_network_routes()` → Validate routing configuration
- `check_docker_daemon()` → Validate Docker installation and health
- `check_proxmox_api()` → Validate Proxmox VE (if applicable)
- `run_validation_suite()` → Run all checks and return summary
Severity Levels:
- Critical → Blocks Ansible playbook execution
- Warning → Manual review recommended
- Info → Logged for reference
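One way the severity levels could feed the exit code is to count results as they are recorded. This is an illustrative sketch only: `record_result`, `validation_exit_code`, and the two counters are hypothetical, and the real lib/validation.sh may track results differently.

```bash
# Hypothetical counters; not taken from lib/validation.sh.
CRITICAL_COUNT=0
WARNING_COUNT=0

# record_result SEVERITY MESSAGE
record_result() {
    case "$1" in
        critical) CRITICAL_COUNT=$((CRITICAL_COUNT + 1)) ;;
        warning)  WARNING_COUNT=$((WARNING_COUNT + 1)) ;;
        info)     ;;
        *)        echo "unknown severity: $1" >&2; return 2 ;;
    esac
    echo "[$1] $2"
}

# Prints 1 if any critical result was recorded, otherwise 0, matching the
# documented rule that warnings alone still count as a pass.
validation_exit_code() {
    if [ "$CRITICAL_COUNT" -gt 0 ]; then
        echo 1
    else
        echo 0
    fi
}
```

`record_result` must be called directly rather than inside `$(...)`, because a command substitution runs in a subshell and the counter updates would be lost.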
lib/fingerprint.sh — Hardware Inventory
Functions:
- `collect_cpu_facts()` → CPU model, cores, frequency
- `collect_memory_facts()` → RAM and swap details
- `collect_disk_facts()` → Storage capacity and type (SSD/HDD)
- `collect_network_facts()` → Network interfaces and connectivity
- `generate_hardware_facts_yaml()` → Full YAML output
- `generate_ansible_inventory_snippet()` → Ansible inventory entry
Output Format (YAML):
hostname:
os:
  family: "debian"
  version: "bookworm"
cpu:
  model: "Intel Core i7-12700"
  cores: 12
memory:
  total_gb: 32
storage:
  root:
    total_gb: 500
    type: "ssd"
lib/proxmox.sh — Proxmox VE Post-Install
Comprehensive Proxmox configuration (inspired by community-scripts/ProxmoxVE):
Repository Management:
- `configure_all_pve_repos()` → Complete repository configuration
- `disable_pve_enterprise_repo()` → Disable subscription-required repository
- `enable_pve_no_subscription_repo()` → Enable free updates repository
- `configure_ceph_repos()` → Configure Ceph storage repositories
- `add_pvetest_repo()` → Add beta/test repository (disabled by default)
Subscription Nag Removal:
- `disable_subscription_nag()` → Remove subscription warnings from UI
  - Patches web UI JavaScript (proxmoxlib.js)
  - Patches mobile UI templates
  - Creates apt hook to persist after updates
High Availability Management:
- `disable_ha_services()` → Disable HA for single-node setups (saves resources)
- `enable_ha_services()` → Enable HA for clustered environments
- `check_ha_status()` → Check current HA service status
System Maintenance:
- `update_pve_system()` → Run full system update (apt dist-upgrade)
- `check_reboot_required()` → Check if reboot needed after updates
Comprehensive Routine:
- `run_proxmox_post_install()` → All-in-one post-install configuration
  - Repository fixes (PVE 8.x and 9.x compatible)
  - Subscription nag removal
  - Auto-detect single vs. cluster and adjust HA accordingly
  - System update checks
  - Reboot requirement detection
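The single-node versus cluster decision could be made by checking for the cluster configuration file. The sketch below assumes that a clustered Proxmox VE node has `/etc/pve/corosync.conf` and a standalone node does not; `pve_node_mode` is a hypothetical name, and lib/proxmox.sh may use a different signal.

```bash
# Hypothetical helper: print "cluster" or "single".
# Assumption: /etc/pve/corosync.conf exists only on clustered nodes.
# The optional path argument exists only to make the function testable.
pve_node_mode() {
    local corosync_conf="${1:-/etc/pve/corosync.conf}"
    if [ -f "$corosync_conf" ]; then
        echo "cluster"
    else
        echo "single"
    fi
}

# Example (not executed here):
# [ "$(pve_node_mode)" = "single" ] && disable_ha_services
```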
Validation:
- `validate_proxmox_config()` → Verify Proxmox configuration
- `get_pve_version()` → Get Proxmox VE version
- `is_pve_supported_version()` → Check if version is supported (8.0-8.9.x, 9.0-9.1.x)
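The supported range (8.0-8.9.x, 9.0-9.1.x) can be expressed as a check on the major and minor components. This is a sketch under the assumption that the version arrives as a plain `MAJOR.MINOR[.PATCH]` string; `pve_version_in_supported_range` is a hypothetical stand-in and not the library's `is_pve_supported_version()`.

```bash
# Hypothetical check mirroring the documented range: 8.0-8.9.x and 9.0-9.1.x.
pve_version_in_supported_range() {
    local version="$1" major minor
    # Require at least MAJOR.MINOR.
    case "$version" in
        *.*) ;;
        *) return 1 ;;
    esac
    major=${version%%.*}      # text before the first dot
    minor=${version#*.}       # text after the first dot
    minor=${minor%%.*}        # drop any patch component
    # Both components must be non-empty and purely numeric.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    case "$major" in
        8) [ "$minor" -le 9 ] ;;
        9) [ "$minor" -le 1 ] ;;
        *) return 1 ;;
    esac
}
```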
Usage:
source lib/proxmox.sh
# Run complete post-install (auto mode)
run_proxmox_post_install "auto"
# Individual functions
disable_subscription_nag
disable_ha_services # For single-node setups
validate_proxmox_config
What It Does (Compared to Community Script):
- ✅ Repository management (enterprise/no-subscription/Ceph/pvetest)
- ✅ Subscription nag removal (web + mobile UI)
- ✅ HA service management (auto-detect single vs. cluster)
- ✅ Supports PVE 8.x (.list format) and 9.x (.sources deb822 format)
- ✅ System update and reboot detection
- ✅ Automated (non-interactive) for use in bootstrap.sh
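Supporting both repository formats comes down to choosing a file extension by major version. The sketch below encodes only the mapping stated above (8.x uses `.list`, 9.x uses `.sources`); the function name is hypothetical.

```bash
# Hypothetical helper: print the APT source file extension for a PVE major version.
pve_repo_file_extension() {
    case "$1" in
        8) echo "list" ;;      # classic one-line format
        9) echo "sources" ;;   # deb822 format
        *) return 1 ;;         # outside the versions this README covers
    esac
}
```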
Integration:
When bootstrap.sh detects Proxmox hardware, it automatically runs run_proxmox_post_install "auto" after package installation, providing the same comprehensive post-install configuration as the community tteck/ProxmoxVE script.
---
## 🔻 Deprecated Scripts
**These scripts are deprecated and will be removed in a future release. Use `bootstrap.sh` instead.**
### `day0bootstrap.sh` → `bootstrap.sh --hardware-type pi`
**Original Purpose:** Debian Trixie bootstrap for Raspberry Pi
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
### `pi_init.sh` → `bootstrap.sh --hardware-type pi`
**Original Purpose:** Raspberry Pi initialization
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
### `onboarding.sh` → `bootstrap.sh`
**Original Purpose:** Generic onboarding + Proxmox integration
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
**Migration Path:**
All legacy scripts now display a deprecation warning and automatically redirect to `bootstrap.sh` with appropriate flags. You have 6 months to update your workflows before these wrappers are removed.
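For illustration, a deprecation wrapper of this kind might look like the sketch below. It is not the shipped wrapper code: `deprecated_redirect` is a hypothetical function, and it prints the replacement command instead of executing it so the sketch has no side effects.

```bash
# Hypothetical wrapper logic: warn on stderr, then print the bootstrap.sh
# command that the legacy script would hand off to.
# Usage: deprecated_redirect OLD_SCRIPT_NAME [BOOTSTRAP_ARGS...]
deprecated_redirect() {
    local old_name="$1"
    shift
    echo "WARNING: $old_name is deprecated; use bootstrap.sh instead." >&2
    if [ "$#" -gt 0 ]; then
        echo "./bootstrap.sh $*"
    else
        echo "./bootstrap.sh"
    fi
}
```

A real wrapper would more likely end with `exec ./bootstrap.sh --hardware-type pi "$@"` so that the caller's own arguments and the exit status pass through unchanged.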
---
## 🎯 Common Workflows
### New Raspberry Pi Control Node
```bash
# 1. Transfer scripts to Pi
scp -r scripts/ chester@10.0.0.200:~/
# 2. SSH to Pi
ssh chester@10.0.0.200
# 3. Run bootstrap
cd scripts
./bootstrap.sh # Auto-detects Pi hardware
# 4. SSH will disconnect - reconnect to new IP
ssh chester@10.0.0.200
```
New Proxmox Host
# On Proxmox console (pre-SSH setup)
curl -O https://git.castaldifamily.com/nathan/homelab/raw/branch/main/scripts/bootstrap.sh
chmod +x bootstrap.sh
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
New Docker Swarm VM
# From control node (Watchtower)
scp -r scripts/ chester@10.0.0.211:~/
ssh chester@10.0.0.211
cd scripts
./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.211
Health Check All Nodes (from Control Node)
# Create validation script
cat > validate-all.sh <<'EOF'
#!/bin/bash
for host in heimdall waldorf watchtower; do
echo "=== $host ==="
ssh "$host" "~/scripts/validate-node.sh --critical-only"
done
EOF
chmod +x validate-all.sh
./validate-all.sh
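The loop above prints results but always exits 0, so a scheduler cannot tell whether any node failed. A possible extension, sketched with hypothetical helper names (`count_failed_hosts`, `remote_validate`), counts the hosts whose check returned non-zero.

```bash
# Hypothetical helper: run a check command once per host and print the
# number of hosts for which it failed.
# Usage: count_failed_hosts CHECK_COMMAND HOST...
count_failed_hosts() {
    local check="$1" host failed=0
    shift
    for host in "$@"; do
        if ! "$check" "$host" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed"
}

# Example check (defined but not executed here); -n keeps ssh from reading stdin.
remote_validate() {
    ssh -n "$1" "~/scripts/validate-node.sh --critical-only"
}

# failed=$(count_failed_hosts remote_validate heimdall waldorf watchtower)
# [ "$failed" -eq 0 ] || exit 1
```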
🧪 Testing & Development
Test Bootstrap in Dry-Run Mode
# Safe to run on production nodes
./bootstrap.sh --dry-run --hardware-type docker-vm
Manually Test Libraries
# Source library and test functions
source lib/detection.sh
print_detection_summary
source lib/validation.sh
run_validation_suite
Validate Script Syntax
# Check for syntax errors
bash -n bootstrap.sh
shellcheck bootstrap.sh lib/*.sh
📖 Documentation
Primary References:
Related Ansible Playbooks:
- generic_host.yml — Second-stage onboarding
- proxmox_host.yml — Proxmox-specific config
- gather_hardware_facts.yml — Ansible fact collection
🐛 Troubleshooting
Bootstrap fails with "could not detect primary interface":
- Check network cable/Wi-Fi connection
- Run manually: `ip -o link show` to see interfaces
- Override: `INTERFACE=eth0 ./bootstrap.sh`
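To find a sensible value for the `INTERFACE` override, the device of the default route is usually a good candidate. The sketch below parses `ip route` output from stdin; `default_route_interface` is a hypothetical helper, and it is not known whether `detect_primary_interface()` works the same way.

```bash
# Hypothetical helper: read `ip route` output on stdin and print the device
# named after "dev" on the first default route.
default_route_interface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Example (not executed here):
# INTERFACE=$(ip -4 route show default | default_route_interface) ./bootstrap.sh
```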
SSH disconnects and doesn't reconnect:
- Wait 30 seconds and retry
- Check console access if available
- Verify IP in router DHCP leases
Validation reports critical errors:
- Review log: `cat ../ansible/archive/outputs/bootstrap-validation-*.log`
- Address critical issues (disk space, memory, etc.)
- Re-run validation: `./validate-node.sh`
Docker installation fails on Debian Trixie:
- Script automatically uses Bookworm repos as fallback
- Check logs: `journalctl -u docker`
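The Bookworm fallback amounts to substituting the release codename used in the Docker repository entry. A minimal sketch of that substitution, using a hypothetical function name and covering only the Trixie case described here:

```bash
# Hypothetical helper: print the codename to use for the Docker APT repository.
# Only the Trixie-to-Bookworm fallback from this README is encoded.
docker_repo_codename() {
    case "$1" in
        trixie) echo "bookworm" ;;
        *) echo "$1" ;;
    esac
}

# Example (not executed here): VERSION_CODENAME comes from /etc/os-release.
# . /etc/os-release
# docker_repo_codename "$VERSION_CODENAME"
```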
Hardware fingerprinting shows "unknown" values:
- Some VMs may not expose full hardware info
- Run `sudo dmidecode --type system` for manual inspection
🔮 Future Enhancements
Planned (Not Yet Implemented):
- PXE boot + cloud-init provisioning (eliminate manual OS install)
- Active VLAN migration logic (10.0.0.x → 10.0.10.x/10.0.200.x)
- Proxmox VM auto-provisioning via API
- Node retirement/decommissioning automation
- Integration with Prometheus node exporter metrics
- Containerized control node for disaster recovery
VLAN Placeholder:
The get_desired_vlan_ip() function contains placeholders for future VLAN segmentation. Current implementation uses flat 10.0.0.x network. See comments in lib/network.sh for migration plan.
📝 Version History
- v4.0.0 (2026-04-12): Initial unified bootstrap release
- Consolidated 3 legacy scripts into single entry point
- Added modular library architecture
- Comprehensive validation and fingerprinting
- Auto-discovery with Ansible inventory generation
Questions? See documentation/TECHNICAL_RUNBOOK.md or review session memory in /memories/session/plan.md.
scripts
Automation utilities and helper scripts for homelab infrastructure management.
Inventory
| Script | Purpose | Status |
|---|---|---|
| onboarding.sh | Bootstrap Ansible control node for Proxmox management | 🟡 DRAFT - Testing Required |
onboarding.sh
Purpose: Automated setup of Ansible control node for Proxmox infrastructure management.
What it does:
- Installs Ansible and Proxmoxer Python library
- Detects or generates SSH keypair (ED25519 preferred, RSA fallback)
- Copies public key to Proxmox server for passwordless authentication
- Generates Ansible inventory file (`hosts.ini`) with Proxmox connection details
Prerequisites:
- Debian/Ubuntu-based system (uses `apt`)
- Network access to Proxmox server
- Initial SSH password for target Proxmox server
Configuration: Edit the following variables at the top of the script:
PROXMOX_IP="192.168.1.100" # Target Proxmox server IP
PROXMOX_USER="root" # Proxmox SSH user
Usage:
cd ~/dev/homelab/scripts
chmod +x onboarding.sh
./onboarding.sh
Verification:
ansible proxmox_nodes -m ping -i hosts.ini
⚠️ Development Status
| Script | Testing Status | Known Issues |
|---|---|---|
| onboarding.sh | ❌ Untested in production | • Hardcoded Proxmox IP/user variables • No error handling for failed SSH key copy • Assumes Debian/Ubuntu package manager • No validation of Proxmox connectivity |
DO NOT USE IN PRODUCTION until the following are addressed:
- Error Handling: Add validation checks for each step
- Idempotency: Verify script can be safely re-run
- Multi-OS Support: Test on RHEL/Arch variants or add OS detection
- Interactive Mode: Prompt for PROXMOX_IP/USER instead of manual editing
- Rollback: Add cleanup mechanism for failed installations
Contributing
When adding new scripts:
- Update the Inventory table with script name and purpose
- Document prerequisites, configuration, and usage
- Mark status as 🟡 DRAFT until production-tested
- Add to Development Status table with known issues