feat(bootstrap)!: introduce unified bootstrap system with modular libraries

BREAKING CHANGE: day0bootstrap.sh deprecated in favor of bootstrap.sh

- Add scripts/bootstrap.sh (488 lines): Unified entrypoint supporting multiple hardware types (Proxmox/Docker VMs/Pi)
- Create scripts/lib/ modular library system:
  - detection.sh: OS/hardware/container detection (362 lines)
  - fingerprint.sh: System fingerprinting and inventory (494 lines)
  - network.sh: IP configuration and VLAN placement (356 lines)
  - proxmox.sh: PVE post-install automation (453 lines)
  - validation.sh: Comprehensive pre-flight checks (510 lines)
- Add validation tools: validate-node.sh, onboarding.sh, pi_init.sh
- Deprecate scripts/day0bootstrap.sh with graceful redirect wrapper
- Document architecture in scripts/README.md (495 lines) and PROXMOX-COMPARISON.md
- Update SOP-002 with new bootstrap workflow
- Add nodes/watchtower/compose.yaml (Raspberry Pi 5 stack)

Migration: Existing day0bootstrap.sh users automatically redirected to new system after 5-second warning. No manual intervention required.

Ref: Infrastructure automation modernization per active-tasks.md
This commit is contained in:
nathan 2026-04-12 22:48:19 -04:00
parent 2414d8dfc5
commit e16f98a183
14 changed files with 3917 additions and 42 deletions

View File

@ -132,45 +132,74 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
## Ansible Control Node Setup ## Ansible Control Node Setup
### Step 3: Configure Watchtower as Control Node ### Step 3: Bootstrap Watchtower as Control Node
**Time:** 25-35 minutes **Time:** 15-20 minutes (reduced from 25-35 via automation)
**Rationale:** Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself. **Rationale:** Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.
1. **SSH to Watchtower:** **New Method:** Use the unified bootstrap script for automated, idempotent configuration.
1. **Transfer Bootstrap Script to Watchtower:**
**Option A: From local repository (if cloned on workstation):**
```bash
# From your workstation
scp -r homelab/scripts chester@10.0.0.200:~/
```
**Option B: Direct clone on Watchtower:**
```bash
# SSH to Watchtower
ssh chester@10.0.0.200
# Minimal clone (scripts only)
git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
cd homelab/scripts
```
2. **Run Unified Bootstrap Script:**
```bash
# Auto-detect and configure (Raspberry Pi will be detected)
./bootstrap.sh
# The script will:
# - Detect Raspberry Pi hardware
# - Configure static IP (10.0.0.200)
# - Install Docker with Debian Trixie compatibility
# - Install Ansible and proxmoxer
# - Generate ED25519 SSH keys
# - Run comprehensive validation
# - Generate hardware fingerprint
```
**⚠️ Important:** SSH connection will drop during network reconfiguration.
Reconnect after ~10 seconds:
```bash ```bash
ssh chester@10.0.0.200 ssh chester@10.0.0.200
``` ```
2. **Install Ansible Toolchain:** 3. **Verify Bootstrap Success:**
```bash ```bash
# Update package index # After reconnecting
sudo apt update cd homelab/scripts
# Install Ansible and dependencies # Check validation report
sudo apt install -y ansible ansible-lint sshpass python3-pip git cat ../ansible/archive/outputs/bootstrap-validation-watchtower-*.log
# Install Python libraries # Verify installations
pip3 install proxmoxer requests --break-system-packages docker --version # Should show Docker 24.x or newer
ansible --version # Should show ansible [core 2.x.x]
# Verify installation # Check SSH key
ansible --version ls -lh ~/.ssh/id_ed25519.pub
# Expected: ansible [core 2.x.x] cat ~/.ssh/id_ed25519.pub # Copy this for distribution
``` ```
3. **Generate SSH Keys for Automation:** 4. **Distribute SSH Keys to Managed Nodes:**
```bash ```bash
# Generate ED25519 key (modern cryptography) # The bootstrap script generated keys, now distribute them
ssh-keygen -t ed25519 -C "ansible@watchtower" -f ~/.ssh/id_ed25519 -N ""
# Set proper permissions
chmod 600 ~/.ssh/id_ed25519
chmod 644 ~/.ssh/id_ed25519.pub
```
4. **Distribute Keys to All Nodes:**
```bash
# Deploy to Heimdall # Deploy to Heimdall
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151 ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
@ -194,9 +223,12 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
# Expected: watchtower # Expected: watchtower
``` ```
6. **Clone Repository to Control Node:** 6. **Clone Full Repository (If Not Already Present):**
```bash ```bash
cd ~ cd ~
# If you only did shallow clone earlier, get full repo
rm -rf homelab # Remove shallow clone
git clone https://git.castaldifamily.com/nathan/homelab.git git clone https://git.castaldifamily.com/nathan/homelab.git
cd homelab cd homelab
@ -205,13 +237,19 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
git-crypt unlock ~/homelab-secrets.key git-crypt unlock ~/homelab-secrets.key
``` ```
**Troubleshooting:**
- **Bootstrap fails:** Run with `--dry-run` first to preview actions: `./bootstrap.sh --dry-run`
- **Network doesn't reconnect:** Wait 30 seconds and retry SSH
- **Validation errors:** Review the validation log, address critical errors before proceeding
- **Manual intervention needed:** Use `./validate-node.sh` to re-check after fixes
--- ---
## Core Infrastructure Deployment ## Core Infrastructure Deployment
### Step 4: Deploy Core Stack on Heimdall ### Step 4: Bootstrap and Deploy Core Stack on Heimdall
**Time:** 20-30 minutes **Time:** 15-25 minutes (reduced from 20-30 via automation)
**Core Stack Components:** **Core Stack Components:**
- Docker Socket Proxy (security boundary) - Docker Socket Proxy (security boundary)
@ -219,29 +257,41 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
- Redis (caching layer) - Redis (caching layer)
- Komodo Core (container orchestration) - Komodo Core (container orchestration)
**Deployment Method:** Manual Docker Compose (Ansible automation planned for future state) 1. **Bootstrap Heimdall Node:**
1. **SSH to Heimdall:** **Option A: Remote bootstrap from Watchtower (recommended):**
```bash ```bash
ssh chester@10.0.0.151 # From Watchtower control node
cd ~/homelab
# Copy bootstrap script to Heimdall
scp -r scripts chester@10.0.0.151:~/
# SSH and run bootstrap
ssh chester@10.0.0.151 "cd scripts && ./bootstrap.sh --hardware-type docker-vm"
``` ```
2. **Install Docker & Docker Compose:** **Option B: Direct console access:**
```bash ```bash
# Install Docker # Login to Heimdall directly
curl -fsSL https://get.docker.com -o get-docker.sh ssh chester@10.0.0.151
sudo sh get-docker.sh
# Clone repo or copy scripts
# Add user to docker group git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
sudo usermod -aG docker $USER cd homelab/scripts
# Log out and back in for group to take effect # Run bootstrap
exit ./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.151
```
2. **Verify Docker Installation:**
```bash
# After bootstrap completes
ssh chester@10.0.0.151 ssh chester@10.0.0.151
# Verify Docker installation
docker --version docker --version
docker compose version docker compose version
docker ps # Should return empty list (no containers yet)
``` ```
3. **Create Komodo Directory Structure:** 3. **Create Komodo Directory Structure:**

View File

@ -0,0 +1,85 @@
name: node-tools
services:
# 🔒 Local Security Layer for this Node
docker-socket-proxy:
image: tecnativa/docker-socket-proxy:latest
container_name: docker-socket-proxy
userns_mode: "host"
user: "0:0"
security_opt:
- apparmor=unconfined
privileged: true
networks:
- node-net
ports:
- "127.0.0.1:2375:2375" # Expose on localhost for host-mode periphery
group_add:
- "988"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
environment:
- CONTAINERS=1
- NETWORKS=1
- IMAGES=1
- INFO=1
- POST=1
- ALLOW_START=1
- ALLOW_STOP=1
# Added for Stack Management
- SERVICES=1 # Required for stack/service operations
- TASKS=1 # Required for stack task management
- VOLUMES=1 # Required if stacks use volumes
- CONFIGS=1 # Required for Docker configs
- SECRETS=1 # Required for Docker secrets
# 🦎 Komodo Periphery
periphery:
image: ghcr.io/moghtech/komodo-periphery:2
container_name: komodo-perihery-watchtower
network_mode: host # Use host networking to access external IPs
depends_on:
- docker-socket-proxy
environment:
- DOCKER_HOST=tcp://127.0.0.1:2375 # Access via localhost
- PERIPHERY_CORE_ADDRESS=ws://10.0.0.151:9120
- PERIPHERY_CONNECT_AS=Watchtower
- PERIPHERY_ONBOARDING_KEY=O_VegHtPxiQKrzsAd8MqlrJEs2WLxZ_O
volumes:
- /proc:/proc
- /mnt/appdata/komodo/watchtower/keys:/config/keys
- /mnt/appdata/komodo/watchtower/work:/etc/komodo
# ✅ Added for Stack Deployments
- /mnt/appdata/komodo/watchtower/stacks:/etc/komodo/stacks
# ✅ Added for Git-linked Stacks
- /mnt/appdata/komodo/watchtower/repos:/etc/komodo/repos
# 🔍 Traefik-KOP (Kubernetes Operator for Traefik Discovery)
traefik-kop:
image: ghcr.io/jittering/traefik-kop:0.19.4
container_name: traefik-kop
restart: unless-stopped
depends_on:
- docker-socket-proxy
networks:
- node-net
environment:
- DOCKER_HOST=tcp://docker-socket-proxy:2375
- REDIS_ADDR=10.0.0.151:6379
- BIND_IP=10.0.0.200
- KOP_HOSTNAME=watchtower
# Optional: Enable debug logging
# - VERBOSE=true
# 📜 Dozzle Agent
# dozzle:
# image: amir20/dozzle:latest
# depends_on:
# - docker-socket-proxy
# networks:
# - node-net
# environment:
# - DOCKER_HOST=tcp://docker-socket-proxy:2375
networks:
node-net:
driver: bridge

View File

@ -0,0 +1,256 @@
# Proxmox Post-Install Feature Comparison
## Community Script vs. Bootstrap.sh
This document compares the tteck/community-scripts ProxmoxVE post-install script with our unified bootstrap.sh implementation.
---
## ✅ Features Now Implemented
### Repository Management
| Feature | Community Script | bootstrap.sh | Status |
|---------|-----------------|--------------|--------|
| Disable enterprise repo | ✅ | ✅ | **COMPLETE** |
| Enable no-subscription repo | ✅ | ✅ | **COMPLETE** |
| Configure Ceph repos | ✅ | ✅ | **COMPLETE** |
| Add pvetest repo (disabled) | ✅ | ✅ | **COMPLETE** |
| PVE 8.x (.list format) | ✅ | ✅ | **COMPLETE** |
| PVE 9.x (.sources deb822) | ✅ | ✅ | **COMPLETE** |
### UI Customization
| Feature | Community Script | bootstrap.sh | Status |
|---------|-----------------|--------------|--------|
| Web UI subscription nag removal | ✅ | ✅ | **COMPLETE** |
| Mobile UI subscription nag removal | ✅ | ✅ | **COMPLETE** |
| Persistent via apt hook | ✅ | ✅ | **COMPLETE** |
### High Availability
| Feature | Community Script | bootstrap.sh | Status |
|---------|-----------------|--------------|--------|
| Disable HA on single nodes | ✅ | ✅ | **COMPLETE** |
| Auto-detect cluster vs. single | ✅ | ✅ | **COMPLETE** |
| Corosync management | ✅ | ✅ | **COMPLETE** |
### System Maintenance
| Feature | Community Script | bootstrap.sh | Status |
|---------|-----------------|--------------|--------|
| System update (apt dist-upgrade) | ✅ | ⚠️ | **PARTIAL** |
| Reboot detection | ✅ | ✅ | **COMPLETE** |
| Interactive prompts | ✅ | ❌ | **N/A (Auto Mode)** |
---
## 🎯 Key Differences
### Community Script (Interactive)
- **Mode:** Interactive (whiptail menus)
- **Approach:** User chooses each action
- **Best For:** Manual post-install on fresh Proxmox
### bootstrap.sh (Automated)
- **Mode:** Automated (sensible defaults)
- **Approach:** Auto-detect and apply best practices
- **Best For:** Scripted deployments, CI/CD, repeatability
---
## 📋 What bootstrap.sh Does for Proxmox
When `./bootstrap.sh --hardware-type proxmox` is run:
### Phase 1: Detection
```
✓ Detect Proxmox VE version (8.x or 9.x)
✓ Check cluster status (single node vs. cluster)
✓ Identify current repository configuration
```
### Phase 2: Repository Configuration
```
✓ Disable pve-enterprise repository
✓ Enable pve-no-subscription repository
✓ Configure Ceph repos (disabled by default)
✓ Add pvetest repository (disabled)
✓ Auto-select .list (PVE 8.x) or .sources (PVE 9.x) format
```
### Phase 3: UI Customization
```
✓ Remove subscription nag from web UI
✓ Remove subscription nag from mobile UI
✓ Create apt hook to persist after updates
```
### Phase 4: High Availability
```
✓ Detect single-node setup → Disable HA services (saves resources)
✓ Detect cluster → Keep HA services enabled
✓ Manage pve-ha-lrm, pve-ha-crm, corosync
```
### Phase 5: Validation
```
✓ Verify PVE version supported
✓ Check repository configuration
✓ Validate cluster status
✓ Check for required reboot
```
---
## 🚀 Usage Examples
### Basic Proxmox Bootstrap
```bash
# Auto-detect and configure Proxmox host
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
```
**Output:**
```
[PHASE 1] System Detection
Auto-detected hardware type: proxmox
[PHASE 3] Package Installation
[⚙] Installing Proxmox-specific packages...
[✓] proxmoxer installed
=== PROXMOX POST-INSTALL CONFIGURATION ===
Version: 8.2.4
Mode: auto
[⚙] Disabling Proxmox enterprise repository...
[✓] Enterprise repository disabled
[⚙] Enabling Proxmox no-subscription repository...
[✓] No-subscription repository enabled
[⚙] Configuring Ceph package repositories...
[✓] Ceph repositories configured (disabled)
[⚙] Adding pvetest repository (disabled)...
[✓] pvetest repository added (disabled)
[⚙] Disabling subscription nag dialog...
[✓] Subscription nag disabled (clear browser cache)
[⚙] Single-node setup detected
[⚙] Disabling high availability services...
[✓] HA services disabled
[✓] Proxmox repository configuration complete
==========================================
Next Steps:
1. Clear browser cache (Ctrl+Shift+R)
2. Reboot if kernel was updated
3. Configure storage/networking in web UI
```
### Standalone Proxmox Configuration (Existing Host)
```bash
# Source library and run post-install only
source lib/proxmox.sh
run_proxmox_post_install "auto"
```
### Validate Proxmox Configuration
```bash
source lib/proxmox.sh
validate_proxmox_config
```
**Output:**
```
[⚙] Validating Proxmox configuration...
[✓] PVE version: 8.2.4
[✓] No-subscription repository configured
[✓] Enterprise repository disabled
[✓] Standalone node
[✓] Proxmox validation passed
```
---
## 🔍 Technical Implementation
### Subscription Nag Removal
**Web UI (proxmoxlib.js):**
```javascript
// Patches data.status check
// Before: if (!data.status || data.status !== 'active')
// After: if (data.status || data.status !== 'NoMoreNagging')
```
**Mobile UI (index.html.tpl):**
```javascript
// Injects MutationObserver to remove subscription dialogs/cards
function removeSubscriptionElements() {
// Remove dialogs with subscription text
// Remove cards without buttons containing subscription text
}
```
**Persistence (apt hook):**
```bash
# /etc/apt/apt.conf.d/90-pve-no-nag
DPkg::Post-Invoke { "/usr/local/bin/pve-remove-nag.sh"; };
```
### HA Detection Logic
```bash
# Count cluster nodes
cluster_nodes=$(pvesh get /nodes --output-format=json | jq -r 'length')
if [ "$cluster_nodes" -eq 1 ]; then
# Single node → disable HA services
systemctl disable --now pve-ha-lrm pve-ha-crm corosync
else
# Multi-node cluster → keep HA enabled
echo "HA services left enabled"
fi
```
### Repository Format Detection
```bash
major_version=$(get_pve_major_version)
if [ "$major_version" == "9" ]; then
# PVE 9.x uses deb822 .sources format
cat >/etc/apt/sources.list.d/proxmox.sources <<EOF
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
EOF
else
# PVE 8.x uses traditional .list format
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
> /etc/apt/sources.list.d/pve-no-subscription.list
fi
```
---
## 📖 Related Documentation
- **Community Script Source:** [tteck/ProxmoxVE](https://github.com/community-scripts/ProxmoxVE/blob/main/tools/pve/post-pve-install.sh)
- **Bootstrap README:** [scripts/README.md](README.md)
- **Proxmox Ansible Playbook:** [ansible/archive/playbooks/onboarding/proxmox_host.yml](../../ansible/archive/playbooks/onboarding/proxmox_host.yml)
---
## 🎓 Lessons Learned
1. **Idempotency Is Critical:** All operations can be safely re-run without side effects
2. **Version Detection Matters:** PVE 8.x vs. 9.x have different repository formats
3. **HA Context Awareness:** Single-node setups don't need HA services (wastes resources)
4. **Persistence Mechanisms:** Apt hooks ensure UI patches survive widget updates
5. **Non-Interactive Design:** Automation requires sensible defaults, not user prompts
---
**Last Updated:** 2026-04-12
**Version:** 4.0.0 (aligned with bootstrap.sh)

View File

@ -1,3 +1,498 @@
# Homelab Bootstrap & Utility Scripts
This directory contains the unified day0 onboarding infrastructure and operational utility scripts for homelab hardware provisioning.
## 🚀 Quick Start
**New Hardware Onboarding (Recommended):**
```bash
# Auto-detect hardware type and configure
./bootstrap.sh
# Preview changes without applying
./bootstrap.sh --dry-run
# Force specific hardware type
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
```
**Health Check on Existing Nodes:**
```bash
# Run comprehensive validation
./validate-node.sh
# JSON output for monitoring
./validate-node.sh --json
```
---
## 📁 Directory Structure
```
scripts/
├── bootstrap.sh # ⭐ Unified day0 onboarding (NEW)
├── validate-node.sh # ⭐ Standalone health check tool (NEW)
├── lib/ # ⭐ Modular libraries (NEW)
│ ├── detection.sh # OS/hardware auto-detection
│ ├── network.sh # Network configuration & validation
│ ├── validation.sh # Comprehensive health checks
│ ├── fingerprint.sh # Hardware inventory collection
│ └── proxmox.sh # Proxmox VE post-install (comprehensive)
├── day0bootstrap.sh # 🔻 DEPRECATED - use bootstrap.sh
├── pi_init.sh # 🔻 DEPRECATED - use bootstrap.sh
├── onboarding.sh # 🔻 DEPRECATED - use bootstrap.sh
└── README.md # This file
```
---
## 🛠️ Core Tools
### `bootstrap.sh` — Unified Day0 Onboarding
**Purpose:** Intelligent initialization for all homelab hardware types. Auto-detects OS, hardware, and applies appropriate configuration.
**Replaces:** `day0bootstrap.sh`, `pi_init.sh`, `onboarding.sh`
**Features:**
- ✅ Auto-detection (Proxmox, Docker VM, Raspberry Pi, physical servers, AI workstations)
- ✅ Static IP configuration via netplan
- ✅ Docker & Ansible installation (with Debian Trixie compatibility)
- ✅ SSH key generation (ED25519 preferred, RSA fallback)
- ✅ Comprehensive validation suite
- ✅ Hardware fingerprinting with YAML/JSON output
- ✅ Ansible inventory auto-discovery
**Usage:**
```bash
./bootstrap.sh [OPTIONS]
Options:
--help Show detailed help
--dry-run Preview actions without changes
--hardware-type TYPE Override auto-detection
(proxmox|docker-vm|pi|physical-docker|ai-workstation)
--skip-network Skip network configuration
--skip-validation Skip post-bootstrap validation
--output-json Generate JSON output instead of YAML
--target-ip IP Set static IP address (default: auto-assigned)
--gateway IP Set gateway IP (default: 10.0.0.1)
--dns IP Set DNS server (default: 10.0.0.2)
```
**Examples:**
```bash
# Auto-detect and configure with defaults
./bootstrap.sh
# Preview what would be done (safe to run)
./bootstrap.sh --dry-run
# Force Proxmox mode with custom IP
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
# Skip network config (already configured manually)
./bootstrap.sh --skip-network
# Generate JSON for monitoring integration
./bootstrap.sh --output-json
```
**Output Files:**
- `../ansible/archive/outputs/hardware-facts-{hostname}-{timestamp}.yml`
- `../ansible/archive/outputs/bootstrap-validation-{hostname}-{timestamp}.log`
- `../ansible/archive/inventory/discovered-hosts.yml` (auto-discovery)
**Workflow:**
1. **Detection** → Identify OS, hardware type, CPU, GPU
2. **Network Config** → Apply static IP via netplan (SSH will disconnect)
3. **Package Install** → Docker, Ansible, proxmoxer (as needed)
4. **SSH Keys** → Generate/verify ED25519 keys
5. **Validation** → Comprehensive health checks (disk, memory, network, etc.)
6. **Fingerprinting** → Generate hardware inventory YAML
**Network Warning:** Network configuration will disconnect SSH. Reconnect to new IP after ~10 seconds.
---
### `validate-node.sh` — Standalone Health Check Tool
**Purpose:** Comprehensive validation suite for operational readiness. Can be run post-bootstrap or ad-hoc on any managed node.
**Features:**
- ✅ Disk space & performance checks
- ✅ Memory & swap validation
- ✅ Network routing & connectivity tests
- ✅ NFS client configuration
- ✅ Docker daemon health
- ✅ Proxmox API accessibility (if applicable)
- ✅ SSH security audit
- ✅ Time synchronization verification
- ✅ JSON output for monitoring integration
**Usage:**
```bash
./validate-node.sh [OPTIONS]
Options:
--help Show detailed help
--json Output results in JSON format
--critical-only Show only critical errors
--verbose Show detailed output for each check
```
**Examples:**
```bash
# Run all checks with standard output
./validate-node.sh
# Only show critical errors (useful in scripts)
./validate-node.sh --critical-only
# JSON output for monitoring/alerting
./validate-node.sh --json | jq '.validation.errors'
# Verbose mode with detection summary
./validate-node.sh --verbose
```
**Exit Codes:**
- `0` → All checks passed (or warnings only)
- `1` → Critical errors found
- `2` → Invalid usage
**Integration Example:**
```bash
# Use in deployment pipeline
if ./validate-node.sh --critical-only; then
echo "Node healthy, proceeding with deployment"
else
echo "Critical issues found, aborting"
exit 1
fi
```
---
## 📚 Library Components
### `lib/detection.sh` — System Detection
**Functions:**
- `detect_os_family()` → debian, ubuntu, raspbian
- `detect_os_version()` → trixie, bookworm, noble, etc.
- `detect_hardware_type()` → proxmox, docker-vm, pi, physical-docker, ai-workstation
- `detect_cpu_vendor()` → intel, amd, arm
- `detect_cpu_generation()` → Intel generation number (12, 13, 14, etc.)
- `detect_gpu()` → nvidia, amd, intel, none
- `detect_primary_interface()` → Network interface name
- `get_current_ip()` → Current IPv4 address
**Usage:**
```bash
source lib/detection.sh
HW_TYPE=$(detect_hardware_type)
echo "Hardware type: $HW_TYPE"
```
---
### `lib/network.sh` — Network Configuration
**Functions:**
- `apply_static_ip()` → Configure netplan with static IP
- `configure_network_safe()` → Apply network changes with reconnection instructions
- `validate_connectivity()` → Test gateway, internet, DNS
- `check_nfs_accessibility()` → Test NFS server reachability
- `get_desired_vlan_ip()` → Determine VLAN IP based on hardware type (future)
**Usage:**
```bash
source lib/network.sh
configure_network_safe "10.0.0.151" "10.0.0.1" "10.0.0.2"
```
---
### `lib/validation.sh` — Health Checks
**Functions:**
- `check_disk_space()` → Validate sufficient disk space
- `check_memory()` → Validate RAM meets hardware type requirements
- `check_network_routes()` → Validate routing configuration
- `check_docker_daemon()` → Validate Docker installation and health
- `check_proxmox_api()` → Validate Proxmox VE (if applicable)
- `run_validation_suite()` → Run all checks and return summary
**Severity Levels:**
- **Critical** → Blocks Ansible playbook execution
- **Warning** → Manual review recommended
- **Info** → Logged for reference
---
### `lib/fingerprint.sh` — Hardware Inventory
**Functions:**
- `collect_cpu_facts()` → CPU model, cores, frequency
- `collect_memory_facts()` → RAM and swap details
- `collect_disk_facts()` → Storage capacity and type (SSD/HDD)
- `collect_network_facts()` → Network interfaces and connectivity
- `generate_hardware_facts_yaml()` → Full YAML output
- `generate_ansible_inventory_snippet()` → Ansible inventory entry
**Output Format (YAML):**
```yaml
hostname:
os:
family: " "bookworm"
cpu:
model: "Intel Core i7-12700"
cores: 12
memory:
total_gb: 32
storage:
root:
total_gb: 500
type: "ssd"
```
---
### `lib/proxmox.sh` — Proxmox VE Post-Install
**Comprehensive Proxmox configuration (inspired by community-scripts/ProxmoxVE):**
**Repository Management:**
- `configure_all_pve_repos()` → Complete repository configuration
- `disable_pve_enterprise_repo()` → Disable subscription-required repository
- `enable_pve_no_subscription_repo()` → Enable free updates repository
- `configure_ceph_repos()` → Configure Ceph storage repositories
- `add_pvetest_repo()` → Add beta/test repository (disabled by default)
**Subscription Nag Removal:**
- `disable_subscription_nag()` → Remove subscription warnings from UI
- Patches web UI JavaScript (proxmoxlib.js)
- Patches mobile UI templates
- Creates apt hook to persist after updates
**High Availability Management:**
- `disable_ha_services()` → Disable HA for single-node setups (saves resources)
- `enable_ha_services()` → Enable HA for clustered environments
- `check_ha_status()` → Check current HA service status
**System Maintenance:**
- `update_pve_system()` → Run full system update (apt dist-upgrade)
- `check_reboot_required()` → Check if reboot needed after updates
**Comprehensive Routine:**
- `run_proxmox_post_install()` → **All-in-one post-install configuration**
- Repository fixes (PVE 8.x and 9.x compatible)
- Subscription nag removal
- Auto-detect single vs. cluster and adjust HA accordingly
- System update checks
- Reboot requirement detection
**Validation:**
- `validate_proxmox_config()` → Verify Proxmox configuration
- `get_pve_version()` → Get Proxmox VE version
- `is_pve_supported_version()` → Check if version is supported (8.0-8.9.x, 9.0-9.1.x)
**Usage:**
```bash
source lib/proxmox.sh
# Run complete post-install (auto mode)
run_proxmox_post_install "auto"
# Individual functions
disable_subscription_nag
disable_ha_services # For single-node setups
validate_proxmox_config
```
**What It Does (Compared to Community Script):**
- ✅ Repository management (enterprise/no-subscription/Ceph/pvetest)
- ✅ Subscription nag removal (web + mobile UI)
- ✅ HA service management (auto-detect single vs. cluster)
- ✅ Supports PVE 8.x (.list format) and 9.x (.sources deb822 format)
- ✅ System update and reboot detection
- ✅ Automated (non-interactive) for use in bootstrap.sh
**Integration:**
When `bootstrap.sh` detects Proxmox hardware, it automatically runs `run_proxmox_post_install "auto"` after package installation, providing the same comprehensive post-install configuration as the community tteck/ProxmoxVE script. type: "ssd"
```
---
## 🔻 Deprecated Scripts
**These scripts are deprecated and will be removed in a future release. Use `bootstrap.sh` instead.**
### `day0bootstrap.sh``bootstrap.sh --hardware-type pi`
**Original Purpose:** Debian Trixie bootstrap for Raspberry Pi
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
### `pi_init.sh``bootstrap.sh --hardware-type pi`
**Original Purpose:** Raspberry Pi initialization
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
### `onboarding.sh``bootstrap.sh`
**Original Purpose:** Generic onboarding + Proxmox integration
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
**Migration Path:**
All legacy scripts now display a deprecation warning and automatically redirect to `bootstrap.sh` with appropriate flags. You have 6 months to update your workflows before these wrappers are removed.
---
## 🎯 Common Workflows
### New Raspberry Pi Control Node
```bash
# 1. Transfer scripts to Pi
scp -r scripts/ chester@10.0.0.200:~/
# 2. SSH to Pi
ssh chester@10.0.0.200
# 3. Run bootstrap
cd scripts
./bootstrap.sh # Auto-detects Pi hardware
# 4. SSH will disconnect - reconnect to new IP
ssh chester@10.0.0.200
```
### New Proxmox Host
```bash
# On Proxmox console (pre-SSH setup)
curl -O https://git.castaldifamily.com/nathan/homelab/raw/branch/main/scripts/bootstrap.sh
chmod +x bootstrap.sh
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
```
### New Docker Swarm VM
```bash
# From control node (Watchtower)
scp -r scripts/ chester@10.0.0.211:~/
ssh chester@10.0.0.211
cd scripts
./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.211
```
### Health Check All Nodes (from Control Node)
```bash
# Create validation script
cat > validate-all.sh <<'EOF'
#!/bin/bash
for host in heimdall waldorf watchtower; do
echo "=== $host ==="
ssh $host "~/scripts/validate-node.sh --critical-only"
done
EOF
chmod +x validate-all.sh
./validate-all.sh
```
---
## 🧪 Testing & Development
### Test Bootstrap in Dry-Run Mode
```bash
# Safe to run on production nodes
./bootstrap.sh --dry-run --hardware-type docker-vm
```
### Manually Test Libraries
```bash
# Source library and test functions
source lib/detection.sh
print_detection_summary
source lib/validation.sh
run_validation_suite
```
### Validate Script Syntax
```bash
# Check for syntax errors
bash -n bootstrap.sh
shellcheck bootstrap.sh lib/*.sh
```
---
## 📖 Documentation
**Primary References:**
- [SOP-002: Initial Infrastructure Deployment](../documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md)
- [Technical Runbook](../documentation/TECHNICAL_RUNBOOK.md)
- [Environment Constraints](../ansible/archive/documentation/standards/environment-constraints.md)
**Related Ansible Playbooks:**
- [generic_host.yml](../ansible/archive/playbooks/onboarding/generic_host.yml) — Second-stage onboarding
- [proxmox_host.yml](../ansible/archive/playbooks/onboarding/proxmox_host.yml) — Proxmox-specific config
- [gather_hardware_facts.yml](../ansible/archive/playbooks/preflight/gather_hardware_facts.yml) — Ansible fact collection
---
## 🐛 Troubleshooting
**Bootstrap fails with "could not detect primary interface":**
- Check network cable/Wi-Fi connection
- Run manually: `ip -o link show` to see interfaces
- Override: `INTERFACE=eth0 ./bootstrap.sh`
**SSH disconnects and doesn't reconnect:**
- Wait 30 seconds and retry
- Check console access if available
- Verify IP in router DHCP leases
**Validation reports critical errors:**
- Review log: `cat ../ansible/archive/outputs/bootstrap-validation-*.log`
- Address critical issues (disk space, memory, etc.)
- Re-run validation: `./validate-node.sh`
**Docker installation fails on Debian Trixie:**
- Script automatically uses Bookworm repos as fallback
- Check logs: `journalctl -u docker`
**Hardware fingerprinting shows "unknown" values:**
- Some VMs may not expose full hardware info
- Run `sudo dmidecode --type system` for manual inspection
---
## 🔮 Future Enhancements
**Planned (Not Yet Implemented):**
- [ ] PXE boot + cloud-init provisioning (eliminate manual OS install)
- [ ] Active VLAN migration logic (10.0.0.x → 10.0.10.x/10.0.200.x)
- [ ] Proxmox VM auto-provisioning via API
- [ ] Node retirement/decommissioning automation
- [ ] Integration with Prometheus node exporter metrics
- [ ] Containerized control node for disaster recovery
**VLAN Placeholder:**
The `get_desired_vlan_ip()` function contains placeholders for future VLAN segmentation. Current implementation uses flat 10.0.0.x network. See comments in `lib/network.sh` for migration plan.
---
## 📝 Version History
- **v4.0.0** (2026-04-12): Initial unified bootstrap release
- Consolidated 3 legacy scripts into single entry point
- Added modular library architecture
- Comprehensive validation and fingerprinting
- Auto-discovery with Ansible inventory generation
---
**Questions?** See [documentation/TECHNICAL_RUNBOOK.md](../documentation/TECHNICAL_RUNBOOK.md) or review session memory in `/memories/session/plan.md`.
# scripts # scripts
Automation utilities and helper scripts for homelab infrastructure management. Automation utilities and helper scripts for homelab infrastructure management.

488
scripts/bootstrap.sh Normal file
View File

@ -0,0 +1,488 @@
#!/bin/bash
# ==============================================================================
# UNIFIED HOMELAB BOOTSTRAP SCRIPT
# ==============================================================================
# Intelligent day0 onboarding for all homelab hardware types
# Replaces: day0bootstrap.sh, pi_init.sh, onboarding.sh
#
# Usage:
# ./bootstrap.sh [OPTIONS]
#
# Options:
# --help Show this help message
# --dry-run Show what would be done without making changes
# --hardware-type TYPE Override auto-detection (proxmox|docker-vm|pi|physical-docker|ai-workstation)
# --skip-network Skip network configuration
# --skip-validation Skip post-bootstrap validation
# --output-json Generate JSON output instead of YAML
# --target-ip IP Target static IP address (default: auto-assigned)
#
# Examples:
# ./bootstrap.sh # Auto-detect and configure
# ./bootstrap.sh --dry-run # Preview actions
# ./bootstrap.sh --hardware-type proxmox # Force Proxmox mode
# ./bootstrap.sh --target-ip 10.0.0.205 # Custom IP address
#
# ==============================================================================
set -euo pipefail
# --- SCRIPT METADATA ---
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
VERSION="4.0.0"
TIMESTAMP=$(date +%Y%m%dT%H%M%S)
# --- LOAD LIBRARIES ---
# shellcheck source=./lib/detection.sh
source "${SCRIPT_DIR}/lib/detection.sh"
# shellcheck source=./lib/network.sh
source "${SCRIPT_DIR}/lib/network.sh"
# shellcheck source=./lib/validation.sh
source "${SCRIPT_DIR}/lib/validation.sh"
# shellcheck source=./lib/fingerprint.sh
source "${SCRIPT_DIR}/lib/fingerprint.sh"
# shellcheck source=./lib/proxmox.sh
source "${SCRIPT_DIR}/lib/proxmox.sh"
# --- COMMAND LINE ARGUMENTS ---
SHOW_HELP=false
DRY_RUN=false
HARDWARE_TYPE="auto"
SKIP_NETWORK=false
SKIP_VALIDATION=false
OUTPUT_JSON=false
TARGET_IP=""
GATEWAY="10.0.0.1"
DNS_SERVER="10.0.0.2"
parse_arguments() {
while [[ $# -gt 0 ]]; do
case "$1" in
--help|-h)
SHOW_HELP=true
shift
;;
--dry-run)
DRY_RUN=true
shift
;;
--hardware-type)
HARDWARE_TYPE="$2"
shift 2
;;
--skip-network)
SKIP_NETWORK=true
shift
;;
--skip-validation)
SKIP_VALIDATION=true
shift
;;
--output-json)
OUTPUT_JSON=true
shift
;;
--target-ip)
TARGET_IP="$2"
shift 2
;;
--gateway)
GATEWAY="$2"
shift 2
;;
--dns)
DNS_SERVER="$2"
shift 2
;;
*)
echo "ERROR: Unknown option: $1" >&2
echo "Use --help for usage information" >&2
exit 1
;;
esac
done
}
show_help() {
cat <<'EOF'
=============================================================================
UNIFIED HOMELAB BOOTSTRAP SCRIPT v4.0.0
=============================================================================
Intelligent day0 onboarding for all homelab hardware types.
Auto-detects OS, hardware type, and applies appropriate configuration.
USAGE:
./bootstrap.sh [OPTIONS]
OPTIONS:
--help Show this help message
--dry-run Preview actions without making changes
--hardware-type TYPE Override auto-detection
Types: proxmox, docker-vm, pi, physical-docker, ai-workstation
--skip-network Skip network configuration (use if already configured)
--skip-validation Skip post-bootstrap validation checks
--output-json Generate JSON output instead of YAML
--target-ip IP Set static IP address (default: auto-assigned)
--gateway IP Set gateway IP (default: 10.0.0.1)
--dns IP Set DNS server (default: 10.0.0.2)
EXAMPLES:
./bootstrap.sh
Auto-detect hardware and configure with defaults
./bootstrap.sh --dry-run
Preview what would be done without making changes
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
Force Proxmox mode and set specific IP
./bootstrap.sh --skip-network
Skip network configuration (already configured manually)
WORKFLOW:
1. System Detection → Identify OS, hardware type, CPU, GPU
2. Network Config → Apply static IP via netplan (skippable)
3. Package Install → Docker, Ansible, proxmoxer (as needed)
4. SSH Keys → Generate/verify ED25519 keys
5. Validation → Comprehensive health checks
6. Fingerprinting → Generate hardware inventory YAML
OUTPUT FILES:
ansible/archive/outputs/bootstrap-validation-{hostname}-{timestamp}.log
ansible/archive/outputs/hardware-facts-{hostname}-{timestamp}.yml
ansible/archive/inventory/discovered-hosts.yml (auto-discovery)
NOTES:
- Network configuration will disconnect SSH (reconnect to new IP)
- Run from console or plan for reconnection
- Safe to re-run (idempotent where possible)
- Logs saved even if script interrupted
=============================================================================
EOF
}
# --- MAIN WORKFLOW ---
main() {
# Parse CLI arguments
parse_arguments "$@"
if [ "$SHOW_HELP" == "true" ]; then
show_help
exit 0
fi
# === PHASE 1: DETECTION ===
echo "======================================="
echo "HOMELAB BOOTSTRAP v${VERSION}"
echo "Timestamp: $(date -u +"%Y-%m-%d %H:%M:%S UTC")"
echo "======================================="
echo ""
echo "[PHASE 1] System Detection"
echo "---"
# Detect hardware type
if [ "$HARDWARE_TYPE" == "auto" ]; then
HARDWARE_TYPE=$(detect_hardware_type)
echo " Auto-detected hardware type: $HARDWARE_TYPE"
else
echo " Hardware type (forced): $HARDWARE_TYPE"
fi
# Print detection summary
print_detection_summary
echo ""
# Determine target IP if not specified
if [ -z "$TARGET_IP" ]; then
TARGET_IP=$(get_desired_vlan_ip)
echo " Auto-assigned target IP: $TARGET_IP"
echo " (Based on hardware type and environment-constraints.md)"
else
echo " Target IP (user-specified): $TARGET_IP"
fi
echo ""
# Dry-run check
if [ "$DRY_RUN" == "true" ]; then
echo "[DRY-RUN MODE] Would perform the following actions:"
echo ""
echo " 1. Configure network: $TARGET_IP (gateway: $GATEWAY, DNS: $DNS_SERVER)"
echo " 2. Install Docker (with Debian Trixie workaround if needed)"
echo " 3. Install Ansible"
[ "$HARDWARE_TYPE" == "proxmox" ] && echo " 4. Apply Proxmox repository fixes"
[ "$HARDWARE_TYPE" == "proxmox" ] && echo " 5. Install proxmoxer Python library"
echo " 6. Generate/verify SSH keys (ED25519)"
echo " 7. Run validation suite"
echo " 8. Generate hardware fingerprint"
echo ""
echo "No changes made. Re-run without --dry-run to execute."
exit 0
fi
# === PHASE 2: NETWORK CONFIGURATION ===
if [ "$SKIP_NETWORK" == "false" ]; then
echo "[PHASE 2] Network Configuration"
echo "---"
configure_network_safe "$TARGET_IP" "$GATEWAY" "$DNS_SERVER"
# Wait for network to stabilize
sleep 3
wait_for_network 15
echo ""
else
echo "[PHASE 2] Network Configuration (SKIPPED)"
echo ""
fi
# === PHASE 3: PACKAGE INSTALLATION ===
echo "[PHASE 3] Package Installation"
echo "---"
# Update package lists
echo " [⚙] Updating package lists..."
sudo apt-get update -qq
# Install prerequisites
echo " [⚙] Installing prerequisites (ca-certificates, curl, gnupg)..."
sudo apt-get install -y -qq ca-certificates curl gnupg lsb-release
# --- DOCKER INSTALLATION ---
if ! command -v docker &>/dev/null; then
echo " [⚙] Installing Docker..."
# Remove existing Docker repo configs
sudo rm -f /etc/apt/sources.list.d/docker.list
sudo rm -f /etc/apt/sources.list.d/docker*.list
# Add Docker GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg --yes
# Determine repository codename
local os_version=$(detect_os_version)
local repo_codename="$os_version"
# Debian Trixie workaround (use Bookworm repos)
if is_debian_trixie; then
echo " [!] Debian Trixie detected - using Bookworm repos for Docker"
repo_codename="bookworm"
fi
# Add Docker repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian $repo_codename stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt-get update -qq
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Add current user to docker group
sudo usermod -aG docker "$USER"
echo " [✓] Docker installed: $(docker --version)"
else
echo " [✓] Docker already installed: $(docker --version)"
fi
# --- ANSIBLE INSTALLATION ---
if ! command -v ansible &>/dev/null; then
echo " [⚙] Installing Ansible..."
sudo apt-get install -y ansible
echo " [✓] Ansible installed: $(ansible --version | head -n1)"
else
echo " [✓] Ansible already installed: $(ansible --version | head -n1)"
fi
# --- PROXMOX-SPECIFIC PACKAGES ---
if [ "$HARDWARE_TYPE" == "proxmox" ]; then
echo " [⚙] Installing Proxmox-specific packages..."
# Install Python pip if needed
if ! command -v pip3 &>/dev/null; then
sudo apt-get install -y python3-pip
fi
# Install proxmoxer
if ! python3 -c "import proxmoxer" 2>/dev/null; then
echo " [⚙] Installing proxmoxer Python library..."
pip3 install proxmoxer --break-system-packages 2>/dev/null || pip3 install proxmoxer
echo " [✓] proxmoxer installed"
else
echo " [✓] proxmoxer already installed"
fiInstall jq for Proxmox post-install tasks
if ! command -v jq &>/dev/null; then
sudo apt-get install -y jq
fi
echo ""
echo "=== PROXMOX POST-INSTALL CONFIGURATION ==="
# Run comprehensive Proxmox post-install routine
# This includes: repository fixes, subscription nag removal, HA management
run_proxmox_post_install "auto"
echo "=========================================="
echo "" echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" | \
sudo tee /etc/apt/sources.list.d/pve-no-subscription.list > /dev/null
fi
fi
# --- UTILITY PACKAGES ---
echo " [⚙] Installing utility packages..."
sudo apt-get install -y -qq git vim htop curl wget nfs-common net-tools dnsutils
echo " [✓] Package installation complete"
echo ""
# === PHASE 4: SSH KEY MANAGEMENT ===
echo "[PHASE 4] SSH Key Management"
echo "---"
local ssh_key_path=""
# Check for existing keys (prefer ED25519)
if [ -f "$HOME/.ssh/id_ed25519" ]; then
ssh_key_path="$HOME/.ssh/id_ed25519"
echo " [✓] Found existing ED25519 key: $ssh_key_path"
elif [ -f "$HOME/.ssh/id_rsa" ]; then
ssh_key_path="$HOME/.ssh/id_rsa"
echo " [✓] Found existing RSA key: $ssh_key_path"
else
# Generate new ED25519 key
ssh_key_path="$HOME/.ssh/id_ed25519"
echo " [⚙] Generating new ED25519 key pair..."
ssh-keygen -t ed25519 -f "$ssh_key_path" -N "" -C "$(whoami)@$(hostname)-$(date +%Y%m%d)"
echo " [✓] SSH key generated: $ssh_key_path"
fi
# Display public key
echo ""
echo " Public key (add to authorized_keys on managed nodes):"
echo " ---"
cat "${ssh_key_path}.pub"
echo " ---"
echo ""
# === PHASE 5: VALIDATION ===
if [ "$SKIP_VALIDATION" == "false" ]; then
echo "[PHASE 5] System Validation"
echo "---"
# Run comprehensive validation
if run_validation_suite; then
echo ""
echo " [✓] All validation checks passed"
else
echo ""
echo " [!] Validation completed with errors - review above"
echo " [!] You may proceed, but manual intervention may be required"
fi
echo ""
# Save validation log
local log_dir="${SCRIPT_DIR}/../ansible/archive/outputs"
mkdir -p "$log_dir"
local log_file="${log_dir}/bootstrap-validation-$(hostname)-${TIMESTAMP}.log"
# Re-run validation to capture log
run_validation_suite 2>&1 | tee "$log_file" >/dev/null
echo " [✓] Validation log saved: $log_file"
echo ""
else
echo "[PHASE 5] System Validation (SKIPPED)"
echo ""
fi
# === PHASE 6: HARDWARE FINGERPRINTING ===
echo "[PHASE 6] Hardware Fingerprinting"
echo "---"
# Print summary
print_hardware_summary
echo ""
# Validate against standards
echo " Checking against environment-constraints.md standards..."
validate_against_standards || true
echo ""
# Save hardware facts
local facts_dir="${SCRIPT_DIR}/../ansible/archive/outputs"
local facts_file=$(save_hardware_facts "$facts_dir")
echo " [✓] Hardware facts saved: $facts_file"
# Save JSON if requested
if [ "$OUTPUT_JSON" == "true" ]; then
local json_file="${facts_dir}/hardware-facts-$(hostname)-${TIMESTAMP}.json"
generate_json_output > "$json_file"
echo " [✓] JSON output saved: $json_file"
fi
# Generate inventory snippet
local inventory_dir="${SCRIPT_DIR}/../ansible/archive/inventory"
mkdir -p "$inventory_dir"
local inventory_file=$(append_to_discovered_inventory "${inventory_dir}/discovered-hosts.yml")
echo " [✓] Inventory snippet appended: $inventory_file"
echo ""
# === COMPLETION ===
echo "======================================="
echo "BOOTSTRAP COMPLETE"
echo "======================================="
echo ""
echo "Summary:"
echo " Hostname: $(hostname)"
echo " IP Address: $(get_current_ip)"
echo " Hardware Type: $HARDWARE_TYPE"
echo " OS: $(detect_os_family) $(detect_os_version)"
echo ""
echo "Next Steps:"
echo " 1. Reconnect SSH if network was reconfigured: ssh $(whoami)@$(get_current_ip)"
echo " 2. Verify Docker: docker ps"
echo " 3. Verify Ansible: ansible --version"
echo " 4. Review hardware facts: cat $facts_file"
echo " 5. Run Ansible playbook: ansible-playbook playbooks/onboarding/generic_host.yml"
echo ""
echo "Files Generated:"
echo " - $facts_file"
[ "$OUTPUT_JSON" == "true" ] && echo " - ${facts_dir}/hardware-facts-$(hostname)-${TIMESTAMP}.json"
[ "$SKIP_VALIDATION" == "false" ] && echo " - ${log_dir}/bootstrap-validation-$(hostname)-${TIMESTAMP}.log"
echo " - $inventory_file"
echo ""
echo "Documentation:"
echo " - SOP: documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md"
echo " - Technical Runbook: documentation/TECHNICAL_RUNBOOK.md"
echo ""
echo "Have a great day! 🚀"
echo ""
}
# --- ENTRY POINT ---
# Trap errors
trap 'echo "ERROR: Bootstrap failed at line $LINENO. Check logs for details." >&2' ERR
# Run main workflow
main "$@"

View File

@ -1,11 +1,42 @@
#!/bin/bash #!/bin/bash
# ============================================================================== # ==============================================================================
# DEBIAN TRIXIE BOOTSTRAP: IP, DOCKER, ANSIBLE # DEPRECATED: day0bootstrap.sh
# ==============================================================================
# ⚠️ DEPRECATION NOTICE
# This script is deprecated and will be removed in a future release.
# Please use the unified bootstrap.sh script instead:
#
# ./bootstrap.sh --hardware-type pi
#
# This wrapper will redirect to bootstrap.sh with appropriate flags.
# ============================================================================== # ==============================================================================
set -euo pipefail set -euo pipefail
# Show deprecation warning
echo "=======================================" >&2
echo "⚠️ DEPRECATION WARNING" >&2
echo "=======================================" >&2
echo "day0bootstrap.sh is deprecated!" >&2
echo "" >&2
echo "Please use: ./bootstrap.sh" >&2
echo "" >&2
echo "Redirecting to bootstrap.sh in 5 seconds..." >&2
echo "Press Ctrl+C to cancel" >&2
echo "=======================================" >&2
sleep 5
# Redirect to unified bootstrap
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
exec "${SCRIPT_DIR}/bootstrap.sh" --hardware-type pi "$@"
# ==============================================================================
# LEGACY CODE BELOW (no longer executed)
# ==============================================================================
exit 0
# --- 1. SET STATIC IP (Netplan) --- # --- 1. SET STATIC IP (Netplan) ---
echo "[⚙] Configuring Static IP to 10.0.0.200..." echo "[⚙] Configuring Static IP to 10.0.0.200..."

362
scripts/lib/detection.sh Normal file
View File

@ -0,0 +1,362 @@
#!/bin/bash
# ==============================================================================
# DETECTION LIBRARY: OS, Hardware, and Environment Detection
# ==============================================================================
# Part of unified bootstrap system for homelab infrastructure
# Provides functions to identify OS family, hardware type, CPU, GPU, and
# deployment environment for intelligent provisioning decisions.
# ==============================================================================
# --- OS DETECTION ---
detect_os_family() {
# Detect OS family: debian, ubuntu, raspbian, or unknown
# Returns lowercase OS identifier
if [ -f /etc/os-release ]; then
. /etc/os-release
case "${ID,,}" in
debian)
echo "debian"
return 0
;;
ubuntu)
echo "ubuntu"
return 0
;;
raspbian)
echo "raspbian"
return 0
;;
*)
echo "unknown"
return 1
;;
esac
else
echo "unknown"
return 1
fi
}
detect_os_version() {
# Get OS version codename (e.g., trixie, bookworm, noble)
if [ -f /etc/os-release ]; then
. /etc/os-release
echo "${VERSION_CODENAME:-unknown}"
else
echo "unknown"
fi
}
is_debian_trixie() {
# Special detection for Debian Trixie (requires Docker repo workaround)
local os_family=$(detect_os_family)
local os_version=$(detect_os_version)
[[ "$os_family" == "debian" && "$os_version" == "trixie" ]]
}
# --- HARDWARE TYPE DETECTION ---
detect_hardware_type() {
# Auto-detect hardware deployment type
# Returns: proxmox, docker-vm, pi, physical-docker, ai-workstation, unknown
# Check for Proxmox VE installation
if command -v pveversion &>/dev/null; then
echo "proxmox"
return 0
fi
# Check if running inside a VM (common indicators)
local is_vm=0
if systemd-detect-virt --vm &>/dev/null; then
is_vm=1
elif [ -d /sys/class/dmi/id ]; then
local product_name=$(cat /sys/class/dmi/id/product_name 2>/dev/null || echo "")
if [[ "$product_name" =~ (VirtualBox|VMware|KVM|QEMU) ]]; then
is_vm=1
fi
fi
# Check for Raspberry Pi
if grep -q "Raspberry Pi" /proc/cpuinfo 2>/dev/null; then
echo "pi"
return 0
fi
# Check for Docker installation (distinguishes VM vs physical)
local has_docker=0
if command -v docker &>/dev/null || [ -f /usr/bin/docker ]; then
has_docker=1
fi
# Check for high-end GPU (indicates AI workstation)
local gpu_type=$(detect_gpu)
if [[ "$gpu_type" =~ (nvidia|amd) ]]; then
local gpu_info=$(lspci 2>/dev/null | grep -i vga | head -n1)
# High-end NVIDIA cards (RTX, Tesla, Quadro) or AMD (Radeon Pro, Instinct)
if [[ "$gpu_info" =~ (RTX|Tesla|Quadro|A[0-9]{3,4}|Radeon Pro|Instinct) ]]; then
echo "ai-workstation"
return 0
fi
fi
# Classify based on VM + Docker
if [ $is_vm -eq 1 ]; then
echo "docker-vm"
return 0
else
echo "physical-docker"
return 0
fi
}
# --- CPU DETECTION ---
detect_cpu_vendor() {
# Detect CPU vendor: intel, amd, arm, or unknown
if [ -f /proc/cpuinfo ]; then
if grep -qi "intel" /proc/cpuinfo; then
echo "intel"
return 0
elif grep -qi "amd" /proc/cpuinfo; then
echo "amd"
return 0
elif grep -qi "arm\|aarch64" /proc/cpuinfo; then
echo "arm"
return 0
fi
fi
echo "unknown"
return 1
}
detect_cpu_generation() {
# Detect Intel CPU generation for kernel parameter tuning
# Returns generation number (e.g., 12, 13, 14) or "unknown"
local vendor=$(detect_cpu_vendor)
if [ "$vendor" != "intel" ]; then
echo "unknown"
return 1
fi
local model_name=$(grep "model name" /proc/cpuinfo | head -n1 | cut -d: -f2 | xargs)
# Extract generation from model name patterns
# Example: "12th Gen Intel(R) Core(TM) i7-12700"
if [[ "$model_name" =~ ([0-9]{2})th\ Gen ]]; then
echo "${BASH_REMATCH[1]}"
return 0
elif [[ "$model_name" =~ i[3579]-([0-9]{2})[0-9]{2,3} ]]; then
echo "${BASH_REMATCH[1]}"
return 0
fi
echo "unknown"
return 1
}
needs_intel_hybrid_core_tuning() {
# 12th Gen+ Intel CPUs with hybrid architecture need special kernel params
# Returns 0 (true) if tuning required, 1 (false) otherwise
local gen=$(detect_cpu_generation)
# 12th Gen (Alder Lake) and newer have P-cores + E-cores
if [[ "$gen" =~ ^[0-9]+$ ]] && [ "$gen" -ge 12 ]; then
return 0
fi
return 1
}
# --- GPU DETECTION ---
detect_gpu() {
# Detect GPU vendor: nvidia, amd, intel, or none
if ! command -v lspci &>/dev/null; then
echo "none"
return 1
fi
local gpu_info=$(lspci 2>/dev/null | grep -i vga)
if echo "$gpu_info" | grep -qi nvidia; then
echo "nvidia"
return 0
elif echo "$gpu_info" | grep -qi amd; then
echo "amd"
return 0
elif echo "$gpu_info" | grep -qi intel; then
echo "intel"
return 0
fi
echo "none"
return 1
}
get_gpu_model() {
# Get detailed GPU model information
if ! command -v lspci &>/dev/null; then
echo "unknown"
return 1
fi
local gpu_line=$(lspci 2>/dev/null | grep -i "vga\|3d\|display" | head -n1)
if [ -n "$gpu_line" ]; then
# Extract model after the vendor info
echo "$gpu_line" | cut -d: -f3 | xargs
else
echo "none"
fi
}
# --- NETWORK INTERFACE DETECTION ---
detect_primary_interface() {
# Find the primary physical network interface (excludes lo, docker, veth)
# Try to find interface with default route
local iface=$(ip route show default 2>/dev/null | awk '/^default/ {print $5; exit}')
if [ -n "$iface" ]; then
echo "$iface"
return 0
fi
# Fallback: first non-loopback physical interface
iface=$(ip -o link show | awk -F': ' '$2 != "lo" && $2 !~ /^(docker|veth|br-)/ {print $2; exit}')
if [ -n "$iface" ]; then
echo "$iface"
return 0
fi
echo "unknown"
return 1
}
get_current_ip() {
# Get current IPv4 address of primary interface
local iface=$(detect_primary_interface)
if [ "$iface" == "unknown" ]; then
echo "unknown"
return 1
fi
local ip=$(ip -4 addr show "$iface" 2>/dev/null | grep -oP '(?<=inet\s)\d+(\.\d+){3}' | head -n1)
if [ -n "$ip" ]; then
echo "$ip"
else
echo "unknown"
return 1
fi
}
# --- ENVIRONMENT DETECTION ---
is_running_in_container() {
# Check if running inside a container (Docker, LXC, etc.)
if [ -f /.dockerenv ]; then
return 0
fi
if grep -q "lxc\|docker" /proc/1/cgroup 2>/dev/null; then
return 0
fi
return 1
}
detect_init_system() {
# Detect init system: systemd, sysvinit, or unknown
if [ -d /run/systemd/system ]; then
echo "systemd"
return 0
elif [ -f /sbin/init ]; then
local init_path=$(readlink -f /sbin/init)
if [[ "$init_path" =~ systemd ]]; then
echo "systemd"
return 0
fi
fi
echo "unknown"
return 1
}
# --- PACKAGE MANAGER DETECTION ---
detect_package_manager() {
# Detect primary package manager: apt, dnf, yum, or unknown
if command -v apt-get &>/dev/null; then
echo "apt"
return 0
elif command -v dnf &>/dev/null; then
echo "dnf"
return 0
elif command -v yum &>/dev/null; then
echo "yum"
return 0
fi
echo "unknown"
return 1
}
# --- SUMMARY FUNCTION ---
print_detection_summary() {
# Print comprehensive detection summary to stderr for logging
cat >&2 <<EOF
=== System Detection Summary ===
OS Family: $(detect_os_family)
OS Version: $(detect_os_version)
Hardware Type: $(detect_hardware_type)
CPU Vendor: $(detect_cpu_vendor)
CPU Generation: $(detect_cpu_generation)
GPU: $(detect_gpu) ($(get_gpu_model))
Primary NIC: $(detect_primary_interface)
Current IP: $(get_current_ip)
Init System: $(detect_init_system)
Package Manager: $(detect_package_manager)
In Container: $(is_running_in_container && echo "yes" || echo "no")
================================
EOF
}
# Export functions for use in other scripts
export -f detect_os_family
export -f detect_os_version
export -f is_debian_trixie
export -f detect_hardware_type
export -f detect_cpu_vendor
export -f detect_cpu_generation
export -f needs_intel_hybrid_core_tuning
export -f detect_gpu
export -f get_gpu_model
export -f detect_primary_interface
export -f get_current_ip
export -f is_running_in_container
export -f detect_init_system
export -f detect_package_manager
export -f print_detection_summary

494
scripts/lib/fingerprint.sh Normal file
View File

@ -0,0 +1,494 @@
#!/bin/bash
# ==============================================================================
# FINGERPRINT LIBRARY: Hardware Inventory Collection
# ==============================================================================
# Part of unified bootstrap system for homelab infrastructure
# Collects comprehensive hardware facts and generates structured YAML output
# compatible with gather_hardware_facts.yml playbook format.
# ==============================================================================
# Source detection library if not already loaded
if ! type -t detect_os_family &>/dev/null; then
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=./detection.sh
source "${SCRIPT_DIR}/detection.sh"
fi
# --- HARDWARE FACT COLLECTION ---
collect_cpu_facts() {
# Collect CPU information in structured format
# Returns JSON-like structured data
local cpu_model=$(grep "model name" /proc/cpuinfo | head -n1 | cut -d: -f2 | xargs)
local cpu_cores=$(nproc 2>/dev/null || echo "unknown")
local cpu_vendor=$(detect_cpu_vendor)
local cpu_gen=$(detect_cpu_generation)
local cpu_arch=$(uname -m)
# Get CPU frequency (current)
local cpu_freq_mhz=$(grep "cpu MHz" /proc/cpuinfo | head -n1 | awk '{print $4}' | cut -d. -f1)
# Get CPU max frequency if available
local cpu_max_freq="unknown"
if [ -f /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq ]; then
local freq_khz=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)
cpu_max_freq=$((freq_khz / 1000))
fi
cat <<EOF
cpu:
model: "$cpu_model"
vendor: "$cpu_vendor"
architecture: "$cpu_arch"
generation: "$cpu_gen"
cores: $cpu_cores
current_freq_mhz: ${cpu_freq_mhz:-unknown}
max_freq_mhz: $cpu_max_freq
EOF
}
collect_memory_facts() {
# Collect memory information
local mem_total_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
local mem_total_mb=$((mem_total_kb / 1024))
local mem_total_gb=$((mem_total_kb / 1024 / 1024))
local mem_free_kb=$(grep MemFree /proc/meminfo | awk '{print $2}')
local mem_free_mb=$((mem_free_kb / 1024))
local swap_total_kb=$(grep SwapTotal /proc/meminfo | awk '{print $2}')
local swap_total_mb=$((swap_total_kb / 1024))
cat <<EOF
memory:
total_mb: $mem_total_mb
total_gb: $mem_total_gb
free_mb: $mem_free_mb
swap_mb: $swap_total_mb
EOF
}
collect_disk_facts() {
# Collect disk/storage information
cat <<EOF
storage:
EOF
# Root partition
local root_total=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
local root_used=$(df -BG / | awk 'NR==2 {print $3}' | sed 's/G//')
local root_avail=$(df -BG / | awk 'NR==2 {print $4}' | sed 's/G//')
local root_device=$(df / | awk 'NR==2 {print $1}')
cat <<EOF
root:
device: "$root_device"
total_gb: $root_total
used_gb: $root_used
available_gb: $root_avail
EOF
# Detect disk type (SSD vs HDD)
local device_name=$(echo "$root_device" | sed 's|/dev/||' | sed 's/[0-9]*$//')
local disk_type="unknown"
if [ -f "/sys/block/$device_name/queue/rotational" ]; then
local rotational=$(cat "/sys/block/$device_name/queue/rotational")
if [ "$rotational" -eq 0 ]; then
disk_type="ssd"
else
disk_type="hdd"
fi
fi
cat <<EOF
type: "$disk_type"
EOF
# List all block devices
if command -v lsblk &>/dev/null; then
local block_devices=$(lsblk -d -n -o NAME,SIZE,TYPE | grep disk | awk '{printf " - { name: \"%s\", size: \"%s\" }\n", $1, $2}')
if [ -n "$block_devices" ]; then
cat <<EOF
block_devices:
$block_devices
EOF
fi
fi
}
collect_network_facts() {
# Collect network interface information
local primary_iface=$(detect_primary_interface)
local current_ip=$(get_current_ip)
local gateway=$(ip route show default 2>/dev/null | awk '/^default/ {print $3; exit}')
cat <<EOF
network:
primary_interface: "$primary_iface"
current_ip: "$current_ip"
gateway: "${gateway:-unknown}"
EOF
# MAC address
if [ "$primary_iface" != "unknown" ]; then
local mac=$(ip link show "$primary_iface" 2>/dev/null | grep "link/ether" | awk '{print $2}')
cat <<EOF
mac_address: "${mac:-unknown}"
EOF
fi
# Link speed if available
if [ "$primary_iface" != "unknown" ] && [ -f "/sys/class/net/$primary_iface/speed" ]; then
local speed=$(cat "/sys/class/net/$primary_iface/speed" 2>/dev/null || echo "unknown")
cat <<EOF
link_speed_mbps: $speed
EOF
fi
# All interfaces (brief)
local all_interfaces=$(ip -o link show | awk -F': ' '$2 != "lo" {printf " - \"%s\"\n", $2}')
if [ -n "$all_interfaces" ]; then
cat <<EOF
all_interfaces:
$all_interfaces
EOF
fi
}
collect_gpu_facts() {
# Collect GPU information
local gpu_vendor=$(detect_gpu)
local gpu_model=$(get_gpu_model)
cat <<EOF
gpu:
vendor: "$gpu_vendor"
model: "$gpu_model"
EOF
# NVIDIA-specific details
if [ "$gpu_vendor" == "nvidia" ] && command -v nvidia-smi &>/dev/null; then
local nvidia_driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1)
local nvidia_cuda=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)
cat <<EOF
nvidia_driver: "${nvidia_driver:-unknown}"
cuda_capability: "${nvidia_cuda:-unknown}"
EOF
fi
}
collect_os_facts() {
# Collect OS and kernel information
local os_family=$(detect_os_family)
local os_version=$(detect_os_version)
local kernel=$(uname -r)
local hostname=$(hostname)
local fqdn=$(hostname -f 2>/dev/null || echo "$hostname")
if [ -f /etc/os-release ]; then
. /etc/os-release
local os_name="$NAME"
local os_version_id="$VERSION_ID"
fi
cat <<EOF
os:
family: "$os_family"
name: "${os_name:-unknown}"
version: "$os_version"
version_id: "${os_version_id:-unknown}"
kernel: "$kernel"
hostname: "$hostname"
fqdn: "$fqdn"
EOF
}
collect_hardware_type_facts() {
# Collect deployment type and special hardware flags
local hw_type=$(detect_hardware_type)
local in_container=$(is_running_in_container && echo "true" || echo "false")
local is_vm=$(systemd-detect-virt --vm &>/dev/null && echo "true" || echo "false")
local virt_type=$(systemd-detect-virt 2>/dev/null || echo "none")
cat <<EOF
deployment:
hardware_type: "$hw_type"
is_virtual: $is_vm
virtualization: "$virt_type"
in_container: $in_container
EOF
# Proxmox-specific
if [ "$hw_type" == "proxmox" ] && command -v pveversion &>/dev/null; then
local pve_version=$(pveversion | head -n1 | awk '{print $2}')
cat <<EOF
proxmox_version: "$pve_version"
EOF
fi
}
collect_timestamp_facts() {
# Collect collection timestamp
local timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
local date_collected=$(date +"%Y-%m-%d")
local epoch=$(date +%s)
cat <<EOF
metadata:
collected_at: "$timestamp"
date: "$date_collected"
epoch: $epoch
collector: "bootstrap.sh/fingerprint"
EOF
}
# --- STANDARDS VALIDATION ---
validate_against_standards() {
# Check hardware against environment-constraints.md requirements
# Returns validation status
local hw_type=$(detect_hardware_type)
local mem_gb=$(grep MemTotal /proc/meminfo | awk '{print int($2/1024/1024)}')
local cpu_cores=$(nproc)
local disk_gb=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
local violations=0
echo " standards_validation:" >&2
# Memory requirements
case "$hw_type" in
proxmox)
if [ "$mem_gb" -lt 8 ]; then
echo " - \"WARNING: Proxmox host has ${mem_gb}GB RAM (minimum 8GB recommended)\"" >&2
((violations++))
fi
;;
docker-vm|physical-docker)
if [ "$mem_gb" -lt 4 ]; then
echo " - \"WARNING: Docker host has ${mem_gb}GB RAM (minimum 4GB recommended)\"" >&2
((violations++))
fi
;;
esac
# Disk space
if [ "$disk_gb" -lt 20 ]; then
echo " - \"WARNING: Low disk space (${disk_gb}GB) - minimum 20GB recommended\"" >&2
((violations++))
fi
# CPU cores
if [ "$cpu_cores" -lt 2 ]; then
echo " - \"WARNING: Single CPU core detected - minimum 2 recommended\"" >&2
((violations++))
fi
if [ $violations -eq 0 ]; then
echo " - \"OK: Hardware meets minimum standards\"" >&2
fi
return $violations
}
# --- STRUCTURED OUTPUT ---
generate_hardware_facts_yaml() {
# Generate complete hardware facts in YAML format
# Compatible with gather_hardware_facts.yml output
local hostname=$(hostname)
cat <<EOF
---
# Hardware Facts Collection
# Generated by: bootstrap.sh fingerprint library
# Host: $hostname
# $(date -u +"%Y-%m-%d %H:%M:%S UTC")
$hostname:
$(collect_os_facts)
$(collect_cpu_facts)
$(collect_memory_facts)
$(collect_disk_facts)
$(collect_network_facts)
$(collect_gpu_facts)
$(collect_hardware_type_facts)
$(collect_timestamp_facts)
EOF
}
save_hardware_facts() {
# Save hardware facts to output file
# Args: $1 = output directory (default: ../ansible/archive/outputs)
local output_dir="${1:-../ansible/archive/outputs}"
local hostname=$(hostname)
local timestamp=$(date +%Y%m%dT%H%M%S)
local output_file="${output_dir}/hardware-facts-${hostname}-${timestamp}.yml"
# Ensure output directory exists
mkdir -p "$output_dir"
# Generate and save
generate_hardware_facts_yaml > "$output_file"
echo "$output_file"
}
# --- ANSIBLE INVENTORY GENERATION ---
generate_ansible_inventory_snippet() {
# Generate Ansible inventory entry for this host
# Can be appended to discovered-hosts.yml
local hostname=$(hostname)
local current_ip=$(get_current_ip)
local hw_type=$(detect_hardware_type)
local os_family=$(detect_os_family)
# Determine inventory group
local inventory_group="docker_hosts"
case "$hw_type" in
proxmox)
inventory_group="proxmox_cluster"
;;
docker-vm)
inventory_group="swarm_managers" # Or swarm_workers, requires manual classification
;;
pi)
inventory_group="control_nodes"
;;
ai-workstation)
inventory_group="ai_nodes"
;;
esac
cat <<EOF
# Auto-discovered: $(date -u +"%Y-%m-%d %H:%M:%S UTC")
[$inventory_group]
$hostname ansible_host=$current_ip ansible_user=chester
# Hardware details:
# - Type: $hw_type
# - OS: $os_family
# - IP: $current_ip
# Review and merge into inventory/hosts.ini after validation
EOF
}
append_to_discovered_inventory() {
# Append this host to discovered-hosts.yml
# Args: $1 = inventory file (default: ../ansible/archive/inventory/discovered-hosts.yml)
local inventory_file="${1:-../ansible/archive/inventory/discovered-hosts.yml}"
local hostname=$(hostname)
# Create file if doesn't exist
if [ ! -f "$inventory_file" ]; then
cat > "$inventory_file" <<EOF
# Auto-Discovery Inventory
# Generated by bootstrap.sh - DO NOT MANUALLY EDIT
# Merge entries into hosts.ini after validation
EOF
fi
# Check if host already exists
if grep -q "^${hostname} " "$inventory_file" 2>/dev/null; then
echo "Host $hostname already in discovered inventory, skipping" >&2
return 0
fi
# Append
generate_ansible_inventory_snippet >> "$inventory_file"
echo "" >> "$inventory_file"
echo "$inventory_file"
}
# --- OUTPUT FORMAT OPTIONS ---
generate_json_output() {
# Generate JSON format instead of YAML (for monitoring integration)
# Converts YAML facts to JSON
local hostname=$(hostname)
local current_ip=$(get_current_ip)
local hw_type=$(detect_hardware_type)
local os_family=$(detect_os_family)
local cpu_cores=$(nproc)
local mem_gb=$(grep MemTotal /proc/meminfo | awk '{print int($2/1024/1024)}')
local disk_gb=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
cat <<EOF
{
"hostname": "$hostname",
"ip": "$current_ip",
"hardware_type": "$hw_type",
"os_family": "$os_family",
"cpu_cores": $cpu_cores,
"memory_gb": $mem_gb,
"disk_gb": $disk_gb,
"collected_at": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
}
EOF
}
# --- SUMMARY DISPLAY ---
print_hardware_summary() {
# Print human-readable hardware summary
local hostname=$(hostname)
local hw_type=$(detect_hardware_type)
local os=$(detect_os_family) $(detect_os_version)
local cpu_model=$(grep "model name" /proc/cpuinfo | head -n1 | cut -d: -f2 | xargs)
local cpu_cores=$(nproc)
local mem_gb=$(grep MemTotal /proc/meminfo | awk '{print int($2/1024/1024)}')
local disk_gb=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
local gpu=$(detect_gpu)
local ip=$(get_current_ip)
cat >&2 <<EOF
=== Hardware Fingerprint Summary ===
Hostname: $hostname
IP Address: $ip
Type: $hw_type
OS: $os
CPU: $cpu_model ($cpu_cores cores)
Memory: ${mem_gb}GB
Storage: ${disk_gb}GB
GPU: $gpu
====================================
EOF
}
# Export functions
export -f collect_cpu_facts
export -f collect_memory_facts
export -f collect_disk_facts
export -f collect_network_facts
export -f collect_gpu_facts
export -f collect_os_facts
export -f collect_hardware_type_facts
export -f collect_timestamp_facts
export -f validate_against_standards
export -f generate_hardware_facts_yaml
export -f save_hardware_facts
export -f generate_ansible_inventory_snippet
export -f append_to_discovered_inventory
export -f generate_json_output
export -f print_hardware_summary

356
scripts/lib/network.sh Normal file
View File

@ -0,0 +1,356 @@
#!/bin/bash
# ==============================================================================
# NETWORK LIBRARY: Network Configuration and Validation
# ==============================================================================
# Part of unified bootstrap system for homelab infrastructure
# Handles static IP configuration via netplan, network validation, and
# VLAN capability detection for future network segmentation.
# ==============================================================================
# Source detection library if not already loaded
if ! type -t detect_primary_interface &>/dev/null; then
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=./detection.sh
source "${SCRIPT_DIR}/detection.sh"
fi
# --- NETWORK CONFIGURATION ---
apply_static_ip() {
# Configure static IP via netplan (Ubuntu/Debian)
# Args: $1 = Target IP, $2 = Gateway (default: 10.0.0.1), $3 = DNS (default: 10.0.0.2)
local target_ip="$1"
local gateway="${2:-10.0.0.1}"
local dns="${3:-10.0.0.2}"
if [ -z "$target_ip" ]; then
echo "ERROR: Target IP address required" >&2
return 1
fi
# Validate IP format
if ! [[ "$target_ip" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
echo "ERROR: Invalid IP address format: $target_ip" >&2
return 1
fi
local interface=$(detect_primary_interface)
if [ "$interface" == "unknown" ]; then
echo "ERROR: Could not detect primary network interface" >&2
return 1
fi
echo "[⚙] Configuring static IP: $target_ip on $interface..." >&2
# Fix permissions on existing netplan files (common issue)
sudo chmod 600 /lib/netplan/*.yaml 2>/dev/null || true
sudo chmod 600 /etc/netplan/*.yaml 2>/dev/null || true
# Create netplan directory if missing
sudo mkdir -p /etc/netplan
# Generate netplan configuration
sudo tee /etc/netplan/01-netcfg.yaml >/dev/null <<EOF
network:
version: 2
renderer: networkd
ethernets:
$interface:
addresses:
- ${target_ip}/24
nameservers:
addresses: [${dns}, 8.8.8.8]
routes:
- to: default
via: ${gateway}
EOF
# Fix permissions (netplan requires 600)
sudo chmod 600 /etc/netplan/01-netcfg.yaml
echo "[✓] Netplan configuration created" >&2
return 0
}
apply_network_changes() {
# Apply netplan configuration (WARNING: may cause SSH disconnect)
# Uses background apply to prevent SSH session hang
echo "[⚙] Applying network configuration (SSH may disconnect)..." >&2
# Test configuration first
if ! sudo netplan generate 2>/dev/null; then
echo "ERROR: netplan configuration validation failed" >&2
return 1
fi
# Apply in background to avoid blocking SSH
sudo netplan apply &
local apply_pid=$!
echo "[✓] Network apply started (PID: $apply_pid)" >&2
echo "[!] SSH connection will drop. Reconnect to new IP address." >&2
# Give it a moment to start
sleep 2
return 0
}
configure_network_safe() {
# Safe wrapper: configure IP + apply with reconnection instructions
# Args: $1 = Target IP, $2 = Gateway (optional), $3 = DNS (optional)
local target_ip="$1"
local current_ip=$(get_current_ip)
# Check if already configured
if [ "$current_ip" == "$target_ip" ]; then
echo "[✓] IP already configured as $target_ip, skipping" >&2
return 0
fi
# Configure
if ! apply_static_ip "$@"; then
return 1
fi
# Apply
apply_network_changes
echo "" >&2
echo "=========================================" >&2
echo "Network configuration applied" >&2
echo "Old IP: $current_ip" >&2
echo "New IP: $target_ip" >&2
echo "Reconnect with: ssh user@$target_ip" >&2
echo "=========================================" >&2
return 0
}
# --- VLAN CONFIGURATION (PLACEHOLDER) ---
get_desired_vlan_ip() {
# Determine desired VLAN IP based on hardware type
# Returns IP address from environment-constraints.md topology
# TODO: Enable when VLAN segmentation is live
local hardware_type=$(detect_hardware_type)
local hostname=$(hostname)
# TODO: Implement VLAN placement logic based on:
# - Proxmox hosts → 10.0.10.x (infra VLAN)
# - Swarm VMs → 10.0.200.x (compute VLAN)
# - Control nodes → 10.0.0.x (main VLAN)
# For now, return flat network assignment
case "$hardware_type" in
proxmox)
# Currently: 10.0.0.200-209
# Desired: 10.0.10.11-13 (future VLAN)
echo "10.0.0.201" # Placeholder
;;
docker-vm)
# Currently: 10.0.0.210-229
# Desired: 10.0.200.11+ (future VLAN)
echo "10.0.0.211" # Placeholder
;;
pi|physical-docker)
# Control nodes stay on main VLAN
echo "10.0.0.200"
;;
ai-workstation)
# Currently: 10.0.0.230-239
# Desired: 10.0.200.x (future VLAN)
echo "10.0.0.230" # Placeholder
;;
*)
echo "10.0.0.200" # Safe default
;;
esac
}
check_vlan_support() {
# Check if network hardware supports VLAN tagging
# Returns 0 if supported, 1 otherwise
local interface=$(detect_primary_interface)
if [ "$interface" == "unknown" ]; then
return 1
fi
# Check for 802.1Q VLAN support in kernel modules
if lsmod | grep -q "^8021q"; then
return 0
fi
# Check if module can be loaded
if sudo modprobe 8021q 2>/dev/null; then
return 0
fi
return 1
}
# --- NETWORK VALIDATION ---
validate_connectivity() {
# Test basic network connectivity
# Returns 0 if healthy, 1 otherwise
local errors=0
echo "[⚙] Validating network connectivity..." >&2
# Test default gateway
local gateway=$(ip route show default 2>/dev/null | awk '/^default/ {print $3; exit}')
if [ -n "$gateway" ]; then
if ping -c 2 -W 3 "$gateway" &>/dev/null; then
echo " [✓] Gateway reachable: $gateway" >&2
else
echo " [✗] Gateway unreachable: $gateway" >&2
((errors++))
fi
else
echo " [✗] No default gateway configured" >&2
((errors++))
fi
# Test DNS resolution
if ping -c 2 -W 3 8.8.8.8 &>/dev/null; then
echo " [✓] Internet connectivity (8.8.8.8)" >&2
else
echo " [✗] No internet connectivity" >&2
((errors++))
fi
# Test DNS resolution
if host google.com &>/dev/null; then
echo " [✓] DNS resolution working" >&2
else
echo " [!] DNS resolution issue (warning)" >&2
fi
if [ $errors -eq 0 ]; then
echo "[✓] Network validation passed" >&2
return 0
else
echo "[✗] Network validation failed ($errors errors)" >&2
return 1
fi
}
check_nfs_accessibility() {
# Test NFS server accessibility (TerraMaster NAS)
# Args: $1 = NFS server IP (default: 10.0.0.250)
local nfs_server="${1:-10.0.0.250}"
echo "[⚙] Checking NFS server accessibility ($nfs_server)..." >&2
# Test basic connectivity via ping
if ! ping -c 2 -W 3 "$nfs_server" &>/dev/null; then
echo " [✗] NFS server unreachable: $nfs_server" >&2
return 1
fi
echo " [✓] NFS server reachable" >&2
# Test NFS ports (2049 = NFSv3/v4, 111 = portmapper)
if command -v nc &>/dev/null; then
if nc -z -w 3 "$nfs_server" 2049 2>/dev/null; then
echo " [✓] NFS service responding (port 2049)" >&2
else
echo " [✗] NFS port 2049 closed" >&2
return 1
fi
fi
return 0
}
test_internal_hairpin_nat() {
# Test for hairpin NAT issues (lessons-learned.md #3)
# Internal hosts should NOT route through public DNS
local test_domain="castaldifamily.com"
echo "[⚙] Testing for hairpin NAT issues..." >&2
# Get public IP of domain
local public_ip=$(dig +short "$test_domain" @8.8.8.8 2>/dev/null | grep -E '^[0-9.]+$' | head -n1)
if [ -z "$public_ip" ]; then
echo " [!] Could not resolve $test_domain, skipping test" >&2
return 0
fi
# Try to ping public IP from inside network (should fail on hairpin NAT routers)
if ping -c 2 -W 2 "$public_ip" &>/dev/null; then
echo " [✓] No hairpin NAT issue detected" >&2
return 0
else
echo " [!] Possible hairpin NAT - use internal IPs (10.0.0.x) for node-to-node" >&2
return 0 # Warning, not error
fi
}
# --- NETWORK RENDERER DETECTION ---
detect_network_renderer() {
# Detect network configuration system: netplan, networkd, NetworkManager
if [ -d /etc/netplan ] && command -v netplan &>/dev/null; then
echo "netplan"
return 0
elif systemctl is-active systemd-networkd &>/dev/null; then
echo "networkd"
return 0
elif systemctl is-active NetworkManager &>/dev/null; then
echo "NetworkManager"
return 0
fi
echo "unknown"
return 1
}
# --- WAIT FOR NETWORK ---
wait_for_network() {
# Wait for network to stabilize after configuration change
# Args: $1 = timeout in seconds (default: 10)
local timeout="${1:-10}"
local elapsed=0
echo "[⚙] Waiting for network to stabilize (timeout: ${timeout}s)..." >&2
while [ $elapsed -lt $timeout ]; do
if ping -c 1 -W 1 8.8.8.8 &>/dev/null; then
echo "[✓] Network ready after ${elapsed}s" >&2
return 0
fi
sleep 1
((elapsed++))
done
echo "[!] Network not ready after ${timeout}s, continuing anyway" >&2
return 1
}
# Export functions
export -f apply_static_ip
export -f apply_network_changes
export -f configure_network_safe
export -f get_desired_vlan_ip
export -f check_vlan_support
export -f validate_connectivity
export -f check_nfs_accessibility
export -f test_internal_hairpin_nat
export -f detect_network_renderer
export -f wait_for_network

453
scripts/lib/proxmox.sh Normal file
View File

@ -0,0 +1,453 @@
#!/bin/bash
# ==============================================================================
# PROXMOX LIBRARY: Post-Install Configuration and Management
# ==============================================================================
# Part of unified bootstrap system for homelab infrastructure
# Comprehensive Proxmox VE post-install routines inspired by community scripts
# (tteck/community-scripts ProxmoxVE post-pve-install.sh)
# ==============================================================================
# Source detection library if not already loaded
if ! type -t detect_os_family &>/dev/null; then
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=./detection.sh
source "${SCRIPT_DIR}/detection.sh"
fi
# --- PROXMOX VERSION DETECTION ---
get_pve_version() {
# Get Proxmox VE version (e.g., 8.2.4)
if ! command -v pveversion &>/dev/null; then
echo "unknown"
return 1
fi
pveversion | awk -F'/' '{print $2}' | awk -F'-' '{print $1}'
}
get_pve_major_version() {
# Get major version (8 or 9)
local ver=$(get_pve_version)
echo "$ver" | cut -d. -f1
}
is_pve_supported_version() {
# Check if PVE version is supported (8.0-8.9.x or 9.0-9.1.x)
local major=$(get_pve_major_version)
case "$major" in
8|9)
return 0
;;
*)
return 1
;;
esac
}
# --- REPOSITORY MANAGEMENT ---
disable_pve_enterprise_repo() {
# Disable enterprise repository (requires subscription)
local major=$(get_pve_major_version)
echo "[⚙] Disabling Proxmox enterprise repository..." >&2
if [ "$major" == "9" ]; then
# PVE 9.x uses .sources format
for file in /etc/apt/sources.list.d/*.sources; do
if [ -f "$file" ] && grep -q "pve-enterprise" "$file"; then
# Use Enabled: false instead of commenting
if grep -q "^Enabled:" "$file"; then
sudo sed -i 's/^Enabled:.*/Enabled: false/' "$file"
else
echo "Enabled: false" | sudo tee -a "$file" >/dev/null
fi
fi
done
else
# PVE 8.x uses .list format
if [ -f /etc/apt/sources.list.d/pve-enterprise.list ]; then
sudo sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list
fi
fi
echo " [✓] Enterprise repository disabled" >&2
}
enable_pve_no_subscription_repo() {
# Enable no-subscription repository (free updates)
local major=$(get_pve_major_version)
echo "[⚙] Enabling Proxmox no-subscription repository..." >&2
if [ "$major" == "9" ]; then
# PVE 9.x deb822 format
sudo tee /etc/apt/sources.list.d/proxmox.sources >/dev/null <<EOF
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF
else
# PVE 8.x traditional format
sudo tee /etc/apt/sources.list.d/pve-no-subscription.list >/dev/null <<EOF
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
EOF
fi
echo " [✓] No-subscription repository enabled" >&2
}
configure_ceph_repos() {
# Configure Ceph storage repositories (disabled by default)
local major=$(get_pve_major_version)
echo "[⚙] Configuring Ceph package repositories..." >&2
if [ "$major" == "9" ]; then
# PVE 9.x - Squid version
sudo tee /etc/apt/sources.list.d/ceph.sources >/dev/null <<EOF
Types: deb
URIs: http://download.proxmox.com/debian/ceph-squid
Suites: trixie
Components: no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
Enabled: false
EOF
else
# PVE 8.x - Reef/Quincy versions
sudo tee /etc/apt/sources.list.d/ceph.list >/dev/null <<EOF
# Ceph Reef (disabled - enable as needed)
# deb http://download.proxmox.com/debian/ceph-reef bookworm no-subscription
# Ceph Quincy (disabled - enable as needed)
# deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription
EOF
fi
echo " [✓] Ceph repositories configured (disabled)" >&2
}
add_pvetest_repo() {
# Add pvetest repository for beta features (disabled by default)
local major=$(get_pve_major_version)
echo "[⚙] Adding pvetest repository (disabled)..." >&2
if [ "$major" == "9" ]; then
sudo tee /etc/apt/sources.list.d/pve-test.sources >/dev/null <<EOF
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-test
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
Enabled: false
EOF
else
sudo tee /etc/apt/sources.list.d/pvetest-for-beta.list >/dev/null <<EOF
# PVE Test Repository (disabled - enable for beta testing)
# deb http://download.proxmox.com/debian/pve bookworm pvetest
EOF
fi
echo " [✓] pvetest repository added (disabled)" >&2
}
configure_all_pve_repos() {
# Run complete repository configuration
disable_pve_enterprise_repo
enable_pve_no_subscription_repo
configure_ceph_repos
add_pvetest_repo
echo "[✓] Proxmox repository configuration complete" >&2
}
# --- SUBSCRIPTION NAG REMOVAL ---
disable_subscription_nag() {
# Remove subscription nag from web UI and mobile UI
# Creates apt hook to persist after updates
echo "[⚙] Disabling subscription nag dialog..." >&2
# Create persistent script
sudo mkdir -p /usr/local/bin
sudo tee /usr/local/bin/pve-remove-nag.sh >/dev/null <<'SCRIPT_EOF'
#!/bin/sh
# Auto-generated by homelab bootstrap - removes Proxmox subscription nag
WEB_JS=/usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
if [ -s "$WEB_JS" ] && ! grep -q NoMoreNagging "$WEB_JS"; then
echo "Patching Web UI subscription nag..."
sed -i -e "/data\.status/ s/!//" -e "/data\.status/ s/active/NoMoreNagging/" "$WEB_JS"
fi
MOBILE_TPL=/usr/share/pve-yew-mobile-gui/index.html.tpl
MARKER="<!-- MANAGED BLOCK FOR MOBILE NAG -->"
if [ -f "$MOBILE_TPL" ] && ! grep -q "$MARKER" "$MOBILE_TPL"; then
echo "Patching Mobile UI subscription nag..."
printf "%s\n" \
"$MARKER" \
"<script>" \
" function removeSubscriptionElements() {" \
" const dialogs = document.querySelectorAll('dialog.pwt-outer-dialog');" \
" dialogs.forEach(d => { if ((d.textContent||'').toLowerCase().includes('subscription')) d.remove(); });" \
" const cards = document.querySelectorAll('.pwt-card.pwt-p-2');" \
" cards.forEach(c => { if (!c.querySelector('button') && (c.textContent||'').toLowerCase().includes('subscription')) c.remove(); });" \
" }" \
" const observer = new MutationObserver(removeSubscriptionElements);" \
" observer.observe(document.body, { childList: true, subtree: true });" \
" removeSubscriptionElements();" \
"</script>" >> "$MOBILE_TPL"
fi
SCRIPT_EOF
sudo chmod 755 /usr/local/bin/pve-remove-nag.sh
# Create apt hook to run after updates
sudo tee /etc/apt/apt.conf.d/90-pve-no-nag <<'APT_EOF'
DPkg::Post-Invoke { "/usr/local/bin/pve-remove-nag.sh"; };
APT_EOF
sudo chmod 644 /etc/apt/apt.conf.d/90-pve-no-nag
# Apply immediately
sudo /usr/local/bin/pve-remove-nag.sh 2>/dev/null || true
echo " [✓] Subscription nag disabled (clear browser cache)" >&2
}
enable_subscription_nag() {
# Remove nag removal script (restore default behavior)
echo "[⚙] Re-enabling subscription nag..." >&2
sudo rm -f /usr/local/bin/pve-remove-nag.sh
sudo rm -f /etc/apt/apt.conf.d/90-pve-no-nag
# Reinstall widget toolkit to restore original
sudo apt --reinstall install proxmox-widget-toolkit &>/dev/null || true
echo " [✓] Subscription nag re-enabled" >&2
}
# --- HIGH AVAILABILITY MANAGEMENT ---
disable_ha_services() {
# Disable HA services for single-node setups (saves resources)
echo "[⚙] Disabling high availability services..." >&2
if systemctl is-active --quiet pve-ha-lrm || systemctl is-active --quiet pve-ha-crm; then
sudo systemctl disable --now pve-ha-lrm 2>/dev/null || true
sudo systemctl disable --now pve-ha-crm 2>/dev/null || true
sudo systemctl disable --now corosync 2>/dev/null || true
echo " [✓] HA services disabled" >&2
else
echo " [✓] HA services already disabled" >&2
fi
}
enable_ha_services() {
# Enable HA services for clustered environments
echo "[⚙] Enabling high availability services..." >&2
sudo systemctl enable --now pve-ha-lrm
sudo systemctl enable --now pve-ha-crm
sudo systemctl enable --now corosync
echo " [✓] HA services enabled" >&2
}
check_ha_status() {
# Check if HA services are running
if systemctl is-active --quiet pve-ha-lrm; then
echo "enabled"
else
echo "disabled"
fi
}
# --- SYSTEM MAINTENANCE ---
update_pve_system() {
# Run full system update
echo "[⚙] Updating Proxmox VE system (this may take several minutes)..." >&2
sudo apt-get update -qq
sudo apt-get dist-upgrade -y -qq
echo " [✓] System update complete" >&2
}
check_reboot_required() {
# Check if reboot is needed after updates
if [ -f /var/run/reboot-required ]; then
return 0
fi
# Check if kernel was updated
local running_kernel=$(uname -r)
local latest_kernel=$(dpkg -l | grep linux-image | sort -V | tail -n1 | awk '{print $2}' | sed 's/linux-image-//')
if [ "$running_kernel" != "$latest_kernel" ]; then
return 0
fi
return 1
}
# --- COMPREHENSIVE POST-INSTALL ---
run_proxmox_post_install() {
# Complete post-install routine for Proxmox hosts
# Args: $1 = mode (auto|interactive) - default: auto
local mode="${1:-auto}"
echo "=======================================" >&2
echo "PROXMOX POST-INSTALL CONFIGURATION" >&2
echo "Version: $(get_pve_version)" >&2
echo "Mode: $mode" >&2
echo "=======================================" >&2
echo "" >&2
# Verify Proxmox installation
if ! is_pve_supported_version; then
echo "[✗] Unsupported Proxmox VE version" >&2
return 1
fi
# Repository configuration
configure_all_pve_repos
echo "" >&2
# Subscription nag removal (auto mode)
if [ "$mode" == "auto" ]; then
disable_subscription_nag
echo "" >&2
fi
# HA service management
local cluster_nodes=$(pvesh get /nodes --output-format=json 2>/dev/null | jq -r 'length' 2>/dev/null || echo "1")
if [ "$cluster_nodes" -eq 1 ]; then
echo "[⚙] Single-node setup detected" >&2
disable_ha_services
echo "" >&2
else
echo "[⚙] Multi-node cluster detected ($cluster_nodes nodes)" >&2
echo " [✓] HA services left enabled" >&2
echo "" >&2
fi
# System update
if [ "$mode" == "auto" ]; then
echo "[⚙] Checking for updates..." >&2
sudo apt-get update -qq
local upgrades=$(apt list --upgradable 2>/dev/null | grep -c upgradable || echo "0")
if [ "$upgrades" -gt 1 ]; then
echo " [!] $((upgrades - 1)) packages can be upgraded" >&2
echo " [!] Run manually: apt update && apt dist-upgrade" >&2
else
echo " [✓] System up to date" >&2
fi
echo "" >&2
fi
# Check reboot requirement
if check_reboot_required; then
echo "[!] REBOOT REQUIRED" >&2
echo " Kernel or critical services updated" >&2
echo " Run: reboot" >&2
echo "" >&2
fi
echo "=======================================" >&2
echo "POST-INSTALL COMPLETE" >&2
echo "=======================================" >&2
echo "" >&2
echo "Next Steps:" >&2
echo " 1. Clear browser cache (Ctrl+Shift+R)" >&2
echo " 2. Reboot if kernel was updated" >&2
echo " 3. Configure storage/networking in web UI" >&2
echo "" >&2
}
# --- VALIDATION ---
validate_proxmox_config() {
# Validate Proxmox configuration
local errors=0
echo "[⚙] Validating Proxmox configuration..." >&2
# Check PVE version
if ! is_pve_supported_version; then
echo " [✗] Unsupported PVE version: $(get_pve_version)" >&2
((errors++))
else
echo " [✓] PVE version: $(get_pve_version)" >&2
fi
# Check no-subscription repo
if grep -rq "pve-no-subscription" /etc/apt/sources.list.d/ 2>/dev/null; then
echo " [✓] No-subscription repository configured" >&2
else
echo " [✗] No-subscription repository not found" >&2
((errors++))
fi
# Check enterprise repo disabled
if grep -rq "^deb.*pve-enterprise" /etc/apt/sources.list.d/ 2>/dev/null; then
echo " [!] Enterprise repository still active (requires subscription)" >&2
else
echo " [✓] Enterprise repository disabled" >&2
fi
# Check cluster status
local cluster_status=$(pvesh get /cluster/status --output-format=json 2>/dev/null | jq -r '.[0].type' 2>/dev/null || echo "unknown")
if [ "$cluster_status" == "cluster" ]; then
echo " [✓] Part of Proxmox cluster" >&2
else
echo " [✓] Standalone node" >&2
fi
if [ $errors -eq 0 ]; then
echo "[✓] Proxmox validation passed" >&2
return 0
else
echo "[✗] Proxmox validation failed ($errors errors)" >&2
return 1
fi
}
# Export functions
export -f get_pve_version
export -f get_pve_major_version
export -f is_pve_supported_version
export -f disable_pve_enterprise_repo
export -f enable_pve_no_subscription_repo
export -f configure_ceph_repos
export -f add_pvetest_repo
export -f configure_all_pve_repos
export -f disable_subscription_nag
export -f enable_subscription_nag
export -f disable_ha_services
export -f enable_ha_services
export -f check_ha_status
export -f update_pve_system
export -f check_reboot_required
export -f run_proxmox_post_install
export -f validate_proxmox_config

510
scripts/lib/validation.sh Normal file
View File

@ -0,0 +1,510 @@
#!/bin/bash
# ==============================================================================
# VALIDATION LIBRARY: Comprehensive System Health Checks
# ==============================================================================
# Part of unified bootstrap system for homelab infrastructure
# Provides comprehensive pre-flight and post-bootstrap validation with
# severity levels (critical, warning, info) for operational readiness.
# ==============================================================================
# Source detection library if not already loaded
if ! type -t detect_os_family &>/dev/null; then
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=./detection.sh
source "${SCRIPT_DIR}/detection.sh"
fi
# --- VALIDATION TRACKING ---
declare -g VALIDATION_ERRORS=0
declare -g VALIDATION_WARNINGS=0
declare -g VALIDATION_PASSED=0
reset_validation_counters() {
VALIDATION_ERRORS=0
VALIDATION_WARNINGS=0
VALIDATION_PASSED=0
}
log_pass() {
local message="$1"
echo " [✓] $message" >&2
((VALIDATION_PASSED++))
}
log_warning() {
local message="$1"
echo " [!] WARNING: $message" >&2
((VALIDATION_WARNINGS++))
}
log_error() {
local message="$1"
echo " [✗] ERROR: $message" >&2
((VALIDATION_ERRORS++))
}
# --- DISK VALIDATION ---
check_disk_space() {
# Validate sufficient disk space for Docker and system operations
# Critical: Root partition must have at least 10GB free
# Warning: Root partition should have at least 20GB free
echo "[⚙] Checking disk space..." >&2
local root_avail=$(df -BG / | awk 'NR==2 {print $4}' | sed 's/G//')
if [ -z "$root_avail" ]; then
log_error "Could not determine disk space"
return 1
fi
if [ "$root_avail" -lt 10 ]; then
log_error "Insufficient disk space on /: ${root_avail}GB (minimum 10GB required)"
return 1
elif [ "$root_avail" -lt 20 ]; then
log_warning "Low disk space on /: ${root_avail}GB (recommend 20GB+)"
else
log_pass "Disk space on /: ${root_avail}GB available"
fi
# Check for separate /var partition (common on servers)
if mountpoint -q /var 2>/dev/null; then
local var_avail=$(df -BG /var | awk 'NR==2 {print $4}' | sed 's/G//')
if [ "$var_avail" -lt 20 ]; then
log_warning "Low disk space on /var: ${var_avail}GB (Docker images will go here)"
else
log_pass "Disk space on /var: ${var_avail}GB available"
fi
fi
return 0
}
check_disk_performance() {
# Check if root is on SSD vs HDD (performance indicator)
echo "[⚙] Checking disk performance characteristics..." >&2
local root_device=$(df / | awk 'NR==2 {print $1}' | sed 's/[0-9]*$//')
local device_name=$(basename "$root_device")
if [ -f "/sys/block/$device_name/queue/rotational" ]; then
local rotational=$(cat "/sys/block/$device_name/queue/rotational")
if [ "$rotational" -eq 0 ]; then
log_pass "Root filesystem on SSD ($device_name)"
else
log_warning "Root filesystem on HDD ($device_name) - SSD recommended for Docker"
fi
else
log_warning "Could not determine disk type for $device_name"
fi
return 0
}
# --- MEMORY VALIDATION ---
check_memory() {
# Validate sufficient RAM for deployment type
# Proxmox: 8GB minimum, Docker Swarm: 4GB minimum, Pi: 2GB minimum
echo "[⚙] Checking memory..." >&2
local mem_total_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
local mem_total_gb=$((mem_total_kb / 1024 / 1024))
if [ -z "$mem_total_gb" ] || [ "$mem_total_gb" -eq 0 ]; then
log_error "Could not determine system memory"
return 1
fi
local hardware_type=$(detect_hardware_type)
local min_required=2
case "$hardware_type" in
proxmox)
min_required=8
;;
docker-vm|physical-docker)
min_required=4
;;
ai-workstation)
min_required=16
;;
pi)
min_required=2
;;
esac
if [ "$mem_total_gb" -lt "$min_required" ]; then
log_error "Insufficient RAM: ${mem_total_gb}GB (minimum ${min_required}GB for $hardware_type)"
return 1
else
log_pass "Memory: ${mem_total_gb}GB (meets ${min_required}GB minimum for $hardware_type)"
fi
return 0
}
check_swap() {
# Check swap configuration (important for memory-constrained systems)
echo "[⚙] Checking swap configuration..." >&2
local swap_total_kb=$(grep SwapTotal /proc/meminfo | awk '{print $2}')
local swap_total_gb=$((swap_total_kb / 1024 / 1024))
local hardware_type=$(detect_hardware_type)
if [ "$hardware_type" == "proxmox" ]; then
# Proxmox hosts should NOT have swap enabled (best practice)
if [ "$swap_total_gb" -gt 0 ]; then
log_warning "Swap enabled on Proxmox host (${swap_total_gb}GB) - consider disabling"
else
log_pass "Swap disabled (correct for Proxmox)"
fi
else
# Other systems benefit from swap
if [ "$swap_total_gb" -eq 0 ]; then
log_warning "No swap configured - may cause OOM issues under load"
else
log_pass "Swap: ${swap_total_gb}GB configured"
fi
fi
return 0
}
# --- NETWORK VALIDATION ---
check_network_routes() {
# Validate proper network routing configuration
echo "[⚙] Checking network routing..." >&2
# Check for default route
if ip route show default &>/dev/null; then
local gateway=$(ip route show default | awk '/^default/ {print $3; exit}')
log_pass "Default gateway configured: $gateway"
else
log_error "No default gateway configured"
return 1
fi
# Check for DNS servers
if [ -f /etc/resolv.conf ]; then
local dns_count=$(grep -c "^nameserver" /etc/resolv.conf)
if [ "$dns_count" -gt 0 ]; then
log_pass "DNS servers configured ($dns_count entries)"
else
log_warning "No DNS servers in /etc/resolv.conf"
fi
else
log_error "/etc/resolv.conf missing"
return 1
fi
return 0
}
check_hostname_resolution() {
# Validate hostname resolution (important for cluster operations)
echo "[⚙] Checking hostname resolution..." >&2
local hostname=$(hostname)
local fqdn=$(hostname -f 2>/dev/null || echo "")
if [ -n "$hostname" ]; then
log_pass "Hostname: $hostname"
else
log_error "Hostname not set"
return 1
fi
# Check if hostname resolves
if host "$hostname" &>/dev/null || getent hosts "$hostname" &>/dev/null; then
log_pass "Hostname resolves"
else
log_warning "Hostname '$hostname' does not resolve - may cause cluster issues"
fi
# Check /etc/hosts
if grep -q "127.0.1.1.*$hostname" /etc/hosts 2>/dev/null; then
log_pass "Hostname in /etc/hosts"
else
log_warning "Hostname not in /etc/hosts - adding is recommended"
fi
return 0
}
# --- NFS CLIENT VALIDATION ---
check_nfs_client() {
# Validate NFS client packages and kernel modules
echo "[⚙] Checking NFS client..." >&2
# Check for NFS common package
if dpkg -l | grep -q "^ii.*nfs-common"; then
log_pass "nfs-common package installed"
else
log_warning "nfs-common not installed - required for NFS mounts"
return 1
fi
# Check for NFS kernel modules
if lsmod | grep -q "^nfs "; then
log_pass "NFS kernel module loaded"
else
# Try to load it
if sudo modprobe nfs 2>/dev/null; then
log_pass "NFS kernel module loaded successfully"
else
log_warning "Could not load NFS kernel module"
fi
fi
return 0
}
# --- DOCKER VALIDATION ---
check_docker_daemon() {
# Validate Docker installation and daemon health
echo "[⚙] Checking Docker installation..." >&2
# Check if Docker is installed
if ! command -v docker &>/dev/null; then
log_warning "Docker not installed (will be installed during bootstrap)"
return 0
fi
log_pass "Docker binary found: $(docker --version 2>/dev/null | head -n1)"
# Check if daemon is running
if systemctl is-active docker &>/dev/null; then
log_pass "Docker daemon running"
else
log_warning "Docker daemon not running"
return 0
fi
# Check Docker socket permissions
if [ -S /var/run/docker.sock ]; then
if sudo docker ps &>/dev/null; then
log_pass "Docker socket accessible"
else
log_warning "Docker socket exists but not accessible"
fi
else
log_warning "Docker socket not found"
fi
# Check storage driver
local storage_driver=$(docker info 2>/dev/null | grep "Storage Driver" | awk '{print $3}')
if [ -n "$storage_driver" ]; then
log_pass "Storage driver: $storage_driver"
# Warn about devicemapper (deprecated)
if [ "$storage_driver" == "devicemapper" ]; then
log_warning "devicemapper storage driver is deprecated - consider overlay2"
fi
fi
return 0
}
# --- PROXMOX VALIDATION ---
check_proxmox_api() {
# Validate Proxmox VE installation and API accessibility
local hardware_type=$(detect_hardware_type)
if [ "$hardware_type" != "proxmox" ]; then
return 0 # Skip if not Proxmox
fi
echo "[⚙] Checking Proxmox VE..." >&2
# Check for pveversion
if command -v pveversion &>/dev/null; then
local pve_version=$(pveversion | head -n1)
log_pass "Proxmox installed: $pve_version"
else
log_error "Proxmox tools not found (pveversion missing)"
return 1
fi
# Check cluster status
if command -v pvesh &>/dev/null; then
if sudo pvesh get /cluster/status &>/dev/null; then
log_pass "Proxmox API accessible"
else
log_warning "Proxmox API not responding"
fi
fi
# Check for required repositories
if [ -f /etc/apt/sources.list.d/pve-no-subscription.list ]; then
log_pass "No-subscription repository configured"
else
log_warning "No-subscription repository not configured"
fi
return 0
}
# --- SECURITY VALIDATION ---
check_ssh_security() {
# Basic SSH security validation
echo "[⚙] Checking SSH security..." >&2
# Check if SSH is running
if systemctl is-active ssh &>/dev/null || systemctl is-active sshd &>/dev/null; then
log_pass "SSH service running"
else
log_error "SSH service not running"
return 1
fi
# Check for password authentication (should be disabled in production)
if [ -f /etc/ssh/sshd_config ]; then
if grep -q "^PasswordAuthentication no" /etc/ssh/sshd_config; then
log_pass "SSH password authentication disabled (secure)"
else
log_warning "SSH password authentication may be enabled - key-only is recommended"
fi
fi
# Check for root login
if [ -f /etc/ssh/sshd_config ]; then
if grep -q "^PermitRootLogin no" /etc/ssh/sshd_config; then
log_pass "SSH root login disabled (secure)"
else
log_warning "SSH root login may be enabled - consider disabling"
fi
fi
return 0
}
check_firewall() {
# Check firewall status (informational)
echo "[⚙] Checking firewall..." >&2
if systemctl is-active ufw &>/dev/null; then
local ufw_status=$(sudo ufw status 2>/dev/null | head -n1)
log_pass "UFW active: $ufw_status"
elif systemctl is-active firewalld &>/dev/null; then
log_pass "firewalld active"
elif command -v iptables &>/dev/null; then
local iptables_rules=$(sudo iptables -L -n | wc -l)
if [ "$iptables_rules" -gt 8 ]; then
log_pass "iptables rules configured ($iptables_rules lines)"
else
log_warning "No firewall detected - consider enabling UFW"
fi
else
log_warning "No firewall detected"
fi
return 0
}
# --- TIME SYNCHRONIZATION ---
check_time_sync() {
# Validate NTP/timesyncd for cluster time synchronization
echo "[⚙] Checking time synchronization..." >&2
if systemctl is-active systemd-timesyncd &>/dev/null; then
local ntp_status=$(timedatectl status 2>/dev/null | grep "synchronized" | awk '{print $3}')
if [ "$ntp_status" == "yes" ]; then
log_pass "Time synchronized via systemd-timesyncd"
else
log_warning "Time sync not confirmed"
fi
elif systemctl is-active ntp &>/dev/null || systemctl is-active ntpd &>/dev/null; then
log_pass "NTP service running"
else
log_warning "No time synchronization service detected - critical for clusters"
fi
return 0
}
# --- COMPREHENSIVE VALIDATION SUITE ---
run_validation_suite() {
# Run all validation checks and return summary
# Returns 0 if no critical errors, 1 if critical errors found
reset_validation_counters
echo "======================================" >&2
echo "SYSTEM VALIDATION SUITE" >&2
echo "======================================" >&2
# Run all checks (continue even if some fail)
check_disk_space || true
check_disk_performance || true
check_memory || true
check_swap || true
check_network_routes || true
check_hostname_resolution || true
check_nfs_client || true
check_docker_daemon || true
check_proxmox_api || true
check_ssh_security || true
check_firewall || true
check_time_sync || true
# Summary
echo "======================================" >&2
echo "VALIDATION SUMMARY" >&2
echo " Passed: $VALIDATION_PASSED" >&2
echo " Warnings: $VALIDATION_WARNINGS" >&2
echo " Errors: $VALIDATION_ERRORS" >&2
echo "======================================" >&2
if [ $VALIDATION_ERRORS -gt 0 ]; then
echo "[✗] CRITICAL: $VALIDATION_ERRORS validation errors - manual intervention required" >&2
return 1
elif [ $VALIDATION_WARNINGS -gt 0 ]; then
echo "[!] WARNINGS: $VALIDATION_WARNINGS issues detected - review recommended" >&2
return 0
else
echo "[✓] ALL CHECKS PASSED" >&2
return 0
fi
}
# Export functions
export -f reset_validation_counters
export -f log_pass
export -f log_warning
export -f log_error
export -f check_disk_space
export -f check_disk_performance
export -f check_memory
export -f check_swap
export -f check_network_routes
export -f check_hostname_resolution
export -f check_nfs_client
export -f check_docker_daemon
export -f check_proxmox_api
export -f check_ssh_security
export -f check_firewall
export -f check_time_sync
export -f run_validation_suite

View File

@ -1,5 +1,46 @@
#!/bin/bash #!/bin/bash
# ==============================================================================
# DEPRECATED: onboarding.sh
# ==============================================================================
# ⚠️ DEPRECATION NOTICE
# This script is deprecated and will be removed in a future release.
# Please use the unified bootstrap.sh script instead:
#
# ./bootstrap.sh
#
# Note: Proxmox SSH key distribution is now handled by:
# 1. Run bootstrap.sh to generate keys
# 2. Manually copy keys to Proxmox: ssh-copy-id root@<proxmox-ip>
# 3. Run Ansible playbook: ansible-playbook playbooks/onboarding/proxmox_host.yml
#
# This wrapper will redirect to bootstrap.sh.
# ==============================================================================
set -euo pipefail
# Show deprecation warning
echo "=======================================" >&2
echo "⚠️ DEPRECATION WARNING" >&2
echo "=======================================" >&2
echo "onboarding.sh is deprecated!" >&2
echo "" >&2
echo "Please use: ./bootstrap.sh" >&2
echo "" >&2
echo "Note: Proxmox SSH key setup is now a separate step." >&2
echo "See documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md" >&2
echo "" >&2
echo "Redirecting to bootstrap.sh in 5 seconds..." >&2
echo "Press Ctrl+C to cancel" >&2
echo "=======================================" >&2
sleep 5
# Redirect to unified bootstrap
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
exec "${SCRIPT_DIR}/bootstrap.sh" "$@"
# ==============================================================================
# LEGACY CODE BELOW (no longer executed)
# ============================================================================== # ==============================================================================
# ENVIRONMENT VARIABLES # ENVIRONMENT VARIABLES
# ============================================================================== # ==============================================================================

View File

@ -1,5 +1,44 @@
#!/bin/bash #!/bin/bash
# ==============================================================================
# DEPRECATED: pi_init.sh
# ==============================================================================
# ⚠️ DEPRECATION NOTICE
# This script is deprecated and will be removed in a future release.
# Please use the unified bootstrap.sh script instead:
#
# ./bootstrap.sh --hardware-type pi
#
# This wrapper will redirect to bootstrap.sh with appropriate flags.
# ==============================================================================
set -euo pipefail
# Show deprecation warning
echo "=======================================" >&2
echo "⚠️ DEPRECATION WARNING" >&2
echo "=======================================" >&2
echo "pi_init.sh is deprecated!" >&2
echo "" >&2
echo "Please use: ./bootstrap.sh" >&2
echo "" >&2
echo "Redirecting to bootstrap.sh in 5 seconds..." >&2
echo "Press Ctrl+C to cancel" >&2
echo "=======================================" >&2
sleep 5
# Redirect to unified bootstrap
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
exec "${SCRIPT_DIR}/bootstrap.sh" --hardware-type pi "$@"
# ==============================================================================
# LEGACY CODE BELOW (no longer executed)
# ==============================================================================
exit 0
#!/bin/bash
# ============================================================================== # ==============================================================================
# SINGLE-COMMAND BOOTSTRAP: IP, DOCKER, ANSIBLE # SINGLE-COMMAND BOOTSTRAP: IP, DOCKER, ANSIBLE
# Target: Ubuntu / Debian # Target: Ubuntu / Debian

215
scripts/validate-node.sh Normal file
View File

@ -0,0 +1,215 @@
#!/bin/bash
# ==============================================================================
# STANDALONE NODE VALIDATION TOOL
# ==============================================================================
# Comprehensive health check utility for homelab nodes
# Can be run independently on any managed host (post-bootstrap or ad-hoc)
#
# Usage:
# ./validate-node.sh [OPTIONS]
#
# Options:
# --help Show this help message
# --json Output results in JSON format
# --critical-only Show only critical errors (exit code 1 if any found)
# --verbose Show detailed output for each check
#
# Exit Codes:
# 0 - All checks passed or warnings only
# 1 - Critical errors found
# 2 - Invalid usage
#
# ==============================================================================
set -euo pipefail
# --- SCRIPT METADATA ---
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
VERSION="1.0.0"
# --- LOAD LIBRARIES ---
# shellcheck source=./lib/detection.sh
source "${SCRIPT_DIR}/lib/detection.sh"
# shellcheck source=./lib/validation.sh
source "${SCRIPT_DIR}/lib/validation.sh"
# shellcheck source=./lib/network.sh
source "${SCRIPT_DIR}/lib/network.sh"
# --- COMMAND LINE ARGUMENTS ---
SHOW_HELP=false
OUTPUT_JSON=false
CRITICAL_ONLY=false
VERBOSE=false
parse_arguments() {
while [[ $# -gt 0 ]]; do
case "$1" in
--help|-h)
SHOW_HELP=true
shift
;;
--json)
OUTPUT_JSON=true
shift
;;
--critical-only)
CRITICAL_ONLY=true
shift
;;
--verbose|-v)
VERBOSE=true
shift
;;
*)
echo "ERROR: Unknown option: $1" >&2
echo "Use --help for usage information" >&2
exit 2
;;
esac
done
}
show_help() {
cat <<'EOF'
=============================================================================
STANDALONE NODE VALIDATION TOOL v1.0.0
=============================================================================
Comprehensive health check for homelab infrastructure nodes.
Can be run on any managed host to verify operational readiness.
USAGE:
./validate-node.sh [OPTIONS]
OPTIONS:
--help Show this help message
--json Output results in JSON format (for monitoring integration)
--critical-only Show only critical errors (suppress warnings)
--verbose Show detailed output for each check
EXIT CODES:
0 - All checks passed (or warnings only)
1 - Critical errors found
2 - Invalid usage
CHECKS PERFORMED:
• Disk Space & Performance
• Memory & Swap Configuration
• Network Routes & Connectivity
• Hostname Resolution
• NFS Client Configuration
• Docker Daemon Health
• Proxmox API (if applicable)
• SSH Security Configuration
• Firewall Status
• Time Synchronization
EXAMPLES:
./validate-node.sh
Run all checks with standard output
./validate-node.sh --critical-only
Show only critical errors (useful in scripts)
./validate-node.sh --json
Output JSON for monitoring/alerting systems
./validate-node.sh --verbose
Show detailed information for each check
INTEGRATION:
JSON output can be consumed by monitoring systems (Prometheus, Grafana, etc.)
Example: ./validate-node.sh --json | jq '.errors'
=============================================================================
EOF
}
# --- JSON OUTPUT FUNCTIONS ---
generate_json_report() {
# Generate JSON output for monitoring integration
local hostname=$(hostname)
local timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
cat <<EOF
{
"hostname": "$hostname",
"timestamp": "$timestamp",
"validation": {
"passed": $VALIDATION_PASSED,
"warnings": $VALIDATION_WARNINGS,
"errors": $VALIDATION_ERRORS
},
"status": "$([ $VALIDATION_ERRORS -eq 0 ] && echo "healthy" || echo "critical")",
"hardware_type": "$(detect_hardware_type)",
"os_family": "$(detect_os_family)"
}
EOF
}
# --- MAIN WORKFLOW ---
main() {
# Parse CLI arguments
parse_arguments "$@"
if [ "$SHOW_HELP" == "true" ]; then
show_help
exit 0
fi
# Standard output header (unless JSON mode)
if [ "$OUTPUT_JSON" == "false" ]; then
echo "======================================="
echo "NODE VALIDATION TOOL v${VERSION}"
echo "Host: $(hostname)"
echo "Time: $(date -u +"%Y-%m-%d %H:%M:%S UTC")"
echo "======================================="
echo ""
fi
# Run validation suite
if [ "$VERBOSE" == "true" ]; then
# Show detection summary in verbose mode
[ "$OUTPUT_JSON" == "false" ] && print_detection_summary
[ "$OUTPUT_JSON" == "false" ] && echo ""
fi
# Run comprehensive validation
run_validation_suite
# Generate output
if [ "$OUTPUT_JSON" == "true" ]; then
generate_json_report
else
echo ""
echo "Validation completed."
echo ""
# Show summary (already printed by run_validation_suite)
if [ $VALIDATION_ERRORS -gt 0 ]; then
echo "⚠️ Critical issues found - manual intervention required"
exit 1
elif [ $VALIDATION_WARNINGS -gt 0 ]; then
echo " Warnings present - review recommended"
exit 0
else
echo "✅ All checks passed - node is healthy"
exit 0
fi
fi
# Exit code based on errors
[ $VALIDATION_ERRORS -eq 0 ] && exit 0 || exit 1
}
# --- ENTRY POINT ---
main "$@"