feat(bootstrap)!: introduce unified bootstrap system with modular libraries
BREAKING CHANGE: day0bootstrap.sh deprecated in favor of bootstrap.sh - Add scripts/bootstrap.sh (488 lines): Unified entrypoint supporting multiple hardware types (Proxmox/Docker VMs/Pi) - Create scripts/lib/ modular library system: - detection.sh: OS/hardware/container detection (362 lines) - fingerprint.sh: System fingerprinting and inventory (494 lines) - network.sh: IP configuration and VLAN placement (356 lines) - proxmox.sh: PVE post-install automation (453 lines) - validation.sh: Comprehensive pre-flight checks (510 lines) - Add validation tools: validate-node.sh, onboarding.sh, pi_init.sh - Deprecate scripts/day0bootstrap.sh with graceful redirect wrapper - Document architecture in scripts/README.md (495 lines) and PROXMOX-COMPARISON.md - Update SOP-002 with new bootstrap workflow - Add nodes/watchtower/compose.yaml (Raspberry Pi 5 stack) Migration: Existing day0bootstrap.sh users automatically redirected to new system after 5-second warning. No manual intervention required. Ref: Infrastructure automation modernization per active-tasks.md
This commit is contained in:
parent
2414d8dfc5
commit
e16f98a183
@ -132,49 +132,78 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
|
|||||||
|
|
||||||
## Ansible Control Node Setup
|
## Ansible Control Node Setup
|
||||||
|
|
||||||
### Step 3: Configure Watchtower as Control Node
|
### Step 3: Bootstrap Watchtower as Control Node
|
||||||
|
|
||||||
**Time:** 25-35 minutes
|
**Time:** 15-20 minutes (reduced from 25-35 via automation)
|
||||||
|
|
||||||
**Rationale:** Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.
|
**Rationale:** Watchtower (Raspberry Pi 5) serves as the Ansible control node to manage all infrastructure, including itself.
|
||||||
|
|
||||||
1. **SSH to Watchtower:**
|
**New Method:** Use the unified bootstrap script for automated, idempotent configuration.
|
||||||
|
|
||||||
|
1. **Transfer Bootstrap Script to Watchtower:**
|
||||||
|
|
||||||
|
**Option A: From local repository (if cloned on workstation):**
|
||||||
|
```bash
|
||||||
|
# From your workstation
|
||||||
|
scp -r homelab/scripts chester@10.0.0.200:~/
|
||||||
|
```
|
||||||
|
|
||||||
|
**Option B: Direct clone on Watchtower:**
|
||||||
|
```bash
|
||||||
|
# SSH to Watchtower
|
||||||
|
ssh chester@10.0.0.200
|
||||||
|
|
||||||
|
# Minimal clone (scripts only)
|
||||||
|
git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
|
||||||
|
cd homelab/scripts
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Run Unified Bootstrap Script:**
|
||||||
|
```bash
|
||||||
|
# Auto-detect and configure (Raspberry Pi will be detected)
|
||||||
|
./bootstrap.sh
|
||||||
|
|
||||||
|
# The script will:
|
||||||
|
# - Detect Raspberry Pi hardware
|
||||||
|
# - Configure static IP (10.0.0.200)
|
||||||
|
# - Install Docker with Debian Trixie compatibility
|
||||||
|
# - Install Ansible and proxmoxer
|
||||||
|
# - Generate ED25519 SSH keys
|
||||||
|
# - Run comprehensive validation
|
||||||
|
# - Generate hardware fingerprint
|
||||||
|
```
|
||||||
|
|
||||||
|
**⚠️ Important:** SSH connection will drop during network reconfiguration.
|
||||||
|
Reconnect after ~10 seconds:
|
||||||
```bash
|
```bash
|
||||||
ssh chester@10.0.0.200
|
ssh chester@10.0.0.200
|
||||||
```
|
```
|
||||||
|
|
||||||
2. **Install Ansible Toolchain:**
|
3. **Verify Bootstrap Success:**
|
||||||
```bash
|
```bash
|
||||||
# Update package index
|
# After reconnecting
|
||||||
sudo apt update
|
cd homelab/scripts
|
||||||
|
|
||||||
# Install Ansible and dependencies
|
# Check validation report
|
||||||
sudo apt install -y ansible ansible-lint sshpass python3-pip git
|
cat ../ansible/archive/outputs/bootstrap-validation-watchtower-*.log
|
||||||
|
|
||||||
# Install Python libraries
|
# Verify installations
|
||||||
pip3 install proxmoxer requests --break-system-packages
|
docker --version # Should show Docker 24.x or newer
|
||||||
|
ansible --version # Should show ansible [core 2.x.x]
|
||||||
|
|
||||||
# Verify installation
|
# Check SSH key
|
||||||
ansible --version
|
ls -lh ~/.ssh/id_ed25519.pub
|
||||||
# Expected: ansible [core 2.x.x]
|
cat ~/.ssh/id_ed25519.pub # Copy this for distribution
|
||||||
```
|
```
|
||||||
|
|
||||||
3. **Generate SSH Keys for Automation:**
|
4. **Distribute SSH Keys to Managed Nodes:**
|
||||||
```bash
|
```bash
|
||||||
# Generate ED25519 key (modern cryptography)
|
# The bootstrap script generated keys, now distribute them
|
||||||
ssh-keygen -t ed25519 -C "ansible@watchtower" -f ~/.ssh/id_ed25519 -N ""
|
|
||||||
|
|
||||||
# Set proper permissions
|
|
||||||
chmod 600 ~/.ssh/id_ed25519
|
|
||||||
chmod 644 ~/.ssh/id_ed25519.pub
|
|
||||||
```
|
|
||||||
|
|
||||||
4. **Distribute Keys to All Nodes:**
|
|
||||||
```bash
|
|
||||||
# Deploy to Heimdall
|
# Deploy to Heimdall
|
||||||
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
|
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.151
|
||||||
|
|
||||||
# Deploy to Waldorf
|
# Deploy to Waldorf
|
||||||
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
|
ssh-copy-id -i ~/.ssh/id_ed25519.pub chester@10.0.0.251
|
||||||
|
|
||||||
# Deploy to localhost (self-management)
|
# Deploy to localhost (self-management)
|
||||||
@ -194,9 +223,12 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
|
|||||||
# Expected: watchtower
|
# Expected: watchtower
|
||||||
```
|
```
|
||||||
|
|
||||||
6. **Clone Repository to Control Node:**
|
6. **Clone Full Repository (If Not Already Present):**
|
||||||
```bash
|
```bash
|
||||||
cd ~
|
cd ~
|
||||||
|
|
||||||
|
# If you only did shallow clone earlier, get full repo
|
||||||
|
rm -rf homelab # Remove shallow clone
|
||||||
git clone https://git.castaldifamily.com/nathan/homelab.git
|
git clone https://git.castaldifamily.com/nathan/homelab.git
|
||||||
cd homelab
|
cd homelab
|
||||||
|
|
||||||
@ -205,13 +237,19 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
|
|||||||
git-crypt unlock ~/homelab-secrets.key
|
git-crypt unlock ~/homelab-secrets.key
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Troubleshooting:**
|
||||||
|
- **Bootstrap fails:** Run with `--dry-run` first to preview actions: `./bootstrap.sh --dry-run`
|
||||||
|
- **Network doesn't reconnect:** Wait 30 seconds and retry SSH
|
||||||
|
- **Validation errors:** Review the validation log, address critical errors before proceeding
|
||||||
|
- **Manual intervention needed:** Use `./validate-node.sh` to re-check after fixes
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Core Infrastructure Deployment
|
## Core Infrastructure Deployment
|
||||||
|
|
||||||
### Step 4: Deploy Core Stack on Heimdall
|
### Step 4: Bootstrap and Deploy Core Stack on Heimdall
|
||||||
|
|
||||||
**Time:** 20-30 minutes
|
**Time:** 15-25 minutes (reduced from 20-30 via automation)
|
||||||
|
|
||||||
**Core Stack Components:**
|
**Core Stack Components:**
|
||||||
- Docker Socket Proxy (security boundary)
|
- Docker Socket Proxy (security boundary)
|
||||||
@ -219,29 +257,41 @@ Deploy the complete homelab infrastructure from a clean state using GitOps princ
|
|||||||
- Redis (caching layer)
|
- Redis (caching layer)
|
||||||
- Komodo Core (container orchestration)
|
- Komodo Core (container orchestration)
|
||||||
|
|
||||||
**Deployment Method:** Manual Docker Compose (Ansible automation planned for future state)
|
1. **Bootstrap Heimdall Node:**
|
||||||
|
|
||||||
1. **SSH to Heimdall:**
|
**Option A: Remote bootstrap from Watchtower (recommended):**
|
||||||
```bash
|
```bash
|
||||||
|
# From Watchtower control node
|
||||||
|
cd ~/homelab
|
||||||
|
|
||||||
|
# Copy bootstrap script to Heimdall
|
||||||
|
scp -r scripts chester@10.0.0.151:~/
|
||||||
|
|
||||||
|
# SSH and run bootstrap
|
||||||
|
ssh chester@10.0.0.151 "cd scripts && ./bootstrap.sh --hardware-type docker-vm"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Option B: Direct console access:**
|
||||||
|
```bash
|
||||||
|
# Login to Heimdall directly
|
||||||
ssh chester@10.0.0.151
|
ssh chester@10.0.0.151
|
||||||
|
|
||||||
|
# Clone repo or copy scripts
|
||||||
|
git clone --depth=1 https://git.castaldifamily.com/nathan/homelab.git
|
||||||
|
cd homelab/scripts
|
||||||
|
|
||||||
|
# Run bootstrap
|
||||||
|
./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.151
|
||||||
```
|
```
|
||||||
|
|
||||||
2. **Install Docker & Docker Compose:**
|
2. **Verify Docker Installation:**
|
||||||
```bash
|
```bash
|
||||||
# Install Docker
|
# After bootstrap completes
|
||||||
curl -fsSL https://get.docker.com -o get-docker.sh
|
|
||||||
sudo sh get-docker.sh
|
|
||||||
|
|
||||||
# Add user to docker group
|
|
||||||
sudo usermod -aG docker $USER
|
|
||||||
|
|
||||||
# Log out and back in for group to take effect
|
|
||||||
exit
|
|
||||||
ssh chester@10.0.0.151
|
ssh chester@10.0.0.151
|
||||||
|
|
||||||
# Verify Docker installation
|
|
||||||
docker --version
|
docker --version
|
||||||
docker compose version
|
docker compose version
|
||||||
|
docker ps # Should return empty list (no containers yet)
|
||||||
```
|
```
|
||||||
|
|
||||||
3. **Create Komodo Directory Structure:**
|
3. **Create Komodo Directory Structure:**
|
||||||
|
|||||||
85
nodes/watchtower/compose.yaml
Normal file
85
nodes/watchtower/compose.yaml
Normal file
@ -0,0 +1,85 @@
|
|||||||
|
name: node-tools
|
||||||
|
services:
|
||||||
|
# 🔒 Local Security Layer for this Node
|
||||||
|
docker-socket-proxy:
|
||||||
|
image: tecnativa/docker-socket-proxy:latest
|
||||||
|
container_name: docker-socket-proxy
|
||||||
|
userns_mode: "host"
|
||||||
|
user: "0:0"
|
||||||
|
security_opt:
|
||||||
|
- apparmor=unconfined
|
||||||
|
privileged: true
|
||||||
|
networks:
|
||||||
|
- node-net
|
||||||
|
ports:
|
||||||
|
- "127.0.0.1:2375:2375" # Expose on localhost for host-mode periphery
|
||||||
|
group_add:
|
||||||
|
- "988"
|
||||||
|
volumes:
|
||||||
|
- /var/run/docker.sock:/var/run/docker.sock:ro
|
||||||
|
environment:
|
||||||
|
- CONTAINERS=1
|
||||||
|
- NETWORKS=1
|
||||||
|
- IMAGES=1
|
||||||
|
- INFO=1
|
||||||
|
- POST=1
|
||||||
|
- ALLOW_START=1
|
||||||
|
- ALLOW_STOP=1
|
||||||
|
# Added for Stack Management
|
||||||
|
- SERVICES=1 # Required for stack/service operations
|
||||||
|
- TASKS=1 # Required for stack task management
|
||||||
|
- VOLUMES=1 # Required if stacks use volumes
|
||||||
|
- CONFIGS=1 # Required for Docker configs
|
||||||
|
- SECRETS=1 # Required for Docker secrets
|
||||||
|
|
||||||
|
# 🦎 Komodo Periphery
|
||||||
|
periphery:
|
||||||
|
image: ghcr.io/moghtech/komodo-periphery:2
|
||||||
|
container_name: komodo-perihery-watchtower
|
||||||
|
network_mode: host # Use host networking to access external IPs
|
||||||
|
depends_on:
|
||||||
|
- docker-socket-proxy
|
||||||
|
environment:
|
||||||
|
- DOCKER_HOST=tcp://127.0.0.1:2375 # Access via localhost
|
||||||
|
- PERIPHERY_CORE_ADDRESS=ws://10.0.0.151:9120
|
||||||
|
- PERIPHERY_CONNECT_AS=Watchtower
|
||||||
|
- PERIPHERY_ONBOARDING_KEY=O_VegHtPxiQKrzsAd8MqlrJEs2WLxZ_O
|
||||||
|
volumes:
|
||||||
|
- /proc:/proc
|
||||||
|
- /mnt/appdata/komodo/watchtower/keys:/config/keys
|
||||||
|
- /mnt/appdata/komodo/watchtower/work:/etc/komodo
|
||||||
|
# ✅ Added for Stack Deployments
|
||||||
|
- /mnt/appdata/komodo/watchtower/stacks:/etc/komodo/stacks
|
||||||
|
# ✅ Added for Git-linked Stacks
|
||||||
|
- /mnt/appdata/komodo/watchtower/repos:/etc/komodo/repos
|
||||||
|
|
||||||
|
# 🔍 Traefik-KOP (Kubernetes Operator for Traefik Discovery)
|
||||||
|
traefik-kop:
|
||||||
|
image: ghcr.io/jittering/traefik-kop:0.19.4
|
||||||
|
container_name: traefik-kop
|
||||||
|
restart: unless-stopped
|
||||||
|
depends_on:
|
||||||
|
- docker-socket-proxy
|
||||||
|
networks:
|
||||||
|
- node-net
|
||||||
|
environment:
|
||||||
|
- DOCKER_HOST=tcp://docker-socket-proxy:2375
|
||||||
|
- REDIS_ADDR=10.0.0.151:6379
|
||||||
|
- BIND_IP=10.0.0.200
|
||||||
|
- KOP_HOSTNAME=watchtower
|
||||||
|
# Optional: Enable debug logging
|
||||||
|
# - VERBOSE=true
|
||||||
|
|
||||||
|
# 📜 Dozzle Agent
|
||||||
|
# dozzle:
|
||||||
|
# image: amir20/dozzle:latest
|
||||||
|
# depends_on:
|
||||||
|
# - docker-socket-proxy
|
||||||
|
# networks:
|
||||||
|
# - node-net
|
||||||
|
# environment:
|
||||||
|
# - DOCKER_HOST=tcp://docker-socket-proxy:2375
|
||||||
|
|
||||||
|
networks:
|
||||||
|
node-net:
|
||||||
|
driver: bridge
|
||||||
256
scripts/PROXMOX-COMPARISON.md
Normal file
256
scripts/PROXMOX-COMPARISON.md
Normal file
@ -0,0 +1,256 @@
|
|||||||
|
# Proxmox Post-Install Feature Comparison
|
||||||
|
|
||||||
|
## Community Script vs. Bootstrap.sh
|
||||||
|
|
||||||
|
This document compares the tteck/community-scripts ProxmoxVE post-install script with our unified bootstrap.sh implementation.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ✅ Features Now Implemented
|
||||||
|
|
||||||
|
### Repository Management
|
||||||
|
| Feature | Community Script | bootstrap.sh | Status |
|
||||||
|
|---------|-----------------|--------------|--------|
|
||||||
|
| Disable enterprise repo | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Enable no-subscription repo | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Configure Ceph repos | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Add pvetest repo (disabled) | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| PVE 8.x (.list format) | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| PVE 9.x (.sources deb822) | ✅ | ✅ | **COMPLETE** |
|
||||||
|
|
||||||
|
### UI Customization
|
||||||
|
| Feature | Community Script | bootstrap.sh | Status |
|
||||||
|
|---------|-----------------|--------------|--------|
|
||||||
|
| Web UI subscription nag removal | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Mobile UI subscription nag removal | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Persistent via apt hook | ✅ | ✅ | **COMPLETE** |
|
||||||
|
|
||||||
|
### High Availability
|
||||||
|
| Feature | Community Script | bootstrap.sh | Status |
|
||||||
|
|---------|-----------------|--------------|--------|
|
||||||
|
| Disable HA on single nodes | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Auto-detect cluster vs. single | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Corosync management | ✅ | ✅ | **COMPLETE** |
|
||||||
|
|
||||||
|
### System Maintenance
|
||||||
|
| Feature | Community Script | bootstrap.sh | Status |
|
||||||
|
|---------|-----------------|--------------|--------|
|
||||||
|
| System update (apt dist-upgrade) | ✅ | ⚠️ | **PARTIAL** |
|
||||||
|
| Reboot detection | ✅ | ✅ | **COMPLETE** |
|
||||||
|
| Interactive prompts | ✅ | ❌ | **N/A (Auto Mode)** |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎯 Key Differences
|
||||||
|
|
||||||
|
### Community Script (Interactive)
|
||||||
|
- **Mode:** Interactive (whiptail menus)
|
||||||
|
- **Approach:** User chooses each action
|
||||||
|
- **Best For:** Manual post-install on fresh Proxmox
|
||||||
|
|
||||||
|
### bootstrap.sh (Automated)
|
||||||
|
- **Mode:** Automated (sensible defaults)
|
||||||
|
- **Approach:** Auto-detect and apply best practices
|
||||||
|
- **Best For:** Scripted deployments, CI/CD, repeatability
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 What bootstrap.sh Does for Proxmox
|
||||||
|
|
||||||
|
When `./bootstrap.sh --hardware-type proxmox` is run:
|
||||||
|
|
||||||
|
### Phase 1: Detection
|
||||||
|
```
|
||||||
|
✓ Detect Proxmox VE version (8.x or 9.x)
|
||||||
|
✓ Check cluster status (single node vs. cluster)
|
||||||
|
✓ Identify current repository configuration
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 2: Repository Configuration
|
||||||
|
```
|
||||||
|
✓ Disable pve-enterprise repository
|
||||||
|
✓ Enable pve-no-subscription repository
|
||||||
|
✓ Configure Ceph repos (disabled by default)
|
||||||
|
✓ Add pvetest repository (disabled)
|
||||||
|
✓ Auto-select .list (PVE 8.x) or .sources (PVE 9.x) format
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 3: UI Customization
|
||||||
|
```
|
||||||
|
✓ Remove subscription nag from web UI
|
||||||
|
✓ Remove subscription nag from mobile UI
|
||||||
|
✓ Create apt hook to persist after updates
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 4: High Availability
|
||||||
|
```
|
||||||
|
✓ Detect single-node setup → Disable HA services (saves resources)
|
||||||
|
✓ Detect cluster → Keep HA services enabled
|
||||||
|
✓ Manage pve-ha-lrm, pve-ha-crm, corosync
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 5: Validation
|
||||||
|
```
|
||||||
|
✓ Verify PVE version supported
|
||||||
|
✓ Check repository configuration
|
||||||
|
✓ Validate cluster status
|
||||||
|
✓ Check for required reboot
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚀 Usage Examples
|
||||||
|
|
||||||
|
### Basic Proxmox Bootstrap
|
||||||
|
```bash
|
||||||
|
# Auto-detect and configure Proxmox host
|
||||||
|
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
|
||||||
|
```
|
||||||
|
|
||||||
|
**Output:**
|
||||||
|
```
|
||||||
|
[PHASE 1] System Detection
|
||||||
|
Auto-detected hardware type: proxmox
|
||||||
|
|
||||||
|
[PHASE 3] Package Installation
|
||||||
|
[⚙] Installing Proxmox-specific packages...
|
||||||
|
[✓] proxmoxer installed
|
||||||
|
|
||||||
|
=== PROXMOX POST-INSTALL CONFIGURATION ===
|
||||||
|
Version: 8.2.4
|
||||||
|
Mode: auto
|
||||||
|
|
||||||
|
[⚙] Disabling Proxmox enterprise repository...
|
||||||
|
[✓] Enterprise repository disabled
|
||||||
|
|
||||||
|
[⚙] Enabling Proxmox no-subscription repository...
|
||||||
|
[✓] No-subscription repository enabled
|
||||||
|
|
||||||
|
[⚙] Configuring Ceph package repositories...
|
||||||
|
[✓] Ceph repositories configured (disabled)
|
||||||
|
|
||||||
|
[⚙] Adding pvetest repository (disabled)...
|
||||||
|
[✓] pvetest repository added (disabled)
|
||||||
|
|
||||||
|
[⚙] Disabling subscription nag dialog...
|
||||||
|
[✓] Subscription nag disabled (clear browser cache)
|
||||||
|
|
||||||
|
[⚙] Single-node setup detected
|
||||||
|
[⚙] Disabling high availability services...
|
||||||
|
[✓] HA services disabled
|
||||||
|
|
||||||
|
[✓] Proxmox repository configuration complete
|
||||||
|
==========================================
|
||||||
|
|
||||||
|
Next Steps:
|
||||||
|
1. Clear browser cache (Ctrl+Shift+R)
|
||||||
|
2. Reboot if kernel was updated
|
||||||
|
3. Configure storage/networking in web UI
|
||||||
|
```
|
||||||
|
|
||||||
|
### Standalone Proxmox Configuration (Existing Host)
|
||||||
|
```bash
|
||||||
|
# Source library and run post-install only
|
||||||
|
source lib/proxmox.sh
|
||||||
|
run_proxmox_post_install "auto"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validate Proxmox Configuration
|
||||||
|
```bash
|
||||||
|
source lib/proxmox.sh
|
||||||
|
validate_proxmox_config
|
||||||
|
```
|
||||||
|
|
||||||
|
**Output:**
|
||||||
|
```
|
||||||
|
[⚙] Validating Proxmox configuration...
|
||||||
|
[✓] PVE version: 8.2.4
|
||||||
|
[✓] No-subscription repository configured
|
||||||
|
[✓] Enterprise repository disabled
|
||||||
|
[✓] Standalone node
|
||||||
|
[✓] Proxmox validation passed
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔍 Technical Implementation
|
||||||
|
|
||||||
|
### Subscription Nag Removal
|
||||||
|
|
||||||
|
**Web UI (proxmoxlib.js):**
|
||||||
|
```javascript
|
||||||
|
// Patches data.status check
|
||||||
|
// Before: if (!data.status || data.status !== 'active')
|
||||||
|
// After: if (data.status || data.status !== 'NoMoreNagging')
|
||||||
|
```
|
||||||
|
|
||||||
|
**Mobile UI (index.html.tpl):**
|
||||||
|
```javascript
|
||||||
|
// Injects MutationObserver to remove subscription dialogs/cards
|
||||||
|
function removeSubscriptionElements() {
|
||||||
|
// Remove dialogs with subscription text
|
||||||
|
// Remove cards without buttons containing subscription text
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Persistence (apt hook):**
|
||||||
|
```bash
|
||||||
|
# /etc/apt/apt.conf.d/90-pve-no-nag
|
||||||
|
DPkg::Post-Invoke { "/usr/local/bin/pve-remove-nag.sh"; };
|
||||||
|
```
|
||||||
|
|
||||||
|
### HA Detection Logic
|
||||||
|
```bash
|
||||||
|
# Count cluster nodes
|
||||||
|
cluster_nodes=$(pvesh get /nodes --output-format=json | jq -r 'length')
|
||||||
|
|
||||||
|
if [ "$cluster_nodes" -eq 1 ]; then
|
||||||
|
# Single node → disable HA services
|
||||||
|
systemctl disable --now pve-ha-lrm pve-ha-crm corosync
|
||||||
|
else
|
||||||
|
# Multi-node cluster → keep HA enabled
|
||||||
|
echo "HA services left enabled"
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
### Repository Format Detection
|
||||||
|
```bash
|
||||||
|
major_version=$(get_pve_major_version)
|
||||||
|
|
||||||
|
if [ "$major_version" == "9" ]; then
|
||||||
|
# PVE 9.x uses deb822 .sources format
|
||||||
|
cat >/etc/apt/sources.list.d/proxmox.sources <<EOF
|
||||||
|
Types: deb
|
||||||
|
URIs: http://download.proxmox.com/debian/pve
|
||||||
|
Suites: trixie
|
||||||
|
Components: pve-no-subscription
|
||||||
|
EOF
|
||||||
|
else
|
||||||
|
# PVE 8.x uses traditional .list format
|
||||||
|
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
|
||||||
|
> /etc/apt/sources.list.d/pve-no-subscription.list
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📖 Related Documentation
|
||||||
|
|
||||||
|
- **Community Script Source:** [tteck/ProxmoxVE](https://github.com/community-scripts/ProxmoxVE/blob/main/tools/pve/post-pve-install.sh)
|
||||||
|
- **Bootstrap README:** [scripts/README.md](README.md)
|
||||||
|
- **Proxmox Ansible Playbook:** [ansible/archive/playbooks/onboarding/proxmox_host.yml](../../ansible/archive/playbooks/onboarding/proxmox_host.yml)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎓 Lessons Learned
|
||||||
|
|
||||||
|
1. **Idempotency Is Critical:** All operations can be safely re-run without side effects
|
||||||
|
2. **Version Detection Matters:** PVE 8.x vs. 9.x have different repository formats
|
||||||
|
3. **HA Context Awareness:** Single-node setups don't need HA services (wastes resources)
|
||||||
|
4. **Persistence Mechanisms:** Apt hooks ensure UI patches survive widget updates
|
||||||
|
5. **Non-Interactive Design:** Automation requires sensible defaults, not user prompts
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated:** 2026-04-12
|
||||||
|
**Version:** 4.0.0 (aligned with bootstrap.sh)
|
||||||
@ -1,3 +1,498 @@
|
|||||||
|
# Homelab Bootstrap & Utility Scripts
|
||||||
|
|
||||||
|
This directory contains the unified day0 onboarding infrastructure and operational utility scripts for homelab hardware provisioning.
|
||||||
|
|
||||||
|
## 🚀 Quick Start
|
||||||
|
|
||||||
|
**New Hardware Onboarding (Recommended):**
|
||||||
|
```bash
|
||||||
|
# Auto-detect hardware type and configure
|
||||||
|
./bootstrap.sh
|
||||||
|
|
||||||
|
# Preview changes without applying
|
||||||
|
./bootstrap.sh --dry-run
|
||||||
|
|
||||||
|
# Force specific hardware type
|
||||||
|
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
|
||||||
|
```
|
||||||
|
|
||||||
|
**Health Check on Existing Nodes:**
|
||||||
|
```bash
|
||||||
|
# Run comprehensive validation
|
||||||
|
./validate-node.sh
|
||||||
|
|
||||||
|
# JSON output for monitoring
|
||||||
|
./validate-node.sh --json
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📁 Directory Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
scripts/
|
||||||
|
├── bootstrap.sh # ⭐ Unified day0 onboarding (NEW)
|
||||||
|
├── validate-node.sh # ⭐ Standalone health check tool (NEW)
|
||||||
|
├── lib/ # ⭐ Modular libraries (NEW)
|
||||||
|
│ ├── detection.sh # OS/hardware auto-detection
|
||||||
|
│ ├── network.sh # Network configuration & validation
|
||||||
|
│ ├── validation.sh # Comprehensive health checks
|
||||||
|
│ ├── fingerprint.sh # Hardware inventory collection
|
||||||
|
│ └── proxmox.sh # Proxmox VE post-install (comprehensive)
|
||||||
|
├── day0bootstrap.sh # 🔻 DEPRECATED - use bootstrap.sh
|
||||||
|
├── pi_init.sh # 🔻 DEPRECATED - use bootstrap.sh
|
||||||
|
├── onboarding.sh # 🔻 DEPRECATED - use bootstrap.sh
|
||||||
|
└── README.md # This file
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🛠️ Core Tools
|
||||||
|
|
||||||
|
### `bootstrap.sh` — Unified Day0 Onboarding
|
||||||
|
|
||||||
|
**Purpose:** Intelligent initialization for all homelab hardware types. Auto-detects OS, hardware, and applies appropriate configuration.
|
||||||
|
|
||||||
|
**Replaces:** `day0bootstrap.sh`, `pi_init.sh`, `onboarding.sh`
|
||||||
|
|
||||||
|
**Features:**
|
||||||
|
- ✅ Auto-detection (Proxmox, Docker VM, Raspberry Pi, physical servers, AI workstations)
|
||||||
|
- ✅ Static IP configuration via netplan
|
||||||
|
- ✅ Docker & Ansible installation (with Debian Trixie compatibility)
|
||||||
|
- ✅ SSH key generation (ED25519 preferred, RSA fallback)
|
||||||
|
- ✅ Comprehensive validation suite
|
||||||
|
- ✅ Hardware fingerprinting with YAML/JSON output
|
||||||
|
- ✅ Ansible inventory auto-discovery
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
./bootstrap.sh [OPTIONS]
|
||||||
|
|
||||||
|
Options:
|
||||||
|
--help Show detailed help
|
||||||
|
--dry-run Preview actions without changes
|
||||||
|
--hardware-type TYPE Override auto-detection
|
||||||
|
(proxmox|docker-vm|pi|physical-docker|ai-workstation)
|
||||||
|
--skip-network Skip network configuration
|
||||||
|
--skip-validation Skip post-bootstrap validation
|
||||||
|
--output-json Generate JSON output instead of YAML
|
||||||
|
--target-ip IP Set static IP address (default: auto-assigned)
|
||||||
|
--gateway IP Set gateway IP (default: 10.0.0.1)
|
||||||
|
--dns IP Set DNS server (default: 10.0.0.2)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Examples:**
|
||||||
|
```bash
|
||||||
|
# Auto-detect and configure with defaults
|
||||||
|
./bootstrap.sh
|
||||||
|
|
||||||
|
# Preview what would be done (safe to run)
|
||||||
|
./bootstrap.sh --dry-run
|
||||||
|
|
||||||
|
# Force Proxmox mode with custom IP
|
||||||
|
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
|
||||||
|
|
||||||
|
# Skip network config (already configured manually)
|
||||||
|
./bootstrap.sh --skip-network
|
||||||
|
|
||||||
|
# Generate JSON for monitoring integration
|
||||||
|
./bootstrap.sh --output-json
|
||||||
|
```
|
||||||
|
|
||||||
|
**Output Files:**
|
||||||
|
- `../ansible/archive/outputs/hardware-facts-{hostname}-{timestamp}.yml`
|
||||||
|
- `../ansible/archive/outputs/bootstrap-validation-{hostname}-{timestamp}.log`
|
||||||
|
- `../ansible/archive/inventory/discovered-hosts.yml` (auto-discovery)
|
||||||
|
|
||||||
|
**Workflow:**
|
||||||
|
1. **Detection** → Identify OS, hardware type, CPU, GPU
|
||||||
|
2. **Network Config** → Apply static IP via netplan (SSH will disconnect)
|
||||||
|
3. **Package Install** → Docker, Ansible, proxmoxer (as needed)
|
||||||
|
4. **SSH Keys** → Generate/verify ED25519 keys
|
||||||
|
5. **Validation** → Comprehensive health checks (disk, memory, network, etc.)
|
||||||
|
6. **Fingerprinting** → Generate hardware inventory YAML
|
||||||
|
|
||||||
|
**Network Warning:** Network configuration will disconnect SSH. Reconnect to new IP after ~10 seconds.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### `validate-node.sh` — Standalone Health Check Tool
|
||||||
|
|
||||||
|
**Purpose:** Comprehensive validation suite for operational readiness. Can be run post-bootstrap or ad-hoc on any managed node.
|
||||||
|
|
||||||
|
**Features:**
|
||||||
|
- ✅ Disk space & performance checks
|
||||||
|
- ✅ Memory & swap validation
|
||||||
|
- ✅ Network routing & connectivity tests
|
||||||
|
- ✅ NFS client configuration
|
||||||
|
- ✅ Docker daemon health
|
||||||
|
- ✅ Proxmox API accessibility (if applicable)
|
||||||
|
- ✅ SSH security audit
|
||||||
|
- ✅ Time synchronization verification
|
||||||
|
- ✅ JSON output for monitoring integration
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
./validate-node.sh [OPTIONS]
|
||||||
|
|
||||||
|
Options:
|
||||||
|
--help Show detailed help
|
||||||
|
--json Output results in JSON format
|
||||||
|
--critical-only Show only critical errors
|
||||||
|
--verbose Show detailed output for each check
|
||||||
|
```
|
||||||
|
|
||||||
|
**Examples:**
|
||||||
|
```bash
|
||||||
|
# Run all checks with standard output
|
||||||
|
./validate-node.sh
|
||||||
|
|
||||||
|
# Only show critical errors (useful in scripts)
|
||||||
|
./validate-node.sh --critical-only
|
||||||
|
|
||||||
|
# JSON output for monitoring/alerting
|
||||||
|
./validate-node.sh --json | jq '.validation.errors'
|
||||||
|
|
||||||
|
# Verbose mode with detection summary
|
||||||
|
./validate-node.sh --verbose
|
||||||
|
```
|
||||||
|
|
||||||
|
**Exit Codes:**
|
||||||
|
- `0` → All checks passed (or warnings only)
|
||||||
|
- `1` → Critical errors found
|
||||||
|
- `2` → Invalid usage
|
||||||
|
|
||||||
|
**Integration Example:**
|
||||||
|
```bash
|
||||||
|
# Use in deployment pipeline
|
||||||
|
if ./validate-node.sh --critical-only; then
|
||||||
|
echo "Node healthy, proceeding with deployment"
|
||||||
|
else
|
||||||
|
echo "Critical issues found, aborting"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📚 Library Components
|
||||||
|
|
||||||
|
### `lib/detection.sh` — System Detection
|
||||||
|
|
||||||
|
**Functions:**
|
||||||
|
- `detect_os_family()` → debian, ubuntu, raspbian
|
||||||
|
- `detect_os_version()` → trixie, bookworm, noble, etc.
|
||||||
|
- `detect_hardware_type()` → proxmox, docker-vm, pi, physical-docker, ai-workstation
|
||||||
|
- `detect_cpu_vendor()` → intel, amd, arm
|
||||||
|
- `detect_cpu_generation()` → Intel generation number (12, 13, 14, etc.)
|
||||||
|
- `detect_gpu()` → nvidia, amd, intel, none
|
||||||
|
- `detect_primary_interface()` → Network interface name
|
||||||
|
- `get_current_ip()` → Current IPv4 address
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
source lib/detection.sh
|
||||||
|
HW_TYPE=$(detect_hardware_type)
|
||||||
|
echo "Hardware type: $HW_TYPE"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### `lib/network.sh` — Network Configuration
|
||||||
|
|
||||||
|
**Functions:**
|
||||||
|
- `apply_static_ip()` → Configure netplan with static IP
|
||||||
|
- `configure_network_safe()` → Apply network changes with reconnection instructions
|
||||||
|
- `validate_connectivity()` → Test gateway, internet, DNS
|
||||||
|
- `check_nfs_accessibility()` → Test NFS server reachability
|
||||||
|
- `get_desired_vlan_ip()` → Determine VLAN IP based on hardware type (future)
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
source lib/network.sh
|
||||||
|
configure_network_safe "10.0.0.151" "10.0.0.1" "10.0.0.2"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### `lib/validation.sh` — Health Checks
|
||||||
|
|
||||||
|
**Functions:**
|
||||||
|
- `check_disk_space()` → Validate sufficient disk space
|
||||||
|
- `check_memory()` → Validate RAM meets hardware type requirements
|
||||||
|
- `check_network_routes()` → Validate routing configuration
|
||||||
|
- `check_docker_daemon()` → Validate Docker installation and health
|
||||||
|
- `check_proxmox_api()` → Validate Proxmox VE (if applicable)
|
||||||
|
- `run_validation_suite()` → Run all checks and return summary
|
||||||
|
|
||||||
|
**Severity Levels:**
|
||||||
|
- **Critical** → Blocks Ansible playbook execution
|
||||||
|
- **Warning** → Manual review recommended
|
||||||
|
- **Info** → Logged for reference
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### `lib/fingerprint.sh` — Hardware Inventory
|
||||||
|
|
||||||
|
**Functions:**
|
||||||
|
- `collect_cpu_facts()` → CPU model, cores, frequency
|
||||||
|
- `collect_memory_facts()` → RAM and swap details
|
||||||
|
- `collect_disk_facts()` → Storage capacity and type (SSD/HDD)
|
||||||
|
- `collect_network_facts()` → Network interfaces and connectivity
|
||||||
|
- `generate_hardware_facts_yaml()` → Full YAML output
|
||||||
|
- `generate_ansible_inventory_snippet()` → Ansible inventory entry
|
||||||
|
|
||||||
|
**Output Format (YAML):**
|
||||||
|
```yaml
|
||||||
|
hostname:
|
||||||
|
os:
|
||||||
|
family: " "bookworm"
|
||||||
|
cpu:
|
||||||
|
model: "Intel Core i7-12700"
|
||||||
|
cores: 12
|
||||||
|
memory:
|
||||||
|
total_gb: 32
|
||||||
|
storage:
|
||||||
|
root:
|
||||||
|
total_gb: 500
|
||||||
|
type: "ssd"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### `lib/proxmox.sh` — Proxmox VE Post-Install
|
||||||
|
|
||||||
|
**Comprehensive Proxmox configuration (inspired by community-scripts/ProxmoxVE):**
|
||||||
|
|
||||||
|
**Repository Management:**
|
||||||
|
- `configure_all_pve_repos()` → Complete repository configuration
|
||||||
|
- `disable_pve_enterprise_repo()` → Disable subscription-required repository
|
||||||
|
- `enable_pve_no_subscription_repo()` → Enable free updates repository
|
||||||
|
- `configure_ceph_repos()` → Configure Ceph storage repositories
|
||||||
|
- `add_pvetest_repo()` → Add beta/test repository (disabled by default)
|
||||||
|
|
||||||
|
**Subscription Nag Removal:**
|
||||||
|
- `disable_subscription_nag()` → Remove subscription warnings from UI
|
||||||
|
- Patches web UI JavaScript (proxmoxlib.js)
|
||||||
|
- Patches mobile UI templates
|
||||||
|
- Creates apt hook to persist after updates
|
||||||
|
|
||||||
|
**High Availability Management:**
|
||||||
|
- `disable_ha_services()` → Disable HA for single-node setups (saves resources)
|
||||||
|
- `enable_ha_services()` → Enable HA for clustered environments
|
||||||
|
- `check_ha_status()` → Check current HA service status
|
||||||
|
|
||||||
|
**System Maintenance:**
|
||||||
|
- `update_pve_system()` → Run full system update (apt dist-upgrade)
|
||||||
|
- `check_reboot_required()` → Check if reboot needed after updates
|
||||||
|
|
||||||
|
**Comprehensive Routine:**
|
||||||
|
- `run_proxmox_post_install()` → **All-in-one post-install configuration**
|
||||||
|
- Repository fixes (PVE 8.x and 9.x compatible)
|
||||||
|
- Subscription nag removal
|
||||||
|
- Auto-detect single vs. cluster and adjust HA accordingly
|
||||||
|
- System update checks
|
||||||
|
- Reboot requirement detection
|
||||||
|
|
||||||
|
**Validation:**
|
||||||
|
- `validate_proxmox_config()` → Verify Proxmox configuration
|
||||||
|
- `get_pve_version()` → Get Proxmox VE version
|
||||||
|
- `is_pve_supported_version()` → Check if version is supported (8.0-8.9.x, 9.0-9.1.x)
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
source lib/proxmox.sh
|
||||||
|
|
||||||
|
# Run complete post-install (auto mode)
|
||||||
|
run_proxmox_post_install "auto"
|
||||||
|
|
||||||
|
# Individual functions
|
||||||
|
disable_subscription_nag
|
||||||
|
disable_ha_services # For single-node setups
|
||||||
|
validate_proxmox_config
|
||||||
|
```
|
||||||
|
|
||||||
|
**What It Does (Compared to Community Script):**
|
||||||
|
- ✅ Repository management (enterprise/no-subscription/Ceph/pvetest)
|
||||||
|
- ✅ Subscription nag removal (web + mobile UI)
|
||||||
|
- ✅ HA service management (auto-detect single vs. cluster)
|
||||||
|
- ✅ Supports PVE 8.x (.list format) and 9.x (.sources deb822 format)
|
||||||
|
- ✅ System update and reboot detection
|
||||||
|
- ✅ Automated (non-interactive) for use in bootstrap.sh
|
||||||
|
|
||||||
|
**Integration:**
|
||||||
|
When `bootstrap.sh` detects Proxmox hardware, it automatically runs `run_proxmox_post_install "auto"` after package installation, providing the same comprehensive post-install configuration as the community tteck/ProxmoxVE script. type: "ssd"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔻 Deprecated Scripts
|
||||||
|
|
||||||
|
**These scripts are deprecated and will be removed in a future release. Use `bootstrap.sh` instead.**
|
||||||
|
|
||||||
|
### `day0bootstrap.sh` → `bootstrap.sh --hardware-type pi`
|
||||||
|
**Original Purpose:** Debian Trixie bootstrap for Raspberry Pi
|
||||||
|
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
|
||||||
|
|
||||||
|
### `pi_init.sh` → `bootstrap.sh --hardware-type pi`
|
||||||
|
**Original Purpose:** Raspberry Pi initialization
|
||||||
|
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
|
||||||
|
|
||||||
|
### `onboarding.sh` → `bootstrap.sh`
|
||||||
|
**Original Purpose:** Generic onboarding + Proxmox integration
|
||||||
|
**Status:** Wrapper redirects to bootstrap.sh with deprecation warning
|
||||||
|
|
||||||
|
**Migration Path:**
|
||||||
|
All legacy scripts now display a deprecation warning and automatically redirect to `bootstrap.sh` with appropriate flags. You have 6 months to update your workflows before these wrappers are removed.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎯 Common Workflows
|
||||||
|
|
||||||
|
### New Raspberry Pi Control Node
|
||||||
|
```bash
|
||||||
|
# 1. Transfer scripts to Pi
|
||||||
|
scp -r scripts/ chester@10.0.0.200:~/
|
||||||
|
|
||||||
|
# 2. SSH to Pi
|
||||||
|
ssh chester@10.0.0.200
|
||||||
|
|
||||||
|
# 3. Run bootstrap
|
||||||
|
cd scripts
|
||||||
|
./bootstrap.sh # Auto-detects Pi hardware
|
||||||
|
|
||||||
|
# 4. SSH will disconnect - reconnect to new IP
|
||||||
|
ssh chester@10.0.0.200
|
||||||
|
```
|
||||||
|
|
||||||
|
### New Proxmox Host
|
||||||
|
```bash
|
||||||
|
# On Proxmox console (pre-SSH setup)
|
||||||
|
curl -O https://git.castaldifamily.com/nathan/homelab/raw/branch/main/scripts/bootstrap.sh
|
||||||
|
chmod +x bootstrap.sh
|
||||||
|
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
|
||||||
|
```
|
||||||
|
|
||||||
|
### New Docker Swarm VM
|
||||||
|
```bash
|
||||||
|
# From control node (Watchtower)
|
||||||
|
scp -r scripts/ chester@10.0.0.211:~/
|
||||||
|
ssh chester@10.0.0.211
|
||||||
|
cd scripts
|
||||||
|
./bootstrap.sh --hardware-type docker-vm --target-ip 10.0.0.211
|
||||||
|
```
|
||||||
|
|
||||||
|
### Health Check All Nodes (from Control Node)
|
||||||
|
```bash
|
||||||
|
# Create validation script
|
||||||
|
cat > validate-all.sh <<'EOF'
|
||||||
|
#!/bin/bash
|
||||||
|
for host in heimdall waldorf watchtower; do
|
||||||
|
echo "=== $host ==="
|
||||||
|
ssh $host "~/scripts/validate-node.sh --critical-only"
|
||||||
|
done
|
||||||
|
EOF
|
||||||
|
|
||||||
|
chmod +x validate-all.sh
|
||||||
|
./validate-all.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🧪 Testing & Development
|
||||||
|
|
||||||
|
### Test Bootstrap in Dry-Run Mode
|
||||||
|
```bash
|
||||||
|
# Safe to run on production nodes
|
||||||
|
./bootstrap.sh --dry-run --hardware-type docker-vm
|
||||||
|
```
|
||||||
|
|
||||||
|
### Manually Test Libraries
|
||||||
|
```bash
|
||||||
|
# Source library and test functions
|
||||||
|
source lib/detection.sh
|
||||||
|
print_detection_summary
|
||||||
|
|
||||||
|
source lib/validation.sh
|
||||||
|
run_validation_suite
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validate Script Syntax
|
||||||
|
```bash
|
||||||
|
# Check for syntax errors
|
||||||
|
bash -n bootstrap.sh
|
||||||
|
shellcheck bootstrap.sh lib/*.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📖 Documentation
|
||||||
|
|
||||||
|
**Primary References:**
|
||||||
|
- [SOP-002: Initial Infrastructure Deployment](../documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md)
|
||||||
|
- [Technical Runbook](../documentation/TECHNICAL_RUNBOOK.md)
|
||||||
|
- [Environment Constraints](../ansible/archive/documentation/standards/environment-constraints.md)
|
||||||
|
|
||||||
|
**Related Ansible Playbooks:**
|
||||||
|
- [generic_host.yml](../ansible/archive/playbooks/onboarding/generic_host.yml) — Second-stage onboarding
|
||||||
|
- [proxmox_host.yml](../ansible/archive/playbooks/onboarding/proxmox_host.yml) — Proxmox-specific config
|
||||||
|
- [gather_hardware_facts.yml](../ansible/archive/playbooks/preflight/gather_hardware_facts.yml) — Ansible fact collection
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🐛 Troubleshooting
|
||||||
|
|
||||||
|
**Bootstrap fails with "could not detect primary interface":**
|
||||||
|
- Check network cable/Wi-Fi connection
|
||||||
|
- Run manually: `ip -o link show` to see interfaces
|
||||||
|
- Override: `INTERFACE=eth0 ./bootstrap.sh`
|
||||||
|
|
||||||
|
**SSH disconnects and doesn't reconnect:**
|
||||||
|
- Wait 30 seconds and retry
|
||||||
|
- Check console access if available
|
||||||
|
- Verify IP in router DHCP leases
|
||||||
|
|
||||||
|
**Validation reports critical errors:**
|
||||||
|
- Review log: `cat ../ansible/archive/outputs/bootstrap-validation-*.log`
|
||||||
|
- Address critical issues (disk space, memory, etc.)
|
||||||
|
- Re-run validation: `./validate-node.sh`
|
||||||
|
|
||||||
|
**Docker installation fails on Debian Trixie:**
|
||||||
|
- Script automatically uses Bookworm repos as fallback
|
||||||
|
- Check logs: `journalctl -u docker`
|
||||||
|
|
||||||
|
**Hardware fingerprinting shows "unknown" values:**
|
||||||
|
- Some VMs may not expose full hardware info
|
||||||
|
- Run `sudo dmidecode --type system` for manual inspection
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔮 Future Enhancements
|
||||||
|
|
||||||
|
**Planned (Not Yet Implemented):**
|
||||||
|
- [ ] PXE boot + cloud-init provisioning (eliminate manual OS install)
|
||||||
|
- [ ] Active VLAN migration logic (10.0.0.x → 10.0.10.x/10.0.200.x)
|
||||||
|
- [ ] Proxmox VM auto-provisioning via API
|
||||||
|
- [ ] Node retirement/decommissioning automation
|
||||||
|
- [ ] Integration with Prometheus node exporter metrics
|
||||||
|
- [ ] Containerized control node for disaster recovery
|
||||||
|
|
||||||
|
**VLAN Placeholder:**
|
||||||
|
The `get_desired_vlan_ip()` function contains placeholders for future VLAN segmentation. Current implementation uses flat 10.0.0.x network. See comments in `lib/network.sh` for migration plan.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📝 Version History
|
||||||
|
|
||||||
|
- **v4.0.0** (2026-04-12): Initial unified bootstrap release
|
||||||
|
- Consolidated 3 legacy scripts into single entry point
|
||||||
|
- Added modular library architecture
|
||||||
|
- Comprehensive validation and fingerprinting
|
||||||
|
- Auto-discovery with Ansible inventory generation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Questions?** See [documentation/TECHNICAL_RUNBOOK.md](../documentation/TECHNICAL_RUNBOOK.md) or review session memory in `/memories/session/plan.md`.
|
||||||
# scripts
|
# scripts
|
||||||
|
|
||||||
Automation utilities and helper scripts for homelab infrastructure management.
|
Automation utilities and helper scripts for homelab infrastructure management.
|
||||||
|
|||||||
488
scripts/bootstrap.sh
Normal file
488
scripts/bootstrap.sh
Normal file
@ -0,0 +1,488 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# UNIFIED HOMELAB BOOTSTRAP SCRIPT
|
||||||
|
# ==============================================================================
|
||||||
|
# Intelligent day0 onboarding for all homelab hardware types
|
||||||
|
# Replaces: day0bootstrap.sh, pi_init.sh, onboarding.sh
|
||||||
|
#
|
||||||
|
# Usage:
|
||||||
|
# ./bootstrap.sh [OPTIONS]
|
||||||
|
#
|
||||||
|
# Options:
|
||||||
|
# --help Show this help message
|
||||||
|
# --dry-run Show what would be done without making changes
|
||||||
|
# --hardware-type TYPE Override auto-detection (proxmox|docker-vm|pi|physical-docker|ai-workstation)
|
||||||
|
# --skip-network Skip network configuration
|
||||||
|
# --skip-validation Skip post-bootstrap validation
|
||||||
|
# --output-json Generate JSON output instead of YAML
|
||||||
|
# --target-ip IP Target static IP address (default: auto-assigned)
|
||||||
|
#
|
||||||
|
# Examples:
|
||||||
|
# ./bootstrap.sh # Auto-detect and configure
|
||||||
|
# ./bootstrap.sh --dry-run # Preview actions
|
||||||
|
# ./bootstrap.sh --hardware-type proxmox # Force Proxmox mode
|
||||||
|
# ./bootstrap.sh --target-ip 10.0.0.205 # Custom IP address
|
||||||
|
#
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# --- SCRIPT METADATA ---
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
|
||||||
|
VERSION="4.0.0"
|
||||||
|
TIMESTAMP=$(date +%Y%m%dT%H%M%S)
|
||||||
|
|
||||||
|
# --- LOAD LIBRARIES ---
|
||||||
|
|
||||||
|
# shellcheck source=./lib/detection.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/detection.sh"
|
||||||
|
# shellcheck source=./lib/network.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/network.sh"
|
||||||
|
# shellcheck source=./lib/validation.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/validation.sh"
|
||||||
|
# shellcheck source=./lib/fingerprint.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/fingerprint.sh"
|
||||||
|
# shellcheck source=./lib/proxmox.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/proxmox.sh"
|
||||||
|
|
||||||
|
# --- COMMAND LINE ARGUMENTS ---
|
||||||
|
|
||||||
|
SHOW_HELP=false
|
||||||
|
DRY_RUN=false
|
||||||
|
HARDWARE_TYPE="auto"
|
||||||
|
SKIP_NETWORK=false
|
||||||
|
SKIP_VALIDATION=false
|
||||||
|
OUTPUT_JSON=false
|
||||||
|
TARGET_IP=""
|
||||||
|
GATEWAY="10.0.0.1"
|
||||||
|
DNS_SERVER="10.0.0.2"
|
||||||
|
|
||||||
|
parse_arguments() {
|
||||||
|
while [[ $# -gt 0 ]]; do
|
||||||
|
case "$1" in
|
||||||
|
--help|-h)
|
||||||
|
SHOW_HELP=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--dry-run)
|
||||||
|
DRY_RUN=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--hardware-type)
|
||||||
|
HARDWARE_TYPE="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--skip-network)
|
||||||
|
SKIP_NETWORK=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--skip-validation)
|
||||||
|
SKIP_VALIDATION=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--output-json)
|
||||||
|
OUTPUT_JSON=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--target-ip)
|
||||||
|
TARGET_IP="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--gateway)
|
||||||
|
GATEWAY="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--dns)
|
||||||
|
DNS_SERVER="$2"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "ERROR: Unknown option: $1" >&2
|
||||||
|
echo "Use --help for usage information" >&2
|
||||||
|
exit 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
done
|
||||||
|
}
|
||||||
|
|
||||||
|
show_help() {
|
||||||
|
cat <<'EOF'
|
||||||
|
=============================================================================
|
||||||
|
UNIFIED HOMELAB BOOTSTRAP SCRIPT v4.0.0
|
||||||
|
=============================================================================
|
||||||
|
|
||||||
|
Intelligent day0 onboarding for all homelab hardware types.
|
||||||
|
Auto-detects OS, hardware type, and applies appropriate configuration.
|
||||||
|
|
||||||
|
USAGE:
|
||||||
|
./bootstrap.sh [OPTIONS]
|
||||||
|
|
||||||
|
OPTIONS:
|
||||||
|
--help Show this help message
|
||||||
|
--dry-run Preview actions without making changes
|
||||||
|
--hardware-type TYPE Override auto-detection
|
||||||
|
Types: proxmox, docker-vm, pi, physical-docker, ai-workstation
|
||||||
|
--skip-network Skip network configuration (use if already configured)
|
||||||
|
--skip-validation Skip post-bootstrap validation checks
|
||||||
|
--output-json Generate JSON output instead of YAML
|
||||||
|
--target-ip IP Set static IP address (default: auto-assigned)
|
||||||
|
--gateway IP Set gateway IP (default: 10.0.0.1)
|
||||||
|
--dns IP Set DNS server (default: 10.0.0.2)
|
||||||
|
|
||||||
|
EXAMPLES:
|
||||||
|
./bootstrap.sh
|
||||||
|
Auto-detect hardware and configure with defaults
|
||||||
|
|
||||||
|
./bootstrap.sh --dry-run
|
||||||
|
Preview what would be done without making changes
|
||||||
|
|
||||||
|
./bootstrap.sh --hardware-type proxmox --target-ip 10.0.0.201
|
||||||
|
Force Proxmox mode and set specific IP
|
||||||
|
|
||||||
|
./bootstrap.sh --skip-network
|
||||||
|
Skip network configuration (already configured manually)
|
||||||
|
|
||||||
|
WORKFLOW:
|
||||||
|
1. System Detection → Identify OS, hardware type, CPU, GPU
|
||||||
|
2. Network Config → Apply static IP via netplan (skippable)
|
||||||
|
3. Package Install → Docker, Ansible, proxmoxer (as needed)
|
||||||
|
4. SSH Keys → Generate/verify ED25519 keys
|
||||||
|
5. Validation → Comprehensive health checks
|
||||||
|
6. Fingerprinting → Generate hardware inventory YAML
|
||||||
|
|
||||||
|
OUTPUT FILES:
|
||||||
|
ansible/archive/outputs/bootstrap-validation-{hostname}-{timestamp}.log
|
||||||
|
ansible/archive/outputs/hardware-facts-{hostname}-{timestamp}.yml
|
||||||
|
ansible/archive/inventory/discovered-hosts.yml (auto-discovery)
|
||||||
|
|
||||||
|
NOTES:
|
||||||
|
- Network configuration will disconnect SSH (reconnect to new IP)
|
||||||
|
- Run from console or plan for reconnection
|
||||||
|
- Safe to re-run (idempotent where possible)
|
||||||
|
- Logs saved even if script interrupted
|
||||||
|
|
||||||
|
=============================================================================
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- MAIN WORKFLOW ---
|
||||||
|
|
||||||
|
main() {
|
||||||
|
# Parse CLI arguments
|
||||||
|
parse_arguments "$@"
|
||||||
|
|
||||||
|
if [ "$SHOW_HELP" == "true" ]; then
|
||||||
|
show_help
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# === PHASE 1: DETECTION ===
|
||||||
|
|
||||||
|
echo "======================================="
|
||||||
|
echo "HOMELAB BOOTSTRAP v${VERSION}"
|
||||||
|
echo "Timestamp: $(date -u +"%Y-%m-%d %H:%M:%S UTC")"
|
||||||
|
echo "======================================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
echo "[PHASE 1] System Detection"
|
||||||
|
echo "---"
|
||||||
|
|
||||||
|
# Detect hardware type
|
||||||
|
if [ "$HARDWARE_TYPE" == "auto" ]; then
|
||||||
|
HARDWARE_TYPE=$(detect_hardware_type)
|
||||||
|
echo " Auto-detected hardware type: $HARDWARE_TYPE"
|
||||||
|
else
|
||||||
|
echo " Hardware type (forced): $HARDWARE_TYPE"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Print detection summary
|
||||||
|
print_detection_summary
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Determine target IP if not specified
|
||||||
|
if [ -z "$TARGET_IP" ]; then
|
||||||
|
TARGET_IP=$(get_desired_vlan_ip)
|
||||||
|
echo " Auto-assigned target IP: $TARGET_IP"
|
||||||
|
echo " (Based on hardware type and environment-constraints.md)"
|
||||||
|
else
|
||||||
|
echo " Target IP (user-specified): $TARGET_IP"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Dry-run check
|
||||||
|
if [ "$DRY_RUN" == "true" ]; then
|
||||||
|
echo "[DRY-RUN MODE] Would perform the following actions:"
|
||||||
|
echo ""
|
||||||
|
echo " 1. Configure network: $TARGET_IP (gateway: $GATEWAY, DNS: $DNS_SERVER)"
|
||||||
|
echo " 2. Install Docker (with Debian Trixie workaround if needed)"
|
||||||
|
echo " 3. Install Ansible"
|
||||||
|
[ "$HARDWARE_TYPE" == "proxmox" ] && echo " 4. Apply Proxmox repository fixes"
|
||||||
|
[ "$HARDWARE_TYPE" == "proxmox" ] && echo " 5. Install proxmoxer Python library"
|
||||||
|
echo " 6. Generate/verify SSH keys (ED25519)"
|
||||||
|
echo " 7. Run validation suite"
|
||||||
|
echo " 8. Generate hardware fingerprint"
|
||||||
|
echo ""
|
||||||
|
echo "No changes made. Re-run without --dry-run to execute."
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# === PHASE 2: NETWORK CONFIGURATION ===
|
||||||
|
|
||||||
|
if [ "$SKIP_NETWORK" == "false" ]; then
|
||||||
|
echo "[PHASE 2] Network Configuration"
|
||||||
|
echo "---"
|
||||||
|
configure_network_safe "$TARGET_IP" "$GATEWAY" "$DNS_SERVER"
|
||||||
|
|
||||||
|
# Wait for network to stabilize
|
||||||
|
sleep 3
|
||||||
|
wait_for_network 15
|
||||||
|
echo ""
|
||||||
|
else
|
||||||
|
echo "[PHASE 2] Network Configuration (SKIPPED)"
|
||||||
|
echo ""
|
||||||
|
fi
|
||||||
|
|
||||||
|
# === PHASE 3: PACKAGE INSTALLATION ===
|
||||||
|
|
||||||
|
echo "[PHASE 3] Package Installation"
|
||||||
|
echo "---"
|
||||||
|
|
||||||
|
# Update package lists
|
||||||
|
echo " [⚙] Updating package lists..."
|
||||||
|
sudo apt-get update -qq
|
||||||
|
|
||||||
|
# Install prerequisites
|
||||||
|
echo " [⚙] Installing prerequisites (ca-certificates, curl, gnupg)..."
|
||||||
|
sudo apt-get install -y -qq ca-certificates curl gnupg lsb-release
|
||||||
|
|
||||||
|
# --- DOCKER INSTALLATION ---
|
||||||
|
|
||||||
|
if ! command -v docker &>/dev/null; then
|
||||||
|
echo " [⚙] Installing Docker..."
|
||||||
|
|
||||||
|
# Remove existing Docker repo configs
|
||||||
|
sudo rm -f /etc/apt/sources.list.d/docker.list
|
||||||
|
sudo rm -f /etc/apt/sources.list.d/docker*.list
|
||||||
|
|
||||||
|
# Add Docker GPG key
|
||||||
|
sudo mkdir -p /etc/apt/keyrings
|
||||||
|
curl -fsSL https://download.docker.com/linux/debian/gpg | \
|
||||||
|
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg --yes
|
||||||
|
|
||||||
|
# Determine repository codename
|
||||||
|
local os_version=$(detect_os_version)
|
||||||
|
local repo_codename="$os_version"
|
||||||
|
|
||||||
|
# Debian Trixie workaround (use Bookworm repos)
|
||||||
|
if is_debian_trixie; then
|
||||||
|
echo " [!] Debian Trixie detected - using Bookworm repos for Docker"
|
||||||
|
repo_codename="bookworm"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Add Docker repository
|
||||||
|
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian $repo_codename stable" | \
|
||||||
|
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
|
||||||
|
|
||||||
|
# Install Docker
|
||||||
|
sudo apt-get update -qq
|
||||||
|
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
|
||||||
|
|
||||||
|
# Add current user to docker group
|
||||||
|
sudo usermod -aG docker "$USER"
|
||||||
|
|
||||||
|
echo " [✓] Docker installed: $(docker --version)"
|
||||||
|
else
|
||||||
|
echo " [✓] Docker already installed: $(docker --version)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# --- ANSIBLE INSTALLATION ---
|
||||||
|
|
||||||
|
if ! command -v ansible &>/dev/null; then
|
||||||
|
echo " [⚙] Installing Ansible..."
|
||||||
|
sudo apt-get install -y ansible
|
||||||
|
echo " [✓] Ansible installed: $(ansible --version | head -n1)"
|
||||||
|
else
|
||||||
|
echo " [✓] Ansible already installed: $(ansible --version | head -n1)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# --- PROXMOX-SPECIFIC PACKAGES ---
|
||||||
|
|
||||||
|
if [ "$HARDWARE_TYPE" == "proxmox" ]; then
|
||||||
|
echo " [⚙] Installing Proxmox-specific packages..."
|
||||||
|
|
||||||
|
# Install Python pip if needed
|
||||||
|
if ! command -v pip3 &>/dev/null; then
|
||||||
|
sudo apt-get install -y python3-pip
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Install proxmoxer
|
||||||
|
if ! python3 -c "import proxmoxer" 2>/dev/null; then
|
||||||
|
echo " [⚙] Installing proxmoxer Python library..."
|
||||||
|
pip3 install proxmoxer --break-system-packages 2>/dev/null || pip3 install proxmoxer
|
||||||
|
echo " [✓] proxmoxer installed"
|
||||||
|
else
|
||||||
|
echo " [✓] proxmoxer already installed"
|
||||||
|
fiInstall jq for Proxmox post-install tasks
|
||||||
|
if ! command -v jq &>/dev/null; then
|
||||||
|
sudo apt-get install -y jq
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo "=== PROXMOX POST-INSTALL CONFIGURATION ==="
|
||||||
|
|
||||||
|
# Run comprehensive Proxmox post-install routine
|
||||||
|
# This includes: repository fixes, subscription nag removal, HA management
|
||||||
|
run_proxmox_post_install "auto"
|
||||||
|
|
||||||
|
echo "=========================================="
|
||||||
|
echo "" echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" | \
|
||||||
|
sudo tee /etc/apt/sources.list.d/pve-no-subscription.list > /dev/null
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# --- UTILITY PACKAGES ---
|
||||||
|
|
||||||
|
echo " [⚙] Installing utility packages..."
|
||||||
|
sudo apt-get install -y -qq git vim htop curl wget nfs-common net-tools dnsutils
|
||||||
|
|
||||||
|
echo " [✓] Package installation complete"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# === PHASE 4: SSH KEY MANAGEMENT ===
|
||||||
|
|
||||||
|
echo "[PHASE 4] SSH Key Management"
|
||||||
|
echo "---"
|
||||||
|
|
||||||
|
local ssh_key_path=""
|
||||||
|
|
||||||
|
# Check for existing keys (prefer ED25519)
|
||||||
|
if [ -f "$HOME/.ssh/id_ed25519" ]; then
|
||||||
|
ssh_key_path="$HOME/.ssh/id_ed25519"
|
||||||
|
echo " [✓] Found existing ED25519 key: $ssh_key_path"
|
||||||
|
elif [ -f "$HOME/.ssh/id_rsa" ]; then
|
||||||
|
ssh_key_path="$HOME/.ssh/id_rsa"
|
||||||
|
echo " [✓] Found existing RSA key: $ssh_key_path"
|
||||||
|
else
|
||||||
|
# Generate new ED25519 key
|
||||||
|
ssh_key_path="$HOME/.ssh/id_ed25519"
|
||||||
|
echo " [⚙] Generating new ED25519 key pair..."
|
||||||
|
ssh-keygen -t ed25519 -f "$ssh_key_path" -N "" -C "$(whoami)@$(hostname)-$(date +%Y%m%d)"
|
||||||
|
echo " [✓] SSH key generated: $ssh_key_path"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Display public key
|
||||||
|
echo ""
|
||||||
|
echo " Public key (add to authorized_keys on managed nodes):"
|
||||||
|
echo " ---"
|
||||||
|
cat "${ssh_key_path}.pub"
|
||||||
|
echo " ---"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# === PHASE 5: VALIDATION ===
|
||||||
|
|
||||||
|
if [ "$SKIP_VALIDATION" == "false" ]; then
|
||||||
|
echo "[PHASE 5] System Validation"
|
||||||
|
echo "---"
|
||||||
|
|
||||||
|
# Run comprehensive validation
|
||||||
|
if run_validation_suite; then
|
||||||
|
echo ""
|
||||||
|
echo " [✓] All validation checks passed"
|
||||||
|
else
|
||||||
|
echo ""
|
||||||
|
echo " [!] Validation completed with errors - review above"
|
||||||
|
echo " [!] You may proceed, but manual intervention may be required"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Save validation log
|
||||||
|
local log_dir="${SCRIPT_DIR}/../ansible/archive/outputs"
|
||||||
|
mkdir -p "$log_dir"
|
||||||
|
local log_file="${log_dir}/bootstrap-validation-$(hostname)-${TIMESTAMP}.log"
|
||||||
|
|
||||||
|
# Re-run validation to capture log
|
||||||
|
run_validation_suite 2>&1 | tee "$log_file" >/dev/null
|
||||||
|
echo " [✓] Validation log saved: $log_file"
|
||||||
|
echo ""
|
||||||
|
else
|
||||||
|
echo "[PHASE 5] System Validation (SKIPPED)"
|
||||||
|
echo ""
|
||||||
|
fi
|
||||||
|
|
||||||
|
# === PHASE 6: HARDWARE FINGERPRINTING ===
|
||||||
|
|
||||||
|
echo "[PHASE 6] Hardware Fingerprinting"
|
||||||
|
echo "---"
|
||||||
|
|
||||||
|
# Print summary
|
||||||
|
print_hardware_summary
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Validate against standards
|
||||||
|
echo " Checking against environment-constraints.md standards..."
|
||||||
|
validate_against_standards || true
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Save hardware facts
|
||||||
|
local facts_dir="${SCRIPT_DIR}/../ansible/archive/outputs"
|
||||||
|
local facts_file=$(save_hardware_facts "$facts_dir")
|
||||||
|
echo " [✓] Hardware facts saved: $facts_file"
|
||||||
|
|
||||||
|
# Save JSON if requested
|
||||||
|
if [ "$OUTPUT_JSON" == "true" ]; then
|
||||||
|
local json_file="${facts_dir}/hardware-facts-$(hostname)-${TIMESTAMP}.json"
|
||||||
|
generate_json_output > "$json_file"
|
||||||
|
echo " [✓] JSON output saved: $json_file"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Generate inventory snippet
|
||||||
|
local inventory_dir="${SCRIPT_DIR}/../ansible/archive/inventory"
|
||||||
|
mkdir -p "$inventory_dir"
|
||||||
|
local inventory_file=$(append_to_discovered_inventory "${inventory_dir}/discovered-hosts.yml")
|
||||||
|
echo " [✓] Inventory snippet appended: $inventory_file"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# === COMPLETION ===
|
||||||
|
|
||||||
|
echo "======================================="
|
||||||
|
echo "BOOTSTRAP COMPLETE"
|
||||||
|
echo "======================================="
|
||||||
|
echo ""
|
||||||
|
echo "Summary:"
|
||||||
|
echo " Hostname: $(hostname)"
|
||||||
|
echo " IP Address: $(get_current_ip)"
|
||||||
|
echo " Hardware Type: $HARDWARE_TYPE"
|
||||||
|
echo " OS: $(detect_os_family) $(detect_os_version)"
|
||||||
|
echo ""
|
||||||
|
echo "Next Steps:"
|
||||||
|
echo " 1. Reconnect SSH if network was reconfigured: ssh $(whoami)@$(get_current_ip)"
|
||||||
|
echo " 2. Verify Docker: docker ps"
|
||||||
|
echo " 3. Verify Ansible: ansible --version"
|
||||||
|
echo " 4. Review hardware facts: cat $facts_file"
|
||||||
|
echo " 5. Run Ansible playbook: ansible-playbook playbooks/onboarding/generic_host.yml"
|
||||||
|
echo ""
|
||||||
|
echo "Files Generated:"
|
||||||
|
echo " - $facts_file"
|
||||||
|
[ "$OUTPUT_JSON" == "true" ] && echo " - ${facts_dir}/hardware-facts-$(hostname)-${TIMESTAMP}.json"
|
||||||
|
[ "$SKIP_VALIDATION" == "false" ] && echo " - ${log_dir}/bootstrap-validation-$(hostname)-${TIMESTAMP}.log"
|
||||||
|
echo " - $inventory_file"
|
||||||
|
echo ""
|
||||||
|
echo "Documentation:"
|
||||||
|
echo " - SOP: documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md"
|
||||||
|
echo " - Technical Runbook: documentation/TECHNICAL_RUNBOOK.md"
|
||||||
|
echo ""
|
||||||
|
echo "Have a great day! 🚀"
|
||||||
|
echo ""
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- ENTRY POINT ---
|
||||||
|
|
||||||
|
# Trap errors
|
||||||
|
trap 'echo "ERROR: Bootstrap failed at line $LINENO. Check logs for details." >&2' ERR
|
||||||
|
|
||||||
|
# Run main workflow
|
||||||
|
main "$@"
|
||||||
@ -1,11 +1,42 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
|
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
# DEBIAN TRIXIE BOOTSTRAP: IP, DOCKER, ANSIBLE
|
# DEPRECATED: day0bootstrap.sh
|
||||||
|
# ==============================================================================
|
||||||
|
# ⚠️ DEPRECATION NOTICE
|
||||||
|
# This script is deprecated and will be removed in a future release.
|
||||||
|
# Please use the unified bootstrap.sh script instead:
|
||||||
|
#
|
||||||
|
# ./bootstrap.sh --hardware-type pi
|
||||||
|
#
|
||||||
|
# This wrapper will redirect to bootstrap.sh with appropriate flags.
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
|
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Show deprecation warning
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "⚠️ DEPRECATION WARNING" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "day0bootstrap.sh is deprecated!" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Please use: ./bootstrap.sh" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Redirecting to bootstrap.sh in 5 seconds..." >&2
|
||||||
|
echo "Press Ctrl+C to cancel" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
sleep 5
|
||||||
|
|
||||||
|
# Redirect to unified bootstrap
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
exec "${SCRIPT_DIR}/bootstrap.sh" --hardware-type pi "$@"
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# LEGACY CODE BELOW (no longer executed)
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
exit 0
|
||||||
|
|
||||||
# --- 1. SET STATIC IP (Netplan) ---
|
# --- 1. SET STATIC IP (Netplan) ---
|
||||||
echo "[⚙] Configuring Static IP to 10.0.0.200..."
|
echo "[⚙] Configuring Static IP to 10.0.0.200..."
|
||||||
|
|
||||||
|
|||||||
362
scripts/lib/detection.sh
Normal file
362
scripts/lib/detection.sh
Normal file
@ -0,0 +1,362 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# DETECTION LIBRARY: OS, Hardware, and Environment Detection
|
||||||
|
# ==============================================================================
|
||||||
|
# Part of unified bootstrap system for homelab infrastructure
|
||||||
|
# Provides functions to identify OS family, hardware type, CPU, GPU, and
|
||||||
|
# deployment environment for intelligent provisioning decisions.
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
# --- OS DETECTION ---
|
||||||
|
|
||||||
|
detect_os_family() {
|
||||||
|
# Detect OS family: debian, ubuntu, raspbian, or unknown
|
||||||
|
# Returns lowercase OS identifier
|
||||||
|
|
||||||
|
if [ -f /etc/os-release ]; then
|
||||||
|
. /etc/os-release
|
||||||
|
case "${ID,,}" in
|
||||||
|
debian)
|
||||||
|
echo "debian"
|
||||||
|
return 0
|
||||||
|
;;
|
||||||
|
ubuntu)
|
||||||
|
echo "ubuntu"
|
||||||
|
return 0
|
||||||
|
;;
|
||||||
|
raspbian)
|
||||||
|
echo "raspbian"
|
||||||
|
return 0
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
else
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
detect_os_version() {
|
||||||
|
# Get OS version codename (e.g., trixie, bookworm, noble)
|
||||||
|
|
||||||
|
if [ -f /etc/os-release ]; then
|
||||||
|
. /etc/os-release
|
||||||
|
echo "${VERSION_CODENAME:-unknown}"
|
||||||
|
else
|
||||||
|
echo "unknown"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
is_debian_trixie() {
|
||||||
|
# Special detection for Debian Trixie (requires Docker repo workaround)
|
||||||
|
|
||||||
|
local os_family=$(detect_os_family)
|
||||||
|
local os_version=$(detect_os_version)
|
||||||
|
|
||||||
|
[[ "$os_family" == "debian" && "$os_version" == "trixie" ]]
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- HARDWARE TYPE DETECTION ---
|
||||||
|
|
||||||
|
detect_hardware_type() {
|
||||||
|
# Auto-detect hardware deployment type
|
||||||
|
# Returns: proxmox, docker-vm, pi, physical-docker, ai-workstation, unknown
|
||||||
|
|
||||||
|
# Check for Proxmox VE installation
|
||||||
|
if command -v pveversion &>/dev/null; then
|
||||||
|
echo "proxmox"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if running inside a VM (common indicators)
|
||||||
|
local is_vm=0
|
||||||
|
if systemd-detect-virt --vm &>/dev/null; then
|
||||||
|
is_vm=1
|
||||||
|
elif [ -d /sys/class/dmi/id ]; then
|
||||||
|
local product_name=$(cat /sys/class/dmi/id/product_name 2>/dev/null || echo "")
|
||||||
|
if [[ "$product_name" =~ (VirtualBox|VMware|KVM|QEMU) ]]; then
|
||||||
|
is_vm=1
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for Raspberry Pi
|
||||||
|
if grep -q "Raspberry Pi" /proc/cpuinfo 2>/dev/null; then
|
||||||
|
echo "pi"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for Docker installation (distinguishes VM vs physical)
|
||||||
|
local has_docker=0
|
||||||
|
if command -v docker &>/dev/null || [ -f /usr/bin/docker ]; then
|
||||||
|
has_docker=1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for high-end GPU (indicates AI workstation)
|
||||||
|
local gpu_type=$(detect_gpu)
|
||||||
|
if [[ "$gpu_type" =~ (nvidia|amd) ]]; then
|
||||||
|
local gpu_info=$(lspci 2>/dev/null | grep -i vga | head -n1)
|
||||||
|
# High-end NVIDIA cards (RTX, Tesla, Quadro) or AMD (Radeon Pro, Instinct)
|
||||||
|
if [[ "$gpu_info" =~ (RTX|Tesla|Quadro|A[0-9]{3,4}|Radeon Pro|Instinct) ]]; then
|
||||||
|
echo "ai-workstation"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Classify based on VM + Docker
|
||||||
|
if [ $is_vm -eq 1 ]; then
|
||||||
|
echo "docker-vm"
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
echo "physical-docker"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- CPU DETECTION ---
|
||||||
|
|
||||||
|
detect_cpu_vendor() {
|
||||||
|
# Detect CPU vendor: intel, amd, arm, or unknown
|
||||||
|
|
||||||
|
if [ -f /proc/cpuinfo ]; then
|
||||||
|
if grep -qi "intel" /proc/cpuinfo; then
|
||||||
|
echo "intel"
|
||||||
|
return 0
|
||||||
|
elif grep -qi "amd" /proc/cpuinfo; then
|
||||||
|
echo "amd"
|
||||||
|
return 0
|
||||||
|
elif grep -qi "arm\|aarch64" /proc/cpuinfo; then
|
||||||
|
echo "arm"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
detect_cpu_generation() {
|
||||||
|
# Detect Intel CPU generation for kernel parameter tuning
|
||||||
|
# Returns generation number (e.g., 12, 13, 14) or "unknown"
|
||||||
|
|
||||||
|
local vendor=$(detect_cpu_vendor)
|
||||||
|
if [ "$vendor" != "intel" ]; then
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local model_name=$(grep "model name" /proc/cpuinfo | head -n1 | cut -d: -f2 | xargs)
|
||||||
|
|
||||||
|
# Extract generation from model name patterns
|
||||||
|
# Example: "12th Gen Intel(R) Core(TM) i7-12700"
|
||||||
|
if [[ "$model_name" =~ ([0-9]{2})th\ Gen ]]; then
|
||||||
|
echo "${BASH_REMATCH[1]}"
|
||||||
|
return 0
|
||||||
|
elif [[ "$model_name" =~ i[3579]-([0-9]{2})[0-9]{2,3} ]]; then
|
||||||
|
echo "${BASH_REMATCH[1]}"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
needs_intel_hybrid_core_tuning() {
|
||||||
|
# 12th Gen+ Intel CPUs with hybrid architecture need special kernel params
|
||||||
|
# Returns 0 (true) if tuning required, 1 (false) otherwise
|
||||||
|
|
||||||
|
local gen=$(detect_cpu_generation)
|
||||||
|
|
||||||
|
# 12th Gen (Alder Lake) and newer have P-cores + E-cores
|
||||||
|
if [[ "$gen" =~ ^[0-9]+$ ]] && [ "$gen" -ge 12 ]; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- GPU DETECTION ---
|
||||||
|
|
||||||
|
detect_gpu() {
|
||||||
|
# Detect GPU vendor: nvidia, amd, intel, or none
|
||||||
|
|
||||||
|
if ! command -v lspci &>/dev/null; then
|
||||||
|
echo "none"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local gpu_info=$(lspci 2>/dev/null | grep -i vga)
|
||||||
|
|
||||||
|
if echo "$gpu_info" | grep -qi nvidia; then
|
||||||
|
echo "nvidia"
|
||||||
|
return 0
|
||||||
|
elif echo "$gpu_info" | grep -qi amd; then
|
||||||
|
echo "amd"
|
||||||
|
return 0
|
||||||
|
elif echo "$gpu_info" | grep -qi intel; then
|
||||||
|
echo "intel"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "none"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
get_gpu_model() {
|
||||||
|
# Get detailed GPU model information
|
||||||
|
|
||||||
|
if ! command -v lspci &>/dev/null; then
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local gpu_line=$(lspci 2>/dev/null | grep -i "vga\|3d\|display" | head -n1)
|
||||||
|
|
||||||
|
if [ -n "$gpu_line" ]; then
|
||||||
|
# Extract model after the vendor info
|
||||||
|
echo "$gpu_line" | cut -d: -f3 | xargs
|
||||||
|
else
|
||||||
|
echo "none"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- NETWORK INTERFACE DETECTION ---
|
||||||
|
|
||||||
|
detect_primary_interface() {
|
||||||
|
# Find the primary physical network interface (excludes lo, docker, veth)
|
||||||
|
|
||||||
|
# Try to find interface with default route
|
||||||
|
local iface=$(ip route show default 2>/dev/null | awk '/^default/ {print $5; exit}')
|
||||||
|
|
||||||
|
if [ -n "$iface" ]; then
|
||||||
|
echo "$iface"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Fallback: first non-loopback physical interface
|
||||||
|
iface=$(ip -o link show | awk -F': ' '$2 != "lo" && $2 !~ /^(docker|veth|br-)/ {print $2; exit}')
|
||||||
|
|
||||||
|
if [ -n "$iface" ]; then
|
||||||
|
echo "$iface"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
get_current_ip() {
|
||||||
|
# Get current IPv4 address of primary interface
|
||||||
|
|
||||||
|
local iface=$(detect_primary_interface)
|
||||||
|
|
||||||
|
if [ "$iface" == "unknown" ]; then
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local ip=$(ip -4 addr show "$iface" 2>/dev/null | grep -oP '(?<=inet\s)\d+(\.\d+){3}' | head -n1)
|
||||||
|
|
||||||
|
if [ -n "$ip" ]; then
|
||||||
|
echo "$ip"
|
||||||
|
else
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- ENVIRONMENT DETECTION ---
|
||||||
|
|
||||||
|
is_running_in_container() {
|
||||||
|
# Check if running inside a container (Docker, LXC, etc.)
|
||||||
|
|
||||||
|
if [ -f /.dockerenv ]; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
if grep -q "lxc\|docker" /proc/1/cgroup 2>/dev/null; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
detect_init_system() {
|
||||||
|
# Detect init system: systemd, sysvinit, or unknown
|
||||||
|
|
||||||
|
if [ -d /run/systemd/system ]; then
|
||||||
|
echo "systemd"
|
||||||
|
return 0
|
||||||
|
elif [ -f /sbin/init ]; then
|
||||||
|
local init_path=$(readlink -f /sbin/init)
|
||||||
|
if [[ "$init_path" =~ systemd ]]; then
|
||||||
|
echo "systemd"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- PACKAGE MANAGER DETECTION ---
|
||||||
|
|
||||||
|
detect_package_manager() {
|
||||||
|
# Detect primary package manager: apt, dnf, yum, or unknown
|
||||||
|
|
||||||
|
if command -v apt-get &>/dev/null; then
|
||||||
|
echo "apt"
|
||||||
|
return 0
|
||||||
|
elif command -v dnf &>/dev/null; then
|
||||||
|
echo "dnf"
|
||||||
|
return 0
|
||||||
|
elif command -v yum &>/dev/null; then
|
||||||
|
echo "yum"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- SUMMARY FUNCTION ---
|
||||||
|
|
||||||
|
print_detection_summary() {
|
||||||
|
# Print comprehensive detection summary to stderr for logging
|
||||||
|
|
||||||
|
cat >&2 <<EOF
|
||||||
|
=== System Detection Summary ===
|
||||||
|
OS Family: $(detect_os_family)
|
||||||
|
OS Version: $(detect_os_version)
|
||||||
|
Hardware Type: $(detect_hardware_type)
|
||||||
|
CPU Vendor: $(detect_cpu_vendor)
|
||||||
|
CPU Generation: $(detect_cpu_generation)
|
||||||
|
GPU: $(detect_gpu) ($(get_gpu_model))
|
||||||
|
Primary NIC: $(detect_primary_interface)
|
||||||
|
Current IP: $(get_current_ip)
|
||||||
|
Init System: $(detect_init_system)
|
||||||
|
Package Manager: $(detect_package_manager)
|
||||||
|
In Container: $(is_running_in_container && echo "yes" || echo "no")
|
||||||
|
================================
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# Export functions for use in other scripts
|
||||||
|
export -f detect_os_family
|
||||||
|
export -f detect_os_version
|
||||||
|
export -f is_debian_trixie
|
||||||
|
export -f detect_hardware_type
|
||||||
|
export -f detect_cpu_vendor
|
||||||
|
export -f detect_cpu_generation
|
||||||
|
export -f needs_intel_hybrid_core_tuning
|
||||||
|
export -f detect_gpu
|
||||||
|
export -f get_gpu_model
|
||||||
|
export -f detect_primary_interface
|
||||||
|
export -f get_current_ip
|
||||||
|
export -f is_running_in_container
|
||||||
|
export -f detect_init_system
|
||||||
|
export -f detect_package_manager
|
||||||
|
export -f print_detection_summary
|
||||||
494
scripts/lib/fingerprint.sh
Normal file
494
scripts/lib/fingerprint.sh
Normal file
@ -0,0 +1,494 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# FINGERPRINT LIBRARY: Hardware Inventory Collection
|
||||||
|
# ==============================================================================
|
||||||
|
# Part of unified bootstrap system for homelab infrastructure
|
||||||
|
# Collects comprehensive hardware facts and generates structured YAML output
|
||||||
|
# compatible with gather_hardware_facts.yml playbook format.
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
# Source detection library if not already loaded
|
||||||
|
if ! type -t detect_os_family &>/dev/null; then
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=./detection.sh
|
||||||
|
source "${SCRIPT_DIR}/detection.sh"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# --- HARDWARE FACT COLLECTION ---
|
||||||
|
|
||||||
|
collect_cpu_facts() {
|
||||||
|
# Collect CPU information in structured format
|
||||||
|
# Returns JSON-like structured data
|
||||||
|
|
||||||
|
local cpu_model=$(grep "model name" /proc/cpuinfo | head -n1 | cut -d: -f2 | xargs)
|
||||||
|
local cpu_cores=$(nproc 2>/dev/null || echo "unknown")
|
||||||
|
local cpu_vendor=$(detect_cpu_vendor)
|
||||||
|
local cpu_gen=$(detect_cpu_generation)
|
||||||
|
local cpu_arch=$(uname -m)
|
||||||
|
|
||||||
|
# Get CPU frequency (current)
|
||||||
|
local cpu_freq_mhz=$(grep "cpu MHz" /proc/cpuinfo | head -n1 | awk '{print $4}' | cut -d. -f1)
|
||||||
|
|
||||||
|
# Get CPU max frequency if available
|
||||||
|
local cpu_max_freq="unknown"
|
||||||
|
if [ -f /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq ]; then
|
||||||
|
local freq_khz=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)
|
||||||
|
cpu_max_freq=$((freq_khz / 1000))
|
||||||
|
fi
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
cpu:
|
||||||
|
model: "$cpu_model"
|
||||||
|
vendor: "$cpu_vendor"
|
||||||
|
architecture: "$cpu_arch"
|
||||||
|
generation: "$cpu_gen"
|
||||||
|
cores: $cpu_cores
|
||||||
|
current_freq_mhz: ${cpu_freq_mhz:-unknown}
|
||||||
|
max_freq_mhz: $cpu_max_freq
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
collect_memory_facts() {
|
||||||
|
# Collect memory information
|
||||||
|
|
||||||
|
local mem_total_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
|
||||||
|
local mem_total_mb=$((mem_total_kb / 1024))
|
||||||
|
local mem_total_gb=$((mem_total_kb / 1024 / 1024))
|
||||||
|
|
||||||
|
local mem_free_kb=$(grep MemFree /proc/meminfo | awk '{print $2}')
|
||||||
|
local mem_free_mb=$((mem_free_kb / 1024))
|
||||||
|
|
||||||
|
local swap_total_kb=$(grep SwapTotal /proc/meminfo | awk '{print $2}')
|
||||||
|
local swap_total_mb=$((swap_total_kb / 1024))
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
memory:
|
||||||
|
total_mb: $mem_total_mb
|
||||||
|
total_gb: $mem_total_gb
|
||||||
|
free_mb: $mem_free_mb
|
||||||
|
swap_mb: $swap_total_mb
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
collect_disk_facts() {
|
||||||
|
# Collect disk/storage information
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
storage:
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Root partition
|
||||||
|
local root_total=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
|
||||||
|
local root_used=$(df -BG / | awk 'NR==2 {print $3}' | sed 's/G//')
|
||||||
|
local root_avail=$(df -BG / | awk 'NR==2 {print $4}' | sed 's/G//')
|
||||||
|
local root_device=$(df / | awk 'NR==2 {print $1}')
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
root:
|
||||||
|
device: "$root_device"
|
||||||
|
total_gb: $root_total
|
||||||
|
used_gb: $root_used
|
||||||
|
available_gb: $root_avail
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Detect disk type (SSD vs HDD)
|
||||||
|
local device_name=$(echo "$root_device" | sed 's|/dev/||' | sed 's/[0-9]*$//')
|
||||||
|
local disk_type="unknown"
|
||||||
|
|
||||||
|
if [ -f "/sys/block/$device_name/queue/rotational" ]; then
|
||||||
|
local rotational=$(cat "/sys/block/$device_name/queue/rotational")
|
||||||
|
if [ "$rotational" -eq 0 ]; then
|
||||||
|
disk_type="ssd"
|
||||||
|
else
|
||||||
|
disk_type="hdd"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
type: "$disk_type"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# List all block devices
|
||||||
|
if command -v lsblk &>/dev/null; then
|
||||||
|
local block_devices=$(lsblk -d -n -o NAME,SIZE,TYPE | grep disk | awk '{printf " - { name: \"%s\", size: \"%s\" }\n", $1, $2}')
|
||||||
|
if [ -n "$block_devices" ]; then
|
||||||
|
cat <<EOF
|
||||||
|
block_devices:
|
||||||
|
$block_devices
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
collect_network_facts() {
|
||||||
|
# Collect network interface information
|
||||||
|
|
||||||
|
local primary_iface=$(detect_primary_interface)
|
||||||
|
local current_ip=$(get_current_ip)
|
||||||
|
local gateway=$(ip route show default 2>/dev/null | awk '/^default/ {print $3; exit}')
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
network:
|
||||||
|
primary_interface: "$primary_iface"
|
||||||
|
current_ip: "$current_ip"
|
||||||
|
gateway: "${gateway:-unknown}"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# MAC address
|
||||||
|
if [ "$primary_iface" != "unknown" ]; then
|
||||||
|
local mac=$(ip link show "$primary_iface" 2>/dev/null | grep "link/ether" | awk '{print $2}')
|
||||||
|
cat <<EOF
|
||||||
|
mac_address: "${mac:-unknown}"
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Link speed if available
|
||||||
|
if [ "$primary_iface" != "unknown" ] && [ -f "/sys/class/net/$primary_iface/speed" ]; then
|
||||||
|
local speed=$(cat "/sys/class/net/$primary_iface/speed" 2>/dev/null || echo "unknown")
|
||||||
|
cat <<EOF
|
||||||
|
link_speed_mbps: $speed
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
|
||||||
|
# All interfaces (brief)
|
||||||
|
local all_interfaces=$(ip -o link show | awk -F': ' '$2 != "lo" {printf " - \"%s\"\n", $2}')
|
||||||
|
if [ -n "$all_interfaces" ]; then
|
||||||
|
cat <<EOF
|
||||||
|
all_interfaces:
|
||||||
|
$all_interfaces
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
collect_gpu_facts() {
|
||||||
|
# Collect GPU information
|
||||||
|
|
||||||
|
local gpu_vendor=$(detect_gpu)
|
||||||
|
local gpu_model=$(get_gpu_model)
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
gpu:
|
||||||
|
vendor: "$gpu_vendor"
|
||||||
|
model: "$gpu_model"
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# NVIDIA-specific details
|
||||||
|
if [ "$gpu_vendor" == "nvidia" ] && command -v nvidia-smi &>/dev/null; then
|
||||||
|
local nvidia_driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1)
|
||||||
|
local nvidia_cuda=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
nvidia_driver: "${nvidia_driver:-unknown}"
|
||||||
|
cuda_capability: "${nvidia_cuda:-unknown}"
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
collect_os_facts() {
|
||||||
|
# Collect OS and kernel information
|
||||||
|
|
||||||
|
local os_family=$(detect_os_family)
|
||||||
|
local os_version=$(detect_os_version)
|
||||||
|
local kernel=$(uname -r)
|
||||||
|
local hostname=$(hostname)
|
||||||
|
local fqdn=$(hostname -f 2>/dev/null || echo "$hostname")
|
||||||
|
|
||||||
|
if [ -f /etc/os-release ]; then
|
||||||
|
. /etc/os-release
|
||||||
|
local os_name="$NAME"
|
||||||
|
local os_version_id="$VERSION_ID"
|
||||||
|
fi
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
os:
|
||||||
|
family: "$os_family"
|
||||||
|
name: "${os_name:-unknown}"
|
||||||
|
version: "$os_version"
|
||||||
|
version_id: "${os_version_id:-unknown}"
|
||||||
|
kernel: "$kernel"
|
||||||
|
hostname: "$hostname"
|
||||||
|
fqdn: "$fqdn"
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
collect_hardware_type_facts() {
|
||||||
|
# Collect deployment type and special hardware flags
|
||||||
|
|
||||||
|
local hw_type=$(detect_hardware_type)
|
||||||
|
local in_container=$(is_running_in_container && echo "true" || echo "false")
|
||||||
|
local is_vm=$(systemd-detect-virt --vm &>/dev/null && echo "true" || echo "false")
|
||||||
|
local virt_type=$(systemd-detect-virt 2>/dev/null || echo "none")
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
deployment:
|
||||||
|
hardware_type: "$hw_type"
|
||||||
|
is_virtual: $is_vm
|
||||||
|
virtualization: "$virt_type"
|
||||||
|
in_container: $in_container
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Proxmox-specific
|
||||||
|
if [ "$hw_type" == "proxmox" ] && command -v pveversion &>/dev/null; then
|
||||||
|
local pve_version=$(pveversion | head -n1 | awk '{print $2}')
|
||||||
|
cat <<EOF
|
||||||
|
proxmox_version: "$pve_version"
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
collect_timestamp_facts() {
|
||||||
|
# Collect collection timestamp
|
||||||
|
|
||||||
|
local timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
local date_collected=$(date +"%Y-%m-%d")
|
||||||
|
local epoch=$(date +%s)
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
metadata:
|
||||||
|
collected_at: "$timestamp"
|
||||||
|
date: "$date_collected"
|
||||||
|
epoch: $epoch
|
||||||
|
collector: "bootstrap.sh/fingerprint"
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- STANDARDS VALIDATION ---
|
||||||
|
|
||||||
|
validate_against_standards() {
|
||||||
|
# Check hardware against environment-constraints.md requirements
|
||||||
|
# Returns validation status
|
||||||
|
|
||||||
|
local hw_type=$(detect_hardware_type)
|
||||||
|
local mem_gb=$(grep MemTotal /proc/meminfo | awk '{print int($2/1024/1024)}')
|
||||||
|
local cpu_cores=$(nproc)
|
||||||
|
local disk_gb=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
|
||||||
|
|
||||||
|
local violations=0
|
||||||
|
|
||||||
|
echo " standards_validation:" >&2
|
||||||
|
|
||||||
|
# Memory requirements
|
||||||
|
case "$hw_type" in
|
||||||
|
proxmox)
|
||||||
|
if [ "$mem_gb" -lt 8 ]; then
|
||||||
|
echo " - \"WARNING: Proxmox host has ${mem_gb}GB RAM (minimum 8GB recommended)\"" >&2
|
||||||
|
((violations++))
|
||||||
|
fi
|
||||||
|
;;
|
||||||
|
docker-vm|physical-docker)
|
||||||
|
if [ "$mem_gb" -lt 4 ]; then
|
||||||
|
echo " - \"WARNING: Docker host has ${mem_gb}GB RAM (minimum 4GB recommended)\"" >&2
|
||||||
|
((violations++))
|
||||||
|
fi
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
# Disk space
|
||||||
|
if [ "$disk_gb" -lt 20 ]; then
|
||||||
|
echo " - \"WARNING: Low disk space (${disk_gb}GB) - minimum 20GB recommended\"" >&2
|
||||||
|
((violations++))
|
||||||
|
fi
|
||||||
|
|
||||||
|
# CPU cores
|
||||||
|
if [ "$cpu_cores" -lt 2 ]; then
|
||||||
|
echo " - \"WARNING: Single CPU core detected - minimum 2 recommended\"" >&2
|
||||||
|
((violations++))
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ $violations -eq 0 ]; then
|
||||||
|
echo " - \"OK: Hardware meets minimum standards\"" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
return $violations
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- STRUCTURED OUTPUT ---
|
||||||
|
|
||||||
|
generate_hardware_facts_yaml() {
|
||||||
|
# Generate complete hardware facts in YAML format
|
||||||
|
# Compatible with gather_hardware_facts.yml output
|
||||||
|
|
||||||
|
local hostname=$(hostname)
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
---
|
||||||
|
# Hardware Facts Collection
|
||||||
|
# Generated by: bootstrap.sh fingerprint library
|
||||||
|
# Host: $hostname
|
||||||
|
# $(date -u +"%Y-%m-%d %H:%M:%S UTC")
|
||||||
|
|
||||||
|
$hostname:
|
||||||
|
$(collect_os_facts)
|
||||||
|
$(collect_cpu_facts)
|
||||||
|
$(collect_memory_facts)
|
||||||
|
$(collect_disk_facts)
|
||||||
|
$(collect_network_facts)
|
||||||
|
$(collect_gpu_facts)
|
||||||
|
$(collect_hardware_type_facts)
|
||||||
|
$(collect_timestamp_facts)
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
save_hardware_facts() {
|
||||||
|
# Save hardware facts to output file
|
||||||
|
# Args: $1 = output directory (default: ../ansible/archive/outputs)
|
||||||
|
|
||||||
|
local output_dir="${1:-../ansible/archive/outputs}"
|
||||||
|
local hostname=$(hostname)
|
||||||
|
local timestamp=$(date +%Y%m%dT%H%M%S)
|
||||||
|
local output_file="${output_dir}/hardware-facts-${hostname}-${timestamp}.yml"
|
||||||
|
|
||||||
|
# Ensure output directory exists
|
||||||
|
mkdir -p "$output_dir"
|
||||||
|
|
||||||
|
# Generate and save
|
||||||
|
generate_hardware_facts_yaml > "$output_file"
|
||||||
|
|
||||||
|
echo "$output_file"
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- ANSIBLE INVENTORY GENERATION ---
|
||||||
|
|
||||||
|
generate_ansible_inventory_snippet() {
|
||||||
|
# Generate Ansible inventory entry for this host
|
||||||
|
# Can be appended to discovered-hosts.yml
|
||||||
|
|
||||||
|
local hostname=$(hostname)
|
||||||
|
local current_ip=$(get_current_ip)
|
||||||
|
local hw_type=$(detect_hardware_type)
|
||||||
|
local os_family=$(detect_os_family)
|
||||||
|
|
||||||
|
# Determine inventory group
|
||||||
|
local inventory_group="docker_hosts"
|
||||||
|
case "$hw_type" in
|
||||||
|
proxmox)
|
||||||
|
inventory_group="proxmox_cluster"
|
||||||
|
;;
|
||||||
|
docker-vm)
|
||||||
|
inventory_group="swarm_managers" # Or swarm_workers, requires manual classification
|
||||||
|
;;
|
||||||
|
pi)
|
||||||
|
inventory_group="control_nodes"
|
||||||
|
;;
|
||||||
|
ai-workstation)
|
||||||
|
inventory_group="ai_nodes"
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
# Auto-discovered: $(date -u +"%Y-%m-%d %H:%M:%S UTC")
|
||||||
|
[$inventory_group]
|
||||||
|
$hostname ansible_host=$current_ip ansible_user=chester
|
||||||
|
|
||||||
|
# Hardware details:
|
||||||
|
# - Type: $hw_type
|
||||||
|
# - OS: $os_family
|
||||||
|
# - IP: $current_ip
|
||||||
|
# Review and merge into inventory/hosts.ini after validation
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
append_to_discovered_inventory() {
|
||||||
|
# Append this host to discovered-hosts.yml
|
||||||
|
# Args: $1 = inventory file (default: ../ansible/archive/inventory/discovered-hosts.yml)
|
||||||
|
|
||||||
|
local inventory_file="${1:-../ansible/archive/inventory/discovered-hosts.yml}"
|
||||||
|
local hostname=$(hostname)
|
||||||
|
|
||||||
|
# Create file if doesn't exist
|
||||||
|
if [ ! -f "$inventory_file" ]; then
|
||||||
|
cat > "$inventory_file" <<EOF
|
||||||
|
# Auto-Discovery Inventory
|
||||||
|
# Generated by bootstrap.sh - DO NOT MANUALLY EDIT
|
||||||
|
# Merge entries into hosts.ini after validation
|
||||||
|
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if host already exists
|
||||||
|
if grep -q "^${hostname} " "$inventory_file" 2>/dev/null; then
|
||||||
|
echo "Host $hostname already in discovered inventory, skipping" >&2
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Append
|
||||||
|
generate_ansible_inventory_snippet >> "$inventory_file"
|
||||||
|
echo "" >> "$inventory_file"
|
||||||
|
|
||||||
|
echo "$inventory_file"
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- OUTPUT FORMAT OPTIONS ---
|
||||||
|
|
||||||
|
generate_json_output() {
|
||||||
|
# Generate JSON format instead of YAML (for monitoring integration)
|
||||||
|
# Converts YAML facts to JSON
|
||||||
|
|
||||||
|
local hostname=$(hostname)
|
||||||
|
local current_ip=$(get_current_ip)
|
||||||
|
local hw_type=$(detect_hardware_type)
|
||||||
|
local os_family=$(detect_os_family)
|
||||||
|
local cpu_cores=$(nproc)
|
||||||
|
local mem_gb=$(grep MemTotal /proc/meminfo | awk '{print int($2/1024/1024)}')
|
||||||
|
local disk_gb=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
{
|
||||||
|
"hostname": "$hostname",
|
||||||
|
"ip": "$current_ip",
|
||||||
|
"hardware_type": "$hw_type",
|
||||||
|
"os_family": "$os_family",
|
||||||
|
"cpu_cores": $cpu_cores,
|
||||||
|
"memory_gb": $mem_gb,
|
||||||
|
"disk_gb": $disk_gb,
|
||||||
|
"collected_at": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- SUMMARY DISPLAY ---
|
||||||
|
|
||||||
|
print_hardware_summary() {
|
||||||
|
# Print human-readable hardware summary
|
||||||
|
|
||||||
|
local hostname=$(hostname)
|
||||||
|
local hw_type=$(detect_hardware_type)
|
||||||
|
local os=$(detect_os_family) $(detect_os_version)
|
||||||
|
local cpu_model=$(grep "model name" /proc/cpuinfo | head -n1 | cut -d: -f2 | xargs)
|
||||||
|
local cpu_cores=$(nproc)
|
||||||
|
local mem_gb=$(grep MemTotal /proc/meminfo | awk '{print int($2/1024/1024)}')
|
||||||
|
local disk_gb=$(df -BG / | awk 'NR==2 {print $2}' | sed 's/G//')
|
||||||
|
local gpu=$(detect_gpu)
|
||||||
|
local ip=$(get_current_ip)
|
||||||
|
|
||||||
|
cat >&2 <<EOF
|
||||||
|
=== Hardware Fingerprint Summary ===
|
||||||
|
Hostname: $hostname
|
||||||
|
IP Address: $ip
|
||||||
|
Type: $hw_type
|
||||||
|
OS: $os
|
||||||
|
CPU: $cpu_model ($cpu_cores cores)
|
||||||
|
Memory: ${mem_gb}GB
|
||||||
|
Storage: ${disk_gb}GB
|
||||||
|
GPU: $gpu
|
||||||
|
====================================
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# Export functions
|
||||||
|
export -f collect_cpu_facts
|
||||||
|
export -f collect_memory_facts
|
||||||
|
export -f collect_disk_facts
|
||||||
|
export -f collect_network_facts
|
||||||
|
export -f collect_gpu_facts
|
||||||
|
export -f collect_os_facts
|
||||||
|
export -f collect_hardware_type_facts
|
||||||
|
export -f collect_timestamp_facts
|
||||||
|
export -f validate_against_standards
|
||||||
|
export -f generate_hardware_facts_yaml
|
||||||
|
export -f save_hardware_facts
|
||||||
|
export -f generate_ansible_inventory_snippet
|
||||||
|
export -f append_to_discovered_inventory
|
||||||
|
export -f generate_json_output
|
||||||
|
export -f print_hardware_summary
|
||||||
356
scripts/lib/network.sh
Normal file
356
scripts/lib/network.sh
Normal file
@ -0,0 +1,356 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# NETWORK LIBRARY: Network Configuration and Validation
|
||||||
|
# ==============================================================================
|
||||||
|
# Part of unified bootstrap system for homelab infrastructure
|
||||||
|
# Handles static IP configuration via netplan, network validation, and
|
||||||
|
# VLAN capability detection for future network segmentation.
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
# Source detection library if not already loaded
|
||||||
|
if ! type -t detect_primary_interface &>/dev/null; then
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=./detection.sh
|
||||||
|
source "${SCRIPT_DIR}/detection.sh"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# --- NETWORK CONFIGURATION ---
|
||||||
|
|
||||||
|
apply_static_ip() {
|
||||||
|
# Configure static IP via netplan (Ubuntu/Debian)
|
||||||
|
# Args: $1 = Target IP, $2 = Gateway (default: 10.0.0.1), $3 = DNS (default: 10.0.0.2)
|
||||||
|
|
||||||
|
local target_ip="$1"
|
||||||
|
local gateway="${2:-10.0.0.1}"
|
||||||
|
local dns="${3:-10.0.0.2}"
|
||||||
|
|
||||||
|
if [ -z "$target_ip" ]; then
|
||||||
|
echo "ERROR: Target IP address required" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Validate IP format
|
||||||
|
if ! [[ "$target_ip" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
|
||||||
|
echo "ERROR: Invalid IP address format: $target_ip" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local interface=$(detect_primary_interface)
|
||||||
|
if [ "$interface" == "unknown" ]; then
|
||||||
|
echo "ERROR: Could not detect primary network interface" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "[⚙] Configuring static IP: $target_ip on $interface..." >&2
|
||||||
|
|
||||||
|
# Fix permissions on existing netplan files (common issue)
|
||||||
|
sudo chmod 600 /lib/netplan/*.yaml 2>/dev/null || true
|
||||||
|
sudo chmod 600 /etc/netplan/*.yaml 2>/dev/null || true
|
||||||
|
|
||||||
|
# Create netplan directory if missing
|
||||||
|
sudo mkdir -p /etc/netplan
|
||||||
|
|
||||||
|
# Generate netplan configuration
|
||||||
|
sudo tee /etc/netplan/01-netcfg.yaml >/dev/null <<EOF
|
||||||
|
network:
|
||||||
|
version: 2
|
||||||
|
renderer: networkd
|
||||||
|
ethernets:
|
||||||
|
$interface:
|
||||||
|
addresses:
|
||||||
|
- ${target_ip}/24
|
||||||
|
nameservers:
|
||||||
|
addresses: [${dns}, 8.8.8.8]
|
||||||
|
routes:
|
||||||
|
- to: default
|
||||||
|
via: ${gateway}
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# Fix permissions (netplan requires 600)
|
||||||
|
sudo chmod 600 /etc/netplan/01-netcfg.yaml
|
||||||
|
|
||||||
|
echo "[✓] Netplan configuration created" >&2
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
apply_network_changes() {
|
||||||
|
# Apply netplan configuration (WARNING: may cause SSH disconnect)
|
||||||
|
# Uses background apply to prevent SSH session hang
|
||||||
|
|
||||||
|
echo "[⚙] Applying network configuration (SSH may disconnect)..." >&2
|
||||||
|
|
||||||
|
# Test configuration first
|
||||||
|
if ! sudo netplan generate 2>/dev/null; then
|
||||||
|
echo "ERROR: netplan configuration validation failed" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Apply in background to avoid blocking SSH
|
||||||
|
sudo netplan apply &
|
||||||
|
local apply_pid=$!
|
||||||
|
|
||||||
|
echo "[✓] Network apply started (PID: $apply_pid)" >&2
|
||||||
|
echo "[!] SSH connection will drop. Reconnect to new IP address." >&2
|
||||||
|
|
||||||
|
# Give it a moment to start
|
||||||
|
sleep 2
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
configure_network_safe() {
|
||||||
|
# Safe wrapper: configure IP + apply with reconnection instructions
|
||||||
|
# Args: $1 = Target IP, $2 = Gateway (optional), $3 = DNS (optional)
|
||||||
|
|
||||||
|
local target_ip="$1"
|
||||||
|
local current_ip=$(get_current_ip)
|
||||||
|
|
||||||
|
# Check if already configured
|
||||||
|
if [ "$current_ip" == "$target_ip" ]; then
|
||||||
|
echo "[✓] IP already configured as $target_ip, skipping" >&2
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Configure
|
||||||
|
if ! apply_static_ip "$@"; then
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Apply
|
||||||
|
apply_network_changes
|
||||||
|
|
||||||
|
echo "" >&2
|
||||||
|
echo "=========================================" >&2
|
||||||
|
echo "Network configuration applied" >&2
|
||||||
|
echo "Old IP: $current_ip" >&2
|
||||||
|
echo "New IP: $target_ip" >&2
|
||||||
|
echo "Reconnect with: ssh user@$target_ip" >&2
|
||||||
|
echo "=========================================" >&2
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- VLAN CONFIGURATION (PLACEHOLDER) ---
|
||||||
|
|
||||||
|
get_desired_vlan_ip() {
|
||||||
|
# Determine desired VLAN IP based on hardware type
|
||||||
|
# Returns IP address from environment-constraints.md topology
|
||||||
|
# TODO: Enable when VLAN segmentation is live
|
||||||
|
|
||||||
|
local hardware_type=$(detect_hardware_type)
|
||||||
|
local hostname=$(hostname)
|
||||||
|
|
||||||
|
# TODO: Implement VLAN placement logic based on:
|
||||||
|
# - Proxmox hosts → 10.0.10.x (infra VLAN)
|
||||||
|
# - Swarm VMs → 10.0.200.x (compute VLAN)
|
||||||
|
# - Control nodes → 10.0.0.x (main VLAN)
|
||||||
|
|
||||||
|
# For now, return flat network assignment
|
||||||
|
case "$hardware_type" in
|
||||||
|
proxmox)
|
||||||
|
# Currently: 10.0.0.200-209
|
||||||
|
# Desired: 10.0.10.11-13 (future VLAN)
|
||||||
|
echo "10.0.0.201" # Placeholder
|
||||||
|
;;
|
||||||
|
docker-vm)
|
||||||
|
# Currently: 10.0.0.210-229
|
||||||
|
# Desired: 10.0.200.11+ (future VLAN)
|
||||||
|
echo "10.0.0.211" # Placeholder
|
||||||
|
;;
|
||||||
|
pi|physical-docker)
|
||||||
|
# Control nodes stay on main VLAN
|
||||||
|
echo "10.0.0.200"
|
||||||
|
;;
|
||||||
|
ai-workstation)
|
||||||
|
# Currently: 10.0.0.230-239
|
||||||
|
# Desired: 10.0.200.x (future VLAN)
|
||||||
|
echo "10.0.0.230" # Placeholder
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "10.0.0.200" # Safe default
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
}
|
||||||
|
|
||||||
|
check_vlan_support() {
|
||||||
|
# Check if network hardware supports VLAN tagging
|
||||||
|
# Returns 0 if supported, 1 otherwise
|
||||||
|
|
||||||
|
local interface=$(detect_primary_interface)
|
||||||
|
|
||||||
|
if [ "$interface" == "unknown" ]; then
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for 802.1Q VLAN support in kernel modules
|
||||||
|
if lsmod | grep -q "^8021q"; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if module can be loaded
|
||||||
|
if sudo modprobe 8021q 2>/dev/null; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- NETWORK VALIDATION ---
|
||||||
|
|
||||||
|
validate_connectivity() {
|
||||||
|
# Test basic network connectivity
|
||||||
|
# Returns 0 if healthy, 1 otherwise
|
||||||
|
|
||||||
|
local errors=0
|
||||||
|
|
||||||
|
echo "[⚙] Validating network connectivity..." >&2
|
||||||
|
|
||||||
|
# Test default gateway
|
||||||
|
local gateway=$(ip route show default 2>/dev/null | awk '/^default/ {print $3; exit}')
|
||||||
|
if [ -n "$gateway" ]; then
|
||||||
|
if ping -c 2 -W 3 "$gateway" &>/dev/null; then
|
||||||
|
echo " [✓] Gateway reachable: $gateway" >&2
|
||||||
|
else
|
||||||
|
echo " [✗] Gateway unreachable: $gateway" >&2
|
||||||
|
((errors++))
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
echo " [✗] No default gateway configured" >&2
|
||||||
|
((errors++))
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test DNS resolution
|
||||||
|
if ping -c 2 -W 3 8.8.8.8 &>/dev/null; then
|
||||||
|
echo " [✓] Internet connectivity (8.8.8.8)" >&2
|
||||||
|
else
|
||||||
|
echo " [✗] No internet connectivity" >&2
|
||||||
|
((errors++))
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Test DNS resolution
|
||||||
|
if host google.com &>/dev/null; then
|
||||||
|
echo " [✓] DNS resolution working" >&2
|
||||||
|
else
|
||||||
|
echo " [!] DNS resolution issue (warning)" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ $errors -eq 0 ]; then
|
||||||
|
echo "[✓] Network validation passed" >&2
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
echo "[✗] Network validation failed ($errors errors)" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
check_nfs_accessibility() {
|
||||||
|
# Test NFS server accessibility (TerraMaster NAS)
|
||||||
|
# Args: $1 = NFS server IP (default: 10.0.0.250)
|
||||||
|
|
||||||
|
local nfs_server="${1:-10.0.0.250}"
|
||||||
|
|
||||||
|
echo "[⚙] Checking NFS server accessibility ($nfs_server)..." >&2
|
||||||
|
|
||||||
|
# Test basic connectivity via ping
|
||||||
|
if ! ping -c 2 -W 3 "$nfs_server" &>/dev/null; then
|
||||||
|
echo " [✗] NFS server unreachable: $nfs_server" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo " [✓] NFS server reachable" >&2
|
||||||
|
|
||||||
|
# Test NFS ports (2049 = NFSv3/v4, 111 = portmapper)
|
||||||
|
if command -v nc &>/dev/null; then
|
||||||
|
if nc -z -w 3 "$nfs_server" 2049 2>/dev/null; then
|
||||||
|
echo " [✓] NFS service responding (port 2049)" >&2
|
||||||
|
else
|
||||||
|
echo " [✗] NFS port 2049 closed" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
test_internal_hairpin_nat() {
|
||||||
|
# Test for hairpin NAT issues (lessons-learned.md #3)
|
||||||
|
# Internal hosts should NOT route through public DNS
|
||||||
|
|
||||||
|
local test_domain="castaldifamily.com"
|
||||||
|
|
||||||
|
echo "[⚙] Testing for hairpin NAT issues..." >&2
|
||||||
|
|
||||||
|
# Get public IP of domain
|
||||||
|
local public_ip=$(dig +short "$test_domain" @8.8.8.8 2>/dev/null | grep -E '^[0-9.]+$' | head -n1)
|
||||||
|
|
||||||
|
if [ -z "$public_ip" ]; then
|
||||||
|
echo " [!] Could not resolve $test_domain, skipping test" >&2
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Try to ping public IP from inside network (should fail on hairpin NAT routers)
|
||||||
|
if ping -c 2 -W 2 "$public_ip" &>/dev/null; then
|
||||||
|
echo " [✓] No hairpin NAT issue detected" >&2
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
echo " [!] Possible hairpin NAT - use internal IPs (10.0.0.x) for node-to-node" >&2
|
||||||
|
return 0 # Warning, not error
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- NETWORK RENDERER DETECTION ---
|
||||||
|
|
||||||
|
detect_network_renderer() {
|
||||||
|
# Detect network configuration system: netplan, networkd, NetworkManager
|
||||||
|
|
||||||
|
if [ -d /etc/netplan ] && command -v netplan &>/dev/null; then
|
||||||
|
echo "netplan"
|
||||||
|
return 0
|
||||||
|
elif systemctl is-active systemd-networkd &>/dev/null; then
|
||||||
|
echo "networkd"
|
||||||
|
return 0
|
||||||
|
elif systemctl is-active NetworkManager &>/dev/null; then
|
||||||
|
echo "NetworkManager"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- WAIT FOR NETWORK ---
|
||||||
|
|
||||||
|
wait_for_network() {
|
||||||
|
# Wait for network to stabilize after configuration change
|
||||||
|
# Args: $1 = timeout in seconds (default: 10)
|
||||||
|
|
||||||
|
local timeout="${1:-10}"
|
||||||
|
local elapsed=0
|
||||||
|
|
||||||
|
echo "[⚙] Waiting for network to stabilize (timeout: ${timeout}s)..." >&2
|
||||||
|
|
||||||
|
while [ $elapsed -lt $timeout ]; do
|
||||||
|
if ping -c 1 -W 1 8.8.8.8 &>/dev/null; then
|
||||||
|
echo "[✓] Network ready after ${elapsed}s" >&2
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
sleep 1
|
||||||
|
((elapsed++))
|
||||||
|
done
|
||||||
|
|
||||||
|
echo "[!] Network not ready after ${timeout}s, continuing anyway" >&2
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# Export functions
|
||||||
|
export -f apply_static_ip
|
||||||
|
export -f apply_network_changes
|
||||||
|
export -f configure_network_safe
|
||||||
|
export -f get_desired_vlan_ip
|
||||||
|
export -f check_vlan_support
|
||||||
|
export -f validate_connectivity
|
||||||
|
export -f check_nfs_accessibility
|
||||||
|
export -f test_internal_hairpin_nat
|
||||||
|
export -f detect_network_renderer
|
||||||
|
export -f wait_for_network
|
||||||
453
scripts/lib/proxmox.sh
Normal file
453
scripts/lib/proxmox.sh
Normal file
@ -0,0 +1,453 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# PROXMOX LIBRARY: Post-Install Configuration and Management
|
||||||
|
# ==============================================================================
|
||||||
|
# Part of unified bootstrap system for homelab infrastructure
|
||||||
|
# Comprehensive Proxmox VE post-install routines inspired by community scripts
|
||||||
|
# (tteck/community-scripts ProxmoxVE post-pve-install.sh)
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
# Source detection library if not already loaded
|
||||||
|
if ! type -t detect_os_family &>/dev/null; then
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=./detection.sh
|
||||||
|
source "${SCRIPT_DIR}/detection.sh"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# --- PROXMOX VERSION DETECTION ---
|
||||||
|
|
||||||
|
get_pve_version() {
|
||||||
|
# Get Proxmox VE version (e.g., 8.2.4)
|
||||||
|
if ! command -v pveversion &>/dev/null; then
|
||||||
|
echo "unknown"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
pveversion | awk -F'/' '{print $2}' | awk -F'-' '{print $1}'
|
||||||
|
}
|
||||||
|
|
||||||
|
get_pve_major_version() {
|
||||||
|
# Get major version (8 or 9)
|
||||||
|
local ver=$(get_pve_version)
|
||||||
|
echo "$ver" | cut -d. -f1
|
||||||
|
}
|
||||||
|
|
||||||
|
is_pve_supported_version() {
|
||||||
|
# Check if PVE version is supported (8.0-8.9.x or 9.0-9.1.x)
|
||||||
|
local major=$(get_pve_major_version)
|
||||||
|
|
||||||
|
case "$major" in
|
||||||
|
8|9)
|
||||||
|
return 0
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
return 1
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- REPOSITORY MANAGEMENT ---
|
||||||
|
|
||||||
|
disable_pve_enterprise_repo() {
|
||||||
|
# Disable enterprise repository (requires subscription)
|
||||||
|
local major=$(get_pve_major_version)
|
||||||
|
|
||||||
|
echo "[⚙] Disabling Proxmox enterprise repository..." >&2
|
||||||
|
|
||||||
|
if [ "$major" == "9" ]; then
|
||||||
|
# PVE 9.x uses .sources format
|
||||||
|
for file in /etc/apt/sources.list.d/*.sources; do
|
||||||
|
if [ -f "$file" ] && grep -q "pve-enterprise" "$file"; then
|
||||||
|
# Use Enabled: false instead of commenting
|
||||||
|
if grep -q "^Enabled:" "$file"; then
|
||||||
|
sudo sed -i 's/^Enabled:.*/Enabled: false/' "$file"
|
||||||
|
else
|
||||||
|
echo "Enabled: false" | sudo tee -a "$file" >/dev/null
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
else
|
||||||
|
# PVE 8.x uses .list format
|
||||||
|
if [ -f /etc/apt/sources.list.d/pve-enterprise.list ]; then
|
||||||
|
sudo sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo " [✓] Enterprise repository disabled" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
enable_pve_no_subscription_repo() {
|
||||||
|
# Enable no-subscription repository (free updates)
|
||||||
|
local major=$(get_pve_major_version)
|
||||||
|
|
||||||
|
echo "[⚙] Enabling Proxmox no-subscription repository..." >&2
|
||||||
|
|
||||||
|
if [ "$major" == "9" ]; then
|
||||||
|
# PVE 9.x deb822 format
|
||||||
|
sudo tee /etc/apt/sources.list.d/proxmox.sources >/dev/null <<EOF
|
||||||
|
Types: deb
|
||||||
|
URIs: http://download.proxmox.com/debian/pve
|
||||||
|
Suites: trixie
|
||||||
|
Components: pve-no-subscription
|
||||||
|
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
|
||||||
|
EOF
|
||||||
|
else
|
||||||
|
# PVE 8.x traditional format
|
||||||
|
sudo tee /etc/apt/sources.list.d/pve-no-subscription.list >/dev/null <<EOF
|
||||||
|
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo " [✓] No-subscription repository enabled" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
configure_ceph_repos() {
|
||||||
|
# Configure Ceph storage repositories (disabled by default)
|
||||||
|
local major=$(get_pve_major_version)
|
||||||
|
|
||||||
|
echo "[⚙] Configuring Ceph package repositories..." >&2
|
||||||
|
|
||||||
|
if [ "$major" == "9" ]; then
|
||||||
|
# PVE 9.x - Squid version
|
||||||
|
sudo tee /etc/apt/sources.list.d/ceph.sources >/dev/null <<EOF
|
||||||
|
Types: deb
|
||||||
|
URIs: http://download.proxmox.com/debian/ceph-squid
|
||||||
|
Suites: trixie
|
||||||
|
Components: no-subscription
|
||||||
|
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
|
||||||
|
Enabled: false
|
||||||
|
EOF
|
||||||
|
else
|
||||||
|
# PVE 8.x - Reef/Quincy versions
|
||||||
|
sudo tee /etc/apt/sources.list.d/ceph.list >/dev/null <<EOF
|
||||||
|
# Ceph Reef (disabled - enable as needed)
|
||||||
|
# deb http://download.proxmox.com/debian/ceph-reef bookworm no-subscription
|
||||||
|
|
||||||
|
# Ceph Quincy (disabled - enable as needed)
|
||||||
|
# deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo " [✓] Ceph repositories configured (disabled)" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
add_pvetest_repo() {
|
||||||
|
# Add pvetest repository for beta features (disabled by default)
|
||||||
|
local major=$(get_pve_major_version)
|
||||||
|
|
||||||
|
echo "[⚙] Adding pvetest repository (disabled)..." >&2
|
||||||
|
|
||||||
|
if [ "$major" == "9" ]; then
|
||||||
|
sudo tee /etc/apt/sources.list.d/pve-test.sources >/dev/null <<EOF
|
||||||
|
Types: deb
|
||||||
|
URIs: http://download.proxmox.com/debian/pve
|
||||||
|
Suites: trixie
|
||||||
|
Components: pve-test
|
||||||
|
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
|
||||||
|
Enabled: false
|
||||||
|
EOF
|
||||||
|
else
|
||||||
|
sudo tee /etc/apt/sources.list.d/pvetest-for-beta.list >/dev/null <<EOF
|
||||||
|
# PVE Test Repository (disabled - enable for beta testing)
|
||||||
|
# deb http://download.proxmox.com/debian/pve bookworm pvetest
|
||||||
|
EOF
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo " [✓] pvetest repository added (disabled)" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
configure_all_pve_repos() {
|
||||||
|
# Run complete repository configuration
|
||||||
|
disable_pve_enterprise_repo
|
||||||
|
enable_pve_no_subscription_repo
|
||||||
|
configure_ceph_repos
|
||||||
|
add_pvetest_repo
|
||||||
|
|
||||||
|
echo "[✓] Proxmox repository configuration complete" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- SUBSCRIPTION NAG REMOVAL ---
|
||||||
|
|
||||||
|
disable_subscription_nag() {
|
||||||
|
# Remove subscription nag from web UI and mobile UI
|
||||||
|
# Creates apt hook to persist after updates
|
||||||
|
|
||||||
|
echo "[⚙] Disabling subscription nag dialog..." >&2
|
||||||
|
|
||||||
|
# Create persistent script
|
||||||
|
sudo mkdir -p /usr/local/bin
|
||||||
|
sudo tee /usr/local/bin/pve-remove-nag.sh >/dev/null <<'SCRIPT_EOF'
|
||||||
|
#!/bin/sh
|
||||||
|
# Auto-generated by homelab bootstrap - removes Proxmox subscription nag
|
||||||
|
|
||||||
|
WEB_JS=/usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
|
||||||
|
if [ -s "$WEB_JS" ] && ! grep -q NoMoreNagging "$WEB_JS"; then
|
||||||
|
echo "Patching Web UI subscription nag..."
|
||||||
|
sed -i -e "/data\.status/ s/!//" -e "/data\.status/ s/active/NoMoreNagging/" "$WEB_JS"
|
||||||
|
fi
|
||||||
|
|
||||||
|
MOBILE_TPL=/usr/share/pve-yew-mobile-gui/index.html.tpl
|
||||||
|
MARKER="<!-- MANAGED BLOCK FOR MOBILE NAG -->"
|
||||||
|
if [ -f "$MOBILE_TPL" ] && ! grep -q "$MARKER" "$MOBILE_TPL"; then
|
||||||
|
echo "Patching Mobile UI subscription nag..."
|
||||||
|
printf "%s\n" \
|
||||||
|
"$MARKER" \
|
||||||
|
"<script>" \
|
||||||
|
" function removeSubscriptionElements() {" \
|
||||||
|
" const dialogs = document.querySelectorAll('dialog.pwt-outer-dialog');" \
|
||||||
|
" dialogs.forEach(d => { if ((d.textContent||'').toLowerCase().includes('subscription')) d.remove(); });" \
|
||||||
|
" const cards = document.querySelectorAll('.pwt-card.pwt-p-2');" \
|
||||||
|
" cards.forEach(c => { if (!c.querySelector('button') && (c.textContent||'').toLowerCase().includes('subscription')) c.remove(); });" \
|
||||||
|
" }" \
|
||||||
|
" const observer = new MutationObserver(removeSubscriptionElements);" \
|
||||||
|
" observer.observe(document.body, { childList: true, subtree: true });" \
|
||||||
|
" removeSubscriptionElements();" \
|
||||||
|
"</script>" >> "$MOBILE_TPL"
|
||||||
|
fi
|
||||||
|
SCRIPT_EOF
|
||||||
|
|
||||||
|
sudo chmod 755 /usr/local/bin/pve-remove-nag.sh
|
||||||
|
|
||||||
|
# Create apt hook to run after updates
|
||||||
|
sudo tee /etc/apt/apt.conf.d/90-pve-no-nag <<'APT_EOF'
|
||||||
|
DPkg::Post-Invoke { "/usr/local/bin/pve-remove-nag.sh"; };
|
||||||
|
APT_EOF
|
||||||
|
|
||||||
|
sudo chmod 644 /etc/apt/apt.conf.d/90-pve-no-nag
|
||||||
|
|
||||||
|
# Apply immediately
|
||||||
|
sudo /usr/local/bin/pve-remove-nag.sh 2>/dev/null || true
|
||||||
|
|
||||||
|
echo " [✓] Subscription nag disabled (clear browser cache)" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
enable_subscription_nag() {
|
||||||
|
# Remove nag removal script (restore default behavior)
|
||||||
|
|
||||||
|
echo "[⚙] Re-enabling subscription nag..." >&2
|
||||||
|
|
||||||
|
sudo rm -f /usr/local/bin/pve-remove-nag.sh
|
||||||
|
sudo rm -f /etc/apt/apt.conf.d/90-pve-no-nag
|
||||||
|
|
||||||
|
# Reinstall widget toolkit to restore original
|
||||||
|
sudo apt --reinstall install proxmox-widget-toolkit &>/dev/null || true
|
||||||
|
|
||||||
|
echo " [✓] Subscription nag re-enabled" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- HIGH AVAILABILITY MANAGEMENT ---
|
||||||
|
|
||||||
|
disable_ha_services() {
|
||||||
|
# Disable HA services for single-node setups (saves resources)
|
||||||
|
|
||||||
|
echo "[⚙] Disabling high availability services..." >&2
|
||||||
|
|
||||||
|
if systemctl is-active --quiet pve-ha-lrm || systemctl is-active --quiet pve-ha-crm; then
|
||||||
|
sudo systemctl disable --now pve-ha-lrm 2>/dev/null || true
|
||||||
|
sudo systemctl disable --now pve-ha-crm 2>/dev/null || true
|
||||||
|
sudo systemctl disable --now corosync 2>/dev/null || true
|
||||||
|
echo " [✓] HA services disabled" >&2
|
||||||
|
else
|
||||||
|
echo " [✓] HA services already disabled" >&2
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
enable_ha_services() {
|
||||||
|
# Enable HA services for clustered environments
|
||||||
|
|
||||||
|
echo "[⚙] Enabling high availability services..." >&2
|
||||||
|
|
||||||
|
sudo systemctl enable --now pve-ha-lrm
|
||||||
|
sudo systemctl enable --now pve-ha-crm
|
||||||
|
sudo systemctl enable --now corosync
|
||||||
|
|
||||||
|
echo " [✓] HA services enabled" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
check_ha_status() {
|
||||||
|
# Check if HA services are running
|
||||||
|
|
||||||
|
if systemctl is-active --quiet pve-ha-lrm; then
|
||||||
|
echo "enabled"
|
||||||
|
else
|
||||||
|
echo "disabled"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- SYSTEM MAINTENANCE ---
|
||||||
|
|
||||||
|
update_pve_system() {
|
||||||
|
# Run full system update
|
||||||
|
|
||||||
|
echo "[⚙] Updating Proxmox VE system (this may take several minutes)..." >&2
|
||||||
|
|
||||||
|
sudo apt-get update -qq
|
||||||
|
sudo apt-get dist-upgrade -y -qq
|
||||||
|
|
||||||
|
echo " [✓] System update complete" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
check_reboot_required() {
|
||||||
|
# Check if reboot is needed after updates
|
||||||
|
|
||||||
|
if [ -f /var/run/reboot-required ]; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if kernel was updated
|
||||||
|
local running_kernel=$(uname -r)
|
||||||
|
local latest_kernel=$(dpkg -l | grep linux-image | sort -V | tail -n1 | awk '{print $2}' | sed 's/linux-image-//')
|
||||||
|
|
||||||
|
if [ "$running_kernel" != "$latest_kernel" ]; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- COMPREHENSIVE POST-INSTALL ---
|
||||||
|
|
||||||
|
run_proxmox_post_install() {
|
||||||
|
# Complete post-install routine for Proxmox hosts
|
||||||
|
# Args: $1 = mode (auto|interactive) - default: auto
|
||||||
|
|
||||||
|
local mode="${1:-auto}"
|
||||||
|
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "PROXMOX POST-INSTALL CONFIGURATION" >&2
|
||||||
|
echo "Version: $(get_pve_version)" >&2
|
||||||
|
echo "Mode: $mode" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "" >&2
|
||||||
|
|
||||||
|
# Verify Proxmox installation
|
||||||
|
if ! is_pve_supported_version; then
|
||||||
|
echo "[✗] Unsupported Proxmox VE version" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Repository configuration
|
||||||
|
configure_all_pve_repos
|
||||||
|
echo "" >&2
|
||||||
|
|
||||||
|
# Subscription nag removal (auto mode)
|
||||||
|
if [ "$mode" == "auto" ]; then
|
||||||
|
disable_subscription_nag
|
||||||
|
echo "" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# HA service management
|
||||||
|
local cluster_nodes=$(pvesh get /nodes --output-format=json 2>/dev/null | jq -r 'length' 2>/dev/null || echo "1")
|
||||||
|
|
||||||
|
if [ "$cluster_nodes" -eq 1 ]; then
|
||||||
|
echo "[⚙] Single-node setup detected" >&2
|
||||||
|
disable_ha_services
|
||||||
|
echo "" >&2
|
||||||
|
else
|
||||||
|
echo "[⚙] Multi-node cluster detected ($cluster_nodes nodes)" >&2
|
||||||
|
echo " [✓] HA services left enabled" >&2
|
||||||
|
echo "" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# System update
|
||||||
|
if [ "$mode" == "auto" ]; then
|
||||||
|
echo "[⚙] Checking for updates..." >&2
|
||||||
|
sudo apt-get update -qq
|
||||||
|
|
||||||
|
local upgrades=$(apt list --upgradable 2>/dev/null | grep -c upgradable || echo "0")
|
||||||
|
if [ "$upgrades" -gt 1 ]; then
|
||||||
|
echo " [!] $((upgrades - 1)) packages can be upgraded" >&2
|
||||||
|
echo " [!] Run manually: apt update && apt dist-upgrade" >&2
|
||||||
|
else
|
||||||
|
echo " [✓] System up to date" >&2
|
||||||
|
fi
|
||||||
|
echo "" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check reboot requirement
|
||||||
|
if check_reboot_required; then
|
||||||
|
echo "[!] REBOOT REQUIRED" >&2
|
||||||
|
echo " Kernel or critical services updated" >&2
|
||||||
|
echo " Run: reboot" >&2
|
||||||
|
echo "" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "POST-INSTALL COMPLETE" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Next Steps:" >&2
|
||||||
|
echo " 1. Clear browser cache (Ctrl+Shift+R)" >&2
|
||||||
|
echo " 2. Reboot if kernel was updated" >&2
|
||||||
|
echo " 3. Configure storage/networking in web UI" >&2
|
||||||
|
echo "" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- VALIDATION ---
|
||||||
|
|
||||||
|
validate_proxmox_config() {
|
||||||
|
# Validate Proxmox configuration
|
||||||
|
|
||||||
|
local errors=0
|
||||||
|
|
||||||
|
echo "[⚙] Validating Proxmox configuration..." >&2
|
||||||
|
|
||||||
|
# Check PVE version
|
||||||
|
if ! is_pve_supported_version; then
|
||||||
|
echo " [✗] Unsupported PVE version: $(get_pve_version)" >&2
|
||||||
|
((errors++))
|
||||||
|
else
|
||||||
|
echo " [✓] PVE version: $(get_pve_version)" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check no-subscription repo
|
||||||
|
if grep -rq "pve-no-subscription" /etc/apt/sources.list.d/ 2>/dev/null; then
|
||||||
|
echo " [✓] No-subscription repository configured" >&2
|
||||||
|
else
|
||||||
|
echo " [✗] No-subscription repository not found" >&2
|
||||||
|
((errors++))
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check enterprise repo disabled
|
||||||
|
if grep -rq "^deb.*pve-enterprise" /etc/apt/sources.list.d/ 2>/dev/null; then
|
||||||
|
echo " [!] Enterprise repository still active (requires subscription)" >&2
|
||||||
|
else
|
||||||
|
echo " [✓] Enterprise repository disabled" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check cluster status
|
||||||
|
local cluster_status=$(pvesh get /cluster/status --output-format=json 2>/dev/null | jq -r '.[0].type' 2>/dev/null || echo "unknown")
|
||||||
|
if [ "$cluster_status" == "cluster" ]; then
|
||||||
|
echo " [✓] Part of Proxmox cluster" >&2
|
||||||
|
else
|
||||||
|
echo " [✓] Standalone node" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ $errors -eq 0 ]; then
|
||||||
|
echo "[✓] Proxmox validation passed" >&2
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
echo "[✗] Proxmox validation failed ($errors errors)" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Export functions
|
||||||
|
export -f get_pve_version
|
||||||
|
export -f get_pve_major_version
|
||||||
|
export -f is_pve_supported_version
|
||||||
|
export -f disable_pve_enterprise_repo
|
||||||
|
export -f enable_pve_no_subscription_repo
|
||||||
|
export -f configure_ceph_repos
|
||||||
|
export -f add_pvetest_repo
|
||||||
|
export -f configure_all_pve_repos
|
||||||
|
export -f disable_subscription_nag
|
||||||
|
export -f enable_subscription_nag
|
||||||
|
export -f disable_ha_services
|
||||||
|
export -f enable_ha_services
|
||||||
|
export -f check_ha_status
|
||||||
|
export -f update_pve_system
|
||||||
|
export -f check_reboot_required
|
||||||
|
export -f run_proxmox_post_install
|
||||||
|
export -f validate_proxmox_config
|
||||||
510
scripts/lib/validation.sh
Normal file
510
scripts/lib/validation.sh
Normal file
@ -0,0 +1,510 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# VALIDATION LIBRARY: Comprehensive System Health Checks
|
||||||
|
# ==============================================================================
|
||||||
|
# Part of unified bootstrap system for homelab infrastructure
|
||||||
|
# Provides comprehensive pre-flight and post-bootstrap validation with
|
||||||
|
# severity levels (critical, warning, info) for operational readiness.
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
# Source detection library if not already loaded
|
||||||
|
if ! type -t detect_os_family &>/dev/null; then
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=./detection.sh
|
||||||
|
source "${SCRIPT_DIR}/detection.sh"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# --- VALIDATION TRACKING ---
|
||||||
|
|
||||||
|
declare -g VALIDATION_ERRORS=0
|
||||||
|
declare -g VALIDATION_WARNINGS=0
|
||||||
|
declare -g VALIDATION_PASSED=0
|
||||||
|
|
||||||
|
reset_validation_counters() {
|
||||||
|
VALIDATION_ERRORS=0
|
||||||
|
VALIDATION_WARNINGS=0
|
||||||
|
VALIDATION_PASSED=0
|
||||||
|
}
|
||||||
|
|
||||||
|
log_pass() {
|
||||||
|
local message="$1"
|
||||||
|
echo " [✓] $message" >&2
|
||||||
|
((VALIDATION_PASSED++))
|
||||||
|
}
|
||||||
|
|
||||||
|
log_warning() {
|
||||||
|
local message="$1"
|
||||||
|
echo " [!] WARNING: $message" >&2
|
||||||
|
((VALIDATION_WARNINGS++))
|
||||||
|
}
|
||||||
|
|
||||||
|
log_error() {
|
||||||
|
local message="$1"
|
||||||
|
echo " [✗] ERROR: $message" >&2
|
||||||
|
((VALIDATION_ERRORS++))
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- DISK VALIDATION ---
|
||||||
|
|
||||||
|
check_disk_space() {
|
||||||
|
# Validate sufficient disk space for Docker and system operations
|
||||||
|
# Critical: Root partition must have at least 10GB free
|
||||||
|
# Warning: Root partition should have at least 20GB free
|
||||||
|
|
||||||
|
echo "[⚙] Checking disk space..." >&2
|
||||||
|
|
||||||
|
local root_avail=$(df -BG / | awk 'NR==2 {print $4}' | sed 's/G//')
|
||||||
|
|
||||||
|
if [ -z "$root_avail" ]; then
|
||||||
|
log_error "Could not determine disk space"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$root_avail" -lt 10 ]; then
|
||||||
|
log_error "Insufficient disk space on /: ${root_avail}GB (minimum 10GB required)"
|
||||||
|
return 1
|
||||||
|
elif [ "$root_avail" -lt 20 ]; then
|
||||||
|
log_warning "Low disk space on /: ${root_avail}GB (recommend 20GB+)"
|
||||||
|
else
|
||||||
|
log_pass "Disk space on /: ${root_avail}GB available"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for separate /var partition (common on servers)
|
||||||
|
if mountpoint -q /var 2>/dev/null; then
|
||||||
|
local var_avail=$(df -BG /var | awk 'NR==2 {print $4}' | sed 's/G//')
|
||||||
|
if [ "$var_avail" -lt 20 ]; then
|
||||||
|
log_warning "Low disk space on /var: ${var_avail}GB (Docker images will go here)"
|
||||||
|
else
|
||||||
|
log_pass "Disk space on /var: ${var_avail}GB available"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
check_disk_performance() {
|
||||||
|
# Check if root is on SSD vs HDD (performance indicator)
|
||||||
|
|
||||||
|
echo "[⚙] Checking disk performance characteristics..." >&2
|
||||||
|
|
||||||
|
local root_device=$(df / | awk 'NR==2 {print $1}' | sed 's/[0-9]*$//')
|
||||||
|
local device_name=$(basename "$root_device")
|
||||||
|
|
||||||
|
if [ -f "/sys/block/$device_name/queue/rotational" ]; then
|
||||||
|
local rotational=$(cat "/sys/block/$device_name/queue/rotational")
|
||||||
|
if [ "$rotational" -eq 0 ]; then
|
||||||
|
log_pass "Root filesystem on SSD ($device_name)"
|
||||||
|
else
|
||||||
|
log_warning "Root filesystem on HDD ($device_name) - SSD recommended for Docker"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
log_warning "Could not determine disk type for $device_name"
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- MEMORY VALIDATION ---
|
||||||
|
|
||||||
|
check_memory() {
|
||||||
|
# Validate sufficient RAM for deployment type
|
||||||
|
# Proxmox: 8GB minimum, Docker Swarm: 4GB minimum, Pi: 2GB minimum
|
||||||
|
|
||||||
|
echo "[⚙] Checking memory..." >&2
|
||||||
|
|
||||||
|
local mem_total_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
|
||||||
|
local mem_total_gb=$((mem_total_kb / 1024 / 1024))
|
||||||
|
|
||||||
|
if [ -z "$mem_total_gb" ] || [ "$mem_total_gb" -eq 0 ]; then
|
||||||
|
log_error "Could not determine system memory"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
local hardware_type=$(detect_hardware_type)
|
||||||
|
local min_required=2
|
||||||
|
|
||||||
|
case "$hardware_type" in
|
||||||
|
proxmox)
|
||||||
|
min_required=8
|
||||||
|
;;
|
||||||
|
docker-vm|physical-docker)
|
||||||
|
min_required=4
|
||||||
|
;;
|
||||||
|
ai-workstation)
|
||||||
|
min_required=16
|
||||||
|
;;
|
||||||
|
pi)
|
||||||
|
min_required=2
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
if [ "$mem_total_gb" -lt "$min_required" ]; then
|
||||||
|
log_error "Insufficient RAM: ${mem_total_gb}GB (minimum ${min_required}GB for $hardware_type)"
|
||||||
|
return 1
|
||||||
|
else
|
||||||
|
log_pass "Memory: ${mem_total_gb}GB (meets ${min_required}GB minimum for $hardware_type)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
check_swap() {
|
||||||
|
# Check swap configuration (important for memory-constrained systems)
|
||||||
|
|
||||||
|
echo "[⚙] Checking swap configuration..." >&2
|
||||||
|
|
||||||
|
local swap_total_kb=$(grep SwapTotal /proc/meminfo | awk '{print $2}')
|
||||||
|
local swap_total_gb=$((swap_total_kb / 1024 / 1024))
|
||||||
|
|
||||||
|
local hardware_type=$(detect_hardware_type)
|
||||||
|
|
||||||
|
if [ "$hardware_type" == "proxmox" ]; then
|
||||||
|
# Proxmox hosts should NOT have swap enabled (best practice)
|
||||||
|
if [ "$swap_total_gb" -gt 0 ]; then
|
||||||
|
log_warning "Swap enabled on Proxmox host (${swap_total_gb}GB) - consider disabling"
|
||||||
|
else
|
||||||
|
log_pass "Swap disabled (correct for Proxmox)"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
# Other systems benefit from swap
|
||||||
|
if [ "$swap_total_gb" -eq 0 ]; then
|
||||||
|
log_warning "No swap configured - may cause OOM issues under load"
|
||||||
|
else
|
||||||
|
log_pass "Swap: ${swap_total_gb}GB configured"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- NETWORK VALIDATION ---
|
||||||
|
|
||||||
|
check_network_routes() {
|
||||||
|
# Validate proper network routing configuration
|
||||||
|
|
||||||
|
echo "[⚙] Checking network routing..." >&2
|
||||||
|
|
||||||
|
# Check for default route
|
||||||
|
if ip route show default &>/dev/null; then
|
||||||
|
local gateway=$(ip route show default | awk '/^default/ {print $3; exit}')
|
||||||
|
log_pass "Default gateway configured: $gateway"
|
||||||
|
else
|
||||||
|
log_error "No default gateway configured"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for DNS servers
|
||||||
|
if [ -f /etc/resolv.conf ]; then
|
||||||
|
local dns_count=$(grep -c "^nameserver" /etc/resolv.conf)
|
||||||
|
if [ "$dns_count" -gt 0 ]; then
|
||||||
|
log_pass "DNS servers configured ($dns_count entries)"
|
||||||
|
else
|
||||||
|
log_warning "No DNS servers in /etc/resolv.conf"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
log_error "/etc/resolv.conf missing"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
check_hostname_resolution() {
|
||||||
|
# Validate hostname resolution (important for cluster operations)
|
||||||
|
|
||||||
|
echo "[⚙] Checking hostname resolution..." >&2
|
||||||
|
|
||||||
|
local hostname=$(hostname)
|
||||||
|
local fqdn=$(hostname -f 2>/dev/null || echo "")
|
||||||
|
|
||||||
|
if [ -n "$hostname" ]; then
|
||||||
|
log_pass "Hostname: $hostname"
|
||||||
|
else
|
||||||
|
log_error "Hostname not set"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if hostname resolves
|
||||||
|
if host "$hostname" &>/dev/null || getent hosts "$hostname" &>/dev/null; then
|
||||||
|
log_pass "Hostname resolves"
|
||||||
|
else
|
||||||
|
log_warning "Hostname '$hostname' does not resolve - may cause cluster issues"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check /etc/hosts
|
||||||
|
if grep -q "127.0.1.1.*$hostname" /etc/hosts 2>/dev/null; then
|
||||||
|
log_pass "Hostname in /etc/hosts"
|
||||||
|
else
|
||||||
|
log_warning "Hostname not in /etc/hosts - adding is recommended"
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- NFS CLIENT VALIDATION ---
|
||||||
|
|
||||||
|
check_nfs_client() {
|
||||||
|
# Validate NFS client packages and kernel modules
|
||||||
|
|
||||||
|
echo "[⚙] Checking NFS client..." >&2
|
||||||
|
|
||||||
|
# Check for NFS common package
|
||||||
|
if dpkg -l | grep -q "^ii.*nfs-common"; then
|
||||||
|
log_pass "nfs-common package installed"
|
||||||
|
else
|
||||||
|
log_warning "nfs-common not installed - required for NFS mounts"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for NFS kernel modules
|
||||||
|
if lsmod | grep -q "^nfs "; then
|
||||||
|
log_pass "NFS kernel module loaded"
|
||||||
|
else
|
||||||
|
# Try to load it
|
||||||
|
if sudo modprobe nfs 2>/dev/null; then
|
||||||
|
log_pass "NFS kernel module loaded successfully"
|
||||||
|
else
|
||||||
|
log_warning "Could not load NFS kernel module"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- DOCKER VALIDATION ---
|
||||||
|
|
||||||
|
check_docker_daemon() {
|
||||||
|
# Validate Docker installation and daemon health
|
||||||
|
|
||||||
|
echo "[⚙] Checking Docker installation..." >&2
|
||||||
|
|
||||||
|
# Check if Docker is installed
|
||||||
|
if ! command -v docker &>/dev/null; then
|
||||||
|
log_warning "Docker not installed (will be installed during bootstrap)"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_pass "Docker binary found: $(docker --version 2>/dev/null | head -n1)"
|
||||||
|
|
||||||
|
# Check if daemon is running
|
||||||
|
if systemctl is-active docker &>/dev/null; then
|
||||||
|
log_pass "Docker daemon running"
|
||||||
|
else
|
||||||
|
log_warning "Docker daemon not running"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check Docker socket permissions
|
||||||
|
if [ -S /var/run/docker.sock ]; then
|
||||||
|
if sudo docker ps &>/dev/null; then
|
||||||
|
log_pass "Docker socket accessible"
|
||||||
|
else
|
||||||
|
log_warning "Docker socket exists but not accessible"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
log_warning "Docker socket not found"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check storage driver
|
||||||
|
local storage_driver=$(docker info 2>/dev/null | grep "Storage Driver" | awk '{print $3}')
|
||||||
|
if [ -n "$storage_driver" ]; then
|
||||||
|
log_pass "Storage driver: $storage_driver"
|
||||||
|
|
||||||
|
# Warn about devicemapper (deprecated)
|
||||||
|
if [ "$storage_driver" == "devicemapper" ]; then
|
||||||
|
log_warning "devicemapper storage driver is deprecated - consider overlay2"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- PROXMOX VALIDATION ---
|
||||||
|
|
||||||
|
check_proxmox_api() {
|
||||||
|
# Validate Proxmox VE installation and API accessibility
|
||||||
|
|
||||||
|
local hardware_type=$(detect_hardware_type)
|
||||||
|
|
||||||
|
if [ "$hardware_type" != "proxmox" ]; then
|
||||||
|
return 0 # Skip if not Proxmox
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "[⚙] Checking Proxmox VE..." >&2
|
||||||
|
|
||||||
|
# Check for pveversion
|
||||||
|
if command -v pveversion &>/dev/null; then
|
||||||
|
local pve_version=$(pveversion | head -n1)
|
||||||
|
log_pass "Proxmox installed: $pve_version"
|
||||||
|
else
|
||||||
|
log_error "Proxmox tools not found (pveversion missing)"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check cluster status
|
||||||
|
if command -v pvesh &>/dev/null; then
|
||||||
|
if sudo pvesh get /cluster/status &>/dev/null; then
|
||||||
|
log_pass "Proxmox API accessible"
|
||||||
|
else
|
||||||
|
log_warning "Proxmox API not responding"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for required repositories
|
||||||
|
if [ -f /etc/apt/sources.list.d/pve-no-subscription.list ]; then
|
||||||
|
log_pass "No-subscription repository configured"
|
||||||
|
else
|
||||||
|
log_warning "No-subscription repository not configured"
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- SECURITY VALIDATION ---
|
||||||
|
|
||||||
|
check_ssh_security() {
|
||||||
|
# Basic SSH security validation
|
||||||
|
|
||||||
|
echo "[⚙] Checking SSH security..." >&2
|
||||||
|
|
||||||
|
# Check if SSH is running
|
||||||
|
if systemctl is-active ssh &>/dev/null || systemctl is-active sshd &>/dev/null; then
|
||||||
|
log_pass "SSH service running"
|
||||||
|
else
|
||||||
|
log_error "SSH service not running"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for password authentication (should be disabled in production)
|
||||||
|
if [ -f /etc/ssh/sshd_config ]; then
|
||||||
|
if grep -q "^PasswordAuthentication no" /etc/ssh/sshd_config; then
|
||||||
|
log_pass "SSH password authentication disabled (secure)"
|
||||||
|
else
|
||||||
|
log_warning "SSH password authentication may be enabled - key-only is recommended"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for root login
|
||||||
|
if [ -f /etc/ssh/sshd_config ]; then
|
||||||
|
if grep -q "^PermitRootLogin no" /etc/ssh/sshd_config; then
|
||||||
|
log_pass "SSH root login disabled (secure)"
|
||||||
|
else
|
||||||
|
log_warning "SSH root login may be enabled - consider disabling"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
check_firewall() {
|
||||||
|
# Check firewall status (informational)
|
||||||
|
|
||||||
|
echo "[⚙] Checking firewall..." >&2
|
||||||
|
|
||||||
|
if systemctl is-active ufw &>/dev/null; then
|
||||||
|
local ufw_status=$(sudo ufw status 2>/dev/null | head -n1)
|
||||||
|
log_pass "UFW active: $ufw_status"
|
||||||
|
elif systemctl is-active firewalld &>/dev/null; then
|
||||||
|
log_pass "firewalld active"
|
||||||
|
elif command -v iptables &>/dev/null; then
|
||||||
|
local iptables_rules=$(sudo iptables -L -n | wc -l)
|
||||||
|
if [ "$iptables_rules" -gt 8 ]; then
|
||||||
|
log_pass "iptables rules configured ($iptables_rules lines)"
|
||||||
|
else
|
||||||
|
log_warning "No firewall detected - consider enabling UFW"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
log_warning "No firewall detected"
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- TIME SYNCHRONIZATION ---
|
||||||
|
|
||||||
|
check_time_sync() {
|
||||||
|
# Validate NTP/timesyncd for cluster time synchronization
|
||||||
|
|
||||||
|
echo "[⚙] Checking time synchronization..." >&2
|
||||||
|
|
||||||
|
if systemctl is-active systemd-timesyncd &>/dev/null; then
|
||||||
|
local ntp_status=$(timedatectl status 2>/dev/null | grep "synchronized" | awk '{print $3}')
|
||||||
|
if [ "$ntp_status" == "yes" ]; then
|
||||||
|
log_pass "Time synchronized via systemd-timesyncd"
|
||||||
|
else
|
||||||
|
log_warning "Time sync not confirmed"
|
||||||
|
fi
|
||||||
|
elif systemctl is-active ntp &>/dev/null || systemctl is-active ntpd &>/dev/null; then
|
||||||
|
log_pass "NTP service running"
|
||||||
|
else
|
||||||
|
log_warning "No time synchronization service detected - critical for clusters"
|
||||||
|
fi
|
||||||
|
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- COMPREHENSIVE VALIDATION SUITE ---
|
||||||
|
|
||||||
|
run_validation_suite() {
|
||||||
|
# Run all validation checks and return summary
|
||||||
|
# Returns 0 if no critical errors, 1 if critical errors found
|
||||||
|
|
||||||
|
reset_validation_counters
|
||||||
|
|
||||||
|
echo "======================================" >&2
|
||||||
|
echo "SYSTEM VALIDATION SUITE" >&2
|
||||||
|
echo "======================================" >&2
|
||||||
|
|
||||||
|
# Run all checks (continue even if some fail)
|
||||||
|
check_disk_space || true
|
||||||
|
check_disk_performance || true
|
||||||
|
check_memory || true
|
||||||
|
check_swap || true
|
||||||
|
check_network_routes || true
|
||||||
|
check_hostname_resolution || true
|
||||||
|
check_nfs_client || true
|
||||||
|
check_docker_daemon || true
|
||||||
|
check_proxmox_api || true
|
||||||
|
check_ssh_security || true
|
||||||
|
check_firewall || true
|
||||||
|
check_time_sync || true
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
echo "======================================" >&2
|
||||||
|
echo "VALIDATION SUMMARY" >&2
|
||||||
|
echo " Passed: $VALIDATION_PASSED" >&2
|
||||||
|
echo " Warnings: $VALIDATION_WARNINGS" >&2
|
||||||
|
echo " Errors: $VALIDATION_ERRORS" >&2
|
||||||
|
echo "======================================" >&2
|
||||||
|
|
||||||
|
if [ $VALIDATION_ERRORS -gt 0 ]; then
|
||||||
|
echo "[✗] CRITICAL: $VALIDATION_ERRORS validation errors - manual intervention required" >&2
|
||||||
|
return 1
|
||||||
|
elif [ $VALIDATION_WARNINGS -gt 0 ]; then
|
||||||
|
echo "[!] WARNINGS: $VALIDATION_WARNINGS issues detected - review recommended" >&2
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
echo "[✓] ALL CHECKS PASSED" >&2
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Export functions
|
||||||
|
export -f reset_validation_counters
|
||||||
|
export -f log_pass
|
||||||
|
export -f log_warning
|
||||||
|
export -f log_error
|
||||||
|
export -f check_disk_space
|
||||||
|
export -f check_disk_performance
|
||||||
|
export -f check_memory
|
||||||
|
export -f check_swap
|
||||||
|
export -f check_network_routes
|
||||||
|
export -f check_hostname_resolution
|
||||||
|
export -f check_nfs_client
|
||||||
|
export -f check_docker_daemon
|
||||||
|
export -f check_proxmox_api
|
||||||
|
export -f check_ssh_security
|
||||||
|
export -f check_firewall
|
||||||
|
export -f check_time_sync
|
||||||
|
export -f run_validation_suite
|
||||||
@ -1,5 +1,46 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# DEPRECATED: onboarding.sh
|
||||||
|
# ==============================================================================
|
||||||
|
# ⚠️ DEPRECATION NOTICE
|
||||||
|
# This script is deprecated and will be removed in a future release.
|
||||||
|
# Please use the unified bootstrap.sh script instead:
|
||||||
|
#
|
||||||
|
# ./bootstrap.sh
|
||||||
|
#
|
||||||
|
# Note: Proxmox SSH key distribution is now handled by:
|
||||||
|
# 1. Run bootstrap.sh to generate keys
|
||||||
|
# 2. Manually copy keys to Proxmox: ssh-copy-id root@<proxmox-ip>
|
||||||
|
# 3. Run Ansible playbook: ansible-playbook playbooks/onboarding/proxmox_host.yml
|
||||||
|
#
|
||||||
|
# This wrapper will redirect to bootstrap.sh.
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Show deprecation warning
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "⚠️ DEPRECATION WARNING" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "onboarding.sh is deprecated!" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Please use: ./bootstrap.sh" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Note: Proxmox SSH key setup is now a separate step." >&2
|
||||||
|
echo "See documentation/SOPs/SOP-002-Initial-Infrastructure-Deployment.md" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Redirecting to bootstrap.sh in 5 seconds..." >&2
|
||||||
|
echo "Press Ctrl+C to cancel" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
sleep 5
|
||||||
|
|
||||||
|
# Redirect to unified bootstrap
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
exec "${SCRIPT_DIR}/bootstrap.sh" "$@"
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# LEGACY CODE BELOW (no longer executed)
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
# ENVIRONMENT VARIABLES
|
# ENVIRONMENT VARIABLES
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
|
|||||||
@ -1,5 +1,44 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# DEPRECATED: pi_init.sh
|
||||||
|
# ==============================================================================
|
||||||
|
# ⚠️ DEPRECATION NOTICE
|
||||||
|
# This script is deprecated and will be removed in a future release.
|
||||||
|
# Please use the unified bootstrap.sh script instead:
|
||||||
|
#
|
||||||
|
# ./bootstrap.sh --hardware-type pi
|
||||||
|
#
|
||||||
|
# This wrapper will redirect to bootstrap.sh with appropriate flags.
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Show deprecation warning
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "⚠️ DEPRECATION WARNING" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
echo "pi_init.sh is deprecated!" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Please use: ./bootstrap.sh" >&2
|
||||||
|
echo "" >&2
|
||||||
|
echo "Redirecting to bootstrap.sh in 5 seconds..." >&2
|
||||||
|
echo "Press Ctrl+C to cancel" >&2
|
||||||
|
echo "=======================================" >&2
|
||||||
|
sleep 5
|
||||||
|
|
||||||
|
# Redirect to unified bootstrap
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
exec "${SCRIPT_DIR}/bootstrap.sh" --hardware-type pi "$@"
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# LEGACY CODE BELOW (no longer executed)
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
exit 0
|
||||||
|
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
# SINGLE-COMMAND BOOTSTRAP: IP, DOCKER, ANSIBLE
|
# SINGLE-COMMAND BOOTSTRAP: IP, DOCKER, ANSIBLE
|
||||||
# Target: Ubuntu / Debian
|
# Target: Ubuntu / Debian
|
||||||
|
|||||||
215
scripts/validate-node.sh
Normal file
215
scripts/validate-node.sh
Normal file
@ -0,0 +1,215 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# ==============================================================================
|
||||||
|
# STANDALONE NODE VALIDATION TOOL
|
||||||
|
# ==============================================================================
|
||||||
|
# Comprehensive health check utility for homelab nodes
|
||||||
|
# Can be run independently on any managed host (post-bootstrap or ad-hoc)
|
||||||
|
#
|
||||||
|
# Usage:
|
||||||
|
# ./validate-node.sh [OPTIONS]
|
||||||
|
#
|
||||||
|
# Options:
|
||||||
|
# --help Show this help message
|
||||||
|
# --json Output results in JSON format
|
||||||
|
# --critical-only Show only critical errors (exit code 1 if any found)
|
||||||
|
# --verbose Show detailed output for each check
|
||||||
|
#
|
||||||
|
# Exit Codes:
|
||||||
|
# 0 - All checks passed or warnings only
|
||||||
|
# 1 - Critical errors found
|
||||||
|
# 2 - Invalid usage
|
||||||
|
#
|
||||||
|
# ==============================================================================
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# --- SCRIPT METADATA ---
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
|
||||||
|
VERSION="1.0.0"
|
||||||
|
|
||||||
|
# --- LOAD LIBRARIES ---
|
||||||
|
|
||||||
|
# shellcheck source=./lib/detection.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/detection.sh"
|
||||||
|
# shellcheck source=./lib/validation.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/validation.sh"
|
||||||
|
# shellcheck source=./lib/network.sh
|
||||||
|
source "${SCRIPT_DIR}/lib/network.sh"
|
||||||
|
|
||||||
|
# --- COMMAND LINE ARGUMENTS ---
|
||||||
|
|
||||||
|
SHOW_HELP=false
|
||||||
|
OUTPUT_JSON=false
|
||||||
|
CRITICAL_ONLY=false
|
||||||
|
VERBOSE=false
|
||||||
|
|
||||||
|
parse_arguments() {
|
||||||
|
while [[ $# -gt 0 ]]; do
|
||||||
|
case "$1" in
|
||||||
|
--help|-h)
|
||||||
|
SHOW_HELP=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--json)
|
||||||
|
OUTPUT_JSON=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--critical-only)
|
||||||
|
CRITICAL_ONLY=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--verbose|-v)
|
||||||
|
VERBOSE=true
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
echo "ERROR: Unknown option: $1" >&2
|
||||||
|
echo "Use --help for usage information" >&2
|
||||||
|
exit 2
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
done
|
||||||
|
}
|
||||||
|
|
||||||
|
show_help() {
|
||||||
|
cat <<'EOF'
|
||||||
|
=============================================================================
|
||||||
|
STANDALONE NODE VALIDATION TOOL v1.0.0
|
||||||
|
=============================================================================
|
||||||
|
|
||||||
|
Comprehensive health check for homelab infrastructure nodes.
|
||||||
|
Can be run on any managed host to verify operational readiness.
|
||||||
|
|
||||||
|
USAGE:
|
||||||
|
./validate-node.sh [OPTIONS]
|
||||||
|
|
||||||
|
OPTIONS:
|
||||||
|
--help Show this help message
|
||||||
|
--json Output results in JSON format (for monitoring integration)
|
||||||
|
--critical-only Show only critical errors (suppress warnings)
|
||||||
|
--verbose Show detailed output for each check
|
||||||
|
|
||||||
|
EXIT CODES:
|
||||||
|
0 - All checks passed (or warnings only)
|
||||||
|
1 - Critical errors found
|
||||||
|
2 - Invalid usage
|
||||||
|
|
||||||
|
CHECKS PERFORMED:
|
||||||
|
• Disk Space & Performance
|
||||||
|
• Memory & Swap Configuration
|
||||||
|
• Network Routes & Connectivity
|
||||||
|
• Hostname Resolution
|
||||||
|
• NFS Client Configuration
|
||||||
|
• Docker Daemon Health
|
||||||
|
• Proxmox API (if applicable)
|
||||||
|
• SSH Security Configuration
|
||||||
|
• Firewall Status
|
||||||
|
• Time Synchronization
|
||||||
|
|
||||||
|
EXAMPLES:
|
||||||
|
./validate-node.sh
|
||||||
|
Run all checks with standard output
|
||||||
|
|
||||||
|
./validate-node.sh --critical-only
|
||||||
|
Show only critical errors (useful in scripts)
|
||||||
|
|
||||||
|
./validate-node.sh --json
|
||||||
|
Output JSON for monitoring/alerting systems
|
||||||
|
|
||||||
|
./validate-node.sh --verbose
|
||||||
|
Show detailed information for each check
|
||||||
|
|
||||||
|
INTEGRATION:
|
||||||
|
JSON output can be consumed by monitoring systems (Prometheus, Grafana, etc.)
|
||||||
|
Example: ./validate-node.sh --json | jq '.errors'
|
||||||
|
|
||||||
|
=============================================================================
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- JSON OUTPUT FUNCTIONS ---
|
||||||
|
|
||||||
|
generate_json_report() {
|
||||||
|
# Generate JSON output for monitoring integration
|
||||||
|
|
||||||
|
local hostname=$(hostname)
|
||||||
|
local timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
|
||||||
|
cat <<EOF
|
||||||
|
{
|
||||||
|
"hostname": "$hostname",
|
||||||
|
"timestamp": "$timestamp",
|
||||||
|
"validation": {
|
||||||
|
"passed": $VALIDATION_PASSED,
|
||||||
|
"warnings": $VALIDATION_WARNINGS,
|
||||||
|
"errors": $VALIDATION_ERRORS
|
||||||
|
},
|
||||||
|
"status": "$([ $VALIDATION_ERRORS -eq 0 ] && echo "healthy" || echo "critical")",
|
||||||
|
"hardware_type": "$(detect_hardware_type)",
|
||||||
|
"os_family": "$(detect_os_family)"
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- MAIN WORKFLOW ---
|
||||||
|
|
||||||
|
main() {
|
||||||
|
# Parse CLI arguments
|
||||||
|
parse_arguments "$@"
|
||||||
|
|
||||||
|
if [ "$SHOW_HELP" == "true" ]; then
|
||||||
|
show_help
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Standard output header (unless JSON mode)
|
||||||
|
if [ "$OUTPUT_JSON" == "false" ]; then
|
||||||
|
echo "======================================="
|
||||||
|
echo "NODE VALIDATION TOOL v${VERSION}"
|
||||||
|
echo "Host: $(hostname)"
|
||||||
|
echo "Time: $(date -u +"%Y-%m-%d %H:%M:%S UTC")"
|
||||||
|
echo "======================================="
|
||||||
|
echo ""
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Run validation suite
|
||||||
|
if [ "$VERBOSE" == "true" ]; then
|
||||||
|
# Show detection summary in verbose mode
|
||||||
|
[ "$OUTPUT_JSON" == "false" ] && print_detection_summary
|
||||||
|
[ "$OUTPUT_JSON" == "false" ] && echo ""
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Run comprehensive validation
|
||||||
|
run_validation_suite
|
||||||
|
|
||||||
|
# Generate output
|
||||||
|
if [ "$OUTPUT_JSON" == "true" ]; then
|
||||||
|
generate_json_report
|
||||||
|
else
|
||||||
|
echo ""
|
||||||
|
echo "Validation completed."
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Show summary (already printed by run_validation_suite)
|
||||||
|
if [ $VALIDATION_ERRORS -gt 0 ]; then
|
||||||
|
echo "⚠️ Critical issues found - manual intervention required"
|
||||||
|
exit 1
|
||||||
|
elif [ $VALIDATION_WARNINGS -gt 0 ]; then
|
||||||
|
echo "ℹ️ Warnings present - review recommended"
|
||||||
|
exit 0
|
||||||
|
else
|
||||||
|
echo "✅ All checks passed - node is healthy"
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Exit code based on errors
|
||||||
|
[ $VALIDATION_ERRORS -eq 0 ] && exit 0 || exit 1
|
||||||
|
}
|
||||||
|
|
||||||
|
# --- ENTRY POINT ---
|
||||||
|
|
||||||
|
main "$@"
|
||||||
Loading…
x
Reference in New Issue
Block a user