---
# Hardware Specifications & Docker Swarm Topology Analysis
# Generated: 2026-03-12
# Subject Hosts: pve03 (10.0.0.203) vs pve04 (10.0.0.204)
# Context: Evaluating 3-node identical Proxmox cluster for Docker Swarm workloads

---

## EXECUTIVE SUMMARY

**Finding**: pve03 and pve04 are **NOT identical**, with meaningful differences:
- **pve03**: 10 cores, 23.6 GB RAM, unknown storage capacity (already clustered, running 3 VMs)
- **pve04**: 14 cores, 15 GB RAM, 238.5 GB NVMe SSD (fresh, not yet clustered)

**Recommendation for "3 identically-spec'd devices":**
- **Option A (Recommended)**: Use **pve04 as the template model**. Procurement should source 3× Intel Core i5-13500T machines with 15+ GB RAM and 240+ GB NVMe storage. pve04 is the better baseline (higher maximum clock, dedicated NVMe, fresh OS).
- **Option B**: Keep **pve03 as template**. Run a deeper audit of pve03's actual storage (it has 21 loop/dm devices, so it is unclear whether additional storage is attached). Backfill pve04 and a 3rd host to match pve03's full config.

**Verdict**: **pve04 > pve03 for Swarm baseline**. The i5-13500T offers a higher maximum clock (4600 MHz boost vs 2885 MHz), dedicated fast storage, and a fresh install. Use pve04 as the reference architecture for the 3rd node.

---

## DETAILED HARDWARE COMPARISON

### CPU Specifications

| Dimension | pve03 | pve04 | Status |
|-----------|-------|-------|--------|
| **Model** | Unknown / unrecognized | Intel Core i5-13500T | ✅ pve04 superior |
| **Architecture** | x86_64 | x86_64 | ✅ Match |
| **Socket Count** | 1 | 1 | ✅ Match |
| **Cores per Socket** | 10 | 14 | ⚠️ **MISMATCH** |
| **Logical CPUs (with HT)** | 10 | 20 | ⚠️ **MISMATCH** |
| **Max Frequency** | 2,885 MHz | 4,600 MHz | ⚠️ **pve04 ~59% higher** |
| **Min Frequency** | Unknown | 800 MHz | n/a |
| **Microcode Level** | 0x437 | 0x3a | n/a |

**Interpretation:**
- pve04's i5-13500T is a **13th-gen Intel desktop CPU** (2023) with a much higher maximum clock than pve03
- pve03's CPU model was not reported by the scan; it could be a limited processor or a different i5/i7 SKU and needs clarification
- **For Docker Swarm workloads**: pve04's higher clock speed (4600 MHz) favors latency-sensitive tasks; pve03's 10 cores are still adequate for the planned 2 VMs (manager + worker) per node

**Recommendation**: If strict "identical" is the mandate, **pve04 is the better model to replicate**. Purchasing 3× i5-13500T machines ensures:
1. Consistent single-threaded performance
2. Known thermal/power envelope
3. Support (retail CPUs, widely available)
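
The clock-speed gap between the two hosts can be recomputed from the two max-frequency values in the table. The sketch below assumes `awk` is installed; `pct_higher` is a hypothetical helper name introduced here for illustration.

```bash
# pct_higher A B: how much higher A is than B, as a percentage with one decimal.
pct_higher() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# Max clock of pve04 (4600 MHz) relative to pve03 (2885 MHz)
pct_higher 4600 2885
```

A max-clock ratio is only a rough proxy for single-thread performance, since pve03's CPU model and generation are still unknown.
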

---

### Memory (RAM) Specifications

| Dimension | pve03 | pve04 | Status |
|-----------|-------|-------|--------|
| **Total RAM** | 23.6 GB | 15.0 GB | ⚠️ **MISMATCH** |
| **Free RAM** | 12.4 GB | 13.0 GB | ⚠️ pve03's extra RAM is already in use |
| **Used (OS + Proxmox + running guests)** | ~11.2 GB | ~1.7 GB | ⚠️ pve03 heavier |

**Interpretation:**
- pve03: 23.6 GB total (likely 24 GB installed, e.g. 2× 12 GB or 8 GB + 16 GB SODIMM/UDIMM sticks)
- pve04: 15 GB total (likely 1× 16 GB, with about 1 GB reserved for firmware and integrated graphics)
- pve03 is using ~11 GB for the OS, the Proxmox daemons, and 3 running VMs
- pve04 is minimal (fresh install, no VMs)

**Validation Against Swarm Requirements:**
- Each node will host 2 VMs: 1 manager (2 vCPU, 4 GB RAM) + 1 worker (2 vCPU, 4 GB RAM), per the target design
- Proxmox overhead: ~2-4 GB per node
- **Minimum needed: 10–12 GB RAM per node** (8 GB for guests + overhead) ✅ Both qualify
- **Optimal: 16 GB** ✅ pve04 meets this; pve03 exceeds it

**Recommendation**: Use **16 GB as the standard** for the 3-node cluster (matches pve04). This is cost-effective and leaves roughly 4 GB of headroom per node after the planned VMs and Proxmox overhead.

---

### Storage Specifications

| Dimension | pve03 | pve04 | Status |
|-----------|-------|-------|--------|
| **Primary Disk(s)** | Unknown (21 loop/dm devices detected) | 1× 238.5 GB NVMe SSD | ⚠️ **pve04 transparent** |
| **Root FS Capacity** | 68 GB | 238.5 GB (whole-disk figure) | ⚠️ **MISMATCH** |
| **Root FS Available** | 59 GB free | ~230 GB available | ⚠️ pve04 has more room |
| **Storage Type** | Unknown (likely SATA SSD or array) | NVMe SSD (model not recorded) | n/a |

**Interpretation:**
- pve03's storage is **opaque**: 21 loop and device-mapper devices suggest:
  - Possible RAID configuration (dm-* = device mapper)
  - LVM (Logical Volume Manager) setup, where every logical volume, including guest disks, appears as a dm-* device
  - Possibly shared storage mounted
  - Current state: ~68 GB LVM volume, 9 GB used
- pve04's storage is **straightforward**: Single 238.5 GB NVMe SSD, clean LVM setup, minimal OS footprint
- The pve04 capacity figures appear to describe the whole disk, while the pve03 figures describe only its root volume, so the two columns are not a like-for-like comparison

**VM Storage Requirements (per node):**
- 1 Manager VM: 32 GB disk (from provisionspec in your playbook)
- 1 Worker VM: 32 GB disk
- **Total per node: 64 GB guest storage** (+ Proxmox root FS)
- **Total available after OS: pve03 ≈ 59 GB, pve04 ≈ 230 GB**

**⚠️ CRITICAL FINDING**: Judged by its root filesystem alone, pve03 has **insufficient disk capacity** for the planned topology (64 GB for VMs + ~16 GB OS buffer = ~80 GB needed, only ~59 GB free). **Unless pve03 has additional storage that the scan did not report (for example an LVM thin pool or a mounted NAS share), it cannot host 2 full 32 GB VMs.**
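
The fit check behind this finding is plain arithmetic, so it can be scripted and re-run once pve03's real capacity is known. A minimal sketch: `disk_fits` is a hypothetical helper, and the 16 GB buffer is simply the difference between the ~80 GB and 64 GB figures quoted above.

```bash
# disk_fits FREE_GB VM_COUNT VM_DISK_GB BUFFER_GB
# Prints "fits" when free space covers all guest disks plus the buffer,
# otherwise prints "short by N GB". Works on whole-GB integers only.
disk_fits() {
  local free=$1
  local needed=$(( $2 * $3 + $4 ))
  if (( free >= needed )); then
    echo "fits"
  else
    echo "short by $(( needed - free )) GB"
  fi
}

disk_fits 59 2 32 16    # pve03: free space on the root filesystem
disk_fits 230 2 32 16   # pve04: available space reported by the scan
```

If the audit turns up a separate pool for guest disks on pve03, pass that pool's free space as the first argument instead of the root filesystem figure.
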

**Recommendation**:
1. **Immediate**: Verify pve03's storage architecture. Why 21 dm/loop devices? Is there additional NAS/SAN attached?
2. **For 3rd node procurement**: Use **pve04 as baseline**:
   - 240+ GB NVMe SSD (minimum)
   - Clean, single-drive configuration (KISS principle)
   - Sufficient headroom for VMs + snapshots + log growth

---

### Network Specifications

| Dimension | pve03 | pve04 | Status |
|-----------|-------|-------|--------|
| **Interface Count** | 6 interfaces | 4 interfaces | n/a |
| **Bridge** | vmbr0 + tap devices | vmbr0 visible | ✅ Both standard |
| **Primary Network** | wlp0s20f3 + nic0 | wlp0s20f3 + nic0 | ✅ Match (suggest renaming nic0) |

**Interpretation:**
- Both nodes expose the **same interface names** (wlp0s20f3 = wireless, nic0 = Ethernet), which suggests the same network hardware layout
- pve03 has **2 tap devices** (tap301i0, tap302i0) = network interfaces of running VMs
- pve04 has **no tap devices** = freshly imaged, no VMs yet
- **Corosync / Proxmox Cluster**: Both will use vmbr0 for inter-node communication

**Recommendation**: Both nodes are network-compatible. No issues for Docker Swarm overlay networking.

---

### Proxmox & Cluster Status

| Dimension | pve03 | pve04 | Status |
|-----------|-------|-------|--------|
| **Proxmox Version** | 9.1.6 | 9.1.1 | ⚠️ Versions differ by 5 patch releases |
| **Kernel** | 6.17.2-1-pve | 6.17.2-1-pve | ✅ Match |
| **OS Distro** | Debian trixie | Debian trixie | ✅ Match |
| **Cluster Status** | ✅ Clustered (homelab) | ❌ Not clustered | n/a |
| **Cluster Members** | pve01, pve02, pve03 | None yet | n/a |
| **VMs Running** | 3 VMs/containers | 0 VMs | n/a |
| **Uptime** | 4 days | ~0 days (fresh) | n/a |

**Interpretation:**
- pve03 is an **active, production node** in the homelab cluster
- pve04 is a **fresh candidate** ready for integration
- The minor version difference (9.1.6 vs 9.1.1) is **not a blocker**; routine updates will align them

**Recommendation**: Update both to the latest Proxmox 9.x patch level before final cluster formation.
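
Checking which host lags can be scripted once the version strings are collected (for example from `pveversion` on each host). A minimal sketch, assuming GNU `sort` with the `-V` version-sort option; `version_lt` is a hypothetical helper name.

```bash
# version_lt A B: succeeds (exit 0) when version A sorts strictly before version B.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Versions taken from the status table above
if version_lt 9.1.1 9.1.6; then
  echo "pve04 is behind pve03 and should be updated first"
fi
```

Version sort compares each numeric component separately, so a later release such as 9.1.10 is ordered after 9.1.6 rather than before it.
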

---

## DOCKER SWARM TOPOLOGY ANALYSIS

### Target Design (from documentation/architecture/compute-plane.md)
- 3× identically-spec'd physical Proxmox nodes
- 3× Swarm Managers (1 per node, IPs: 10.0.0.211–213)
- 3× Swarm Workers (1 per node, IPs: 10.0.0.221–223)
- Each VM: 2 vCPU, 4 GB RAM, 32 GB disk
- Proxmox cluster with Corosync for HA
- No overcommit
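
The addressing plan in this list is regular enough to generate rather than type by hand. The sketch below prints one line per planned VM; the IP ranges come from the target design, while the `swarm-manager-N` / `swarm-worker-N` names and the `swarm_plan` function are illustrative assumptions.

```bash
# Print one "name ip" line per planned Swarm VM.
# Manager IPs 10.0.0.211-213 and worker IPs 10.0.0.221-223 come from the
# target design; the VM names are illustrative.
swarm_plan() {
  local i
  for i in 1 2 3; do
    echo "swarm-manager-$i 10.0.0.21$i"
    echo "swarm-worker-$i 10.0.0.22$i"
  done
}

swarm_plan
```

The output could feed an inventory file or a quick reachability loop, but how it maps onto the existing playbook variables is not specified here.
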

### Capacity Analysis: pve04 as Reference Model

#### CPU
- **pve04 Spec**: 14 cores (20 threads), 1 socket, 4600 MHz peak
- **Planned Allocation**: 4 vCPU (2 for manager, 2 for worker) = **28.6% of the 14 physical cores**
- **Proxmox/Corosync Overhead**: ~1 core
- **Available Headroom**: 14 - 4 - 1 = **9 cores spare**
- **Verdict**: ✅ **EXCELLENT**. Can sustain the workload plus spikes, with room to absorb 2 migrated VMs

#### Memory (15 GB)
- **Planned Usage**: 4 GB (manager) + 4 GB (worker) = 8 GB
- **Proxmox OS + daemons**: ~2–3 GB
- **Available Headroom**: 15 - 8 - 2.5 = **4.5 GB spare**
- **Verdict**: ✅ **ADEQUATE**. No swapping expected under the planned load, with room for modest workload growth.

#### Storage (240 GB)
- **Planned Usage**: 32 GB (manager) + 32 GB (worker) = 64 GB
- **Proxmox OS**: ~8 GB
- **Snapshots/Logs Buffer**: ~20 GB
- **Total Planned**: ~92 GB
- **Available Headroom**: 240 - 92 = **148 GB spare**
- **Verdict**: ✅ **EXCELLENT**. Ample room for workload scaling, backups, experiments.
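
The three headroom figures above all follow the same pattern: total capacity minus each planned consumer. A minimal sketch that reproduces them, assuming `awk` is available (used so that fractional values such as 2.5 GB work); `headroom` is a hypothetical helper name.

```bash
# headroom TOTAL USED...: subtract every USED value from TOTAL and print the rest.
# The awk program has only a BEGIN block, so the arguments are never read as files.
headroom() {
  awk 'BEGIN { r = ARGV[1]; for (i = 2; i < ARGC; i++) r -= ARGV[i]; print r }' "$@"
}

headroom 14 4 1        # CPU cores: total, guest vCPUs, host overhead
headroom 15 8 2.5      # RAM in GB: total, guest RAM, Proxmox overhead
headroom 240 64 8 20   # Disk in GB: total, guest disks, OS, snapshot/log buffer
```

Keeping the inputs as arguments makes it easy to re-run the check if the per-VM sizing in the target design changes.
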

#### Network
- **Swarm Overlay**: vmbr0 at 1 Gbps
- **Expected inter-node throughput**: <100 Mbps for a modest swarm (10–20 containers)
- **Verdict**: ✅ **ADEQUATE** for Docker Swarm in a homelab. Upgrade to 10 Gbps if production-scale or data-intensive AI workloads are planned.

---

### High-Availability & Resilience

#### Quorum Analysis
- **3 Proxmox Nodes**: Corosync quorum = 2/3 nodes required
  - Can tolerate 1 node failure ✅ Good
  - If node1 fails: quorum = nodes 2+3 (still ≥2) → **cluster remains operational**
- **3 Swarm Managers**: Raft consensus quorum = 2/3 nodes required
  - Can tolerate 1 manager failure ✅ Good
  - If manager1 fails: quorum = managers 2+3 (still ≥2) → **swarm remains operational**
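
Both quorum rules above are simple majorities, so the numbers can be derived for any member count. A minimal sketch assuming one vote per member; `quorum` and `tolerated_failures` are hypothetical helper names.

```bash
# quorum N: smallest majority of N voting members, i.e. floor(N/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated_failures N: members that can fail while a majority still survives.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5; do
  echo "$n members: quorum $(quorum "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
```

Running the loop shows that a 4th voting member raises the quorum without raising the number of tolerated failures, which is why odd member counts are the usual choice for both layers.
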

#### Failure Scenarios
| Scenario | Outcome | Swarm Impact |
|----------|---------|--------------|
| 1 node power fails | Surviving nodes keep quorum; Proxmox HA can restart the failed node's VMs only if their disks are on shared or replicated storage | Swarm reschedules service containers onto nodes 2 and 3 |
| 1 node storage corrupt | Proxmox HA can restart VMs on a peer, again only with shared or replicated storage | Brief service interruption (~30s) while containers are rescheduled |
| 1 node network partition | Corosync detects; quorum = 2 survivors | Cluster continues; isolated node reboots (self-fences) if HA is active |
| 2 nodes fail simultaneously | Quorum lost on both Proxmox and Swarm; cluster non-functional | **Workload on the failed nodes lost**; the survivor cannot reschedule or accept changes until quorum is restored |

**Verdict**: Design supports N-1 failure tolerance. **Very good for homelab.**

---

## SPECIAL CONSIDERATIONS FOR pve03

### Storage Mystery: 21 Loop/Device-Mapper Devices
**Questions to Investigate:**
1. Does pve03 mount external NAS/SAN storage (e.g., Synology 10.0.0.249)?
2. Is there a RAID or LVM snapshot setup?
3. Were multiple physical drives present originally, and have some since failed?

**Action Items:**
```bash
# From watchtower or pve03:
pvesh get /storage --output-format json  # List all Proxmox storage targets
zfs list                                  # If ZFS in use
lvs                                       # LVM logical volumes
pvdisplay                                 # LVM physical volumes
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT        # Block device tree, including loop and dm devices
losetup -a                                # Backing files of active loop devices
df -i                                     # Inode usage per mounted filesystem
```
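
To turn the "21 loop/dm devices" observation into a per-type breakdown, the `lsblk` output can be tallied. A minimal sketch: `count_block_types` is a hypothetical helper that reads `lsblk -rno NAME,TYPE` output on stdin, and the sample input below is made up for illustration, not taken from pve03.

```bash
# count_block_types: read "NAME TYPE" lines (as printed by `lsblk -rno NAME,TYPE`)
# on stdin and print "TYPE COUNT" for each device type, sorted by type.
count_block_types() {
  awk 'NF >= 2 { n[$2]++ } END { for (t in n) print t, n[t] }' | sort
}

# Illustrative sample only; on the host run:
#   lsblk -rno NAME,TYPE | count_block_types
count_block_types <<'EOF'
nvme0n1 disk
nvme0n1p3 part
pve-root lvm
pve-data lvm
loop0 loop
EOF
```

A high `lvm` count with a single `disk` would point to ordinary logical volumes on one drive rather than extra hardware. A device-mapper node can be listed under more than one parent, so treat the counts as an upper bound on distinct devices.
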

**Implication**: Until pve03's storage is clarified, it **cannot be used as a template** for the 3rd identical host.

---

## FINAL RECOMMENDATIONS

### 1. **Short-Term (Immediate)**

**Action**: Clarify pve03's storage architecture.
```bash
# SSH into pve03 via watchtower relay or direct if SSH key added
ssh root@10.0.0.203 "pvesh get /storage --output-format json"
ssh root@10.0.0.203 "lvs && pvs"
ssh root@10.0.0.203 "zfs list 2>/dev/null || echo 'ZFS not in use'"
```

**If pve03 has external storage**:
- Note the configuration (NAS IP, mount method, capacity)
- Plan to replicate in 3rd node

**If pve03 is just a single drive**:
- Proceed with pve04 as template

### 2. **Medium-Term (Before Final 3-Node Deployment)**

**Option A: Adopt pve04 as Template (RECOMMENDED)**
- Procurement: 3× machines with **Intel i5-13500T, 16 GB RAM, 256 GB NVMe**
- Cost: ~$200–300 per node (retail Core i5 desktop equivalent)
- Timeline: 1–2 weeks (sourcing)
- Next step: Install Proxmox 9.x on the 3rd node, then join it to the cluster

**Option B: Backfill pve03 Config to pve04 & 3rd Node**
- Upgrade pve04 RAM from 15 GB → 24 GB (add 1× 8 GB SODIMM)
- Verify and document pve03's external storage, if any
- Replicate that storage setup in pve04 and the 3rd node
- Cost: ~$30–50 per node (additional RAM)
- Timeline: 1 week
- Risk: Depends on fully clarifying pve03's configuration

**Recommendation Pick**: **Option A is cleaner**. pve04 is fresher, has the higher clock speed, and has a clear configuration.

### 3. **Long-Term (Post-3-Node Commissioning)**

**Cluster Formation:**
```bash
# On pve04 (assuming it is chosen as the first cluster node).
# Note: pve03 already belongs to a cluster named "homelab"; consider a distinct
# name here if the new cluster will coexist with it on the same network.
pvecm create homelab

# On 3rd new node:
pvecm add <pve04_ip_or_hostname>

# Verify:
pvesh get /cluster/status
```

**VM Provisioning:**
```bash
# Use your existing playbook, one run per host. Passing -e target_host twice
# in a single run would let the last value override the first.
ansible-playbook -i inventory/hosts.ini \
  playbooks/proxmox/provision_swarm_vms.yml \
  -e target_host=pve04

# For 3rd node:
ansible-playbook -i inventory/hosts.ini \
  playbooks/proxmox/provision_swarm_vms.yml \
  -e target_host=pve0N
```

**Docker Swarm Init:**
```bash
# On swarm-manager-1 (e.g., 10.0.0.211):
docker swarm init --advertise-addr 10.0.0.211

# Still on swarm-manager-1: print the join command that contains the manager token
docker swarm join-token manager

# On manager-2 & manager-3:
docker swarm join --token <manager-token> 10.0.0.211:2377
```

---

## APPENDIX: Hardware Specs Collected

### pve03 (10.0.0.203) – Full Details
```
CPU: 10 cores, 1 socket, max 2885 MHz
Memory: 23.6 GB total, 12.4 GB free
Storage: 68 GB root LVM (59 GB free) + 21 dm/loop devices (TBD)
OS: Debian trixie, kernel 6.17.2-1-pve
Proxmox: 9.1.6
Network: 6 interfaces (vmbr0, nic0, wlp0s20f3, tap301i0, tap302i0, lo)
Cluster Status: Clustered (homelab), 3 VMs running
Uptime: 4 days
```

### pve04 (10.0.0.204) – Full Details
```
CPU: Intel Core i5-13500T, 14 cores, 1 socket, 20 vCPUs (HT), max 4600 MHz
Memory: 15.0 GB total, ~13.0 GB available, 8.0 GB swap
Storage: 238.5 GB NVMe SSD (nvme0n1), single drive
OS: Debian trixie, kernel 6.17.2-1-pve
Proxmox: 9.1.1
Network: 4 interfaces (vmbr0, nic0, wlp0s20f3, lo)
Cluster Status: Not clustered yet, 0 VMs
Uptime: Fresh (just rebooted)
```

---

## CONCLUSION

**pve04 is the superior choice** for replication to a 3-node cluster because of:
1. **CPU performance**: 4600 MHz vs 2885 MHz max clock (~59% higher)
2. **Storage clarity**: Single 240 GB NVMe (vs pve03's undocumented setup)
3. **Right-sized specifications**: 15 GB RAM + 240 GB SSD cover the planned Swarm VMs with headroom to spare
4. **Freshness**: No legacy config debt

**Immediate action**: Clarify pve03's storage. Then either adopt pve04 as the template or gather the additional pve03 details needed to backfill its configuration.

**Expected outcome**: 3-node Proxmox cluster running 6 Docker Swarm nodes (3 managers, 3 workers) with N-1 failure tolerance, solid performance, and headroom for future growth.