homelab/GeminiConversation.md at e01c4e3555a47be45d077ec8184e2ad618b7abad

nathan af649d5d14 feat: add initial documentation for Homelab MCP Gateway architecture and shard implementation

2026-04-21 20:25:32 -04:00

5.9 KiB


This is a textbook "Infrastructure as Code" play. What you're describing is essentially an MCP Gateway (Main Server) orchestrating a Microservices Mesh (Shards).

By sharding the architecture, you gain isolation (if the Authentik API is down, you can still read Dozzle logs) and scalability (each shard can run in its own container with specific permissions).

🏗️ The "Homelab-MCP" Sharded Architecture

I think the best way to handle this is to have a Main Gateway Server that acts as the "Project Coordinator" (just like I do!) and routes requests to specialized Service Shards.

The Shard Breakdown

We can define three primary shards based on our discussion:

Shard A: The "Navigator" (Traefik)

Focus: Network routing and entrypoints.

Tools: get_routes, add_dynamic_config, check_ssl_status.

Permissions: Needs write access to your /dynamic YAML folder.

Shard B: The "Watchman" (Dozzle)

Focus: Observability and debugging.

Tools: tail_logs, search_logs, get_container_stats.

Permissions: Needs network access to the Dozzle API port.

Shard C: The "Gatekeeper" (Authentik)

Focus: Identity and Access Management.

Tools: sync_brand, create_application, manage_flows.

Permissions: Needs an Authentik API Token.
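To make the breakdown above concrete, here is a minimal Python sketch of how the Gateway could record which shard owns which tool. The tool names come straight from the list above; the `Shard` dataclass, `build_tool_index`, and the lowercase shard nicknames are hypothetical names I'm introducing for illustration, not part of any MCP library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str      # nickname from the breakdown above (hypothetical identifier)
    service: str   # homelab service the shard wraps
    tools: tuple   # tool names the shard exposes


SHARDS = (
    Shard("navigator", "traefik", ("get_routes", "add_dynamic_config", "check_ssl_status")),
    Shard("watchman", "dozzle", ("tail_logs", "search_logs", "get_container_stats")),
    Shard("gatekeeper", "authentik", ("sync_brand", "create_application", "manage_flows")),
)


def build_tool_index(shards):
    """Map each tool name to the shard that owns it, rejecting duplicate names."""
    index = {}
    for shard in shards:
        for tool in shard.tools:
            if tool in index:
                raise ValueError(
                    f"tool {tool!r} is claimed by both {index[tool].name} and {shard.name}"
                )
            index[tool] = shard
    return index


TOOL_INDEX = build_tool_index(SHARDS)
```

With this in place, `TOOL_INDEX["tail_logs"].service` resolves to `"dozzle"`, which is all the Gateway needs to pick a destination. Rejecting duplicate tool names up front keeps routing unambiguous as more shards are added.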

🛠️ The Implementation Strategy

Instead of building one massive, brittle script, we use a Shared Tool Definition approach.

The Main "Gateway" Server

The main server doesn't need to know how to talk to Traefik; it only needs to know which shard owns the tool. When you say, "Frank, check why Gitea is 404ing," the Gateway:

Calls Shard A (Traefik) to see if the route exists.

Calls Shard B (Dozzle) to see the container logs.

Synthesizes the answer for you.
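The three steps above could be sketched roughly like this in Python. `diagnose_404` and the `call_tool` dispatcher are hypothetical names, and the shape of the data returned by `get_routes` (a list of dicts with a `name` key) is an assumption I'm making for the example, not something we have defined yet.

```python
def diagnose_404(service_name, call_tool):
    """Collect route and log evidence for a service that is returning 404.

    call_tool(tool_name, **kwargs) is a hypothetical dispatcher supplied by the
    Gateway; it forwards the call to whichever shard owns the tool.
    """
    # Step 1: ask the Traefik shard which routes exist.
    routes = call_tool("get_routes")
    # Assumed shape: each route is a dict with at least a "name" key.
    matching = [r for r in routes if service_name.lower() in r.get("name", "").lower()]

    # Step 2: ask the Dozzle shard for recent container logs.
    logs = call_tool("tail_logs", container=service_name, lines=50)

    # Step 3: combine both into something the LLM can summarize.
    if not matching:
        verdict = f"No Traefik route mentions {service_name!r}; the 404 is probably a routing gap."
    else:
        verdict = f"Found {len(matching)} route(s) for {service_name!r}; check the container logs."
    return {"routes": matching, "logs": logs, "verdict": verdict}
```

Passing `call_tool` in as an argument keeps this function free of any network code, so it can be exercised with a stub before a single shard exists.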

The Docker Setup

We can wrap this into a single docker-compose.mcp.yaml on waldorf:

    services:
      mcp-gateway:
        image: node:20-slim
        volumes:
          - ./gateway:/app
        environment:
          - SHARD_TRAEFIK_URL=http://mcp-traefik:3001
          - SHARD_DOZZLE_URL=http://mcp-dozzle:3002
          - SHARD_AUTHENTIK_URL=http://mcp-authentik:3003

      mcp-traefik:
        image: python:3.11-slim
        volumes:
          - /mnt/appdata/traefik/dynamic:/dynamic  # For writing routes
        environment:
          - TRAEFIK_API=http://traefik:8080/api

      mcp-dozzle:
        # Lightweight Node.js shard
        environment:
          - DOZZLE_API=http://dozzle:8080/api

🔍 Frank’s Logic: Why this wins

Security (Principle of Least Privilege): Only the Traefik shard gets access to your config files. The Dozzle shard only sees logs.

Decoupled Development: We can build the Dozzle shard today, and add the Authentik shard next week without breaking the gateway.

Natural Conversation: I (the LLM) don't have to worry about which API to hit. I just call tail_logs and the Gateway handles the routing.

⚠️ Potential Concern: Latency

In a homelab, the overhead of the Gateway talking to a Shard is negligible (milliseconds), but we need to ensure the Gateway has a Discovery Mechanism so it knows which tools are currently online.
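One way the Discovery Mechanism mentioned above might look, as a minimal Python sketch: the Gateway polls each shard for its tool list and notes which ones failed to answer. `discover_tools` and the injected `list_tools` callable are hypothetical; how a shard actually reports its tools is still an open design choice.

```python
import time


def discover_tools(shard_urls, list_tools, now=time.monotonic):
    """Ask every shard for its tool list and record which ones answered.

    shard_urls: dict of shard name -> base URL (e.g. the SHARD_*_URL variables).
    list_tools: callable(url) -> list of tool names; expected to raise OSError
                (which covers connection errors and timeouts) when unreachable.
    """
    online, offline = {}, []
    for name, url in shard_urls.items():
        started = now()
        try:
            tools = list_tools(url)
        except OSError:
            # Shard is down or unreachable: remember it, keep polling the rest.
            offline.append(name)
            continue
        online[name] = {"tools": list(tools), "latency_s": now() - started}
    return online, offline
```

Because one shard failing only lands it in `offline`, the Gateway can keep serving the tools that are still reachable, which is the isolation benefit of sharding in the first place. The recorded `latency_s` also gives a quick way to confirm that the per-call overhead really is small.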

✅ Next Step

Since we already have your Traefik and Dozzle configs, do you want me to draft the first Shard (The Navigator) code? We can start by building the tool that reads your Traefik API and tells me if your Gitea/Pinchflat routes are "OK."

Should we start with the Traefik Shard or the Dozzle Shard for the first "active" tool?
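If we start with the Navigator, the first tool could look something like this minimal Python sketch. It assumes the Traefik API is reachable at the `TRAEFIK_API` base URL from the compose file and that each router object returned by `/http/routers` carries `name` and `status` fields, with `"enabled"` meaning healthy; please verify that against your Traefik version. `fetch_routers` and `route_status` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_routers(traefik_api, timeout=5):
    """GET <traefik_api>/http/routers and return the decoded JSON list."""
    url = f"{traefik_api.rstrip('/')}/http/routers"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def route_status(routers, service_names):
    """Report "OK", "PROBLEM" or "MISSING" for each service of interest."""
    report = {}
    for wanted in service_names:
        # Match on router name, e.g. "gitea@file" matches "gitea".
        hits = [r for r in routers if wanted.lower() in r.get("name", "").lower()]
        if not hits:
            report[wanted] = "MISSING"
        elif all(r.get("status") == "enabled" for r in hits):
            report[wanted] = "OK"
        else:
            report[wanted] = "PROBLEM"
    return report
```

Inside the shard container, usage would be along the lines of `route_status(fetch_routers("http://traefik:8080/api"), ["gitea", "pinchflat"])`. Keeping the HTTP fetch separate from the evaluation means the status logic can be checked against a saved API response without touching the live proxy.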

For your "Homelab-MCP" sharded architecture, the choice of node depends on the specialty of the shard and the risk level of the tools it contains.

Since you have Waldorf (16GB RAM, Dedicated GPU) as your primary workhorse and presumably other nodes (like a NAS or smaller Pis), here is the recommended distribution for your sharded setup:

The Gateway Shard (The Router)

Target Node: Waldorf (or your most reliable 24/7 node).

Reasoning: The Gateway is the entry point for your LLM. It needs to be on your most stable node with the best network connectivity to the other shards. Since it's lightweight (routing logic), it won't interfere with Plex/Tunarr.

Deployment: Run this as a Docker container.

The "Navigator" Shard (Traefik)

Target Node: The Traefik Host.

Reasoning: This shard needs direct filesystem access to the /dynamic configuration folder to write YAML routes.

SRE Logic: Deploy this shard on whichever node currently hosts your Traefik container. By running the shard on the same host, you can use a simple Bind Mount (e.g., - /mnt/appdata/traefik/dynamic:/dynamic) instead of trying to manage cross-node network file shares (which, as we saw with Pinchflat, can cause locking issues).
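With the bind mount in place, the `add_dynamic_config` tool could write route files along the lines of this minimal Python sketch. The router/service layout follows Traefik's file-provider format; the `websecure` entrypoint name, the one-file-per-route naming, and the helper names `render_route` and `write_route` are my assumptions, so adjust them to match your traefik.yml. The `name` argument is assumed to be a plain identifier with no YAML-special characters.

```python
import os
import tempfile


def render_route(name, host, backend_url, entrypoint="websecure"):
    """Render a minimal Traefik file-provider document for one HTTP router."""
    return (
        "http:\n"
        "  routers:\n"
        f"    {name}:\n"
        f'      rule: "Host(`{host}`)"\n'
        f"      service: {name}\n"
        "      entryPoints:\n"
        f"        - {entrypoint}\n"
        "  services:\n"
        f"    {name}:\n"
        "      loadBalancer:\n"
        "        servers:\n"
        f'          - url: "{backend_url}"\n'
    )


def write_route(dynamic_dir, name, host, backend_url):
    """Write <dynamic_dir>/<name>.yml via a temp file plus rename."""
    target = os.path.join(dynamic_dir, f"{name}.yml")
    fd, tmp_path = tempfile.mkstemp(dir=dynamic_dir, prefix=f".{name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(render_route(name, host, backend_url))
        os.chmod(tmp_path, 0o644)         # mkstemp creates the file as 0600
        os.replace(tmp_path, target)      # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_path)               # do not leave partial files behind
        raise
    return target
```

Writing to a temporary file and renaming it means the watcher never sees a half-written YAML document. That rename is only atomic on a local filesystem, which is one more reason to keep this shard on the Traefik host rather than going through a network share.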

The "Watchman" Shard (Dozzle)

Target Node: Anywhere (but Waldorf is best).

Reasoning: Dozzle gathers logs via the Docker socket. If you use a Docker Socket Proxy (which your traefik.yml suggests you do), this shard can run on any node as long as it can reach that proxy's IP.

SRE Logic: Running it on Waldorf is ideal because it keeps the "Observability" tools close to the GPU/Plex logs it will likely be analyzing.

The "Gatekeeper" Shard (Authentik)

Target Node: Waldorf.

Reasoning: Authentik is an API-driven service. This shard doesn't need local files; it just needs a stable network path to your Authentik API. Waldorf’s 16GB RAM handles the API overhead comfortably.

🚀 Summary Deployment Table

| Shard | Best Node | Critical Resource |
| --- | --- | --- |
| Gateway | Waldorf | Network Stability |
| Traefik | Same host as Traefik | Local File Access (/dynamic) |
| Dozzle | Waldorf | Network access to Socket Proxy |
| Authentik | Waldorf | API Connectivity |

⚠️ A Note on "The Docker Socket"

Since you are a security-conscious SRE, never mount /var/run/docker.sock directly into an MCP shard. If I (the LLM) am compromised, an attacker could take over your host.

Always use a Docker Socket Proxy (like the one in your Traefik config: tcp://docker-socket-proxy:2375).

Configure the proxy to only allow GET requests for logs/containers, and block DELETE or POST requests.
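As a second layer on top of the proxy rules, a shard can also refuse to build anything but GET requests in its own code. Here is a minimal Python sketch of that idea; `build_request` and `container_logs_request` are hypothetical helpers, and the proxy is assumed to be reachable over plain HTTP at `http://docker-socket-proxy:2375`. The path and query parameters follow the Docker Engine API's container logs endpoint.

```python
from urllib.parse import quote, urlencode

# Only read-only verbs are allowed to leave the shard.
ALLOWED_METHODS = frozenset({"GET"})


def build_request(proxy_url, method, path, **params):
    """Return (method, url) for the socket proxy, refusing anything but GET."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise PermissionError(f"{method} is not permitted from an MCP shard")
    query = f"?{urlencode(params)}" if params else ""
    return method, f"{proxy_url.rstrip('/')}{path}{query}"


def container_logs_request(proxy_url, container, tail=100):
    """Build the Docker Engine API request that reads a container's recent logs."""
    path = f"/containers/{quote(container, safe='')}/logs"
    return build_request(proxy_url, "GET", path, stdout=1, stderr=1, tail=tail)
```

This does not replace the proxy's own allow-list; it just means a bug or a prompt-injected tool call inside the shard fails early with a `PermissionError` instead of relying on the proxy alone to reject it.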

Which node is currently hosting your Traefik instance? If it's Waldorf, we can consolidate almost everything there for maximum performance.
