# Homelab Specs

---

## Hardware

### Dell OptiPlex 7070

- **Role**: kube-node-1 (control-plane + worker), bare metal
- **IP**: 192.168.2.100
- **SSH**: `dan@192.168.2.100`
- **CPU**: Intel Core i5-9500, 6c/6t, 3.0 GHz base / 4.4 GHz boost, 9 MB L3, 65 W TDP, VT-x
- **RAM**: 16 GB DDR4 2666 MT/s DIMM
- **Storage**:
  - `nvme0`: Samsung PM991 256 GB — 1G EFI, 2G /boot, 235.4G LVM (100G → /)
  - `sda`: Seagate Expansion 2 TB → `/data/photos` (ext4)
  - `sdb`: Seagate Expansion+ 2 TB → `/mnt/sdb-ro` (ext4, **READ-ONLY — never touch**)
  - `sdc1`: Seagate Expansion 1 TB → `/data/media` (ext4)
  - `sdc2`: Seagate Expansion 788 GB → `/data/games` (ext4)
  - `sdd`: Samsung HD103SI 1 TB → `/data/owncloud` (ext4)
  - `sde`: Hitachi HTS545050 500 GB → `/data/infra` (ext4)
  - `sdf`: Seagate 1 TB → `/data/ai` (ext4)
- **Total**: ~7 TB
- **Network**: 1 Gbit/s
- **NFS server**: exports `/data/{games,media,photos,owncloud,infra,ai}` to LAN

### HP ProLiant DL360 G7

- **Role**: Proxmox hypervisor (192.168.2.193)
- **SSH**: `root@192.168.2.193` (local id_rsa)
- **Web UI**: https://proxmox.vandachevici.ro
- **Storage**:
  - 2× HPE SAS 900 GB in RAID 1+0 → 900 GB usable (Proxmox OS)
  - 4× HPE SAS 900 GB in RAID 1+0 → 1.8 TB usable (VM disks)
  - Promise VTrak J830s: 2× 16 TB → `media-pool` (ZFS, ~14 TB usable)
- **Total**: ~18 TB

### Promise VTrak J830s

- Connected to HP ProLiant via SAS
- 2× 16 TB disks, ZFS pool `media-pool`
- ZFS datasets mounted at `/data/X` on HP (matching Dell paths)

---

## Storage Layout

### Dell `/data` drives (primary/local)

| Mount | Device | Size | Contents |
|---|---|---|---|
| `/data/games` | sdc2 | 788 GB | Game server worlds and kits |
| `/data/media` | sdc1 | 1.1 TB | Jellyfin media library |
| `/data/photos` | sda | 916 GB | Immich photo library |
| `/data/owncloud` | sdd | 916 GB | OwnCloud files |
| `/data/infra` | sde | 458 GB | Prometheus, infra data |
| `/data/ai` | sdf | 916 GB | Paperclip, Ollama models |
| `/mnt/sdb-ro` | sdb | 1.8 TB | **READ-ONLY** archive — never modify |

### HP VTrak ZFS datasets (HA mirrors)

| ZFS Dataset | Mountpoint on HP | NFS export |
|---|---|---|
| media-pool/jellyfin | `/data/media` | ✅ |
| media-pool/immich | `/data/photos` | ✅ |
| media-pool/owncloud | `/data/owncloud` | ✅ |
| media-pool/games | `/data/games` | ✅ |
| media-pool/minecraft | `/data/games/minecraft` | ✅ |
| media-pool/factorio | `/data/games/factorio` | ✅ |
| media-pool/openttd | `/data/games/openttd` | ✅ |
| media-pool/infra | `/data/infra` | ✅ |
| media-pool/ai | `/data/ai` | ✅ |

Legacy bind mounts at `/media-pool/X` → `/data/X` are preserved for K8s PV compatibility.

### Cross-mounts (HA access)

| From | Mount point | To |
|---|---|---|
| Dell | `/mnt/hp/data-{games,media,photos,owncloud,infra,ai}` | HP VTrak NFS |
| HP | `/mnt/dell/data-{games,media,photos,owncloud,infra,ai}` | Dell NFS |

---

## VMs on HP ProLiant (Proxmox)

| VM ID | Name | IP | RAM | Role |
|---|---|---|---|---|
| 100 | kube-node-2 | 192.168.2.195 | 16 GB | K8s worker |
| 101 | kube-node-3 | 192.168.2.196 | 16 GB | K8s control-plane + worker |
| 103 | kube-arbiter | 192.168.2.200 | 6 GB | K8s control-plane (etcd + API server, NoSchedule) |
| 104 | local-ai | 192.168.2.88 | — | Ollama + openclaw-gateway (Tesla P4 GPU passthrough) |
| 106 | ansible-control | 192.168.2.70 | — | Ansible control node |
| 107 | remote-ai | 192.168.2.91 | — | openclaw-gateway (remote, cloud AI) |

⚠️ kube-node-2, kube-node-3, and kube-arbiter are all VMs on the HP ProLiant. HP ProLiant failure = loss of 3/4 K8s nodes simultaneously. Mitigation: add a Raspberry Pi 4/5 (8 GB) as a 4th physical host.
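The Dell-side NFS exports and the HP cross-mounts described above can be written down as config fragments. This is a sketch only: the export options (`rw,sync,no_subtree_check`) and the fstab mount flags are standard NFS defaults assumed for illustration, not copied from the live machines.

```
# /etc/exports on the Dell (sketch; option flags are assumptions)
/data/games    192.168.2.0/24(rw,sync,no_subtree_check)
/data/media    192.168.2.0/24(rw,sync,no_subtree_check)
/data/photos   192.168.2.0/24(rw,sync,no_subtree_check)
/data/owncloud 192.168.2.0/24(rw,sync,no_subtree_check)
/data/infra    192.168.2.0/24(rw,sync,no_subtree_check)
/data/ai       192.168.2.0/24(rw,sync,no_subtree_check)

# /etc/fstab on the HP (one cross-mount shown; _netdev defers the
# mount until the network is up)
192.168.2.100:/data/media  /mnt/dell/data-media  nfs  defaults,_netdev  0 0
```

After editing `/etc/exports`, `sudo exportfs -ra` reloads the export table without restarting the NFS server.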
SSH: `dan@` for all VMs

---

## Kubernetes Cluster

- **Version**: 1.32.13
- **CNI**: Flannel
- **Dashboard**: https://192.168.2.100:30443 (self-signed cert, token auth)
- **Token file**: `/home/dan/homelab/kube/cluster/DASHBOARD-ACCESS.txt`
- **StorageClass**: `local-storage` (hostPath on kube-node-1)
- **NFS provisioners**: `nfs-provisioners` namespace (nfs-subdir-external-provisioner)

### Nodes

| Node | Role | IP | Host |
|---|---|---|---|
| kube-node-1 | control-plane + worker | 192.168.2.100 | Dell OptiPlex 7070 (bare metal) |
| kube-node-2 | worker | 192.168.2.195 | VM on HP ProLiant (16 GB RAM) |
| kube-node-3 | control-plane + worker | 192.168.2.196 | VM on HP ProLiant (16 GB RAM) |
| kube-arbiter | control-plane | 192.168.2.200 | VM on HP ProLiant (1c/6 GB, tainted NoSchedule) |

**etcd**: 3 members (kube-node-1 + kube-arbiter + kube-node-3) — quorum survives 1 member failure ✅

**controlPlaneEndpoint**: `192.168.2.100:6443` ⚠️ SPOF — kube-vip (Phase 1b) not yet deployed; if kube-node-1 goes down, workers lose API access even though the kube-arbiter and kube-node-3 API servers are still running

---

## High Availability Status

### Control Plane

| Component | Status | Notes |
|---|---|---|
| etcd | ✅ 3 members | kube-node-1 + kube-arbiter + kube-node-3; tolerates 1 failure |
| API server VIP | ⚠️ Not yet deployed | controlPlaneEndpoint hardcoded to 192.168.2.100; kube-vip (Phase 1b) pending |
| CoreDNS | ✅ Required anti-affinity | Pods spread across different nodes (kube-node-1 + kube-node-2) |

### Workloads (replicas=2, required pod anti-affinity)

| Service | Replicas | PDB |
|---|---|---|
| authentik-server | 2 | ✅ |
| authentik-worker | 2 | ✅ |
| cert-manager | 2 | ✅ |
| cert-manager-webhook | 2 | ✅ |
| cert-manager-cainjector | 2 | ✅ |
| parts-api | 2 | ✅ |
| parts-ui | 2 | ✅ |
| ha-sync-ui | 2 | ✅ |
| games-console-backend | 2 | ✅ |
| games-console-ui | 2 | ✅ |
| ingress-nginx | DaemonSet | ✅ (runs on all workers) |

### Storage

| PV | Type | Notes |
|---|---|---|
| paperclip-data-pv | NFS (192.168.2.252) | ✅ Migrated from hostPath; can schedule on any node |
| prometheus-storage-pv | hostPath on kube-node-1 | ⚠️ Still pinned to kube-node-1 (out of scope) |

### Known Remaining SPOFs

| Risk | Description | Mitigation |
|---|---|---|
| HP ProLiant physical host | kube-node-2/3 + kube-arbiter are all HP VMs | Add Raspberry Pi 4/5 (8 GB) as 4th physical host |
| controlPlaneEndpoint | Hardcoded to kube-node-1 IP | Deploy kube-vip with VIP (e.g. 192.168.2.50) |

---

### games

| Service | NodePort | Storage |
|---|---|---|
| minecraft-home | 31112 | HP NFS `/data/games/minecraft` |
| minecraft-cheats | 31111 | HP NFS `/data/games/minecraft` |
| minecraft-creative | 31559 | HP NFS `/data/games/minecraft` |
| minecraft-johannes | 31563 | HP NFS `/data/games/minecraft` |
| minecraft-noah | 31560 | HP NFS `/data/games/minecraft` |
| Factorio | — | HP NFS `/data/games/factorio` |
| OpenTTD | — | HP NFS `/data/games/openttd` |

Minecraft operators: LadyGisela5, tomgates24, anutzalizuk, toranaga_samma

### monitoring

- **Helm release**: `obs`, chart `prometheus-community/kube-prometheus-stack`
- **Values file**: `/home/dan/homelab/deployment/helm/prometheus/prometheus-helm-values.yaml`
- **Components**: Prometheus, Grafana, AlertManager, Node Exporter, Kube State Metrics
- **Grafana**: NodePort 31473 → http://192.168.2.100:31473
- **Storage**: 100 Gi hostPath PV at `/data/infra/prometheus` on kube-node-1

### infrastructure

- General MySQL/MariaDB (StatefulSet) — HP NFS `/media-pool/general-db`
- Speedtest Tracker — HP NFS `/media-pool/speedtest`
- DNS updater (DaemonSet, `tunix/digitalocean-dyndns`) — updates DigitalOcean DNS
- Proxmox ingress → 192.168.2.193:8006

### storage

- **OwnCloud** (`owncloud/server:10.12`) — drive.vandachevici.ro, admin: sefu
  - MariaDB (StatefulSet), Redis (Deployment), OwnCloud server (2 replicas)
  - Storage: HP NFS `/data/owncloud`

### media

- **Jellyfin** — media.vandachevici.ro, storage: HP NFS `/data/media`
- **Immich** — photos.vandachevici.ro, storage: HP NFS `/data/photos`
  - Components: server (2 replicas), ML (2 replicas), valkey, postgresql

### iot

- IoT MySQL (StatefulSet, db: `iot_db`)
- IoT API (`iot-api:latest`, NodePort 30800) — requires `topology.homelab/server: dell` label

### ai

- **Paperclip** — paperclip.vandachevici.ro
  - Embedded PostgreSQL at `/data/ai/paperclip/instances/default/db`
  - Config: `/data/ai/paperclip/instances/default/config.json`
  - NFS PV via keepalived VIP `192.168.2.252:/data/ai/paperclip` (can schedule on any node) ✅
  - Env: `PAPERCLIP_AGENT_JWT_SECRET` (in K8s secret)

---

## AI / OpenClaw

### local-ai VM (192.168.2.88) — GPU instance

- **GPU**: NVIDIA Tesla P4, 8 GB VRAM (PCIe passthrough from Proxmox)
  - VFIO: `/etc/modprobe.d/vfio.conf` ids=10de:1bb3, allow_unsafe_interrupts=1
  - initramfs updated for persistence
- **Ollama**: listening on `0.0.0.0:11434`, models at `/data/ollama/models`
  - Loaded: `qwen3:8b` (5.2 GB)
- **openclaw-gateway**: `ws://0.0.0.0:18789`, auth mode: token
  - Token: in `~/.openclaw/openclaw.json` → `gateway.auth.token`
  - Systemd: `openclaw-gateway.service` (Type=simple, enabled)

### remote-ai VM (192.168.2.91)

- **openclaw-gateway**: installed (v2026.3.13), config at `~/.openclaw/openclaw.json`
- Uses cloud AI providers (Claude API key required)

### Connecting Paperclip to openclaw

- URL: `ws://192.168.2.88:18789/`
- Auth: token from `~/.openclaw/openclaw.json` → `gateway.auth.token`

---

## Network Endpoints

| Service | URL / Address |
|---|---|
| K8s Dashboard | https://192.168.2.100:30443 |
| Proxmox UI | https://proxmox.vandachevici.ro |
| Grafana | http://192.168.2.100:31473 |
| Jellyfin | https://media.vandachevici.ro |
| Immich (photos) | https://photos.vandachevici.ro |
| OwnCloud | https://drive.vandachevici.ro |
| Paperclip | https://paperclip.vandachevici.ro |
| IoT API | http://192.168.2.100:30800 |
| minecraft-home | 192.168.2.100:31112 |
| minecraft-cheats | 192.168.2.100:31111 |
| minecraft-creative | 192.168.2.100:31559 |
| minecraft-johannes | 192.168.2.100:31563 |
| minecraft-noah | 192.168.2.100:31560 |
| Ollama (local-ai) | http://192.168.2.88:11434 |
| openclaw gateway (local-ai) | ws://192.168.2.88:18789 |
| Ollama (Dell) | http://192.168.2.100:11434 |

### DNS subdomains managed (DigitalOcean)

`photos`, `backup`, `media`, `chat`, `openttd`, `excalidraw`, `prv`, `drive`, `grafana`, `paperclip`, `proxmox`

---

## Common Operations

### Apply manifests

```bash
kubectl apply -f /home/dan/homelab/deployment//
```

### Prometheus (Helm)

```bash
helm upgrade obs prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f /home/dan/homelab/deployment/helm/prometheus/prometheus-helm-values.yaml
```

### NFS provisioners (Helm)

```bash
# Example: jellyfin
helm upgrade nfs-jellyfin nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  -n nfs-provisioners \
  -f /home/dan/homelab/deployment/helm/nfs-provisioners/values-jellyfin.yaml
```

### Troubleshooting: Flannel CNI after reboot

If all pods are stuck in `ContainerCreating` after a reboot:

```bash
# 1. Check that a default route exists on kube-node-1
ip route show | grep default
# Fix: sudo ip route add default via 192.168.2.1 dev eno1
# Persist: check /etc/netplan/00-installer-config.yaml has a routes section

# 2. Restart the flannel pod on kube-node-1
kubectl delete pod -n kube-flannel -l app=flannel --field-selector spec.nodeName=kube-node-1
```

### Troubleshooting: kube-node-3 NotReady after reboot

Likely cause: swap re-enabled. Disable it, comment it out of fstab, and restart the kubelet:

```bash
ssh dan@192.168.2.196 "sudo swapoff -a && sudo sed -i 's|^/swap.img|#/swap.img|' /etc/fstab && sudo systemctl restart kubelet"
```

---

## Workspace Structure

```
/home/dan/homelab/
├── HOMELAB.md        — this file
├── plan.md           — original rebuild plan
├── step-by-step.md   — execution tracker
├── deployment/       — K8s manifests and Helm values
│   ├── 00-namespaces.yaml
│   ├── ai/              — Paperclip
│   ├── default/         — DNS updater
│   ├── games/           — Minecraft, Factorio, OpenTTD
│   ├── helm/            — Helm values (prometheus, nfs-provisioners)
│   ├── infrastructure/  — ingress-nginx, cert-manager, general-db, speedtest, proxmox-ingress
│   ├── iot/             — IoT DB + API
│   ├── media/           — Jellyfin, Immich
│   ├── monitoring/      — (managed by Helm)
│   └── storage/         — OwnCloud
├── backups/          — K8s secrets backup (gitignored)
├── hardware/         — hardware spec docs
├── orchestration/
│   └── ansible/      — playbooks, inventory, group_vars, cloud-init
└── services/
    └── device-inventory/ — C++ CMake project: network device discovery
```
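For the netplan persistence step mentioned in the Flannel troubleshooting section, the `routes` stanza could look like the fragment below. The file path, interface name (`eno1`), and gateway (192.168.2.1) come from the notes above; the static address and nameserver lines are assumptions for illustration.

```yaml
# /etc/netplan/00-installer-config.yaml (sketch; addressing assumed)
network:
  version: 2
  ethernets:
    eno1:
      addresses:
        - 192.168.2.100/24
      routes:
        - to: default
          via: 192.168.2.1
      nameservers:
        addresses: [192.168.2.1]
```

`sudo netplan apply` activates the config; `sudo netplan try` is safer on a remote node, since it rolls back automatically if the change cuts off connectivity.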