# Homelab Specs

---

## Hardware

### Dell OptiPlex 7070

- **Role**: kube-node-1 (control-plane + worker), bare metal
- **IP**: 192.168.2.100
- **SSH**: `dan@192.168.2.100`
- **CPU**: Intel Core i5-9500, 6c/6t, 3.0 GHz base / 4.4 GHz boost, 9 MB L3, 65 W TDP, VT-x
- **RAM**: 16 GB DDR4 2666 MT/s DIMM
- **Storage**:
  - `nvme0`: Samsung PM991 256 GB — 1G EFI, 2G /boot, 235.4G LVM (100G → /)
  - `sda`: Seagate Expansion 2 TB → `/data/photos` (ext4)
  - `sdb`: Seagate Expansion+ 2 TB → `/mnt/sdb-ro` (ext4, **READ-ONLY — never touch**)
  - `sdc1`: Seagate Expansion 1 TB → `/data/media` (ext4)
  - `sdc2`: Seagate Expansion 788 GB → `/data/games` (ext4)
  - `sdd`: Samsung HD103SI 1 TB → `/data/owncloud` (ext4)
  - `sde`: Hitachi HTS545050 500 GB → `/data/infra` (ext4)
  - `sdf`: Seagate 1 TB → `/data/ai` (ext4)
- **Total**: ~7 TB
- **Network**: 1 Gbit/s
- **NFS server**: exports `/data/{games,media,photos,owncloud,infra,ai}` to the LAN
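The export list above can be expressed as a small generator for `/etc/exports` entries. This is only a sketch: the `192.168.2.0/24` client range and the `rw,sync,no_subtree_check` options are illustrative assumptions, not the server's recorded policy.

```shell
# Print one /etc/exports entry per shared volume on the Dell.
# Client subnet and export options are assumptions for illustration.
gen_exports() {
  for vol in games media photos owncloud infra ai; do
    printf '/data/%s 192.168.2.0/24(rw,sync,no_subtree_check)\n' "$vol"
  done
}
gen_exports
```

After editing `/etc/exports`, `sudo exportfs -ra` reloads the export table without restarting the NFS server.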
### HP ProLiant DL360 G7

- **Role**: Proxmox hypervisor (192.168.2.193)
- **SSH**: `root@192.168.2.193` (local id_rsa)
- **Web UI**: https://proxmox.vandachevici.ro
- **Storage**:
  - 2× HPE SAS 900 GB in RAID 1+0 → 900 GB usable (Proxmox OS)
  - 4× HPE SAS 900 GB in RAID 1+0 → 1.8 TB usable (VM disks)
  - Promise VTrak J830s: 2× 16 TB → `media-pool` (ZFS, ~14 TB usable)
- **Total**: ~18 TB

### Promise VTrak J830s

- Connected to the HP ProLiant via SAS
- 2× 16 TB disks, ZFS pool `media-pool`
- ZFS datasets mounted at `/data/X` on the HP (matching the Dell's paths)
---

## Storage Layout

### Dell `/data` drives (primary/local)

| Mount | Device | Size | Contents |
|---|---|---|---|
| `/data/games` | sdc2 | 788 GB | Game server worlds and kits |
| `/data/media` | sdc1 | 1.1 TB | Jellyfin media library |
| `/data/photos` | sda | 916 GB | Immich photo library |
| `/data/owncloud` | sdd | 916 GB | OwnCloud files |
| `/data/infra` | sde | 458 GB | Prometheus, infra data |
| `/data/ai` | sdf | 916 GB | Paperclip, Ollama models |
| `/mnt/sdb-ro` | sdb | 1.8 TB | **READ-ONLY** archive — never modify |
### HP VTrak ZFS datasets (HA mirrors)

| ZFS Dataset | Mountpoint on HP | NFS export |
|---|---|---|
| media-pool/jellyfin | `/data/media` | ✅ |
| media-pool/immich | `/data/photos` | ✅ |
| media-pool/owncloud | `/data/owncloud` | ✅ |
| media-pool/games | `/data/games` | ✅ |
| media-pool/minecraft | `/data/games/minecraft` | ✅ |
| media-pool/factorio | `/data/games/factorio` | ✅ |
| media-pool/openttd | `/data/games/openttd` | ✅ |
| media-pool/infra | `/data/infra` | ✅ |
| media-pool/ai | `/data/ai` | ✅ |

Legacy bind mounts at `/media-pool/X` → `/data/X` are preserved for K8s PV compatibility.
### Cross-mounts (HA access)

| From | Mount point | To |
|---|---|---|
| Dell | `/mnt/hp/data-{games,media,photos,owncloud,infra,ai}` | HP VTrak NFS |
| HP | `/mnt/dell/data-{games,media,photos,owncloud,infra,ai}` | Dell NFS |
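The Dell-side cross-mounts can be generated rather than typed by hand. A minimal sketch, with two stated assumptions: the server address passed in (the HP's 192.168.2.193 here, but a VIP may be the real export address) and the `defaults,_netdev` mount options.

```shell
# Emit one /etc/fstab line per HA volume mounted from the HP's NFS exports.
# Server IP and mount options are placeholders for illustration.
gen_cross_mounts() {
  server="$1"
  for vol in games media photos owncloud infra ai; do
    printf '%s:/data/%s /mnt/hp/data-%s nfs defaults,_netdev 0 0\n' \
      "$server" "$vol" "$vol"
  done
}
gen_cross_mounts 192.168.2.193
```

The HP-side mounts (`/mnt/dell/data-*` from the Dell's exports) follow the same pattern with the paths swapped.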
---

## VMs on HP ProLiant (Proxmox)

| VM ID | Name | IP | RAM | Role |
|---|---|---|---|---|
| 100 | kube-node-2 | 192.168.2.195 | 16 GB | K8s worker |
| 101 | kube-node-3 | 192.168.2.196 | 16 GB | K8s control-plane + worker |
| 103 | kube-arbiter | 192.168.2.200 | 6 GB | K8s control-plane (etcd + API server, NoSchedule) |
| 104 | local-ai | 192.168.2.88 | — | Ollama + openclaw-gateway (Tesla P4 GPU passthrough) |
| 106 | ansible-control | 192.168.2.70 | — | Ansible control node |
| 107 | remote-ai | 192.168.2.91 | — | openclaw-gateway (remote, cloud AI) |

⚠️ kube-node-2, kube-node-3, and kube-arbiter are all VMs on the HP ProLiant, so a failure of that one host takes down 3 of the 4 K8s nodes at once. Mitigation: add a Raspberry Pi 4/5 (8 GB) as a fourth physical host.

SSH: `dan@<ip>` for all VMs
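To avoid mistyping addresses from the table above, a small lookup helper can sit in a shell profile. This is purely a convenience sketch; the names and IPs are taken from the table.

```shell
# Map a VM name from the table above to its IP address.
vm_ip() {
  case "$1" in
    kube-node-2)     echo 192.168.2.195 ;;
    kube-node-3)     echo 192.168.2.196 ;;
    kube-arbiter)    echo 192.168.2.200 ;;
    local-ai)        echo 192.168.2.88  ;;
    ansible-control) echo 192.168.2.70  ;;
    remote-ai)       echo 192.168.2.91  ;;
    *) echo "unknown VM: $1" >&2; return 1 ;;
  esac
}

# Example: ssh "dan@$(vm_ip kube-node-3)"
```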
---

## Kubernetes Cluster

- **Version**: 1.32.13
- **CNI**: Flannel
- **Dashboard**: https://192.168.2.100:30443 (self-signed cert, token auth)
- **Token file**: `/home/dan/homelab/kube/cluster/DASHBOARD-ACCESS.txt`
- **StorageClass**: `local-storage` (hostPath on kube-node-1)
- **NFS provisioners**: `nfs-provisioners` namespace (nfs-subdir-external-provisioner)

### Nodes

| Node | Role | IP | Host |
|---|---|---|---|
| kube-node-1 | control-plane + worker | 192.168.2.100 | Dell OptiPlex 7070 (bare metal) |
| kube-node-2 | worker | 192.168.2.195 | VM on HP ProLiant (16 GB RAM) |
| kube-node-3 | control-plane + worker | 192.168.2.196 | VM on HP ProLiant (16 GB RAM) |
| kube-arbiter | control-plane | 192.168.2.200 | VM on HP ProLiant (1c/6 GB, tainted NoSchedule) |

**etcd**: 3 members (kube-node-1 + kube-arbiter + kube-node-3) — quorum survives 1 member failure ✅

**controlPlaneEndpoint**: `192.168.2.100:6443` ⚠️ SPOF — kube-vip (Phase 1b) is not yet deployed; if kube-node-1 goes down, workers lose API access even though the kube-arbiter and kube-node-3 API servers keep running.
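The quorum claim above follows from etcd's majority rule: an n-member cluster needs floor(n/2) + 1 members up, so it tolerates the remainder as failures. A quick sanity check:

```shell
# etcd quorum for an n-member cluster: floor(n/2) + 1 members must be up,
# so the cluster tolerates n - quorum simultaneous member failures.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

echo "3 members: quorum $(quorum 3), tolerates $(tolerated 3) failure(s)"
echo "4 members: quorum $(quorum 4), tolerates $(tolerated 4) failure(s)"
```

Note that a 4th etcd member would not improve fault tolerance (quorum rises to 3, still one tolerated failure), which is why the arbiter keeps the member count at an odd 3.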
---

## High Availability Status

### Control Plane

| Component | Status | Notes |
|---|---|---|
| etcd | ✅ 3 members | kube-node-1 + kube-arbiter + kube-node-3; tolerates 1 failure |
| API server VIP | ⚠️ Not yet deployed | controlPlaneEndpoint hardcoded to 192.168.2.100; kube-vip (Phase 1b) pending |
| CoreDNS | ✅ Required anti-affinity | Pods spread across different nodes (kube-node-1 + kube-node-2) |

### Workloads (replicas=2, required pod anti-affinity)

| Service | Replicas | PDB |
|---|---|---|
| authentik-server | 2 | ✅ |
| authentik-worker | 2 | ✅ |
| cert-manager | 2 | ✅ |
| cert-manager-webhook | 2 | ✅ |
| cert-manager-cainjector | 2 | ✅ |
| parts-api | 2 | ✅ |
| parts-ui | 2 | ✅ |
| ha-sync-ui | 2 | ✅ |
| games-console-backend | 2 | ✅ |
| games-console-ui | 2 | ✅ |
| ingress-nginx | DaemonSet | ✅ (runs on all workers) |

### Storage

| PV | Type | Notes |
|---|---|---|
| paperclip-data-pv | NFS (192.168.2.252) | ✅ Migrated from hostPath; can schedule on any node |
| prometheus-storage-pv | hostPath on kube-node-1 | ⚠️ Still pinned to kube-node-1 (out of scope) |

### Known Remaining SPOFs

| Risk | Description | Mitigation |
|---|---|---|
| HP ProLiant physical host | kube-node-2/3 + kube-arbiter are all HP VMs | Add Raspberry Pi 4/5 (8 GB) as 4th physical host |
| controlPlaneEndpoint | Hardcoded to kube-node-1 IP | Deploy kube-vip with a VIP (e.g. 192.168.2.50) |
---

## Workloads by Namespace
### games

| Service | NodePort | Storage |
|---|---|---|
| minecraft-home | 31112 | HP NFS `/data/games/minecraft` |
| minecraft-cheats | 31111 | HP NFS `/data/games/minecraft` |
| minecraft-creative | 31559 | HP NFS `/data/games/minecraft` |
| minecraft-johannes | 31563 | HP NFS `/data/games/minecraft` |
| minecraft-noah | 31560 | HP NFS `/data/games/minecraft` |
| Factorio | — | HP NFS `/data/games/factorio` |
| OpenTTD | — | HP NFS `/data/games/openttd` |

Minecraft operators: LadyGisela5, tomgates24, anutzalizuk, toranaga_samma
### monitoring

- **Helm release**: `obs`, chart `prometheus-community/kube-prometheus-stack`
- **Values file**: `/home/dan/homelab/deployment/helm/prometheus/prometheus-helm-values.yaml`
- **Components**: Prometheus, Grafana, Alertmanager, Node Exporter, kube-state-metrics
- **Grafana**: NodePort 31473 → http://192.168.2.100:31473
- **Storage**: 100 Gi hostPath PV at `/data/infra/prometheus` on kube-node-1

### infrastructure

- General MySQL/MariaDB (StatefulSet) — HP NFS `/media-pool/general-db`
- Speedtest Tracker — HP NFS `/media-pool/speedtest`
- DNS updater (DaemonSet, `tunix/digitalocean-dyndns`) — updates DigitalOcean DNS
- Proxmox ingress → 192.168.2.193:8006

### storage

- **OwnCloud** (`owncloud/server:10.12`) — drive.vandachevici.ro, admin: sefu
  - MariaDB (StatefulSet), Redis (Deployment), OwnCloud server (2 replicas)
  - Storage: HP NFS `/data/owncloud`

### media

- **Jellyfin** — media.vandachevici.ro, storage: HP NFS `/data/media`
- **Immich** — photos.vandachevici.ro, storage: HP NFS `/data/photos`
  - Components: server (2 replicas), ML (2 replicas), valkey, postgresql

### iot

- IoT MySQL (StatefulSet, db: `iot_db`)
- IoT API (`iot-api:latest`, NodePort 30800) — requires the `topology.homelab/server: dell` node label

### ai

- **Paperclip** — paperclip.vandachevici.ro
  - Embedded PostgreSQL at `/data/ai/paperclip/instances/default/db`
  - Config: `/data/ai/paperclip/instances/default/config.json`
  - NFS PV via keepalived VIP `192.168.2.252:/data/ai/paperclip` (can schedule on any node) ✅
  - Env: `PAPERCLIP_AGENT_JWT_SECRET` (in a K8s secret)
---

## AI / OpenClaw

### local-ai VM (192.168.2.88) — GPU instance

- **GPU**: NVIDIA Tesla P4, 8 GB VRAM (PCIe passthrough from Proxmox)
  - VFIO: `/etc/modprobe.d/vfio.conf` ids=10de:1bb3, allow_unsafe_interrupts=1
  - initramfs updated for persistence
- **Ollama**: listening on `0.0.0.0:11434`, models at `/data/ollama/models`
  - Loaded: `qwen3:8b` (5.2 GB)
- **openclaw-gateway**: `ws://0.0.0.0:18789`, auth mode: token
  - Token: in `~/.openclaw/openclaw.json` → `gateway.auth.token`
  - Systemd: `openclaw-gateway.service` (Type=simple, enabled)

### remote-ai VM (192.168.2.91)

- **openclaw-gateway**: installed (v2026.3.13), config at `~/.openclaw/openclaw.json`
- Uses cloud AI providers (Claude API key required)

### Connecting Paperclip to openclaw

- URL: `ws://192.168.2.88:18789/`
- Auth: token from `~/.openclaw/openclaw.json` → `gateway.auth.token`
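Pulling the token out of the config can be scripted. A sketch using a naive `sed` extraction (it assumes the token appears on a single line of the JSON; `jq -r '.gateway.auth.token'` is the robust choice where `jq` is installed):

```shell
# Extract gateway.auth.token from an openclaw.json config file.
# Naive single-line JSON extraction, for illustration only.
get_openclaw_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Example: token=$(get_openclaw_token ~/.openclaw/openclaw.json)
```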
---

## Network Endpoints

| Service | URL / Address |
|---|---|
| K8s Dashboard | https://192.168.2.100:30443 |
| Proxmox UI | https://proxmox.vandachevici.ro |
| Grafana | http://192.168.2.100:31473 |
| Jellyfin | https://media.vandachevici.ro |
| Immich (photos) | https://photos.vandachevici.ro |
| OwnCloud | https://drive.vandachevici.ro |
| Paperclip | https://paperclip.vandachevici.ro |
| IoT API | http://192.168.2.100:30800 |
| minecraft-home | 192.168.2.100:31112 |
| minecraft-cheats | 192.168.2.100:31111 |
| minecraft-creative | 192.168.2.100:31559 |
| minecraft-johannes | 192.168.2.100:31563 |
| minecraft-noah | 192.168.2.100:31560 |
| Ollama (local-ai) | http://192.168.2.88:11434 |
| openclaw gateway (local-ai) | ws://192.168.2.88:18789 |
| Ollama (Dell) | http://192.168.2.100:11434 |

### DNS subdomains managed (DigitalOcean)

`photos`, `backup`, `media`, `chat`, `openttd`, `excalidraw`, `prv`, `drive`, `grafana`, `paperclip`, `proxmox`
---

## Common Operations

### Apply manifests

```bash
kubectl apply -f /home/dan/homelab/deployment/<namespace>/
```

### Prometheus (Helm)

```bash
helm upgrade obs prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f /home/dan/homelab/deployment/helm/prometheus/prometheus-helm-values.yaml
```

### NFS provisioners (Helm)

```bash
# Example: jellyfin
helm upgrade nfs-jellyfin nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  -n nfs-provisioners \
  -f /home/dan/homelab/deployment/helm/nfs-provisioners/values-jellyfin.yaml
```

### Troubleshooting: Flannel CNI after reboot

If all pods are stuck in `ContainerCreating` after a reboot:

```bash
# 1. Check that a default route exists on kube-node-1
ip route show | grep default
# Fix:     sudo ip route add default via 192.168.2.1 dev eno1
# Persist: check that /etc/netplan/00-installer-config.yaml has a routes section

# 2. Restart the flannel pod on node-1
kubectl delete pod -n kube-flannel -l app=flannel --field-selector spec.nodeName=kube-node-1
```

### Troubleshooting: kube-node-3 NotReady after reboot

Swap was likely re-enabled:

```bash
ssh dan@192.168.2.196 "sudo swapoff -a && sudo sed -i 's|^/swap.img|#/swap.img|' /etc/fstab && sudo systemctl restart kubelet"
```
---

## Workspace Structure

```
/home/dan/homelab/
├── HOMELAB.md — this file
├── plan.md — original rebuild plan
├── step-by-step.md — execution tracker
├── deployment/ — K8s manifests and Helm values
│   ├── 00-namespaces.yaml
│   ├── ai/ — Paperclip
│   ├── default/ — DNS updater
│   ├── games/ — Minecraft, Factorio, OpenTTD
│   ├── helm/ — Helm values (prometheus, nfs-provisioners)
│   ├── infrastructure/ — ingress-nginx, cert-manager, general-db, speedtest, proxmox-ingress
│   ├── iot/ — IoT DB + API
│   ├── media/ — Jellyfin, Immich
│   ├── monitoring/ — (managed by Helm)
│   └── storage/ — OwnCloud
├── backups/ — K8s secrets backup (gitignored)
├── hardware/ — hardware spec docs
├── orchestration/
│   └── ansible/ — playbooks, inventory, group_vars, cloud-init
└── services/
    └── device-inventory/ — C++ CMake project: network device discovery
```