chore: commit homelab setup — deployment, services, orchestration, skill

- Add .gitignore: exclude compiled binaries, build artifacts, and Helm
  values files containing real secrets (authentik, prometheus)
- Add all Kubernetes deployment manifests (deployment/)
- Add services source code: ha-sync, device-inventory, games-console,
  paperclip, parts-inventory
- Add Ansible orchestration: playbooks, roles, inventory, cloud-init
- Add hardware specs, execution plans, scripts, HOMELAB.md
- Add skills/homelab/SKILL.md + skills/install.sh to preserve Copilot skill
- Remove previously-tracked inventory-cli binary from git index

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Author: Dan V, 2026-04-09 08:10:32 +02:00
Commit: deb6c38d7b (parent: f2c4324fb0)
8957 changed files with 2910492 additions and 15 deletions

New file: .gitignore
backups/
# Helm values files containing real secrets (manage manually, never commit)
deployment/helm/authentik/values.yaml
deployment/helm/monitoring/prometheus-values.yaml
# Compiled binaries and build artifacts (source code is in services/)
services/*/bin/
services/*/build/
services/device-inventory/bin/
services/device-inventory/web-ui/inventory-web-ui
orchestration/ansible/roles/inventory-cli/files/inventory-cli
orchestration/ansible/roles/inventory-cli/files/device-inventory
# Session / planning artifacts
plan.md
# OS artifacts
.DS_Store
*.env

New file: HOMELAB.md
# Homelab Specs
---
## Hardware
### Dell OptiPlex 7070
- **Role**: kube-node-1 (control-plane + worker), bare metal
- **IP**: 192.168.2.100
- **SSH**: `dan@192.168.2.100`
- **CPU**: Intel Core i5-9500, 6c/6t, 3.0 GHz base / 4.4 GHz boost, 9 MB L3, 65W TDP, VT-x
- **RAM**: 16 GB DDR4 2666 MT/s DIMM
- **Storage**:
- `nvme0`: Samsung PM991 256 GB — 1G EFI, 2G /boot, 235.4G LVM (100G → /)
- `sda`: Seagate Expansion 2 TB → `/data/photos` (ext4)
- `sdb`: Seagate Expansion+ 2 TB → `/mnt/sdb-ro` (ext4, **READ-ONLY — never touch**)
- `sdc1`: Seagate Expansion 1 TB → `/data/media` (ext4)
- `sdc2`: Seagate Expansion 788 GB → `/data/games` (ext4)
- `sdd`: Samsung HD103SI 1 TB → `/data/owncloud` (ext4)
- `sde`: Hitachi HTS545050 500 GB → `/data/infra` (ext4)
- `sdf`: Seagate 1 TB → `/data/ai` (ext4)
- **Total**: ~7 TB
- **Network**: 1 Gbit/s
- **NFS server**: exports `/data/{games,media,photos,owncloud,infra,ai}` to LAN
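
The exports above would typically be declared in `/etc/exports` on kube-node-1. A hedged sketch — the exact client range and options are assumptions, not copied from the host:

```
# /etc/exports — illustrative only; actual options on kube-node-1 may differ
/data/games    192.168.2.0/24(rw,sync,no_subtree_check)
/data/media    192.168.2.0/24(rw,sync,no_subtree_check)
/data/photos   192.168.2.0/24(rw,sync,no_subtree_check)
/data/owncloud 192.168.2.0/24(rw,sync,no_subtree_check)
/data/infra    192.168.2.0/24(rw,sync,no_subtree_check)
/data/ai       192.168.2.0/24(rw,sync,no_subtree_check)
```

After editing, `sudo exportfs -ra` re-reads the file without restarting the NFS server.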
### HP ProLiant DL360 G7
- **Role**: Proxmox hypervisor (192.168.2.193)
- **SSH**: `root@192.168.2.193` (local id_rsa)
- **Web UI**: https://proxmox.vandachevici.ro
- **Storage**:
- 2× HPE SAS 900 GB in RAID 1+0 → 900 GB usable (Proxmox OS)
- 4× HPE SAS 900 GB in RAID 1+0 → 1.8 TB usable (VM disks)
- Promise VTrak J830s: 2× 16 TB → `media-pool` (ZFS, ~14 TB usable)
- **Total**: ~18 TB
### Promise VTrak J830s
- Connected to HP ProLiant via SAS
- 2× 16 TB disks, ZFS pool `media-pool`
- ZFS datasets mounted at `/data/X` on HP (matching Dell paths)
---
## Storage Layout
### Dell `/data` drives (primary/local)
| Mount | Device | Size | Contents |
|---|---|---|---|
| `/data/games` | sdc2 | 788 GB | Game server worlds and kits |
| `/data/media` | sdc1 | 1.1 TB | Jellyfin media library |
| `/data/photos` | sda | 916 GB | Immich photo library |
| `/data/owncloud` | sdd | 916 GB | OwnCloud files |
| `/data/infra` | sde | 458 GB | Prometheus, infra data |
| `/data/ai` | sdf | 916 GB | Paperclip, Ollama models |
| `/mnt/sdb-ro` | sdb | 1.8 TB | **READ-ONLY** archive — never modify |
### HP VTrak ZFS datasets (HA mirrors)
| ZFS Dataset | Mountpoint on HP | NFS export |
|---|---|---|
| media-pool/jellyfin | `/data/media` | ✅ |
| media-pool/immich | `/data/photos` | ✅ |
| media-pool/owncloud | `/data/owncloud` | ✅ |
| media-pool/games | `/data/games` | ✅ |
| media-pool/minecraft | `/data/games/minecraft` | ✅ |
| media-pool/factorio | `/data/games/factorio` | ✅ |
| media-pool/openttd | `/data/games/openttd` | ✅ |
| media-pool/infra | `/data/infra` | ✅ |
| media-pool/ai | `/data/ai` | ✅ |
Legacy bind mounts at `/media-pool/X` → `/data/X` are preserved for K8s PV compatibility.
### Cross-mounts (HA access)
| From | Mount point | To |
|---|---|---|
| Dell | `/mnt/hp/data-{games,media,photos,owncloud,infra,ai}` | HP VTrak NFS |
| HP | `/mnt/dell/data-{games,media,photos,owncloud,infra,ai}` | Dell NFS |
---
## VMs on HP ProLiant (Proxmox)
| VM ID | Name | IP | RAM | Role |
|---|---|---|---|---|
| 100 | kube-node-2 | 192.168.2.195 | 16 GB | K8s worker |
| 101 | kube-node-3 | 192.168.2.196 | 16 GB | K8s control-plane + worker |
| 103 | kube-arbiter | 192.168.2.200 | 6 GB | K8s control-plane (etcd + API server, NoSchedule) |
| 104 | local-ai | 192.168.2.88 | — | Ollama + openclaw-gateway (Tesla P4 GPU passthrough) |
| 106 | ansible-control | 192.168.2.70 | — | Ansible control node |
| 107 | remote-ai | 192.168.2.91 | — | openclaw-gateway (remote, cloud AI) |
⚠️ kube-node-2, kube-node-3, and kube-arbiter are all VMs on the HP ProLiant. HP ProLiant failure = loss of 3/4 K8s nodes simultaneously. Mitigation: add a Raspberry Pi 4/5 (8 GB) as a 4th physical host.
SSH: `dan@<ip>` for all VMs
---
## Kubernetes Cluster
- **Version**: 1.32.13
- **CNI**: Flannel
- **Dashboard**: https://192.168.2.100:30443 (self-signed cert, token auth)
- **Token file**: `/home/dan/homelab/kube/cluster/DASHBOARD-ACCESS.txt`
- **StorageClass**: `local-storage` (hostPath on kube-node-1)
- **NFS provisioners**: `nfs-provisioners` namespace (nfs-subdir-external-provisioner)
### Nodes
| Node | Role | IP | Host |
|---|---|---|---|
| kube-node-1 | control-plane + worker | 192.168.2.100 | Dell OptiPlex 7070 (bare metal) |
| kube-node-2 | worker | 192.168.2.195 | VM on HP ProLiant (16 GB RAM) |
| kube-node-3 | control-plane + worker | 192.168.2.196 | VM on HP ProLiant (16 GB RAM) |
| kube-arbiter | control-plane | 192.168.2.200 | VM on HP ProLiant (1c/6GB, tainted NoSchedule) |
**etcd**: 3 members (kube-node-1 + kube-arbiter + kube-node-3) — quorum survives 1 member failure ✅
**controlPlaneEndpoint**: `192.168.2.100:6443` ⚠️ SPOF — kube-vip (Phase 1b) not yet deployed; if kube-node-1 goes down, workers lose API access even though kube-arbiter and kube-node-3 API servers are still running
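The quorum arithmetic behind the etcd note: a cluster of N members tolerates ⌊(N−1)/2⌋ failures, which is why 3 members survive exactly one loss (and why 2 members would be strictly worse than 1):

```shell
# etcd quorum tolerance: a cluster of N members survives floor((N-1)/2) failures.
for n in 1 2 3 5; do
  echo "$n members tolerate $(( (n - 1) / 2 )) failure(s)"
done
```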
---
## High Availability Status
### Control Plane
| Component | Status | Notes |
|---|---|---|
| etcd | ✅ 3 members | kube-node-1 + kube-arbiter + kube-node-3; tolerates 1 failure |
| API server VIP | ⚠️ Not yet deployed | controlPlaneEndpoint hardcoded to 192.168.2.100; kube-vip (Phase 1b) pending |
| CoreDNS | ✅ Required anti-affinity | Pods spread across different nodes (kube-node-1 + kube-node-2) |
### Workloads (replicas=2, required pod anti-affinity)
| Service | Replicas | PDB |
|---|---|---|
| authentik-server | 2 | ✅ |
| authentik-worker | 2 | ✅ |
| cert-manager | 2 | ✅ |
| cert-manager-webhook | 2 | ✅ |
| cert-manager-cainjector | 2 | ✅ |
| parts-api | 2 | ✅ |
| parts-ui | 2 | ✅ |
| ha-sync-ui | 2 | ✅ |
| games-console-backend | 2 | ✅ |
| games-console-ui | 2 | ✅ |
| ingress-nginx | DaemonSet | ✅ (runs on all workers) |
### Storage
| PV | Type | Notes |
|---|---|---|
| paperclip-data-pv | NFS (192.168.2.252) | ✅ Migrated from hostPath; can schedule on any node |
| prometheus-storage-pv | hostPath on kube-node-1 | ⚠️ Still pinned to kube-node-1 (out of scope) |
### Known Remaining SPOFs
| Risk | Description | Mitigation |
|---|---|---|
| HP ProLiant physical host | kube-node-2/3 + kube-arbiter are all HP VMs | Add Raspberry Pi 4/5 (8 GB) as 4th physical host |
| controlPlaneEndpoint | Hardcoded to kube-node-1 IP | Deploy kube-vip with VIP (e.g. 192.168.2.50) |
---
### games
| Service | NodePort | Storage |
|---|---|---|
| minecraft-home | 31112 | HP NFS `/data/games/minecraft` |
| minecraft-cheats | 31111 | HP NFS `/data/games/minecraft` |
| minecraft-creative | 31559 | HP NFS `/data/games/minecraft` |
| minecraft-johannes | 31563 | HP NFS `/data/games/minecraft` |
| minecraft-noah | 31560 | HP NFS `/data/games/minecraft` |
| Factorio | — | HP NFS `/data/games/factorio` |
| OpenTTD | — | HP NFS `/data/games/openttd` |
Minecraft operators: LadyGisela5, tomgates24, anutzalizuk, toranaga_samma
### monitoring
- **Helm release**: `obs`, chart `prometheus-community/kube-prometheus-stack`
- **Values file**: `/home/dan/homelab/deployment/helm/prometheus/prometheus-helm-values.yaml`
- **Components**: Prometheus, Grafana, AlertManager, Node Exporter, Kube State Metrics
- **Grafana**: NodePort 31473 → http://192.168.2.100:31473
- **Storage**: 100 Gi hostPath PV at `/data/infra/prometheus` on kube-node-1
### infrastructure
- General MySQL/MariaDB (StatefulSet) — HP NFS `/media-pool/general-db`
- Speedtest Tracker — HP NFS `/media-pool/speedtest`
- DNS updater (DaemonSet, `tunix/digitalocean-dyndns`) — updates DigitalOcean DNS
- Proxmox ingress → 192.168.2.193:8006
### storage
- **OwnCloud** (`owncloud/server:10.12`) — drive.vandachevici.ro, admin: sefu
- MariaDB (StatefulSet), Redis (Deployment), OwnCloud server (2 replicas)
- Storage: HP NFS `/data/owncloud`
### media
- **Jellyfin** — media.vandachevici.ro, storage: HP NFS `/data/media`
- **Immich** — photos.vandachevici.ro, storage: HP NFS `/data/photos`
- Components: server (2 replicas), ML (2 replicas), valkey, postgresql
### iot
- IoT MySQL (StatefulSet, db: `iot_db`)
- IoT API (`iot-api:latest`, NodePort 30800) — requires `topology.homelab/server: dell` label
### ai
- **Paperclip** — paperclip.vandachevici.ro
- Embedded PostgreSQL at `/data/ai/paperclip/instances/default/db`
- Config: `/data/ai/paperclip/instances/default/config.json`
- NFS PV via keepalived VIP `192.168.2.252:/data/ai/paperclip` (can schedule on any node) ✅
- Env: `PAPERCLIP_AGENT_JWT_SECRET` (in K8s secret)
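
A hedged sketch of what the static NFS PV described above might look like — the capacity is an assumption; the real manifest lives under `deployment/`:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: paperclip-data-pv
spec:
  capacity:
    storage: 50Gi          # size is an assumption
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.2.252  # keepalived VIP, not a node IP
    path: /data/ai/paperclip
  persistentVolumeReclaimPolicy: Retain
```

Pointing the PV at the VIP rather than a node IP is what lets the pod schedule on any node.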
---
## AI / OpenClaw
### local-ai VM (192.168.2.88) — GPU instance
- **GPU**: NVIDIA Tesla P4, 8 GB VRAM (PCIe passthrough from Proxmox)
- VFIO: `/etc/modprobe.d/vfio.conf` ids=10de:1bb3, allow_unsafe_interrupts=1
- initramfs updated for persistence
- **Ollama**: listening on `0.0.0.0:11434`, models at `/data/ollama/models`
- Loaded: `qwen3:8b` (5.2 GB)
- **openclaw-gateway**: `ws://0.0.0.0:18789`, auth mode: token
- Token: in `~/.openclaw/openclaw.json` under `gateway.auth.token`
- Systemd: `openclaw-gateway.service` (Type=simple, enabled)
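
To sanity-check the loaded model, Ollama's HTTP API can be queried directly. This sketch only prints the command for review (the prompt is illustrative); pipe it to `sh` to actually run it:

```shell
# Compose a one-shot request against Ollama's /api/generate endpoint.
# Echoed rather than executed so it can be reviewed first.
payload='{"model":"qwen3:8b","prompt":"Say hello","stream":false}'
echo curl -s http://192.168.2.88:11434/api/generate -d "$payload"
```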
### remote-ai VM (192.168.2.91)
- **openclaw-gateway**: installed (v2026.3.13), config at `~/.openclaw/openclaw.json`
- Uses cloud AI providers (Claude API key required)
### Connecting Paperclip to openclaw
- URL: `ws://192.168.2.88:18789/`
- Auth: token from `~/.openclaw/openclaw.json` under `gateway.auth.token`
---
## Network Endpoints
| Service | URL / Address |
|---|---|
| K8s Dashboard | https://192.168.2.100:30443 |
| Proxmox UI | https://proxmox.vandachevici.ro |
| Grafana | http://192.168.2.100:31473 |
| Jellyfin | https://media.vandachevici.ro |
| Immich (photos) | https://photos.vandachevici.ro |
| OwnCloud | https://drive.vandachevici.ro |
| Paperclip | https://paperclip.vandachevici.ro |
| IoT API | http://192.168.2.100:30800 |
| minecraft-home | 192.168.2.100:31112 |
| minecraft-cheats | 192.168.2.100:31111 |
| minecraft-creative | 192.168.2.100:31559 |
| minecraft-johannes | 192.168.2.100:31563 |
| minecraft-noah | 192.168.2.100:31560 |
| Ollama (local-ai) | http://192.168.2.88:11434 |
| openclaw gateway (local-ai) | ws://192.168.2.88:18789 |
| Ollama (Dell) | http://192.168.2.100:11434 |
### DNS subdomains managed (DigitalOcean)
`photos`, `backup`, `media`, `chat`, `openttd`, `excalidraw`, `prv`, `drive`, `grafana`, `paperclip`, `proxmox`
---
## Common Operations
### Apply manifests
```bash
kubectl apply -f /home/dan/homelab/deployment/<namespace>/
```
### Prometheus (Helm)
```bash
helm upgrade obs prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f /home/dan/homelab/deployment/helm/prometheus/prometheus-helm-values.yaml
```
### NFS provisioners (Helm)
```bash
# Example: jellyfin
helm upgrade nfs-jellyfin nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  -n nfs-provisioners \
  -f /home/dan/homelab/deployment/helm/nfs-provisioners/values-jellyfin.yaml
```
### Troubleshooting: Flannel CNI after reboot
If all pods stuck in `ContainerCreating` after reboot:
```bash
# 1. Check default route exists on kube-node-1
ip route show | grep default
# Fix: sudo ip route add default via 192.168.2.1 dev eno1
# Persist: check /etc/netplan/00-installer-config.yaml has routes section
# 2. Restart flannel pod on node-1
kubectl delete pod -n kube-flannel -l app=flannel --field-selector spec.nodeName=kube-node-1
```
### Troubleshooting: kube-node-3 NotReady after reboot
Likely swap re-enabled:
```bash
ssh dan@192.168.2.196 "sudo swapoff -a && sudo sed -i 's|^/swap.img|#/swap.img|' /etc/fstab && sudo systemctl restart kubelet"
```
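The same fix applies to any of the VM workers after a reboot. A small hypothetical helper that prints (not runs) the remediation command per node, using the IPs from the VM table, so it can be reviewed before execution:

```shell
# Print the swap-disable + kubelet-restart command for each VM worker.
# Review the output, then run the lines you need.
for ip in 192.168.2.195 192.168.2.196 192.168.2.200; do
  echo "ssh dan@$ip 'sudo swapoff -a && sudo systemctl restart kubelet'"
done
```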
---
## Workspace Structure
```
/home/dan/homelab/
├── HOMELAB.md        — this file
├── plan.md           — original rebuild plan
├── step-by-step.md   — execution tracker
├── deployment/       — K8s manifests and Helm values
│   ├── 00-namespaces.yaml
│   ├── ai/             — Paperclip
│   ├── default/        — DNS updater
│   ├── games/          — Minecraft, Factorio, OpenTTD
│   ├── helm/           — Helm values (prometheus, nfs-provisioners)
│   ├── infrastructure/ — ingress-nginx, cert-manager, general-db, speedtest, proxmox-ingress
│   ├── iot/            — IoT DB + API
│   ├── media/          — Jellyfin, Immich
│   ├── monitoring/     — (managed by Helm)
│   └── storage/        — OwnCloud
├── backups/          — K8s secrets backup (gitignored)
├── hardware/         — hardware spec docs
├── orchestration/
│   └── ansible/      — playbooks, inventory, group_vars, cloud-init
└── services/
    └── device-inventory/ — C++ CMake project: network device discovery
```

New file: deployment/00-namespaces.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: games
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: Namespace
metadata:
  name: infrastructure
---
apiVersion: v1
kind: Namespace
metadata:
  name: storage
---
apiVersion: v1
kind: Namespace
metadata:
  name: media
---
apiVersion: v1
kind: Namespace
metadata:
  name: iot
---
apiVersion: v1
kind: Namespace
metadata:
  name: ai
---
apiVersion: v1
kind: Namespace
metadata:
  name: backup
---
apiVersion: v1
kind: Namespace
metadata:
  name: kubernetes-dashboard

New file: deployment/README.md
# Homelab Kubernetes Deployment Manifests
Reconstructed 2026-03-20 from live cluster state using `kubectl.kubernetes.io/last-applied-configuration` annotations.
## Directory Structure
```
deployment/
├── 00-namespaces.yaml             # All namespace definitions — apply first
├── games/
│   ├── factorio.yaml              # Factorio server (hostPort 34197)
│   ├── minecraft-cheats.yaml      # Minecraft cheats (hostPort 25111)
│   ├── minecraft-creative.yaml    # Minecraft creative (hostPort 25559)
│   ├── minecraft-home.yaml        # Minecraft home (hostPort 25112)
│   ├── minecraft-jaron.yaml       # Minecraft jaron (hostPort 25564)
│   ├── minecraft-johannes.yaml    # Minecraft johannes (hostPort 25563)
│   ├── minecraft-noah.yaml        # Minecraft noah (hostPort 25560)
│   └── openttd.yaml               # OpenTTD (NodePort 30979/30978)
├── monitoring/
│   └── prometheus-pv.yaml         # Manual local-storage PV for Prometheus
├── infrastructure/
│   ├── cert-issuers.yaml          # ClusterIssuers: letsencrypt-prod + staging
│   ├── dns-updater.yaml           # DaemonSet + ConfigMap (DigitalOcean DynDNS)
│   ├── general-db.yaml            # MySQL 9 StatefulSet (shared DB for speedtest etc.)
│   ├── paperclip.yaml             # Paperclip AI — PV + Deployment + Service + Ingress
│   └── speedtest-tracker.yaml     # Speedtest Tracker + ConfigMap + Ingress
├── storage/
│   ├── owncloud.yaml              # OwnCloud server + ConfigMap + Ingress
│   ├── owncloud-mariadb.yaml      # MariaDB 10.6 StatefulSet
│   └── owncloud-redis.yaml        # Redis 6 Deployment
├── media/
│   ├── jellyfin.yaml              # Jellyfin + ConfigMap + Ingress
│   └── immich.yaml                # Immich full stack (server, ml, db, valkey) + Ingress
├── iot/
│   ├── iot-db.yaml                # MySQL 9 StatefulSet for IoT data
│   └── iot-api.yaml               # IoT API (local image, see note below)
├── ai/
│   └── ollama.yaml                # Ollama (currently scaled to 0)
├── default/
│   └── dns-updater-legacy.yaml    # Legacy default-ns resources (hp-fast-pv, old ollama)
└── helm/
    ├── nfs-provisioners/          # Values for all NFS subdir provisioner releases
    │   ├── values-vtrak.yaml      # nfs-vtrak (default StorageClass)
    │   ├── values-general.yaml    # nfs-general (500G quota)
    │   ├── values-general-db.yaml # nfs-general-db (20G quota)
    │   ├── values-immich.yaml     # nfs-immich (300G quota)
    │   ├── values-jellyfin.yaml   # nfs-jellyfin (700G quota)
    │   ├── values-owncloud.yaml   # nfs-owncloud (200G quota)
    │   ├── values-minecraft.yaml  # nfs-minecraft (50G quota)
    │   ├── values-factorio.yaml   # nfs-factorio (10G quota)
    │   ├── values-openttd.yaml    # nfs-openttd (5G quota)
    │   ├── values-speedtest.yaml  # nfs-speedtest (5G quota)
    │   ├── values-authentik.yaml  # nfs-authentik (20G quota)
    │   └── values-iot.yaml        # nfs-iot (20G quota)
    ├── cert-manager/
    │   └── values.yaml            # cert-manager v1.19.3 (crds.enabled=true)
    ├── ingress-nginx/
    │   └── values.yaml            # ingress-nginx v4.14.3 (DaemonSet, hostPort)
    ├── monitoring/
    │   └── prometheus-values.yaml # kube-prometheus-stack (Grafana NodePort 31473)
    └── authentik/
        ├── values.yaml            # Authentik SSO v2026.2.1
        └── redis-values.yaml      # Standalone Redis for Authentik
```
## Apply Order
For a fresh cluster, apply in this order:
```bash
BASE=/home/dan/homelab/deployment

# 1. Namespaces
kubectl apply -f $BASE/00-namespaces.yaml

# 2. NFS provisioners (Helm) — run from default namespace
helm install nfs-vtrak nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  -f $BASE/helm/nfs-provisioners/values-vtrak.yaml
# ... repeat for each nfs-* values file

# 3. cert-manager
helm install cert-manager cert-manager/cert-manager -n cert-manager --create-namespace \
  -f $BASE/helm/cert-manager/values.yaml

# 4. ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx -n infrastructure \
  -f $BASE/helm/ingress-nginx/values.yaml

# 5. ClusterIssuers (requires cert-manager to be ready)
# Create the digitalocean-dns-token secret first:
#   kubectl create secret generic digitalocean-dns-token \
#     --from-literal=access-token=<TOKEN> -n cert-manager
kubectl apply -f $BASE/infrastructure/cert-issuers.yaml

# 6. Prometheus PV (must exist before helm install)
kubectl apply -f $BASE/monitoring/prometheus-pv.yaml
helm install obs prometheus-community/kube-prometheus-stack -n monitoring \
  -f $BASE/helm/monitoring/prometheus-values.yaml

# 7. Infrastructure workloads (create secrets first — see comments in each file)
kubectl apply -f $BASE/infrastructure/dns-updater.yaml
kubectl apply -f $BASE/infrastructure/general-db.yaml
kubectl apply -f $BASE/infrastructure/speedtest-tracker.yaml
kubectl apply -f $BASE/infrastructure/paperclip.yaml

# 8. Storage
kubectl apply -f $BASE/storage/

# 9. Media
kubectl apply -f $BASE/media/

# 10. Games
kubectl apply -f $BASE/games/

# 11. IoT
kubectl apply -f $BASE/iot/

# 12. AI
kubectl apply -f $BASE/ai/

# 13. Authentik
helm install authentik-redis bitnami/redis -n infrastructure \
  -f $BASE/helm/authentik/redis-values.yaml
helm install authentik authentik/authentik -n infrastructure \
  -f $BASE/helm/authentik/values.yaml
```
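Step 2's "repeat for each values file" can be scripted. A sketch that derives each release name from its values file name (`values-<name>.yaml` → `nfs-<name>`) and echoes the install command for review — the name-derivation convention is an assumption based on the file list above:

```shell
# Derive release names from values file names and print the install commands.
# Replace 'echo' with the real invocation once the output looks right.
for f in values-vtrak.yaml values-jellyfin.yaml values-immich.yaml; do
  name="nfs-${f#values-}"
  name="${name%.yaml}"
  echo helm install "$name" nfs-subdir-external-provisioner/nfs-subdir-external-provisioner -f "$f"
done
```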
## Secrets Required (not stored here)
The following secrets must be created manually before applying the relevant workloads:
| Secret | Namespace | Keys | Used By |
|--------|-----------|------|---------|
| `dns-updater-secret` | infrastructure | `digitalocean-token` | dns-updater DaemonSet |
| `digitalocean-dns-token` | cert-manager | `access-token` | ClusterIssuer (DNS01 solver) |
| `general-db-secret` | infrastructure | `root-password`, `database`, `user`, `password` | general-purpose-db, speedtest-tracker |
| `paperclip-secrets` | infrastructure | `BETTER_AUTH_SECRET` | paperclip |
| `owncloud-db-secret` | storage | `root-password`, `user`, `password`, `database` | owncloud-mariadb, owncloud-server |
| `iot-db-secret` | iot | `root-password`, `database`, `user`, `password` | iot-db, iot-api |
| `immich-secret` | media | `db-username`, `db-password`, `db-name`, `jwt-secret` | immich-server, immich-db |
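Each secret in the table can be created with `kubectl create secret generic`, one `--from-literal` per key. For example (echoed for review; the values here are placeholders, not real credentials):

```shell
# Print the create command for one of the secrets above; values are placeholders.
echo kubectl create secret generic general-db-secret -n infrastructure \
  --from-literal=root-password=CHANGE_ME \
  --from-literal=database=CHANGE_ME \
  --from-literal=user=CHANGE_ME \
  --from-literal=password=CHANGE_ME
```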
## Key Notes
- **kube-node-1 is cordoned** — no general workloads schedule there. Exceptions: DaemonSets (dns-updater, ingress-nginx, node-exporter, flannel) and workloads with explicit `nodeSelector: kubernetes.io/hostname: kube-node-1` (paperclip).
- **NFS storage** — all app data lives on ZFS datasets on the HP ProLiant (`192.168.2.193:/VTrak-Storage/<app>`). The NFS provisioners in the `default` namespace handle dynamic PV provisioning.
- **Prometheus** — intentionally uses `local-storage` at `/kube-storage-room/prometheus/` on kube-node-1 (USB disk sde). The `prometheus-storage-pv` PV must be manually created.
- **Paperclip** — uses local image `paperclip:latest` with `imagePullPolicy: Never`, pinned to kube-node-1. The image must be built locally on that node.
- **iot-api** — currently broken (`ErrImageNeverPull` on kube-node-3). The `iot-api:latest` local image is not present on the worker nodes. Either add a nodeSelector or push to a registry.
- **Ollama** — the `ai/ollama` and `default/ollama` deployments are both scaled to 0. Active LLM serving happens on the openclaw VM (192.168.2.88) via systemd Ollama service.
- **Authentik** — `helm/authentik/values.yaml` contains credentials in plaintext. Treat this file as sensitive.

New file: deployment/ai/ollama.yaml
---
# NOTE: ollama in the 'ai' namespace is currently scaled to 0 replicas (intentionally stopped).
# The actual AI workload runs on the openclaw VM (192.168.2.88) via Ollama system service.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations: {}
  name: ollama-data
  namespace: ai
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: nfs-vtrak
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  name: ollama
  namespace: ai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - image: ollama/ollama:latest
          name: ollama
          ports:
            - containerPort: 11434
              name: http
          resources:
            limits:
              cpu: '8'
              memory: 24Gi
            requests:
              cpu: 500m
              memory: 2Gi
          volumeMounts:
            - mountPath: /root/.ollama
              name: ollama-storage
      volumes:
        - name: ollama-storage
          persistentVolumeClaim:
            claimName: ollama-data
---
apiVersion: v1
kind: Service
metadata:
  annotations: {}
  name: ollama
  namespace: ai
spec:
  ports:
    - name: http
      port: 11434
      targetPort: 11434
  selector:
    app: ollama

New file: deployment/default/dns-updater-legacy.yaml
---
# Legacy default-namespace resources
# These are the NFS subdir provisioner deployments and the legacy ollama deployment.
# NFS provisioners are managed via Helm — see helm/nfs-provisioners/ for values files.
# The ollama deployment here is a legacy entry (scaled to 0) — the active ollama
# is in the 'ai' namespace. The hp-fast-pv / ollama-data-pvc bind a 1500Gi hostPath
# on the HP ProLiant at /mnt/hp_fast.
---
# hp-fast-pv: hostPath PV on HP ProLiant VMs (path /mnt/hp_fast, 1500Gi)
# No nodeAffinity was set originally — binding may be unreliable.
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations: {}
  name: hp-fast-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1500Gi
  hostPath:
    path: /mnt/hp_fast
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations: {}
  name: ollama-data-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1500Gi
  storageClassName: ''
  volumeName: hp-fast-pv
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations: {}
  name: ollama-data
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: nfs-vtrak
---
# Legacy ollama deployment in default namespace (scaled to 0, inactive)
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  name: ollama
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - image: ollama/ollama:latest
          name: ollama
          ports:
            - containerPort: 11434
          volumeMounts:
            - mountPath: /root/.ollama
              name: ollama-storage
      volumes:
        - name: ollama-storage
          persistentVolumeClaim:
            claimName: ollama-data-pvc
---
apiVersion: v1
kind: Service
metadata:
  annotations: {}
  name: ollama
  namespace: default
spec:
  ports:
    - port: 11434
      targetPort: 11434
  selector:
    app: ollama

New file: games-console-backend Deployment + Services
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: games-console-backend
  namespace: infrastructure
spec:
  replicas: 2
  selector:
    matchLabels:
      app: games-console-backend
  template:
    metadata:
      labels:
        app: games-console-backend
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: games-console-backend
              topologyKey: kubernetes.io/hostname
      serviceAccountName: games-console
      containers:
        - name: backend
          image: games-console-backend:latest
          imagePullPolicy: Never
          args: ["serve", "--namespace", "games"]
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 15
---
apiVersion: v1
kind: Service
metadata:
  name: games-console-backend
  namespace: infrastructure
spec:
  selector:
    app: games-console-backend
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: games-console-backend-np
  namespace: infrastructure
spec:
  selector:
    app: games-console-backend
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 31600
      protocol: TCP
  type: NodePort

New file: games-console Ingress
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: games-console
  namespace: infrastructure
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
    nginx.ingress.kubernetes.io/auth-response-headers: >-
      Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
spec:
  ingressClassName: nginx
  rules:
    - host: games.vandachevici.ro
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: games-console-ui
                port:
                  number: 80
  tls:
    - hosts:
        - games.vandachevici.ro
      secretName: games-console-tls

New file: games-console ServiceAccount + RBAC
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: games-console
  namespace: infrastructure
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: games-console
  namespace: games
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: games-console
  namespace: games
subjects:
  - kind: ServiceAccount
    name: games-console
    namespace: infrastructure
roleRef:
  kind: Role
  name: games-console
  apiGroup: rbac.authorization.k8s.io

New file: games-console-ui Deployment + Service
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: games-console-ui
  namespace: infrastructure
spec:
  replicas: 2
  selector:
    matchLabels:
      app: games-console-ui
  template:
    metadata:
      labels:
        app: games-console-ui
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: games-console-ui
              topologyKey: kubernetes.io/hostname
      containers:
        - name: ui
          image: games-console-ui:latest
          imagePullPolicy: Never
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 20m
              memory: 32Mi
            limits:
              cpu: 200m
              memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: games-console-ui
  namespace: infrastructure
spec:
  selector:
    app: games-console-ui
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP

New file: deployment/games/factorio.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  name: factorio-alone
  namespace: games
spec:
  replicas: 1
  selector:
    matchLabels:
      app: factorio-alone
  template:
    metadata:
      labels:
        app: factorio-alone
    spec:
      containers:
        - image: factoriotools/factorio
          name: factorio
          ports:
            - containerPort: 34197
              hostPort: 34197
              protocol: TCP
          resources:
            limits:
              cpu: 2000m
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - mountPath: /factorio
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: factorio-alone-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
  annotations: {}
  name: factorio-alone
  namespace: games
spec:
  ports:
    - port: 34197
      protocol: TCP
      targetPort: 34197
  selector:
    app: factorio-alone
  type: ClusterIP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations: {}
  name: factorio-alone-v2-pvc
  namespace: games
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-factorio

New file: deployment/games/minecraft-cheats.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  name: minecraft-cheats
  namespace: games
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minecraft-cheats
  template:
    metadata:
      labels:
        app: minecraft-cheats
    spec:
      containers:
        - env:
            - name: EULA
              value: 'true'
            - name: MOTD
              value: A Minecraft Server Powered by Docker
            - name: DIFFICULTY
              value: easy
            - name: GAMEMODE
              value: survival
            - name: MAX_PLAYERS
              value: '10'
            - name: ENABLE_COMMAND_BLOCK
              value: 'true'
            - name: DUMP_SERVER_PROPERTIES
              value: 'true'
            - name: PAUSE_WHEN_EMPTY_SECONDS
              value: '0'
            - name: OPS
              value: LadyGisela5,tomgates24,anutzalizuk,toranaga_samma
          image: itzg/minecraft-server
          name: minecraft
          ports:
            - containerPort: 25565
              protocol: TCP
          resources:
            limits:
              cpu: 2000m
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - mountPath: /data
              name: data
      nodeSelector:
        topology.homelab/server: dell
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: minecraft-cheats-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
  annotations: {}
  name: minecraft-cheats
  namespace: games
spec:
  type: NodePort
  ports:
    - port: 25565
      protocol: TCP
      targetPort: 25565
      nodePort: 31111
  selector:
    app: minecraft-cheats
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations: {}
  name: minecraft-cheats-v2-pvc
  namespace: games
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-minecraft

New file: deployment/games/minecraft-creative.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  name: minecraft-creative
  namespace: games
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minecraft-creative
  template:
    metadata:
      labels:
        app: minecraft-creative
    spec:
      containers:
        - env:
            - name: EULA
              value: 'true'
            - name: PAUSE_WHEN_EMPTY_SECONDS
              value: '0'
          image: itzg/minecraft-server
          name: minecraft
          ports:
            - containerPort: 25565
              protocol: TCP
          resources:
            limits:
              cpu: 2000m
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - mountPath: /data
              name: data
      nodeSelector:
        topology.homelab/server: dell
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: minecraft-creative-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
  annotations: {}
  name: minecraft-creative
  namespace: games
spec:
  type: NodePort
  ports:
    - port: 25565
      protocol: TCP
      targetPort: 25565
      nodePort: 31559
  selector:
    app: minecraft-creative
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations: {}
  name: minecraft-creative-v2-pvc
  namespace: games
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-minecraft

New file: deployment/games/minecraft-home.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: minecraft-home
namespace: games
spec:
replicas: 1
selector:
matchLabels:
app: minecraft-home
template:
metadata:
labels:
app: minecraft-home
spec:
containers:
- env:
- name: EULA
value: 'true'
- name: MOTD
value: A Minecraft Server Powered by Docker
- name: DIFFICULTY
value: easy
- name: GAMEMODE
value: survival
- name: MAX_PLAYERS
value: '10'
- name: ENABLE_COMMAND_BLOCK
value: 'true'
- name: DUMP_SERVER_PROPERTIES
value: 'true'
- name: PAUSE_WHEN_EMPTY_SECONDS
value: '0'
- name: OPS
value: LadyGisela5,tomgates24,anutzalizuk,toranaga_samma
image: itzg/minecraft-server
name: minecraft
ports:
- containerPort: 25565
protocol: TCP
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 500m
memory: 1Gi
volumeMounts:
- mountPath: /data
name: data
nodeSelector:
topology.homelab/server: dell
volumes:
- name: data
persistentVolumeClaim:
claimName: minecraft-home-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: minecraft-home
namespace: games
spec:
  ports:
  - port: 25565
    protocol: TCP
    targetPort: 25565
    nodePort: 31112
  selector:
    app: minecraft-home
  type: NodePort
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: minecraft-home-v2-pvc
namespace: games
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-minecraft


@ -0,0 +1,72 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: minecraft-jaron
namespace: games
spec:
replicas: 1
selector:
matchLabels:
app: minecraft-jaron
template:
metadata:
labels:
app: minecraft-jaron
spec:
containers:
- env:
- name: EULA
value: 'true'
image: itzg/minecraft-server
name: minecraft
ports:
- containerPort: 25565
hostPort: 25564
protocol: TCP
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 500m
memory: 1Gi
volumeMounts:
- mountPath: /data
name: data
nodeSelector:
topology.homelab/server: dell
volumes:
- name: data
persistentVolumeClaim:
claimName: minecraft-jaron-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: minecraft-jaron
namespace: games
spec:
ports:
- port: 25565
protocol: TCP
targetPort: 25565
selector:
app: minecraft-jaron
type: ClusterIP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: minecraft-jaron-pvc
namespace: games
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-minecraft


@ -0,0 +1,78 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: minecraft-johannes
namespace: games
spec:
replicas: 1
selector:
matchLabels:
app: minecraft-johannes
template:
metadata:
labels:
app: minecraft-johannes
spec:
containers:
- env:
- name: EULA
value: 'true'
- name: PAUSE_WHEN_EMPTY_SECONDS
value: '0'
image: itzg/minecraft-server
name: minecraft
ports:
- containerPort: 25565
protocol: TCP
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 500m
memory: 1Gi
volumeMounts:
- mountPath: /data
name: data
nodeSelector:
topology.homelab/server: dell
volumes:
- name: data
persistentVolumeClaim:
claimName: minecraft-johannes-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: minecraft-johannes
namespace: games
spec:
  ports:
  - port: 25565
    protocol: TCP
    targetPort: 25565
    nodePort: 31563
  selector:
    app: minecraft-johannes
  type: NodePort
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: minecraft-johannes-v2-pvc
namespace: games
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-minecraft


@ -0,0 +1,78 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: minecraft-noah
namespace: games
spec:
replicas: 1
selector:
matchLabels:
app: minecraft-noah
template:
metadata:
labels:
app: minecraft-noah
spec:
containers:
- env:
- name: EULA
value: 'true'
- name: PAUSE_WHEN_EMPTY_SECONDS
value: '0'
image: itzg/minecraft-server
name: minecraft
ports:
- containerPort: 25565
protocol: TCP
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 500m
memory: 1Gi
volumeMounts:
- mountPath: /data
name: data
nodeSelector:
topology.homelab/server: dell
volumes:
- name: data
persistentVolumeClaim:
claimName: minecraft-noah-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: minecraft-noah
namespace: games
spec:
  ports:
  - port: 25565
    protocol: TCP
    targetPort: 25565
    nodePort: 31560
  selector:
    app: minecraft-noah
  type: NodePort
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: minecraft-noah-v2-pvc
namespace: games
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-minecraft
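The five NodePort Minecraft services above each pin an explicit nodePort. Kubernetes requires these to be unique cluster-wide and, by default, to fall inside 30000-32767. A quick sanity check over the values declared in these manifests:

```python
# nodePorts assigned to the NodePort Minecraft services above
node_ports = {
    "minecraft-cheats": 31111,
    "minecraft-home": 31112,
    "minecraft-creative": 31559,
    "minecraft-noah": 31560,
    "minecraft-johannes": 31563,
}

# all ports must be unique and inside the default NodePort range
assert len(set(node_ports.values())) == len(node_ports)
assert all(30000 <= p <= 32767 for p in node_ports.values())
```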


@ -0,0 +1,78 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: openttd
namespace: games
spec:
replicas: 1
selector:
matchLabels:
app: openttd
template:
metadata:
labels:
app: openttd
spec:
containers:
- env:
- name: savepath
value: /var/openttd
image: bateau/openttd
name: openttd
ports:
- containerPort: 3979
name: game
- containerPort: 3978
name: admin
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 100m
memory: 256Mi
volumeMounts:
- mountPath: /var/openttd
name: saves
volumes:
- name: saves
persistentVolumeClaim:
claimName: openttd-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: openttd
namespace: games
spec:
ports:
- name: game
nodePort: 30979
port: 3979
protocol: TCP
targetPort: 3979
- name: admin
nodePort: 30978
port: 3978
protocol: TCP
targetPort: 3978
selector:
app: openttd
type: NodePort
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: openttd-v2-pvc
namespace: games
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
storageClassName: nfs-openttd
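All five Minecraft deployments are pinned to the Dell node (nodeSelector `topology.homelab/server: dell`), which has 16 GB of RAM. A rough capacity check using the requests/limits above; this ignores OpenTTD (which is not pinned) and system overhead:

```python
# five Minecraft pods pinned to the 16 GiB Dell node; values from the manifests above
node_ram_gib = 16
pods = 5
requests_gib = pods * 1   # 1Gi memory request each
limits_gib = pods * 2     # 2Gi memory limit each

assert requests_gib <= node_ram_gib          # scheduler can admit all five
headroom_gib = node_ram_gib - limits_gib     # worst case, every pod at its limit
assert headroom_gib == 6
```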


@ -0,0 +1,59 @@
# AI sync is suspended - not currently enabled for syncing
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-ai-dell-to-hp
namespace: infrastructure
spec:
schedule: "*/15 * * * *"
suspend: true
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/dell/ai
- --dest=/mnt/hp/ai
- --pair=ai
- --direction=dell-to-hp
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: dell-data
mountPath: /mnt/dell/ai
- name: hp-data
mountPath: /mnt/hp/ai
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-ai
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-ai
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,60 @@
# AI sync is suspended - not currently enabled for syncing
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-ai-hp-to-dell
namespace: infrastructure
spec:
schedule: "7,22,37,52 * * * *"
suspend: true
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/hp/ai
- --dest=/mnt/dell/ai
- --pair=ai
- --direction=hp-to-dell
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
- --dry-run # REMOVE THIS LINE to enable production sync
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: hp-data
mountPath: /mnt/hp/ai
- name: dell-data
mountPath: /mnt/dell/ai
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-ai
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-ai
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs
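The two directions of each sync pair are deliberately staggered: dell-to-hp runs on `*/15` (minutes 0, 15, 30, 45) while hp-to-dell runs at 7, 22, 37, 52, so opposing runs never start in the same minute (`concurrencyPolicy: Forbid` only prevents overlap within a single CronJob, not across the pair). Checking the offsets:

```python
# minute marks at which each direction of a sync pair starts
dell_to_hp = set(range(0, 60, 15))   # schedule "*/15 * * * *"
hp_to_dell = {7, 22, 37, 52}         # schedule "7,22,37,52 * * * *"

# opposing directions never fire in the same minute, and each
# hp-to-dell run starts 7 minutes after the nearest dell-to-hp run
assert dell_to_hp == {0, 15, 30, 45}
assert dell_to_hp.isdisjoint(hp_to_dell)
assert all(min((m - d) % 60 for d in dell_to_hp) == 7 for m in hp_to_dell)
```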


@ -0,0 +1,58 @@
# Production sync is enabled for this direction; no --dry-run flag is set.
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-games-dell-to-hp
namespace: infrastructure
spec:
schedule: "*/15 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/dell/games
- --dest=/mnt/hp/games
- --pair=games
- --direction=dell-to-hp
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: dell-data
mountPath: /mnt/dell/games
- name: hp-data
mountPath: /mnt/hp/games
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-games
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-games
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,59 @@
# To enable production sync: remove --dry-run from args below
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-games-hp-to-dell
namespace: infrastructure
spec:
schedule: "7,22,37,52 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/hp/games
- --dest=/mnt/dell/games
- --pair=games
- --direction=hp-to-dell
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
- --dry-run # REMOVE THIS LINE to enable production sync
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: hp-data
mountPath: /mnt/hp/games
- name: dell-data
mountPath: /mnt/dell/games
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-games
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-games
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,58 @@
# Production sync is enabled for this direction; no --dry-run flag is set.
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-infra-dell-to-hp
namespace: infrastructure
spec:
schedule: "*/15 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/dell/infra
- --dest=/mnt/hp/infra
- --pair=infra
- --direction=dell-to-hp
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: dell-data
mountPath: /mnt/dell/infra
- name: hp-data
mountPath: /mnt/hp/infra
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-infra
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-infra
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,59 @@
# To enable production sync: remove --dry-run from args below
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-infra-hp-to-dell
namespace: infrastructure
spec:
schedule: "7,22,37,52 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/hp/infra
- --dest=/mnt/dell/infra
- --pair=infra
- --direction=hp-to-dell
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
- --dry-run # REMOVE THIS LINE to enable production sync
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: hp-data
mountPath: /mnt/hp/infra
- name: dell-data
mountPath: /mnt/dell/infra
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-infra
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-infra
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,58 @@
# Production sync is enabled for this direction; no --dry-run flag is set.
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-media-dell-to-hp
namespace: infrastructure
spec:
schedule: "*/15 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/dell/media
- --dest=/mnt/hp/media
- --pair=media
- --direction=dell-to-hp
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: dell-data
mountPath: /mnt/dell/media
- name: hp-data
mountPath: /mnt/hp/media
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-media
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-media
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,59 @@
# To enable production sync: remove --dry-run from args below
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-media-hp-to-dell
namespace: infrastructure
spec:
schedule: "7,22,37,52 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/hp/media
- --dest=/mnt/dell/media
- --pair=media
- --direction=hp-to-dell
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
- --dry-run # REMOVE THIS LINE to enable production sync
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: hp-data
mountPath: /mnt/hp/media
- name: dell-data
mountPath: /mnt/dell/media
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-media
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-media
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,58 @@
# Production sync is enabled for this direction; no --dry-run flag is set.
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-owncloud-dell-to-hp
namespace: infrastructure
spec:
schedule: "*/15 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/dell/owncloud
- --dest=/mnt/hp/owncloud
- --pair=owncloud
- --direction=dell-to-hp
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: dell-data
mountPath: /mnt/dell/owncloud
- name: hp-data
mountPath: /mnt/hp/owncloud
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-owncloud
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-owncloud
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,59 @@
# To enable production sync: remove --dry-run from args below
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-owncloud-hp-to-dell
namespace: infrastructure
spec:
schedule: "7,22,37,52 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/hp/owncloud
- --dest=/mnt/dell/owncloud
- --pair=owncloud
- --direction=hp-to-dell
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
- --dry-run # REMOVE THIS LINE to enable production sync
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: hp-data
mountPath: /mnt/hp/owncloud
- name: dell-data
mountPath: /mnt/dell/owncloud
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-owncloud
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-owncloud
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,58 @@
# Production sync is enabled for this direction; no --dry-run flag is set.
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-photos-dell-to-hp
namespace: infrastructure
spec:
schedule: "*/15 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/dell/photos
- --dest=/mnt/hp/photos
- --pair=photos
- --direction=dell-to-hp
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: dell-data
mountPath: /mnt/dell/photos
- name: hp-data
mountPath: /mnt/hp/photos
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-photos
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-photos
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs


@ -0,0 +1,59 @@
# To enable production sync: remove --dry-run from args below
apiVersion: batch/v1
kind: CronJob
metadata:
name: ha-sync-photos-hp-to-dell
namespace: infrastructure
spec:
schedule: "7,22,37,52 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: ha-sync
restartPolicy: OnFailure
containers:
- name: ha-sync
image: ha-sync:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync"]
args:
- --src=/mnt/hp/photos
- --dest=/mnt/dell/photos
- --pair=photos
- --direction=hp-to-dell
- --log-dir=/var/log/ha-sync
- --exclude=*.sock
- --exclude=*.pid
- --exclude=*.lock
- --exclude=lock
- --dry-run # REMOVE THIS LINE to enable production sync
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
volumeMounts:
- name: hp-data
mountPath: /mnt/hp/photos
- name: dell-data
mountPath: /mnt/dell/photos
- name: logs
mountPath: /var/log/ha-sync
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
volumes:
- name: hp-data
persistentVolumeClaim:
claimName: pvc-hp-photos
- name: dell-data
persistentVolumeClaim:
claimName: pvc-dell-photos
- name: logs
persistentVolumeClaim:
claimName: pvc-ha-sync-logs
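Every sync job passes the same four `--exclude` patterns. Assuming ha-sync applies them as rsync-style globs against file names (an assumption; the actual matcher lives in services/ha-sync), `fnmatch` reproduces the intent:

```python
from fnmatch import fnmatch

# the four --exclude patterns shared by all ha-sync CronJobs
patterns = ["*.sock", "*.pid", "*.lock", "lock"]

def excluded(name: str) -> bool:
    # hypothetical re-implementation of the matcher, for illustration only
    return any(fnmatch(name, p) for p in patterns)

assert excluded("server.lock")    # *.lock
assert excluded("lock")           # bare "lock" files
assert excluded("app.pid")        # *.pid
assert not excluded("world.dat")  # ordinary data files are synced
```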


@ -0,0 +1,37 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- serviceaccount.yaml
- rbac.yaml
- pv-logs.yaml
- pvc-logs.yaml
- pv-dell-ai.yaml
- pv-dell-games.yaml
- pv-dell-infra.yaml
- pv-dell-media.yaml
- pv-dell-owncloud.yaml
- pv-dell-photos.yaml
- pv-hp-ai.yaml
- pv-hp-games.yaml
- pv-hp-infra.yaml
- pv-hp-media.yaml
- pv-hp-owncloud.yaml
- pv-hp-photos.yaml
- pvc-dell-ai.yaml
- pvc-dell-games.yaml
- pvc-dell-infra.yaml
- pvc-dell-media.yaml
- pvc-dell-owncloud.yaml
- pvc-dell-photos.yaml
- pvc-hp-ai.yaml
- pvc-hp-games.yaml
- pvc-hp-infra.yaml
- pvc-hp-media.yaml
- pvc-hp-owncloud.yaml
- pvc-hp-photos.yaml
# CronJobs are now managed by ha-sync-ctl (DB-driven). See archive/ for old static manifests.
# To migrate: ha-sync-ctl jobs import-k8s then ha-sync-ctl jobs apply-all
- ui-deployment.yaml
- ui-service.yaml
- ui-ingress.yaml


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-dell-ai
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.100
path: /data/ai


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-dell-games
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.100
path: /data/games


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-dell-infra
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.100
path: /data/infra


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-dell-media
spec:
capacity:
storage: 2Ti
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.100
path: /data/media


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-dell-owncloud
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.100
path: /data/owncloud


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-dell-photos
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.100
path: /data/photos


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-hp-ai
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.193
path: /data/ai


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-hp-games
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.193
path: /data/games


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-hp-infra
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.193
path: /data/infra


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-hp-media
spec:
capacity:
storage: 2Ti
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.193
path: /data/media


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-hp-owncloud
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.193
path: /data/owncloud


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-hp-photos
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.193
path: /data/photos


@ -0,0 +1,14 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-ha-sync-logs
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
nfs:
server: 192.168.2.193
path: /data/infra/ha-sync-logs


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-dell-ai
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-dell-ai
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-dell-games
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-dell-games
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-dell-infra
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-dell-infra
resources:
requests:
storage: 100Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-dell-media
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-dell-media
resources:
requests:
storage: 2Ti


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-dell-owncloud
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-dell-owncloud
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-dell-photos
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-dell-photos
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-hp-ai
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-hp-ai
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-hp-games
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-hp-games
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-hp-infra
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-hp-infra
resources:
requests:
storage: 100Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-hp-media
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-hp-media
resources:
requests:
storage: 2Ti


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-hp-owncloud
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-hp-owncloud
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-hp-photos
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-hp-photos
resources:
requests:
storage: 500Gi


@ -0,0 +1,13 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-ha-sync-logs
namespace: infrastructure
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: pv-ha-sync-logs
resources:
requests:
storage: 10Gi
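All of these claims use static binding: `storageClassName: ""` opts out of dynamic provisioning, and `volumeName` pins each PVC to exactly one PV. A consistency check that every claim defined above names a volume that actually exists:

```python
shares = ("ai", "games", "infra", "media", "owncloud", "photos")

# PVs and PVCs defined in the manifests above (plus the shared logs pair)
pvs = {f"pv-{h}-{s}" for h in ("dell", "hp") for s in shares} | {"pv-ha-sync-logs"}
pvcs = {f"pvc-{h}-{s}" for h in ("dell", "hp") for s in shares} | {"pvc-ha-sync-logs"}

# every PVC's volumeName (pvc-X -> pv-X) must name a defined PV
assert {c.replace("pvc-", "pv-", 1) for c in pvcs} == pvs
assert len(pvs) == len(pvcs) == 13
```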


@ -0,0 +1,4 @@
# Create this secret manually before applying:
# kubectl create secret generic ha-sync-db-secret \
# --from-literal=HA_SYNC_DB_DSN='<user>:<pass>@tcp(general-purpose-db.infrastructure.svc.cluster.local:3306)/general_db?parseTime=true' \
# -n infrastructure
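The DSN in the command above follows the Go MySQL driver format, `user:password@tcp(host:port)/dbname?params`. A sketch of the expected shape, using obviously hypothetical credentials (the real ones are created out of band and never committed):

```python
import re

# hypothetical credentials for illustration only
dsn = ("syncuser:example-password"
       "@tcp(general-purpose-db.infrastructure.svc.cluster.local:3306)"
       "/general_db?parseTime=true")

pattern = r"^(?P<user>[^:]+):(?P<pw>[^@]+)@tcp\((?P<host>[^:]+):(?P<port>\d+)\)/(?P<db>[^?]+)"
m = re.match(pattern, dsn)
assert m is not None
assert m.group("host") == "general-purpose-db.infrastructure.svc.cluster.local"
assert m.group("port") == "3306" and m.group("db") == "general_db"
```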


@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: ha-sync
namespace: infrastructure


@ -0,0 +1,49 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-sync-ui
namespace: infrastructure
spec:
replicas: 2
selector:
matchLabels:
app: ha-sync-ui
template:
metadata:
labels:
app: ha-sync-ui
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: ha-sync-ui
topologyKey: kubernetes.io/hostname
serviceAccountName: ha-sync
containers:
- name: ha-sync-ui
image: ha-sync-ui:latest
imagePullPolicy: Never
command: ["/usr/local/bin/ha-sync-ui"]
ports:
- containerPort: 8080
env:
- name: HA_SYNC_DB_DSN
valueFrom:
secretKeyRef:
name: ha-sync-db-secret
key: HA_SYNC_DB_DSN
- name: HA_SYNC_UI_PORT
value: "8080"
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 200m, memory: 128Mi }
livenessProbe:
httpGet: { path: /health, port: 8080 }
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet: { path: /health, port: 8080 }
initialDelaySeconds: 3
periodSeconds: 5


@ -0,0 +1,28 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ha-sync-ui
namespace: infrastructure
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
nginx.ingress.kubernetes.io/auth-response-headers: >-
Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
spec:
ingressClassName: nginx
rules:
- host: ha-sync.vandachevici.ro
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ha-sync-ui
port:
number: 80
tls:
- hosts:
- ha-sync.vandachevici.ro
secretName: ha-sync-ui-tls


@ -0,0 +1,12 @@
apiVersion: v1
kind: Service
metadata:
name: ha-sync-ui
namespace: infrastructure
spec:
type: ClusterIP
selector:
app: ha-sync-ui
ports:
- port: 80
targetPort: 8080


@ -0,0 +1,20 @@
# authentik-redis — standalone Redis for Authentik
# Helm release: authentik-redis, namespace: infrastructure
# Chart: bitnami/redis v25.3.2
# Install: helm install authentik-redis bitnami/redis -n infrastructure -f redis-values.yaml
# Repo: helm repo add bitnami https://charts.bitnami.com/bitnami
architecture: standalone
auth:
enabled: false
master:
persistence:
enabled: true
size: 1Gi
storageClass: nfs-authentik
resources:
limits:
memory: 128Mi
requests:
cpu: 30m
memory: 64Mi
resourcesPreset: none


@ -0,0 +1,31 @@
# cert-manager v1.19.3
# Helm release: cert-manager, namespace: cert-manager
# Install: helm install cert-manager cert-manager/cert-manager -n cert-manager --create-namespace -f values.yaml
crds:
enabled: true
replicaCount: 2
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/component: controller
topologyKey: kubernetes.io/hostname
webhook:
replicaCount: 2
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/component: webhook
topologyKey: kubernetes.io/hostname
cainjector:
replicaCount: 2
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/component: cainjector
topologyKey: kubernetes.io/hostname


@ -0,0 +1,16 @@
# ingress-nginx v4.14.3 (app version 1.14.3)
# Helm release: ingress-nginx, namespace: infrastructure
# Install: helm install ingress-nginx ingress-nginx/ingress-nginx -n infrastructure -f values.yaml
controller:
config:
force-ssl-redirect: true
allow-snippet-annotations: "true"
annotations-risk-level: "Critical"
extraArgs:
default-ssl-certificate: "infrastructure/wildcard-vandachevici-tls"
hostPort:
enabled: false
kind: DaemonSet
service:
type: LoadBalancer
loadBalancerIP: 192.168.2.240


@ -0,0 +1,41 @@
---
# NOTE: Secret 'digitalocean-dns-token' must be created manually in cert-manager namespace:
# kubectl create secret generic digitalocean-dns-token \
# --from-literal=access-token=<YOUR_DO_TOKEN> \
# -n cert-manager
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
annotations: {}
name: letsencrypt-prod
spec:
acme:
email: dan.vandachevici@gmail.com
privateKeySecretRef:
name: letsencrypt-prod-account-key
server: https://acme-v02.api.letsencrypt.org/directory
solvers:
- dns01:
digitalocean:
tokenSecretRef:
key: access-token
name: digitalocean-dns-token
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
annotations: {}
name: letsencrypt-staging
spec:
acme:
email: dan.vandachevici@gmail.com
privateKeySecretRef:
name: letsencrypt-staging-account-key
server: https://acme-staging-v02.api.letsencrypt.org/directory
solvers:
- dns01:
digitalocean:
tokenSecretRef:
key: access-token
name: digitalocean-dns-token


@ -0,0 +1,183 @@
---
# NOTE: Images must be built and loaded onto nodes before applying.
# Run: /home/dan/homelab/services/device-inventory/build-and-load.sh
#
# Images required:
# inventory-server:latest → kube-node-2
# inventory-web-ui:latest → kube-node-2
# inventory-cli:latest → kube-node-2, kube-node-3
#
# nfs-general StorageClass is cluster-wide — no extra Helm release needed.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: device-inventory-db-pvc
namespace: infrastructure
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs-general
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: inventory-server
namespace: infrastructure
spec:
replicas: 1
selector:
matchLabels:
app: inventory-server
strategy:
type: Recreate
template:
metadata:
labels:
app: inventory-server
spec:
containers:
- name: inventory-server
image: inventory-server:latest
imagePullPolicy: Never
ports:
- containerPort: 9876
name: tcp
resources:
limits:
cpu: 200m
memory: 128Mi
requests:
cpu: 25m
memory: 32Mi
livenessProbe:
tcpSocket:
port: 9876
initialDelaySeconds: 10
periodSeconds: 20
failureThreshold: 5
readinessProbe:
tcpSocket:
port: 9876
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
volumeMounts:
- mountPath: /var/lib/inventory
name: db-storage
volumes:
- name: db-storage
persistentVolumeClaim:
claimName: device-inventory-db-pvc
---
apiVersion: v1
kind: Service
metadata:
name: inventory-server
namespace: infrastructure
spec:
selector:
app: inventory-server
ports:
- name: tcp
port: 9876
targetPort: 9876
nodePort: 30987
type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: inventory-web-ui
namespace: infrastructure
spec:
replicas: 1
selector:
matchLabels:
app: inventory-web-ui
template:
metadata:
labels:
app: inventory-web-ui
spec:
containers:
- name: inventory-web-ui
image: inventory-web-ui:latest
imagePullPolicy: Never
env:
- name: INVENTORY_HOST
value: inventory-server.infrastructure.svc.cluster.local
- name: INVENTORY_PORT
value: "9876"
- name: PORT
value: "8080"
ports:
- containerPort: 8080
name: http
resources:
limits:
cpu: 100m
memory: 64Mi
requests:
cpu: 10m
memory: 32Mi
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5
periodSeconds: 20
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 3
periodSeconds: 10
failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
name: inventory-web-ui
namespace: infrastructure
spec:
selector:
app: inventory-web-ui
ports:
- name: http
port: 80
targetPort: 8080
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: inventory-web-ui
namespace: infrastructure
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
nginx.ingress.kubernetes.io/auth-response-headers: >-
Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
spec:
ingressClassName: nginx
rules:
- host: device-inventory.vandachevici.ro
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: inventory-web-ui
port:
number: 80
tls:
- hosts:
- device-inventory.vandachevici.ro
secretName: device-inventory-tls
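The build-and-load step called out in the header note can be sketched as follows. This is a sketch, not the repo's actual `build-and-load.sh`: it assumes the nodes run containerd (hence `ctr -n k8s.io images import`), that `docker` is available on the build host, and the node address shown is hypothetical (only kube-node-1's IP is documented).

```shell
#!/usr/bin/env sh
# Sketch: build an image locally, then stream it into a node's containerd
# image store so imagePullPolicy: Never can find it there.
set -eu

IMAGE="inventory-server:latest"   # image name from the note above
NODE="dan@192.168.2.101"          # hypothetical address for kube-node-2

docker build -t "$IMAGE" ./services/device-inventory/server
# Export the image and import it into the k8s.io containerd namespace,
# which is the namespace kubelet searches for pre-loaded images.
docker save "$IMAGE" | ssh "$NODE" sudo ctr -n k8s.io images import -
```

Repeat for `inventory-web-ui` and `inventory-cli` on the nodes listed in the note.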


@ -0,0 +1,124 @@
---
# NOTE: Secret 'general-db-secret' must be created manually:
# kubectl create secret generic general-db-secret \
# --from-literal=root-password=<ROOT_PASS> \
# --from-literal=database=general_db \
# --from-literal=user=<USER> \
# --from-literal=password=<PASS> \
# -n infrastructure
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: general-db-v2-pvc
namespace: infrastructure
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-general-db
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations: {}
name: general-purpose-db
namespace: infrastructure
spec:
replicas: 1
selector:
matchLabels:
app: general-purpose-db
serviceName: general-purpose-db
template:
metadata:
labels:
app: general-purpose-db
spec:
containers:
- env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
key: root-password
name: general-db-secret
- name: MYSQL_DATABASE
valueFrom:
secretKeyRef:
key: database
name: general-db-secret
- name: MYSQL_USER
valueFrom:
secretKeyRef:
key: user
name: general-db-secret
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: general-db-secret
image: mysql:9
livenessProbe:
exec:
command:
- mysqladmin
- ping
- -h
- localhost
- -u
- root
            - -p$(MYSQL_ROOT_PASSWORD)
failureThreshold: 10
initialDelaySeconds: 120
periodSeconds: 10
timeoutSeconds: 20
name: mysql
ports:
- containerPort: 3306
name: mysql
readinessProbe:
exec:
command:
- mysqladmin
- ping
- -h
- localhost
- -u
- root
            - -p$(MYSQL_ROOT_PASSWORD)
failureThreshold: 10
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 20
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
volumeMounts:
- mountPath: /var/lib/mysql
name: mysql-data
volumes:
- name: mysql-data
persistentVolumeClaim:
claimName: general-db-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: general-purpose-db
namespace: infrastructure
spec:
clusterIP: None
ports:
- name: mysql
port: 3306
targetPort: 3306
selector:
app: general-purpose-db
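The exec probes above authenticate as root with a literal password on the command line. Kubernetes expands `$(VAR_NAME)` references in probe command arrays from the container's environment, so the same check can reuse `MYSQL_ROOT_PASSWORD` instead of a hardcoded value. To run the identical health check by hand against the live pod (assumes the StatefulSet is up; nothing here comes from the repo itself):

```shell
# Run the probe's health check inside the mysql container, reading the
# root password from the pod's own environment rather than the manifest.
kubectl -n infrastructure exec statefulset/general-purpose-db -c mysql -- \
  sh -c 'mysqladmin ping -h localhost -u root -p"$MYSQL_ROOT_PASSWORD"'
# Prints "mysqld is alive" when the server is healthy.
```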


@ -0,0 +1,23 @@
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: homelab-pool
namespace: metallb-system
spec:
addresses:
- 192.168.2.240-192.168.2.249
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: homelab-l2
namespace: metallb-system
spec:
ipAddressPools:
- homelab-pool
nodeSelectors:
- matchExpressions:
- key: kubernetes.io/hostname
operator: NotIn
values:
- kube-node-1
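With the pool and L2Advertisement applied, the ingress-nginx LoadBalancer Service (pinned to 192.168.2.240 in its Helm values) should receive its address from this pool. A quick smoke test, assuming the default `ingress-nginx-controller` Service name produced by the Helm release:

```shell
# Confirm MetalLB assigned the expected address to the ingress controller.
kubectl -n infrastructure get svc ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'

# From another LAN host, check which node answers ARP for the VIP; it
# should never be kube-node-1, which this L2Advertisement excludes.
arping -c 3 192.168.2.240
```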


@ -0,0 +1,174 @@
---
# PV for paperclip — NFS via keepalived VIP (192.168.2.252), synced between Dell and HP.
# Data lives at /data/ai/paperclip on the active NFS host.
apiVersion: v1
kind: PersistentVolume
metadata:
annotations: {}
name: paperclip-data-pv
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 50Gi
nfs:
path: /data/ai/paperclip
server: 192.168.2.252
persistentVolumeReclaimPolicy: Retain
storageClassName: ""
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: paperclip-data-pvc
namespace: ai
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: ""
volumeName: paperclip-data-pv
---
# NOTE: Secret 'paperclip-secrets' must be created manually:
# kubectl create secret generic paperclip-secrets \
#   --from-literal=PAPERCLIP_AGENT_JWT_SECRET=<SECRET> \
# -n ai
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
labels:
app: paperclip
name: paperclip
namespace: ai
spec:
replicas: 1
selector:
matchLabels:
app: paperclip
strategy:
type: Recreate
template:
metadata:
labels:
app: paperclip
spec:
containers:
- command:
- paperclipai
- run
- -d
- /paperclip
env:
- name: PAPERCLIP_AGENT_JWT_SECRET
valueFrom:
secretKeyRef:
key: PAPERCLIP_AGENT_JWT_SECRET
name: paperclip-secrets
- name: PORT
value: '3100'
- name: HOST
value: 0.0.0.0
- name: SERVE_UI
value: 'true'
- name: NODE_ENV
value: production
- name: PAPERCLIP_DEPLOYMENT_MODE
value: authenticated
- name: PAPERCLIP_DEPLOYMENT_EXPOSURE
value: private
- name: PAPERCLIP_PUBLIC_URL
value: https://paperclip.vandachevici.ro
- name: PAPERCLIP_MIGRATION_PROMPT
value: never
- name: PAPERCLIP_MIGRATION_AUTO_APPLY
value: 'true'
- name: HOME
value: /paperclip
image: paperclip:latest
imagePullPolicy: Never
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 60
periodSeconds: 20
tcpSocket:
port: 3100
name: paperclip
ports:
- containerPort: 3100
name: http
readinessProbe:
failureThreshold: 12
initialDelaySeconds: 30
periodSeconds: 10
tcpSocket:
port: 3100
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 200m
memory: 512Mi
volumeMounts:
- mountPath: /paperclip
name: paperclip-data
volumes:
- name: paperclip-data
persistentVolumeClaim:
claimName: paperclip-data-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
labels:
app: paperclip
name: paperclip
namespace: ai
spec:
ports:
- name: http
port: 80
targetPort: 3100
selector:
app: paperclip
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/proxy-body-size: 50m
nginx.ingress.kubernetes.io/proxy-buffering: 'off'
nginx.ingress.kubernetes.io/proxy-read-timeout: '300'
nginx.ingress.kubernetes.io/proxy-send-timeout: '300'
nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
nginx.ingress.kubernetes.io/auth-response-headers: >-
Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
name: paperclip-ingress
namespace: ai
spec:
ingressClassName: nginx
rules:
- host: paperclip.vandachevici.ro
http:
paths:
- backend:
service:
name: paperclip
port:
name: http
path: /
pathType: Prefix
tls:
- hosts:
- paperclip.vandachevici.ro
secretName: paperclip-tls
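The Deployment reads the secret key `PAPERCLIP_AGENT_JWT_SECRET`, so the manually created secret must carry that exact key. A sketch of the creation step (the `openssl rand` invocation is just one way to generate a random value, not something the repo prescribes):

```shell
# Create the secret under the key the Deployment's secretKeyRef reads.
kubectl create secret generic paperclip-secrets \
  --from-literal=PAPERCLIP_AGENT_JWT_SECRET="$(openssl rand -hex 32)" \
  -n ai
```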


@ -0,0 +1,257 @@
---
# NOTE: Secret 'parts-inventory-secret' must be created manually:
# kubectl create secret generic parts-inventory-secret \
# --from-literal=MONGO_URI="mongodb://parts-db.infrastructure.svc.cluster.local:27017/parts" \
# -n infrastructure
---
# MongoDB PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: parts-db-pvc
namespace: infrastructure
spec:
accessModes: [ReadWriteOnce]
storageClassName: nfs-general
resources:
requests:
storage: 5Gi
---
# MongoDB StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: parts-db
namespace: infrastructure
spec:
replicas: 1
serviceName: parts-db
selector:
matchLabels:
app: parts-db
template:
metadata:
labels:
app: parts-db
spec:
containers:
- name: mongo
image: mongo:4.4
ports:
- containerPort: 27017
name: mongo
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
exec:
command: ["mongo", "--eval", "db.adminCommand('ping')"]
initialDelaySeconds: 30
periodSeconds: 20
failureThreshold: 5
readinessProbe:
exec:
command: ["mongo", "--eval", "db.adminCommand('ping')"]
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 3
volumeMounts:
- name: db-data
mountPath: /data/db
volumes:
- name: db-data
persistentVolumeClaim:
claimName: parts-db-pvc
---
# MongoDB Headless Service
apiVersion: v1
kind: Service
metadata:
name: parts-db
namespace: infrastructure
spec:
clusterIP: None
selector:
app: parts-db
ports:
- name: mongo
port: 27017
targetPort: 27017
---
# parts-api Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: parts-api
namespace: infrastructure
spec:
replicas: 2
selector:
matchLabels:
app: parts-api
strategy:
type: RollingUpdate
template:
metadata:
labels:
app: parts-api
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: parts-api
topologyKey: kubernetes.io/hostname
containers:
- name: parts-api
image: parts-api:latest
imagePullPolicy: Never
ports:
- containerPort: 3001
name: http
env:
- name: MONGO_URI
valueFrom:
secretKeyRef:
name: parts-inventory-secret
key: MONGO_URI
- name: PORT
value: "3001"
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
livenessProbe:
httpGet:
path: /health
port: 3001
initialDelaySeconds: 15
periodSeconds: 20
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 3001
initialDelaySeconds: 10
periodSeconds: 10
failureThreshold: 3
---
# parts-api Service
apiVersion: v1
kind: Service
metadata:
name: parts-api
namespace: infrastructure
spec:
selector:
app: parts-api
ports:
- name: http
port: 3001
targetPort: 3001
type: ClusterIP
---
# parts-ui Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: parts-ui
namespace: infrastructure
spec:
replicas: 2
selector:
matchLabels:
app: parts-ui
template:
metadata:
labels:
app: parts-ui
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: parts-ui
topologyKey: kubernetes.io/hostname
containers:
- name: parts-ui
image: parts-ui:latest
imagePullPolicy: Never
ports:
- containerPort: 8080
name: http
resources:
requests:
cpu: 10m
memory: 16Mi
limits:
cpu: 100m
memory: 64Mi
livenessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 5
periodSeconds: 20
failureThreshold: 3
readinessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 3
periodSeconds: 10
failureThreshold: 3
---
# parts-ui Service
apiVersion: v1
kind: Service
metadata:
name: parts-ui
namespace: infrastructure
spec:
selector:
app: parts-ui
ports:
- name: http
port: 80
targetPort: 8080
type: ClusterIP
---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: parts-ui-ingress
namespace: infrastructure
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
nginx.ingress.kubernetes.io/auth-response-headers: >-
Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
spec:
ingressClassName: nginx
rules:
- host: parts.vandachevici.ro
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: parts-ui
port:
number: 80
tls:
- hosts:
- parts.vandachevici.ro
secretName: parts-ui-tls


@ -0,0 +1,57 @@
---
apiVersion: v1
kind: Endpoints
metadata:
name: proxmox
namespace: infrastructure
subsets:
- addresses:
- ip: 192.168.2.193
ports:
- port: 8006
---
apiVersion: v1
kind: Service
metadata:
name: proxmox
namespace: infrastructure
spec:
ports:
- port: 8006
targetPort: 8006
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: proxmox
namespace: infrastructure
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
nginx.ingress.kubernetes.io/proxy-ssl-verify: "off"
nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
nginx.ingress.kubernetes.io/auth-response-headers: >-
Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
nginx.ingress.kubernetes.io/proxy-body-size: "0"
# WebSocket support for noVNC console
nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
spec:
ingressClassName: nginx
rules:
- host: proxmox.vandachevici.ro
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: proxmox
port:
number: 8006
tls:
- hosts:
- proxmox.vandachevici.ro
secretName: proxmox-tls
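The Service above has no selector, so traffic is routed through the manual Endpoints object to 192.168.2.193:8006. When debugging, it helps to confirm the upstream is reachable before suspecting the ingress layer (Proxmox serves a self-signed certificate, hence `-k`, matching the `proxy-ssl-verify: "off"` annotation):

```shell
# Hit the Proxmox web API directly, bypassing the ingress entirely.
# A 200 here means any remaining problem is in the ingress/auth chain.
curl -sk -o /dev/null -w '%{http_code}\n' https://192.168.2.193:8006/
```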


@ -0,0 +1,133 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
annotations: {}
name: speedtest-tracker-config
namespace: infrastructure
data:
APP_KEY: base64:F1lxPXfl42EXK1PTsi5DecMkyvTMPZgfAYDdSYwd9ME=
APP_URL: http://192.168.2.100:20000
DB_CONNECTION: mysql
DB_DATABASE: general_db
DB_HOST: general-purpose-db.infrastructure.svc.cluster.local
DB_PORT: '3306'
DISPLAY_TIMEZONE: Etc/UTC
PGID: '1000'
PRUNE_RESULTS_OLDER_THAN: '7'
PUID: '1000'
SPEEDTEST_SCHEDULE: '*/5 * * * *'
SPEEDTEST_SERVERS: 31470,1584,60747
TZ: Etc/UTC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: speedtest-tracker-v2-pvc
namespace: infrastructure
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs-speedtest
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: speedtest-tracker
namespace: infrastructure
spec:
replicas: 1
selector:
matchLabels:
app: speedtest-tracker
template:
metadata:
labels:
app: speedtest-tracker
spec:
containers:
- env:
- name: DB_USERNAME
valueFrom:
secretKeyRef:
key: user
name: general-db-secret
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: general-db-secret
        envFrom:
        - configMapRef:
            name: speedtest-tracker-config
image: lscr.io/linuxserver/speedtest-tracker:latest
name: speedtest-tracker
ports:
- containerPort: 80
name: http
resources:
limits:
cpu: 200m
memory: 256Mi
requests:
cpu: 50m
memory: 128Mi
volumeMounts:
- mountPath: /config
name: config
volumes:
- name: config
persistentVolumeClaim:
claimName: speedtest-tracker-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: speedtest-tracker
namespace: infrastructure
spec:
ports:
- name: http
nodePort: 30200
port: 80
targetPort: 80
selector:
app: speedtest-tracker
type: NodePort
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
nginx.ingress.kubernetes.io/auth-response-headers: >-
Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
name: speedtest-tracker
namespace: infrastructure
spec:
ingressClassName: nginx
rules:
- host: speedtest.vandachevici.ro
http:
paths:
- backend:
service:
name: speedtest-tracker
port:
number: 80
path: /
pathType: Prefix
tls:
- hosts:
- speedtest.vandachevici.ro
secretName: speedtest-tls


@ -0,0 +1,50 @@
---
apiVersion: v1
kind: Endpoints
metadata:
name: technitium-dns
namespace: infrastructure
subsets:
- addresses:
- ip: 192.168.2.193
ports:
- port: 5380
---
apiVersion: v1
kind: Service
metadata:
name: technitium-dns
namespace: infrastructure
spec:
ports:
- port: 5380
targetPort: 5380
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: technitium-dns
namespace: infrastructure
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/auth-url: "https://auth.vandachevici.ro/outpost.goauthentik.io/auth/nginx"
nginx.ingress.kubernetes.io/auth-signin: "https://auth.vandachevici.ro/outpost.goauthentik.io/start?rd=$scheme://$http_host$escaped_request_uri"
nginx.ingress.kubernetes.io/auth-response-headers: >-
Set-Cookie,X-authentik-username,X-authentik-groups,X-authentik-email,X-authentik-name,X-authentik-uid
spec:
ingressClassName: nginx
rules:
- host: dns.vandachevici.ro
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: technitium-dns
port:
number: 5380
tls:
- hosts:
- dns.vandachevici.ro
secretName: technitium-dns-tls


@ -0,0 +1,21 @@
---
# Wildcard certificate for *.vandachevici.ro
# Used as nginx-ingress default SSL cert to eliminate the brief self-signed
# cert flash when a new ingress is first deployed.
#
# Requires DNS01 solver (already configured in letsencrypt-prod ClusterIssuer).
# Secret 'wildcard-vandachevici-tls' is referenced in ingress-nginx helm values.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: wildcard-vandachevici
namespace: infrastructure
spec:
secretName: wildcard-vandachevici-tls
issuerRef:
name: letsencrypt-prod
kind: ClusterIssuer
commonName: "*.vandachevici.ro"
dnsNames:
- "*.vandachevici.ro"
- "vandachevici.ro"
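Wildcard issuance goes through the DNS01 solver and typically takes a minute or two while the TXT record propagates. Progress can be checked with standard cert-manager resources (nothing repo-specific here):

```shell
# Watch the certificate until the READY column becomes True.
kubectl -n infrastructure get certificate wildcard-vandachevici

# If issuance stalls, the events on the Certificate (and its
# CertificateRequest/Order children) usually explain why.
kubectl -n infrastructure describe certificate wildcard-vandachevici
```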

View file

@ -0,0 +1,94 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
annotations: {}
name: iot-api-config
namespace: iot
data:
MYSQL_DATABASE: iot_db
---
# NOTE: iot-api uses image 'iot-api:latest' with imagePullPolicy=Never.
# The image must be built and loaded onto the scheduled node before deploying.
# Status at commit time: ErrImageNeverPull on kube-node-3 (image not present there).
# The nodeSelector below pins the pod to the Dell node (topology.homelab/server: dell),
# so build and load the image there, or push it to a registry and change
# imagePullPolicy to Always.
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: iot-api
namespace: iot
spec:
replicas: 1
selector:
matchLabels:
app: iot-api
template:
metadata:
labels:
app: iot-api
spec:
nodeSelector:
topology.homelab/server: dell
containers:
- env:
- name: MYSQL_USER
valueFrom:
secretKeyRef:
key: user
name: iot-db-secret
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: iot-db-secret
envFrom:
- configMapRef:
name: iot-api-config
image: iot-api:latest
imagePullPolicy: Never
livenessProbe:
failureThreshold: 5
httpGet:
path: /
port: 8000
initialDelaySeconds: 30
periodSeconds: 30
timeoutSeconds: 10
name: iot-api
ports:
- containerPort: 8000
name: http
readinessProbe:
failureThreshold: 5
httpGet:
path: /
port: 8000
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 10
resources:
limits:
cpu: 200m
memory: 256Mi
requests:
cpu: 50m
memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: iot-api
namespace: iot
spec:
ports:
- name: http
nodePort: 30800
port: 8000
targetPort: 8000
selector:
app: iot-api
type: NodePort
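The ErrImageNeverPull noted above means kubelet found no local copy of the image on whichever node the pod landed. Two quick checks (assumes containerd on the nodes; the image name comes from the note):

```shell
# See which node the pod was scheduled to and its current status.
kubectl -n iot get pods -l app=iot-api -o wide

# On that node, check whether the image exists in the k8s.io containerd
# namespace, which is the one kubelet searches for pre-loaded images.
sudo ctr -n k8s.io images ls | grep iot-api
```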

135
deployment/iot/iot-db.yaml Normal file

@ -0,0 +1,135 @@
---
# NOTE: Secret 'iot-db-secret' must be created manually:
# kubectl create secret generic iot-db-secret \
# --from-literal=root-password=<ROOT_PASS> \
# --from-literal=database=iot_db \
# --from-literal=user=<USER> \
# --from-literal=password=<PASS> \
# -n iot
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: iot-db-v2-pvc
namespace: iot
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-iot
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations: {}
name: iot-db
namespace: iot
spec:
replicas: 1
selector:
matchLabels:
app: iot-db
serviceName: iot-db
template:
metadata:
labels:
app: iot-db
spec:
containers:
- env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
key: root-password
name: iot-db-secret
- name: MYSQL_DATABASE
valueFrom:
secretKeyRef:
key: database
name: iot-db-secret
- name: MYSQL_USER
valueFrom:
secretKeyRef:
key: user
name: iot-db-secret
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: iot-db-secret
image: mysql:9
livenessProbe:
exec:
command:
- mysqladmin
- ping
- -h
- localhost
- -u
- root
            - -p$(MYSQL_ROOT_PASSWORD)
failureThreshold: 10
initialDelaySeconds: 120
periodSeconds: 10
timeoutSeconds: 20
name: mysql
ports:
- containerPort: 3306
name: mysql
readinessProbe:
exec:
command:
- mysqladmin
- ping
- -h
- localhost
- -u
- root
            - -p$(MYSQL_ROOT_PASSWORD)
failureThreshold: 10
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 20
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
volumeMounts:
- mountPath: /var/lib/mysql
name: mysql-data
volumes:
- name: mysql-data
persistentVolumeClaim:
claimName: iot-db-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: iot-db
namespace: iot
spec:
clusterIP: None
ports:
- name: mysql
port: 3306
targetPort: 3306
selector:
app: iot-db
---
# ExternalName alias so apps can use 'db' as hostname
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: db
namespace: iot
spec:
externalName: iot-db.iot.svc.cluster.local
type: ExternalName


@ -0,0 +1,446 @@
---
# NOTE: Secret 'immich-secret' must be created manually:
# kubectl create secret generic immich-secret \
# --from-literal=db-username=<USER> \
# --from-literal=db-password=<PASS> \
# --from-literal=db-name=immich \
# --from-literal=jwt-secret=<JWT_SECRET> \
# -n media
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: immich-db-v2-pvc
namespace: media
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: nfs-immich
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: immich-library-v2-pvc
namespace: media
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 290Gi
storageClassName: nfs-immich
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: immich-ml-cache-v2-pvc
namespace: media
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 20Gi
storageClassName: nfs-immich
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: immich-valkey-v2-pvc
namespace: media
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs-immich
---
# immich-db: PostgreSQL with pgvecto.rs / vectorchord extensions for AI embeddings
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations: {}
name: immich-db
namespace: media
spec:
replicas: 1
selector:
matchLabels:
app: immich-db
serviceName: immich-db
template:
metadata:
labels:
app: immich-db
spec:
containers:
- env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
key: db-password
name: immich-secret
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
key: db-username
name: immich-secret
- name: POSTGRES_DB
valueFrom:
secretKeyRef:
key: db-name
name: immich-secret
- name: POSTGRES_INITDB_ARGS
value: --data-checksums
image: ghcr.io/immich-app/postgres:14-vectorchord0.4.3-pgvectors0.2.0
livenessProbe:
exec:
command:
- pg_isready
failureThreshold: 6
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
name: postgres
ports:
- containerPort: 5432
name: postgres
readinessProbe:
exec:
command:
- pg_isready
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 100m
memory: 256Mi
volumeMounts:
- mountPath: /var/lib/postgresql/data
name: postgres-data
subPath: postgres
volumes:
- name: postgres-data
persistentVolumeClaim:
claimName: immich-db-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: immich-db
namespace: media
spec:
clusterIP: None
ports:
- name: postgres
port: 5432
targetPort: 5432
selector:
app: immich-db
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: immich-valkey
namespace: media
spec:
replicas: 1
selector:
matchLabels:
app: immich-valkey
template:
metadata:
labels:
app: immich-valkey
spec:
containers:
- args:
- --save
- '60'
- '1'
- --loglevel
- warning
image: docker.io/valkey/valkey:9.0-alpine
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
tcpSocket:
port: 6379
timeoutSeconds: 5
name: valkey
ports:
- containerPort: 6379
name: redis
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 10
tcpSocket:
port: 6379
timeoutSeconds: 5
resources:
limits:
cpu: 200m
memory: 256Mi
requests:
cpu: 50m
memory: 64Mi
volumeMounts:
- mountPath: /data
name: data
volumes:
- name: data
persistentVolumeClaim:
claimName: immich-valkey-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: immich-valkey
namespace: media
spec:
ports:
- name: redis
port: 6379
targetPort: 6379
selector:
app: immich-valkey
type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: immich-server
namespace: media
spec:
replicas: 2
selector:
matchLabels:
app: immich-server
template:
metadata:
labels:
app: immich-server
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: immich-server
topologyKey: kubernetes.io/hostname
containers:
- env:
- name: DB_HOSTNAME
value: immich-db
- name: DB_PORT
value: '5432'
- name: DB_USERNAME
valueFrom:
secretKeyRef:
key: db-username
name: immich-secret
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
key: db-password
name: immich-secret
- name: DB_DATABASE_NAME
valueFrom:
secretKeyRef:
key: db-name
name: immich-secret
- name: DB_STORAGE_TYPE
value: HDD
- name: DB_VECTOR_EXTENSION
value: vectorchord
- name: REDIS_HOSTNAME
value: immich-valkey
- name: REDIS_PORT
value: '6379'
- name: IMMICH_MACHINE_LEARNING_URL
value: http://immich-ml:3003
- name: JWT_SECRET
valueFrom:
secretKeyRef:
key: jwt-secret
name: immich-secret
image: ghcr.io/immich-app/immich-server:release
livenessProbe:
failureThreshold: 5
httpGet:
path: /api/server/ping
port: 2283
initialDelaySeconds: 60
periodSeconds: 30
timeoutSeconds: 10
name: immich-server
ports:
- containerPort: 2283
name: http
readinessProbe:
failureThreshold: 3
httpGet:
path: /api/server/ping
port: 2283
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 10
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 250m
memory: 512Mi
volumeMounts:
- mountPath: /usr/src/app/upload
name: library
volumes:
- name: library
persistentVolumeClaim:
claimName: immich-library-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: immich-web
namespace: media
spec:
ports:
- name: http
nodePort: 32283
port: 2283
targetPort: 2283
selector:
app: immich-server
type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: immich-ml
namespace: media
spec:
replicas: 2
selector:
matchLabels:
app: immich-ml
template:
metadata:
labels:
app: immich-ml
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: immich-ml
topologyKey: kubernetes.io/hostname
containers:
- env:
- name: TRANSFORMERS_CACHE
value: /cache
- name: HF_XET_CACHE
value: /cache/huggingface-xet
- name: MPLCONFIGDIR
value: /cache/matplotlib-config
image: ghcr.io/immich-app/immich-machine-learning:release
livenessProbe:
failureThreshold: 5
httpGet:
path: /ping
port: 3003
initialDelaySeconds: 60
periodSeconds: 30
timeoutSeconds: 10
name: machine-learning
ports:
- containerPort: 3003
name: http
readinessProbe:
failureThreshold: 3
httpGet:
path: /ping
port: 3003
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 10
resources:
limits:
cpu: 4000m
memory: 8Gi
requests:
cpu: 500m
memory: 2Gi
volumeMounts:
- mountPath: /cache
name: cache
volumes:
- name: cache
persistentVolumeClaim:
claimName: immich-ml-cache-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: immich-ml
namespace: media
spec:
ports:
- name: http
port: 3003
targetPort: 3003
selector:
app: immich-ml
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/proxy-body-size: '0'
nginx.ingress.kubernetes.io/proxy-read-timeout: '600'
nginx.ingress.kubernetes.io/proxy-send-timeout: '600'
name: immich
namespace: media
spec:
ingressClassName: nginx
rules:
- host: photos.vandachevici.ro
http:
paths:
- backend:
service:
name: immich-web
port:
number: 2283
path: /
pathType: Prefix
tls:
- hosts:
- photos.vandachevici.ro
secretName: immich-tls


@ -0,0 +1,161 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
annotations: {}
name: jellyfin-config
namespace: media
data:
JELLYFIN_PublishedServerUrl: https://media.vandachevici.ro
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: jellyfin-config-v2-pvc
namespace: media
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-jellyfin
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: jellyfin-cache-v2-pvc
namespace: media
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs-jellyfin
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: jellyfin-media-v2-pvc
namespace: media
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 650Gi
storageClassName: nfs-jellyfin
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: jellyfin
namespace: media
spec:
replicas: 1
selector:
matchLabels:
app: jellyfin
template:
metadata:
labels:
app: jellyfin
spec:
containers:
- envFrom:
- configMapRef:
name: jellyfin-config
image: jellyfin/jellyfin
livenessProbe:
failureThreshold: 5
httpGet:
path: /health
port: 8096
initialDelaySeconds: 60
periodSeconds: 30
timeoutSeconds: 10
name: jellyfin
ports:
- containerPort: 8096
name: http
readinessProbe:
failureThreshold: 5
httpGet:
path: /health
port: 8096
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 10
resources:
limits:
cpu: 2000m
memory: 4Gi
requests:
cpu: 200m
memory: 512Mi
volumeMounts:
- mountPath: /config
name: config
- mountPath: /cache
name: cache
- mountPath: /media
name: media
readOnly: true
volumes:
- name: config
persistentVolumeClaim:
claimName: jellyfin-config-v2-pvc
- name: cache
persistentVolumeClaim:
claimName: jellyfin-cache-v2-pvc
- name: media
persistentVolumeClaim:
claimName: jellyfin-media-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: jellyfin
namespace: media
spec:
ports:
- name: http
port: 8096
targetPort: 8096
selector:
app: jellyfin
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/proxy-body-size: '0'
nginx.ingress.kubernetes.io/proxy-read-timeout: '600'
nginx.ingress.kubernetes.io/proxy-send-timeout: '600'
name: jellyfin
namespace: media
spec:
ingressClassName: nginx
rules:
- host: media.vandachevici.ro
http:
paths:
- backend:
service:
name: jellyfin
port:
number: 8096
path: /
pathType: Prefix
tls:
- hosts:
- media.vandachevici.ro
secretName: jellyfin-tls


@@ -0,0 +1,29 @@
---
# Prometheus local-storage PV — hostPath on kube-node-1 at /data/infra/prometheus
# This PV must be created before the Prometheus Helm chart is deployed.
# The Helm chart creates the PVC; this PV satisfies it via storageClassName=local-storage
# and nodeAffinity pinning to kube-node-1.
apiVersion: v1
kind: PersistentVolume
metadata:
annotations: {}
name: prometheus-storage-pv
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 100Gi
hostPath:
path: /data/infra/prometheus
type: DirectoryOrCreate
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- kube-node-1
persistentVolumeReclaimPolicy: Retain
storageClassName: local-storage
volumeMode: Filesystem

deployment/pdbs.yaml Normal file

@@ -0,0 +1,71 @@
---
# PodDisruptionBudgets for all HA-scaled services.
# Ensures at least 1 replica stays up during voluntary disruptions (node drains,
# evictions). Rolling updates are governed by each Deployment's update strategy,
# not by PDBs.
#
# Apply with: kubectl apply -f deployment/pdbs.yaml
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: parts-api-pdb
namespace: infrastructure
spec:
minAvailable: 1
selector:
matchLabels:
app: parts-api
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: parts-ui-pdb
namespace: infrastructure
spec:
minAvailable: 1
selector:
matchLabels:
app: parts-ui
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ha-sync-ui-pdb
namespace: infrastructure
spec:
minAvailable: 1
selector:
matchLabels:
app: ha-sync-ui
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: games-console-backend-pdb
namespace: infrastructure
spec:
minAvailable: 1
selector:
matchLabels:
app: games-console-backend
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: games-console-ui-pdb
namespace: infrastructure
spec:
minAvailable: 1
selector:
matchLabels:
app: games-console-ui
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: coredns-pdb
namespace: kube-system
spec:
minAvailable: 1
selector:
matchLabels:
k8s-app: kube-dns


@@ -0,0 +1,268 @@
---
# HA PVCs — pre-bound to Dell NFS PVs via keepalived VIP 192.168.2.50
# storageClassName: "" + volumeName forces binding to specific PV
# ==================== MEDIA namespace ====================
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jellyfin-media-v2-pvc
namespace: media
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: jellyfin-media-pv
resources:
requests:
storage: 650Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jellyfin-config-v2-pvc
namespace: media
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: jellyfin-config-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jellyfin-cache-v2-pvc
namespace: media
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: jellyfin-cache-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: immich-library-v2-pvc
namespace: media
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: immich-library-pv
resources:
requests:
storage: 290Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: immich-db-v2-pvc
namespace: media
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: immich-db-pv
resources:
requests:
storage: 20Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: immich-ml-cache-v2-pvc
namespace: media
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: immich-ml-cache-pv
resources:
requests:
storage: 20Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: immich-valkey-v2-pvc
namespace: media
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: immich-valkey-pv
resources:
requests:
storage: 1Gi
---
# ==================== STORAGE namespace ====================
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: owncloud-files-v2-pvc
namespace: storage
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: owncloud-files-pv
resources:
requests:
storage: 190Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: owncloud-mariadb-v2-pvc
namespace: storage
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: owncloud-mariadb-pv
resources:
requests:
storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: owncloud-redis-v2-pvc
namespace: storage
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: owncloud-redis-pv
resources:
requests:
storage: 1Gi
---
# ==================== GAMES namespace ====================
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minecraft-home-v2-pvc
namespace: games
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: minecraft-home-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minecraft-cheats-v2-pvc
namespace: games
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: minecraft-cheats-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minecraft-creative-v2-pvc
namespace: games
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: minecraft-creative-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minecraft-johannes-v2-pvc
namespace: games
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: minecraft-johannes-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minecraft-noah-v2-pvc
namespace: games
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: minecraft-noah-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: factorio-alone-v2-pvc
namespace: games
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: factorio-alone-pv
resources:
requests:
storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: openttd-v2-pvc
namespace: games
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: openttd-pv
resources:
requests:
storage: 2Gi
---
# ==================== INFRASTRUCTURE namespace ====================
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: general-db-v2-pvc
namespace: infrastructure
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: general-db-pv
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: speedtest-tracker-v2-pvc
namespace: infrastructure
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: speedtest-tracker-pv
resources:
requests:
storage: 1Gi
---
# ==================== IOT namespace ====================
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: iot-db-v2-pvc
namespace: iot
spec:
accessModes: [ReadWriteOnce]
storageClassName: ""
volumeName: iot-db-pv
resources:
requests:
storage: 10Gi


@@ -0,0 +1,117 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: owncloud-mariadb-v2-pvc
namespace: storage
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: nfs-owncloud
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations: {}
name: owncloud-mariadb
namespace: storage
spec:
replicas: 1
selector:
matchLabels:
app: owncloud-mariadb
serviceName: owncloud-mariadb
template:
metadata:
labels:
app: owncloud-mariadb
spec:
containers:
- args:
- --max-allowed-packet=128M
- --innodb-log-file-size=64M
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
key: root-password
name: owncloud-db-secret
- name: MYSQL_USER
valueFrom:
secretKeyRef:
key: user
name: owncloud-db-secret
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: owncloud-db-secret
- name: MYSQL_DATABASE
valueFrom:
secretKeyRef:
key: database
name: owncloud-db-secret
- name: MARIADB_AUTO_UPGRADE
value: '1'
image: mariadb:10.6
livenessProbe:
exec:
command:
- mysqladmin
- ping
- -u
- root
            - -p$(MYSQL_ROOT_PASSWORD) # kubelet expands $(VAR) from container env; avoids hardcoding the root password
failureThreshold: 5
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 5
name: mariadb
ports:
- containerPort: 3306
name: mysql
readinessProbe:
exec:
command:
- mysqladmin
- ping
- -u
- root
            - -p$(MYSQL_ROOT_PASSWORD) # kubelet expands $(VAR) from container env; avoids hardcoding the root password
failureThreshold: 5
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 100m
memory: 256Mi
volumeMounts:
- mountPath: /var/lib/mysql
name: mysql-data
volumes:
- name: mysql-data
persistentVolumeClaim:
claimName: owncloud-mariadb-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: owncloud-mariadb
namespace: storage
spec:
clusterIP: None
ports:
- name: mysql
port: 3306
targetPort: 3306
selector:
app: owncloud-mariadb


@@ -0,0 +1,87 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: owncloud-redis-v2-pvc
namespace: storage
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs-owncloud
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: owncloud-redis
namespace: storage
spec:
replicas: 1
selector:
matchLabels:
app: owncloud-redis
template:
metadata:
labels:
app: owncloud-redis
spec:
containers:
- args:
- --databases
- '1'
image: redis:6
livenessProbe:
exec:
command:
- redis-cli
- ping
failureThreshold: 5
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
name: redis
ports:
- containerPort: 6379
name: redis
readinessProbe:
exec:
command:
- redis-cli
- ping
failureThreshold: 5
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
resources:
limits:
cpu: 200m
memory: 256Mi
requests:
cpu: 50m
memory: 64Mi
volumeMounts:
- mountPath: /data
name: redis-data
volumes:
- name: redis-data
persistentVolumeClaim:
claimName: owncloud-redis-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: owncloud-redis
namespace: storage
spec:
clusterIP: None
ports:
- name: redis
port: 6379
targetPort: 6379
selector:
app: owncloud-redis


@@ -0,0 +1,158 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
annotations: {}
name: owncloud-config
namespace: storage
data:
  # Set manually before applying; never commit real admin credentials
  OWNCLOUD_ADMIN_PASSWORD: <ADMIN_PASS>
  OWNCLOUD_ADMIN_USERNAME: sefu
OWNCLOUD_DB_HOST: owncloud-mariadb
OWNCLOUD_DB_NAME: owncloud
OWNCLOUD_DB_TYPE: mysql
OWNCLOUD_DOMAIN: localhost:8080
OWNCLOUD_MYSQL_UTF8MB4: 'true'
OWNCLOUD_REDIS_ENABLED: 'true'
OWNCLOUD_REDIS_HOST: owncloud-redis
OWNCLOUD_TRUSTED_DOMAINS: drive.vandachevici.ro
---
# NOTE: Secret 'owncloud-db-secret' must be created manually:
# kubectl create secret generic owncloud-db-secret \
# --from-literal=root-password=<ROOT_PASS> \
# --from-literal=user=<USER> \
# --from-literal=password=<PASS> \
# --from-literal=database=owncloud \
# -n storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations: {}
name: owncloud-files-v2-pvc
namespace: storage
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 190Gi
storageClassName: nfs-owncloud
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations: {}
name: owncloud-server
namespace: storage
spec:
replicas: 2
selector:
matchLabels:
app: owncloud-server
template:
metadata:
labels:
app: owncloud-server
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: owncloud-server
topologyKey: kubernetes.io/hostname
containers:
- env:
- name: OWNCLOUD_DB_USERNAME
valueFrom:
secretKeyRef:
key: user
name: owncloud-db-secret
- name: OWNCLOUD_DB_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: owncloud-db-secret
envFrom:
- configMapRef:
name: owncloud-config
image: owncloud/server:10.12
livenessProbe:
exec:
command:
- /usr/bin/healthcheck
failureThreshold: 5
initialDelaySeconds: 120
periodSeconds: 30
timeoutSeconds: 10
name: owncloud
ports:
- containerPort: 8080
name: http
readinessProbe:
exec:
command:
- /usr/bin/healthcheck
failureThreshold: 5
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 10
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 200m
memory: 512Mi
volumeMounts:
- mountPath: /mnt/data
name: owncloud-files
volumes:
- name: owncloud-files
persistentVolumeClaim:
claimName: owncloud-files-v2-pvc
---
apiVersion: v1
kind: Service
metadata:
annotations: {}
name: owncloud-server
namespace: storage
spec:
ports:
- name: http
port: 8080
targetPort: 8080
selector:
app: owncloud-server
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/proxy-body-size: '0'
nginx.ingress.kubernetes.io/proxy-read-timeout: '600'
nginx.ingress.kubernetes.io/proxy-send-timeout: '600'
nginx.ingress.kubernetes.io/use-forwarded-headers: 'true'
name: owncloud
namespace: storage
spec:
ingressClassName: nginx
rules:
- host: drive.vandachevici.ro
http:
paths:
- backend:
service:
name: owncloud-server
port:
number: 8080
path: /
pathType: Prefix
tls:
- hosts:
- drive.vandachevici.ro
secretName: owncloud-tls

execution-plans/ha-sync.md Normal file

@@ -0,0 +1,354 @@
# HA Sync — Execution Plan
## Problem Statement
Two servers (Dell OptiPlex 7070 at `192.168.2.100` and HP ProLiant at `192.168.2.193`) each export the same folder set over NFS. A Kubernetes-native tool must keep each folder pair in bidirectional sync: newest file wins, mtime is preserved on copy, delete propagation is strict (one-way per CronJob), and every operation is logged in the MySQL instance in the `infrastructure` namespace.
---
## Architecture Decisions (Agreed)
| Decision | Choice | Rationale |
|---|---|---|
| Language | **Go** | Single static binary, excellent async I/O, no runtime overhead |
| Sync direction | **Bidirectional via two one-way CronJobs** | Each folder pair gets `a→b` and `b→a` jobs; newest-mtime wins |
| Loop prevention | **Preserve mtime on copy + `--delete-missing` flag** | Mtime equality → skip; no extra DB state needed |
| Lock | **Kubernetes `Lease` object (coordination.k8s.io/v1)** | Native K8s TTL; survives MySQL outage; sync blocked only if K8s API is down (already required for CronJob) |
| Change detection | **mtime + size first; MD5 only on mtime/size mismatch** | Efficient for large datasets |
| Delete propagation | **Strict mirror — configurable per job via `--delete-missing`** | See ⚠️ note below |
| Volume access | **NFS mounts (both servers already export NFS)** | No HostPath or node-affinity needed |
| Audit logging | **Write to opslog file during run; flush to MySQL on completion** | MySQL outage does not block sync; unprocessed opslogs are retried on next run |
| Opslog storage | **Persistent NFS-backed PVC at `/var/log/ha-sync/`** | `/tmp` is ephemeral (lost on pod exit); NFS PVC persists across CronJob runs for 10-day retention |
### Locking: Kubernetes Lease
Each sync pair uses a `coordination.k8s.io/v1` Lease object named `ha-sync-<pair>` in the `infrastructure` namespace.
- `spec.holderIdentity` = `<pod-name>/<iteration-id>`
- `spec.leaseDurationSeconds` = `--lock-ttl` (default 3600)
- A background goroutine renews (`spec.renewTime`) every `leaseDurationSeconds / 3` seconds
- On normal exit or SIGTERM: Lease is deleted (released)
- Stale leases (holder crashed without release): expire automatically after `leaseDurationSeconds`
- Requires RBAC: `ServiceAccount` with `create/get/update/delete` on `leases` in `infrastructure`
### Audit Logging: Opslog + MySQL Flush
1. On sync start: open `/var/log/ha-sync/opslog-<pair>-<direction>-<RFC3339>.jsonl`
2. Each file operation: append one JSON line (all `sync_operations` fields)
3. On sync end: attempt flush to MySQL (`sync_iterations` + `sync_operations` batch INSERT)
4. On successful flush: delete the opslog file
5. On MySQL failure: leave the opslog; on next run, scan `/var/log/ha-sync/` for unprocessed opslogs and retry flush before starting new sync
6. Cleanup: after each run, delete opslogs older than 10 days (`os.Stat` mtime check)
### ⚠️ Delete Propagation Warning
With two one-way jobs per pair, ordering matters for deletes. If `dell→hp` runs before `hp→dell` and `--delete-missing` is ON for both, files that only exist on HP will be deleted before they're copied to Dell.
**Safe default**: `--delete-missing=false` for all jobs. Enable `--delete-missing=true` only on the **primary direction** (e.g., `dell→hp` for each pair) once the initial full sync has completed and both sides are known-equal.
---
## NFS Sync Pairs
| Pair name | Dell NFS (192.168.2.100) | HP NFS (192.168.2.193) |
|---|---|---|
| `media` | `/data/media` | `/data/media` |
| `photos` | `/data/photos` | `/data/photos` |
| `owncloud` | `/data/owncloud` | `/data/owncloud` |
| `games` | `/data/games` | `/data/games` |
| `infra` | `/data/infra` | `/data/infra` |
| `ai` | `/data/ai` | `/data/ai` |
Each pair produces **two CronJobs** in the `infrastructure` namespace.
---
## CLI Interface (`ha-sync`)
```
ha-sync [flags]
Required:
--src <path> Source directory (absolute path inside pod)
--dest <path> Destination directory (absolute path inside pod)
--pair <name> Logical pair name (e.g. "media"); used as Lease name ha-sync-<pair>
Optional:
--direction <str> Label for logging, e.g. "dell-to-hp" (default: "fwd")
--db-dsn <dsn> MySQL DSN (default: from env HA_SYNC_DB_DSN)
--lock-ttl <seconds> Lease TTL before considered stale (default: 3600)
--log-dir <path> Directory for opslog files (default: /var/log/ha-sync)
--log-retain-days <n> Delete opslogs older than N days (default: 10)
--mtime-threshold <s> Seconds of tolerance for mtime equality (default: 2)
--delete-missing Delete dest files not present in src (default: false)
--workers <n> Concurrent file workers (default: 4)
--dry-run Compute what would sync, save to DB as dry_run rows, print plan; do not copy/delete (default: false)
--verbose Verbose output
--help
```
---
## MySQL Schema (database: `general_db`)
```sql
-- One row per CronJob execution
CREATE TABLE IF NOT EXISTS sync_iterations (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
sync_pair VARCHAR(255) NOT NULL,
direction VARCHAR(64) NOT NULL,
src VARCHAR(512) NOT NULL,
dest VARCHAR(512) NOT NULL,
started_at DATETIME(3) NOT NULL,
ended_at DATETIME(3),
status ENUM('running','success','partial_failure','failed') NOT NULL DEFAULT 'running',
dry_run TINYINT(1) NOT NULL DEFAULT 0,
files_created INT DEFAULT 0,
files_updated INT DEFAULT 0,
files_deleted INT DEFAULT 0,
files_skipped INT DEFAULT 0,
files_failed INT DEFAULT 0,
total_bytes_transferred BIGINT DEFAULT 0,
error_message TEXT,
INDEX idx_pair (sync_pair),
INDEX idx_started (started_at),
INDEX idx_dry_run (dry_run)
);
-- One row per individual file operation (flushed from opslog on sync completion)
CREATE TABLE IF NOT EXISTS sync_operations (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
iteration_id BIGINT NOT NULL,
dry_run TINYINT(1) NOT NULL DEFAULT 0,
operation ENUM('create','update','delete') NOT NULL,
filepath VARCHAR(4096) NOT NULL,
size_before BIGINT,
size_after BIGINT,
md5_before VARCHAR(32),
md5_after VARCHAR(32),
started_at DATETIME(3) NOT NULL,
ended_at DATETIME(3),
status ENUM('success','fail') NOT NULL,
error_message VARCHAR(4096),
INDEX idx_iteration (iteration_id),
CONSTRAINT fk_iteration FOREIGN KEY (iteration_id) REFERENCES sync_iterations(id)
);
```
> No `sync_locks` table — locking is handled by Kubernetes Lease objects.
### Dry-run Idempotency Rules
1. **`--dry-run` mode**: walk source and dest, compute the full set of would-be operations (create/update/delete), save to DB with `dry_run = 1`, print the plan. **No files are copied or deleted.**
2. **Idempotency check**: before running a dry-run, query for the last successful dry-run iteration for `(pair, direction)`:
```sql
SELECT id, started_at FROM sync_iterations
WHERE sync_pair = ? AND direction = ? AND dry_run = 1 AND status = 'success'
ORDER BY started_at DESC LIMIT 1;
```
Then re-walk the source and dest and compute the would-be operation set. Compare it against the `sync_operations` rows from that previous dry-run iteration (same set of `filepath + operation + size_before`). If **identical** → print `"Dry-run already current as of <started_at>. Nothing has changed."` and exit without writing new rows.
3. **Production run (`--dry-run` not set)**: all queries for previous iterations use `WHERE dry_run = 0`. Dry-run rows are **never considered** for skip logic, idempotency, or status reporting in production runs.
4. **Lease is still acquired** during dry-run (prevents two dry-runs from racing each other).
---
## Project Structure
```
services/ha-sync/
cmd/ha-sync/
main.go # Sync CLI entry point
cmd/ha-sync-ui/
main.go # Dashboard HTTP server entry point (serves ha-sync.vandachevici.ro)
internal/
config/
config.go # Config struct, defaults, validation (shared by both binaries)
db/
db.go # MySQL connect, auto-migrate schema
logging.go # StartIteration, FinishIteration, BulkInsertOperations, LastDryRunOps
lease/
lease.go # Acquire/release/heartbeat Kubernetes Lease object
opslog/
writer.go # Append JSON lines to /var/log/ha-sync/opslog-<pair>-<direction>-<RFC3339>.jsonl
flusher.go # Scan for unprocessed opslogs, batch INSERT; cleanup logs >10 days
sync/
engine.go # Main sync loop: walk, compare, dispatch; dryRun flag skips writes
walker.go # Recursive directory walk
compare.go # mtime+size comparison; conditional MD5
copy.go # File copy with os.Chtimes() mtime preservation
delete.go # Safe delete with pre-check
ui/
handler.go # HTTP handlers: index, /api/iterations, /api/operations, /api/pairs
templates/
index.html # Dashboard HTML; auto-refreshes every 10s via fetch(); vanilla JS only
go.mod
go.sum
Dockerfile # Multi-stage: golang:1.22-alpine builder (builds ha-sync + ha-sync-ui) → alpine:3.20
Makefile # build, docker-build IMAGE=<registry>/ha-sync:latest, docker-push targets
deployment/ha-sync/
serviceaccount.yaml # ServiceAccount: ha-sync, namespace: infrastructure
rbac.yaml # Role + RoleBinding: leases (coordination.k8s.io) create/get/update/delete
secret.yaml # NOTE: create manually — see Phase 3C instructions
pv-logs.yaml # PersistentVolume: NFS 192.168.2.193:/data/infra/ha-sync-logs, 10Gi, RWX
pvc-logs.yaml # PVC bound to pv-logs; all CronJobs mount at /var/log/ha-sync
pv-dell-<pair>.yaml # PersistentVolume: NFS 192.168.2.100:/data/<pair> (one per pair × 6)
pv-hp-<pair>.yaml # PersistentVolume: NFS 192.168.2.193:/data/<pair> (one per pair × 6)
pvc-dell-<pair>.yaml # PVC → pv-dell-<pair> (one per pair × 6)
pvc-hp-<pair>.yaml # PVC → pv-hp-<pair> (one per pair × 6)
cron-<pair>-dell-to-hp.yaml # --dry-run is DEFAULT; remove flag to enable production sync
cron-<pair>-hp-to-dell.yaml # same
ui-deployment.yaml # Deployment: ha-sync-ui, 1 replica, image: <registry>/ha-sync:latest, cmd: ha-sync-ui
ui-service.yaml # ClusterIP Service: port 8080 → ha-sync-ui pod
ui-ingress.yaml # Ingress: ha-sync.vandachevici.ro → ui-service:8080; cert-manager TLS
kustomization.yaml # Kustomize root listing all resources
scripts/cli/
ha-sync.md # CLI reference doc
```
### UI Dashboard (`ha-sync.vandachevici.ro`)
- **Binary**: `ha-sync-ui` — Go HTTP server, port 8080
- **Routes**:
- `GET /` — HTML dashboard; auto-refreshes via `setInterval` + `fetch`
- `GET /api/pairs` — JSON: per-pair last iteration summary (dry_run=0 and dry_run=1 separately)
- `GET /api/iterations?pair=&limit=20` — JSON: recent iterations
- `GET /api/operations?iteration_id=` — JSON: operations for one iteration
- **Dashboard shows**: per-pair status cards (last real sync, last dry-run, files created/updated/deleted/failed), recent activity table, errors highlighted in red
- **Env vars**: `HA_SYNC_DB_DSN` (same secret as CronJobs)
- **K8s**: Deployment in `infrastructure` namespace, 1 replica, same ServiceAccount as CronJobs (read-only DB access only)
---
## Tasks
> **Parallelism key**: Tasks marked `[P]` can be executed in parallel by separate agents. Tasks marked `[SEQ]` must follow the listed dependency chain.
---
### Phase 0 — Scaffolding `[SEQ]`
Must complete before any code is written; all subsequent tasks depend on this.
| # | Task | Command / Notes |
|---|---|---|
| 0.1 | Create Go module | `cd services/ha-sync && go mod init github.com/vandachevici/homelab/ha-sync` |
| 0.2 | Create directory tree | `mkdir -p cmd/{ha-sync,ha-sync-ui} internal/{config,db,lease,opslog,sync,ui}` |
| 0.3 | Create Dockerfile | Multi-stage: `FROM golang:1.22-alpine AS build` → `FROM alpine:3.20`; copy both binaries; `ENTRYPOINT ["/ha-sync"]` (the UI Deployment overrides `command` to run `/ha-sync-ui`) |
| 0.4 | Create Makefile | Targets: `build`, `docker-build IMAGE=<registry>/ha-sync:latest`, `docker-push IMAGE=...` |
---
### Phase 1 — Core Go packages `[P after Phase 0]`
Sub-tasks 1A, 1B, 1C, 1D, 1E are **fully independent** — assign to separate agents simultaneously. 1F depends on all of them.
#### 1A — `internal/config` `[P]`
| # | Task | Notes |
|---|---|---|
| 1A.1 | Write `config.go` | Define `Config` struct with all CLI flags; use `flag` stdlib or `cobra`; set defaults from CLI Interface section above |
#### 1B — `internal/db` `[P]`
| # | Task | Notes |
|---|---|---|
| 1B.1 | Write `db.go` | `Connect(dsn string) (*sql.DB, error)`; run `CREATE TABLE IF NOT EXISTS` for both tables (include `dry_run TINYINT(1) NOT NULL DEFAULT 0` column in both) on startup |
| 1B.2 | Write `logging.go` | `StartIteration(dryRun bool, ...) (id int64)` → INSERT with `dry_run` set; `FinishIteration(id, status, counts)` → UPDATE; `BulkInsertOperations(iterID int64, dryRun bool, []OpRecord)` → batch INSERT; `LastDryRunOps(db, pair, direction string) ([]OpRecord, error)` → fetch ops for last successful `dry_run=1` iteration for idempotency check |
#### 1C — `internal/lease` `[P]`
| # | Task | Notes |
|---|---|---|
| 1C.1 | Write `lease.go` | Use `k8s.io/client-go` in-cluster config; `Acquire(ctx, client, namespace, leaseName, holderID, ttlSec)` — create or update Lease if expired; `Release(ctx, client, namespace, leaseName, holderID)` — delete Lease; `Heartbeat(ctx, ...)` — goroutine that calls `Update` on `spec.renewTime` every `ttlSec/3` seconds |
#### 1D — `internal/opslog` `[P]`
| # | Task | Notes |
|---|---|---|
| 1D.1 | Write `writer.go` | `Open(logDir, pair, direction string) (*Writer, error)` — creates `/var/log/ha-sync/opslog-<pair>-<direction>-<RFC3339>.jsonl`; `Append(op OpRecord) error` — JSON-encode one line |
| 1D.2 | Write `flusher.go` | `FlushAll(logDir string, db *sql.DB) error` — scan dir for `*.jsonl`, for each: decode lines → call `BulkInsertOperations`, delete file on success; `CleanOld(logDir string, retainDays int)` — delete files with mtime older than N days |
#### 1E — `internal/sync` `[P]`
| # | Task | Notes |
|---|---|---|
| 1E.1 | Write `walker.go` | `Walk(root string) ([]FileInfo, error)` — returns slice of `{RelPath, AbsPath, Size, ModTime, IsDir}`; use `filepath.WalkDir` |
| 1E.2 | Write `compare.go` | `NeedsSync(src, dest FileInfo, threshold time.Duration) bool` — mtime+size check; `MD5File(path string) (string, error)` — streaming MD5; `MD5Changed(srcPath, destPath string) bool` |
| 1E.3 | Write `copy.go` | `CopyFile(src, dest string, srcModTime time.Time) error` — copy bytes, then `os.Chtimes(dest, srcModTime, srcModTime)` to preserve mtime |
| 1E.4 | Write `delete.go` | `DeleteFile(path string) error``os.Remove`; `DeleteDir(path string) error``os.RemoveAll` only if dir is empty after child removal |
| 1E.5 | Write `engine.go` | Walk src+dest, compare, dispatch create/update/delete via worker pool (`sync.WaitGroup` + buffered channel of `--workers` size); if `dryRun=true`, build op list but **do not call copy/delete** — return ops for caller to log; write each op to opslog.Writer (tagged with dry_run flag); return summary counts |
#### 1F — `cmd/ha-sync/main.go` `[SEQ, depends on 1A+1B+1C+1D+1E]`
| # | Task | Notes |
|---|---|---|
| 1F.1 | Write `main.go` | Parse flags → build config → connect DB → flush old opslogs → acquire Lease → **if `--dry-run`: call `LastDryRunOps`, walk src+dest, compute would-be ops, compare; if identical → print "already current" + exit; else run engine(dryRun=true)** → open opslog writer (tagged dry_run) → start iteration row (`dry_run` = true/false) → run engine → finish iteration → flush opslog to DB → release Lease; trap SIGTERM to release Lease before exit; **production queries always filter `dry_run = 0`** |
---
### Phase 2 — Build & Docker Image `[SEQ after Phase 1]`
| # | Task | Command |
|---|---|---|
| 2.1 | Fetch Go deps | `cd services/ha-sync && go mod tidy` |
| 2.2 | Build binary | `cd services/ha-sync && make build` |
| 2.3 | Build Docker image | `make docker-build IMAGE=192.168.2.100:5000/ha-sync:latest` *(replace registry if different)* |
| 2.4 | Push Docker image | `make docker-push IMAGE=192.168.2.100:5000/ha-sync:latest` |
---
### Phase 3 — Kubernetes Manifests `[P, can start during Phase 1]`
All manifest sub-tasks are **independent** and can be parallelized.
#### 3A — RBAC + Shared Resources `[P]`
| # | Task | Notes |
|---|---|---|
| 3A.1 | Create `serviceaccount.yaml` | `name: ha-sync`, `namespace: infrastructure` |
| 3A.2 | Create `rbac.yaml` | `Role` with rules: `apiGroups: [coordination.k8s.io]`, `resources: [leases]`, `verbs: [create, get, update, delete]`; `RoleBinding` binding `ha-sync` SA to the Role |
| 3A.3 | Create `pv-logs.yaml` + `pvc-logs.yaml` | PV: `nfs.server: 192.168.2.193`, `nfs.path: /data/infra/ha-sync-logs`, capacity `10Gi`, `accessModes: [ReadWriteMany]`; PVC: `storageClassName: ""`, `volumeName: pv-ha-sync-logs`, namespace `infrastructure` |
#### 3B — PVs and PVCs per pair `[P]`
| # | Task | Notes |
|---|---|---|
| 3B.1 | Create `pv-dell-<pair>.yaml` for each of 6 pairs | `spec.nfs.server: 192.168.2.100`, `spec.nfs.path: /data/<pair>`; capacity per pair: `media: 2Ti`, `photos: 500Gi`, `games: 500Gi`, `owncloud: 500Gi`, `infra: 100Gi`, `ai: 500Gi`; `accessModes: [ReadWriteMany]` |
| 3B.2 | Create `pv-hp-<pair>.yaml` for each of 6 pairs | Same structure; `spec.nfs.server: 192.168.2.193` |
| 3B.3 | Create `pvc-dell-<pair>.yaml` + `pvc-hp-<pair>.yaml` | `namespace: infrastructure`; `accessModes: [ReadWriteMany]`; `storageClassName: ""` (manual bind); `volumeName: pv-dell-<pair>` / `pv-hp-<pair>` |
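Concretely, the dell-side objects for the `infra` pair (capacity `100Gi` per 3B.1) would follow this shape, with file and object names per the table's `<pair>` convention:

```yaml
# pv-dell-infra.yaml — illustrative instance of the 3B pattern
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-dell-infra
spec:
  capacity:
    storage: 100Gi
  accessModes: [ReadWriteMany]
  nfs:
    server: 192.168.2.100
    path: /data/infra
---
# pvc-dell-infra.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-dell-infra
  namespace: infrastructure
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""        # manual bind, no dynamic provisioning
  volumeName: pv-dell-infra
  resources:
    requests:
      storage: 100Gi
```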
#### 3C — CronJobs `[P, depends on 3A+3B for volume/SA names]`
| # | Task | Notes |
|---|---|---|
| 3C.1 | Create `cron-<pair>-dell-to-hp.yaml` for each pair | `namespace: infrastructure`; `serviceAccountName: ha-sync`; `schedule: "*/15 * * * *"`; image: `<registry>/ha-sync:latest`; args: `["--src=/mnt/dell/<pair>","--dest=/mnt/hp/<pair>","--pair=<pair>","--direction=dell-to-hp","--db-dsn=$(HA_SYNC_DB_DSN)","--log-dir=/var/log/ha-sync"]`; volumeMounts: `pvc-dell-<pair>``/mnt/dell/<pair>`, `pvc-hp-<pair>``/mnt/hp/<pair>`, `pvc-ha-sync-logs``/var/log/ha-sync`; envFrom: `ha-sync-db-secret` |
| 3C.2 | Create `cron-<pair>-hp-to-dell.yaml` for each pair | Same but src/dest swapped, `direction=hp-to-dell`; offset schedule by 7 min: `"7,22,37,52 * * * *"` |
| 3C.3 | Create `secret.yaml` | Comment-only file; actual secret created manually: `kubectl create secret generic ha-sync-db-secret --from-literal=HA_SYNC_DB_DSN='<user>:<pass>@tcp(general-purpose-db.infrastructure.svc.cluster.local:3306)/general_db' -n infrastructure` |
| 3C.4 | Create `kustomization.yaml` | Resources in order: `serviceaccount.yaml`, `rbac.yaml`, `pv-logs.yaml`, `pvc-logs.yaml`, all `pv-*.yaml`, all `pvc-*.yaml`, all `cron-*.yaml` |
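Putting 3C.1 together, a single dell→hp CronJob for the `infra` pair might look like the sketch below. The registry address follows Phase 2 (still an open question), and `restartPolicy` plus the Job nesting are assumptions the table does not spell out:

```yaml
# cron-infra-dell-to-hp.yaml — illustrative instance of the 3C pattern
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ha-sync-infra-dell-to-hp
  namespace: infrastructure
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ha-sync
          restartPolicy: Never
          containers:
            - name: ha-sync
              image: 192.168.2.100:5000/ha-sync:latest
              args:
                - --src=/mnt/dell/infra
                - --dest=/mnt/hp/infra
                - --pair=infra
                - --direction=dell-to-hp
                - --db-dsn=$(HA_SYNC_DB_DSN)
                - --log-dir=/var/log/ha-sync
              envFrom:
                - secretRef:
                    name: ha-sync-db-secret
              volumeMounts:
                - { name: dell, mountPath: /mnt/dell/infra }
                - { name: hp, mountPath: /mnt/hp/infra }
                - { name: logs, mountPath: /var/log/ha-sync }
          volumes:
            - name: dell
              persistentVolumeClaim: { claimName: pvc-dell-infra }
            - name: hp
              persistentVolumeClaim: { claimName: pvc-hp-infra }
            - name: logs
              persistentVolumeClaim: { claimName: pvc-ha-sync-logs }
```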
---
### Phase 4 — CLI Documentation `[P, independent]`
| # | Task | Notes |
|---|---|---|
| 4.1 | Create `scripts/cli/ha-sync.md` | Document all flags, defaults, example invocations, env vars (`HA_SYNC_DB_DSN`); note `--dry-run` for safe first-run; note `--delete-missing` rollout guidance |
---
### Phase 5 — Deploy & Verify `[SEQ after Phase 2+3]`
| # | Task | Command |
|---|---|---|
| 5.1 | Create DB secret | `kubectl create secret generic ha-sync-db-secret --from-literal=HA_SYNC_DB_DSN='<user>:<pass>@tcp(general-purpose-db.infrastructure.svc.cluster.local:3306)/general_db' -n infrastructure` |
| 5.2 | Apply manifests | `kubectl apply -k deployment/ha-sync/` |
| 5.3 | Dry-run smoke test | `kubectl create job ha-sync-test --from=cronjob/ha-sync-media-dell-to-hp -n infrastructure` then: `kubectl logs -l job-name=ha-sync-test -n infrastructure -f` |
| 5.4 | Verify Lease is created | `kubectl get lease ha-sync-media -n infrastructure -o yaml` |
| 5.5 | Verify DB rows | `kubectl exec -it <general-purpose-db-pod> -n infrastructure -- mysql -u<user> -p general_db -e "SELECT * FROM sync_iterations ORDER BY id DESC LIMIT 5;"` |
| 5.6 | Verify opslog flush | Check `/var/log/ha-sync/` on the logs PVC — no `.jsonl` files should remain after a successful run |
| 5.7 | Trigger real first run | Delete the test job; let CronJob run on schedule; observe `sync_operations` table |
---
## Open Questions / Future Work
- **MySQL HA**: `general-purpose-db` is a single-replica StatefulSet — no HA. Since locking is now handled by K8s Lease and MySQL is only used for audit logging (with opslog fallback), a MySQL outage won't block sync. If full MySQL HA is later desired, **MariaDB Galera Cluster (3 replicas)** is the recommended path for this homelab.
- **Conflict resolution**: Currently "newest mtime wins". If clocks drift between nodes, a file could ping-pong. Consider NTP enforcement across all nodes or use `--mtime-threshold` >= observed clock skew.
- **Delete safety**: `--delete-missing` defaults to `false`. Staged rollout: run one full cycle disabled first → confirm parity → enable on primary direction only.
- **Alerting**: Add a Prometheus/Grafana alert on `sync_iterations.status = 'failed'` (query general_db directly or expose a future `/metrics` endpoint).
- **DB retention**: `sync_operations` will grow large. Add a cleanup step: `DELETE FROM sync_operations WHERE started_at < NOW() - INTERVAL 30 DAY` as a weekly CronJob.
- **Registry**: Dockerfile assumes local registry at `192.168.2.100:5000`. Confirm registry address before Phase 2.
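The DB-retention idea above could run as a weekly CronJob. This sketch assumes the DB user and password are added to `ha-sync-db-secret` as separate hypothetical keys `DB_USER`/`DB_PASS` (today the secret holds only the combined Go DSN, which a shell client cannot consume directly):

```yaml
# sketch: weekly cleanup of old sync_operations rows
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ha-sync-db-retention
  namespace: infrastructure
spec:
  schedule: "0 4 * * 0"   # Sundays 04:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: mysql:8
              envFrom:
                - secretRef:
                    name: ha-sync-db-secret   # assumes DB_USER/DB_PASS keys exist
              command: ["sh", "-c"]
              args:
                - >
                  mysql --host=general-purpose-db.infrastructure.svc.cluster.local
                  --user="$DB_USER" --password="$DB_PASS" general_db
                  -e "DELETE FROM sync_operations WHERE started_at < NOW() - INTERVAL 30 DAY;"
```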


@ -0,0 +1,101 @@
Dell OptiPlex 7070.
Network
- Hostname: kube-node-1
- IP: 192.168.2.100
- Ansible groups: control_plane, standalone
- SSH user: dan
Services
- openclaw-gateway (user systemd service) — OpenClaw v2026.3.13
- Model: ollama/qwen2.5:0.5b (AVX2-accelerated, ~1.5s/reply)
- WhatsApp channel: linked (+4915175077219)
- Gateway port: 18789 (loopback)
- Config/state: ~/.openclaw/
- ollama (system service)
- Models dir: /claw/models (OLLAMA_MODELS env via systemd drop-in)
- Models: qwen2.5-coder:3b (default), qwen2.5:0.5b
- API: http://127.0.0.1:11434
Kubernetes role
- control-plane + worker (SchedulingDisabled — node cordoned, no new pods scheduled)
- Only static control-plane pods remain: etcd, kube-apiserver, kube-controller-manager,
kube-scheduler, kube-vip, kube-proxy, flannel (DaemonSet), ingress-nginx (DaemonSet),
dns-updater (DaemonSet), node-exporter (DaemonSet)
Disabled services (intentionally off)
- php8.3-fpm (was unused, nginx had no fastcgi_pass configured)
- openttd-server (snap, moved off)
- libvirtd + all sockets (windows-gaming VM was already shut off)
CPU
- Intel Core i5-9500 (9th Gen Coffee Lake)
- Cores/Threads: 6 cores / 6 threads (no Hyper-Threading)
- Base Clock: 3.00 GHz
- Max Boost: 4.40 GHz
- Cache: L1d 192 KiB + L1i 192 KiB (6 instances each), L2 1.5 MiB total, L3 9 MiB shared
- Architecture: 64-bit x86_64
- TDP: ~65W (typical for this model)
- Virtualization: VT-x enabled
RAM
- Capacity: 16 GB
- Type: DDR4 SDRAM
- Speed: 2666 MT/s (configured and rated)
- Form Factor: DIMM
- Data Width: 64 bits
Storage
nvme0n1 - Samsung PM991 NVMe SSD
- Capacity: 256 GB (238.5 GB usable)
- Type: NVMe SSD
- Interface: PCIe NVMe
- Partitions:
- 1G EFI boot
- 2G /boot
- 235.4G LVM (100G allocated to /)
Secondary Drives (HDDs)
sda - Seagate Expansion (External USB)
- Capacity: 2 TB (1.8 TB usable)
- Type: HDD (rotational)
- Mount: /backup-share
- Filesystem: ext4
sdb - Seagate Expansion+ (External USB)
- Capacity: 2 TB (1.8 TB usable)
- Type: HDD (rotational)
- Mount: /var/lib/docker/volumes (shared folder)
- Filesystem: ext4
sdc - Seagate Expansion (External USB)
- Capacity: 1 TB (931.5 GB usable)
- Type: HDD (rotational)
- Mount: /backup-drive
- Filesystem: ext4
sdd - Samsung HD103SI (Internal)
- Capacity: 1 TB (931.5 GB usable)
- Type: HDD (rotational)
- Mount: /drive
- Filesystem: ext4
sde - Hitachi HTS545050 (Laptop Drive)
- Capacity: 500 GB (465.8 GB usable)
- Type: HDD (rotational)
- Mount: /apps-data
- Filesystem: ext4
Total Storage
- SSD: 256 GB (system/boot)
- HDD: ~6.5 TB across 5 drives (2 TB + 2 TB + 1 TB + 1 TB + 500 GB)
- Total: ~6.8 TB
Network
- 1 Gbit/s


@ -0,0 +1,3 @@
## Memories
8x Samsung 16 GB 2Rx4 PC3L-12800R DDR3L RDIMM ("Planet First" label), timing 11-12-E2-D3
Part no. M393B2G70QH0-YK0, date code 1421

View file

@ -0,0 +1,3 @@
2 hard disks mounted:
1. Seagate Exos X16 (ST16000NM001G) — 16 TB (14.6 TiB) — sdc
2. Seagate Exos X16 (ST16000NM001G) — 16 TB (14.6 TiB) — sdd

View file

@ -0,0 +1,133 @@
# Homelab SSH Orchestration (Ansible)
This setup gives you a **centralized, SSH-managed orchestration engine** for your homelab.
Control plane expectation: run Ansible from a dedicated Proxmox VM (`ansible-control`), not from your laptop.
## Why this stack
- Agentless (no daemon required on targets)
- Centrally managed from one control node
- Native SSH workflow (fits your existing key-based access)
## Layout
- `ansible.cfg` - controller defaults
- `inventory/hosts.yml` - your homelab hosts and groups
- `group_vars/all.yml` - common variables (key path, packages, timezone)
- `playbooks/ping.yml` - connectivity validation
- `playbooks/baseline.yml` - baseline hardening and package setup
## 0) Create dedicated control VM in Proxmox
From any machine that can SSH to Proxmox as root:
```bash
cd /Users/dan/work/homelab/orchestration/ansible
chmod +x scripts/create-ansible-control-vm.sh
./scripts/create-ansible-control-vm.sh
```
This creates `ansible-control` (VMID `105`) on the Proxmox host (`192.168.2.193`) using the Ubuntu 24.04 ISO.
After the Ubuntu install in the Proxmox console, ensure:
- the static IP is `192.168.2.105`
- SSH key login works for user `dan`
- `sudo` is available for `dan`
## 0.5) Establish Proxmox cloud-init SSH key baseline
Goal: ensure a predefined key set is injected by cloud-init for Linux VMs.
1. Put your public keys (one per line) in:
- `cloud-init/authorized_keys`
2. Run setup:
```bash
cd /Users/dan/work/homelab/orchestration/ansible
chmod +x scripts/proxmox-cloudinit-setup.sh
./scripts/proxmox-cloudinit-setup.sh
```
Defaults:
- Proxmox host: `root@192.168.2.193`
- VM targets: `100 102 103 104 105`
- Cloud-init user: `dan`
Override example:
```bash
VMIDS="100 104 105" CI_USER="dan" ./scripts/proxmox-cloudinit-setup.sh
```
Notes:
- Windows guests are skipped automatically.
- For existing Linux guests, cloud-init changes typically take effect after a reboot.
## 1) Bootstrap control node
From this directory on the control node:
```bash
cd /Users/dan/work/homelab/orchestration/ansible
./scripts/bootstrap-control-node.sh
```
If needed, add the local Python bin directory to PATH (the script prints the exact line).
## 2) Validate SSH orchestration
```bash
ansible --version
ansible-inventory --graph
ansible all -m ping
ansible-playbook playbooks/ping.yml
```
## 3) Apply baseline config
```bash
ansible-playbook playbooks/baseline.yml
```
## 4) Run targeted orchestration examples
```bash
# Reboot only workers
ansible workers -a "sudo reboot" -f 2
# Update package metadata everywhere except proxmox host
ansible 'all:!proxmox' -m apt -a "update_cache=true" -b
# Check uptime of control-plane nodes
ansible control_plane -a "uptime"
```
## 5) Deploy/redeploy Paperclip on openclaw
Playbook:
```bash
ansible-playbook playbooks/paperclip-openclaw.yml -l openclaw
```
One-command helper (from this directory):
```bash
chmod +x scripts/deploy-paperclip-openclaw.sh
./scripts/deploy-paperclip-openclaw.sh
```
Post-deploy quick checks:
```bash
ansible openclaw -m shell -a "systemctl is-enabled paperclip; systemctl is-active paperclip; ss -lntp | grep 3100"
curl -sS http://192.168.2.88:3100/api/health
```
## Notes
- Inventory includes your known hosts:
- `kube-node-1` (`192.168.2.100`, user `dan`)
- `kube-node-2` (`192.168.2.195`, user `dan`)
- `kube-node-3` (`192.168.2.196`, user `dan`)
- `kube-arbiter` (`192.168.2.200`, user `dan`)
- `hp-proliant-proxmox` (`192.168.2.193`, user `root`)
- Proxmox is split into its own group to avoid accidentally running Linux baseline hardening tasks against it.
- If a host uses a different key path, override `ansible_ssh_private_key_file` in inventory host vars.


@ -0,0 +1,13 @@
[defaults]
inventory = ./inventory/hosts.yml
roles_path = ./roles
host_key_checking = False
retry_files_enabled = False
timeout = 30
forks = 20
interpreter_python = auto_silent
stdout_callback = yaml
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s


@ -0,0 +1,8 @@
# One SSH public key per line.
# Replace these placeholders with your real keys.
# Example:
# ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAI... dan@ansible-control
# ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQ... root@proxmox
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDB2VFkknrB62pAQLTrIBv8OLMR9Q3yMm73xTpJlXzrclcNczzEdwJWJ7rVCarsHB333CK3LvXMFwhhJRKsASuIxVJot0Il/2czbINBzrEkgcmDQ1gPKHqHh301j0xsPwKHKv9r8feiNBYGErcZJe2cHCVQNGvKsjgcHlMcd71C76qKZLyCO4HFbVra2jFDPSu0Yvk2sllpLAUV+t5b84JeWDv6sPkwTFDcaAVNftFtjmd+yPDFnWxI1cetclySK/NZdCCg1JMrF5m2gbdVrZeaMbNZ1U+jNam7/r31UzbZQ4NFhktWOe8nXORop+efqY6zh22q/hdfgEJXnhpwUT91 dvandachevici@dvandachevici
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDKq8DP8xtmYe6uvixF+ls4RALYunm5unqsLNOAVg01p4pyPlojNNTv0fBbWVI/2z5guR8+fA5NvsgcXCcYylbf9pD+PXOyveZxkNCe//cl9ct2nLzHfwKtddyeeycH8KL/O4OaeiLCye/L0cLFDqNVnSGp8BREdGyrmWodcHgvDZk5zIzcCxj3dn5HwulkC1WINyehHnBAKXF/kx85mBhRKCVP4Gxr9aLcxX0dmL9SLVntP8tg9Xw6uQgZa/pQ46IW49BkrMPU9DdLJCQwmD3sf5y2cPpD8xbC76bDWhFbnoVB+hhMR7aA9/KPQF648ZhaV5RFyCLC/3EK1gNpRFvEheOrTgrcEvBXkW6H7PqBck79D+oFvHCqkqWFXb09P1zbQ7ZOxRk23gJ9mQBncxKuO7zcgOyWF3r9xKt50TmtVdl8AASFVwi7Cr96Qnsi5I+qTt2dqtIU+K9rNvMNaz4ObPPzLyEXn96TvR2MPH0es2psPMpzc4B0dbwteYL6Wk0= dan@mainserver
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKATy9Sdfm0jGOfHcYUj5X7/YY2uJpCocnbwXSUTaSil dan@ansible-control


@ -0,0 +1,13 @@
---
ansible_ssh_private_key_file: ~/.ssh/id_rsa
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: auto_silent
timezone: Europe/Berlin
common_packages:
- curl
- htop
- jq
- unzip
- rsync


@ -0,0 +1,4 @@
# kube-node-1 cannot reach its own NodePort (hairpin NAT limitation with Flannel).
# Use the ClusterIP + internal port directly instead.
inventory_cli_server_host: "10.110.148.52"
inventory_cli_server_port: 9876


@ -0,0 +1,4 @@
# kube-node-1 cannot reach its own NodePort (hairpin NAT limitation with Flannel).
# Use the ClusterIP + internal port directly instead.
inventory_cli_server_host: "10.110.148.52"
inventory_cli_server_port: 9876


@ -0,0 +1,49 @@
all:
children:
ansible_control:
hosts:
ansible-control:
ansible_host: 192.168.2.70
ansible_user: dan
control_plane:
hosts:
kube-node-1: # Dell OptiPlex 7070 (i5-9500, 16 GB RAM, bare metal)
ansible_host: 192.168.2.100
ansible_user: dan
kube-node-3:
ansible_host: 192.168.2.196
ansible_user: dan
kube-arbiter: # 1c/6GB, NoSchedule taint — etcd only
ansible_host: 192.168.2.200
ansible_user: dan
workers:
hosts:
kube-node-2:
ansible_host: 192.168.2.195
ansible_user: dan
proxmox:
hosts:
hp-proliant-proxmox:
ansible_host: 192.168.2.193
ansible_user: root
ai_vms:
hosts:
local-ai: # Tesla P4 GPU passthrough, Ollama + openclaw-gateway ws://0.0.0.0:18789
ansible_host: 192.168.2.88
ansible_user: dan
remote-ai: # cloud AI providers, openclaw-gateway
ansible_host: 192.168.2.91
ansible_user: dan
windows:
hosts:
win10ltsc:
ansible_user: dan
kubernetes:
children:
control_plane:
workers:
device_inventory:
children:
kubernetes:
proxmox:
ai_vms:


@ -0,0 +1,50 @@
---
- name: Baseline host configuration
hosts: all:!proxmox
become: true
gather_facts: true
tasks:
- name: Ensure common packages are installed (Debian/Ubuntu)
ansible.builtin.apt:
name: "{{ common_packages }}"
state: present
update_cache: true
when: ansible_os_family == "Debian"
- name: Configure timezone
community.general.timezone:
name: "{{ timezone }}"
- name: Ensure unattended-upgrades is installed
ansible.builtin.apt:
name: unattended-upgrades
state: present
update_cache: true
when: ansible_os_family == "Debian"
- name: Ensure fail2ban is installed
ansible.builtin.apt:
name: fail2ban
state: present
update_cache: true
when: ansible_os_family == "Debian"
- name: Ensure UFW is installed
ansible.builtin.apt:
name: ufw
state: present
update_cache: true
when: ansible_os_family == "Debian"
- name: Ensure UFW allows SSH
community.general.ufw:
rule: allow
port: "22"
proto: tcp
when: ansible_os_family == "Debian"
- name: Ensure UFW is enabled
community.general.ufw:
state: enabled
when: ansible_os_family == "Debian"


@ -0,0 +1,8 @@
---
- name: Deploy device-inventory CLI on all homelab hosts
hosts: device_inventory
become: true
gather_facts: true
roles:
- inventory-cli


@ -0,0 +1,79 @@
---
# Usage:
# ansible-playbook playbooks/networking.yml \
# --extra-vars "technitium_admin_password=<secret>"
#
# Or store the password in an Ansible vault file and pass with --vault-id.
- name: Deploy Technitium DNS primary on Proxmox
hosts: proxmox
become: true
gather_facts: true
vars:
technitium_secondary_ips:
- "192.168.2.100" # kube-node-1
technitium_dns_records:
- { name: kube-node-1, ip: "192.168.2.100" }
- { name: kube-node-2, ip: "192.168.2.195" }
- { name: kube-node-3, ip: "192.168.2.196" }
- { name: kube-arbiter, ip: "192.168.2.200" }
- { name: proxmox, ip: "192.168.2.193" }
- { name: ansible-control, ip: "192.168.2.70" }
- { name: local-ai, ip: "192.168.2.88" }
- { name: remote-ai, ip: "192.168.2.91" }
roles:
- technitium-dns-primary
- name: Deploy Technitium DNS secondary on kube-node-1
hosts: kube-node-1
become: true
gather_facts: true
vars:
technitium_primary_ip: "192.168.2.193"
roles:
- technitium-dns-secondary
- name: Open DNS port on kube-node-1 (secondary DNS)
hosts: kube-node-1
become: true
gather_facts: false
tasks:
- name: Allow DNS TCP
community.general.ufw:
rule: allow
port: "53"
proto: tcp
- name: Allow DNS UDP
community.general.ufw:
rule: allow
port: "53"
proto: udp
- name: Allow Technitium web UI
community.general.ufw:
rule: allow
port: "5380"
proto: tcp
- name: Router DNS configuration reminder
hosts: localhost
gather_facts: false
tasks:
- name: Print router DNS instructions
ansible.builtin.debug:
msg: |
┌─────────────────────────────────────────────────────────────────┐
│ ACTION REQUIRED: Update your router's LAN DNS settings │
│ │
│ Primary DNS: 192.168.2.193 (Proxmox — Technitium primary) │
│ Secondary DNS: 192.168.2.100 (kube-node-1 — zone transfer) │
│ │
│ All .homelab names will now resolve on your LAN. │
└─────────────────────────────────────────────────────────────────┘


@ -0,0 +1,124 @@
---
- name: Deploy Paperclip on openclaw
hosts: openclaw
become: true
gather_facts: true
vars:
paperclip_user: dan
paperclip_home: /home/dan
paperclip_config_path: /home/dan/.paperclip/instances/default/config.json
paperclip_logs_dir: /home/dan/.paperclip/instances/default/logs
paperclip_service_name: paperclip
paperclip_host: "0.0.0.0"
paperclip_port: 3100
paperclip_allowed_hostnames:
- "192.168.2.88"
- "openclaw"
tasks:
- name: Ensure Paperclip logs directory exists
ansible.builtin.file:
path: "{{ paperclip_logs_dir }}"
state: directory
owner: "{{ paperclip_user }}"
group: "{{ paperclip_user }}"
mode: "0755"
- name: Ensure Paperclip config has LAN authenticated private mode
ansible.builtin.shell: |
python3 - <<'PY'
import json
from pathlib import Path
config_path = Path("{{ paperclip_config_path }}")
if not config_path.exists():
raise SystemExit(f"Missing config file: {config_path}")
data = json.loads(config_path.read_text())
server = data.setdefault("server", {})
before = json.dumps(server, sort_keys=True)
server["deploymentMode"] = "authenticated"
server["exposure"] = "private"
server["host"] = "{{ paperclip_host }}"
server["port"] = {{ paperclip_port }}
allowed = set(server.get("allowedHostnames", []))
allowed.update({{ paperclip_allowed_hostnames | to_json }})
server["allowedHostnames"] = sorted(allowed)
server["serveUi"] = True
after = json.dumps(server, sort_keys=True)
if before != after:
config_path.write_text(json.dumps(data, indent=2) + "\n")
print("changed")
else:
print("unchanged")
PY
args:
executable: /bin/bash
register: paperclip_config_update
changed_when: "'changed' in paperclip_config_update.stdout"
- name: Ensure Paperclip systemd service is installed
ansible.builtin.copy:
dest: /etc/systemd/system/{{ paperclip_service_name }}.service
owner: root
group: root
mode: "0644"
content: |
[Unit]
Description=Paperclip Server
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User={{ paperclip_user }}
Group={{ paperclip_user }}
WorkingDirectory={{ paperclip_home }}
Environment=HOME={{ paperclip_home }}
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ExecStart=/usr/bin/env npx -y paperclipai run
Restart=always
RestartSec=3
TimeoutStopSec=20
[Install]
WantedBy=multi-user.target
notify: Restart Paperclip
- name: Ensure UFW allows Paperclip port when enabled
community.general.ufw:
rule: allow
port: "{{ paperclip_port | string }}"
proto: tcp
when: ansible_os_family == "Debian"
- name: Ensure Paperclip service is enabled and running
ansible.builtin.systemd:
name: "{{ paperclip_service_name }}"
daemon_reload: true
enabled: true
state: started
- name: Wait for Paperclip health endpoint
ansible.builtin.uri:
url: "http://{{ ansible_host }}:{{ paperclip_port }}/api/health"
method: GET
return_content: true
register: paperclip_health
retries: 10
delay: 3
until: paperclip_health.status == 200
- name: Show health response
ansible.builtin.debug:
var: paperclip_health.json
handlers:
- name: Restart Paperclip
ansible.builtin.systemd:
name: "{{ paperclip_service_name }}"
daemon_reload: true
state: restarted