homelab/execution-plans/ha-sync.md
Dan V deb6c38d7b chore: commit homelab setup — deployment, services, orchestration, skill
- Add .gitignore: exclude compiled binaries, build artifacts, and Helm
  values files containing real secrets (authentik, prometheus)
- Add all Kubernetes deployment manifests (deployment/)
- Add services source code: ha-sync, device-inventory, games-console,
  paperclip, parts-inventory
- Add Ansible orchestration: playbooks, roles, inventory, cloud-init
- Add hardware specs, execution plans, scripts, HOMELAB.md
- Add skills/homelab/SKILL.md + skills/install.sh to preserve Copilot skill
- Remove previously-tracked inventory-cli binary from git index

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-09 08:10:32 +02:00

# HA Sync — Execution Plan
## Problem Statement
Two servers (Dell OptiPlex 7070 at `192.168.2.100` and HP ProLiant at `192.168.2.193`) each export the same folder set over NFS. A Kubernetes-native tool must keep each folder pair in bidirectional sync: newest file wins, mtime is preserved on copy, delete propagation is strict (one-way per CronJob), and every operation is logged in the MySQL instance in the `infrastructure` namespace.
---
## Architecture Decisions (Agreed)
| Decision | Choice | Rationale |
|---|---|---|
| Language | **Go** | Single static binary, strong concurrency primitives, no interpreter or VM overhead |
| Sync direction | **Bidirectional via two one-way CronJobs** | Each folder pair gets `a→b` and `b→a` jobs; newest-mtime wins |
| Loop prevention | **Preserve mtime on copy + `--delete-missing` flag** | Mtime equality → skip; no extra DB state needed |
| Lock | **Kubernetes `Lease` object (coordination.k8s.io/v1)** | Native K8s TTL; survives MySQL outage; sync blocked only if K8s API is down (already required for CronJob) |
| Change detection | **mtime + size first; MD5 only on mtime/size mismatch** | Efficient for large datasets |
| Delete propagation | **Strict mirror — configurable per job via `--delete-missing`** | See ⚠️ note below |
| Volume access | **NFS mounts (both servers already export NFS)** | No HostPath or node-affinity needed |
| Audit logging | **Write to opslog file during run; flush to MySQL on completion** | MySQL outage does not block sync; unprocessed opslogs are retried on next run |
| Opslog storage | **Persistent NFS-backed PVC at `/var/log/ha-sync/`** | `/tmp` is ephemeral (lost on pod exit); NFS PVC persists across CronJob runs for 10-day retention |
### Locking: Kubernetes Lease
Each sync pair uses a `coordination.k8s.io/v1` Lease object named `ha-sync-<pair>` in the `infrastructure` namespace.
- `spec.holderIdentity` = `<pod-name>/<iteration-id>`
- `spec.leaseDurationSeconds` = `--lock-ttl` (default 3600)
- A background goroutine renews (`spec.renewTime`) every `leaseDurationSeconds / 3` seconds
- On normal exit or SIGTERM: Lease is deleted (released)
- Stale leases (holder crashed without release): expire automatically after `leaseDurationSeconds`
- Requires RBAC: `ServiceAccount` with `create/get/update/delete` on `leases` in `infrastructure`
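As a concrete illustration, the Lease held during a `media` dell→hp run might look like the following (holder identity and timestamps are illustrative; `acquireTime`/`renewTime` are `metav1.MicroTime` values maintained by the heartbeat goroutine):

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: ha-sync-media
  namespace: infrastructure
spec:
  holderIdentity: ha-sync-media-dell-to-hp-28345-abcde/1712649032
  leaseDurationSeconds: 3600
  acquireTime: "2026-04-09T06:10:32.000000Z"
  renewTime: "2026-04-09T06:30:32.000000Z"
```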
### Audit Logging: Opslog + MySQL Flush
1. On sync start: open `/var/log/ha-sync/opslog-<pair>-<direction>-<RFC3339>.jsonl`
2. Each file operation: append one JSON line (all `sync_operations` fields)
3. On sync end: attempt flush to MySQL (`sync_iterations` + `sync_operations` batch INSERT)
4. On successful flush: delete the opslog file
5. On MySQL failure: leave the opslog; on next run, scan `/var/log/ha-sync/` for unprocessed opslogs and retry flush before starting new sync
6. Cleanup: after each run, delete opslogs older than 10 days (`os.Stat` mtime check)
### ⚠️ Delete Propagation Warning
With two one-way jobs per pair, ordering matters for deletes. If `dell→hp` runs before `hp→dell` and `--delete-missing` is ON for both, files that only exist on HP will be deleted before they're copied to Dell.
**Safe default**: `--delete-missing=false` for all jobs. Enable `--delete-missing=true` only on the **primary direction** (e.g., `dell→hp` for each pair) once the initial full sync has completed and both sides are known-equal.
---
## NFS Sync Pairs
| Pair name | Dell NFS (192.168.2.100) | HP NFS (192.168.2.193) |
|---|---|---|
| `media` | `/data/media` | `/data/media` |
| `photos` | `/data/photos` | `/data/photos` |
| `owncloud` | `/data/owncloud` | `/data/owncloud` |
| `games` | `/data/games` | `/data/games` |
| `infra` | `/data/infra` | `/data/infra` |
| `ai` | `/data/ai` | `/data/ai` |
Each pair produces **two CronJobs** in the `infrastructure` namespace.
---
## CLI Interface (`ha-sync`)
```
ha-sync [flags]
Required:
  --src <path>             Source directory (absolute path inside pod)
  --dest <path>            Destination directory (absolute path inside pod)
  --pair <name>            Logical pair name (e.g. "media"); used as Lease name ha-sync-<pair>

Optional:
  --direction <str>        Label for logging, e.g. "dell-to-hp" (default: "fwd")
  --db-dsn <dsn>           MySQL DSN (default: from env HA_SYNC_DB_DSN)
  --lock-ttl <seconds>     Lease TTL before considered stale (default: 3600)
  --log-dir <path>         Directory for opslog files (default: /var/log/ha-sync)
  --log-retain-days <n>    Delete opslogs older than N days (default: 10)
  --mtime-threshold <s>    Seconds of tolerance for mtime equality (default: 2)
  --delete-missing         Delete dest files not present in src (default: false)
  --workers <n>            Concurrent file workers (default: 4)
  --dry-run                Compute what would sync, save to DB as dry_run rows, print plan; do not copy/delete (default: false)
  --verbose                Verbose output
  --help
```
---
## MySQL Schema (database: `general_db`)
```sql
-- One row per CronJob execution
CREATE TABLE IF NOT EXISTS sync_iterations (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  sync_pair VARCHAR(255) NOT NULL,
  direction VARCHAR(64) NOT NULL,
  src VARCHAR(512) NOT NULL,
  dest VARCHAR(512) NOT NULL,
  started_at DATETIME(3) NOT NULL,
  ended_at DATETIME(3),
  status ENUM('running','success','partial_failure','failed') NOT NULL DEFAULT 'running',
  dry_run TINYINT(1) NOT NULL DEFAULT 0,
  files_created INT DEFAULT 0,
  files_updated INT DEFAULT 0,
  files_deleted INT DEFAULT 0,
  files_skipped INT DEFAULT 0,
  files_failed INT DEFAULT 0,
  total_bytes_transferred BIGINT DEFAULT 0,
  error_message TEXT,
  INDEX idx_pair (sync_pair),
  INDEX idx_started (started_at),
  INDEX idx_dry_run (dry_run)
);

-- One row per individual file operation (flushed from opslog on sync completion)
CREATE TABLE IF NOT EXISTS sync_operations (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  iteration_id BIGINT NOT NULL,
  dry_run TINYINT(1) NOT NULL DEFAULT 0,
  operation ENUM('create','update','delete') NOT NULL,
  filepath VARCHAR(4096) NOT NULL,
  size_before BIGINT,
  size_after BIGINT,
  md5_before VARCHAR(32),
  md5_after VARCHAR(32),
  started_at DATETIME(3) NOT NULL,
  ended_at DATETIME(3),
  status ENUM('success','fail') NOT NULL,
  error_message VARCHAR(4096),
  INDEX idx_iteration (iteration_id),
  CONSTRAINT fk_iteration FOREIGN KEY (iteration_id) REFERENCES sync_iterations(id)
);
```
> No `sync_locks` table — locking is handled by Kubernetes Lease objects.
### Dry-run Idempotency Rules
1. **`--dry-run` mode**: walk source and dest, compute the full set of would-be operations (create/update/delete), save to DB with `dry_run = 1`, print the plan. **No files are copied or deleted.**
2. **Idempotency check**: before running a dry-run, query for the last successful dry-run iteration for `(pair, direction)`:
```sql
SELECT id, started_at FROM sync_iterations
WHERE sync_pair = ? AND direction = ? AND dry_run = 1 AND status = 'success'
ORDER BY started_at DESC LIMIT 1;
```
Then re-walk the source and dest and compute the would-be operation set. Compare it against the `sync_operations` rows from that previous dry-run iteration (same set of `filepath + operation + size_before`). If **identical** → print `"Dry-run already current as of <started_at>. Nothing has changed."` and exit without writing new rows.
3. **Production run (`--dry-run` not set)**: all queries for previous iterations use `WHERE dry_run = 0`. Dry-run rows are **never considered** for skip logic, idempotency, or status reporting in production runs.
4. **Lease is still acquired** during dry-run (prevents two dry-runs from racing each other).
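The set comparison in rule 2 can be sketched as follows — `OpKey` and `sameOpSet` are illustrative names; equality is order-independent over `(filepath, operation, size_before)` tuples:

```go
// Sketch of the dry-run idempotency check: two op sets are "identical"
// when they contain the same (filepath, operation, size_before) tuples,
// regardless of order.
package main

import "fmt"

type OpKey struct {
	Filepath   string
	Operation  string // create|update|delete
	SizeBefore int64
}

func sameOpSet(prev, cur []OpKey) bool {
	if len(prev) != len(cur) {
		return false
	}
	// Count tuples from the previous dry-run, then subtract the current set.
	seen := make(map[OpKey]int, len(prev))
	for _, k := range prev {
		seen[k]++
	}
	for _, k := range cur {
		seen[k]--
		if seen[k] < 0 {
			return false
		}
	}
	return true
}

func main() {
	prev := []OpKey{{"a.txt", "create", 0}, {"b.txt", "update", 10}}
	cur := []OpKey{{"b.txt", "update", 10}, {"a.txt", "create", 0}}
	fmt.Println(sameOpSet(prev, cur)) // true: order does not matter
}
```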
---
## Project Structure
```
services/ha-sync/
  cmd/ha-sync/
    main.go                  # Sync CLI entry point
  cmd/ha-sync-ui/
    main.go                  # Dashboard HTTP server entry point (serves ha-sync.vandachevici.ro)
  internal/
    config/
      config.go              # Config struct, defaults, validation (shared by both binaries)
    db/
      db.go                  # MySQL connect, auto-migrate schema
      logging.go             # StartIteration, FinishIteration, BulkInsertOperations, LastDryRunOps
    lease/
      lease.go               # Acquire/release/heartbeat Kubernetes Lease object
    opslog/
      writer.go              # Append JSON lines to /var/log/ha-sync/opslog-<pair>-<direction>-<RFC3339>.jsonl
      flusher.go             # Scan for unprocessed opslogs, batch INSERT; cleanup logs >10 days
    sync/
      engine.go              # Main sync loop: walk, compare, dispatch; dryRun flag skips writes
      walker.go              # Recursive directory walk
      compare.go             # mtime+size comparison; conditional MD5
      copy.go                # File copy with os.Chtimes() mtime preservation
      delete.go              # Safe delete with pre-check
    ui/
      handler.go             # HTTP handlers: index, /api/iterations, /api/operations, /api/pairs
      templates/
        index.html           # Dashboard HTML; auto-refreshes every 10s via fetch(); vanilla JS only
  go.mod
  go.sum
  Dockerfile                 # Multi-stage: golang:1.22-alpine builder (builds ha-sync + ha-sync-ui) → alpine:3.20
  Makefile                   # build, docker-build IMAGE=<registry>/ha-sync:latest, docker-push targets
deployment/ha-sync/
  serviceaccount.yaml        # ServiceAccount: ha-sync, namespace: infrastructure
  rbac.yaml                  # Role + RoleBinding: leases (coordination.k8s.io) create/get/update/delete
  secret.yaml                # NOTE: create manually — see Phase 3C instructions
  pv-logs.yaml               # PersistentVolume: NFS 192.168.2.193:/data/infra/ha-sync-logs, 10Gi, RWX
  pvc-logs.yaml              # PVC bound to pv-logs; all CronJobs mount at /var/log/ha-sync
  pv-dell-<pair>.yaml        # PersistentVolume: NFS 192.168.2.100:/data/<pair> (one per pair × 6)
  pv-hp-<pair>.yaml          # PersistentVolume: NFS 192.168.2.193:/data/<pair> (one per pair × 6)
  pvc-dell-<pair>.yaml       # PVC → pv-dell-<pair> (one per pair × 6)
  pvc-hp-<pair>.yaml         # PVC → pv-hp-<pair> (one per pair × 6)
  cron-<pair>-dell-to-hp.yaml  # --dry-run is DEFAULT; remove flag to enable production sync
  cron-<pair>-hp-to-dell.yaml  # same
  ui-deployment.yaml         # Deployment: ha-sync-ui, 1 replica, image: <registry>/ha-sync:latest, cmd: ha-sync-ui
  ui-service.yaml            # ClusterIP Service: port 8080 → ha-sync-ui pod
  ui-ingress.yaml            # Ingress: ha-sync.vandachevici.ro → ui-service:8080; cert-manager TLS
  kustomization.yaml         # Kustomize root listing all resources
scripts/cli/
  ha-sync.md                 # CLI reference doc
```
### UI Dashboard (`ha-sync.vandachevici.ro`)
- **Binary**: `ha-sync-ui` — Go HTTP server, port 8080
- **Routes**:
- `GET /` — HTML dashboard; auto-refreshes via `setInterval` + `fetch`
- `GET /api/pairs` — JSON: per-pair last iteration summary (dry_run=0 and dry_run=1 separately)
- `GET /api/iterations?pair=&limit=20` — JSON: recent iterations
- `GET /api/operations?iteration_id=` — JSON: operations for one iteration
- **Dashboard shows**: per-pair status cards (last real sync, last dry-run, files created/updated/deleted/failed), recent activity table, errors highlighted in red
- **Env vars**: `HA_SYNC_DB_DSN` (same secret as CronJobs)
- **K8s**: Deployment in `infrastructure` namespace, 1 replica, same ServiceAccount as CronJobs (read-only DB access only)
---
## Tasks
> **Parallelism key**: Tasks marked `[P]` can be executed in parallel by separate agents. Tasks marked `[SEQ]` must follow the listed dependency chain.
---
### Phase 0 — Scaffolding `[SEQ]`
Must complete before any code is written; all subsequent tasks depend on this.
| # | Task | Command / Notes |
|---|---|---|
| 0.1 | Create Go module | `cd services/ha-sync && go mod init github.com/vandachevici/homelab/ha-sync` |
| 0.2 | Create directory tree | `mkdir -p cmd/{ha-sync,ha-sync-ui} internal/{config,db,lease,opslog,sync,ui}` |
| 0.3 | Create Dockerfile | Multi-stage: `FROM golang:1.22-alpine AS build` → `FROM alpine:3.20`; copy binary; `ENTRYPOINT ["/ha-sync"]` |
| 0.4 | Create Makefile | Targets: `build`, `docker-build IMAGE=<registry>/ha-sync:latest`, `docker-push IMAGE=...` |
---
### Phase 1 — Core Go packages `[P after Phase 0]`
Sub-tasks 1A–1E are **fully independent** — assign to separate agents simultaneously. 1F depends on all of them.
#### 1A — `internal/config` `[P]`
| # | Task | Notes |
|---|---|---|
| 1A.1 | Write `config.go` | Define `Config` struct with all CLI flags; use `flag` stdlib or `cobra`; set defaults from CLI Interface section above |
#### 1B — `internal/db` `[P]`
| # | Task | Notes |
|---|---|---|
| 1B.1 | Write `db.go` | `Connect(dsn string) (*sql.DB, error)`; run `CREATE TABLE IF NOT EXISTS` for both tables (include `dry_run TINYINT(1) NOT NULL DEFAULT 0` column in both) on startup |
| 1B.2 | Write `logging.go` | `StartIteration(dryRun bool, ...) (id int64)` → INSERT with `dry_run` set; `FinishIteration(id, status, counts)` → UPDATE; `BulkInsertOperations(iterID int64, dryRun bool, []OpRecord)` → batch INSERT; `LastDryRunOps(db, pair, direction string) ([]OpRecord, error)` → fetch ops for last successful `dry_run=1` iteration for idempotency check |
#### 1C — `internal/lease` `[P]`
| # | Task | Notes |
|---|---|---|
| 1C.1 | Write `lease.go` | Use `k8s.io/client-go` in-cluster config; `Acquire(ctx, client, namespace, leaseName, holderID, ttlSec)` — create or update Lease if expired; `Release(ctx, client, namespace, leaseName, holderID)` — delete Lease; `Heartbeat(ctx, ...)` — goroutine that calls `Update` on `spec.renewTime` every `ttlSec/3` seconds |
#### 1D — `internal/opslog` `[P]`
| # | Task | Notes |
|---|---|---|
| 1D.1 | Write `writer.go` | `Open(logDir, pair, direction string) (*Writer, error)` — creates `/var/log/ha-sync/opslog-<pair>-<direction>-<RFC3339>.jsonl`; `Append(op OpRecord) error` — JSON-encode one line |
| 1D.2 | Write `flusher.go` | `FlushAll(logDir string, db *sql.DB) error` — scan dir for `*.jsonl`, for each: decode lines → call `BulkInsertOperations`, delete file on success; `CleanOld(logDir string, retainDays int)` — delete files with mtime older than N days |
#### 1E — `internal/sync` `[P]`
| # | Task | Notes |
|---|---|---|
| 1E.1 | Write `walker.go` | `Walk(root string) ([]FileInfo, error)` — returns slice of `{RelPath, AbsPath, Size, ModTime, IsDir}`; use `filepath.WalkDir` |
| 1E.2 | Write `compare.go` | `NeedsSync(src, dest FileInfo, threshold time.Duration) bool` — mtime+size check; `MD5File(path string) (string, error)` — streaming MD5; `MD5Changed(srcPath, destPath string) bool` |
| 1E.3 | Write `copy.go` | `CopyFile(src, dest string, srcModTime time.Time) error` — copy bytes, then `os.Chtimes(dest, srcModTime, srcModTime)` to preserve mtime |
| 1E.4 | Write `delete.go` | `DeleteFile(path string) error` — `os.Remove`; `DeleteDir(path string) error` — `os.RemoveAll` only if dir is empty after child removal |
| 1E.5 | Write `engine.go` | Walk src+dest, compare, dispatch create/update/delete via worker pool (`sync.WaitGroup` + buffered channel of `--workers` size); if `dryRun=true`, build op list but **do not call copy/delete** — return ops for caller to log; write each op to opslog.Writer (tagged with dry_run flag); return summary counts |
#### 1F — `cmd/ha-sync/main.go` `[SEQ, depends on 1A+1B+1C+1D+1E]`
| # | Task | Notes |
|---|---|---|
| 1F.1 | Write `main.go` | Parse flags → build config → connect DB → flush old opslogs → acquire Lease → **if `--dry-run`: call `LastDryRunOps`, walk src+dest, compute would-be ops, compare; if identical → print "already current" + exit; else run engine(dryRun=true)** → open opslog writer (tagged dry_run) → start iteration row (`dry_run` = true/false) → run engine → finish iteration → flush opslog to DB → release Lease; trap SIGTERM to release Lease before exit; **production queries always filter `dry_run = 0`** |
---
### Phase 2 — Build & Docker Image `[SEQ after Phase 1]`
| # | Task | Command |
|---|---|---|
| 2.1 | Fetch Go deps | `cd services/ha-sync && go mod tidy` |
| 2.2 | Build binary | `cd services/ha-sync && make build` |
| 2.3 | Build Docker image | `make docker-build IMAGE=192.168.2.100:5000/ha-sync:latest` *(replace registry if different)* |
| 2.4 | Push Docker image | `make docker-push IMAGE=192.168.2.100:5000/ha-sync:latest` |
---
### Phase 3 — Kubernetes Manifests `[P, can start during Phase 1]`
All manifest sub-tasks are **independent** and can be parallelized.
#### 3A — RBAC + Shared Resources `[P]`
| # | Task | Notes |
|---|---|---|
| 3A.1 | Create `serviceaccount.yaml` | `name: ha-sync`, `namespace: infrastructure` |
| 3A.2 | Create `rbac.yaml` | `Role` with rules: `apiGroups: [coordination.k8s.io]`, `resources: [leases]`, `verbs: [create, get, update, delete]`; `RoleBinding` binding `ha-sync` SA to the Role |
| 3A.3 | Create `pv-logs.yaml` + `pvc-logs.yaml` | PV: `nfs.server: 192.168.2.193`, `nfs.path: /data/infra/ha-sync-logs`, capacity `10Gi`, `accessModes: [ReadWriteMany]`; PVC: `storageClassName: ""`, `volumeName: pv-ha-sync-logs`, namespace `infrastructure` |
#### 3B — PVs and PVCs per pair `[P]`
| # | Task | Notes |
|---|---|---|
| 3B.1 | Create `pv-dell-<pair>.yaml` for each of 6 pairs | `spec.nfs.server: 192.168.2.100`, `spec.nfs.path: /data/<pair>`; capacity per pair: `media: 2Ti`, `photos: 500Gi`, `games: 500Gi`, `owncloud: 500Gi`, `infra: 100Gi`, `ai: 500Gi`; `accessModes: [ReadWriteMany]` |
| 3B.2 | Create `pv-hp-<pair>.yaml` for each of 6 pairs | Same structure; `spec.nfs.server: 192.168.2.193` |
| 3B.3 | Create `pvc-dell-<pair>.yaml` + `pvc-hp-<pair>.yaml` | `namespace: infrastructure`; `accessModes: [ReadWriteMany]`; `storageClassName: ""` (manual bind); `volumeName: pv-dell-<pair>` / `pv-hp-<pair>` |
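An illustrative PV/PVC pair for `media` on Dell — the `Retain` reclaim policy and the `resources.requests` value are assumptions consistent with manual binding, not mandated above:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-dell-media
spec:
  capacity:
    storage: 2Ti
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.2.100
    path: /data/media
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-dell-media
  namespace: infrastructure
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""
  volumeName: pv-dell-media
  resources:
    requests:
      storage: 2Ti
```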
#### 3C — CronJobs `[P, depends on 3A+3B for volume/SA names]`
| # | Task | Notes |
|---|---|---|
| 3C.1 | Create `cron-<pair>-dell-to-hp.yaml` for each pair | `namespace: infrastructure`; `serviceAccountName: ha-sync`; `schedule: "*/15 * * * *"`; image: `<registry>/ha-sync:latest`; args: `["--src=/mnt/dell/<pair>","--dest=/mnt/hp/<pair>","--pair=<pair>","--direction=dell-to-hp","--db-dsn=$(HA_SYNC_DB_DSN)","--log-dir=/var/log/ha-sync"]`; volumeMounts: `pvc-dell-<pair>` → `/mnt/dell/<pair>`, `pvc-hp-<pair>` → `/mnt/hp/<pair>`, `pvc-ha-sync-logs` → `/var/log/ha-sync`; envFrom: `ha-sync-db-secret` |
| 3C.2 | Create `cron-<pair>-hp-to-dell.yaml` for each pair | Same but src/dest swapped, `direction=hp-to-dell`; offset schedule by 7 min: `"7,22,37,52 * * * *"` |
| 3C.3 | Create `secret.yaml` | Comment-only file; actual secret created manually: `kubectl create secret generic ha-sync-db-secret --from-literal=HA_SYNC_DB_DSN='<user>:<pass>@tcp(general-purpose-db.infrastructure.svc.cluster.local:3306)/general_db' -n infrastructure` |
| 3C.4 | Create `kustomization.yaml` | Resources in order: `serviceaccount.yaml`, `rbac.yaml`, `pv-logs.yaml`, `pvc-logs.yaml`, all `pv-*.yaml`, all `pvc-*.yaml`, all `cron-*.yaml` |
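A hedged sketch of one CronJob manifest (`concurrencyPolicy: Forbid` and `restartPolicy: Never` are assumptions — sensible belt-and-braces alongside the Lease, not something this plan mandates):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ha-sync-media-dell-to-hp
  namespace: infrastructure
spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ha-sync
          restartPolicy: Never
          containers:
            - name: ha-sync
              image: <registry>/ha-sync:latest
              args:
                - --src=/mnt/dell/media
                - --dest=/mnt/hp/media
                - --pair=media
                - --direction=dell-to-hp
                - --db-dsn=$(HA_SYNC_DB_DSN)
                - --log-dir=/var/log/ha-sync
                - --dry-run
              envFrom:
                - secretRef:
                    name: ha-sync-db-secret
              volumeMounts:
                - { name: dell-media, mountPath: /mnt/dell/media }
                - { name: hp-media, mountPath: /mnt/hp/media }
                - { name: logs, mountPath: /var/log/ha-sync }
          volumes:
            - name: dell-media
              persistentVolumeClaim: { claimName: pvc-dell-media }
            - name: hp-media
              persistentVolumeClaim: { claimName: pvc-hp-media }
            - name: logs
              persistentVolumeClaim: { claimName: pvc-ha-sync-logs }
```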
---
### Phase 4 — CLI Documentation `[P, independent]`
| # | Task | Notes |
|---|---|---|
| 4.1 | Create `scripts/cli/ha-sync.md` | Document all flags, defaults, example invocations, env vars (`HA_SYNC_DB_DSN`); note `--dry-run` for safe first-run; note `--delete-missing` rollout guidance |
---
### Phase 5 — Deploy & Verify `[SEQ after Phase 2+3]`
| # | Task | Command |
|---|---|---|
| 5.1 | Create DB secret | `kubectl create secret generic ha-sync-db-secret --from-literal=HA_SYNC_DB_DSN='<user>:<pass>@tcp(general-purpose-db.infrastructure.svc.cluster.local:3306)/general_db' -n infrastructure` |
| 5.2 | Apply manifests | `kubectl apply -k deployment/ha-sync/` |
| 5.3 | Dry-run smoke test | `kubectl create job ha-sync-test --from=cronjob/ha-sync-media-dell-to-hp -n infrastructure` then: `kubectl logs -l job-name=ha-sync-test -n infrastructure -f` |
| 5.4 | Verify Lease is created | `kubectl get lease ha-sync-media -n infrastructure -o yaml` |
| 5.5 | Verify DB rows | `kubectl exec -it <general-purpose-db-pod> -n infrastructure -- mysql -u<user> -p general_db -e "SELECT * FROM sync_iterations ORDER BY id DESC LIMIT 5;"` |
| 5.6 | Verify opslog flush | Check `/var/log/ha-sync/` on the logs PVC — no `.jsonl` files should remain after a successful run |
| 5.7 | Trigger real first run | Delete the test job; let CronJob run on schedule; observe `sync_operations` table |
---
## Open Questions / Future Work
- **MySQL HA**: `general-purpose-db` is a single-replica StatefulSet — no HA. Since locking is now handled by K8s Lease and MySQL is only used for audit logging (with opslog fallback), a MySQL outage won't block sync. If full MySQL HA is later desired, **MariaDB Galera Cluster (3 replicas)** is the recommended path for this homelab.
- **Conflict resolution**: Currently "newest mtime wins". If clocks drift between nodes, a file could ping-pong. Consider NTP enforcement across all nodes or use `--mtime-threshold` >= observed clock skew.
- **Delete safety**: `--delete-missing` defaults to `false`. Staged rollout: run one full cycle disabled first → confirm parity → enable on primary direction only.
- **Alerting**: Add a Prometheus/Grafana alert on `sync_iterations.status = 'failed'` (query general_db directly or expose a future `/metrics` endpoint).
- **DB retention**: `sync_operations` will grow large. Add a cleanup step: `DELETE FROM sync_operations WHERE started_at < NOW() - INTERVAL 30 DAY` as a weekly CronJob.
- **Registry**: Dockerfile assumes local registry at `192.168.2.100:5000`. Confirm registry address before Phase 2.