HA Sync — How It Works

Overview

HA Sync is a homelab service that keeps six NFS-exported data folders in sync between two physical servers — a Dell OptiPlex 7070 (192.168.2.100) and an HP ProLiant DL360 G7 (192.168.2.193) — so that either machine can take over if the other goes down.

Sync runs as Kubernetes CronJobs every 15 minutes. Each folder pair has two jobs: one copying Dell → HP, and one copying HP → Dell (bidirectional, last-writer-wins).

Dry-run mode is on by default: the CronJobs run with --dry-run until you remove that flag. The dashboard shows what would be synced, but no files are copied or deleted until you enable real mode.

Sync Pairs

Pair	Dell path	HP path	Description
`media`	`/data/media`	`/data/media`	Movies, TV shows, music
`photos`	`/data/photos`	`/data/photos`	Personal photo library
`owncloud`	`/data/owncloud`	`/data/owncloud`	OwnCloud user data
`games`	`/data/games`	`/data/games`	Game storage
`infra`	`/data/infra`	`/data/infra`	Infrastructure configs & DB data
`ai`	`/data/ai`	`/data/ai`	AI model weights & datasets

How a Sync Run Works

Acquire K8s Lease → Walk src & dest trees → Compare mtime + size → Copy / Delete (worker pool) → Write results to MySQL → Release Lease

Lease acquisition: the CronJob pod acquires a Kubernetes Lease object (coordination.k8s.io/v1) named ha-sync-<pair>. If another pod for the same pair already holds it, the new pod exits immediately. The holder renews the lease every TTL/3 seconds; if the pod crashes, renewals stop and the lease is treated as expired once its TTL elapses.
Tree walk: source and destination directories are walked in parallel. Each file's path, size, and modification time are collected into a hash map.
Comparison: files are compared by mtime + size. If the sizes match and the mtimes differ by less than 2 seconds (configurable), the files are considered equal and skipped. On an mtime or size mismatch, an MD5 comparison is triggered to avoid false positives.
Copy / delete: a configurable worker pool (default 4 workers) processes the operation queue. After each copy, os.Chtimes() sets the destination file's mtime to the source file's mtime, which prevents the reverse-direction job from re-copying the same file.
Opslog flush: each operation is appended to a JSONL file under /var/log/ha-sync/ (an NFS-backed volume, so the file outlives the pod). After all operations complete, the file is bulk-inserted into MySQL and deleted. If MySQL is down, the file is kept and the insert is retried on the next run.

Loop Prevention

Because sync is bidirectional, a naïve implementation would copy a file from A→B, then copy it back B→A on the next run, forever. HA Sync avoids this by preserving the source file's mtime on every copy. On the next run the comparison sees equal mtimes and skips the file.

In a write conflict (both sides modified the same file between runs), the newest mtime wins — the more recently modified copy is treated as the source of truth and overwrites the other.

Dry-Run & Idempotency

Running with --dry-run computes all would-be operations and saves them to the sync_iterations / sync_operations tables with dry_run = 1, but makes no file changes. The dashboard marks these rows with a DRY badge.
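One way to keep dry-run and real mode on the same code path is to plan every operation first, record each one, and execute only when the flag is off. The sketch below uses Go's standard `flag` package, which accepts both `-dry-run` and `--dry-run`. The `op` type, the `apply` function, and the `record` callback (a stand-in for the sync_operations insert) are hypothetical, and the binary-level default of `false` is an assumption based on dry-run being enabled through the CronJob args.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// op is one planned file operation (hypothetical type).
type op struct {
	Kind string // "copy" or "delete"
	Path string
}

// apply records every planned op but only calls exec when dryRun is false.
// record stands in for the sync_operations insert (dry_run = 1 in dry-run mode).
func apply(ops []op, dryRun bool, exec func(op) error, record func(op, bool)) error {
	for _, o := range ops {
		if !dryRun {
			if err := exec(o); err != nil {
				return err
			}
		}
		record(o, dryRun)
	}
	return nil
}

func main() {
	dryRun := flag.Bool("dry-run", false, "plan operations without changing any files")
	flag.Parse()

	planned := []op{{"copy", "movies/a.mkv"}, {"delete", "tv/old.mkv"}}
	err := apply(planned, *dryRun,
		func(o op) error { fmt.Println("executing", o.Kind, o.Path); return nil },
		func(o op, dry bool) { fmt.Printf("record %s %s dry_run=%t\n", o.Kind, o.Path, dry) },
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```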

If you trigger a second dry-run before anything changes on disk, the service detects that the new would-be op set is identical to the previous one, skips writing new DB rows, and prints "no changes since last dry-run".
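Detecting an "identical op set" needs a comparison that ignores the order in which the walk happened to find files. A hedged sketch: hash a sorted list of operations and compare the digest with the one stored for the previous dry-run. The hashing approach and the `fingerprint` name are assumptions; this document only says the two sets are compared, not which fields take part.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// op is one would-be operation (hypothetical type).
type op struct {
	Kind string // "copy" or "delete"
	Path string
}

// fingerprint returns an order-independent SHA-256 digest of the op set.
// Assumes paths contain no tab or newline characters.
func fingerprint(ops []op) string {
	lines := make([]string, len(ops))
	for i, o := range ops {
		lines[i] = o.Kind + "\t" + o.Path
	}
	sort.Strings(lines)
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	previous := fingerprint([]op{{"copy", "b.jpg"}, {"copy", "a.jpg"}})
	current := fingerprint([]op{{"copy", "a.jpg"}, {"copy", "b.jpg"}})
	if current == previous {
		fmt.Println("no changes since last dry-run") // printed: same set, different order
	}
}
```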

Enabling Real Sync

When you are satisfied with the dry-run output, remove --dry-run from the CronJob args:

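Edit the CronJob for one direction of one pair (each pair has two CronJobs, one per direction; repeat for both to enable real sync for the pair):
kubectl -n infrastructure edit cronjob ha-sync-media-dell-to-hp
In the editor, delete the --dry-run entry from .spec.jobTemplate.spec.template.spec.containers[0].args and save.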

Enable delete propagation (after initial full sync only):
Add --delete-missing to the args of the primary direction CronJob. Do not enable it on both directions simultaneously.
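A sketch of how --delete-missing might extend the plan for the primary-direction job, assuming the flag means "delete destination files that no longer exist on the source" (the name suggests this, but the document does not spell it out). `planDeletes` is a hypothetical helper and ignores timestamps. It also illustrates why the flag should wait for an initial full sync: before the two sides have converged, a file that exists only on the destination looks exactly like one that was deleted from the source.

```go
package main

import (
	"fmt"
	"sort"
)

// planDeletes returns destination paths that have no counterpart on the source.
// Maps are keyed by relative path, as a tree walk would produce (hypothetical helper).
func planDeletes(src, dst map[string]struct{}, deleteMissing bool) []string {
	if !deleteMissing {
		return nil
	}
	var out []string
	for rel := range dst {
		if _, ok := src[rel]; !ok {
			out = append(out, rel)
		}
	}
	sort.Strings(out) // deterministic order for logging and dry-run comparison
	return out
}

func main() {
	src := map[string]struct{}{"a.mkv": {}, "b.mkv": {}}
	dst := map[string]struct{}{"a.mkv": {}, "b.mkv": {}, "removed.mkv": {}}
	fmt.Println(planDeletes(src, dst, true))  // [removed.mkv]
	fmt.Println(planDeletes(src, dst, false)) // []
}
```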

Infrastructure

Component	Detail
Language	Go 1.22, single static binary
Locking	Kubernetes `Lease` (`coordination.k8s.io/v1`), no MySQL dependency for locks
Database	MySQL 9 (`general-purpose-db` StatefulSet, `general_db` schema)
Storage	NFS PersistentVolumes, RWX — both servers export `/data/*`
Schedule	Dell→HP every 15 min; HP→Dell at :07, :22, :37, :52 (staggered)
Workers	4 concurrent copy goroutines per run (configurable)
Log retention	Opslog JSONL files kept for 10 days before purge