Backups and Restore

FreeSDN ships two distinct backup mechanisms with different scopes. Using the wrong one during an incident is costly, so understand both before you need them.

Two backup mechanisms

Mechanism	What it captures	Use case
pg-backup (DB dumps)	Full Postgres state: users, audit log, device inventory, encrypted credentials, agent registry, plugin state, all module data	Disaster recovery - restore a complete instance
Configuration Backup module (`.fsdn` archives)	Portable config snapshot: Sites, Controllers, Devices, users, automation rules. No credentials.	Migration between instances, dev-to-prod copy, config version history

pg-backup: daily GPG-encrypted DB dumps

The freesdn-pg-backup container is part of the core stack - no profile required. It dumps both databases daily:

freesdn - the primary PostgreSQL DB (19 schemas: access, agents, ai, analytics, audit, backup, cameras, collector, core, devices, enterprise, events, fabric, firewall, gateway, hypervisor, network, voip, vpn)
freesdn_logs - the TimescaleDB observability DB

Dumps land in the pg_backups volume with 7-day local retention.
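To preview which local dumps have aged past the retention window, a read-only listing can help. The sketch below is a hypothetical helper, not the container's own pruning code: the function name `list_expired_dumps` and the `*.sql.gz*` name pattern are assumptions, and it relies on a `find` that supports `-mmin` (GNU, BSD, and BusyBox do).

```shell
# List dump files older than a retention window without deleting anything.
# Hypothetical helper; the pg-backup container's real pruning logic may differ.
list_expired_dumps() {
  local dir="$1" days="${2:-7}"
  # -mmin +N matches files last modified more than N minutes ago.
  find "$dir" -type f -name '*.sql.gz*' -mmin +"$((days * 24 * 60))"
}

# Usage (run where the backup volume is mounted, e.g. inside the container):
#   list_expired_dumps /backups 7
```

Because the function only prints paths, it is safe to run against a live backup volume.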

GPG encryption (required in production)

pg-backup fails closed: without a configured GPG recipient it refuses to write unencrypted dumps. This applies to every deployment tier, because the guard checks only whether GPG is active, not ENVIRONMENT. A stack that lacks a valid GPG recipient must set BACKUP_ALLOW_PLAINTEXT=1 explicitly. The homelab .env.lite template includes a commented-out opt-in; uncomment it only if you have no GPG key and understand the risk.

The pro and max tier templates instead ship GPG placeholder values that you must fill in before the first run: generate a dedicated keypair, set BACKUP_GPG_RECIPIENT to the real recipient email, and place the exported public key at ./secrets/backup-public.asc. Those tiers must not set BACKUP_ALLOW_PLAINTEXT. Configure a GPG recipient before running in production.
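The fail-closed behaviour can be pictured as a three-way decision. The following is a minimal sketch under stated assumptions, not the actual entrypoint: the function name `backup_mode` is hypothetical, and it treats a non-empty BACKUP_GPG_RECIPIENT as "GPG is active", whereas the real guard may also validate the key.

```shell
# Hypothetical sketch of a fail-closed guard; NOT the real pg-backup entrypoint.
# Prints the dump mode and returns 0 when a dump may be written, 1 otherwise.
backup_mode() {
  if [ -n "${BACKUP_GPG_RECIPIENT:-}" ]; then
    echo "encrypted"            # GPG recipient configured: write .sql.gz.gpg
  elif [ "${BACKUP_ALLOW_PLAINTEXT:-0}" = "1" ]; then
    echo "plaintext"            # explicit opt-in: write .sql.gz
  else
    echo "refused: set BACKUP_GPG_RECIPIENT or BACKUP_ALLOW_PLAINTEXT=1" >&2
    return 1                    # fail closed: no dump is written
  fi
}
```

The point of the ordering is that plaintext is never the fallback: it only happens when the operator asks for it by name.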

Setup:

On a secure restore host (not the production server), generate a dedicated backup key pair:

gpg --full-generate-key
# Choose RSA 4096, no expiry
gpg --export --armor backup@example.com > backup-public.asc

Copy backup-public.asc to ./secrets/backup-public.asc in the repo.

Set in your env file:

BACKUP_GPG_RECIPIENT=backup@example.com
BACKUP_GPG_PUBLIC_KEY_PATH=./secrets/backup-public.asc

The container holds only the public key and cannot decrypt its own dumps. Store the private key and its passphrase in a secrets manager completely separate from the production server.
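Before copying the exported key into the repo, it is worth confirming that the file really contains only public material. The check below is a rough sketch based on the ASCII-armor header lines; the function name `is_public_key_only` is hypothetical, and a header match is a sanity check rather than a cryptographic validation.

```shell
# Return 0 only if the file carries an armored public key block and
# no armored private key block. Header-based sanity check only.
is_public_key_only() {
  local f="$1"
  grep -q -- '-----BEGIN PGP PUBLIC KEY BLOCK-----' "$f" || return 1
  ! grep -q -- 'BEGIN PGP PRIVATE KEY BLOCK' "$f"
}

# Usage:
#   is_public_key_only ./secrets/backup-public.asc || echo "do not deploy this file" >&2
```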

Manually trigger a backup

The scheduled dump runs every 24 hours. You can force one immediately, but how you do it determines whether the dump is encrypted.

Running pg_dump by hand through docker compose exec spawns a fresh shell inside the running container. The GPG guard ($GPG_ENABLED) exists only in the entrypoint process; the exec shell does not inherit it and writes raw .sql.gz files unconditionally, even in a GPG-enabled production stack.

Preferred approach in production: restart the pg-backup container so the entrypoint re-runs with the full GPG path:

docker compose --env-file .env.pro restart pg-backup
# The entrypoint loop fires immediately and writes an encrypted .sql.gz.gpg dump.
docker compose --env-file .env.pro exec pg-backup ls -lh /backups

Use the exec command below only in plaintext-allowed environments (homelab / dev with BACKUP_ALLOW_PLAINTEXT=1). Never run it against a production stack.

docker compose --env-file .env.lite exec pg-backup bash -c '
  STAMP=$(date +%Y%m%d_%H%M%S)
  pg_dump -h postgres -U "$PGUSER" -d "${POSTGRES_DB:-freesdn}" --no-owner --no-privileges \
    | gzip > /backups/manual_freesdn_$STAMP.sql.gz
  pg_dump -h logdb -U "$LOGDB_USER" -d "$LOGDB_DB" --no-owner --no-privileges \
    | gzip > /backups/manual_logdb_$STAMP.sql.gz
  ls -lh /backups/manual_*.sql.gz
'
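After a manual plaintext dump, a quick completeness check can catch a truncated file before you rely on it. This is a sketch, assuming a plain-format pg_dump (which ends with a "PostgreSQL database dump complete" comment); the function name `dump_is_complete` is hypothetical.

```shell
# Return 0 if the gzip stream is intact and the SQL ends with pg_dump's
# closing comment; return non-zero for corrupt or truncated dumps.
dump_is_complete() {
  local f="$1"
  gzip -t "$f" 2>/dev/null || return 1
  gunzip -c "$f" | tail -n 10 | grep -q 'PostgreSQL database dump complete'
}

# Usage (plaintext dumps only; encrypted dumps must be decrypted first):
#   dump_is_complete ./manual_freesdn_20260520_030000.sql.gz && echo "dump looks complete"
```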

List and retrieve backups

# List all dumps in the backup volume
docker compose --env-file .env.pro exec pg-backup ls -lh /backups

# Copy a GPG-encrypted dump to the host (production: files end in .sql.gz.gpg).
# -T disables TTY allocation so the binary stream is not altered on its way to the host.
docker compose --env-file .env.pro exec -T pg-backup cat /backups/freesdn_20260520_030000.sql.gz.gpg \
  > ./freesdn_20260520_030000.sql.gz.gpg

# Copy a plaintext dump to the host (dev/homelab with BACKUP_ALLOW_PLAINTEXT=1: files end in .sql.gz)
docker compose --env-file .env.lite exec -T pg-backup cat /backups/freesdn_20260520_030000.sql.gz \
  > ./freesdn_20260520_030000.sql.gz
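Since encrypted and plaintext dumps differ only by suffix, a listing can be screened for plaintext files that should not exist on a production stack. The filter below is a small sketch; the function name `find_plaintext_dumps` is hypothetical, and it relies only on the .sql.gz and .sql.gz.gpg suffixes described above.

```shell
# Read file names on stdin and print every dump that is NOT GPG-encrypted.
# Exit status 0 means no plaintext dumps were found.
find_plaintext_dumps() {
  local name found=0
  while IFS= read -r name; do
    case "$name" in
      *.sql.gz) printf '%s\n' "$name"; found=1 ;;   # .sql.gz.gpg does not match
    esac
  done
  [ "$found" -eq 0 ]
}

# Usage:
#   docker compose --env-file .env.pro exec -T pg-backup ls /backups | find_plaintext_dumps
```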

Off-site DR via rclone (`dr` profile)

Enable the dr Compose profile to run a sidecar that syncs encrypted dumps off-site using rclone. See Compose Profiles for setup steps.

Off-site retention defaults to 30 days, managed independently from local retention. Use a bucket with object-lock / write-once policy to protect against ransomware.

Default RPO with the shipped daily-dump model: up to 24 hours. For tighter recovery points, configure WAL streaming as described in the DR procedure docs.
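The 24-hour RPO only holds if the newest dump is actually less than a day old, so a freshness check is a natural monitoring hook. The sketch below derives a dump's age from the timestamp in its file name. It assumes GNU date, bash substring expansion, and that the stamp is in UTC; the function name `dump_age_seconds` is hypothetical.

```shell
# Print the age in seconds of a dump named like freesdn_YYYYMMDD_HHMMSS.sql.gz[.gpg].
# Second argument: "now" as a Unix epoch (defaults to the current time).
# Assumptions: GNU date, bash, and a UTC timestamp in the file name.
dump_age_seconds() {
  local name="$1" now="${2:-$(date -u +%s)}" stamp d t epoch
  stamp=$(printf '%s\n' "$name" |
    sed -n 's/.*_\([0-9]\{8\}\)_\([0-9]\{6\}\)\..*/\1 \2/p')
  [ -n "$stamp" ] || return 1                 # name does not carry a timestamp
  d=${stamp% *}; t=${stamp#* }
  epoch=$(date -u -d "${d:0:4}-${d:4:2}-${d:6:2} ${t:0:2}:${t:2:2}:${t:4:2}" +%s) || return 1
  echo $(( now - epoch ))
}

# Usage: alert when the newest dump is older than 24 hours (86400 seconds).
#   [ "$(dump_age_seconds freesdn_20260520_030000.sql.gz.gpg)" -le 86400 ] || echo "RPO at risk" >&2
```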

The Configuration Backup module

The Configuration Backup module (at /backup in the UI) produces portable .fsdn archives. Use them to:

Migrate configuration from a staging instance to production
Snapshot configuration before a major change
Seed a new Site with an existing Site’s configuration

A .fsdn archive carries no credentials. After restoring a config archive to a new instance, re-enter all module secrets (Controller passwords, NVR credentials, etc.).

The module supports selective restore (individual sections), strict semver schema gating (a mis-versioned section is rejected, not silently applied), and automatic rollback slots (a pre-restore snapshot is captured before any non-dry-run restore).

Restore procedure

From a pg_dump (disaster recovery)

Full restore from scratch. Estimated time: 30-60 minutes for DB-only loss; 3-4 hours for total data-center loss including provisioning.

# 1. Stop write surfaces (keep DBs running if they exist)
docker compose --env-file .env.pro stop api worker worker-io scheduler

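# 2. Drop and recreate the target databases
# Note: $POSTGRES_USER, $POSTGRES_DB, $LOGDB_USER and $LOGDB_DB in steps 2, 3 and 5 are
# expanded by your host shell, so export them first (for example by sourcing the env file).
# DROP DATABASE cannot run inside a transaction block, and psql sends a multi-statement
# -c string as one implicit transaction, so each statement gets its own -c.
docker compose --env-file .env.pro exec postgres \
  psql -U "$POSTGRES_USER" -d postgres \
  -c "DROP DATABASE IF EXISTS freesdn;" \
  -c "CREATE DATABASE freesdn OWNER $POSTGRES_USER;"

docker compose --env-file .env.pro exec logdb \
  psql -U "$LOGDB_USER" -d postgres \
  -c "DROP DATABASE IF EXISTS freesdn_logs;" \
  -c "CREATE DATABASE freesdn_logs OWNER $LOGDB_USER;"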

# 3. Restore from the dumps
# Production (GPG-encrypted, .sql.gz.gpg): decrypt first, then pipe into psql.
# The private key must be available on the restore host (not the production server).
gpg --decrypt ./freesdn_20260520_030000.sql.gz.gpg | gunzip -c | \
  docker compose --env-file .env.pro exec -T postgres \
  psql -U "$POSTGRES_USER" -d "$POSTGRES_DB"

gpg --decrypt ./logdb_20260520_030000.sql.gz.gpg | gunzip -c | \
  docker compose --env-file .env.pro exec -T logdb \
  psql -U "$LOGDB_USER" -d "$LOGDB_DB"

# Dev/homelab only (BACKUP_ALLOW_PLAINTEXT=1, .sql.gz; never in production):
# gunzip -c ./freesdn_20260520_030000.sql.gz | \
#   docker compose --env-file .env.pro exec -T postgres \
#   psql -U "$POSTGRES_USER" -d "$POSTGRES_DB"
#
# gunzip -c ./logdb_20260520_030000.sql.gz | \
#   docker compose --env-file .env.pro exec -T logdb \
#   psql -U "$LOGDB_USER" -d "$LOGDB_DB"

# 4. Restart the services, then run migrations (detects existing schema, stamps to head; safe to re-run)
docker compose --env-file .env.pro up -d api worker worker-io scheduler
docker compose --env-file .env.pro exec api python scripts/migrate.py

# 5. Invalidate stale sessions (forces re-login; required after every restore)
docker compose --env-file .env.pro exec postgres \
  psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c \
  "TRUNCATE core.user_sessions;
   UPDATE core.users SET token_version = COALESCE(token_version, 0) + 1;"
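Step 3 branches on whether the dump is encrypted. One way to avoid picking the wrong pipeline under pressure is to select it from the file suffix. This is a sketch, not part of FreeSDN: the function name `stream_dump` is hypothetical, and the encrypted branch assumes gpg and the private key are available on the restore host.

```shell
# Write the decompressed SQL of a dump to stdout, choosing the pipeline by suffix.
stream_dump() {
  case "$1" in
    *.sql.gz.gpg) gpg --decrypt "$1" | gunzip -c ;;   # production dumps
    *.sql.gz)     gunzip -c "$1" ;;                    # dev/homelab plaintext dumps
    *) echo "unrecognized dump name: $1" >&2; return 2 ;;
  esac
}

# Usage:
#   stream_dump ./freesdn_20260520_030000.sql.gz.gpg | \
#     docker compose --env-file .env.pro exec -T postgres \
#     psql -U "$POSTGRES_USER" -d "$POSTGRES_DB"
```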

Post-restore validation

Always run this after any restore:

BASE=https://freesdn.example.com

# Health checks must pass before allowing traffic
curl -fsS "$BASE/api/v1/health/live"
curl -fsS "$BASE/api/v1/health/ready"

# Audit chain integrity: the most important check
# Requires super_admin role. Use a super_admin Bearer token or super_admin session cookie.
# -f is omitted so a 403 body is visible if the wrong role is used.
curl -sS -H "Authorization: Bearer <super_admin_token>" "$BASE/api/v1/audit/validate?limit=100000" | jq
# Expected: {"valid": true, "broken_at": null, ...}

A response with valid: false and broken_reason: "tampered" indicates either a partial commit during restore or genuine tampering. Cross-reference your off-site immutable backup before taking further action.
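Right after a restore the API may need some time before the readiness check passes, so polling is usually more practical than a single curl call. The helper below is a generic sketch; the function name `retry` and the attempt and delay values in the usage line are illustrative assumptions.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Run COMMAND until it succeeds; give up after ATTEMPTS tries.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    "$@" && return 0                       # success: stop polling
    [ "$i" -ge "$attempts" ] && return 1   # out of attempts: report failure
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage: wait up to about a minute for readiness before allowing traffic.
#   retry 30 2 curl -fsS "$BASE/api/v1/health/ready"
```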

RPO and RTO targets

Scenario	Default RPO	Target RTO
Daily dump, local restore	Up to 24 hours	30-60 min (DB-only loss) / 3-4 hours (total loss)
Daily dump + off-site sync	Up to 24 hours	3-4 hours (includes downloading the off-site dump)
WAL streaming (operator-configured)	~5 minutes	3-4 hours + WAL replay time

These are planning targets, not contractual SLAs. Define and validate your own RPO/RTO with quarterly restore drills.
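For restore drills, recording the measured recovery time against the target makes the comparison explicit. The following is a small arithmetic sketch; the function name `rto_check` is hypothetical, and the start and end values are Unix epoch seconds that you capture yourself.

```shell
# rto_check START_EPOCH END_EPOCH TARGET_MINUTES
# Print the elapsed time in whole minutes (rounded up) and return
# non-zero if it exceeds the target RTO.
rto_check() {
  local start="$1" end="$2" target_min="$3" elapsed_min
  elapsed_min=$(( (end - start + 59) / 60 ))
  echo "$elapsed_min"
  [ "$elapsed_min" -le "$target_min" ]
}

# Usage during a drill:
#   start=$(date +%s); ...run the restore procedure...; end=$(date +%s)
#   rto_check "$start" "$end" 60 || echo "drill exceeded the 60-minute target" >&2
```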

Next steps: High Availability - Valkey automatic failover and the Postgres standby topology.