
# Backup & Restore

A backup that’s never restored is not a backup. This page covers what to back up, where to ship it, how often, and — critically — how to verify the restore actually works before you need it in production.

| Metric | Target |
| --- | --- |
| RPO (data loss tolerance) | ≤ 1 hour — hourly Postgres dumps + WAL streaming |
| RTO (recovery time) | ≤ 30 minutes for DB restore + service restart from a clean VM, given the latest dump in hand |
| Backup retention | 30 daily + 12 monthly + 4 yearly snapshots (off-site) |
| Encryption at rest | All off-site dumps are AES-256 encrypted via the same `ECOMMUS_DATA_KEY` chain or a dedicated backup KEK |
| Off-site target | S3-compatible (Backblaze B2 recommended for €0 budget; AWS S3 or Cloudflare R2 also work) |

These are defaults for a single-tenant install. Multi-tenant SaaS deployments aiming at higher tiers (Enterprise) need stricter RPO (continuous WAL ship-out) — see ADR-031.

| Source | Where | How |
| --- | --- | --- |
| `ecommus_prod` Postgres database | Custom-format dump (`pg_dump -Fc`) | Hourly cron + WAL streaming |
| `ecommus_licenses` (license-server self-hosted) | Same | Hourly cron |
| Uploads (`STORAGE_LOCAL_DIR=/opt/ecommus/uploads`) | rsync diff + nightly tarball | Nightly cron |
| Keys (`/opt/ecommus/keys/*.pem`) | Encrypted tarball, separate target | Once on provision; never again unless rotated |
| `.env` | Encrypted, separate target | After every change |
| Caddyfile / Nginx vhosts | Same | After every change |
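
The uploads row can be sketched as a nightly script. This is a sketch, not shipped tooling: the `backup_uploads` name and the `/var/backups/ecommus-uploads` destination are illustrative; the `ecommus-backup` remote is the one the hourly script uses.

```shell
#!/bin/bash
set -euo pipefail

# Sketch of "rsync diff + nightly tarball". Function name and destination
# paths are illustrative; adjust to your install.
backup_uploads() {
  local src=$1 dest=$2
  # Incremental mirror: rsync only copies changed files.
  mkdir -p "$dest/mirror"
  rsync -a --delete "$src/" "$dest/mirror/"
  # Nightly tarball of the mirror, shipped off-site.
  local ts
  ts=$(date -u +%Y-%m-%d)
  tar -czf "$dest/uploads-$ts.tar.gz" -C "$dest/mirror" .
  rclone copy "$dest/uploads-$ts.tar.gz" "ecommus-backup:ecommus-prod/uploads/"
}

# Nightly cron would call e.g.:
#   backup_uploads /opt/ecommus/uploads /var/backups/ecommus-uploads
```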

Don’t back up dist/, node_modules/, apps/*/.next, apps/*/.astro, data/db/ (that’s the dev pglite path — production runs Postgres). They’re regenerable from git + npm ci.

Don’t back up the keys to the same target as the DB. Compromise of one bucket shouldn’t compromise the other.
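
The "separate target" rows can be sketched with a second rclone remote. Assumptions: the `ecommus-secrets` remote and bucket names are illustrative (any target distinct from the DB bucket works), and the passphrase file is the one the hourly backup script uses.

```shell
#!/bin/bash
set -euo pipefail

# Sketch: ship keys + .env as one encrypted tarball to a SEPARATE remote.
# "ecommus-secrets" is an illustrative remote name.
backup_secrets() {
  local src=${1:-/opt/ecommus} ts tarball
  ts=$(date -u +%Y-%m-%dT%H-%M-%SZ)
  tarball=/tmp/ecommus-secrets-$ts.tar.gz
  tar -czf "$tarball" -C "$src" keys .env
  gpg --batch --yes --symmetric --cipher-algo AES256 \
    --passphrase-file /etc/ecommus/backup-passphrase \
    "$tarball"
  rclone copy "$tarball.gpg" "ecommus-secrets:ecommus-prod-secrets/"
  rm -f "$tarball" "$tarball.gpg"   # leave neither plaintext nor ciphertext behind
}

# Once on provision, then after every key rotation / .env change:
#   backup_secrets /opt/ecommus
```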

`/etc/cron.d/ecommus-backup`

```
0 * * * * ecommus /opt/ecommus/scripts/backup.sh >> /var/log/ecommus-backup.log 2>&1
```
`/opt/ecommus/scripts/backup.sh`

```shell
#!/bin/bash
set -euo pipefail

ts=$(date -u +%Y-%m-%dT%H-%M-%SZ)
out=/var/backups/ecommus/$ts
mkdir -p "$out"

# Postgres dump (custom format = compressed + parallel-restore-friendly).
# Expects DATABASE_URL in the environment.
pg_dump -Fc -d "$DATABASE_URL" -f "$out/db.dump"

# License DB if self-hosting license-server
if [ -n "${LICENSE_DATABASE_URL:-}" ]; then
  pg_dump -Fc -d "$LICENSE_DATABASE_URL" -f "$out/licenses.dump"
fi

# Encrypt with the backup KEK (separate from ECOMMUS_DATA_KEY in production)
gpg --batch --yes --symmetric --cipher-algo AES256 \
  --passphrase-file /etc/ecommus/backup-passphrase \
  "$out/db.dump"
rm "$out/db.dump"

# `[ -f … ] && …` would trip `set -e` when the file is absent, so use `if`
if [ -f "$out/licenses.dump" ]; then
  gpg --batch --yes --symmetric --cipher-algo AES256 \
    --passphrase-file /etc/ecommus/backup-passphrase \
    "$out/licenses.dump"
  rm "$out/licenses.dump"
fi

# Off-site ship via rclone (configure once: `rclone config`)
rclone copy "$out" "ecommus-backup:ecommus-prod/$ts/" \
  --transfers 4 --checkers 8

# Local retention: keep last 24 hourly dumps locally
find /var/backups/ecommus -mindepth 1 -maxdepth 1 -type d -mtime +1 -exec rm -rf {} +
```

Off-site retention is enforced by the bucket lifecycle rule (rotate to cold storage after 30 days, delete after 1 year — adjust per your compliance requirement).
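
Before trusting a shipped dump, it is worth checking that it decrypts and parses at all. A sketch: `verify_dump` is an illustrative name, and the passphrase file is the one from the hourly script.

```shell
#!/bin/bash
set -euo pipefail

# Sketch: confirm an encrypted dump decrypts and pg_restore can read its
# table of contents, without doing a full restore.
verify_dump() {
  local encrypted=$1 plain
  plain=$(mktemp)
  gpg --batch --yes --passphrase-file /etc/ecommus/backup-passphrase \
    --output "$plain" --decrypt "$encrypted"
  # --list prints the archive TOC and fails loudly on a truncated or
  # corrupt dump.
  pg_restore --list "$plain" > /dev/null
  rm -f "$plain"
  echo "OK: $encrypted"
}

# e.g. after pulling the newest object off-site:
#   verify_dump /tmp/db.dump.gpg
```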

Configure Postgres archive_mode + archive_command to ship WAL segments to the same off-site bucket:

`/etc/postgresql/16/main/postgresql.conf`

```
wal_level = replica
archive_mode = on
archive_command = 'rclone copyto %p ecommus-backup:ecommus-prod/wal/%f'
archive_timeout = 300   # force a segment switch at least every 5 min
```

Together with the hourly `pg_dump`, this bounds worst-case data loss at roughly 5 minutes when cold-recovering from the bucket.
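
Whether archiving actually works can be smoke-tested right after enabling it: force a segment switch, then list what reached the bucket. A sketch; `smoke_test_wal_archiving` is an illustrative name.

```shell
#!/bin/bash
set -euo pipefail

# Sketch: force a WAL segment switch, give the archiver a moment, then
# confirm the segment landed off-site.
smoke_test_wal_archiving() {
  sudo -u postgres psql -c "SELECT pg_switch_wal();"
  sleep 5   # archive_command runs asynchronously
  rclone lsf ecommus-backup:ecommus-prod/wal/ | tail -n 3
}
```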

## Restore — the drill (run this once after provisioning, then monthly)


This is the procedure that validates the backup is actually restorable. Run it on a fresh VM, never the production one.

```shell
# 1. Provision a clean Postgres 16 instance, then create the drill DB
sudo -u postgres createdb ecommus_drill

# 2. Pull the most recent dump from off-site. Shell globs don't expand on
#    remote paths, so list + sort instead (skipping the wal/ prefix).
latest=$(rclone lsf ecommus-backup:ecommus-prod/ --dirs-only | grep -v '^wal/' | sort | tail -n 1)
rclone copy "ecommus-backup:ecommus-prod/${latest}db.dump.gpg" /tmp/

# 3. Decrypt
gpg --batch --yes --passphrase-file /etc/ecommus/backup-passphrase \
  --decrypt /tmp/db.dump.gpg > /tmp/db.dump

# 4. Restore
pg_restore --clean --if-exists --no-owner \
  --dbname=ecommus_drill /tmp/db.dump

# 5. Sanity-check
psql -d ecommus_drill -c "SELECT count(*) FROM tenants;"
psql -d ecommus_drill -c "SELECT count(*) FROM products;"
psql -d ecommus_drill -c "SELECT count(*) FROM orders WHERE created_at >= now() - interval '1 day';"

# 6. Boot a throwaway API against this DB to confirm the schema is alive
DATABASE_MODE=postgres \
DATABASE_URL=postgres://postgres@localhost/ecommus_drill \
JWT_ACCESS_SECRET=drill-only-not-real-not-real-not-real \
JWT_REFRESH_SECRET=drill-only-not-real-not-real-not-real \
ECOMMUS_LICENSE_JWT=<dev-license-from-license-server> \
node --experimental-strip-types apps/api/src/server.ts &
sleep 3
curl http://localhost:4000/health
kill %1
```

If step 5 row counts are non-zero and step 6’s /health returns ok:true, the backup is restorable.

Do this monthly. A backup you’ve never restored is Schrödinger’s backup — it both works and doesn’t until you check.
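
To keep the monthly cadence honest, the drill can be scheduled like the hourly backup. A sketch, assuming the steps above are wrapped in a hypothetical `/opt/ecommus/scripts/restore-drill.sh`:

```
# /etc/cron.d/ecommus-restore-drill: 04:00 UTC on the 1st of each month
0 4 1 * * ecommus /opt/ecommus/scripts/restore-drill.sh >> /var/log/ecommus-drill.log 2>&1
```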

In production, the shape is the same:

```shell
# 1. Stop all four services (prevent in-flight writes during restore)
pm2 stop all

# 2. Pull + decrypt the dump (latest hourly + WAL replay if needed)
rclone copy ecommus-backup:ecommus-prod/<ts>/db.dump.gpg /tmp/
gpg --decrypt --batch --passphrase-file /etc/ecommus/backup-passphrase \
  /tmp/db.dump.gpg > /tmp/db.dump

# 3. Restore. --clean drops + recreates objects; existing connections must be gone.
pg_restore --clean --if-exists --no-owner \
  --dbname="$DATABASE_URL" /tmp/db.dump

# 4. If WAL replay is needed (to recover past the last dump): WAL replays
#    onto a physical base backup (pg_basebackup), not onto this logical dump.
#    Restore the base backup, create recovery.signal, and point
#    restore_command at the rclone bucket.
#    See https://www.postgresql.org/docs/16/continuous-archiving.html

# 5. Restart
pm2 start all

# 6. Health
curl https://api.mystore.ro/health
```

Per the RTO target above, this should land in ≤ 30 minutes assuming the dump is already in hand and the VM is provisioned. If you need to rebuild the VM from scratch first, budget ≥ 2 hours.
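
Step 4's WAL replay amounts to a small config fragment. Note that WAL segments replay onto a physical base backup (`pg_basebackup`), not onto a `pg_restore`'d logical dump; a sketch assuming such a base backup and the Debian-default data directory:

```
# postgresql.conf: mirror of archive_command, in the opposite direction
restore_command = 'rclone copyto ecommus-backup:ecommus-prod/wal/%f %p'

# then create the signal file and start Postgres:
#   touch /var/lib/postgresql/16/main/recovery.signal
```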

The Postgres rows include columns encrypted via ECOMMUS_DATA_KEY envelope encryption (Phase 0 §1.6) — payment_methods.config, settings.value (ANAF tokens). These columns are encrypted on-disk and in the dump. Restoring on a VM with a different ECOMMUS_DATA_KEY will leave those columns unreadable.

If you’re rotating ECOMMUS_DATA_KEY (per ADR-028 — Phase 0):

  1. Restore the dump with the old ECOMMUS_DATA_KEY first.
  2. Run the re-key migration: node --experimental-strip-types apps/api/src/cli/rotate-data-key.ts --new-key=<new-hex>
  3. Update .env with the new key; restart the API.

Don’t lose ECOMMUS_DATA_KEY. There is no recovery from a lost master key — encrypted columns are unrecoverable, full stop. Back the key up to a separate target (e.g. password manager + sealed envelope in a safe).

| Target | Cost (~10 GB/mo) | EU region available | Notes |
| --- | --- | --- | --- |
| Backblaze B2 | ~€0.05 + egress | ✓ | Recommended for €0-cap installs. Compatible with rclone S3 driver. |
| Cloudflare R2 | ~€0.15 + free egress to Cloudflare | ✓ | Good if already on Cloudflare. |
| AWS S3 | ~€0.23 + egress | ✓ | Most expensive, most ubiquitous tooling. |
| Hetzner Storage Box | €3.50/mo flat | ✓ | Approved for ecommus per ADR (€5 spending cap, 2026-05-02). Generous. |

The framework only needs an rclone-compatible target. The bucket lifecycle / retention policy is configured at the provider, not in ecommus.
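
Creating the remote is a one-time step. A sketch for Backblaze B2 via rclone's native `b2` backend; the env var names are illustrative.

```shell
#!/bin/bash
set -euo pipefail

# One-time remote setup, sketched for Backblaze B2. B2_KEY_ID / B2_APP_KEY
# are illustrative env var names holding the application key credentials.
setup_backup_remote() {
  rclone config create ecommus-backup b2 \
    account "$B2_KEY_ID" key "$B2_APP_KEY"
  rclone mkdir ecommus-backup:ecommus-prod
}

# setup_backup_remote   # then `rclone lsd ecommus-backup:` to verify
```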

Anti-patterns to avoid:

  • DB dump on the same disk as the running DB. Disk failure takes both. The hourly cron above writes locally first, then `rclone copy` ships off-site; local files are deleted after 24 h.
  • Dumps without encryption. A leaked S3 bucket is a leaked customer database. Always GPG-encrypt.
  • No restore drill. “Backups exist” is not the same as “backups work”. Drill monthly.
  • Same passphrase as ECOMMUS_DATA_KEY for backup encryption. This breaks domain isolation; use a dedicated backup passphrase.
  • Backing up node_modules/ or dist/. Wastes bandwidth + storage. They’re rebuildable.