Architecture
This page describes how FreeSDN’s pieces fit together: what runs in each process, how a request travels through the stack, what makes writes safe by default, and how the background workers, agent, and storage tier interact. Read this before you start tuning deployment options or building integrations.
System overview
    Browser / API client
           │ HTTPS
           ▼
    ┌─────────────┐
    │    Caddy    │   Edge: automatic HTTPS, TLS termination, static SPA files
    └──────┬──────┘
           │ HTTP (internal)
           ▼
    ┌──────────────────────────────────────────────────────────────────┐
    │  FastAPI (gunicorn + uvicorn workers)                            │
    │                                                                  │
    │  Middleware (outermost → innermost):                             │
    │    RequestID + security headers → Request logging →              │
    │    Trailing-slash normalize → Body-size limit (1 MiB) →          │
    │    CSRF double-submit → Rate limiting (Valkey sliding window)    │
    │                                                                  │
    │  Core endpoints   /api/v1/auth  /api/v1/users  /api/v1/sites …   │
    │  Module routes    /api/v1/{module-id}/…  (filesystem-discovered) │
    │  Vendor "gateway" surface  /api/v1/gateway-*/…                   │
    │  WebSocket        /api/v1/ws  (real-time event stream)           │
    │                                                                  │
    │  Tenant context   org_id from JWT / API key → org scope          │
    │  RBAC             7-tier roles + per-user site grants            │
    └──────┬───────────────────────────────────┬───────────────────────┘
           │                                   │
           ▼                                   ▼
    ┌──────────────┐                   ┌───────────────┐
    │  Core +      │  Fabric           │  Adapter      │
    │  Modules     │  Negotiator ─────►│  Registry     │
    │  (10 modules)│                   │  (13 adapters)│
    └──────────────┘                   └───────┬───────┘
                                               │ vendor protocol
                                               ▼
                  Network devices / cameras / PBX / firewalls / hypervisors

    ┌────────────────────────────────────────────────────────────────┐
    │  Celery workers                                                │
    │  quick-worker - API-side tasks (device sync, firmware check)   │
    │  io-worker    - long I/O (backup, forensic export, scans)      │
    │  scheduler    - Celery beat (cron: SLA eval, DPI roll-ups)     │
    └────────────────────────────────────────────────────────────────┘

    ┌──────────────┐ ┌────────────────────────┐ ┌────────────────┐
    │  PostgreSQL  │ │  TimescaleDB (logdb)   │ │  Valkey 8.1    │
    │  19 (primary)│ │  metrics / events /    │ │  cache, broker,│
    │  schemas     │ │  heartbeats / TSDB     │ │  rate-limit,   │
    └──────────────┘ └────────────────────────┘ │  WS pubsub     │
                                                └────────────────┘

    freesdn-agent (desktop / headless daemon - optional, MIT)
      └─ WS ──► /api/v1/ws (command / heartbeat / scan-result channel)

Edge layer - Caddy
Caddy sits at the network boundary. It:
- Terminates TLS. Set CADDY_SITE_ADDRESS to control the mode:
  - :80 - plain HTTP (use behind an existing load balancer)
  - localhost - HTTPS with Caddy’s automatic internal CA
  - freesdn.example.com - HTTPS via Let’s Encrypt
- Serves the compiled React SPA from the frontend/dist/ directory.
- Proxies everything under /api/ and /ws to the FastAPI process.
- Publishes no internal data-tier ports on the host - PostgreSQL, TimescaleDB, and Valkey are reachable only on the Docker-internal network.
An nginx edge escape-hatch is available as a Compose profile for environments that require it, but Caddy is the default.
Application server - FastAPI
FastAPI runs under gunicorn with multiple uvicorn async workers. The application factory
(create_application()) builds the app, attaches the middleware stack, mounts all
routers, then runs the lifespan startup sequence.
Middleware stack
Middleware executes in the order below (outermost first). Every inbound request passes all layers before reaching endpoint logic.
| Layer | What it does |
|---|---|
| RequestID | Reads or generates X-Request-ID; prefixes client-supplied IDs with ext- to flag log poisoning. Injects security headers on every response: X-Content-Type-Options, X-Frame-Options: DENY, a strict Content-Security-Policy, Permissions-Policy, and HSTS when the connection is HTTPS. |
| Request logging | Structured start / complete log per request; adds X-Response-Time to responses. |
| Trailing-slash normalize | Rewrites the path internally so /sites and /sites/ both route correctly - no 307 redirect that would leak the internal proxy host in Location. |
| Body-size limit | Rejects requests where Content-Length exceeds 1 MiB (413). Guards against DoS via oversized stage payloads. |
| CSRF | Double-submit cookie check on all state-changing methods. GET, HEAD, OPTIONS, the public auth flow, and API-key-only requests (no cookie present) are exempt. When both a cookie and an X-API-Key / Bearer token are present, CSRF is still enforced. |
| Rate limiting | Valkey sorted-set sliding window. Per-user bucket keyed from the JWT sub claim (local HMAC verify, no DB round-trip), falling back to rl:ip:<ip>. Auth endpoints fail closed (503) when Valkey is unavailable; all other endpoints fail open. Adds X-RateLimit-Limit / X-RateLimit-Remaining headers. |
Default limits: 600 requests/minute per principal, burst 120/second. Auth endpoints are limited separately at 5/minute per IP.
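The sliding-window check in the rate-limiting layer can be sketched without Valkey. The class below is a minimal in-memory stand-in, assuming a sorted list of timestamps per bucket plays the role of the sorted set. The class name, function names, and the rl:user: prefix are illustrative assumptions; only the rl:ip:<ip> fallback key comes from the table above.

```python
import bisect
from collections import defaultdict
from typing import Optional


class SlidingWindowLimiter:
    """In-memory stand-in for a sorted-set sliding window (illustrative only)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(list)  # bucket key -> ascending timestamps

    def allow(self, key: str, now: float) -> tuple:
        """Return (allowed, remaining) for one request at time `now`.

        Assumes `now` never decreases for a given key, so the list stays sorted.
        """
        hits = self._hits[key]
        # Drop timestamps that slid out of the window
        # (the sorted-set equivalent is a remove-by-score-range).
        cutoff = now - self.window
        del hits[: bisect.bisect_right(hits, cutoff)]
        if len(hits) >= self.limit:
            return False, 0
        hits.append(now)
        return True, self.limit - len(hits)


def bucket_key(jwt_sub: Optional[str], client_ip: str) -> str:
    """Per-user bucket when a subject is known, else fall back to the IP."""
    # "rl:user:" is an assumed prefix; "rl:ip:" is the documented fallback.
    return f"rl:user:{jwt_sub}" if jwt_sub else f"rl:ip:{client_ip}"
```

With a shared Valkey sorted set the same logic works across workers, because every worker trims and counts the same bucket rather than a process-local list.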
Startup sequence
When the process starts, the lifespan hook wires subsystems in order. Each subsystem
reports into SUBSYSTEM_STATUS, which drives GET /health:
- Event bus connect + subscribe
- Adapter connection pool start
- Cross-pod WebSocket pubsub (Valkey channel; single_pod status when Valkey is absent)
- Module loader - discover_modules() scans the filesystem, loads all 10 modules, registers their routers at /api/v1/{module-id}/
- Automation engine start
- Fabric negotiator wire_and_start()
- Third-party plugin loader - loads each plugin, registers routes, starts per-org (respecting PluginOrganizationState.is_enabled)
- Background initial device sync + firmware check (not awaited - must not block boot)
- DPI built-in rule seeding
- In-process HLS session reaper (15-second loop - must run in the API process, not Celery)
Shutdown reverses this sequence: drain WebSocket sessions, unload modules and plugins, stop automation and Fabric, disconnect Valkey pubsub and the event bus.
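The ordered start and reversed stop can be sketched as an async context manager of the kind FastAPI accepts as a lifespan handler. This is a rough illustration only: the subsystem names, the start/stop helpers, and the EVENTS log are hypothetical placeholders, and only SUBSYSTEM_STATUS is a name taken from the text.

```python
from contextlib import asynccontextmanager

SUBSYSTEM_STATUS = {}  # subsystem name -> status string, read by the health check
EVENTS = []            # illustrative log so the ordering is observable

# Hypothetical names, abbreviated from the startup list above.
STARTUP_ORDER = ["event_bus", "adapter_pool", "ws_pubsub", "modules", "automation", "fabric"]


async def start(name: str) -> None:
    """Placeholder for a real subsystem start; records its status."""
    SUBSYSTEM_STATUS[name] = "ok"
    EVENTS.append(f"start:{name}")


async def stop(name: str) -> None:
    """Placeholder for a real subsystem stop."""
    SUBSYSTEM_STATUS[name] = "stopped"
    EVENTS.append(f"stop:{name}")


@asynccontextmanager
async def lifespan(app=None):
    started = []
    try:
        for name in STARTUP_ORDER:
            await start(name)
            started.append(name)
        yield
    finally:
        # Tear down only what actually started, in reverse order.
        for name in reversed(started):
            await stop(name)
```

Tracking the started list means a failure halfway through boot still stops the subsystems that came up before it.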
Request flow - authenticated API call
The sequence for a typical authenticated REST request (for example,
GET /api/v1/switches/ports?site_id=...):
- Caddy receives the HTTPS request, terminates TLS, proxies to FastAPI.
- Middleware stack runs in order: request ID assigned, security headers queued, body-size checked, CSRF skipped (GET), rate-limit token consumed.
- Dependency injection - FastAPI resolves get_current_active_user:
  - Tries Bearer header, then freesdn_access httpOnly cookie.
  - verify_token() validates JWT signature, expiry, aud=freesdn-api, iss=freesdn, and the jti revocation blacklist. After the user record is loaded from the database, get_current_user_optional() compares the JWT tv (token version) claim against user.token_version - a stale token minted before a password change or logout-all event is rejected at that point (the DB lookup is required to read token_version).
  - Builds a CurrentUser principal carrying user, permissions, accessible_site_ids, and the scoped flag (set when an API key with an explicit scope list is used).
- Permission check - the route dependency (e.g. require_permissions("network:read")) evaluates against the principal. For a scoped API key, super_admin implicit grants do not bypass the explicit scope ceiling.
- Tenant context - site_id from the query param is validated against the user’s accessible_site_ids + org membership. Service methods receive the org-scoped session; queries add WHERE organization_id = :org_id.
- Service / adapter - the service fetches data from PostgreSQL or calls the adapter to read live state from the controller.
- Response - serialized through Pydantic v2 schemas, returned as JSON. Secrets are stripped by redact_secrets before any adapter response reaches endpoint logic.
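Two of the checks in this flow, the token-version comparison and the scope ceiling on API keys, are easy to get subtly wrong, so here is a minimal sketch. The Principal dataclass and both function names are hypothetical stand-ins for the CurrentUser principal and its checks, and the sketch assumes super_admin implicitly holds every permission.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    """Hypothetical stand-in for the CurrentUser principal."""
    role: str
    permissions: set
    token_version: int
    scoped: bool = False
    key_scopes: set = field(default_factory=set)


def token_is_current(claim_tv: int, principal: Principal) -> bool:
    """A token minted before a password change carries an older tv claim."""
    return claim_tv == principal.token_version


def has_permission(principal: Principal, required: str) -> bool:
    """Evaluate one required permission against the principal."""
    granted = principal.role == "super_admin" or required in principal.permissions
    if principal.scoped:
        # A scoped API key is capped by its scope list, even for super_admin.
        return granted and required in principal.key_scopes
    return granted
```

The scope list acts as a ceiling rather than a grant: a scoped key needs both the owner's permission and the declared scope.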
Multi-tenancy
Reads and writes are org-scoped in the service layer. Isolation is enforced by the application; there is no PostgreSQL Row-Level Security.
For per-user scoping within an organization, FreeSDN uses hybrid site grants: a user
who has one or more UserSiteAccess rows becomes site-limited and can only see the sites
explicitly granted. A user with no grants sees all sites in the organization (backward
compatibility for small teams). Unknown access levels fail closed (deny).
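The hybrid grant rule can be written as one small function. This is a sketch under stated assumptions: the function name and the access level names (read, write, admin) are illustrative, and grants are modeled as a plain mapping built from the user's UserSiteAccess rows.

```python
def visible_site_ids(org_site_ids: set, grants: dict) -> set:
    """Return the sites a user may see.

    grants maps site_id -> access level, one entry per UserSiteAccess row.
    """
    known_levels = {"read", "write", "admin"}  # assumed level names
    if not grants:
        # No grants at all: the user sees every site in the organization.
        return set(org_site_ids)
    # Any grant makes the user site-limited; unknown levels are denied.
    return {
        site_id
        for site_id, level in grants.items()
        if level in known_levels and site_id in org_site_ids
    }
```

Note the asymmetry: a user whose only grant has an unknown level ends up with no sites at all, not with the all-sites default.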
Modules - filesystem-discovered
Modules are not hard-coded into the router. The module loader scans the filesystem, discovers all 10 module packages, and registers each module’s router at startup. This means:
- Module routes appear in the OpenAPI schema only after startup.
- Enabling or disabling a module for an organization is a runtime toggle; the routes are registered regardless, but the service layer enforces the per-org enable flag.
- The full module API surface is assembled at runtime rather than declared in one static router file.
Each module mounts at /api/v1/{module-id}/. The vendor adapter “gateway” surface
(Omada, OPNsense, pfSense, MikroTik, Proxmox, UniFi, OpenWrt) registers additional
routers at /api/v1/gateway-{area}/... (e.g. /api/v1/gateway-vpn/, /api/v1/gateway-opnsense-firewall/, /api/v1/gateway-mikrotik-routing/) plus /api/v1/unifi/... for the UniFi-specific surface.
Adapter registry
An Adapter is a typed vendor driver. All 13 adapters are auto-registered at startup and pooled. When API logic needs to talk to a device, it resolves the adapter for that controller from the registry, calls the normalized operation, and the adapter translates it into the vendor protocol (REST, SOAP/ONVIF, WebSocket JSON-RPC, AMI, CLI).
Every adapter response passes through redact_secrets - a ~120-key camelCase-aware
filter - before leaving the adapter layer. This strips credentials, PSKs, RADIUS
secrets, and similar sensitive values regardless of which adapter returned them.
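A camelCase-aware filter of this kind might look like the sketch below. It is not the real redact_secrets: the key set is a tiny illustrative subset of the roughly 120 keys mentioned above, the function names are hypothetical, and masking the value with a placeholder (rather than dropping the key) is an assumption.

```python
import re
from typing import Any

# A tiny illustrative subset; the real filter is described as ~120 keys.
SENSITIVE_KEYS = {"password", "psk", "radius_secret", "api_key"}
REDACTED = "***"


def normalize_key(key: str) -> str:
    """Convert camelCase to snake_case so radiusSecret matches radius_secret."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key).lower()


def redact(value: Any) -> Any:
    """Recursively mask values stored under sensitive keys (string keys assumed)."""
    if isinstance(value, dict):
        return {
            k: REDACTED if normalize_key(k) in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Normalizing the key before the lookup is what lets one key list cover vendors that return apiKey, api_key, or API_KEY style names.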
Staged-write safety pipeline
FreeSDN’s most important safety property: writes do not touch live devices by default.
The dual gate has two independent conditions that must both be true for a change to reach a controller:
- Both ADAPTER_READ_ONLY=false and OMADA_READ_ONLY=false (environment variables, both default true). The staging service uses OR logic: if either is true, all writes are staged. OMADA_READ_ONLY is a legacy per-Omada alias kept for clarity - both must be explicitly cleared for live writes to be dispatched.
- The apply call carries force=true in the request body.
If either condition is false, the change is accepted and staged, but never dispatched to the device. This means you can connect FreeSDN to a live production controller in read-only mode and explore its state without any risk of accidental changes.
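The dual gate reduces to a small truth table, sketched below. The function name and the way the environment strings are parsed are assumptions; the two variable names, their default of true, and the OR logic come from the description above.

```python
def should_dispatch(env: dict, force: bool) -> bool:
    """Return True only when both gates allow a live write."""

    def read_only(name: str) -> bool:
        # Both flags default to true; anything other than "false" stays read-only.
        return env.get(name, "true").strip().lower() != "false"

    # OR logic: either flag being true keeps every write staged.
    globally_read_only = read_only("ADAPTER_READ_ONLY") or read_only("OMADA_READ_ONLY")
    return force and not globally_read_only
```

Treating any unrecognized value as read-only keeps a typo in the environment from silently enabling live writes.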
The staged-write flow:
- Stage - operator authors a change via the UI or API. FreeSDN writes a PendingChange row to the database. The controller is not contacted.
- Review - the pending change is visible as a diff. A second authorized user (or the same user, depending on your workflow) can inspect it.
- Apply - an explicit POST /api/v1/gateway-vpn/changes/{change_id}/apply with {"force": true} pushes the change to the controller via the adapter. (The change row already carries the controller and site associations - the apply endpoint takes only change_id.)
- Discard - a POST .../changes/{change_id}/discard removes the staged change without applying it.
Permission model on apply
The apply endpoint resolves the required permission from the change.feature field
after fetching the row - one endpoint covers all feature domains:
| Feature prefix | Required permission |
|---|---|
| vpn.* | vpn:write |
| firewall.* / opnsense.* / pfsense.* | firewall:write |
| proxmox.* | hypervisor:write |
| mikrotik.* | network:write (controller:write for destructive subset) |
| unifi.* | network:write (controller:write for destructive subset) |
| system.* / monitoring.* | controller:write |
| (default) | network:write |
Catastrophic operations (VM destroy, node reboot/shutdown, snapshot rollback, backup
restore, firmware installs, factory reset, config restore) additionally require
has_min_role("site_admin") at both stage time and apply time. The stage-time gate
closes the “queue-poison” window where a lower-privileged user stages a destructive
change for a higher-privileged user to unknowingly apply.
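The prefix-to-permission lookup in the table above can be sketched as a first-match scan. The function name, the destructive flag, and the example feature names in the usage are hypothetical; which mikrotik.* and unifi.* features count as destructive is not specified here, so the caller passes that in.

```python
# (prefixes, permission) pairs in the order of the table above.
PREFIX_PERMISSIONS = [
    (("vpn.",), "vpn:write"),
    (("firewall.", "opnsense.", "pfsense."), "firewall:write"),
    (("proxmox.",), "hypervisor:write"),
    (("mikrotik.", "unifi."), "network:write"),
    (("system.", "monitoring."), "controller:write"),
]
# Prefixes whose destructive subset escalates to controller:write.
ESCALATING_PREFIXES = ("mikrotik.", "unifi.")


def required_permission(feature: str, destructive: bool = False) -> str:
    """Resolve the permission needed to apply a change with this feature."""
    if destructive and feature.startswith(ESCALATING_PREFIXES):
        return "controller:write"
    for prefixes, permission in PREFIX_PERMISSIONS:
        if feature.startswith(prefixes):
            return permission
    return "network:write"  # default row of the table
```

For example, required_permission("opnsense.alias") resolves to firewall:write, while an unrecognized prefix falls through to network:write.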
Staged-write key endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/gateway-{omada-area}/{controller_id}/sites/{site_id}/changes/{feature} | Stage a change (Omada areas: vpn, firewall, wifi, bulk, firmware, hotspot, profiles, routing, switch-advanced, system) |
| POST | /api/v1/gateway-{area}/{controller_id}/changes/{feature} | Stage a change (non-Omada: mikrotik-vpn, opnsense-vpn, opnsense-firewall, pfsense-vpn, pfsense-firewall, proxmox-firewall, openwrt-firewall, unifi-networks) |
| GET | /api/v1/gateway-{omada-area}/{controller_id}/sites/{site_id}/changes | List pending changes (Omada areas) |
| GET | /api/v1/gateway-{area}/{controller_id}/changes | List pending changes (non-Omada areas) |
| POST | /api/v1/gateway-vpn/changes/{change_id}/apply | Apply (push to device) |
| POST | /api/v1/gateway-vpn/changes/{change_id}/discard | Discard without applying |
| GET | /api/v1/gateway-vpn/changes/by-gateway/{gateway_id} | Fanout pending-changes view |
Fabric - universal app-interconnect
The Fabric is FreeSDN’s event-driven integration layer. It exposes a single tier-tagged
catalog at GET /api/v1/fabric/catalog that lists every operation and event across all
modules - native and plugin alike.
Operators author Connections: an inbound event (from any of the 7 event sources) triggers a step chain. Steps can invoke operations from any of the 6 modules that declare operations, send notifications, write log records, or call outbound webhooks.
The in-process Negotiator drives step execution. It uses Valkey SET-NX for at-most-once delivery under multi-worker fan-out, so the same event does not trigger a Connection twice when multiple API workers receive it.
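The SET-NX claim can be illustrated with a few lines of Python. This is a sketch, not the Negotiator's code: the key layout, the TTL, and the function name are assumptions, and FakeValkey is an in-memory stand-in that only mimics the NX behavior (it ignores the expiry) so the example runs without a server. A redis-py style client accepts the same set(key, value, nx=True, ex=...) call.

```python
def claim_event(client, connection_id: str, event_id: str, ttl_seconds: int = 300) -> bool:
    """Return True for exactly one caller per (connection, event) pair."""
    key = f"fabric:claim:{connection_id}:{event_id}"  # hypothetical key layout
    # SET key value NX EX ttl: succeeds only if the key does not exist yet.
    return bool(client.set(key, "1", nx=True, ex=ttl_seconds))


class FakeValkey:
    """In-memory stand-in that mimics SET ... NX for the example."""

    def __init__(self) -> None:
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None  # key already claimed by another worker
        self._data[key] = value
        return True
```

Only the worker whose claim returns True runs the step chain. If that worker dies midway the event is not retried, which is what makes the guarantee at-most-once rather than exactly-once.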
Key safety properties of the Fabric:
- Write steps are staged and require per-action sign-off.
- Inbound ingestion (POST /api/v1/fabric/ingest) requires an org-scoped API key.
- Outbound webhook targets are SSRF-validated: RFC 1918, CGNAT, loopback, and IPv4-mapped addresses are denied; redirects are not followed.
- The n8n community node (n8n-nodes-freesdn) integrates FreeSDN with n8n workflows using the same ingest/webhook surface.
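The address classes denied for outbound webhooks can be checked with Python's standard ipaddress module. The sketch below covers only the four categories named in the list above and assumes DNS resolution has already happened; the function name is hypothetical, and a production validator would likely deny further ranges.

```python
import ipaddress

CGNAT = ipaddress.ip_network("100.64.0.0/10")
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_denied_target(ip_text: str) -> bool:
    """Return True when a resolved IP falls in a denied category."""
    ip = ipaddress.ip_address(ip_text)
    if isinstance(ip, ipaddress.IPv6Address):
        # IPv4-mapped IPv6 (::ffff:a.b.c.d) is denied outright.
        if ip.ipv4_mapped is not None:
            return True
        return ip.is_loopback
    return ip.is_loopback or ip in CGNAT or any(ip in net for net in RFC1918)
```

The check has to run on the resolved address, not the hostname, and the request must then connect to that same address; otherwise a second DNS lookup could return a different target.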
WebSocket - real-time event stream
The WebSocket endpoint at /api/v1/ws provides a real-time event stream to browser
clients and the desktop agent.
Authentication uses the freesdn_access httpOnly cookie or an auth-message frame sent
within 10 seconds of connect. Query-string ?token= auth is deprecated (leaks tokens
into server logs) and logs a warning.
Server-to-client events are filtered:
- Org filter - drops events whose org_id does not match the connection’s organization. Fails closed in both directions: if either the receiver or the event has no org_id, the event is dropped.
- Site scope - for site-limited users, drops site-tagged events outside their UserSiteAccess grants.
- Payload sanitization - strips password, api_key, token, secret, refresh_token, and encryption_key fields before delivery.
- Session revalidation - every 5 minutes the server checks is_active, token_version, and deleted_at for each live connection. Revoked sessions receive a session_revoked message and are closed.
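The first three filters above compose into a single per-event decision. The sketch below is illustrative: the function name, the site_id event field, and the choice to strip only top-level fields are assumptions, while the org_id field and the list of stripped field names come from the text.

```python
from typing import Any, Optional

STRIPPED_FIELDS = {"password", "api_key", "token", "secret", "refresh_token", "encryption_key"}


def filter_event(
    event: dict,
    conn_org_id: Optional[str],
    allowed_site_ids: Optional[set],
) -> Optional[dict]:
    """Return the sanitized event, or None when it must be dropped."""
    event_org: Any = event.get("org_id")
    # Fail closed: a missing org on either side drops the event.
    if not conn_org_id or not event_org or event_org != conn_org_id:
        return None
    site_id = event.get("site_id")
    # allowed_site_ids is None for users who are not site-limited.
    if allowed_site_ids is not None and site_id is not None and site_id not in allowed_site_ids:
        return None
    # Strip sensitive top-level fields before delivery.
    return {k: v for k, v in event.items() if k not in STRIPPED_FIELDS}
```

Events without a site tag pass the site check here, since the rule above only drops site-tagged events outside the user's grants.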
Connection limits: 25 WebSocket connections per user, 5,000 globally, 200 event subscriptions per connection.
Cross-pod delivery - when multiple API replicas run, targeted send_to_user
publishes via a Valkey pubsub channel so any pod can deliver to a user regardless of
which pod holds the connection. No-op in single-pod deployments.
Celery workers and scheduler
Background work runs in dedicated worker processes. Two worker types and a beat scheduler are defined:
| Worker | Purpose |
|---|---|
| quick-worker | Short API-side tasks: initial device sync on startup, firmware availability checks, notification dispatch |
| io-worker | Long-running I/O: configuration backup jobs, forensic video export, large discovery scans, off-site DR transfers |
| scheduler | Celery beat - cron-driven: SLA evaluation, DPI metric roll-ups, stale-agent cleanup, backup pruning |
The scheduler (Celery beat) runs as a separate container to avoid clock-skew issues in multi-worker deployments. Workers use Valkey as both the broker and result backend.
Storage tier
PostgreSQL 19 (primary database)
The primary database holds all configuration and operational state across 19 schemas:
core, devices, events, enterprise, analytics, agents, vpn, network,
audit, ai, collector, cameras, firewall, voip, access, backup,
gateway, hypervisor, and fabric.
The fabric schema (migration 039) stores Fabric Connection definitions and their
per-firing audit runs (connection_runs).
SQLAlchemy 2.0 async with asyncpg. Connection pool: 20 connections + 30 overflow per worker process. Alembic manages schema migrations; the first boot runs them automatically.
TimescaleDB (logdb - time-series database)
A separate TimescaleDB instance holds all time-series data: SNMP trap events, syslog records, NetFlow samples, device heartbeats, SLA metrics, and camera event records. Continuous aggregates roll up metrics for dashboards without scanning raw tables.
LOGDB_URL is required in production and staging. The app refuses to boot without it.
Valkey 8.1 (cache, broker, rate-limit, pubsub)
Valkey is a drop-in Redis replacement. FreeSDN retains the redis:// URL scheme and
redis service name for compatibility.
Valkey serves four distinct roles:
| Role | Details |
|---|---|
| Celery broker + results | Task queue for quick-worker, io-worker, and scheduler |
| Rate-limit windows | Sorted-set sliding windows per-user and per-IP |
| WebSocket pubsub | Cross-pod targeted delivery channel |
| Session cache | JWT blacklist (jti revocation), auth rate-limit counters |
The high-availability configuration (docker-compose.ha.yml) runs one Valkey master,
one replica, and three Sentinels. Valkey failover is automatic via Sentinel promotion.
The API’s Redis client factory (app/core/redis_client.py) resolves the current master
on every connection so it follows Sentinel promotions without restart.
Agent - freesdn-agent
The freesdn-agent package (MIT license, v1.0.0, alpha) is an optional desktop
application and headless daemon that runs on Windows, Linux, and macOS (Python >= 3.11).
The agent connects to FreeSDN over WebSocket at /api/v1/ws and provides:
- 14 active discovery scanners - network topology, device fingerprinting, and service detection
- 5 passive listeners - monitor traffic and system events locally
- Capability advertisement - the agent reports what it can do; the platform issues commands via the WebSocket command set
- Cron scheduled scans - configurable scan schedules managed via /api/v1/agents/{agent_id}/schedules
- ECDSA-P256 signed auto-update - updates are signature-verified and fail closed (a bad signature blocks the update rather than applying it)
The agent is useful for reaching networks where the FreeSDN server has no direct layer-3 path to the devices - for example, a remote branch with NAT between the branch LAN and the FreeSDN host.
API surface
Browse the full API surface in the interactive OpenAPI docs at /api/v1/docs on a running non-production instance. Module, plugin, and vendor adapter routers are registered at runtime.
Key platform-level endpoint groups:
| Area | Base path | Notes |
|---|---|---|
| Auth | /api/v1/auth/ | Login, MFA, refresh, sessions, password management |
| SSO | /api/v1/auth/sso/ | OIDC (working), LDAP (working); SAML 501-gated |
| API keys | /api/v1/api-keys/ | Scoped keys, 50-key per-user ceiling |
| Users / orgs / sites | /api/v1/users/, /organizations/, /sites/ | Core admin |
| Controllers | /api/v1/controllers/ | Add/remove/sync controllers |
| Discovery | /api/v1/discovery/ | 4-phase scan pipeline; adopt discovered devices |
| Switches / APs | /api/v1/switches/, /access-points/ | Switch and access point management |
| VPN | /api/v1/vpn/ | VPN management and orchestration |
| Fabric | /api/v1/fabric/ | Catalog, connections, ingest, webhooks |
| Agents | /api/v1/agents/ | Register, heartbeat, tasks, schedules, releases |
| WebSocket | /api/v1/ws | Real-time event stream |
| Health | /health, /api/v1/health/ | Liveness, readiness, subsystem status |
Module-specific routes (cameras, VoIP, firewall, hypervisor, etc.) mount under their own prefixes and are registered at startup by the module loader.
Security cross-cuts
Several security mechanisms apply at the platform layer, not per-route:
- JWT validation - signature, expiry, aud, iss, and token-version claims are verified on every authenticated request. Role is read from the database, not the JWT claim, so a stolen JWT cannot convey a promoted role.
- CSRF - double-submit cookie on all mutations; API-key-only requests without a session cookie are exempt.
- Scoped API keys - a key with an explicit scope list marks the principal as scoped=True. Even a super_admin owner’s key cannot exceed the declared scope.
- Secret redaction - redact_secrets strips ~120 sensitive field names (camelCase-aware) from every adapter response before it leaves the adapter layer.
- SSRF - safe_http_request resolves DNS once, pins the IP, follows no redirects, and blocks RFC 1918 / CGNAT / loopback / IPv4-mapped targets.
- Rate-limit fail modes - auth endpoints fail closed (503) on Valkey outage; non-auth endpoints fail open to avoid service disruption from a Valkey blip.
See Security Model and Roles and Permissions for the full treatment.
Next steps
- Deployment Tiers - sizing the worker and database tier for Lite, Pro, Max, and HA
- Configuration - every environment variable
- Security Model - threat model, security controls, and what “application-layer isolation” means in practice
- Roles and Permissions - the 7-tier hierarchy and per-user site grants in detail
- Fabric - Connections, the catalog, and n8n integration
- Adapters Overview - per-adapter capability matrix and maturity tiers