Enterprise Overview
FreeSDN is a self-hosted, multi-tenant SDN controller. The enterprise layer is the operational plane that sits above individual modules: it lets you manage many organisations and sites from one installation, enforce SLAs, correlate events into incidents, template device configs across a site hierarchy, run bulk firmware upgrades with staged rollout, and produce tamper-evident audit trails - all without a vendor cloud in the loop.
This page maps what exists today. Read it before diving into the per-feature pages.
What the enterprise layer covers
| Area | Short description | Status |
|---|---|---|
| Multi-org / MSP | Org → Site Group → Site hierarchy; per-user site grants; org quotas | Available |
| SLA monitoring | Policy thresholds, breach detection, acknowledge, compliance summary | Available |
| SLA reports | On-demand PDF/CSV generation; scheduled report records | Partial; see Honest status below |
| Alert rules engine | Event-bus rule engine → Alerts → multi-channel notifications | Available |
| Event correlation | Pattern-match events into Incidents; lifecycle + assignment | Available |
| Notifications | Email, Slack, Teams, webhook, in-app, SMS, and WhatsApp | Available |
| Audit log + tamper evidence | HMAC-SHA256 hash chain; export; security events | Available |
| Config templates | Hierarchical templates (org → site_group → site → device_group) | Available |
| Device lifecycle FSM | Discovered → managed → decommissioned state model | Available |
| Device health scores | Composite health score with org/site/device rollup | Available |
| Reconciliation | Drift detection + auto-remediate loop; on-demand + scheduled | Available |
| Bulk operations | Reboot / push-config / firmware-update with staged rollout | Available |
| Topology map | L2/L3 graph per site or org; saved layouts; auto-layout algorithms | Available |
| Config version history | Snapshot, diff, rollback per device | Available |
| Streaming telemetry | Agent WebSocket present; no sub-second gRPC pipeline | Partial |
| Scheduled report delivery | Schedules persist in DB; no Celery runner executes them | Partial; see Honest status below |
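The Config templates row describes a hierarchy that must resolve to one config per device. A minimal sketch of that resolution as a layered deep merge follows. It assumes that more specific levels win (org, then site_group, then site, then device_group), that nested mappings merge key by key, and that secrets are masked by key name; render, redact, and the SECRET_KEYS set are hypothetical names, not FreeSDN's template API.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any

# Assumed precedence: each later (more specific) level overrides the earlier ones.
LEVELS = ("org", "site_group", "site", "device_group")

# Hypothetical key names treated as secrets when rendering for display.
SECRET_KEYS = frozenset({"password", "psk", "api_key"})


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts merge key by key, other values in override replace base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def render(templates: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Fold the templates from the least to the most specific level."""
    result: dict[str, Any] = {}
    for level in LEVELS:
        result = deep_merge(result, templates.get(level, {}))
    return result


def redact(config: dict[str, Any]) -> dict[str, Any]:
    """Copy of config with secret values masked, recursing into nested dicts."""
    return {
        key: redact(value) if isinstance(value, dict)
        else ("***" if key in SECRET_KEYS else value)
        for key, value in config.items()
    }


templates = {
    "org": {"ntp": {"server": "pool.ntp.org"}, "snmp": {"community": "public"}},
    "site": {"ntp": {"timezone": "Europe/Berlin"}, "wifi": {"ssid": "corp", "psk": "s3cret"}},
    "device_group": {"snmp": {"community": "ops-only"}},
}
config = render(templates)
```

Here the device group's SNMP community overrides the org default, while the site adds an NTP timezone without discarding the org's NTP server.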
Architecture: how these features relate

                        ┌────────────────────────────────────────────┐
                        │  FreeSDN enterprise layer                  │
                        │                                            │
    Orgs / Sites ──────►│  Alert Rules       ──► Alerts              │
    Config Templates    │  Event Correlation ──► Incidents           │
    Device Groups       │  SLA Policies      ──► Breaches            │
    Site Groups         │  Reconciliation    ──► Drift / Auto-fix    │
                        │  Bulk Operations   ──► Staged rollout      │
                        │  Health Scores     ──► Dashboard           │
                        │  Audit Log         ──► Hash chain + export │
                        │  Notifications     ──► 7 channels          │
                        └────────────┬───────────────────────────────┘
                                     │ event bus
                        ┌────────────▼───────────────────────────────┐
                        │  10 FreeSDN modules                        │
                        │  Network / Cameras / VoIP / Firewall …     │
                        └────────────────────────────────────────────┘

Every feature here is org-scoped: queries filter by user.organization_id at the service
layer. Many endpoints also enforce per-user site grants (UserSiteAccess) so operators can be
restricted to a subset of sites within their org.
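A minimal in-memory sketch of the two filters described above: tenant scoping by organization_id first, then narrowing by per-user site grants. In FreeSDN the filtering happens in database queries at the service layer; the Device and CurrentUser shapes and the convention that "no grant set" means org-wide access are assumptions for illustration, not the UserSiteAccess model.

```python
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Device:
    id: UUID
    organization_id: UUID
    site_id: UUID


@dataclass(frozen=True)
class CurrentUser:
    organization_id: UUID
    # Hypothetical convention: None means "no per-site restriction inside the org".
    granted_site_ids: frozenset[UUID] | None = None


def visible_devices(user: CurrentUser, devices: list[Device]) -> list[Device]:
    # Tenant filter first: rows from other orgs are never candidates.
    scoped = [d for d in devices if d.organization_id == user.organization_id]
    # Then narrow to the user's site grants, if the user has any.
    if user.granted_site_ids is not None:
        scoped = [d for d in scoped if d.site_id in user.granted_site_ids]
    return scoped


org_a, org_b = uuid4(), uuid4()
hq, branch, other = uuid4(), uuid4(), uuid4()
devices = [
    Device(uuid4(), org_a, hq),
    Device(uuid4(), org_a, branch),
    Device(uuid4(), org_b, other),
]
org_wide = CurrentUser(org_a)
branch_operator = CurrentUser(org_a, frozenset({branch}))
```

Applying the tenant filter before the site filter means a site grant can only ever narrow what a user sees inside their own org, never widen it across tenants.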
Multi-org and MSP model
FreeSDN uses an Org → Site Group → Site → Device hierarchy. A single installation can host
multiple organisations (tenants). Each org is isolated: every list, get, create, and update
endpoint verifies organization_id before returning data or persisting rows.
Per-user site grants let you give an operator access to exactly the sites they manage - useful
for MSP staff who each own a customer subset. The grant primitive lives in
app/core/site_access.py; the API lives under /api/v1/organizations/{org_id}/site-access.
Role ladder (7 tiers)
| Tier | Score | Typical use |
|---|---|---|
| super_admin | 100 | Platform-wide; can see all orgs |
| admin | 80 | Org-wide; all capabilities |
| org_admin | 60 | Org management; audit access |
| site_admin | 40 | Full control within assigned sites |
| operator | 20 | Day-to-day ops; no user management |
| viewer | 10 | Read-only |
| guest | 0 | Highly restricted |
Role assignment is strictly downward: you can only assign a role whose score is lower than your own, never one at or above your own level.
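The strict-lower-than rule reduces to a single comparison of ladder scores. This sketch hard-codes the scores from the table; can_assign and assignable_roles are hypothetical helper names, not FreeSDN functions.

```python
# Scores from the role ladder table above, highest first.
ROLE_SCORES = {
    "super_admin": 100,
    "admin": 80,
    "org_admin": 60,
    "site_admin": 40,
    "operator": 20,
    "viewer": 10,
    "guest": 0,
}


def can_assign(assigner_role: str, target_role: str) -> bool:
    """Strictly lower: the target role must score below the assigner's own role."""
    return ROLE_SCORES[target_role] < ROLE_SCORES[assigner_role]


def assignable_roles(assigner_role: str) -> list[str]:
    """Roles the assigner may hand out, highest first."""
    return [role for role in ROLE_SCORES if can_assign(assigner_role, role)]
```

Under this rule an org_admin can grant site_admin and below, and a super_admin can grant admin but not super_admin.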
Org quotas
Quotas are off by default (ENFORCE_ORG_QUOTAS=false), so self-hosted installs are unlimited.
If you run FreeSDN as a SaaS, you can enable quotas and assign tiers:
| Tier | Max users | Max sites | Max devices | Audit retention |
|---|---|---|---|---|
| FREE | 3 | 1 | 10 | 7 days |
| STARTER | 10 | 5 | 100 | 30 days |
| PROFESSIONAL | 50 | 20 | 500 | 90 days |
| ENTERPRISE | 500 | 100 | 5,000 | 365 days |
| UNLIMITED | unlimited | unlimited | unlimited | unlimited |
Quotas are enforced atomically (SELECT … FOR UPDATE) to prevent time-of-check/time-of-use (TOCTOU)
races when devices are adopted or members are added concurrently.
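The point of the row lock is that the limit check and the increment happen as one step. The sketch below mimics that in-process, with a threading.Lock standing in for the PostgreSQL row lock that SELECT … FOR UPDATE takes. The Org shape, the consume helper, and the in-memory counters are hypothetical; a real implementation holds the lock inside a database transaction, not in Python.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field

# Per-tier limits from the quota table; None means unlimited.
TIER_LIMITS = {
    "FREE": {"users": 3, "sites": 1, "devices": 10},
    "STARTER": {"users": 10, "sites": 5, "devices": 100},
    "PROFESSIONAL": {"users": 50, "sites": 20, "devices": 500},
    "ENTERPRISE": {"users": 500, "sites": 100, "devices": 5000},
    "UNLIMITED": {"users": None, "sites": None, "devices": None},
}


class QuotaExceeded(Exception):
    pass


@dataclass
class Org:
    tier: str
    counts: dict[str, int] = field(default_factory=lambda: {"users": 0, "sites": 0, "devices": 0})
    # Stand-in for the row lock that SELECT ... FOR UPDATE takes on the org row.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def consume(org: Org, resource: str, enforce: bool = False) -> None:
    """Check the limit and increment the count while holding one lock.

    enforce mirrors ENFORCE_ORG_QUOTAS, which is off by default.
    """
    with org.lock:
        limit = TIER_LIMITS[org.tier][resource]
        if enforce and limit is not None and org.counts[resource] >= limit:
            raise QuotaExceeded(f"{org.tier}: {resource} limit of {limit} reached")
        org.counts[resource] += 1


# 20 concurrent adoptions against a FREE org (10-device limit).
org = Org("FREE")
outcomes = []


def adopt() -> None:
    try:
        consume(org, "devices", enforce=True)
        outcomes.append(True)
    except QuotaExceeded:
        outcomes.append(False)


threads = [threading.Thread(target=adopt) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two callers could both read a count of 9, both pass the check, and leave the org at 11 devices.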
Permission model
Most enterprise endpoints use fine-grained permission scopes. The table below shows the scopes used across the enterprise layer:
| Scope | Used by |
|---|---|
| config:read / config:write | Templates, SLA policies, site/device groups, reconcile, bulk ops |
| device:read / device:write | Health, lifecycle, topology layouts |
| alert:read / alert:create / alert:update / alert:delete | Alert rules and alerts |
| event:read / event:write | Correlation rules and incidents |
| ORG_ADMIN or SUPER_ADMIN (role check) | Audit logs, notification providers, org management |
| SUPER_ADMIN only | Audit chain validation |
Audit endpoints use role checks rather than fine-grained scopes because they carry cross-tenant
visibility implications for super_admin operators.
API surface (summary)
The enterprise API spans several router prefixes. The table below lists prefixes and what lives under each. See the per-feature pages for full endpoint details.
| Prefix | What it contains |
|---|---|
| /api/v1/enterprise | Templates, site groups, device groups, device config, lifecycle, health, reconcile, bulk ops, config versions |
| /api/v1/sla | SLA policies, breaches, compliance summary, reports, report schedules |
| /api/v1/correlation | Correlation rules, incidents, incident events, manual trigger |
| /api/v1/alert-rules | Alert rules, alerts, acknowledge/resolve/suppress, evaluate |
| /api/v1/notifications | Notification providers, send, in-app notifications, preferences |
| /api/v1/audit | Audit logs, security events, activity/security summaries, export, chain validate |
| /api/v1/topology | Topology graph, saved layouts, auto-layout |
| /api/v1/organizations | Org CRUD, dashboard, site-access grants |
The interactive OpenAPI docs at /api/v1/docs cover the full platform surface. They are served in non-production environments when ENABLE_DOCS is true (the default) and are always disabled in production, regardless of ENABLE_DOCS.
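The precedence rule fits in a few lines. The environment argument and the literal "production" are assumptions about how FreeSDN names its environments, and docs_url is an illustrative helper, not FreeSDN code.

```python
from __future__ import annotations

DOCS_PATH = "/api/v1/docs"


def docs_url(environment: str, enable_docs: bool = True) -> str | None:
    """Return the docs path to serve, or None to disable the docs route."""
    if environment == "production":
        return None  # production wins over ENABLE_DOCS, unconditionally
    return DOCS_PATH if enable_docs else None
```

None is the value FastAPI's docs_url constructor argument takes to switch the Swagger UI route off; whether FreeSDN wires the setting exactly this way is an assumption.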
Honest status: what is wired and what is not
Wired and exercised
The following features have complete backend + frontend implementations exercised against real hardware or integration tests:
- Three-state device config model (desired / pushed / running)
- Device lifecycle FSM with event emission
- Config template hierarchy with deep-merge and secret redaction
- Device health scores (6-component composite, recomputed every 5 minutes by Celery)
- Reconciliation loop (drift detection, auto-remediate flag, scheduled every 5 minutes)
- Bulk operations with staged rollout and auto-rollback option
- Config version history with diff and rollback
- Event correlation into incidents
- SLA monitoring with per-metric thresholds, breach lifecycle, and compliance summary
- Alert rules engine with multi-channel notification dispatch
- Notification providers (SMTP, Slack, Teams, webhook, Twilio SMS/WhatsApp)
- Topology graph with saved layouts
- Tamper-evident audit log with HMAC-SHA256 hash chain
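Two items in the list above, the three-state config model and the reconciliation loop, combine into a simple per-device decision. This sketch assumes that drift means the running config differs from the desired one and that remediation re-pushes the desired config; the DeviceConfig shape, the outcome labels, and the push callback are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DeviceConfig:
    desired: dict[str, Any]   # what templates and operators ask for
    pushed: dict[str, Any]    # what FreeSDN last sent to the device
    running: dict[str, Any]   # what the device reports it is actually running


def reconcile(cfg: DeviceConfig, auto_remediate: bool,
              push: Callable[[dict[str, Any]], None]) -> str:
    """One reconciliation pass for a single device (hypothetical outcome labels)."""
    if cfg.running == cfg.desired:
        return "in_sync"
    if not auto_remediate:
        return "drift_detected"      # report only; leave the device alone
    push(cfg.desired)                # re-apply the desired config
    cfg.pushed = dict(cfg.desired)   # record what was sent
    return "remediated"


sent: list[dict[str, Any]] = []
cfg = DeviceConfig(desired={"vlan": 10}, pushed={"vlan": 10}, running={"vlan": 20})
```

The running state is deliberately not updated by the pass itself; it only changes when the device reports back, so the next scheduled run confirms whether remediation took effect.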
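Partial or unavailable
- SLA reports: on-demand PDF/CSV generation works, and report schedules can be saved as records, but they are not executed (see the next item).
- Scheduled report delivery: schedules persist in the database, but no Celery runner executes them, so saved schedules do not generate or deliver reports on their own.
- Streaming telemetry: the agent WebSocket is present, but there is no sub-second gRPC pipeline.
Security posture of the enterprise layer
These points are specific to the enterprise surface. For the platform-wide security model see Security overview.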
Cross-tenant IDOR hardening. Every endpoint that accepts a foreign UUID (scope_id, parent_id, site_id, assigned_to) verifies org ownership before inserting or returning data. Foreign probes return 404, not 403, to avoid leaking existence. This behavior is covered by the automated security regression suite.
Audit chain tamper evidence. Each AuditLogRecord carries prev_hash and
row_hmac = HMAC-SHA256(key, prev_hash || canonical_json(record)). Chain validation walks the
full history and reports the first broken link.
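A minimal sketch of building and validating such a chain with Python's hmac module. It assumes prev_hash holds the previous row's row_hmac, that the first row links to a fixed all-zero value, that "||" means byte concatenation, and that canonical_json is sorted-key compact JSON; FreeSDN's exact canonicalisation and field layout may differ.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from typing import Any

GENESIS = "0" * 64  # assumed prev_hash for the very first row


def canonical_json(record: dict[str, Any]) -> bytes:
    # Sorted keys and no insignificant whitespace give one byte string per record.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def compute_hmac(key: bytes, prev_hash: str, record: dict[str, Any]) -> str:
    # "||" read as byte concatenation of prev_hash and the canonical record.
    message = prev_hash.encode("ascii") + canonical_json(record)
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append(chain: list[dict[str, Any]], key: bytes, record: dict[str, Any]) -> None:
    prev = chain[-1]["row_hmac"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev, "row_hmac": compute_hmac(key, prev, record)})


def first_broken_link(chain: list[dict[str, Any]], key: bytes) -> int | None:
    """Index of the first row whose link or HMAC does not verify, or None if intact."""
    prev = GENESIS
    for i, row in enumerate(chain):
        expected = compute_hmac(key, prev, row["record"])
        if row["prev_hash"] != prev or not hmac.compare_digest(row["row_hmac"], expected):
            return i
        prev = row["row_hmac"]
    return None


key = b"audit-hmac-key"  # in FreeSDN this comes from AUDIT_HMAC_KEY (or SECRET_KEY)
chain: list[dict[str, Any]] = []
for action in ("login", "update_template", "bulk_reboot"):
    append(chain, key, {"action": action, "actor": "alice"})
```

Editing a record breaks that row's HMAC, and deleting a row breaks the next row's prev_hash link, so both kinds of tampering surface as a specific index.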
Notification provider config. Provider configs are capped at 256 KiB. Display names reject
CR/LF/control characters (header-injection defense). Responses return a redacted
config_summary, not raw credentials.
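The three provider-config checks above could look roughly like this. How FreeSDN measures the 256 KiB cap (raw request body or serialised JSON) and which fields its config_summary keeps are not stated here, so the byte count over json.dumps and the SUMMARY_SAFE_KEYS allowlist are assumptions.

```python
from __future__ import annotations

import json
import unicodedata
from typing import Any

MAX_CONFIG_BYTES = 256 * 1024  # 256 KiB

# Hypothetical allowlist of non-secret keys that may appear in a summary.
SUMMARY_SAFE_KEYS = frozenset({"host", "port", "from_address", "channel"})


def validate_display_name(name: str) -> str:
    # Unicode category "Cc" covers CR, LF, TAB, NUL and the other control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        raise ValueError("display name must not contain control characters")
    return name


def validate_config(config: dict[str, Any]) -> dict[str, Any]:
    size = len(json.dumps(config).encode("utf-8"))
    if size > MAX_CONFIG_BYTES:
        raise ValueError(f"provider config is {size} bytes; limit is {MAX_CONFIG_BYTES}")
    return config


def config_summary(config: dict[str, Any]) -> dict[str, Any]:
    # Allowlist rather than denylist: unknown keys are redacted by default.
    return {k: (v if k in SUMMARY_SAFE_KEYS else "<redacted>") for k, v in config.items()}
```

An allowlist fails closed: a credential field added to a provider later stays hidden until someone deliberately marks it safe.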
Alert evaluation cross-tenant fix. The manual evaluate endpoint previously accepted
organization_id from the request body (IDOR). It now always derives the org from the
authenticated user’s JWT. The required permission was raised from alert:create to
alert:update because evaluation consumes notification-channel quotas.
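The fix amounts to never reading the tenant from client input. In this sketch Principal stands for claims taken from a token that has already been verified; Principal, Forbidden, and resolve_evaluation_org are hypothetical names, and the permission string matches the alert:update scope above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Principal:
    """Identity built from an already-verified JWT (hypothetical shape)."""
    user_id: str
    organization_id: str
    permissions: frozenset[str]


class Forbidden(Exception):
    pass


def resolve_evaluation_org(principal: Principal, body: dict[str, Any]) -> str:
    """Pick the tenant for a manual evaluate call."""
    if "alert:update" not in principal.permissions:
        raise Forbidden("alert:update permission required")
    # body["organization_id"] is deliberately never read: the tenant always
    # comes from the token, so a forged body cannot target another org.
    return principal.organization_id
```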
Bulk operation site pre-check. Before queuing a bulk job, _resolve_bulk_target_site_ids
verifies has_site_permission for every target site. An empty resolved target set returns 400, so
the API refuses to queue a silent no-op job. A foreign site/device_group scope_id returns 404.
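A sketch of the pre-check described above, modeled loosely on _resolve_bulk_target_site_ids. The Scope shape, the permitted_site_ids set (standing in for has_site_permission), and the 403 for a denied site are assumptions; the 404 and 400 cases follow the text.

```python
from __future__ import annotations

from dataclasses import dataclass


class ApiError(Exception):
    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"{status}: {detail}")
        self.status = status


@dataclass(frozen=True)
class Scope:
    organization_id: str
    site_ids: frozenset[str]


def resolve_target_site_ids(org_id: str, scope_id: str, scopes: dict[str, Scope],
                            permitted_site_ids: set[str]) -> set[str]:
    scope = scopes.get(scope_id)
    # Missing and foreign scopes get the same 404, so probes learn nothing.
    if scope is None or scope.organization_id != org_id:
        raise ApiError(404, "scope not found")
    targets = set(scope.site_ids)
    if not targets:
        raise ApiError(400, "bulk operation has no target sites")
    denied = targets - permitted_site_ids
    if denied:
        # Assumed status: the text does not say which code a denied site returns.
        raise ApiError(403, f"no permission for {len(denied)} target site(s)")
    return targets


scopes = {
    "dg-core": Scope("org-a", frozenset({"s1", "s2"})),
    "dg-empty": Scope("org-a", frozenset()),
    "dg-foreign": Scope("org-b", frozenset({"s9"})),
}
```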
Infrastructure health redaction. GET /api/v1/enterprise/health/infrastructure redacts
framework version strings (FastAPI, Pydantic, PostgreSQL, Redis) from non-admin callers to
reduce CVE reconnaissance surface.
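One way to express that redaction. The nested payload shape, the component keys, and the is_admin flag (how "admin" is decided for this endpoint is not specified here) are assumptions; the version values in the sample payload are illustrative only.

```python
from __future__ import annotations

import copy
from typing import Any

# Assumed payload keys for the components named above.
VERSIONED_COMPONENTS = ("fastapi", "pydantic", "postgresql", "redis")


def redact_versions(payload: dict[str, Any], is_admin: bool) -> dict[str, Any]:
    """Drop version strings for non-admin callers; leave status fields intact."""
    if is_admin:
        return payload
    redacted = copy.deepcopy(payload)  # never mutate the original payload
    for name in VERSIONED_COMPONENTS:
        component = redacted.get(name)
        if isinstance(component, dict):
            component.pop("version", None)
    return redacted


health = {
    "postgresql": {"status": "ok", "version": "16.2"},
    "redis": {"status": "ok", "version": "7.2.4"},
    "fastapi": {"version": "0.110.0"},
}
```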
Celery beat schedule (enterprise tasks)
Celery beat (the scheduler container) enqueues these background tasks automatically on the intervals below:
| Task name | Interval | Queue | What it does |
|---|---|---|---|
| enterprise-reconcile-all | Every 5 min | sync | Drift detection + auto-remediate for all managed devices |
| enterprise-recompute-health | Every 5 min | metrics | Recompute health scores for all devices |
| enterprise-snapshot-daily-health | Nightly 01:00 UTC | default | Store daily health snapshot for trend history |
| alert-rules-evaluate-all | Every 3 min | default | Evaluate all enabled alert rules |
| alert-rules-auto-resolve | Every 10 min | default | Auto-resolve timed-out alerts |
| alert-rules-unsuppress-expired | Every 5 min | default | Lift expired alert suppressions |
| sla-evaluate-all | Every 5 min | metrics | Evaluate SLA policies, create/resolve breaches |
| correlation-scan-events | Every 5 min | default | Correlate new events into incidents |
| correlation-auto-resolve | Every 15 min | default | Auto-resolve stale incidents |
These tasks are registered in celery_app.py and only run when a worker is present on the
relevant queue. The Pro and Max deployment tiers include an io-worker container that handles
the sync and metrics queues.
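Because beat only enqueues work, it can help to check which scheduled tasks would sit unconsumed for a given set of worker queues. This sketch copies the task-to-queue mapping from the table; stranded_tasks is a hypothetical helper, and the "only default is consumed" case below is an illustrative configuration, not a statement about any particular tier.

```python
from __future__ import annotations

# Task name -> queue, copied from the beat schedule table above.
BEAT_TASKS = {
    "enterprise-reconcile-all": "sync",
    "enterprise-recompute-health": "metrics",
    "enterprise-snapshot-daily-health": "default",
    "alert-rules-evaluate-all": "default",
    "alert-rules-auto-resolve": "default",
    "alert-rules-unsuppress-expired": "default",
    "sla-evaluate-all": "metrics",
    "correlation-scan-events": "default",
    "correlation-auto-resolve": "default",
}


def stranded_tasks(consumed_queues: set[str]) -> list[str]:
    """Tasks beat will enqueue but no running worker will ever pick up."""
    return sorted(name for name, queue in BEAT_TASKS.items() if queue not in consumed_queues)


# Example: a deployment whose only worker consumes the default queue.
only_default = stranded_tasks({"default"})
```

In that example, reconciliation, health recomputation, and SLA evaluation would never run, because nothing consumes the sync and metrics queues.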
Deployment tiers and enterprise features
Enterprise-grade capabilities require the Pro or Max tier. Lite is for homelabs.
| Tier | Env file | Enterprise capabilities |
|---|---|---|
| Lite | .env.lite | Basic monitoring only; 1 worker |
| Pro | .env.pro | Full enterprise + io-worker + Flower monitoring |
| Max | .env.max | Pro + PgBouncer connection pooling + Valkey Sentinel HA + off-site DR |
Valkey Sentinel HA (Max tier) provides automatic cache/broker failover (~5 s detection, ~9 s promotion). Max also includes a PostgreSQL standby, but its failover is manual: automatic PostgreSQL failover is not available in this release, so an operator must promote the standby.
See Deployment overview for exact docker compose commands.
Relevant environment variables
| Variable | Default | Purpose |
|---|---|---|
| ENFORCE_ORG_QUOTAS | false | Enable SaaS-style org tier quotas |
| AUDIT_HMAC_KEY | (falls back to SECRET_KEY) | HMAC key for the audit chain; set it explicitly for key rotation |
| PUBLIC_BASE_URL | http://localhost:8000 | Externally reachable URL for this FreeSDN instance; override it with your production domain. Used in notification action URLs and agent WebSocket config. |
All variables use the bare name (no FREESDN_ prefix).
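A minimal sketch of resolving these three variables with their documented defaults, reading from a plain mapping so it can be tested without touching the process environment. EnterpriseSettings and load_settings are hypothetical names, and the parsing rules (which strings count as true, treating an empty AUDIT_HMAC_KEY as unset) are assumptions.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Assumed spelling of "true" values; FreeSDN's own parser may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class EnterpriseSettings:
    enforce_org_quotas: bool
    audit_hmac_key: str
    public_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> EnterpriseSettings:
    """Read the bare (un-prefixed) variable names with the documented defaults."""
    return EnterpriseSettings(
        enforce_org_quotas=env.get("ENFORCE_ORG_QUOTAS", "false").strip().lower() in _TRUTHY,
        # Unset or empty AUDIT_HMAC_KEY falls back to SECRET_KEY.
        audit_hmac_key=env.get("AUDIT_HMAC_KEY") or env.get("SECRET_KEY", ""),
        public_base_url=env.get("PUBLIC_BASE_URL", "http://localhost:8000"),
    )
```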
Next steps
- SLA monitoring - define policies, review breaches, generate reports
- Alert rules - create rules, manage the alert lifecycle, configure notification routing
- Event correlation - pattern-match events into incidents, assign and resolve
- Notifications - configure SMTP, Slack, Teams, webhook, and SMS providers
- Config templates - author hierarchical device config templates with secret handling
- Device lifecycle - move devices through the FSM from discovery to decommission
- Health dashboard - read composite health scores, site ranking, and trend history
- Reconciliation - understand drift detection and trigger on-demand reconciliation
- Bulk operations - run staged firmware upgrades and config pushes across device groups
- Audit log - query the hash-chain audit trail, export, and validate chain integrity
- Topology - view and persist L2/L3 network maps
- Organizations - manage tenants, per-user site grants, and org quotas