Enterprise Overview

FreeSDN is a self-hosted, multi-tenant SDN controller. The enterprise layer is the operational plane that sits above individual modules: it lets you manage many organisations and sites from one installation, enforce SLAs, correlate events into incidents, template device configs across a site hierarchy, run bulk firmware upgrades with staged rollout, and produce tamper-evident audit trails - all without a vendor cloud in the loop.

This page maps what exists today. Read it before diving into the per-feature pages.

What the enterprise layer covers

Area	Short description	Status
Multi-org / MSP	Org → Site Group → Site hierarchy; per-user site grants; org quotas	Available
SLA monitoring	Policy thresholds, breach detection, acknowledge, compliance summary	Available
SLA reports	On-demand PDF/CSV generation; scheduled report records	Partial - see caveat
Alert rules engine	Event-bus rule engine → Alerts → multi-channel notifications	Available
Event correlation	Pattern-match events into Incidents; lifecycle + assignment	Available
Notifications	Email, Slack, Teams, webhook, in-app, SMS, and WhatsApp	Available
Audit log + tamper evidence	HMAC-SHA256 hash chain; export; security events	Available
Config templates	Hierarchical templates (org → site_group → site → device_group)	Available
Device lifecycle FSM	Discovered → managed → decommissioned state model	Available
Device health scores	Composite health score with org/site/device rollup	Available
Reconciliation	Drift detection + auto-remediate loop; on-demand + scheduled	Available
Bulk operations	Reboot / push-config / firmware-update with staged rollout	Available
Topology map	L2/L3 graph per site or org; saved layouts; auto-layout algorithms	Available
Config version history	Snapshot, diff, rollback per device	Available
Streaming telemetry	Agent WebSocket present; no sub-second gRPC pipeline	Partial
Scheduled report delivery	Schedules persist in DB; no Celery runner executes them	Partial - see caveat

Architecture: how these features relate

                       ┌─────────────────────────────────────────┐
                       │          FreeSDN enterprise layer        │
                       │                                          │
  Orgs / Sites ───────►│  Alert Rules ──► Alerts                  │
  Config Templates     │  Event Correlation ──► Incidents         │
  Device Groups        │  SLA Policies ──► Breaches               │
  Site Groups          │  Reconciliation ──► Drift / Auto-fix      │
                       │  Bulk Operations ──► Staged rollout       │
                       │  Health Scores ──► Dashboard              │
                       │  Audit Log ──► Hash chain + export        │
                       │  Notifications ──► 7 channels             │
                       └────────────┬────────────────────────────┘
                                    │ event bus
                       ┌────────────▼────────────────────────────┐
                       │        10 FreeSDN modules                 │
                       │  Network / Cameras / VoIP / Firewall …   │
                       └─────────────────────────────────────────┘

Every feature here is org-scoped: queries filter by user.organization_id at the service layer. Many endpoints also enforce per-user site grants (UserSiteAccess) so operators can be restricted to a subset of sites within their org.
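
The two filters described above can be pictured as a minimal in-memory sketch. The `User` and `Device` classes and the `visible_devices` helper are hypothetical stand-ins, not FreeSDN code; the real checks run against the database at the service layer, and any role-based bypass of site grants is ignored here.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class User:
    organization_id: UUID
    # Site IDs granted to this user; stands in for UserSiteAccess rows.
    granted_site_ids: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Device:
    id: UUID
    organization_id: UUID
    site_id: UUID


def visible_devices(user: User, devices: list[Device]) -> list[Device]:
    """Apply the org filter first, then the per-user site grant filter."""
    return [
        d for d in devices
        if d.organization_id == user.organization_id
        and d.site_id in user.granted_site_ids
    ]
```

A device in another org is dropped even if its site ID happens to appear in the user's grants, because the org check runs first.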

Multi-org and MSP model

FreeSDN uses an Org → Site Group → Site → Device hierarchy. A single installation can host multiple organisations (tenants). Each org is isolated: every list, get, create, and update endpoint verifies organization_id before returning data or persisting rows.

Per-user site grants let you give an operator access to exactly the sites they manage - useful for MSP staff who each own a customer subset. The grant primitive lives in app/core/site_access.py; the API lives under /api/v1/organizations/{org_id}/site-access.

Role ladder (7 tiers)

Tier	Score	Typical use
`super_admin`	100	Platform-wide; can see all orgs
`admin`	80	All capabilities, org-wide
`org_admin`	60	Org management; audit access
`site_admin`	40	Full control within assigned sites
`operator`	20	Day-to-day ops; no user management
`viewer`	10	Read-only
`guest`	0	Highly restricted

Role assignment is strictly downward: you can assign only roles below your own level, never a role at or above it.
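
As a minimal sketch, the assignment rule reduces to a comparison of the scores in the table above. The `ROLE_SCORES` mapping and the `can_assign_role` helper are illustrative names, not the FreeSDN implementation.

```python
# Scores copied from the role ladder table.
ROLE_SCORES = {
    "super_admin": 100,
    "admin": 80,
    "org_admin": 60,
    "site_admin": 40,
    "operator": 20,
    "viewer": 10,
    "guest": 0,
}


def can_assign_role(actor_role: str, target_role: str) -> bool:
    """Allow assignment only when the target role scores strictly lower."""
    return ROLE_SCORES[target_role] < ROLE_SCORES[actor_role]
```

Read literally, the rule means an `admin` can assign `org_admin` but not `admin`, and a `guest` can assign nothing.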

Org quotas

Quotas are off by default (ENFORCE_ORG_QUOTAS=false). Self-hosted installs are unlimited. If you run FreeSDN as a SaaS you can enable quotas and assign tiers:

Tier	Max users	Max sites	Max devices	Audit retention
FREE	3	1	10	7 days
STARTER	10	5	100	30 days
PROFESSIONAL	50	20	500	90 days
ENTERPRISE	500	100	5,000	365 days
UNLIMITED	unlimited	unlimited	unlimited	unlimited

Quotas are enforced atomically (SELECT … FOR UPDATE) to prevent TOCTOU races on concurrent device adoption or member additions.
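
The limit arithmetic behind the tier table can be sketched as follows. This shows only the comparison, not the row locking: `TIER_LIMITS` and `within_quota` are hypothetical names, and in a real deployment the current count would be read inside the same transaction that holds the `SELECT … FOR UPDATE` lock.

```python
from typing import Optional

# Limits copied from the tier table; None means unlimited.
TIER_LIMITS: dict[str, dict[str, Optional[int]]] = {
    "FREE": {"users": 3, "sites": 1, "devices": 10},
    "STARTER": {"users": 10, "sites": 5, "devices": 100},
    "PROFESSIONAL": {"users": 50, "sites": 20, "devices": 500},
    "ENTERPRISE": {"users": 500, "sites": 100, "devices": 5000},
    "UNLIMITED": {"users": None, "sites": None, "devices": None},
}


def within_quota(tier: str, resource: str, current_count: int,
                 adding: int = 1, enforce: bool = True) -> bool:
    """Return True if adding `adding` items keeps the org within its limit."""
    if not enforce:  # mirrors ENFORCE_ORG_QUOTAS=false
        return True
    limit = TIER_LIMITS[tier][resource]
    return limit is None or current_count + adding <= limit
```

With these numbers, a FREE org holding 9 devices may adopt one more, and the next adoption is refused.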

Permission model

Most enterprise endpoints use fine-grained permission scopes. The table below shows the scopes used across the enterprise layer:

Scope	Used by
`config:read` / `config:write`	Templates, SLA policies, site/device groups, reconcile, bulk ops
`device:read` / `device:write`	Health, lifecycle, topology layouts
`alert:read` / `alert:create` / `alert:update` / `alert:delete`	Alert rules and alerts
`event:read` / `event:write`	Correlation rules and incidents
ORG_ADMIN or SUPER_ADMIN (role check)	Audit logs, notification providers, org management
SUPER_ADMIN only	Audit chain validation

Audit endpoints use role checks rather than fine-grained scopes because they carry cross-tenant visibility implications for super_admin operators.

API surface (summary)

The enterprise API spans several router prefixes. The table below lists prefixes and what lives under each. See the per-feature pages for full endpoint details.

Prefix	What it contains
`/api/v1/enterprise`	Templates, site groups, device groups, device config, lifecycle, health, reconcile, bulk ops, config versions
`/api/v1/sla`	SLA policies, breaches, compliance summary, reports, report schedules
`/api/v1/correlation`	Correlation rules, incidents, incident events, manual trigger
`/api/v1/alert-rules`	Alert rules, alerts, acknowledge/resolve/suppress, evaluate
`/api/v1/notifications`	Notification providers, send, in-app notifications, preferences
`/api/v1/audit`	Audit logs, security events, activity/security summaries, export, chain validate
`/api/v1/topology`	Topology graph, saved layouts, auto-layout
`/api/v1/organizations`	Org CRUD, dashboard, site-access grants

The interactive OpenAPI docs at /api/v1/docs cover the full platform surface. They are served in non-production environments when ENABLE_DOCS is true (the default) and are always disabled in production, regardless of ENABLE_DOCS.
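
The gating rule can be written as a two-line decision. This is a sketch only: the `docs_enabled` helper and the `"production"` environment label are assumptions, since this page does not say how FreeSDN detects its environment.

```python
def docs_enabled(environment: str, enable_docs: bool = True) -> bool:
    """Production always wins; elsewhere ENABLE_DOCS decides."""
    if environment == "production":  # assumed environment label
        return False
    return enable_docs
```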

Honest status: what is wired and what is not

Wired and exercised

The following features have complete backend and frontend implementations, exercised on real hardware or in integration tests:

Three-state device config model (desired / pushed / running)
Device lifecycle FSM with event emission
Config template hierarchy with deep-merge and secret redaction
Device health scores (6-component composite, recomputed every 5 minutes by Celery)
Reconciliation loop (drift detection, auto-remediate flag, scheduled every 5 minutes)
Bulk operations with staged rollout and auto-rollback option
Config version history with diff and rollback
Event correlation into incidents
SLA monitoring with per-metric thresholds, breach lifecycle, and compliance summary
Alert rules engine with multi-channel notification dispatch
Notification providers (SMTP, Slack, Teams, webhook, Twilio SMS/WhatsApp)
Topology graph with saved layouts
Tamper-evident audit log with HMAC-SHA256 hash chain
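
To illustrate the deep-merge and secret redaction mentioned for config templates, here is a minimal sketch. It assumes that more specific layers override less specific ones (org first, device_group last) and that secrets are identified by key name; `SECRET_KEYS`, `deep_merge`, `resolve`, and `redact` are hypothetical and may differ from what FreeSDN does.

```python
from typing import Any

# Hypothetical set of key names treated as secrets.
SECRET_KEYS = {"password", "psk", "secret", "token"}


def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into a copy of base, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers: list[dict]) -> dict:
    """Fold layers in order: org, site_group, site, device_group."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


def redact(config: Any) -> Any:
    """Return a copy with secret values masked, for display or API output."""
    if isinstance(config, dict):
        return {k: "***" if k in SECRET_KEYS else redact(v)
                for k, v in config.items()}
    if isinstance(config, list):
        return [redact(v) for v in config]
    return config
```

A site layer that sets only `wifi.ssid` keeps the org-level `wifi.psk`, and the redacted view masks that value.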

Partial or unavailable

SLA reports: on-demand PDF/CSV generation works and report schedules can be saved, but scheduled delivery does not run (see the next item)
Scheduled report delivery: schedules persist in the database, but no Celery runner executes them
Streaming telemetry: the agent WebSocket is present, but there is no sub-second gRPC pipeline
Automatic PostgreSQL failover: not available in this release; the Max-tier standby must be promoted manually

Security posture of the enterprise layer

These points are specific to the enterprise surface. For the platform-wide security model see Security overview.

Cross-tenant IDOR hardening. Every endpoint that accepts a foreign UUID (scope_id, parent_id, site_id, assigned_to) verifies org ownership before inserting or returning data. Foreign probes return 404, not 403, to avoid leaking existence. This behavior is covered by the automated security regression suite.
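
The 404-not-403 behavior can be sketched as a lookup that treats "missing" and "belongs to another org" identically. The `NotFound` exception and the `get_owned` helper are illustrative; the actual endpoints presumably raise their framework's HTTP error.

```python
class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""
    status_code = 404


def get_owned(rows: dict, row_id: str, organization_id: str) -> dict:
    """Return the row only if it exists and belongs to the caller's org."""
    row = rows.get(row_id)
    if row is None or row["organization_id"] != organization_id:
        # Same error for both cases, so a probe cannot learn that the ID exists.
        raise NotFound("resource not found")
    return row
```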

Audit chain tamper evidence. Each AuditLogRecord carries prev_hash and row_hmac = HMAC-SHA256(key, prev_hash || canonical_json(record)). Chain validation walks the full history and reports the first broken link.
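
The chain formula can be sketched with the Python standard library. Several details here are assumptions rather than documented behavior: that `prev_hash` is the previous row's `row_hmac`, that the first row uses an all-zero genesis value, and that `canonical_json` means sorted keys with compact separators.

```python
import hashlib
import hmac
import json
from typing import Optional

GENESIS = "0" * 64  # assumed prev_hash for the first record


def canonical_json(record: dict) -> bytes:
    """Serialise deterministically: sorted keys, no extra whitespace."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def row_hmac(key: bytes, prev_hash: str, record: dict) -> str:
    """HMAC-SHA256(key, prev_hash || canonical_json(record)) as hex."""
    message = prev_hash.encode("ascii") + canonical_json(record)
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append(chain: list, key: bytes, record: dict) -> None:
    prev = chain[-1]["row_hmac"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev,
                  "row_hmac": row_hmac(key, prev, record)})


def first_broken_link(chain: list, key: bytes) -> Optional[int]:
    """Walk the full history; return the index of the first bad row, or None."""
    prev = GENESIS
    for index, row in enumerate(chain):
        expected = row_hmac(key, row["prev_hash"], row["record"])
        if row["prev_hash"] != prev or not hmac.compare_digest(expected, row["row_hmac"]):
            return index
        prev = row["row_hmac"]
    return None
```

Editing any stored record changes its expected HMAC, so validation stops at that row. Deleting a row breaks the `prev_hash` link of the row after it.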

Notification provider config. Provider configs are capped at 256 KiB. Display names reject CR/LF/control characters (header-injection defense). Responses return a redacted config_summary, not raw credentials.
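
The two input checks can be sketched as below. How FreeSDN measures config size is not stated here; the sketch assumes the UTF-8 length of the JSON-serialised config, and both function names are hypothetical.

```python
import json
import unicodedata

MAX_CONFIG_BYTES = 256 * 1024  # 256 KiB cap from the text above


def validate_display_name(name: str) -> str:
    """Reject CR, LF, and other control characters (Unicode category Cc)."""
    for ch in name:
        if unicodedata.category(ch) == "Cc":
            raise ValueError("display name contains a control character")
    return name


def validate_config_size(config: dict) -> None:
    """Reject configs whose serialised form exceeds the cap (assumed measure)."""
    size = len(json.dumps(config).encode("utf-8"))
    if size > MAX_CONFIG_BYTES:
        raise ValueError(f"provider config is {size} bytes; limit is {MAX_CONFIG_BYTES}")
```

Rejecting CR/LF matters because a display name may end up in an email header, where a line break would let a caller inject extra headers.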

Alert evaluation cross-tenant fix. The manual evaluate endpoint previously accepted organization_id from the request body (IDOR). It now always derives the org from the authenticated user’s JWT. The required permission was raised from alert:create to alert:update because evaluation consumes notification-channel quotas.

Bulk operation site pre-check. Before queuing a bulk job, _resolve_bulk_target_site_ids verifies has_site_permission for every target site. An empty resolved target set returns 400 rather than queuing a silent no-op job. A foreign site or device_group scope_id returns 404.

Infrastructure health redaction. GET /api/v1/enterprise/health/infrastructure redacts component version strings (FastAPI, Pydantic, PostgreSQL, Redis) from non-admin callers to reduce CVE reconnaissance surface.
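
One way to picture the redaction is a recursive filter that drops version fields for non-admin callers. The payload shape, the `version` key name, and the `redact_versions` helper are all assumptions for illustration; the real response schema is not documented on this page.

```python
from typing import Any


def redact_versions(payload: Any, is_admin: bool) -> Any:
    """Return the payload unchanged for admins; strip 'version' keys otherwise."""
    if is_admin:
        return payload
    if isinstance(payload, dict):
        return {k: redact_versions(v, False)
                for k, v in payload.items() if k != "version"}
    if isinstance(payload, list):
        return [redact_versions(v, False) for v in payload]
    return payload
```

Non-admin callers still see component status, but nothing that pins an exact release.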

Celery beat schedule (enterprise tasks)

These background tasks run automatically when the scheduler container (Celery beat) is up:

Task name	Interval	Queue	What it does
`enterprise-reconcile-all`	Every 5 min	sync	Drift detection + auto-remediate for all managed devices
`enterprise-recompute-health`	Every 5 min	metrics	Recompute health scores for all devices
`enterprise-snapshot-daily-health`	Nightly 01:00 UTC	default	Store daily health snapshot for trend history
`alert-rules-evaluate-all`	Every 3 min	default	Evaluate all enabled alert rules
`alert-rules-auto-resolve`	Every 10 min	default	Auto-resolve timed-out alerts
`alert-rules-unsuppress-expired`	Every 5 min	default	Lift expired alert suppressions
`sla-evaluate-all`	Every 5 min	metrics	Evaluate SLA policies, create/resolve breaches
`correlation-scan-events`	Every 5 min	default	Correlate new events into incidents
`correlation-auto-resolve`	Every 15 min	default	Auto-resolve stale incidents

These tasks are registered in celery_app.py and only run when a worker is present on the relevant queue. The Pro and Max deployment tiers include an io-worker container that handles the sync and metrics queues.
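
Because a task runs only when a worker consumes its queue, it helps to check which scheduled tasks would sit idle for a given set of workers. The sketch below mirrors the table above as plain data. It is not the contents of celery_app.py, and `tasks_without_worker` is a hypothetical helper.

```python
from typing import Optional

# Schedule name -> (interval in minutes, queue).
# None marks the nightly 01:00 UTC task, which is not interval-based.
ENTERPRISE_TASKS: dict[str, tuple[Optional[int], str]] = {
    "enterprise-reconcile-all": (5, "sync"),
    "enterprise-recompute-health": (5, "metrics"),
    "enterprise-snapshot-daily-health": (None, "default"),
    "alert-rules-evaluate-all": (3, "default"),
    "alert-rules-auto-resolve": (10, "default"),
    "alert-rules-unsuppress-expired": (5, "default"),
    "sla-evaluate-all": (5, "metrics"),
    "correlation-scan-events": (5, "default"),
    "correlation-auto-resolve": (15, "default"),
}


def tasks_without_worker(active_queues: set[str]) -> list[str]:
    """List scheduled tasks whose queue has no worker consuming it."""
    return sorted(
        name for name, (_, queue) in ENTERPRISE_TASKS.items()
        if queue not in active_queues
    )
```

If only the `default` queue has a worker, reconciliation, health recomputation, and SLA evaluation are scheduled but never executed.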

Deployment tiers and enterprise features

Enterprise-grade capabilities require the Pro or Max tier. Lite is for homelabs.

Tier	Env file	Enterprise capabilities
Lite	`.env.lite`	Basic monitoring only; 1 worker
Pro	`.env.pro`	Full enterprise + io-worker + Flower monitoring
Max	`.env.max`	Pro + PgBouncer connection pooling + Valkey Sentinel HA + off-site DR

Valkey Sentinel HA (Max tier) provides automatic cache/broker failover (~5 s detection, ~9 s promotion). A PostgreSQL standby is available in Max, but automatic PostgreSQL failover is not available in this release: operators must promote the standby manually.

See Deployment overview for exact docker compose commands.

Relevant environment variables

Variable	Default	Purpose
`ENFORCE_ORG_QUOTAS`	`false`	Enable SaaS-style org tier quotas
`AUDIT_HMAC_KEY`	(falls back to `SECRET_KEY`)	HMAC key for audit chain; set explicitly for key rotation
`PUBLIC_BASE_URL`	`http://localhost:8000`	Externally-reachable URL for this FreeSDN instance; override to your production domain. Used in notification action URLs and agent WebSocket config.

All variables use the bare name (no FREESDN_ prefix).
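
The defaults and the `AUDIT_HMAC_KEY` fallback can be sketched as a small loader. This is not FreeSDN's settings code: `load_enterprise_settings` is a hypothetical helper, and the accepted boolean spellings are an assumption.

```python
import os
from typing import Mapping, Optional


def load_enterprise_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three variables using the defaults from the table above."""
    env = os.environ if env is None else env
    truthy = {"1", "true", "yes"}  # assumed boolean spellings
    return {
        "enforce_org_quotas": env.get("ENFORCE_ORG_QUOTAS", "false").strip().lower() in truthy,
        # An empty or missing AUDIT_HMAC_KEY falls back to SECRET_KEY.
        "audit_hmac_key": env.get("AUDIT_HMAC_KEY") or env.get("SECRET_KEY", ""),
        "public_base_url": env.get("PUBLIC_BASE_URL", "http://localhost:8000"),
    }
```

Setting `AUDIT_HMAC_KEY` explicitly decouples the audit chain key from `SECRET_KEY`, so the two can be rotated independently.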

Next steps

SLA monitoring - define policies, review breaches, generate reports
Alert rules - create rules, manage the alert lifecycle, configure notification routing
Event correlation - pattern-match events into incidents, assign and resolve
Notifications - configure SMTP, Slack, Teams, webhook, and SMS providers
Config templates - author hierarchical device config templates with secret handling
Device lifecycle - move devices through the FSM from discovery to decommission
Health dashboard - read composite health scores, site ranking, and trend history
Reconciliation - understand drift detection and trigger on-demand reconciliation
Bulk operations - run staged firmware upgrades and config pushes across device groups
Audit log - query the hash-chain audit trail, export, and validate chain integrity
Topology - view and persist L2/L3 network maps
Organizations - manage tenants, per-user site grants, and org quotas