SLA Management

SLA Management lets you codify your uptime and availability expectations as named policies, evaluate actual device and network metrics against those expectations continuously, and act on any deviation before it becomes a user-visible problem.

Policies attach to a scope - an organization, site, site group, device group, SSID, camera, or NVR - and carry a set of metric thresholds. Every five minutes Celery evaluates every active policy for your organization and raises a breach when a threshold is violated. You acknowledge, track, and resolve breaches from the same interface.


The Celery beat task sla-evaluate-all runs every five minutes on the metrics queue (rate-limited to 2 per minute). It calls SLAMonitoringService.evaluate_all_policies for your organization, which:

  1. Loads every active policy and its thresholds dict (metric_name → threshold_value).
  2. For each non-null threshold, computes the deviation percentage against the current actual.
  3. Calls _is_threshold_violated and, on a violation, persists an SLABreach row and publishes sla.breach.created on the event bus.
  4. On recovery, publishes sla.breach.resolved and updates the breach status automatically.
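
The per-threshold check in steps 2 and 3 might look roughly like the sketch below. It is not the platform's code: the deviation formula (relative distance from the threshold), the reading of the `_min`/`_max` suffixes as floor and ceiling, and the names `deviation_percent`, `is_threshold_violated`, and `evaluate_policy` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def deviation_percent(actual: float, threshold: float) -> float:
    """Relative distance between actual and threshold, in percent (assumed formula)."""
    if threshold == 0:
        # Avoid division by zero: any non-zero actual counts as a 100% deviation.
        return 0.0 if actual == 0 else 100.0
    return abs(actual - threshold) / abs(threshold) * 100.0


def is_threshold_violated(metric_name: str, actual: float, threshold: float) -> bool:
    """Assumed convention: *_min keys are floors, *_max keys are ceilings."""
    if metric_name.endswith("_min"):
        return actual < threshold
    if metric_name.endswith("_max"):
        return actual > threshold
    raise ValueError(f"cannot infer direction for metric: {metric_name}")


def evaluate_policy(
    thresholds: Dict[str, Optional[float]],
    actuals: Dict[str, Optional[float]],
) -> List[dict]:
    """Return one breach record per violated threshold."""
    breaches = []
    for metric_name, threshold in thresholds.items():
        actual = actuals.get(metric_name)
        # Null thresholds are skipped; skipping metrics with no reading is an assumption.
        if threshold is None or actual is None:
            continue
        if is_threshold_violated(metric_name, actual, threshold):
            breaches.append({
                "metric": metric_name,
                "actual": actual,
                "threshold": threshold,
                "deviation_percent": round(deviation_percent(actual, threshold), 2),
            })
    return breaches


example = evaluate_policy(
    {"uptime_percent_min": 99.5, "latency_ms_max": 100, "packet_loss_percent_max": None},
    {"uptime_percent_min": 98.0, "latency_ms_max": 80, "packet_loss_percent_max": 5.0},
)
```

In this example only `uptime_percent_min` is reported: latency is within its ceiling and the packet-loss threshold is null.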

You can also trigger evaluation manually at any time (see Endpoints).

The sla.breach.created event is always published at HIGH priority. The sla.breach.acknowledged event priority reflects breach severity: critical → CRITICAL, warning → NORMAL (default fallback). The sla.breach.resolved event is published at NORMAL priority. Breach severity is either warning (deviation ≤ 20%) or critical (deviation > 20%).
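
The severity and priority rules above can be expressed compactly. This is a sketch of the stated rules only; the function names `breach_severity` and `event_priority` are hypothetical and do not refer to the platform's internals.

```python
def breach_severity(deviation_percent: float) -> str:
    """warning up to and including 20% deviation, critical above it."""
    return "critical" if deviation_percent > 20.0 else "warning"


# Priority of sla.breach.acknowledged depends on breach severity.
ACK_PRIORITY_BY_SEVERITY = {"critical": "CRITICAL", "warning": "NORMAL"}


def event_priority(event_type: str, severity: str = "warning") -> str:
    """Map an SLA event type (and, for acknowledgements, severity) to a priority."""
    if event_type == "sla.breach.created":
        return "HIGH"  # always HIGH, regardless of severity
    if event_type == "sla.breach.acknowledged":
        # NORMAL is the fallback for any severity that is not critical.
        return ACK_PRIORITY_BY_SEVERITY.get(severity, "NORMAL")
    if event_type == "sla.breach.resolved":
        return "NORMAL"
    raise ValueError(f"not an SLA breach event: {event_type}")
```

Note that the 20% boundary itself is classified as warning, since critical requires a deviation strictly greater than 20%.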


Every policy targets exactly one scope. When scope is organization, scope_id must be omitted. For all other scopes except ssid, scope_id is the UUID of the target entity and is required. The ssid scope is an exception - scope_id may be omitted (the SSID name is stored in the separate scope_name field instead).

Scope | scope_id required | Typical use
organization | No | Org-wide baseline thresholds
site | Yes - site UUID | Per-location SLA
site_group | Yes - site group UUID | Regional / campus grouping
device_group | Yes - device group UUID | Per-fleet (e.g. all APs)
ssid | Optional | Wi-Fi availability per SSID name
camera | Yes - camera UUID | Per-camera uptime
nvr | Yes - NVR UUID | Per-NVR availability
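
A client can check the scope rules before submitting a policy. The sketch below mirrors the rules described here; the function name `scope_errors` and the error strings are hypothetical, and the server remains the authority on what is accepted. No format check is applied to scope_id for the ssid scope, because the expected format in that case is not specified here.

```python
import uuid
from typing import List, Optional

# Scopes whose scope_id is a required UUID.
SCOPES_REQUIRING_ID = {"site", "site_group", "device_group", "camera", "nvr"}


def scope_errors(scope: str, scope_id: Optional[str] = None) -> List[str]:
    """Return a list of problems with a scope / scope_id pair (empty if valid)."""
    if scope == "organization":
        if scope_id is not None:
            return ["scope_id must be omitted when scope is organization"]
        return []
    if scope == "ssid":
        return []  # scope_id is optional; the SSID name goes in scope_name
    if scope in SCOPES_REQUIRING_ID:
        if scope_id is None:
            return [f"scope_id is required when scope is {scope}"]
        try:
            uuid.UUID(scope_id)  # raises ValueError on a malformed UUID
        except ValueError:
            return ["scope_id must be a UUID"]
        return []
    return [f"unknown scope: {scope}"]
```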

Navigate to Enterprise → SLA in the UI and click New Policy, or use the API directly.

Required body fields:

Field | Type | Notes
name | string | Human-readable label
scope | string | One of the scope values above
scope_id | UUID or omit | Required for every scope except organization (must be omitted) and ssid (optional)
thresholds | object | { "uptime_percent_min": 99.9, "latency_ms_max": 50 } - strictly typed; valid keys: uptime_percent_min, latency_ms_max, packet_loss_percent_max, health_score_min, client_satisfaction_min, error_rate_max

Optional fields:

Field | Type | Notes
description | string | Free-text explanation
status | string | One of active (default), disabled, or draft. Set to disabled to suspend evaluation without deleting the policy. Accepted on PATCH only; POST does not accept this field.
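
Because thresholds is strictly typed, it can help to validate the object client-side before sending it. The sketch below assumes that "strictly typed" means unknown keys and non-numeric values are rejected, and that null values are allowed (evaluation skips null thresholds); `threshold_errors` is a hypothetical helper name.

```python
from typing import List

VALID_THRESHOLD_KEYS = frozenset({
    "uptime_percent_min",
    "latency_ms_max",
    "packet_loss_percent_max",
    "health_score_min",
    "client_satisfaction_min",
    "error_rate_max",
})


def threshold_errors(thresholds: dict) -> List[str]:
    """Return one message per invalid entry in a thresholds object."""
    errors = []
    for key, value in thresholds.items():
        if key not in VALID_THRESHOLD_KEYS:
            errors.append(f"unknown threshold key: {key}")
        elif value is None:
            continue  # null is treated as "no threshold for this metric"
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # bool is excluded explicitly because it is a subclass of int.
            errors.append(f"{key} must be a number or null")
    return errors
```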

Example request:

POST /api/v1/sla/policies
Content-Type: application/json
Authorization: Bearer <token>
{
"name": "Core Network Uptime",
"scope": "site",
"scope_id": "a1b2c3d4-...",
"thresholds": {
"uptime_percent_min": 99.5,
"packet_loss_percent_max": 1.0,
"latency_ms_max": 100
}
}
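
The same request can be built from Python using only the standard library. This is a minimal sketch: the base URL, the token, and the `<site-uuid>` value are placeholders, and `build_create_policy_request` is a hypothetical helper. The request is constructed but not sent.

```python
import json
import urllib.request


def build_create_policy_request(base_url: str, token: str, policy: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/sla/policies request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/sla/policies",
        data=json.dumps(policy).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


policy = {
    "name": "Core Network Uptime",
    "scope": "site",
    "scope_id": "<site-uuid>",  # placeholder: substitute a real site UUID
    "thresholds": {
        "uptime_percent_min": 99.5,
        "packet_loss_percent_max": 1.0,
        "latency_ms_max": 100,
    },
}

request = build_create_policy_request("https://example.invalid", "<token>", policy)
# To send it: urllib.request.urlopen(request)
```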

All paths are under the prefix /api/v1/sla. Each endpoint requires either config:read or config:write, as listed in the Permission column; see the RBAC reference for the role-to-permission mapping.

Method | Path | Purpose | Permission
GET | /api/v1/sla/summary | Org-wide compliance summary (pass site_id to scope) | config:read
GET | /api/v1/sla/policies | List policies - filter by site_id, scope, status; limit ≤ 200 | config:read
POST | /api/v1/sla/policies | Create a policy | config:write
GET | /api/v1/sla/policies/{policy_id} | Retrieve one policy | config:read
PATCH | /api/v1/sla/policies/{policy_id} | Update (scope re-verified on change) | config:write
DELETE | /api/v1/sla/policies/{policy_id} | Delete a policy | config:write

Method | Path | Purpose | Permission
GET | /api/v1/sla/breaches | List breaches - filter by site_id, policy_id, status; limit ≤ 200 | config:read
POST | /api/v1/sla/breaches/{breach_id}/acknowledge | Acknowledge with an optional note | config:write
POST | /api/v1/sla/evaluate | Manually trigger evaluation for your org | config:write

When a policy is violated, the platform:

  1. Persists an SLABreach row with the policy, scope, metric, actual value, threshold, and severity.
  2. Publishes sla.breach.created on the event bus - any connected alert rule or notification provider will fire if configured to listen for this event type.
  3. Updates the breach status to resolved and publishes sla.breach.resolved automatically once the metric recovers.

Acknowledge a breach to record that a human has reviewed it:

POST /api/v1/sla/breaches/{breach_id}/acknowledge
Content-Type: application/json
{
"notes": "Investigating upstream ISP packet loss."
}

The acknowledgement publishes sla.breach.acknowledged on the event bus.

Filtering the breach list:

GET /api/v1/sla/breaches?status=active&site_id=<uuid>&limit=50

Valid status values: active, acknowledged, resolved.
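
A small helper can assemble the filter query and catch invalid values before the request is made. This sketch uses the filters, status values, and limit ceiling documented here; `breach_list_path` is a hypothetical name, and the default limit of 50 is taken from the example above rather than from any documented server default.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_BREACH_STATUSES = {"active", "acknowledged", "resolved"}
MAX_LIMIT = 200


def breach_list_path(
    status: Optional[str] = None,
    site_id: Optional[str] = None,
    policy_id: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Build the path and query string for GET /api/v1/sla/breaches."""
    if status is not None and status not in VALID_BREACH_STATUSES:
        raise ValueError(f"invalid status: {status}")
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must be at most {MAX_LIMIT}")
    params = {"status": status, "site_id": site_id, "policy_id": policy_id, "limit": limit}
    # Drop filters that were not supplied; urlencode handles escaping.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"/api/v1/sla/breaches?{query}"


path = breach_list_path(status="active", site_id="<uuid>", limit=50)
```

Because urlencode percent-escapes reserved characters, the `<uuid>` placeholder appears escaped in the resulting string; a real UUID passes through unchanged.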


GET /api/v1/sla/summary (optionally scoped with ?site_id=<uuid>) returns an org-wide view showing:

  • Total active policies
  • Policies currently in breach
  • Breach counts by severity
  • Recent breach history

Use this endpoint to power a management dashboard or feed into a periodic report.


The report engine at /api/v1/sla/reports generates on-demand SLA compliance reports in PDF or CSV format.

POST /api/v1/sla/reports/generate
Content-Type: application/json
{
"period_start": "2026-05-01T00:00:00Z",
"period_end": "2026-06-01T00:00:00Z",
"policy_ids": ["<uuid>", "<uuid>"],
"format": "pdf",
"title": "May 2026 SLA Report"
}

Constraints enforced by the server:

  • period_start must be before period_end
  • Period length must be 366 days or less
  • format must be pdf or csv
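
These constraints are straightforward to check client-side before calling the endpoint. The sketch below reproduces the three documented rules; `report_request_errors` and its error messages are hypothetical, and the server's own validation may differ in detail.

```python
from datetime import datetime, timedelta
from typing import List

MAX_PERIOD = timedelta(days=366)
VALID_FORMATS = ("pdf", "csv")


def _parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing Z for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def report_request_errors(period_start: str, period_end: str, fmt: str) -> List[str]:
    """Return one message per violated constraint (empty list if valid)."""
    start, end = _parse_utc(period_start), _parse_utc(period_end)
    errors = []
    if start >= end:
        errors.append("period_start must be before period_end")
    elif end - start > MAX_PERIOD:
        errors.append("period must not exceed 366 days")
    if fmt not in VALID_FORMATS:
        errors.append("format must be pdf or csv")
    return errors
```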

Once generated, download the file with:

GET /api/v1/sla/reports/{report_id}/download

If the file has not yet been rendered, the endpoint returns the report data as inline JSON instead. The download is guarded against path traversal: the resolved file path must fall inside REPORTS_BASE_DIR, or the request is rejected with 403.
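
A guard of this kind is commonly written by resolving the candidate path and checking that it still sits under the base directory. The sketch below shows that general technique, not the platform's implementation; the directory value and the function name `is_inside_base_dir` are hypothetical.

```python
from pathlib import Path
from typing import Union

# Hypothetical location; the real REPORTS_BASE_DIR is deployment-specific.
REPORTS_BASE_DIR = Path("/var/lib/app/reports")


def is_inside_base_dir(
    candidate: Union[str, Path],
    base_dir: Union[str, Path] = REPORTS_BASE_DIR,
) -> bool:
    """True only if candidate, once resolved, is strictly inside base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # attempts are compared in their final, normalized form.
    resolved = (base / candidate).resolve()
    return base in resolved.parents


# A caller would respond with 403 whenever this check returns False.
allowed = is_inside_base_dir("2026/may-report.pdf")
blocked = is_inside_base_dir("../secrets.txt")
```

Joining an absolute candidate such as `/etc/passwd` onto the base discards the base entirely, so the check rejects it as well.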

Method | Path | Purpose | Permission
POST | /api/v1/sla/reports/generate | Generate on demand | config:read
GET | /api/v1/sla/reports | List generated reports (limit ≤ 200) | config:read
GET | /api/v1/sla/reports/{report_id}/download | Download file or inline JSON | config:read

Method | Path | Purpose | Permission
GET | /api/v1/sla/report-schedules | List schedules | config:read
POST | /api/v1/sla/report-schedules | Create a schedule | config:write
PUT | /api/v1/sla/report-schedules/{schedule_id} | Update a schedule | config:write
DELETE | /api/v1/sla/report-schedules/{schedule_id} | Delete a schedule | config:write

Permission | Who can hold it | What it unlocks
config:read | viewer and above (site-scoped by grant) | Read policies, breaches, summary, reports
config:write | site_admin and above | Create/edit/delete policies; acknowledge breaches; trigger evaluation; create reports and schedules

Role-to-permission mapping is defined in the Roles and permissions reference.


Connecting SLA to other enterprise features

  • Alert Rules - create a rule that fires on sla.breach.created to route breach notifications to any configured notification channel.
  • Health Dashboard - the SLAComplianceCard component on the health dashboard (/health) pulls from GET /api/v1/sla/summary and shows a high-level compliance rate alongside device health scores.
  • Event Correlation - SLA breach events appear in the correlation engine’s event stream; you can write a correlation rule that groups multiple simultaneous breaches into a single incident.
  • Notifications - configure an SMTP, Slack webhook, or Teams webhook provider under Enterprise → Notification Providers, then attach it to an alert rule targeting SLA breach events.

  • Every scope_id supplied at create or update time is verified to be owned by your organization before the operation is committed. Cross-tenant probes receive a 404.
  • All policy and breach reads are org-scoped - you cannot read another organization’s SLA data.
  • The manual /evaluate endpoint derives the target organization from the authenticated user’s JWT. It cannot be directed at another organization’s policies.
  • Multi-tenant isolation is enforced at the application layer. There is no PostgreSQL row-level security - isolation is the responsibility of the service and endpoint code, which has been independently reviewed.

  • Alert Rules - wire breach events to notifications and auto-resolve workflows
  • Health Dashboard (/health) - view SLA compliance alongside device health scores
  • Event Correlation - group breach events into incidents for coordinated response
  • Notification Providers - configure SMTP, Slack, Teams, and webhook delivery