SLA Management
SLA Management lets you codify your uptime and availability expectations as named policies, evaluate actual device and network metrics against those expectations continuously, and act on any deviation before it becomes a user-visible problem.
Policies attach to a scope - an organization, site, site group, device group, SSID, camera, or NVR - and carry a set of metric thresholds. Every five minutes Celery evaluates every active policy for your organization and raises a breach when a threshold is violated. You acknowledge, track, and resolve breaches from the same interface.
How evaluation works
The Celery beat task sla-evaluate-all runs every five minutes on the metrics queue (rate-limited to 2 per minute). It calls SLAMonitoringService.evaluate_all_policies for your organization, which:
- Loads every active policy and its thresholds dict (metric_name → threshold_value).
- For each non-null threshold, computes the deviation percentage against the current actual.
- Calls _is_threshold_violated and, on a violation, persists an SLABreach row and publishes sla.breach.created on the event bus.
- On recovery, publishes sla.breach.resolved and updates the breach status automatically.
You can also trigger evaluation manually at any time (see Endpoints).
The sla.breach.created event is always published at HIGH priority. The sla.breach.acknowledged event priority reflects breach severity: critical → CRITICAL, warning → NORMAL (default fallback). The sla.breach.resolved event is published at NORMAL priority. Breach severity is either warning (deviation ≤ 20%) or critical (deviation > 20%).
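The evaluation steps and the severity rule above can be sketched in Python. This is an illustrative sketch, not the platform's implementation: the deviation formula (relative distance from the threshold), the use of the _min / _max key suffix to decide the direction of a violation, and the function names are all assumptions. Only the threshold key names and the 20% severity boundary come from this page.

```python
from typing import Dict, List, Optional


def is_threshold_violated(metric: str, actual: float, threshold: float) -> bool:
    """Assumed convention: *_min keys are floors, *_max keys are ceilings."""
    if metric.endswith("_min"):
        return actual < threshold
    if metric.endswith("_max"):
        return actual > threshold
    raise ValueError(f"unknown threshold direction for {metric!r}")


def deviation_percent(actual: float, threshold: float) -> float:
    """Assumed formula: relative distance from the threshold, in percent."""
    if threshold == 0:
        raise ValueError("threshold must be non-zero for a relative deviation")
    return abs(actual - threshold) / abs(threshold) * 100.0


def classify_severity(deviation: float) -> str:
    """Severity rule from this page: <= 20% is warning, > 20% is critical."""
    return "critical" if deviation > 20.0 else "warning"


def evaluate(thresholds: Dict[str, Optional[float]],
             actuals: Dict[str, float]) -> List[dict]:
    """Return one breach dict per violated, non-null threshold."""
    breaches = []
    for metric, threshold in thresholds.items():
        if threshold is None or metric not in actuals:
            continue  # null thresholds (and metrics with no reading) are skipped
        actual = actuals[metric]
        if is_threshold_violated(metric, actual, threshold):
            dev = deviation_percent(actual, threshold)
            breaches.append({
                "metric": metric,
                "actual": actual,
                "threshold": threshold,
                "deviation_percent": dev,
                "severity": classify_severity(dev),
            })
    return breaches
```

Under these assumptions, a latency_ms_max threshold of 100 with an actual of 150 deviates by 50% and would be classified as critical, while a deviation of exactly 20% stays at warning.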
Policy scopes
Every policy targets exactly one scope. When scope is organization, scope_id must be omitted. For all other scopes except ssid, scope_id is the UUID of the target entity and is required. The ssid scope is an exception - scope_id may be omitted (the SSID name is stored in the separate scope_name field instead).
| Scope | scope_id required | Typical use |
|---|---|---|
| organization | No | Org-wide baseline thresholds |
| site | Yes - site UUID | Per-location SLA |
| site_group | Yes - site group UUID | Regional / campus grouping |
| device_group | Yes - device group UUID | Per-fleet (e.g. all APs) |
| ssid | Optional | Wi-Fi availability per SSID name |
| camera | Yes - camera UUID | Per-camera uptime |
| nvr | Yes - NVR UUID | Per-NVR availability |
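The scope rules in the table can be checked client-side before a request is sent. The following is a minimal sketch under stated assumptions: validate_scope is a hypothetical helper, not part of the platform API, and it skips any format check for the ssid scope because this page does not say what an ssid scope_id looks like.

```python
import uuid
from typing import Optional

SCOPES_REQUIRING_ID = {"site", "site_group", "device_group", "camera", "nvr"}
ALL_SCOPES = SCOPES_REQUIRING_ID | {"organization", "ssid"}


def validate_scope(scope: str, scope_id: Optional[str]) -> None:
    """Raise ValueError if scope / scope_id break the rules in the table."""
    if scope not in ALL_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if scope == "organization":
        if scope_id is not None:
            raise ValueError("scope_id must be omitted for organization scope")
        return
    if scope == "ssid":
        return  # scope_id is optional here; its format is not specified
    if scope_id is None:
        raise ValueError(f"scope_id is required for scope {scope!r}")
    uuid.UUID(scope_id)  # raises ValueError if not a well-formed UUID
```

Server-side ownership checks still apply regardless of what the client validates.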
Creating a policy
Navigate to Enterprise → SLA in the UI and click New Policy, or use the API directly.
Required body fields:
| Field | Type | Notes |
|---|---|---|
| name | string | Human-readable label |
| scope | string | One of the scope values above |
| scope_id | UUID or omit | Required for all scopes except organization (must be omitted) and ssid (optional) |
| thresholds | object | { "uptime_percent_min": 99.9, "latency_ms_max": 50 } - strictly typed; valid keys: uptime_percent_min, latency_ms_max, packet_loss_percent_max, health_score_min, client_satisfaction_min, error_rate_max |
Optional fields:
| Field | Type | Notes |
|---|---|---|
| description | string | Free-text explanation |
| status | string | active (default), disabled, or draft. Accepted on PATCH only, not on POST. Set disabled to suspend evaluation without deleting the policy. |
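Because thresholds is strictly typed, it can help to reject unknown keys before the request leaves the client. The sketch below assembles a create-policy body from the fields documented above. build_policy_body is a hypothetical helper name, and the choice to leave out unset optional fields rather than send null is an assumption.

```python
import json
from typing import Dict, Optional

VALID_THRESHOLD_KEYS = {
    "uptime_percent_min", "latency_ms_max", "packet_loss_percent_max",
    "health_score_min", "client_satisfaction_min", "error_rate_max",
}


def build_policy_body(name: str, scope: str, thresholds: Dict[str, float],
                      scope_id: Optional[str] = None,
                      description: Optional[str] = None) -> str:
    """Return a JSON body for POST /api/v1/sla/policies.

    status is deliberately not a parameter: this page states it is
    accepted on PATCH only.
    """
    unknown = set(thresholds) - VALID_THRESHOLD_KEYS
    if unknown:
        raise ValueError(f"unknown threshold keys: {sorted(unknown)}")
    body = {"name": name, "scope": scope, "thresholds": thresholds}
    if scope_id is not None:
        body["scope_id"] = scope_id  # omitted entirely for organization scope
    if description is not None:
        body["description"] = description
    return json.dumps(body)
```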
Example request:
```http
POST /api/v1/sla/policies
Content-Type: application/json
Authorization: Bearer <token>

{
  "name": "Core Network Uptime",
  "scope": "site",
  "scope_id": "a1b2c3d4-...",
  "thresholds": {
    "uptime_percent_min": 99.5,
    "packet_loss_percent_max": 1.0,
    "latency_ms_max": 100
  }
}
```
Endpoints
All paths are under the prefix /api/v1/sla. Each endpoint requires either config:read or config:write, as listed in the Permission column of the tables below - see the RBAC reference for the role-to-permission mapping.
Policy management
| Method | Path | Purpose | Permission |
|---|---|---|---|
| GET | /api/v1/sla/summary | Org-wide compliance summary (pass site_id to scope) | config:read |
| GET | /api/v1/sla/policies | List policies - filter by site_id, scope, status; limit≤200 | config:read |
| POST | /api/v1/sla/policies | Create a policy | config:write |
| GET | /api/v1/sla/policies/{policy_id} | Retrieve one policy | config:read |
| PATCH | /api/v1/sla/policies/{policy_id} | Update (scope re-verified on change) | config:write |
| DELETE | /api/v1/sla/policies/{policy_id} | Delete a policy | config:write |
Breach management
| Method | Path | Purpose | Permission |
|---|---|---|---|
| GET | /api/v1/sla/breaches | List breaches - filter by site_id, policy_id, status; limit≤200 | config:read |
| POST | /api/v1/sla/breaches/{breach_id}/acknowledge | Acknowledge with an optional note | config:write |
| POST | /api/v1/sla/evaluate | Manually trigger evaluation for your org | config:write |
Tracking and acknowledging breaches
When a policy is violated, the platform:
- Persists an SLABreach row with the policy, scope, metric, actual value, threshold, and severity.
- Publishes sla.breach.created on the event bus - any connected alert rule or notification provider will fire if configured to listen for this event type.
- Updates the breach status to resolved and publishes sla.breach.resolved automatically once the metric recovers.
Acknowledge a breach to record that a human has reviewed it:
```http
POST /api/v1/sla/breaches/{breach_id}/acknowledge
Content-Type: application/json

{
  "notes": "Investigating upstream ISP packet loss."
}
```
The acknowledgement publishes sla.breach.acknowledged on the event bus.
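For scripted workflows, the acknowledge call can be assembled with the Python standard library. This sketch only builds the request object and does not send it. The helper name build_acknowledge_request, the base URL argument, the Bearer header (carried over from the create-policy example), and sending an empty JSON object when no note is given are all assumptions.

```python
import json
import urllib.request
from typing import Optional


def build_acknowledge_request(base_url: str, token: str, breach_id: str,
                              notes: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the acknowledge call for one breach."""
    # Assumption: an empty object is acceptable when the optional note is absent.
    payload = {} if notes is None else {"notes": notes}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/sla/breaches/{breach_id}/acknowledge",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; error handling and retries are left out of the sketch.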
Filtering the breach list:
```http
GET /api/v1/sla/breaches?status=active&site_id=<uuid>&limit=50
```
Valid status values: active, acknowledged, resolved.
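A small query builder keeps filter values within the documented bounds. This is a sketch: breach_list_path is a hypothetical helper, the lower bound of 1 on limit is an assumption, and rejecting out-of-range values client-side is a design choice rather than documented server behavior.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_STATUSES = {"active", "acknowledged", "resolved"}
MAX_LIMIT = 200  # documented upper bound for limit


def breach_list_path(status: Optional[str] = None,
                     site_id: Optional[str] = None,
                     policy_id: Optional[str] = None,
                     limit: Optional[int] = None) -> str:
    """Return the path and query string for listing breaches."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"invalid status: {status!r}")
    if limit is not None and not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {"status": status, "site_id": site_id,
              "policy_id": policy_id, "limit": limit}
    # Drop unset filters so they do not appear in the query string.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return "/api/v1/sla/breaches" + (f"?{query}" if query else "")
```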
Compliance summary
GET /api/v1/sla/summary (optionally scoped with ?site_id=<uuid>) returns an org-wide view showing:
- Total active policies
- Policies currently in breach
- Breach counts by severity
- Recent breach history
Use this endpoint to power a management dashboard or feed into a periodic report.
Reports and schedules
The report engine at /api/v1/sla/reports generates on-demand SLA compliance reports in PDF or CSV format.
Generating a report
```http
POST /api/v1/sla/reports/generate
Content-Type: application/json

{
  "period_start": "2026-05-01T00:00:00Z",
  "period_end": "2026-06-01T00:00:00Z",
  "policy_ids": ["<uuid>", "<uuid>"],
  "format": "pdf",
  "title": "May 2026 SLA Report"
}
```
Constraints enforced by the server:
- period_start must be before period_end
- Period length must be 366 days or less
- format must be pdf or csv
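The three constraints above are simple enough to mirror on the client so that an invalid request fails early. The sketch below restates them in Python. validate_report_request is a hypothetical helper, and it assumes both datetimes are timezone-aware, since comparing a naive with an aware datetime raises TypeError.

```python
from datetime import datetime, timedelta

MAX_PERIOD = timedelta(days=366)
VALID_FORMATS = {"pdf", "csv"}


def validate_report_request(period_start: datetime, period_end: datetime,
                            fmt: str) -> None:
    """Raise ValueError if the request breaks a documented constraint."""
    if period_start >= period_end:
        raise ValueError("period_start must be before period_end")
    if period_end - period_start > MAX_PERIOD:
        raise ValueError("period length must be 366 days or less")
    if fmt not in VALID_FORMATS:
        raise ValueError("format must be pdf or csv")
```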
Once generated, download the file with:
```http
GET /api/v1/sla/reports/{report_id}/download
```
If the file has not yet been rendered, the endpoint returns the report data as inline JSON instead. The download path is path-traversal guarded - the resolved file path must fall inside REPORTS_BASE_DIR or the request is rejected with 403.
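A path-traversal guard of the kind described here is commonly written by resolving the candidate path and checking that it is still inside the base directory. The following is a generic sketch of that technique, not the platform's code: resolve_report_path and its arguments are hypothetical, and mapping the raised PermissionError to an HTTP 403 is assumed to happen in the calling endpoint.

```python
from pathlib import Path


def resolve_report_path(base_dir: str, relative_name: str) -> Path:
    """Resolve a report file path and refuse anything outside base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below sees the real target rather than the literal string.
    candidate = (base / relative_name).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise PermissionError("resolved path escapes the reports directory")
    return candidate
```

An absolute relative_name replaces the base when joined, so it fails the same containment check as a ../ sequence.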
Report endpoints
| Method | Path | Purpose | Permission |
|---|---|---|---|
| POST | /api/v1/sla/reports/generate | Generate on demand | config:read |
| GET | /api/v1/sla/reports | List generated reports (limit≤200) | config:read |
| GET | /api/v1/sla/reports/{report_id}/download | Download file or inline JSON | config:read |
Schedule endpoints
| Method | Path | Purpose | Permission |
|---|---|---|---|
| GET | /api/v1/sla/report-schedules | List schedules | config:read |
| POST | /api/v1/sla/report-schedules | Create a schedule | config:write |
| PUT | /api/v1/sla/report-schedules/{schedule_id} | Update a schedule | config:write |
| DELETE | /api/v1/sla/report-schedules/{schedule_id} | Delete a schedule | config:write |
Permissions reference
| Permission | Who can hold it | What it unlocks |
|---|---|---|
| config:read | viewer and above (site-scoped by grant) | Read policies, breaches, summary, reports |
| config:write | site_admin and above | Create/edit/delete policies; acknowledge breaches; trigger evaluation; create reports and schedules |
Role-to-permission mapping is defined in the Roles and permissions reference.
Connecting SLA to other enterprise features
- Alert Rules - create a rule that fires on sla.breach.created to route breach notifications to any configured notification channel.
- Health Dashboard - the SLAComplianceCard component on the health dashboard (/health) pulls from GET /api/v1/sla/summary and shows a high-level compliance rate alongside device health scores.
- Event Correlation - SLA breach events appear in the correlation engine’s event stream; you can write a correlation rule that groups multiple simultaneous breaches into a single incident.
- Notifications - configure an SMTP, Slack webhook, or Teams webhook provider under Enterprise → Notification Providers, then attach it to an alert rule targeting SLA breach events.
Security notes
- Every scope_id supplied at create or update time is verified to be owned by your organization before the operation is committed. Cross-tenant probes receive a 404.
- All policy and breach reads are org-scoped - you cannot read another organization’s SLA data.
- The manual /evaluate endpoint derives the target organization from the authenticated user’s JWT. It cannot be directed at another organization’s policies.
- Multi-tenant isolation is enforced at the application layer. There is no PostgreSQL row-level security - isolation is the responsibility of the service and endpoint code, which has been independently reviewed.
Next steps
- Alert Rules - wire breach events to notifications and auto-resolve workflows
- Health Dashboard (/health) - view SLA compliance alongside device health scores
- Event Correlation - group breach events into incidents for coordinated response
- Notification Providers - configure SMTP, Slack, Teams, and webhook delivery