Alerting & Notifications

The alerting subsystem has two collaborating parts:

Alert rules engine - evaluates rules against the internal event bus every 3 minutes and creates Alert records when conditions are met.
Notification providers - deliver those alerts (and any other dispatch call) to one or more external channels concurrently.

Together they replace ad-hoc email scripts and give you a single place to define what to watch, who to tell, and how.

Alert rules

What a rule is

An AlertRule subscribes to a named event type (or pattern) on the internal event bus. When the evaluator sees a matching event, it creates an Alert record and fans out notifications through every configured provider.

Each rule carries:

Field	Description
`name`	Human-readable label shown in the UI
`description`	Optional context for operators
`rule_type`	`threshold`, `pattern`, `anomaly`, or `custom` - selects the evaluator
`scope`	Targeting scope - one of `organization`, `site`, `device_group`, `device`
`scope_ids`	JSONB list of UUIDs for the chosen scope (omit for `organization`)
`severity`	`info`, `warning`, or `critical` - maps to notification priority
`status`	`active`, `disabled`, or `draft` - disabled/draft rules are skipped by the evaluator
`conditions`	JSONB dict whose keys depend on `rule_type` (see below)
`auto_resolve_after_seconds`	If set, the evaluator resolves the alert automatically after this many seconds (integer, minimum 60)
`notification_channels`	JSONB dict mapping channel name to channel-specific config (e.g. `{"email": {"to": ["ops@example.com"]}, "slack": {"channel": "#alerts"}}`). Valid channel keys: `email`, `slack`, `teams`, `webhook`, `in_app`, `sms`, `whatsapp`.

conditions structure by rule_type:

`rule_type`	Required keys in `conditions`	Example
`threshold`	`metric`, `operator` (`>`,`<`,`>=`,`<=`,`==`,`!=`), `value`	`{"metric": "cpu_utilization", "operator": ">", "value": 90}`
`pattern`	`event_type` (glob, e.g. `device.offline` or `device.*`), `min_count`	`{"event_type": "device.offline", "min_count": 3}`
`anomaly`	`metric`, `std_dev_threshold`	`{"metric": "traffic_in", "std_dev_threshold": 3.0}`
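
The condition shapes above can be checked and evaluated with very little code. The following is a minimal sketch, not the evaluator's actual implementation: the helper names (`missing_condition_keys`, `threshold_breached`, `event_matches`) are hypothetical, and it assumes the glob in `event_type` follows shell-style matching as provided by Python's `fnmatch`.

```python
import operator
from fnmatch import fnmatchcase

# Comparison operators accepted by threshold rules, as listed in the table.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}

# Required keys per rule_type, taken from the table.
# "custom" is left out because its keys are not documented here.
REQUIRED_KEYS = {
    "threshold": {"metric", "operator", "value"},
    "pattern": {"event_type", "min_count"},
    "anomaly": {"metric", "std_dev_threshold"},
}


def missing_condition_keys(rule_type: str, conditions: dict) -> set:
    """Return the required keys absent from `conditions` (hypothetical helper)."""
    return REQUIRED_KEYS.get(rule_type, set()) - set(conditions)


def threshold_breached(conditions: dict, observed: float) -> bool:
    """Apply the rule's operator to an observed metric value (hypothetical helper)."""
    compare = OPERATORS[conditions["operator"]]
    return compare(observed, conditions["value"])


def event_matches(pattern: str, event_type: str) -> bool:
    """Assumed glob semantics: shell-style, case-sensitive matching."""
    return fnmatchcase(event_type, pattern)
```

For example, `threshold_breached({"metric": "cpu_utilization", "operator": ">", "value": 90}, 95)` is true, while an observed value of exactly 90 does not breach a `>` rule.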

Scope and scope_ids

scope narrows which resources a rule fires for. The table below shows valid combinations:

scope	scope_ids content	Example use
`organization`	omit or leave empty (`[]`)	Fire for anything in your org
`site`	list of site UUIDs	Fire only for events on specific sites
`device_group`	list of device-group UUIDs	Fire for a named group of devices
`device`	list of device UUIDs	Fire for exact devices

Before a rule is saved, the backend calls _verify_scope_ids, which checks that every UUID in scope_ids belongs to the caller’s organization. A foreign or non-existent UUID returns 404 (not 403), so the response does not reveal whether the resource exists in another organization.
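
To illustrate the "404 for both cases" behavior, here is a minimal sketch of such a check. It is not the real `_verify_scope_ids`: the `NotFound` exception and the `owned_ids` set (standing in for a database lookup of resources in the caller's organization) are assumptions made for the example.

```python
from uuid import UUID, uuid4


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 response."""

    status_code = 404


def verify_scope_ids(scope: str, scope_ids: list, owned_ids: set) -> None:
    """Raise NotFound unless every scope ID belongs to the caller's organization.

    `owned_ids` is an assumed stand-in for a query that returns the UUIDs of
    the given scope type owned by the caller's organization.
    """
    if scope == "organization":
        return  # nothing to verify: the rule covers the whole organization
    for raw in scope_ids:
        if UUID(str(raw)) not in owned_ids:
            # The same error covers "foreign" and "non-existent" IDs, so the
            # caller cannot distinguish the two cases.
            raise NotFound(f"{scope} not found")
```

Because the lookup only ever consults IDs owned by the caller, a UUID that exists in another organization is indistinguishable from one that does not exist at all.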

Severities

Severity	Typical use
`info`	Low-priority informational events, resolved events
`warning`	Default; degraded performance, capacity thresholds
`critical`	Service-affecting outages, security events

Alert lifecycle

An alert moves through these states:

firing
  ├─► acknowledged   (operator marks seen)
  ├─► resolved       (operator or auto-resolve clears it)
  └─► suppressed     (snooze for N minutes)

Suppress takes a suppress_minutes value and an optional reason. When the suppression expires, the alert-rules-unsuppress-expired Celery task lifts it automatically (runs every 5 minutes).
Auto-resolve runs via alert-rules-auto-resolve every 10 minutes. Alerts created by rules with auto_resolve_after_seconds set are resolved without manual intervention once that time has elapsed.
Acknowledge accepts an optional free-text note stored on the record.
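
The two timed transitions reduce to simple timestamp comparisons. The sketch below shows one way the periodic tasks might decide what to act on; the function names and the `suppressed_at` / `fired_at` arguments are hypothetical and do not describe the actual task code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def suppression_expired(suppressed_at: datetime, suppress_minutes: int,
                        now: datetime) -> bool:
    """True once the suppression window has fully elapsed."""
    return now >= suppressed_at + timedelta(minutes=suppress_minutes)


def auto_resolve_due(fired_at: datetime,
                     auto_resolve_after_seconds: Optional[int],
                     now: datetime) -> bool:
    """True when the rule sets a timeout and that timeout has elapsed."""
    if auto_resolve_after_seconds is None:
        return False  # the rule does not opt in to auto-resolve
    return now >= fired_at + timedelta(seconds=auto_resolve_after_seconds)


start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
print(suppression_expired(start, 120, start + timedelta(minutes=119)))  # False
print(suppression_expired(start, 120, start + timedelta(minutes=120)))  # True
```

Because both tasks run on a fixed interval, the actual transition can lag the computed deadline by up to one interval (5 minutes for unsuppress, 10 minutes for auto-resolve).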

Background evaluation schedule

Celery task	Interval	What it does
`alert-rules-evaluate-all`	Every 3 min	Runs the full rule set for your org
`alert-rules-auto-resolve`	Every 10 min	Resolves timed-out alerts
`alert-rules-unsuppress-expired`	Every 5 min	Lifts expired suppressions

These tasks run on the default Celery queue. If the worker container is down, no evaluation happens until it recovers - there is no fallback evaluator.

You can also trigger evaluation manually: POST /api/v1/alert-rules/evaluate.

Alert rules API

All endpoints are under the prefix /api/v1/alert-rules. Each endpoint requires one of the fine-grained alert:* permission scopes listed in the tables below.

Rules CRUD

Method	Path	Purpose	Permission
GET	`/api/v1/alert-rules/rules`	List rules (`status?`, `type?`, `site_id?`)	`alert:read`
POST	`/api/v1/alert-rules/rules`	Create rule (verifies scope_ids)	`alert:create`
GET	`/api/v1/alert-rules/rules/{rule_id}`	Get one rule	`alert:read`
PATCH	`/api/v1/alert-rules/rules/{rule_id}`	Update rule (re-verifies scope_ids)	`alert:update`
DELETE	`/api/v1/alert-rules/rules/{rule_id}`	Soft-delete rule	`alert:delete`
GET	`/api/v1/alert-rules/stats`	Rule + alert statistics (`site_id?`)	`alert:read`

Alert lifecycle actions

Method	Path	Purpose	Permission
GET	`/api/v1/alert-rules/alerts`	List alerts (`status?`, `severity?`, `rule_id?`, `site_id?`, `limit≤200`)	`alert:read`
GET	`/api/v1/alert-rules/alerts/{alert_id}`	Get one alert	`alert:read`
POST	`/api/v1/alert-rules/alerts/{alert_id}/acknowledge`	Acknowledge with optional note	`alert:update`
POST	`/api/v1/alert-rules/alerts/{alert_id}/resolve`	Resolve alert	`alert:update`
POST	`/api/v1/alert-rules/alerts/{alert_id}/suppress`	Suppress for N minutes	`alert:update`
POST	`/api/v1/alert-rules/evaluate`	Manually evaluate all rules now	`alert:update`

Creating your first alert rule

Open Alert Rules in the sidebar (route /alert-rules).
Click New rule.
Set name, event_type, and severity.
Choose a scope. For a site-scoped rule, select one or more sites from the picker - their UUIDs populate scope_ids.
Optionally set auto_resolve_after_seconds if the alert should self-clear.
Under Channels, tick the delivery channels you want (email, Slack, etc.). Each must have a configured provider - see Notification providers below.
Save. The rule is active immediately; the next evaluator cycle (within 3 minutes) will pick it up.

To test without waiting, call:

POST /api/v1/alert-rules/evaluate
Authorization: Bearer <token>

Then check /api/v1/alert-rules/alerts for any newly fired alerts.
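
The same two calls can be scripted. This is a minimal sketch using only the Python standard library; the base URL is a placeholder, the `status=firing` filter value is an assumption based on the lifecycle states above, and the response bodies are not parsed because their shape is not documented here.

```python
import urllib.request
from urllib.parse import urlencode


def evaluate_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the manual-evaluation request (POST, no body)."""
    return urllib.request.Request(
        f"{base_url}/api/v1/alert-rules/evaluate",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_alerts_request(base_url: str, token: str,
                        limit: int = 50) -> urllib.request.Request:
    """Build a request listing alerts; `status=firing` is an assumed filter value."""
    query = urlencode({"status": "firing", "limit": limit})
    return urllib.request.Request(
        f"{base_url}/api/v1/alert-rules/alerts?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send a request against a real instance:
#   with urllib.request.urlopen(evaluate_request(BASE_URL, TOKEN)) as resp:
#       print(resp.status, resp.read().decode())
```

Keep `limit` at or below 200, the maximum the alerts list endpoint accepts.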

Notification providers

A notification provider is a named, stored delivery configuration for one channel. You can have multiple providers for the same channel (e.g. two Slack webhooks, one per team).

Supported channels and provider types

Channel	Provider type	Auth model
Email	`smtp`	SMTP server + credentials
Slack	`slack_webhook`	Incoming webhook URL
Microsoft Teams	`teams_webhook`	Incoming webhook URL
Webhook (generic)	`generic_webhook`	URL + optional HMAC secret + custom headers
In-app	built-in	No external config needed
SMS	`twilio_sms`	Twilio Account SID + Auth Token + from-number
WhatsApp	`twilio_whatsapp`	Twilio Account SID + Auth Token + from-number

Fetch the full config schema for any provider type - including required fields and validation rules - from:

GET /api/v1/notifications/providers/types

This returns {type, name, channel, icon, config_schema} for each supported type.

Security constraints on provider config

Provider config blobs are capped at 256 KiB per record.
Display names reject CR, LF, and other control characters (header-injection defense).
API responses return a redacted config_summary, never raw credentials.
Generic webhook HMAC secrets are stored encrypted; the HMAC is computed server-side on dispatch.

Providers API

Provider management requires ORG_ADMIN or SUPER_ADMIN.

Method	Path	Purpose
GET	`/api/v1/notifications/providers`	List providers (`channel?`, `enabled_only?`)
GET	`/api/v1/notifications/providers/types`	Supported types with config schemas
POST	`/api/v1/notifications/providers`	Create provider
GET	`/api/v1/notifications/providers/{provider_id}`	Get provider (config summary only)
PUT	`/api/v1/notifications/providers/{provider_id}`	Update provider
DELETE	`/api/v1/notifications/providers/{provider_id}`	Delete provider
POST	`/api/v1/notifications/providers/{provider_id}/verify`	Test stored provider connectivity
POST	`/api/v1/notifications/providers/{provider_id}/test`	Send a test message (`test_email` query param required)

Setting up an SMTP provider

Navigate to Notification Providers (/notification-providers).
Click Add provider and choose SMTP Email.
Fill in host, port, username, password, TLS settings, and a from_email.
Save, then click Verify to confirm connectivity (sends no email).
Click Test and supply a test_email address to send a real test message.

Setting up Slack

In your Slack workspace, create an Incoming Webhook app and copy the webhook URL.
Add a provider with type slack_webhook and paste the URL.
Verify connectivity, then optionally send a test.

Setting up a generic webhook

{
  "type": "generic_webhook",
  "name": "PagerDuty ingest",
  "config": {
    "url": "https://events.pagerduty.com/v2/enqueue",
    "method": "POST",
    "headers": {"Content-Type": "application/json"},
    "hmac_secret": "your-secret-here"
  }
}

When hmac_secret is set, FreeSDN computes HMAC-SHA256(secret, body) and attaches it as X-FreeSDN-Signature on every outbound request. The receiving end can verify it to confirm the call originated from your FreeSDN instance.
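
On the receiving side, verification means recomputing the HMAC over the raw request body and comparing it with the header in constant time. The sketch below is an assumption-laden example: it assumes the header carries a lowercase hex digest with no prefix, which this page does not specify, so check the encoding your instance actually sends before relying on it.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """HMAC-SHA256 over the raw body, hex-encoded (encoding is an assumption)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def signature_is_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received X-FreeSDN-Signature value with a fresh digest."""
    expected = sign(secret, body).encode("ascii")
    received = header_value.strip().lower().encode("utf-8")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received)
```

Verify against the exact bytes received on the wire. Re-serializing parsed JSON before hashing can change whitespace or key order and make a valid signature fail.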

Sending programmatically

POST /api/v1/notifications/send
Authorization: Bearer <token>
Content-Type: application/json

{
  "channel": "slack",
  "recipient": "#alerts",
  "title": "Device offline",
  "body": "Switch sw-01 at Site A stopped responding."
}

Template-based send:

POST /api/v1/notifications/send/template

Both endpoints require ORG_ADMIN or SUPER_ADMIN.

Dispatch fan-out

When an alert rule fires, the dispatch call fans out across all configured channels concurrently via asyncio.gather. Each channel is independent: a failure on Slack does not block email delivery.
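
The isolation between channels can be sketched with `asyncio.gather` and `return_exceptions=True`, which returns each channel's exception as a result instead of raising it. The sender functions and the `SENT` / `FAILED` labels below are hypothetical, and whether the real dispatcher uses `return_exceptions` is an assumption.

```python
import asyncio


async def send_email(alert: dict) -> str:
    """Hypothetical sender that succeeds."""
    return "queued"


async def send_slack(alert: dict) -> str:
    """Hypothetical sender that fails, to show channel isolation."""
    raise ConnectionError("slack webhook unreachable")


async def fan_out(alert: dict, senders: dict) -> dict:
    """Dispatch to every channel concurrently and report a per-channel outcome."""
    names = list(senders)
    results = await asyncio.gather(
        *(senders[name](alert) for name in names),
        return_exceptions=True,  # collect failures instead of raising the first
    )
    return {
        name: "FAILED" if isinstance(result, Exception) else "SENT"
        for name, result in zip(names, results)
    }


outcome = asyncio.run(
    fan_out({"title": "Device offline"}, {"email": send_email, "slack": send_slack})
)
print(outcome)  # {'email': 'SENT', 'slack': 'FAILED'}
```

`asyncio.gather` returns results in the order the awaitables were passed, which is why zipping them back onto the channel names is safe.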

Per-user mute preferences apply before dispatch. If a user has muted a category, the dispatch logs SKIPPED for that user’s in-app channel rather than delivering.

In-app notifications

In-app notifications are per-user and require no provider configuration. They appear in the bell icon in the top navigation.

Method	Path	Purpose
GET	`/api/v1/notifications/in-app`	List notifications (paginated envelope with `unread_count`)
POST	`/api/v1/notifications/in-app/{notification_id}/read`	Mark one read
POST	`/api/v1/notifications/in-app/read-all`	Mark all read
GET	`/api/v1/notifications/in-app/unread-count`	Badge count for the bell
POST	`/api/v1/notifications/in-app/mark`	Bulk mark read or dismiss

The list response returns {items, total, limit, offset, unread_count}. Pass unread_only=true for only unseen notifications, or include_dismissed=true to include archived ones.
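
A client can page through the list using the envelope fields. The sketch below assumes `limit` and `offset` are accepted as query parameters (they appear in the response envelope, but the request parameters are not spelled out here); the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def in_app_list_path(limit: int, offset: int, unread_only: bool = False,
                     include_dismissed: bool = False) -> str:
    """Build the list path with the documented filter flags."""
    params = {"limit": limit, "offset": offset}
    if unread_only:
        params["unread_only"] = "true"
    if include_dismissed:
        params["include_dismissed"] = "true"
    return "/api/v1/notifications/in-app?" + urlencode(params)


def next_offset(envelope: dict) -> Optional[int]:
    """Return the offset of the next page, or None when the list is exhausted."""
    seen = envelope["offset"] + len(envelope["items"])
    if not envelope["items"] or seen >= envelope["total"]:
        return None
    return seen
```

Advancing by the number of items actually returned, rather than by `limit`, keeps the loop correct if the server returns a short page.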

User notification preferences

Each user can control which channels they receive on and set quiet hours.

Method	Path	Purpose
GET	`/api/v1/notifications/preferences`	Get current preferences (defaults: all channels enabled)
PUT	`/api/v1/notifications/preferences`	Update channels, quiet hours, category settings
PATCH	`/api/v1/notifications/preferences/mute`	Mute or snooze a category (`expires_at=null` = permanent)
DELETE	`/api/v1/notifications/preferences/mute/{category}`	Unmute (returns 404 if not muted)

Users access these settings from Settings → Notifications.
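
The difference between a permanent mute and a snooze comes down to `expires_at`. The sketch below builds a request body for the mute endpoint; the body field names (`category`, `expires_at`), the ISO 8601 timestamp format, and the example category value are assumptions, since only `expires_at=null` is described on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def mute_payload(category: str, snooze_for: Optional[timedelta] = None,
                 now: Optional[datetime] = None) -> dict:
    """Build a mute body: permanent when `snooze_for` is None, else time-limited."""
    if snooze_for is None:
        return {"category": category, "expires_at": None}  # null = permanent
    now = now or datetime.now(timezone.utc)
    return {"category": category, "expires_at": (now + snooze_for).isoformat()}


fixed_now = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(mute_payload("device", timedelta(hours=2), now=fixed_now))
# {'category': 'device', 'expires_at': '2024-01-01T02:00:00+00:00'}
```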

Alert rule examples

Notify on any device going offline

{
  "name": "Device offline",
  "rule_type": "pattern",
  "conditions": {"event_type": "device.offline"},
  "scope": "organization",
  "scope_ids": [],
  "severity": "critical",
  "auto_resolve_after_seconds": 1800,
  "notification_channels": {
    "email": {"to": ["ops@example.com"]},
    "slack": {"channel": "#alerts"}
  }
}

Alert on SLA breach for two specific sites

{
  "name": "SLA breach - Production sites",
  "rule_type": "pattern",
  "conditions": {
    "event_type": "sla.breach.created"
  },
  "scope": "site",
  "scope_ids": ["site-uuid-a", "site-uuid-b"],
  "severity": "critical",
  "notification_channels": {
    "email": {"to": ["ops@example.com"]},
    "teams": {"webhook_url": "https://teams.microsoft.com/l/..."},
    "in_app": {"user_ids": ["user-uuid-1", "user-uuid-2"]}
  }
}

Suppress noisy alerts during a maintenance window

POST /api/v1/alert-rules/alerts/{alert_id}/suppress
Content-Type: application/json

{
  "suppress_minutes": 120,
  "reason": "Scheduled maintenance window 02:00-04:00 UTC"
}

Permission reference

Action	Required permission
Read rules and alerts	`alert:read`
Create a rule	`alert:create`
Update a rule or alert lifecycle action	`alert:update`
Delete a rule	`alert:delete`
Manage notification providers	ORG_ADMIN or SUPER_ADMIN
Send notifications programmatically	ORG_ADMIN or SUPER_ADMIN

Role assignment follows the 7-tier ladder (super_admin → guest). You cannot assign a role at or above your own level. See Enterprise overview for the full role table.

Troubleshooting

Alerts are not firing

Check that the worker container is running and the default queue is being consumed.
Verify the rule’s status is active - rules with status: disabled or status: draft are skipped by the evaluator.
Call POST /api/v1/alert-rules/evaluate to force an immediate evaluation and watch the response for any error details.
Check Flower (if the monitoring profile is active) at port 5555 to confirm the evaluate_all_alert_rules task is completing without errors.

Notifications are not being delivered

Open the provider record and click Verify to confirm connectivity.
Send a Test message and review the response body for the provider error.
Check whether the user has muted the relevant category in their preferences.
For email: confirm SMTP port, TLS mode, and that the from_email is accepted by the relay.
For webhooks: confirm the endpoint is reachable from the FreeSDN API container (not just your browser).

POST /api/v1/alert-rules/evaluate returns 403

The endpoint requires alert:update, not alert:create. Verify your API key or JWT role includes that scope.

Suppress is not lifting automatically

The alert-rules-unsuppress-expired task runs every 5 minutes. If it is overdue, check the Celery worker logs for failures or queue backlog.

Next steps

SLA monitoring - define policies, review breaches, and generate compliance reports
Event correlation - group related alerts into managed incidents with assignment
Audit log - query the tamper-evident log for alert and notification history
Enterprise overview - feature matrix, caveats, and role ladder