Alerting & Notifications
The alerting subsystem has two collaborating parts:
- Alert rules engine - evaluates rules against the internal event bus every 3 minutes and creates Alert records when conditions are met.
- Notification providers - deliver those alerts (and any other dispatch call) to one or more external channels concurrently.
Together they replace ad-hoc email scripts and give you a single place to define what to watch, who to tell, and how.
Alert rules
What a rule is
An AlertRule subscribes to a named event type (or pattern) on the internal event bus. When the
evaluator sees a matching event, it creates an Alert record and fans out notifications through
every configured provider.
Each rule carries:
| Field | Description |
|---|---|
| name | Human-readable label shown in the UI |
| description | Optional context for operators |
| rule_type | threshold, pattern, anomaly, or custom - selects the evaluator |
| scope | Targeting scope - one of organization, site, device_group, device |
| scope_ids | JSONB list of UUIDs for the chosen scope (omit for organization) |
| severity | info, warning, or critical - maps to notification priority |
| status | active, disabled, or draft - disabled/draft rules are skipped by the evaluator |
| conditions | JSONB dict whose keys depend on rule_type (see below) |
| auto_resolve_after_seconds | If set, the evaluator resolves the alert automatically after this many seconds (integer, minimum 60) |
| notification_channels | JSONB dict mapping channel name to channel-specific config (e.g. {"email": {"to": ["ops@example.com"]}, "slack": {"channel": "#alerts"}}). Valid channel keys: email, slack, teams, webhook, in_app, sms, whatsapp. |
conditions structure by rule_type:
| rule_type | Required keys in conditions | Example |
|---|---|---|
| threshold | metric, operator (>, <, >=, <=, ==, !=), value | {"metric": "cpu_utilization", "operator": ">", "value": 90} |
| pattern | event_type (glob, e.g. device.offline or device.*), min_count | {"event_type": "device.offline", "min_count": 3} |
| anomaly | metric, std_dev_threshold | {"metric": "traffic_in", "std_dev_threshold": 3.0} |
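A client can pre-check a conditions dict against the table above before submitting a rule. The sketch below is illustrative only: the function names are hypothetical, and whether the backend rejects a missing key or applies a default is not stated here.

```python
# Required keys per rule_type, copied from the table above.
# `custom` has no documented schema, so nothing is required for it here.
REQUIRED_KEYS = {
    "threshold": {"metric", "operator", "value"},
    "pattern": {"event_type", "min_count"},
    "anomaly": {"metric", "std_dev_threshold"},
}
VALID_OPERATORS = {">", "<", ">=", "<=", "==", "!="}


def check_conditions(rule_type: str, conditions: dict) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid."""
    missing = REQUIRED_KEYS.get(rule_type, set()) - set(conditions)
    problems = [f"missing key: {key}" for key in sorted(missing)]
    # Only threshold rules carry an operator, so only they are checked for it.
    if rule_type == "threshold" and "operator" in conditions:
        if conditions["operator"] not in VALID_OPERATORS:
            problems.append(f"unknown operator: {conditions['operator']}")
    return problems


print(check_conditions("pattern", {"event_type": "device.offline"}))
```

Running this on a pattern rule without min_count reports the missing key, which makes it easy to surface the problem in a form before the API call is made.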
Scope and scope_ids
scope narrows which resources a rule fires for. The table below shows valid combinations:
| scope | scope_ids content | Example use |
|---|---|---|
| organization | omit (must be empty) | Fire for anything in your org |
| site | list of site UUIDs | Fire only for events on specific sites |
| device_group | list of device-group UUIDs | Fire for a named group of devices |
| device | list of device UUIDs | Fire for exact devices |
Before a rule is saved, the backend calls _verify_scope_ids, which checks that every UUID in
scope_ids belongs to the caller’s organization. A foreign or non-existent UUID returns 404
(not 403), so the response does not reveal whether the resource exists in another organization.
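The shape rules in the scope table can be checked client-side before the request is sent. This is a minimal sketch with a hypothetical helper name. It assumes that a non-organization scope needs at least one ID, which the table implies but does not state. It covers only the shape of the input: the ownership check still happens server-side in _verify_scope_ids.

```python
import uuid

SCOPES = {"organization", "site", "device_group", "device"}


def check_scope(scope: str, scope_ids=None) -> list[str]:
    """Return shape problems for a scope/scope_ids pair (empty list = OK)."""
    ids = list(scope_ids or [])
    if scope not in SCOPES:
        return [f"unknown scope: {scope}"]
    if scope == "organization":
        # Organization scope targets everything, so no IDs are allowed.
        return ["scope_ids must be empty for organization scope"] if ids else []
    problems = []
    if not ids:
        # Assumption: an empty list is not meaningful for a narrowed scope.
        problems.append(f"scope_ids must list at least one {scope} UUID")
    for value in ids:
        try:
            uuid.UUID(str(value))
        except ValueError:
            problems.append(f"not a UUID: {value}")
    return problems


print(check_scope("site", ["site-uuid-a"]))
```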
Severities
| Severity | Typical use |
|---|---|
| info | Low-priority informational events, resolved events |
| warning | Default; degraded performance, capacity thresholds |
| critical | Service-affecting outages, security events |
Alert lifecycle
An alert moves through these states:

    firing
      ├─► acknowledged (operator marks seen)
      ├─► resolved (operator or auto-resolve clears it)
      └─► suppressed (snooze for N minutes)

- Suppress takes a suppress_minutes value and an optional reason. When the suppression expires, the alert-rules-unsuppress-expired Celery task lifts it automatically (runs every 5 minutes).
- Auto-resolve runs via alert-rules-auto-resolve every 10 minutes. Alerts from rules with auto_resolve_after_seconds set are closed without manual intervention.
- Acknowledge accepts an optional free-text note stored on the record.
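The two timed transitions reduce to simple deadline checks. The sketch below shows that arithmetic under stated assumptions: the function and parameter names are hypothetical, and it assumes the auto-resolve clock starts when the alert fires, which this page does not confirm.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def suppression_expired(suppressed_at: datetime, suppress_minutes: int, now: datetime) -> bool:
    """True once the suppression window has fully elapsed."""
    return now >= suppressed_at + timedelta(minutes=suppress_minutes)


def auto_resolve_due(fired_at: datetime, auto_resolve_after_seconds: Optional[int], now: datetime) -> bool:
    """True once the rule's auto-resolve delay has elapsed; never if unset."""
    if auto_resolve_after_seconds is None:
        return False
    return now >= fired_at + timedelta(seconds=auto_resolve_after_seconds)


start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
print(suppression_expired(start, 120, start + timedelta(minutes=119)))  # still suppressed
print(auto_resolve_due(start, 1800, start + timedelta(seconds=1800)))   # due
```

Because both checks run from periodic tasks, the actual state change can trail the deadline by up to one task interval (5 minutes for unsuppress, 10 minutes for auto-resolve).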
Background evaluation schedule
| Celery task | Interval | What it does |
|---|---|---|
| alert-rules-evaluate-all | Every 3 min | Runs the full rule set for your org |
| alert-rules-auto-resolve | Every 10 min | Resolves timed-out alerts |
| alert-rules-unsuppress-expired | Every 5 min | Lifts expired suppressions |
These tasks run on the default Celery queue. If the worker container is down, no evaluation
happens until it recovers - there is no fallback evaluator.
You can also trigger evaluation manually: POST /api/v1/alert-rules/evaluate.
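For orientation, the schedule above could be expressed as a Celery beat configuration along these lines. This is a sketch, not FreeSDN's actual configuration: only the entry names and intervals come from the table, and the dotted task paths are hypothetical placeholders.

```python
# Sketch of a Celery beat schedule mirroring the table above.
# Entry names and intervals come from the docs; every "task" path is an
# assumed placeholder, not FreeSDN's real module layout.
BEAT_SCHEDULE = {
    "alert-rules-evaluate-all": {
        "task": "app.tasks.evaluate_all_alert_rules",  # assumed path
        "schedule": 3 * 60,  # seconds
    },
    "alert-rules-auto-resolve": {
        "task": "app.tasks.auto_resolve_alerts",  # assumed path
        "schedule": 10 * 60,
    },
    "alert-rules-unsuppress-expired": {
        "task": "app.tasks.unsuppress_expired_alerts",  # assumed path
        "schedule": 5 * 60,
    },
}

# With a Celery application object this would be applied as:
#   app.conf.beat_schedule = BEAT_SCHEDULE
```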
Alert rules API
All endpoints are under the prefix /api/v1/alert-rules. Fine-grained permission scopes are
used throughout.
Rules CRUD
| Method | Path | Purpose | Permission |
|---|---|---|---|
| GET | /api/v1/alert-rules/rules | List rules (status?, type?, site_id?) | alert:read |
| POST | /api/v1/alert-rules/rules | Create rule (verifies scope_ids) | alert:create |
| GET | /api/v1/alert-rules/rules/{rule_id} | Get one rule | alert:read |
| PATCH | /api/v1/alert-rules/rules/{rule_id} | Update rule (re-verifies scope_ids) | alert:update |
| DELETE | /api/v1/alert-rules/rules/{rule_id} | Soft-delete rule | alert:delete |
| GET | /api/v1/alert-rules/stats | Rule + alert statistics (site_id?) | alert:read |
Alert lifecycle actions
| Method | Path | Purpose | Permission |
|---|---|---|---|
| GET | /api/v1/alert-rules/alerts | List alerts (status?, severity?, rule_id?, site_id?, limit≤200) | alert:read |
| GET | /api/v1/alert-rules/alerts/{alert_id} | Get one alert | alert:read |
| POST | /api/v1/alert-rules/alerts/{alert_id}/acknowledge | Acknowledge with optional note | alert:update |
| POST | /api/v1/alert-rules/alerts/{alert_id}/resolve | Resolve alert | alert:update |
| POST | /api/v1/alert-rules/alerts/{alert_id}/suppress | Suppress for N minutes | alert:update |
| POST | /api/v1/alert-rules/evaluate | Manually evaluate all rules now | alert:update |
Creating your first alert rule
- Open Alert Rules in the sidebar (route /alert-rules).
- Click New rule.
- Set name, event_type, and severity.
- Choose a scope. For a site-scoped rule, select one or more sites from the picker - their UUIDs populate scope_ids.
- Optionally set auto_resolve_after_seconds if the alert should self-clear.
- Under Channels, tick the delivery channels you want (email, Slack, etc.). Each must have a configured provider - see Notification providers below.
- Save. The rule is active immediately; the next evaluator cycle (within 3 minutes) will pick it up.

To test without waiting, call:

    POST /api/v1/alert-rules/evaluate
    Authorization: Bearer <token>

Then check /api/v1/alert-rules/alerts for any newly fired alerts.
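The manual test above can be scripted with the standard library. The sketch below only builds the two requests and leaves the network call commented out. The host name is a placeholder, api_request is a hypothetical helper, and the status=firing filter value is an assumption based on the lifecycle state names.

```python
import json
import urllib.request

BASE_URL = "https://freesdn.example.com"  # placeholder host


def api_request(method: str, path: str, token: str, payload=None) -> urllib.request.Request:
    """Build an authenticated request; JSON-encode the payload when given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Force an evaluation, then list what fired (limit is capped at 200).
evaluate = api_request("POST", "/api/v1/alert-rules/evaluate", token="<token>")
list_alerts = api_request("GET", "/api/v1/alert-rules/alerts?status=firing&limit=50", token="<token>")

# To actually send (requires a reachable instance and a real token):
# with urllib.request.urlopen(evaluate) as resp:
#     print(resp.status, json.load(resp))
```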
Notification providers
A notification provider is a named, stored delivery configuration for one channel. You can have multiple providers for the same channel (e.g. two Slack workspaces, one per team).
Supported channels and provider types
| Channel | Provider type | Auth model |
|---|---|---|
| Email | smtp | SMTP server + credentials |
| Slack | slack_webhook | Incoming webhook URL |
| Microsoft Teams | teams_webhook | Incoming webhook URL |
| Webhook (generic) | generic_webhook | URL + optional HMAC secret + custom headers |
| In-app | built-in | No external config needed |
| SMS | twilio_sms | Twilio Account SID + Auth Token + from-number |
| WhatsApp | twilio_whatsapp | Twilio Account SID + Auth Token + from-number |
Fetch the full config schema for any provider type - including required fields and validation rules - from:

    GET /api/v1/notifications/providers/types

This returns {type, name, channel, icon, config_schema} for each supported type.
Security constraints on provider config
- Provider config blobs are capped at 256 KiB per record.
- Display names reject CR, LF, and other control characters (header-injection defense).
- API responses return a redacted config_summary, never raw credentials.
- Generic webhook HMAC secrets are stored encrypted; the HMAC is computed server-side on dispatch.
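The first two constraints can be mirrored in a client-side pre-check. This is a sketch with hypothetical names. It assumes the size cap applies to the UTF-8 JSON serialization of the config, and the server may measure it differently.

```python
import json

MAX_CONFIG_BYTES = 256 * 1024  # 256 KiB cap from the list above


def has_control_chars(text: str) -> bool:
    """True if text contains CR, LF, or any other C0/C1 control character."""
    return any(ord(ch) < 0x20 or 0x7F <= ord(ch) <= 0x9F for ch in text)


def check_provider(name: str, config: dict) -> list[str]:
    """Return problems with a provider's display name or config size."""
    problems = []
    if has_control_chars(name):
        problems.append("display name contains control characters")
    # Assumption: size is measured on the serialized JSON, encoded as UTF-8.
    size = len(json.dumps(config).encode("utf-8"))
    if size > MAX_CONFIG_BYTES:
        problems.append(f"config is {size} bytes, over the {MAX_CONFIG_BYTES}-byte cap")
    return problems


print(check_provider("Ops\r\nBcc: attacker@example.com", {}))
```

The example name shows why the check exists: an embedded CR/LF pair could otherwise smuggle an extra header into an outgoing email.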
Providers API
Provider management requires ORG_ADMIN or SUPER_ADMIN.
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/notifications/providers | List providers (channel?, enabled_only?) |
| GET | /api/v1/notifications/providers/types | Supported types with config schemas |
| POST | /api/v1/notifications/providers | Create provider |
| GET | /api/v1/notifications/providers/{provider_id} | Get provider (config summary only) |
| PUT | /api/v1/notifications/providers/{provider_id} | Update provider |
| DELETE | /api/v1/notifications/providers/{provider_id} | Delete provider |
| POST | /api/v1/notifications/providers/{provider_id}/verify | Test stored provider connectivity |
| POST | /api/v1/notifications/providers/{provider_id}/test | Send a test message (test_email query param required) |
Setting up an SMTP provider
- Navigate to Notification Providers (/notification-providers).
- Click Add provider and choose SMTP Email.
- Fill in host, port, username, password, TLS settings, and a from_email.
- Save, then click Verify to confirm connectivity (sends no email).
- Click Test and supply a test_email address to send a real test message.
Setting up Slack
- In your Slack workspace, create an Incoming Webhook app and copy the webhook URL.
- Add a provider with type slack_webhook and paste the URL.
- Verify connectivity, then optionally send a test.
Setting up a generic webhook

    {
      "type": "generic_webhook",
      "name": "PagerDuty ingest",
      "config": {
        "url": "https://events.pagerduty.com/v2/enqueue",
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "hmac_secret": "your-secret-here"
      }
    }

When hmac_secret is set, FreeSDN computes HMAC-SHA256(secret, body) and attaches it as
X-FreeSDN-Signature on every outbound request. The receiving end can verify it to confirm
the call originated from your FreeSDN instance.
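On the receiving side, verification means recomputing the HMAC over the raw request body and comparing it in constant time. The sketch below assumes the header carries a hex-encoded digest with no prefix. This page does not state the encoding, so check a real request before relying on it. The function names are illustrative.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (encoding is an assumption)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the X-FreeSDN-Signature value against a freshly computed digest."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"), header_value.strip().encode("utf-8"))


raw_body = b'{"title": "Device offline"}'
signature = sign("your-secret-here", raw_body)
print(verify_signature("your-secret-here", raw_body, signature))         # True
print(verify_signature("your-secret-here", raw_body + b" ", signature))  # False
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change alters the digest.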
Sending programmatically

    POST /api/v1/notifications/send
    Authorization: Bearer <token>
    Content-Type: application/json

    {
      "channel": "slack",
      "recipient": "#alerts",
      "title": "Device offline",
      "body": "Switch sw-01 at Site A stopped responding."
    }

Template-based send:

    POST /api/v1/notifications/send/template

Both endpoints require ORG_ADMIN or SUPER_ADMIN.
Dispatch fan-out
When an alert rule fires, the dispatch call fans out across all configured channels
concurrently via asyncio.gather. Each channel is independent: a failure on Slack does not
block email delivery.
Per-user mute preferences apply before dispatch. If a user has muted a category, the dispatch
logs SKIPPED for that user’s in-app channel rather than delivering.
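The independence between channels can be illustrated with asyncio.gather and return_exceptions=True, which collects each channel's exception as a result instead of raising it. This is a sketch of the pattern, not FreeSDN's dispatcher. The sender functions and report strings are hypothetical, and the real code may isolate failures differently.

```python
import asyncio


async def send_email(alert: dict) -> str:
    return f"email sent: {alert['title']}"


async def send_slack(alert: dict) -> str:
    raise ConnectionError("slack webhook unreachable")  # simulated failure


async def dispatch(alert: dict, channels: dict) -> dict:
    """Send to every channel concurrently and report a per-channel outcome."""
    results = await asyncio.gather(
        *(send(alert) for send in channels.values()),
        return_exceptions=True,  # a failing channel yields its exception as a result
    )
    return {
        name: f"FAILED: {result}" if isinstance(result, Exception) else "SENT"
        for name, result in zip(channels, results)
    }


report = asyncio.run(dispatch({"title": "Device offline"}, {"email": send_email, "slack": send_slack}))
print(report)  # {'email': 'SENT', 'slack': 'FAILED: slack webhook unreachable'}
```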
In-app notifications
In-app notifications are per-user and require no provider configuration. They appear under the bell icon in the top navigation.
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/notifications/in-app | List notifications (paginated envelope with unread_count) |
| POST | /api/v1/notifications/in-app/{notification_id}/read | Mark one read |
| POST | /api/v1/notifications/in-app/read-all | Mark all read |
| GET | /api/v1/notifications/in-app/unread-count | Badge count for the bell |
| POST | /api/v1/notifications/in-app/mark | Bulk mark read or dismiss |
The list response returns {items, total, limit, offset, unread_count}. Pass unread_only=true
for only unseen notifications, or include_dismissed=true to include archived ones.
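Given the {items, total, limit, offset, unread_count} envelope, a client can walk all pages by advancing offset until total is reached. The sketch below uses a stand-in fetch function instead of a real HTTP call. It assumes limit and offset are accepted as query parameters, which the envelope suggests but this page does not state.

```python
from typing import Callable, Iterator


def iter_notifications(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every item by walking offset-based pages until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page["items"]
        offset += len(page["items"])
        # Stop on an empty page as well, in case total changes mid-walk.
        if not page["items"] or offset >= page["total"]:
            break


# Stand-in for GET /api/v1/notifications/in-app with limit and offset.
_FAKE = [{"id": i} for i in range(5)]


def fake_fetch(limit: int, offset: int) -> dict:
    return {"items": _FAKE[offset:offset + limit], "total": len(_FAKE),
            "limit": limit, "offset": offset, "unread_count": 2}


ids = [n["id"] for n in iter_notifications(fake_fetch, limit=2)]
print(ids)  # [0, 1, 2, 3, 4]
```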
User notification preferences
Each user can control which channels they receive notifications on and set quiet hours.
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/notifications/preferences | Get current preferences (defaults: all channels enabled) |
| PUT | /api/v1/notifications/preferences | Update channels, quiet hours, category settings |
| PATCH | /api/v1/notifications/preferences/mute | Mute or snooze a category (expires_at set to null makes the mute permanent) |
| DELETE | /api/v1/notifications/preferences/mute/{category} | Unmute (returns 404 if not muted) |
Users access these settings from Settings → Notifications.
Alert rule examples
Notify on any device going offline

    {
      "name": "Device offline",
      "rule_type": "pattern",
      "conditions": {"event_type": "device.offline"},
      "scope": "organization",
      "scope_ids": [],
      "severity": "critical",
      "auto_resolve_after_seconds": 1800,
      "notification_channels": {
        "email": {"to": ["ops@example.com"]},
        "slack": {"channel": "#alerts"}
      }
    }

Alert on SLA breach for two specific sites

    {
      "name": "SLA breach - Production sites",
      "rule_type": "pattern",
      "conditions": { "event_type": "sla.breach.created" },
      "scope": "site",
      "scope_ids": ["site-uuid-a", "site-uuid-b"],
      "severity": "critical",
      "notification_channels": {
        "email": {"to": ["ops@example.com"]},
        "teams": {"webhook_url": "https://teams.microsoft.com/l/..."},
        "in_app": {"user_ids": ["user-uuid-1", "user-uuid-2"]}
      }
    }

Suppress noisy alerts during a maintenance window

    POST /api/v1/alert-rules/alerts/{alert_id}/suppress
    Content-Type: application/json

    {
      "suppress_minutes": 120,
      "reason": "Scheduled maintenance window 02:00-04:00 UTC"
    }

Permission reference
| Action | Required permission |
|---|---|
| Read rules and alerts | alert:read |
| Create a rule | alert:create |
| Update a rule or alert lifecycle action | alert:update |
| Delete a rule | alert:delete |
| Manage notification providers | ORG_ADMIN or SUPER_ADMIN |
| Send notifications programmatically | ORG_ADMIN or SUPER_ADMIN |
Role assignment follows the 7-tier ladder (super_admin → guest). You cannot assign a
role at or above your own level. See Enterprise overview for the full
role table.
Troubleshooting
Alerts are not firing
- Check that the worker container is running and the default queue is being consumed.
- Verify the rule’s status is active - rules with status: disabled or status: draft are skipped by the evaluator.
- Call POST /api/v1/alert-rules/evaluate to force an immediate evaluation and watch the response for any error details.
- Check Flower (if the monitoring profile is active) at port 5555 to confirm the evaluate_all_alert_rules task is completing without errors.
Notifications are not being delivered
- Open the provider record and click Verify to confirm connectivity.
- Send a Test message and review the response body for the provider error.
- Check whether the user has muted the relevant category in their preferences.
- For email: confirm SMTP port, TLS mode, and that the from_email is accepted by the relay.
- For webhooks: confirm the endpoint is reachable from the FreeSDN API container (not just your browser).
POST /api/v1/alert-rules/evaluate returns 403
The endpoint requires alert:update, not alert:create. Verify your API key or JWT role
includes that scope.
Suppress is not lifting automatically
The alert-rules-unsuppress-expired task runs every 5 minutes. If it is overdue, check the
Celery worker logs for failures or queue backlog.
Next steps
- SLA monitoring - define policies, review breaches, and generate compliance reports
- Event correlation - group related alerts into managed incidents with assignment
- Audit log - query the tamper-evident log for alert and notification history
- Enterprise overview - feature matrix, caveats, and role ladder