AI Governance

FreeSDN’s AI module is BETA. It integrates LLMs into the Automation engine and backs the agentic assistant with 11 tools. The design priority is defense in depth: the default posture is completely inert, each layer can be independently restricted, and field-level selectors ensure secrets stay out of every prompt - cloud or local.

The three-layer model

Every LLM call travels through three sequential gates. All three must pass. A failure at any gate blocks the call. In automation rules the step returns a generic error result, and the specific reason is available only in the server logs.

Layer 1  -  Global kill-switch
  env var LLM_GLOBALLY_ENABLED, set by super_admin
  default: false (completely off)
        │
        ▼ (only if true)
Layer 2  -  Per-org policy
  configured by org_admin in Settings → AI
  default: disabled
  options: disabled | local_only | cloud_approved
        │
        ▼ (only if local_only or cloud_approved)
Layer 3  -  Field selector
  declared per automation rule by the rule author
  selects exactly which event fields enter the prompt
  secrets are stripped by a hardcoded block-list before dispatch

No layer can be bypassed from a lower-privilege tier. An org_admin cannot override the global kill-switch. A rule author cannot override the org policy or the field block-list.
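
The ordering of the first two gates can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FreeSDN's implementation: `GateDecision`, `globally_enabled`, `check_gates`, and the lowercase provider identifiers are hypothetical names, and treating only the string `true` as "on" is an assumption. Only the env var name, the policy values, and the provider list come from this page. Layer 3 filters fields rather than passing or failing, so it is left out here.

```python
import os
from dataclasses import dataclass

# Provider names as listed on this page; the lowercase identifiers are assumed.
LOCAL_PROVIDERS = {"ollama"}
CLOUD_PROVIDERS = {"openai", "anthropic"}


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    reason: str  # hypothetical label, e.g. for a server-side log line


def globally_enabled(env=os.environ) -> bool:
    # Unset or empty means off. Accepting only "true" (any case) is an assumption.
    return env.get("LLM_GLOBALLY_ENABLED", "").strip().lower() == "true"


def check_gates(env, org_policy: str, provider: str) -> GateDecision:
    # Layer 1: the global kill-switch overrides every per-org policy.
    if not globally_enabled(env):
        return GateDecision(False, "global switch off")
    # Layer 2: the per-org policy decides which providers are reachable.
    if org_policy == "local_only" and provider in LOCAL_PROVIDERS:
        return GateDecision(True, "local provider allowed")
    if org_policy == "cloud_approved" and provider in LOCAL_PROVIDERS | CLOUD_PROVIDERS:
        return GateDecision(True, "approved provider allowed")
    return GateDecision(False, f"policy {org_policy!r} does not allow {provider!r}")
```

For example, `check_gates({"LLM_GLOBALLY_ENABLED": "true"}, "local_only", "openai")` is refused at Layer 2, while the same call with an empty environment is refused at Layer 1 regardless of policy.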

Layer 1 - Global kill-switch

# Set in the API container environment.
# Not present or empty = false.
LLM_GLOBALLY_ENABLED=true

When LLM_GLOBALLY_ENABLED=false (the default):

Only POST /api/v1/ai/chat returns 503 Service Unavailable. The governance, provider-config, and audit-log endpoints (GET /api/v1/ai/governance/usage, GET /api/v1/ai/providers, GET /api/v1/ai/governance/logs, PUT /api/v1/ai/providers/{provider_id}) perform no kill-switch check and respond normally, so an org_admin can still inspect and configure the module through the API while the global switch is off.
Automation rules that contain LLM action types (llm.classify, llm.extract, llm.summarize) do not reach a provider. The governance layer raises LLMGloballyDisabledError, which is caught inside the handler, and the step is recorded as ActionResult(success=True) with output={"error": "LLM classification/extraction/summarization failed. Check server logs for details."}. Note that success is True even though the call failed. The step is not retried. There is no separate skipped status and no llm_globally_disabled reason code in the execution record.
The AI settings section is hidden from the UI entirely.

This env var is the super_admin’s circuit breaker. It overrides every per-org policy.

Layer 2 - Per-org policy

When Layer 1 is enabled, each organization independently chooses one of three modes. An org_admin sets the policy in Settings → AI → Policy.

Policy	What it allows	Data leaves the deployment?
`disabled` (default)	No LLM calls for this org	No
`local_only`	Only the Ollama endpoint configured per-org by the `org_admin` in Settings → AI → Providers	No
`cloud_approved`	Ollama + approved cloud providers (OpenAI, Anthropic)	Yes - with explicit opt-in

There is no “any cloud provider” or “bring any endpoint” option. Adding a provider is a code change and a security review, not a configuration field.

API key storage

When cloud_approved is selected, an org_admin configures per-org API keys for OpenAI and Anthropic. Keys are stored Fernet-encrypted in the database. They are decrypted in-process at call time by the serving worker and are never readable across org boundaries.

Layer 3 - Field selector and secret strip

When an automation rule uses an LLM action, the rule author selects exactly which fields from the trigger event are forwarded to the model. Wildcards are not allowed. Fields are chosen individually.

The server enforces two additional controls before building the prompt:

Path validation - input_fields accepts ≤ 20 dot-separated paths, max depth 5 (trigger_data.device.name is valid; a 6-level path is rejected). The engine only reads from the trigger_data subtree - not from the full execution context that includes organization_id, rule_id, and the actor identity.
Hardcoded secret block-list - even if a rule author explicitly lists a field, the engine checks each resolved value’s key against a block-list that includes: organization_id, rule_id, password, secret, token, api_key, credential, and similar. Matched keys are dropped silently. This is a defense-in-depth guard against a rule that accidentally (or deliberately) targets a field that carries a credential.

// Example LLM classify action config (automation rule body)
{
  "action_type": "llm.classify",
  "params": {
    "input_fields": [
      "trigger_data.message",
      "trigger_data.severity",
      "trigger_data.device_name"
    ],
    "labels": ["network", "security", "hardware", "informational"]
  }
}

The field list in the execution record is the exact audit trail - you know precisely which field names were sent (not their values; values are not stored).
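
The path-validation rules above can be sketched as follows. This is a minimal illustration, not the engine's code: `validate_input_fields` and the constant names are hypothetical, and it assumes that depth counts every dot-separated segment including the leading `trigger_data`.

```python
MAX_FIELDS = 20
MAX_DEPTH = 5          # assumed to count every segment, including the root
ROOT = "trigger_data"  # the only subtree the engine reads from


def validate_input_fields(paths: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the paths are acceptable."""
    problems = []
    if len(paths) > MAX_FIELDS:
        problems.append(f"too many fields: {len(paths)} > {MAX_FIELDS}")
    for path in paths:
        segments = path.split(".")
        if "*" in path:
            problems.append(f"{path}: wildcards are not allowed")
        elif any(not seg for seg in segments):
            problems.append(f"{path}: empty path segment")
        elif segments[0] != ROOT:
            problems.append(f"{path}: must start with {ROOT}")
        elif len(segments) > MAX_DEPTH:
            problems.append(f"{path}: depth {len(segments)} > {MAX_DEPTH}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a rule editor show every invalid path at once.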

Value truncation

Resolved field values are truncated to 1,000 characters before insertion into the prompt. This limits both prompt-injection risk and token blowout on unexpectedly large fields.
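
Resolution, the secret strip, and truncation can be pictured together. The sketch below is illustrative only: `resolve` and `build_prompt_fields` are hypothetical names, the block-list holds just the entries named on this page, and both the substring match on the final key and the silent skipping of missing paths are assumptions.

```python
BLOCKED_SUBSTRINGS = ("organization_id", "rule_id", "password", "secret",
                      "token", "api_key", "credential")
MAX_VALUE_CHARS = 1000
_MISSING = object()


def resolve(context: dict, path: str):
    """Walk a dot-separated path; return _MISSING if any segment is absent."""
    node = context
    for segment in path.split("."):
        if not isinstance(node, dict) or segment not in node:
            return _MISSING
        node = node[segment]
    return node


def build_prompt_fields(context: dict, input_fields: list[str]) -> dict[str, str]:
    fields = {}
    for path in input_fields:
        key = path.rsplit(".", 1)[-1].lower()
        # Drop silently when the final key looks like a credential (assumed substring match).
        if any(blocked in key for blocked in BLOCKED_SUBSTRINGS):
            continue
        value = resolve(context, path)
        if value is _MISSING:
            continue  # assumed behaviour for paths that do not resolve
        # Truncate after string conversion so large fields cannot inflate the prompt.
        fields[path] = str(value)[:MAX_VALUE_CHARS]
    return fields
```

With this sketch, a rule that lists `trigger_data.snmp_token` next to `trigger_data.message` would send only the message, cut to 1,000 characters.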

Supported providers

All provider calls use httpx directly. No vendor SDK is a dependency.

Provider	Policy required	Configuration	Notes
Ollama (self-hosted)	`local_only` or `cloud_approved`	Base URL configured per-org by `org_admin` in Settings → AI → Providers → Ollama → Base URL (stored in DB; defaults to `http://localhost:11434`)	Runs on your network. No data egress.
OpenAI	`cloud_approved`	Per-org API key (org_admin)	Data leaves the deployment.
Anthropic	`cloud_approved`	Per-org API key (org_admin)	Data leaves the deployment.

Supported LLM operations

Only three structured operations are supported in the automation engine. Free-form generation belongs to the AI Assistant interface, not automation rules. The engine accepts llm.classify, llm.extract, and llm.summarize as action_type values.

Operation	Purpose	`input_fields` cap	`max_tokens`	Output
`llm.classify`	Assign one label from a fixed declared set	≤ 20 fields	500 (fixed)	One string from the declared label list
`llm.extract`	Pull structured data from unstructured text	≤ 20 fields	500 (fixed)	JSON matching a declared schema
`llm.summarize`	Summarize a block of text	≤ 20 fields	500 (fixed)	Plain text, 50-500 words (hard cap)

`llm.classify`

Classifies the selected input fields into one label from a fixed set.

labels: 2-20 strings, declared at rule creation time. Labels cannot be set or changed at runtime.
If the model returns a value outside the declared label list, the action fails with classification_error. It is not retried.
The action result (keyed as classification) is stored in the execution record. Downstream steps in the same rule receive the same shared context dict but do not receive prior-step outputs via templating - all steps read from the original trigger context. For inter-step data chaining, use the Fabric Connections system, which does support {{steps.N.output.result}} templating.
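
The label check described above amounts to a membership test against the declared list. A minimal sketch, assuming hypothetical names (`ClassificationError`, `validate_labels`, `parse_classification`) and assuming surrounding whitespace in the model output is trimmed before comparison:

```python
class ClassificationError(ValueError):
    """Hypothetical exception standing in for the classification_error failure."""


def validate_labels(labels: list[str]) -> None:
    # Checked when the rule is created; labels are fixed after that.
    if not 2 <= len(labels) <= 20:
        raise ValueError("labels must contain 2-20 entries")


def parse_classification(model_output: str, labels: list[str]) -> str:
    candidate = model_output.strip()  # whitespace trimming is an assumption
    if candidate not in labels:
        # No retry: the caller turns this into a failed step.
        raise ClassificationError(f"{candidate!r} is not a declared label")
    return candidate
```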

`llm.extract`

Extracts structured data from unstructured text into a declared JSON Schema.

output_schema: ≤ 10 top-level properties.
If the model output fails JSON parse or schema validation, the action fails. No retry.
Use this for pulling device names, IPs, error codes, or severity levels out of syslog or trap messages.
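
The two failure modes for extraction, invalid JSON and schema mismatch, can be sketched with the standard library. This is deliberately simplified: it checks only the top-level property cap and `required` keys, whereas real schema validation would use a full JSON Schema validator. The function names are hypothetical.

```python
import json

MAX_TOP_LEVEL_PROPERTIES = 10


def validate_output_schema(schema: dict) -> None:
    properties = schema.get("properties", {})
    if len(properties) > MAX_TOP_LEVEL_PROPERTIES:
        raise ValueError(f"output_schema has {len(properties)} top-level properties, max is 10")


def parse_extraction(model_output: str, schema: dict) -> dict:
    """Parse model output and apply a minimal required-key check."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data
```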

`llm.summarize`

Produces a plain-text summary of the selected fields.

max_words: 50-500. The server clamps any value outside that range. 500 is the hard cap regardless of configuration.
Use this for alert digests, change-window summaries, or event rollups before sending a notification.
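
Clamping means out-of-range values are adjusted rather than rejected. A one-function sketch, with `clamp_max_words` as a hypothetical name:

```python
MIN_WORDS = 50
MAX_WORDS = 500


def clamp_max_words(requested: int) -> int:
    # Values below 50 are raised to 50; values above 500 are lowered to 500.
    return max(MIN_WORDS, min(MAX_WORDS, requested))
```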

Token budgets

Each organization has a monthly token budget enforced by the governance layer before any provider call is made.

Setting	Default	Configured by
Monthly token limit	100,000 tokens	`org_admin` in Settings → AI → Token Budget
Warning threshold	80%	Platform (not configurable)
Enforcement action	Generic error result (`success=True`, `output.error` set)	Platform

When 80% of the budget is consumed, an ai.budget.warning event is emitted on the event bus. You can wire a Fabric Connection or an automation rule to this event to route it to email, Slack, or a webhook.

When the budget is exhausted, the governance layer raises LLMBudgetExceededError, which is caught by the handler’s top-level except Exception block. The action returns ActionResult(success=True) with output={"error": "LLM classification/extraction/summarization failed. Check server logs for details."} - the same generic error path taken for a provider timeout or any other failure. There is no separate skipped status and no budget_exceeded reason code in the execution record; inspect server logs for LLMBudgetExceededError to distinguish a budget refusal from a provider failure. No charges accumulate after the limit. Budget tracking uses an atomic Redis increment with a 32-day rolling TTL, backed by a DB sync for durability.
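
The check-then-record flow can be illustrated with an in-memory stand-in. This sketch replaces the Redis counter and event bus with plain attributes, and it assumes that "exhausted" means usage at or above the limit and that the warning fires once, on the call that crosses 80%. `BudgetTracker` and its methods are hypothetical; only `LLMBudgetExceededError` and `ai.budget.warning` are names from this page.

```python
class LLMBudgetExceededError(Exception):
    """Name taken from this page; the real class lives in the governance layer."""


class BudgetTracker:
    WARNING_RATIO = 0.8  # platform-fixed threshold

    def __init__(self, monthly_limit: int = 100_000):
        self.monthly_limit = monthly_limit
        self.used = 0
        self.events: list[str] = []  # stand-in for the event bus

    def check(self) -> None:
        """Call before contacting a provider."""
        if self.used >= self.monthly_limit:
            raise LLMBudgetExceededError(f"{self.used}/{self.monthly_limit} tokens used")

    def record(self, tokens: int) -> None:
        """Call after a provider response with prompt + completion tokens."""
        threshold = self.monthly_limit * self.WARNING_RATIO
        crossed = self.used < threshold <= self.used + tokens
        self.used += tokens
        if crossed:
            self.events.append("ai.budget.warning")
```

Because `check` runs before the provider call, the call that pushes usage past the limit still completes in this sketch; the next one is refused.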

Audit log

Every LLM call - successful or not - is written to ai.llm_call_logs. Fields logged per call:

Field	Stored?	Notes
Timestamp	Yes
Rule ID	Yes	Links the LLM call to the automation rule (`rule_id` in response)
Execution ID	DB only	Stored in `ai.llm_call_logs` but not returned by the API; not available for client-side correlation
Provider and model	Yes
Operation type	Yes	classify / extract / summarize
Field names sent	Yes	Not the values - privacy-preserving
Token counts	Yes	Prompt tokens + completion tokens
Success / failure	Yes
Error message	Yes	Provider error or validation failure
Latency	Yes	Milliseconds
Input text	No	Not stored
Output text	No	Not stored

The audit log is visible to org_admin in Settings → AI → Audit Log and exportable as CSV. If you need to reconstruct what was sent, filter the audit log by rule_id and match on created_at. The execution_id is stored in the database (ai.llm_call_logs.execution_id) but is not currently exposed by GET /api/v1/ai/governance/logs - correlation to AutomationExecutionRecord.trigger_data via execution_id requires direct DB access.
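
Without execution_id in the API response, correlation comes down to a rule_id filter plus a timestamp window. The sketch below assumes each log entry is a dict with `rule_id` and an ISO 8601 `created_at` string; that entry shape, the `match_calls` name, and the 5-second default window are all assumptions.

```python
from datetime import datetime, timedelta


def match_calls(log_entries, rule_id, executed_at, window_seconds=5):
    """Return audit entries for rule_id created within the window around executed_at."""
    window = timedelta(seconds=window_seconds)
    return [
        entry for entry in log_entries
        if entry["rule_id"] == rule_id
        and abs(datetime.fromisoformat(entry["created_at"]) - executed_at) <= window
    ]
```

A narrow window keeps false matches down when the same rule fires several times in quick succession, but it cannot fully replace an execution_id join.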

Async execution

LLM calls do not block the event loop, but they do run inline: each call is awaited as an async coroutine within the automation rule’s action-handler chain, and there is no separate Celery task. The call completes (or times out) before the next action step runs.

The timeout that applies is whatever the underlying httpx client uses for the provider connection; there are no separate configurable soft or hard time limits for LLM actions. Classification or schema validation failures are not retried - a failure returns an error result for that step immediately.

The action result is stored in ActionResult.output keyed as classification (for llm.classify), extracted (for llm.extract), or summary (for llm.summarize). Downstream steps in the same automation rule cannot template against prior-step LLM outputs - the automation engine does not support inter-step templating. Use Fabric Connections if you need {{steps.N.output.result}} chaining.
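
The inline-await pattern and the generic error path can be sketched with asyncio. This is an illustration, not the handler's source: `handle_llm_classify` and `call_provider` are hypothetical, the `ActionResult` dataclass models only the two attributes mentioned on this page, and the per-operation wording of the error string is assumed.

```python
import asyncio
from dataclasses import dataclass, field

# Assumed per-operation variant of the generic message quoted on this page.
GENERIC_ERROR = "LLM classification failed. Check server logs for details."


@dataclass
class ActionResult:
    success: bool
    output: dict = field(default_factory=dict)


async def handle_llm_classify(call_provider, prompt: str) -> ActionResult:
    try:
        # Awaited inline: the next step does not start until this returns.
        label = await call_provider(prompt)
    except Exception:
        # Governance refusals, timeouts, and validation failures all land here.
        return ActionResult(success=True, output={"error": GENERIC_ERROR})
    return ActionResult(success=True, output={"classification": label})
```

Since `success` is True on both paths, a consumer of the execution record has to test for the `error` key in `output` to detect a failed LLM step.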

AI Assistant and native tools

The AI Assistant (the chat interface, not automation rules) has access to 11 native tools covering device inventory, alerts, network topology, and observability data. These tools run against the authenticated user’s permissions - the assistant cannot act on behalf of another user or org.

The same 3-layer governance applies to the assistant’s provider calls. If LLM_GLOBALLY_ENABLED=false, the chat interface returns 503. If the org policy is disabled, the assistant is unavailable for that org even if the global switch is on.

The agentic loop runs at most 5 iterations per user message.
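
A bounded loop of this kind can be sketched as below. The callables `ask_model` and `run_tool`, the tuple protocol between them, and the fallback message returned when the limit is reached are all assumptions made for illustration.

```python
MAX_ITERATIONS = 5


def run_agent(ask_model, run_tool, user_message: str) -> str:
    """ask_model returns ("final", text) or ("tool", tool_name); both callables are hypothetical."""
    transcript = [user_message]
    for _ in range(MAX_ITERATIONS):
        kind, payload = ask_model(transcript)
        if kind == "final":
            return payload
        # Feed the tool result back so the next iteration can use it.
        transcript.append(run_tool(payload))
    return "Stopped after reaching the iteration limit."
```

The cap bounds both token spend and the number of tool invocations a single user message can trigger.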

What the assistant cannot do

fabric.webhook is explicitly excluded from the AI tool bridge. The assistant cannot POST data to an external URL. This is a hard code exclusion, not a permission check - the op is simply not registered as an AI tool. Plugin authors can register up to 20 AI tools per plugin, but those tools are bounded by the same 3-layer governance.

Enabling LLM for the first time

Follow this sequence. Skipping steps causes confusing 503s.

Super admin: set the global flag.
```
# In the API container environment (docker-compose.yml or .env file)
LLM_GLOBALLY_ENABLED=true
```
Restart the API container after changing the env var.
Org admin: configure the Ollama base URL if using local_only (optional but recommended to start).

Navigate to Settings → AI → Providers → Ollama → Base URL and set it to your Ollama host (e.g. http://ollama:11434 for a Compose-internal host, or your LAN IP). The base URL defaults to http://localhost:11434 and is stored in the database - there is no env var for this.
Org admin: set the per-org policy in Settings → AI → Policy. Choose local_only for zero data egress or cloud_approved if you are using OpenAI or Anthropic.
Org admin: configure cloud API keys (only for cloud_approved). Navigate to Settings → AI → Providers and enter the key. The key is Fernet-encrypted on save.
Rule author: build an automation rule with an LLM action. In the action editor, select the field picker and choose only the fields the model genuinely needs. Review the block-list note above - secret-carrying fields are dropped regardless.
Test the rule using Run Now in the automation table. Check Settings → AI → Audit Log to confirm a call record was written and tokens were consumed.

What is explicitly out of scope

The following are not in this module and will not be added without a separate security model review:

Free-form chat, ask, or generate operations in automation rules
LLM writing or modifying automation rules autonomously
LLM calling device APIs directly or submitting staged changes
Streaming LLM responses inside automation steps
Fine-tuning or training on infrastructure data
Image or multimodal inputs
Providers beyond Ollama, OpenAI, and Anthropic - adding a provider is a code change, not a config option

API endpoints (AI module)

The AI module mounts under /api/v1/ai/. All endpoints require authentication. Cloud-provider endpoints additionally require cloud_approved org policy.

METHOD	Path	Purpose
GET	`/api/v1/ai/governance/usage`	Org LLM policy, monthly token budget, and percentage used (requires `ai.admin` or `org_admin`)
GET	`/api/v1/ai/providers`	Available providers for this org
POST	`/api/v1/ai/chat`	AI Assistant (agentic chat, ≤ 5 iterations)
GET	`/api/v1/ai/governance/logs`	LLM call audit log (paginated; page≥1, size 1-100; requires `ai.admin` or `org_admin`)
PUT	`/api/v1/ai/providers/{provider_id}`	Configure a provider and/or update org policy / token budget (`ai.admin` or `org_admin`)

Next steps

Connections - build automation rules that use llm.classify, llm.extract, or llm.summarize as action steps.
Fabric - wire the ai.budget.warning event to a notification or external webhook.
Plugin System - register additional AI tools from a plugin (up to 20 per plugin, subject to the same 3-layer governance).
Configuration Reference - full env var list including LLM_GLOBALLY_ENABLED.
Roles and Permissions - super_admin and org_admin role responsibilities in full.