AI Governance
FreeSDN’s AI module is BETA. It integrates LLMs into the Automation engine and backs the agentic assistant with 11 tools. The design priority is defense in depth: the default posture is completely inert, each layer can be independently restricted, and field-level selectors ensure secrets stay out of every prompt - cloud or local.
The three-layer model
Section titled “The three-layer model”Every LLM call travels through three sequential gates. All three must pass. A failure at any gate is a silent skip, not an error, logged with the reason.
Layer 1 - Global kill-switch env var LLM_GLOBALLY_ENABLED, set by super_admin default: false (completely off) │ ▼ (only if true)Layer 2 - Per-org policy configured by org_admin in Settings → AI default: disabled options: disabled | local_only | cloud_approved │ ▼ (only if local_only or cloud_approved)Layer 3 - Field selector declared per automation rule by the rule author selects exactly which event fields enter the prompt secrets are stripped by a hardcoded block-list before dispatchNo layer can be bypassed from a lower-privilege tier. An org_admin cannot override the global kill-switch. A rule author cannot override the org policy or the field block-list.
Layer 1 - Global kill-switch
Section titled “Layer 1 - Global kill-switch”# Set in the API container environment.# Not present or empty = false.LLM_GLOBALLY_ENABLED=trueWhen LLM_GLOBALLY_ENABLED=false (the default):
- Only
POST /api/v1/ai/chatreturns503 Service Unavailable. The governance, provider-config, and audit-log endpoints (GET /ai/governance/usage,GET /ai/providers,GET /ai/governance/logs,PUT /ai/providers/{provider_id}) perform no kill-switch check and respond normally - so anorg_admincan inspect and configure the module even when the global switch is off. - Automation rules that contain LLM action types (
llm.classify,llm.extract,llm.summarize) produce a failed action step recorded asActionResult(success=True)withoutput={"error": "LLM classification/extraction/summarization failed. Check server logs for details."}. The governance layer raisesLLMGloballyDisabledError, which is caught inside the handler; the step is not retried. There is no separateskippedstatus and nollm_globally_disabledreason code in the execution record. - The AI settings section is hidden from the UI entirely.
This env var is the super_admin’s circuit breaker. It overrides every per-org policy.
Layer 2 - Per-org policy
Section titled “Layer 2 - Per-org policy”When Layer 1 is enabled, each organization independently chooses one of three modes. An org_admin sets the policy in Settings → AI → Policy.
| Policy | What it allows | Data leaves the deployment? |
|---|---|---|
disabled (default) | No LLM calls for this org | No |
local_only | Only the Ollama endpoint configured per-org by the org_admin in Settings → AI → Providers | No |
cloud_approved | Ollama + approved cloud providers (OpenAI, Anthropic) | Yes - with explicit opt-in |
There is no “any cloud provider” or “bring any endpoint” option. Adding a provider is a code change and a security review, not a configuration field.
API key storage
Section titled “API key storage”When cloud_approved is selected, an org_admin configures per-org API keys for OpenAI and Anthropic. Keys are stored Fernet-encrypted in the database. They are decrypted in-process at call time by the serving worker and are never readable across org boundaries.
Layer 3 - Field selector and secret strip
Section titled “Layer 3 - Field selector and secret strip”When an automation rule uses an LLM action, the rule author selects exactly which fields from the trigger event are forwarded to the model. Wildcards are not allowed. Fields are chosen individually.
The server enforces two additional controls before building the prompt:
-
Path validation -
input_fieldsaccepts ≤ 20 dot-separated paths, max depth 5 (trigger_data.device.nameis valid; a 6-level path is rejected). The engine only reads from thetrigger_datasubtree - not from the full execution context that includesorganization_id,rule_id, and the actor identity. -
Hardcoded secret block-list - even if a rule author explicitly lists a field, the engine checks each resolved value’s key against a block-list that includes:
organization_id,rule_id,password,secret,token,api_key,credential, and similar. Matched keys are dropped silently. This is a defense-in-depth guard against a rule that accidentally (or deliberately) targets a field that carries a credential.
// Example LLM classify action config (automation rule body){ "action_type": "llm.classify", "params": { "input_fields": [ "trigger_data.message", "trigger_data.severity", "trigger_data.device_name" ], "labels": ["network", "security", "hardware", "informational"] }}The field list in the execution record is the exact audit trail - you know precisely which field names were sent (not their values; values are not stored).
Value truncation
Section titled “Value truncation”Resolved field values are truncated to 1,000 characters before insertion into the prompt. This limits both prompt-injection risk and token blowout on unexpectedly large fields.
Supported providers
Section titled “Supported providers”All provider calls use httpx directly. No vendor SDK is a dependency.
| Provider | Policy required | Configuration | Notes |
|---|---|---|---|
| Ollama (self-hosted) | local_only or cloud_approved | Base URL configured per-org by org_admin in Settings → AI → Providers → Ollama → Base URL (stored in DB; defaults to http://localhost:11434) | Runs on your network. No data egress. |
| OpenAI | cloud_approved | Per-org API key (org_admin) | Data leaves the deployment. |
| Anthropic | cloud_approved | Per-org API key (org_admin) | Data leaves the deployment. |
Supported LLM operations
Section titled “Supported LLM operations”Only three structured operations are supported in the automation engine. Free-form generation belongs to the AI Assistant interface, not automation rules. The engine accepts llm.classify, llm.extract, and llm.summarize as action_type values.
| Operation | Purpose | input_fields cap | max_tokens | Output |
|---|---|---|---|---|
llm.classify | Assign one label from a fixed declared set | ≤ 20 fields | 500 (fixed) | One string from the declared label list |
llm.extract | Pull structured data from unstructured text | ≤ 20 fields | 500 (fixed) | JSON matching a declared schema |
llm.summarize | Summarize a block of text | ≤ 20 fields | 500 (fixed) | Plain text, 50-500 words (hard cap) |
llm.classify
Section titled “llm.classify”Classifies the selected input fields into one label from a fixed set.
labels: 2-20 strings, declared at rule creation time. Labels cannot be set or changed at runtime.- If the model returns a value outside the declared label list, the action fails with
classification_error. It is not retried. - The action result (keyed as
classification) is stored in the execution record. Downstream steps in the same rule receive the same shared context dict but do not receive prior-step outputs via templating - all steps read from the original trigger context. For inter-step data chaining, use the Fabric Connections system, which does support{{steps.N.output.result}}templating.
llm.extract
Section titled “llm.extract”Extracts structured data from unstructured text into a declared JSON Schema.
output_schema: ≤ 10 top-level properties.- If the model output fails JSON parse or schema validation, the action fails. No retry.
- Use this for pulling device names, IPs, error codes, or severity levels out of syslog or trap messages.
llm.summarize
Section titled “llm.summarize”Produces a plain-text summary of the selected fields.
max_words: 50-500. The server clamps any value outside that range. 500 is the hard cap regardless of configuration.- Use this for alert digests, change-window summaries, or event rollups before sending a notification.
Token budgets
Section titled “Token budgets”Each organization has a monthly token budget enforced by the governance layer before any provider call is made.
| Setting | Default | Configured by |
|---|---|---|
| Monthly token limit | 100,000 tokens | org_admin in Settings → AI → Token Budget |
| Warning threshold | 80% | Platform (not configurable) |
| Enforcement action | Generic error result (success=True, output.error set) | Platform |
When 80% of the budget is consumed, an ai.budget.warning event is emitted on the event bus. You can wire a Fabric Connection or an automation rule to this event to route it to email, Slack, or a webhook.
When the budget is exhausted, the governance layer raises LLMBudgetExceededError, which is caught by the handler’s top-level except Exception block. The action returns ActionResult(success=True) with output={"error": "LLM classification/extraction/summarization failed. Check server logs for details."} - the same generic error path taken for a provider timeout or any other failure. There is no separate skipped status and no budget_exceeded reason code in the execution record; inspect server logs for LLMBudgetExceededError to distinguish a budget refusal from a provider failure. No charges accumulate after the limit. Budget tracking uses an atomic Redis increment with a 32-day rolling TTL, backed by a DB sync for durability.
Audit log
Section titled “Audit log”Every LLM call - successful or not - is written to ai.llm_call_logs. Fields logged per call:
| Field | Stored? | Notes |
|---|---|---|
| Timestamp | Yes | |
| Rule ID | Yes | Links the LLM call to the automation rule (rule_id in response) |
| Execution ID | DB only | Stored in ai.llm_call_logs but not returned by the API; not available for client-side correlation |
| Provider and model | Yes | |
| Operation type | Yes | classify / extract / summarize |
| Field names sent | Yes | Not the values - privacy-preserving |
| Token counts | Yes | Prompt tokens + completion tokens |
| Success / failure | Yes | |
| Error message | Yes | Provider error or validation failure |
| Latency | Yes | Milliseconds |
| Input text | No | Not stored |
| Output text | No | Not stored |
The audit log is visible to org_admin in Settings → AI → Audit Log and exportable as CSV. If you need to reconstruct what was sent, filter the audit log by rule_id and match on created_at. The execution_id is stored in the database (ai.llm_call_logs.execution_id) but is not currently exposed by GET /api/v1/ai/governance/logs - correlation to AutomationExecutionRecord.trigger_data via execution_id requires direct DB access.
Async execution
Section titled “Async execution”LLM calls never block the automation engine. Each call is awaited inline as an async coroutine within the automation rule’s action-handler chain - there is no separate Celery task. The call completes (or times out) before the next action step runs.
The timeout that applies is whatever the underlying httpx client uses for the provider connection; there are no separate configurable soft or hard time limits for LLM actions. Classification or schema validation failures are not retried - a failure returns an error result for that step immediately.
The action result is stored in ActionResult.output keyed as classification (for llm.classify), extracted (for llm.extract), or summary (for llm.summarize). Downstream steps in the same automation rule cannot template against prior-step LLM outputs - the automation engine does not support inter-step templating. Use Fabric Connections if you need {{steps.N.output.result}} chaining.
AI Assistant and native tools
Section titled “AI Assistant and native tools”The AI Assistant (the chat interface, not automation rules) has access to 11 native tools covering device inventory, alerts, network topology, and observability data. These tools run against the authenticated user’s permissions - the assistant cannot act on behalf of another user or org.
The same 3-layer governance applies to the assistant’s provider calls. If LLM_GLOBALLY_ENABLED=false, the chat interface returns 503. If the org policy is disabled, the assistant is unavailable for that org even if the global switch is on.
The agentic loop runs at most 5 iterations per user message.
What the assistant cannot do
Section titled “What the assistant cannot do”fabric.webhook is explicitly excluded from the AI tool bridge. The assistant cannot POST data to an external URL. This is a hard code exclusion, not a permission check - the op is simply not registered as an AI tool. Plugin authors can register up to 20 AI tools per plugin, but those tools are bounded by the same 3-layer governance.
Enabling LLM for the first time
Section titled “Enabling LLM for the first time”Follow this sequence. Skipping steps causes confusing 503s.
-
Super admin: set the global flag.
Terminal window # In the API container environment (docker-compose.yml or .env file)LLM_GLOBALLY_ENABLED=trueRestart the API container after changing the env var.
-
Org admin: configure the Ollama base URL if using local_only (optional but recommended to start).
Navigate to Settings → AI → Providers → Ollama → Base URL and set it to your Ollama host (e.g.
http://ollama:11434for a Compose-internal host, or your LAN IP). The base URL defaults tohttp://localhost:11434and is stored in the database - there is no env var for this. -
Org admin: set the per-org policy in Settings → AI → Policy. Choose
local_onlyfor zero data egress orcloud_approvedif you are using OpenAI or Anthropic. -
Org admin: configure cloud API keys (only for
cloud_approved). Navigate to Settings → AI → Providers and enter the key. The key is Fernet-encrypted on save. -
Rule author: build an automation rule with an LLM action. In the action editor, select the field picker and choose only the fields the model genuinely needs. Review the block-list note above - secret-carrying fields are dropped regardless.
-
Test the rule using Run Now in the automation table. Check Settings → AI → Audit Log to confirm a call record was written and tokens were consumed.
What is explicitly out of scope
Section titled “What is explicitly out of scope”The following are not in this module and will not be added without a separate security model review:
- Free-form
chat,ask, orgenerateoperations in automation rules - LLM writing or modifying automation rules autonomously
- LLM calling device APIs directly or submitting staged changes
- Streaming LLM responses inside automation steps
- Fine-tuning or training on infrastructure data
- Image or multimodal inputs
- Third-party providers beyond OpenAI and Anthropic - adding a provider is a code change, not a config option
API endpoints (AI module)
Section titled “API endpoints (AI module)”The AI module mounts under /api/v1/ai/. All endpoints require authentication. Cloud-provider endpoints additionally require cloud_approved org policy.
| METHOD | Path | Purpose |
|---|---|---|
| GET | /api/v1/ai/governance/usage | Org LLM policy, monthly token budget, and percentage used (requires ai.admin or org_admin) |
| GET | /api/v1/ai/providers | Available providers for this org |
| POST | /api/v1/ai/chat | AI Assistant (agentic chat, ≤ 5 iterations) |
| GET | /api/v1/ai/governance/logs | LLM call audit log (paginated; page≥1, size 1-100; requires ai.admin or org_admin) |
| PUT | /api/v1/ai/providers/{provider_id} | Configure a provider and/or update org policy / token budget (ai.admin or org_admin) |
Next steps
Section titled “Next steps”- Connections - build automation rules that use
llm.classify,llm.extract, orllm.summarizeas action steps. - Fabric - wire the
ai.budget.warningevent to a notification or external webhook. - Plugin System - register additional AI tools from a plugin (up to 20 per plugin, subject to the same 3-layer governance).
- Configuration Reference - full env var list including
LLM_GLOBALLY_ENABLED. - Roles and Permissions -
super_adminandorg_adminrole responsibilities in full.