Structure and Authoring Guidelines
1 Purpose
The Predictive and Autonomous layers rely on written logic—policies.
A policy is a digital rule that describes what must be true, when to react, and how to respond.
Together, policies, triggers, and thresholds create EA 2.0’s governance nervous system:
- Policies define intent.
- Triggers sense deviation.
- Thresholds calibrate sensitivity.
2 Policy Taxonomy
| Category | Example Intent | Common Trigger | Typical Action |
|---|---|---|---|
| Compliance | Stay aligned with security or privacy standards | Policy violation > 0 | Create GRC ticket |
| Operational | Maintain SLA and performance | SLA < target | Scale up infra / notify |
| Financial | Prevent overspend or under-utilisation | Cost > budget × 1.1 | Suspend non-prod resources |
| Lifecycle | Keep apps supported and owned | Support end < 90 days | Trigger upgrade plan |
| Risk Control | Contain exposure before incident | Risk score > threshold | Invoke mitigation workflow |
Policies can be preventive (avoid a breach) or reactive (contain a breach).
3 Policy Structure
Every policy carries an id and title plus five core blocks (intent, trigger, action, governance, metadata):
    id: unique_policy_id
    title: Short description
    intent: What outcome this rule protects
    trigger:
      source_metric: KPI or event
      operator: ">"
      threshold: numeric or logical value
      condition: optional context filter
    action:
      type: task | notification | automation
      endpoint: API URL or ServiceNow table
    governance:
      owner: team email
      approval: required | optional | auto
      severity: info | warning | critical
    metadata:
      version: 1.0
      last_updated: 2025-11-08
This YAML-based structure is human-readable and can be deployed through a Git-based CI/CD pipeline.
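A policy file in this shape can be checked before it is merged. The following is a minimal sketch, assuming the YAML has already been parsed into a Python dict and that `severity` sits inside the `governance` block; the function name `validate_policy` and the example values are hypothetical, not part of EA 2.0.

```python
# Illustrative validator; validate_policy and the error strings are assumptions.
# Field names and allowed values are taken from the policy template above.
REQUIRED_TOP_LEVEL = ["id", "title", "intent"]
REQUIRED_BLOCKS = {
    "trigger": ["source_metric", "operator", "threshold"],  # condition is optional
    "action": ["type", "endpoint"],
    "governance": ["owner", "approval", "severity"],
    "metadata": ["version", "last_updated"],
}
ALLOWED_VALUES = {
    ("action", "type"): {"task", "notification", "automation"},
    ("governance", "approval"): {"required", "optional", "auto"},
    ("governance", "severity"): {"info", "warning", "critical"},
}


def validate_policy(policy: dict) -> list:
    """Return a list of problems; an empty list means the policy is well-formed."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if not policy.get(key):
            problems.append(f"missing field: {key}")
    for block, fields in REQUIRED_BLOCKS.items():
        section = policy.get(block)
        if not isinstance(section, dict):
            problems.append(f"missing block: {block}")
            continue
        for field in fields:
            if field not in section:
                problems.append(f"missing field: {block}.{field}")
    for (block, field), allowed in ALLOWED_VALUES.items():
        section = policy.get(block)
        value = section.get(field) if isinstance(section, dict) else None
        if value is not None and value not in allowed:
            problems.append(f"invalid {block}.{field}: {value!r}")
    return problems


# Hypothetical example policy, already parsed from YAML.
example = {
    "id": "policy_cost_guard",
    "title": "Prevent budget overshoot",
    "intent": "Keep spend within the approved budget",
    "trigger": {"source_metric": "cost", "operator": ">", "threshold": 1.1},
    "action": {"type": "automation", "endpoint": "https://example.invalid/pause"},
    "governance": {"owner": "finops@example.com", "approval": "required",
                   "severity": "critical"},
    "metadata": {"version": "1.0", "last_updated": "2025-11-08"},
}
print(validate_policy(example))  # prints []
```

Running a check like this in the CI stage keeps malformed policies out of the main branch without requiring a reviewer to spot a missing field by eye.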
4 Trigger Design
Triggers are sensors—each listens to one or more metrics and fires when a condition is met.
- Event-Based: real-time log or message (incident.created).
- Time-Based: scheduled KPI check (every hour).
- Threshold-Based: numeric breach (cpu > 80 %).
- Anomaly-Based: ML-detected deviation (cost z-score > 2).
Good triggers are specific, debounced (don’t spam), and explainable.
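Debouncing can be illustrated with a threshold trigger that refuses to fire again until a cooldown has passed. This is a minimal sketch; the class name `DebouncedTrigger`, the `cooldown_s` parameter, and the 5-minute cooldown are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebouncedTrigger:
    """Threshold trigger that suppresses repeat firings within a cooldown window."""
    threshold: float                      # fire when value > threshold
    cooldown_s: float                     # minimum seconds between two firings
    _last_fired: Optional[float] = None   # timestamp of the most recent firing

    def check(self, value: float, now: float) -> bool:
        if value <= self.threshold:
            return False                  # no breach
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return False                  # breach, but still inside the cooldown
        self._last_fired = now
        return True


cpu = DebouncedTrigger(threshold=80, cooldown_s=300)
print(cpu.check(85, now=0))    # True: first breach fires
print(cpu.check(90, now=60))   # False: suppressed, only 60 s since last firing
print(cpu.check(95, now=400))  # True: cooldown has elapsed
```

Passing `now` explicitly rather than reading the clock inside `check` keeps the trigger deterministic, which also makes it easy to replay against historical data.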
5 Threshold Tuning
Thresholds decide how sensitive automation is.
Three design principles:
- Dynamic not static. Use rolling averages or percentiles instead of hard numbers.
- Confidence-weighted. Couple thresholds with model certainty (fire if confidence > 0.8).
- Context-aware. Adjust per environment: Dev > 100 %, Prod > 90 %.
Example SQL-style rule:
IF cost/current_budget > 1.1 AND confidence > 0.8 THEN trigger('CostOverrun')
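The first two principles can be combined in a few lines. The sketch below is one possible reading, not the EA 2.0 implementation: it compares each new value against a rolling percentile of recent history and only fires when the accompanying confidence exceeds 0.8. The class name, the 30-sample window, and the 95th percentile are assumptions.

```python
from collections import deque
from statistics import quantiles


class RollingPercentileThreshold:
    """Fire when a value exceeds a rolling percentile AND confidence is high enough."""

    def __init__(self, window: int = 30, percentile: int = 95,
                 min_confidence: float = 0.8):
        self.history = deque(maxlen=window)   # most recent observations only
        self.percentile = percentile          # integer in 1..99
        self.min_confidence = min_confidence

    def should_fire(self, value: float, confidence: float) -> bool:
        fire = False
        if len(self.history) >= 2:            # quantiles() needs at least two points
            # n=100 yields 99 cut points; index p-1 is the p-th percentile.
            cut = quantiles(self.history, n=100, method="inclusive")[self.percentile - 1]
            fire = value > cut and confidence > self.min_confidence
        self.history.append(value)            # the new value joins the baseline
        return fire


daily_cost = RollingPercentileThreshold(window=10)
for _ in range(10):
    daily_cost.should_fire(100.0, confidence=0.9)   # build a flat baseline
print(daily_cost.should_fire(150.0, confidence=0.9))  # True
```

Because the baseline moves with the data, a gradual and expected rise in cost does not trigger, while a sudden jump does. The context-aware principle could be covered by creating one instance per environment with different parameters.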
6 Authoring Workflow
- Define Intent – What problem should never occur?
- Select Signal – Which KPI or event detects it earliest?
- Set Threshold – When does it become unacceptable?
- Choose Action – Fix automatically or notify humans?
- Tag Owner & Severity – Who’s accountable?
- Publish & Test – Run dry-mode simulation.
All new policies go through peer review before activation.
7 Simulation & Testing
EA 2.0 includes a policy sandbox that can replay 30 days of historical data and show which triggers would have fired.
Benefits:
- Detect over-sensitivity (too many alerts).
- Benchmark thresholds.
- Visualise the cost or risk that would have been avoided had the policy existed earlier.
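The idea behind such a replay can be sketched without access to the sandbox itself. The function below is a hypothetical stand-in: it takes a trigger in the `operator` / `threshold` form used by the policy template and a list of timestamped samples, and returns the moments at which the trigger would have fired within the last 30 days of data.

```python
import operator
from datetime import datetime, timedelta

# Comparison operators a trigger may use; the set shown here is an assumption.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def replay(trigger: dict, samples: list, days: int = 30) -> list:
    """Return timestamps at which the trigger would have fired.

    samples is a list of (datetime, value) pairs; only the last `days` days,
    counted back from the newest sample, are considered.
    """
    if not samples:
        return []
    cutoff = max(ts for ts, _ in samples) - timedelta(days=days)
    compare = OPS[trigger["operator"]]
    return [ts for ts, value in samples
            if ts >= cutoff and compare(value, trigger["threshold"])]


# Hypothetical history: one CPU reading per day, with a spike every tenth day.
start = datetime(2025, 1, 1)
history = [(start + timedelta(days=d), 90.0 if d % 10 == 0 else 50.0)
           for d in range(41)]
fired = replay({"operator": ">", "threshold": 80}, history)
print(len(fired))  # 4
```

Comparing the firing count across candidate thresholds gives a quick view of over-sensitivity before a policy is activated.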
8 Policy Dependencies
Policies often depend on each other.
Use explicit links to avoid feedback loops:
depends_on: [risk_score_policy, cost_guard_policy]
conflicts_with: [debug_mode_policy]
The resulting dependency graph determines execution order and exposes cycles before deployment.
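As a sketch of how such links could be evaluated, the standard-library `graphlib` module can order policies by their `depends_on` entries and raises `CycleError` when the links form a loop. The policy id `shutdown_policy` and both function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(policies: dict) -> list:
    """Order policy ids so that every policy runs after the ones it depends on.

    Raises graphlib.CycleError if the depends_on links form a feedback loop.
    """
    graph = {pid: set(p.get("depends_on", [])) for pid, p in policies.items()}
    return list(TopologicalSorter(graph).static_order())


def active_conflicts(policies: dict, active: set) -> list:
    """Return (policy, conflicting_policy) pairs where both are currently active."""
    return sorted((pid, other)
                  for pid, p in policies.items() if pid in active
                  for other in p.get("conflicts_with", []) if other in active)


policies = {
    "risk_score_policy": {},
    "cost_guard_policy": {},
    "shutdown_policy": {  # hypothetical policy using the links shown above
        "depends_on": ["risk_score_policy", "cost_guard_policy"],
        "conflicts_with": ["debug_mode_policy"],
    },
}
print(execution_order(policies)[-1])  # shutdown_policy

try:
    execution_order({"a": {"depends_on": ["b"]}, "b": {"depends_on": ["a"]}})
except CycleError:
    print("feedback loop detected")
```

Running the cycle check at review time means a feedback loop is rejected as a failed build rather than discovered in production.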
9 Governance Metadata
Every policy automatically records:
- Version, author, last change.
- Approval status (draft → approved → active → retired).
- Execution count & success rate.
- Exceptions granted (with expiry date).
This gives auditors a complete change and execution history for every policy.
10 Policy Lifecycle
| Phase | Description | Tool |
|---|---|---|
| Draft | Authored, awaiting peer review | Git branch |
| Approved | CAB or EA Ops sign-off | Merge → main |
| Active | Deployed & monitored | Logic App / API Mgmt |
| Suspended | Temporarily off | Policy dashboard |
| Retired | Archived, immutable record | Blob archive |
Policies evolve like software—not documents.
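The phases in the table can be modelled as a small state machine. The table names the phases but not the permitted moves between them, so the transition map below is an assumption: a policy advances Draft, Approved, Active; an active policy can be suspended and resumed; and Retired is final.

```python
from enum import Enum


class Phase(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    RETIRED = "retired"


# Assumed transition map; the source table lists phases, not allowed moves.
ALLOWED_TRANSITIONS = {
    Phase.DRAFT: {Phase.APPROVED},
    Phase.APPROVED: {Phase.ACTIVE},
    Phase.ACTIVE: {Phase.SUSPENDED, Phase.RETIRED},
    Phase.SUSPENDED: {Phase.ACTIVE, Phase.RETIRED},
    Phase.RETIRED: set(),  # archived record is immutable
}


def transition(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


phase = transition(Phase.DRAFT, Phase.APPROVED)
phase = transition(phase, Phase.ACTIVE)
print(phase.value)  # active
```

Encoding the allowed moves in one place makes it hard to activate a policy that skipped approval, whichever tool performs the deployment.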
11 Monitoring & Feedback
Dashboards track:
- Policy coverage (% of systems governed).
- Trigger frequency by domain.
- Mean time to resolve (automated vs manual).
- False positive ratio.
EA Ops reviews these metrics quarterly to adjust thresholds and retire obsolete rules.
12 Best Practices
✅ Use plain language titles: “Prevent Unlabeled PII Storage.”
✅ Include a business-impact field (“saves $50K / yr in cloud cost”).
✅ Version policies like code.
✅ Always test before trust.
✅ Link each policy to an owner node in the graph for accountability.
13 Example Policy Library Snapshot
| ID | Intent | Severity | Action | Owner |
|---|---|---|---|---|
| policy_cost_guard | Prevent budget overshoot | High | Logic App pause | FinOps |
| policy_sla_watch | Maintain SLA ≥ 95 % | Medium | Notify Ops | EA Ops |
| policy_data_label | Enforce Confidential tag | High | Apply label | Data Governance |
| policy_orphan_app | Detect unowned apps | Medium | Create ticket | ITSM |
| policy_risk_drift | Cap risk score increase < 10 % | Low | Log alert | Security |
14 KPIs for Policy Governance
| KPI | Target | Meaning |
|---|---|---|
| Policy Coverage | ≥ 85 % systems governed | Maturity |
| Approval Turnaround | ≤ 3 days | Efficiency |
| False Trigger Rate | ≤ 5 % | Precision |
| Rollback Rate | ≤ 2 % | Reliability |
| Active Policy Ratio | ≥ 70 % vs retired | Relevance |
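These targets are simple enough to check automatically. The sketch below encodes the table as data and reports which KPIs are met; the key names, the use of fractions instead of percentages, and the function `evaluate_kpis` are illustrative assumptions.

```python
import operator

# Targets copied from the KPI table; percentages are expressed as fractions.
KPI_TARGETS = {
    "policy_coverage": (">=", 0.85),
    "approval_turnaround_days": ("<=", 3),
    "false_trigger_rate": ("<=", 0.05),
    "rollback_rate": ("<=", 0.02),
    "active_policy_ratio": (">=", 0.70),
}
_OPS = {">=": operator.ge, "<=": operator.le}


def evaluate_kpis(measured: dict) -> dict:
    """Map each measured KPI name to True (target met) or False (target missed)."""
    return {name: _OPS[op](measured[name], target)
            for name, (op, target) in KPI_TARGETS.items()
            if name in measured}


# Hypothetical measurements for one review period.
print(evaluate_kpis({"policy_coverage": 0.90, "false_trigger_rate": 0.08}))
# {'policy_coverage': True, 'false_trigger_rate': False}
```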
15 Cultural Guidelines
- Start small – pilot 10 policies first.
- Treat policies as collaboration between architects and operators.
- Encourage feedback from executors to authors.
- Celebrate prevented incidents as success metrics.
Governance becomes a shared craft, not enforcement.
16 Takeaway
Policies are architecture expressed as code.
When rules are transparent, measurable, and adaptive, autonomy feels safe instead of risky.