- Functions • Data Factory • Event Hub Patterns
- 1 Purpose
- 2 Design Principles
- 3 Core Components
- 4 Typical Flow
- 5 Ingestion Patterns
- 6 Data Transformation Best Practices
- 7 Graph Loader Function
- 8 Event Hub Architecture
- 9 Monitoring & Alerting
- 10 Security Controls
- 11 Cost Optimization
- 12 Example Event-Driven Policy Flow
- 13 Patterns for Hybrid Integration
- 14 Benefits
- 15 Takeaway
Functions • Data Factory • Event Hub Patterns
1 Purpose
Integration middleware is the circulatory system of EA 2.0.
It moves information between systems, cleans it, transforms it into graph relationships, and ensures everything that happens in your enterprise is reflected in the knowledge graph in near-real time.
This layer must be fast, reliable, secure, and observable — otherwise predictive governance can’t keep pace with reality.
2 Design Principles
| Principle | Description |
|---|---|
| Event-First | Prefer streaming over polling; nothing should depend on nightly batch if it can be pushed. |
| Idempotent Processing | Re-ingesting data never causes duplicates. |
| Schema Contracts | Each connector publishes a YAML/JSON schema so changes are controlled. |
| Serverless Execution | Use Functions for elasticity and cost efficiency. |
| Separation of Concerns | Orchestration (ADF) ≠ Transformation (Function) ≠ Transport (Event Hub). |
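The Idempotent Processing principle can be illustrated with a deterministic record key. This is a minimal sketch, not the EA 2.0 implementation: `record_key`, `ingest`, and the in-memory `seen` set are hypothetical stand-ins for whatever durable dedup store (or graph `MERGE`) a real connector would use.

```python
import hashlib
import json


def record_key(source_system: str, record: dict) -> str:
    """Derive a stable key from the source name and the record content."""
    # sort_keys makes the serialization independent of dict key order.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{source_system}|{payload}".encode("utf-8")).hexdigest()


def ingest(records: list[dict], source_system: str, seen: set[str]) -> list[dict]:
    """Return only records not ingested before; a replay yields an empty list."""
    fresh = []
    for rec in records:
        key = record_key(source_system, rec)
        if key not in seen:
            seen.add(key)
            fresh.append(rec)
    return fresh


seen_keys: set[str] = set()
# The second dict is the same record with its keys in a different order.
batch = [{"app_id": "A1", "cap_id": "C1"}, {"cap_id": "C1", "app_id": "A1"}]
first = ingest(batch, "cmdb", seen_keys)
replay = ingest(batch, "cmdb", seen_keys)  # re-ingesting adds nothing
```

Hashing the canonical form means a retried or replayed delivery maps to the same key, so it can be dropped without comparing full payloads.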
3 Core Components
| Component | Role | Technology Choice (Azure) |
|---|---|---|
| Orchestration Layer | Schedule and sequence data flows | Azure Data Factory (ADF) or Synapse Pipelines |
| Transformation Layer | Clean, map, normalize data | Azure Functions (Python / C#) |
| Transport Layer | Move events and stream updates | Azure Event Hub / Service Bus / Event Grid |
| Persistence Layer | Temporary storage for intermediate files | Azure Blob / ADLS Gen2 |
| Monitoring Layer | Track latency & failures | Azure Monitor + Application Insights |
4 Typical Flow
Source System (API, CMDB, Cloud, File)
│
▼
[Event Hub or ADF Trigger]
│
▼
[Function – Transform & Validate]
│
▼
[Blob/Synapse Staging]
│
▼
[Graph Loader Function → Cosmos DB]
│
▼
[Reasoning API Index Refresh → Vector Store → NLQ UI]
Each stage emits telemetry for monitoring and audit replay.
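One way to picture per-stage telemetry is a small wrapper that times each stage and tags it with a correlation ID, so a single record can be traced across the flow. This is a sketch under assumptions: `run_stage`, the `telemetry` list, and the two toy stages are hypothetical, and a real Function would send these records to Application Insights rather than a list.

```python
import time
import uuid
from typing import Callable

telemetry: list[dict] = []  # hypothetical in-memory sink


def run_stage(name: str, fn: Callable[[dict], dict], payload: dict, correlation_id: str) -> dict:
    """Run one stage and record its outcome and duration, even on failure."""
    started = time.perf_counter()
    status = "failed"
    try:
        result = fn(payload)
        status = "ok"
        return result
    finally:
        telemetry.append({
            "stage": name,
            "correlation_id": correlation_id,
            "status": status,
            "duration_ms": (time.perf_counter() - started) * 1000,
        })


def transform(p: dict) -> dict:
    return {**p, "validated": True}


def stage_to_blob(p: dict) -> dict:
    return {**p, "staged": True}


cid = str(uuid.uuid4())
out = {"app_id": "A1"}
for stage_name, stage_fn in [("transform", transform), ("stage", stage_to_blob)]:
    out = run_stage(stage_name, stage_fn, out, cid)
```

Because the telemetry entry is written in `finally`, a failing stage still leaves a record that audit replay can pick up.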
5 Ingestion Patterns
| Pattern | When to Use | Mechanism | Example |
|---|---|---|---|
| API Polling | Legacy systems without webhooks | Timer-trigger Function fetches JSON delta | ServiceNow CMDB / GRC |
| Event Push | Modern systems with change events | Webhook → Event Grid Topic → Event Hub | Azure Monitor alerts / Defender events |
| File Drop | Manual feeds or CSV reports | Logic App trigger on a OneDrive / SharePoint folder | Finance procurement sheet |
| Message Queue | High-volume transactions | Service Bus topic with Function subscription | Cloud inventory updates |
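The API Polling row relies on fetching only what changed since the last run. A common way to do that is a watermark, sketched below. `poll_delta` and `fake_cmdb` are hypothetical; in a timer-triggered Function the fetch would be a REST call and the watermark would be persisted between runs.

```python
from datetime import datetime, timezone


def poll_delta(fetch, watermark: datetime) -> tuple[list[dict], datetime]:
    """Return records changed after the watermark, plus the advanced watermark."""
    changed = [r for r in fetch() if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


def fake_cmdb() -> list[dict]:
    # Hypothetical stand-in for a REST call to the source system.
    return [
        {"id": "CI-1", "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"id": "CI-2", "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    ]


start = datetime(2024, 1, 2, tzinfo=timezone.utc)
delta, mark = poll_delta(fake_cmdb, start)
again, mark2 = poll_delta(fake_cmdb, mark)  # nothing new on the next tick
```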
6 Data Transformation Best Practices
- Validate Schema – compare fields against contract before processing.
- Map IDs – harmonize ApplicationIDs, RiskIDs via Master System Index.
- Normalize Taxonomy – align terms with EA 2.0 Vocabulary service.
- Stamp Metadata – add `source_system`, `extracted_at`, `confidence`.
- Batch vs Stream – use ADF for bulk, Functions for real-time.
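The first four practices can be combined in a single transformation step, sketched below. Everything named here is an assumption for illustration: `CONTRACT`, `MASTER_INDEX`, and `VOCABULARY` stand in for the connector's published schema, the Master System Index, and the EA 2.0 Vocabulary service, whose real interfaces are not described in this document.

```python
from datetime import datetime, timezone

# Hypothetical contract and lookup tables, hard-coded for the sketch.
CONTRACT = {"app_id": str, "capability": str}
MASTER_INDEX = {"SNOW-001": "APP-1001"}
VOCABULARY = {"payments": "Payment Processing"}


def transform(raw: dict, source_system: str, confidence: float = 1.0) -> dict:
    # 1. Validate schema: every contracted field is present with the right type.
    for field, expected in CONTRACT.items():
        if not isinstance(raw.get(field), expected):
            raise ValueError(f"contract violation on field {field!r}")
    return {
        # 2. Map IDs: fall back to the source ID when no mapping exists.
        "app_id": MASTER_INDEX.get(raw["app_id"], raw["app_id"]),
        # 3. Normalize taxonomy: look up the trimmed, lower-cased term.
        "capability": VOCABULARY.get(raw["capability"].strip().lower(), raw["capability"]),
        # 4. Stamp metadata.
        "source_system": source_system,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "confidence": confidence,
    }


clean = transform({"app_id": "SNOW-001", "capability": " Payments "}, "servicenow")
```

Rejecting a record before any mapping happens keeps contract violations out of the graph instead of leaving half-normalized nodes behind.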
7 Graph Loader Function
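```python
import os
from neo4j import GraphDatabase

# Created once per worker so warm invocations reuse the connection pool.
driver = GraphDatabase.driver(os.environ["NEO4J_URI"], auth=("neo4j", os.environ["NEO4J_PW"]))

def main(msg: dict):
    # Each record must supply app_id, cap_id, and source.
    with driver.session() as s:
        for rec in msg["records"]:
            # MERGE upserts both nodes and the edge, so replays create no duplicates.
            s.run("""
                MERGE (a:Application {id:$app_id})
                MERGE (c:Capability {id:$cap_id})
                MERGE (a)-[:SUPPORTS]->(c)
                SET a.last_seen_at=timestamp(), a.source_system=$source
            """, rec)
```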
The loader is lightweight, idempotent (every write is a `MERGE`), and event-driven.
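For larger messages, one round trip per record can become the bottleneck. A possible variation, sketched on the assumption that the target accepts Cypher as in the loader above, sends records in batches with `UNWIND`. `BATCH_QUERY`, `load_in_batches`, and the `FakeSession` used to exercise it without a database are all hypothetical.

```python
BATCH_QUERY = """
UNWIND $rows AS r
MERGE (a:Application {id: r.app_id})
MERGE (c:Capability {id: r.cap_id})
MERGE (a)-[:SUPPORTS]->(c)
SET a.last_seen_at = timestamp(), a.source_system = r.source
"""


def load_in_batches(session, records: list[dict], batch_size: int = 500) -> int:
    """Send records in chunks; returns the number of round trips made."""
    calls = 0
    for i in range(0, len(records), batch_size):
        # Keyword arguments to session.run are passed as query parameters.
        session.run(BATCH_QUERY, rows=records[i:i + batch_size])
        calls += 1
    return calls


class FakeSession:
    """Test double that records calls instead of talking to a database."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append(params)


fake = FakeSession()
recs = [{"app_id": f"A{i}", "cap_id": "C1", "source": "cmdb"} for i in range(5)]
round_trips = load_in_batches(fake, recs, batch_size=2)
```

The statements are still `MERGE`-based, so batching changes throughput but not the idempotency property.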
8 Event Hub Architecture
| Component | Purpose |
|---|---|
| Producers | ETL Functions, Logic Apps publish events. |
| Consumers | Graph Loader, Predictive Engine, Audit Logger. |
| Capture | Writes raw streams to Blob for forensics. |
| Partitions | Enable parallel processing for throughput. |
Retention: up to 7 days on the Standard tier; use Capture for longer history.
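Partitions give parallelism, and a partition key keeps related events together so they are consumed in order. Event Hubs does its own hashing when a producer supplies a partition key; the function below is only an illustration of the idea, with a hypothetical `assign_partition` and an arbitrary hash choice.

```python
import hashlib
from collections import defaultdict


def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


events = [("APP-1", "created"), ("APP-2", "created"), ("APP-1", "updated")]
partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
for key, action in events:
    partitions[assign_partition(key, 4)].append((key, action))
```

Using the application ID as the key means all updates for one application land on the same partition, so a single consumer sees them in arrival order.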
9 Monitoring & Alerting
- Application Insights traces per Function.
- ADF pipeline failures → Teams alerts via Logic App.
- Event Hub lag metric monitored by Azure Monitor rule.
- Daily summary dashboard shows: success %, avg latency, failed records.
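The three dashboard figures can be derived from per-run telemetry with a few lines of aggregation. This is a sketch only: `daily_summary` and the record fields `status`, `latency_ms`, and `failed_records` are assumed names, not a schema defined by this document.

```python
from statistics import mean


def daily_summary(runs: list[dict]) -> dict:
    """Aggregate per-run telemetry into success %, average latency, failed records."""
    total = len(runs)
    succeeded = [r for r in runs if r["status"] == "ok"]
    return {
        "success_pct": round(100 * len(succeeded) / total, 1) if total else 0.0,
        "avg_latency_ms": round(mean(r["latency_ms"] for r in runs), 1) if total else 0.0,
        "failed_records": sum(r.get("failed_records", 0) for r in runs),
    }


runs = [
    {"status": "ok", "latency_ms": 120, "failed_records": 0},
    {"status": "ok", "latency_ms": 180, "failed_records": 2},
    {"status": "failed", "latency_ms": 300, "failed_records": 40},
    {"status": "ok", "latency_ms": 200, "failed_records": 0},
]
summary = daily_summary(runs)
```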
10 Security Controls
| Area | Control | Implementation |
|---|---|---|
| Authentication | Managed Identity | No secrets in code |
| Authorization | RBAC roles on resource groups | Least privilege |
| Data Protection | Private Endpoints + TLS 1.2+ | Encryption in transit over private network paths |
| Audit Logs | Function logs + Activity Log Archive | Immutable evidence |
11 Cost Optimization
| Component | Optimization Tip |
|---|---|
| Functions | Use consumption plan with short timeout |
| ADF | Combine small pipelines to reduce runs |
| Event Hub | Right-size throughput units (TU) |
| Blob | Lifecycle policies → cool storage |
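Right-sizing throughput units starts from peak ingress. On the Standard tier, one TU allows up to 1 MB/s or 1,000 events/s of ingress, whichever is reached first. The estimate below is a rough sketch: `required_tus` and its `headroom` safety margin are hypothetical, and egress limits are not considered.

```python
import math

# Standard-tier ingress limits per throughput unit.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000


def required_tus(peak_events_per_sec: float, avg_event_kb: float, headroom: float = 0.2) -> int:
    """Estimate TUs from peak ingress; headroom is an assumed safety margin."""
    peak_mb_per_sec = peak_events_per_sec * avg_event_kb / 1024
    by_volume = peak_mb_per_sec / INGRESS_MB_PER_TU
    by_count = peak_events_per_sec / INGRESS_EVENTS_PER_TU
    # Whichever limit is hit first determines the requirement.
    return max(1, math.ceil(max(by_volume, by_count) * (1 + headroom)))


small = required_tus(peak_events_per_sec=500, avg_event_kb=1)
busy = required_tus(peak_events_per_sec=3000, avg_event_kb=2)
```

With 2 KB events the byte limit dominates, which is why the second case needs more units than the event count alone would suggest.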
12 Example Event-Driven Policy Flow
Trigger: Azure Monitor detects “Storage without tag.”
→ Event Grid publishes event.
→ EA 2.0 Transformation Function creates policy violation object.
→ Graph Loader adds node.
→ Predictive Engine forecasts compliance trend.
→ Outbound Function creates GRC ticket.
Everything flows through the same integration backbone.
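The transformation step in this flow might look like the sketch below. The envelope fields (`id`, `eventType`, `subject`, `eventTime`, `data`) follow the Event Grid event schema; the `data` payload, the event type, the resource path, and the shape of the violation object are invented for illustration.

```python
from datetime import datetime, timezone


def to_policy_violation(event: dict) -> dict:
    """Turn an alert event into a graph-ready policy violation record."""
    return {
        "violation_id": f"pv-{event['id']}",
        "resource_id": event["subject"],
        "policy": event["data"]["policy"],  # hypothetical payload field
        "detected_at": event["eventTime"],
        "source_system": "azure-monitor",
        "extracted_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical event for a storage account that is missing a required tag.
event = {
    "id": "42",
    "eventType": "Example.Policy.Violation",
    "subject": "/subscriptions/000/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/demo",
    "eventTime": "2024-01-01T00:00:00Z",
    "data": {"policy": "storage-without-tag"},
}
violation = to_policy_violation(event)
```

Deriving `violation_id` from the event ID keeps the downstream graph write idempotent if the same event is delivered twice.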
13 Patterns for Hybrid Integration
| Landscape | Approach |
|---|---|
| On-Prem COTS | Use Self-hosted Integration Runtime (ADF) |
| Non-Azure Clouds (AWS, GCP) | API Gateway → Event Hub Ingress |
| SaaS Apps | Logic App connectors (ServiceNow, Jira, Salesforce) |
| Legacy FTP Systems | ADF FTP linked service + Function checksum validator |
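The checksum validator for FTP feeds could be as small as the sketch below, assuming the sender publishes a SHA-256 digest alongside each file. The function names and the choice of SHA-256 are assumptions; the document does not specify the algorithm.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a file-like object in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the computed digest with the one published by the sender."""
    return hmac.compare_digest(sha256_of(stream), expected_hex.strip().lower())


payload = b"app_id,cost\nAPP-1,100\n"
expected = hashlib.sha256(payload).hexdigest()
ok = is_intact(io.BytesIO(payload), expected)
tampered = is_intact(io.BytesIO(payload + b"x"), expected)
```

A file that fails the check can be quarantined in staging instead of being passed to the transformation step.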
14 Benefits
✅ Single integration fabric for structured and event data.
✅ Near real-time governance without overloading source systems.
✅ Full traceability and replay capability.
✅ Extensible to non-Azure clouds via Event Hub or API Management.
15 Takeaway
Integration is not plumbing — it’s consciousness.
The middleware layer is how EA 2.0 “feels” the enterprise in motion — translating signals into insights without losing context, speed, or security.