How to Connect These Sources Safely and Scalably
1. Purpose
Connecting data to the EA 2.0 graph is not a technical chore — it’s the moment when architecture becomes alive.
But integrations differ: some sources are API-rich, others are locked inside legacy COTS (commercial off-the-shelf) systems, and some still live in on-prem databases.
This guide defines safe, repeatable integration patterns that bring all these worlds into a single reasoning fabric without breaking security, sovereignty, or performance.
2. Integration Philosophy
EA 2.0 follows five integration commandments:
- No direct coupling: All systems connect through a mediation layer (Function, API Gateway, or Event Bus).
- Incremental ingestion: Pull only deltas or metadata, never entire tables.
- Stateless connectors: Functions execute, commit, and exit — no long-running sync jobs.
- Source authority preserved: Never transform data in source; only enrich or map downstream.
- Lineage everywhere: Every node carries its source, timestamp, and load event ID.
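The lineage rule can be illustrated with a small helper. This is a minimal sketch, assuming lineage is stored as three plain properties on each node; the function name stamp_lineage and the field names source_system, loaded_at and load_event_id are hypothetical, not a defined EA 2.0 schema.

```python
import uuid
from datetime import datetime, timezone


def stamp_lineage(record: dict, source_system: str, load_event_id: str) -> dict:
    """Return a copy of `record` with lineage properties added.

    Field names are illustrative assumptions, not a fixed EA 2.0 schema.
    """
    stamped = dict(record)  # copy: the source payload itself is never modified
    stamped["source_system"] = source_system
    stamped["loaded_at"] = datetime.now(timezone.utc).isoformat()  # UTC timestamp
    stamped["load_event_id"] = load_event_id
    return stamped


# One load event ID per connector run, shared by every node that run writes.
run_id = str(uuid.uuid4())
source_row = {"name": "Payroll"}
node = stamp_lineage(source_row, "servicenow", run_id)
```

Sharing one load_event_id across a run makes it possible to find everything a single execution wrote.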
3. Three Integration Archetypes
| Archetype | Typical Sources | Connector Pattern | Example Stack |
|---|---|---|---|
| Cloud-Native | Azure, AWS, SaaS APIs | Serverless API Poller or Event Subscription | Azure Function → Graph API → Neo4j REST |
| COTS / Enterprise Apps | ServiceNow, SAP, Oracle, Salesforce | API Adapter or Scheduled Extract → Blob → Function | ServiceNow REST → Blob Storage → Loader |
| On-Prem / Legacy | SQL, CSV, shared drives | Gateway Sync or Agent Push | Self-hosted agent → HTTPS webhook → Function |
Each pattern implements the same contract: Extract → Normalize → Map → Upsert → Log.
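The shared contract can be expressed as one small runner with a pluggable function per stage. This is a minimal sketch, assuming each connector supplies plain Python callables; the Connector class, its field names, and the in-memory upsert target are hypothetical stand-ins for a real Function and the Graph Loader endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

Record = dict


@dataclass
class Connector:
    """Hypothetical connector: each contract stage is a plain callable."""
    extract: Callable[[], Iterable[Record]]
    normalize: Callable[[Record], Record]
    map_fields: Callable[[Record], Record]
    upsert: Callable[[list], int]
    log: list = field(default_factory=list)

    def run(self) -> int:
        raw = list(self.extract())                                   # Extract
        mapped = [self.map_fields(self.normalize(r)) for r in raw]   # Normalize, Map
        written = self.upsert(mapped)                                # Upsert
        self.log.append(f"extracted={len(raw)} upserted={written}")  # Log
        return written


# Usage with an in-memory store standing in for the graph.
store: dict = {}


def upsert_by_id(rows: list) -> int:
    for row in rows:
        store[row["id"]] = row  # keyed write: re-running the load is idempotent
    return len(rows)


connector = Connector(
    extract=lambda: [{"ID": "a1", "Name": " CRM "}],
    normalize=lambda r: {k.lower(): v.strip() for k, v in r.items()},
    map_fields=lambda r: {"id": r["id"], "name": r["name"]},
    upsert=upsert_by_id,
)
connector.run()
```

Because only the callables differ, a new source reuses run() unchanged.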
4. Cloud-Native Integration
🔹 Example: Azure Resource Graph → EA 2.0
- Trigger: Timer (daily) or Event Grid (resource change).
- Extract: Azure Function queries Resource Graph API for new/changed resources.
- Transform: Normalize key metadata: name, type, region, tags, owner.
- Upsert: POST to EA 2.0 Graph Loader endpoint.
- Log: Write metrics to Application Insights (rows processed, latency).
With the Event Grid trigger, this yields near-real-time visibility of infrastructure (daily with the timer alone) without touching production workloads.
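The Transform step can be sketched as a pure function over Resource Graph rows. This sketch assumes rows shaped like the output of the KQL query below (fields id, name, type, location, tags) and an "owner" tag convention for ownership; executing the query and posting to the Graph Loader are omitted, and the output field names are hypothetical.

```python
# KQL for Azure Resource Graph: project only the metadata the graph needs.
RESOURCE_QUERY = "Resources | project id, name, type, location, tags"


def normalize_azure_resource(row: dict) -> dict:
    """Map one Resource Graph row to canonical infrastructure metadata."""
    tags = row.get("tags") or {}  # tags may be null on untagged resources
    return {
        "external_id": row["id"],     # full ARM resource ID, stable across syncs
        "name": row["name"],
        "type": row["type"].lower(),  # ARM resource types are case-insensitive
        "region": row["location"],
        "tags": tags,
        "owner": tags.get("owner"),   # assumption: ownership kept in an 'owner' tag
        "source_system": "azure-resource-graph",
    }


example = normalize_azure_resource({
    "id": "/subscriptions/000/resourceGroups/rg-hr/providers/Microsoft.Web/sites/hr-portal",
    "name": "hr-portal",
    "type": "Microsoft.Web/sites",
    "location": "westeurope",
    "tags": {"owner": "alice"},
})
```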
🔹 Example: AWS Config → EA 2.0
- Lambda function subscribed to Config SNS topic.
- Parses JSON payload → maps to “Infrastructure Node.”
- Upserts via secure REST to EA Graph endpoint.
Cloud events feed the architecture continuously — no scheduled exports needed.
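A minimal sketch of that Lambda function follows. It assumes the standard SNS event envelope (the Config notification arrives as a JSON string in Records[].Sns.Message) and handles only messages whose messageType is ConfigurationItemChangeNotification; the Infrastructure Node field names are hypothetical, and the authenticated POST to the EA Graph endpoint is left out.

```python
import json


def to_infrastructure_node(ci: dict) -> dict:
    """Map an AWS Config configuration item to a hypothetical Infrastructure Node."""
    return {
        "label": "InfrastructureNode",
        "external_id": ci["resourceId"],
        "name": ci.get("resourceName") or ci["resourceId"],  # not every resource has a name
        "type": ci["resourceType"],
        "region": ci.get("awsRegion"),
        "account": ci.get("awsAccountId"),
        "tags": ci.get("tags") or {},
        "captured_at": ci.get("configurationItemCaptureTime"),
        "source_system": "aws-config",
    }


def lambda_handler(event, context):
    nodes = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        # Only configuration-item change messages describe a resource; skip the rest.
        if message.get("messageType") != "ConfigurationItemChangeNotification":
            continue
        nodes.append(to_infrastructure_node(message["configurationItem"]))
    # A real handler would now POST `nodes` to the EA Graph endpoint (omitted).
    return {"nodes": nodes}


sample_event = {"Records": [{"Sns": {"Message": json.dumps({
    "messageType": "ConfigurationItemChangeNotification",
    "configurationItem": {
        "resourceType": "AWS::EC2::Instance",
        "resourceId": "i-0abc123",
        "awsRegion": "eu-west-1",
        "awsAccountId": "123456789012",
        "tags": {"owner": "platform-team"},
    },
})}}]}
result = lambda_handler(sample_event, None)
```

Config can also emit other message types (for example, notifications for oversized items that point to S3 instead of embedding the item); this sketch simply skips them.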
5. COTS / SaaS Integration
🔹 Example: ServiceNow CMDB
Pattern: Incremental REST Extract → Blob → Function Loader
- CMDB tables queried via REST API with sys_updated_on > last_sync.
- Results written to a secure Blob container (isolated per tenant).
- Loader Function maps columns to canonical ontology fields.
- Each record stamped with source_system='servicenow'.
ServiceNow → Blob → Graph ensures traceability, replayability, and throttling safety.
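The incremental extract boils down to an encoded query on the ServiceNow Table API (GET /api/now/table/{table} with sysparm_query, sysparm_limit and sysparm_offset). This sketch only builds the request URL; the instance name "example" and the table cmdb_ci_appl are placeholders, and it assumes the integration user's time zone is UTC so the filter lines up with the stored watermark.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_delta_url(instance: str, table: str, last_sync: datetime,
                    limit: int = 1000, offset: int = 0) -> str:
    """Build a Table API URL for records changed since `last_sync`."""
    # Convert the watermark to UTC in 'YYYY-MM-DD HH:MM:SS' form.
    stamp = last_sync.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    params = {
        # Encoded query: delta filter, ordered so pages come back in a predictable sequence.
        "sysparm_query": f"sys_updated_on>{stamp}^ORDERBYsys_updated_on",
        "sysparm_limit": limit,    # page size
        "sysparm_offset": offset,  # advance by `limit` for the next page
    }
    return f"https://{instance}.service-now.com/api/now/table/{table}?{urlencode(params)}"


# Watermark recorded by the previous run (10:00 at UTC+2 is 08:00 UTC).
last_sync = datetime(2024, 5, 1, 10, 0, tzinfo=timezone(timedelta(hours=2)))
url = build_delta_url("example", "cmdb_ci_appl", last_sync)
```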
🔹 Example: SAP / Oracle / Salesforce
Use iPaaS middleware (Azure Data Factory, MuleSoft, Boomi, etc.) for heavy COTS APIs.
Avoid direct JDBC pulls; use certified connectors for:
- Rate-limit handling
- Retry + DLQ (dead-letter queue)
- Secure token management
All extracts end in a landing zone (Blob/S3) before ingestion to EA 2.0.
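One way to keep the landing zone replayable is a deterministic, date-partitioned object key per extract run. The layout below (landing/<source>/<entity>/YYYY/MM/DD/<run_id>.json) is a hypothetical convention, not a Blob Storage or S3 requirement.

```python
from datetime import datetime, timedelta, timezone


def landing_key(source: str, entity: str, run_id: str, extracted_at: datetime) -> str:
    """Return the landing-zone object key for one extract run (hypothetical layout)."""
    day = extracted_at.astimezone(timezone.utc)  # partition by UTC date, not local date
    return f"landing/{source}/{entity}/{day:%Y/%m/%d}/{run_id}.json"


# 23:30 at UTC-2 is already 1 February in UTC, so it lands in that partition.
key = landing_key("sap", "vendors", "run-42",
                  datetime(2024, 1, 31, 23, 30, tzinfo=timezone(timedelta(hours=-2))))
```

Because the loader reads from these keys rather than from the COTS system directly, a failed ingestion can be re-run against the same files.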
6. On-Prem Integration
Legacy sources can still join the EA 2.0 ecosystem via hybrid bridges:
- Self-hosted Gateway Agent: Runs in the DMZ and connects outward to a secure HTTPS Function endpoint.
- SFTP/CSV Drop Pattern: Systems export CSVs to a watched SFTP location or OneDrive/SharePoint folder.
- Database Proxy Pattern: A read-only SQL user queries a view and pushes the data via REST.
Every on-prem push is outbound-initiated, so no inbound firewall rules are needed and the zero-trust posture is preserved.
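The Database Proxy Pattern can be sketched with the standard library alone. Here an in-memory SQLite database stands in for the on-prem source and its read-only view, and the endpoint URL and bearer token are hypothetical placeholders; the point is that the agent only ever opens an outbound HTTPS request.

```python
import json
import sqlite3
import urllib.request


def read_view(conn: sqlite3.Connection, view: str) -> list:
    """Read every row of a view as dicts. `view` must come from trusted config."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(f"SELECT * FROM {view}")]


def build_push_request(endpoint: str, token: str, rows: list) -> urllib.request.Request:
    """Prepare an outbound HTTPS POST; the agent initiates the connection."""
    body = json.dumps({"records": rows}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


# Stand-in for the legacy database: a table plus the view the read-only user may query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (id INTEGER, name TEXT, owner TEXT)")
conn.execute("CREATE VIEW v_apps AS SELECT id, name FROM apps")
conn.execute("INSERT INTO apps VALUES (1, 'HR-DB', 'bob')")

request = build_push_request("https://ea-ingest.example.com/push", "<token>",
                             read_view(conn, "v_apps"))
# urllib.request.urlopen(request, timeout=30) would send it; omitted in this sketch.
```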
7. Transformation & Normalization
The Normalization Layer harmonizes data before graph load:
| Step | Action | Example |
|---|---|---|
| Map Fields | Align to canonical schema | app_name → name, owner → person_ref |
| Classify Sensitivity | Apply label based on field | “PII” → Confidential |
| De-duplicate | Merge by natural key | Same App ID from two systems |
| Score Confidence | Assign trust weight per source | CMDB = 1.0, Spreadsheet = 0.6 |
| Emit Events | Publish record.updated message | Feeds Predictive Layer |
Transformation is policy-driven, not hard-coded — new sources can join by adding mapping config, not new code.
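The steps in the table can be read as a small, config-driven pipeline. A minimal sketch, assuming mapping rules live in a per-source dictionary: the MAPPINGS entries, the sensitive-field set, the "General" label and the app_id natural key are hypothetical; the confidence weights are the CMDB = 1.0 and Spreadsheet = 0.6 examples from the table. Event emission is omitted.

```python
# Hypothetical per-source mapping config: onboarding a source means adding an entry.
MAPPINGS = {
    "cmdb": {
        "fields": {"app_id": "app_id", "app_name": "name", "owner": "person_ref"},
        "confidence": 1.0,
    },
    "spreadsheet": {
        "fields": {"ID": "app_id", "Application": "name", "Owner": "person_ref"},
        "confidence": 0.6,
    },
}
SENSITIVE_FIELDS = {"person_ref"}  # assumption: canonical fields that hold PII


def normalize(source: str, record: dict) -> dict:
    cfg = MAPPINGS[source]
    # Map Fields: keep only mapped columns, renamed to the canonical schema.
    out = {canon: record[col] for col, canon in cfg["fields"].items() if col in record}
    # Classify Sensitivity: any PII field makes the whole record Confidential.
    out["sensitivity"] = "Confidential" if out.keys() & SENSITIVE_FIELDS else "General"
    # Score Confidence: the trust weight comes from the source, not the record.
    out["confidence"] = cfg["confidence"]
    out["source_system"] = source
    return out


def deduplicate(records: list, natural_key: str = "app_id") -> list:
    # De-duplicate: merge by natural key; higher-confidence sources are applied
    # last, so their values win on conflicting fields.
    merged: dict = {}
    for rec in sorted(records, key=lambda r: r["confidence"]):
        merged.setdefault(rec[natural_key], {}).update(rec)
    return list(merged.values())


a = normalize("cmdb", {"app_id": "A-1", "app_name": "Payroll", "owner": "alice"})
b = normalize("spreadsheet", {"ID": "A-1", "Application": "Payroll v2", "Owner": "bob", "Notes": "x"})
nodes = deduplicate([b, a])
```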
8. Performance-Safe Patterns
- Use pagination + delta windows (updated_since) to limit pull size.
- Implement back-off & retry (HTTP 429/503) logic for APIs.
- Cache source metadata locally (e.g., schema hash) to skip unchanged fields.
- Split heavy syncs into parallel Function executions by domain.
- Use event compression for telemetry feeds (batch 1000 → 1 payload).
These patterns let EA 2.0 ingest thousands of records daily without noticeable system impact.
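Pagination and back-off combine naturally in one fetch loop. This sketch assumes a hypothetical fetch_page(page) callable that already applies the updated_since delta window and returns an (HTTP status, items) pair, with an empty page marking the end; it retries only 429 and 503, using exponential backoff with full jitter. The sleep function is injectable so the example runs instantly.

```python
import random
import time

RETRYABLE = {429, 503}  # throttled or temporarily unavailable


def fetch_with_backoff(fetch_page, page, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Fetch one page, retrying 429/503 with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, items = fetch_page(page)
        if status == 200:
            return items
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"page {page} failed with HTTP {status}")
        # Wait a random time in [0, base_delay * 2**attempt] seconds.
        sleep(random.uniform(0, base_delay * 2 ** attempt))


def fetch_all(fetch_page, **retry_options):
    """Walk pages until the source returns an empty one."""
    page, results = 0, []
    while True:
        items = fetch_with_backoff(fetch_page, page, **retry_options)
        if not items:
            return results
        results.extend(items)
        page += 1


# Fake source: the very first call is throttled, then three pages are served.
PAGES = {0: ["a", "b"], 1: ["c"], 2: []}
calls = []


def fake_fetch(page):
    calls.append(page)
    if len(calls) == 1:
        return 429, []
    return 200, PAGES[page]


records = fetch_all(fake_fetch, sleep=lambda seconds: None)
```

If an API returns a Retry-After header with its 429, honoring that value is usually preferable to a computed delay.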
9. Security & Compliance Controls
| Control | Mechanism |
|---|---|
| Authentication | OAuth 2.0 client credentials via Entra ID / IAM roles |
| Encryption | TLS 1.2+ in transit; blob storage encrypted with tenant key |
| Data Residency | Connectors restricted to sovereign region endpoints |
| Secrets Management | Keys stored in Azure Key Vault / AWS Secrets Manager |
| Audit Trail | Every extraction logged with timestamp + checksum |
| Error Isolation | Failed loads quarantined in a dead_letter container |
Integration pipelines are treated as first-class governed assets — visible in dashboards with freshness SLAs.
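The Audit Trail and Error Isolation controls can be sketched together. This assumes the extract is serialized to bytes before loading and that a plain list stands in for the dead_letter container; the helper names and record fields are hypothetical. SHA-256 is used as the checksum here, although the table does not prescribe an algorithm.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_entry(source: str, payload: bytes) -> dict:
    """Audit record for one extraction: source, UTC time, and a checksum."""
    return {
        "source": source,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
    }


def load_batch(records: list, load_one, dead_letter: list) -> int:
    """Load records one by one; failures are quarantined instead of aborting the batch."""
    loaded = 0
    for rec in records:
        try:
            load_one(rec)
            loaded += 1
        except Exception as exc:  # isolate any per-record failure
            dead_letter.append({"record": rec, "error": repr(exc)})
    return loaded


def load_one(rec: dict) -> None:
    if "id" not in rec:
        raise ValueError("record has no id")  # stand-in for a real graph write


batch = [{"id": "A-1"}, {"name": "orphan"}]
entry = audit_entry("servicenow", json.dumps(batch, sort_keys=True).encode("utf-8"))
dead_letter: list = []
loaded = load_batch(batch, load_one, dead_letter)
```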
10. Monitoring and Alerting
- Ingestion Health Dashboard: shows record counts, latency, and failure rate per source.
- Freshness Gauge: color-coded indicator of last successful sync.
- Incident Hooks: failed Function invocations auto-create tickets in ServiceNow.
- Predictive Trend: ML detects degradation (fewer records = possible API drift).
Integration becomes observable infrastructure, not a hidden script.
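The Freshness Gauge reduces to comparing the age of the last successful sync with the SLA. A minimal sketch, assuming the 24-hour Data Freshness SLA listed under Key KPIs and a hypothetical rule of green up to half the SLA, amber up to the SLA, and red beyond it.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=24)  # Data Freshness SLA (≤ 24 hours)


def freshness_color(last_success: datetime, now: datetime,
                    sla: timedelta = FRESHNESS_SLA) -> str:
    """Map the age of the last successful sync to a gauge color (hypothetical thresholds)."""
    age = now - last_success
    if age <= sla / 2:
        return "green"
    if age <= sla:
        return "amber"
    return "red"


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
gauge = {hours: freshness_color(now - timedelta(hours=hours), now) for hours in (3, 20, 30)}
```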
11. Example Hybrid Pattern (visual logic)
┌──────────┐ ┌──────────────┐ ┌─────────────┐ ┌───────────┐
│ Source │ --> │ Extractor │ --> │ Normalizer │ --> │ Graph API │
│ (CMDB) │ │ (Function) │ │ (Mapping) │ │ (Upsert) │
└──────────┘ └──────────────┘ └─────────────┘ └───────────┘
│ ↑
└────────── Error Log / Retry ───────────┘
This pattern repeats identically across all domains — only connectors differ.
12. Key KPIs
| KPI | Target | Meaning |
|---|---|---|
| Average Sync Latency | < 15 minutes | Fresh data visible in near-real time |
| API Success Rate | > 99% | Reliable source integration |
| Throughput per Function | > 500 records/sec | Scalable ingestion |
| Data Freshness SLA | ≤ 24 hours | Up-to-date graph representation |
| Integration Confidence | ≥ 0.8 | Quality of mapping and reconciliation |
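The numeric targets translate directly into checks a dashboard could run after each sync. This sketch assumes hypothetical per-run counters (API calls, failures, records, elapsed seconds, average latency); the thresholds are the ones in the table.

```python
from dataclasses import dataclass


@dataclass
class SyncStats:
    """Hypothetical counters collected by one connector run."""
    api_calls: int
    api_failures: int
    records: int
    elapsed_seconds: float
    avg_latency_minutes: float


def kpi_report(s: SyncStats) -> dict:
    success_rate = 100 * (s.api_calls - s.api_failures) / s.api_calls  # percent
    throughput = s.records / s.elapsed_seconds                        # records/sec
    return {
        "api_success_rate": success_rate,
        "api_success_ok": success_rate > 99,
        "throughput": throughput,
        "throughput_ok": throughput > 500,
        "latency_ok": s.avg_latency_minutes < 15,
    }


report = kpi_report(SyncStats(api_calls=2000, api_failures=10, records=60000,
                              elapsed_seconds=100, avg_latency_minutes=9))
```

Here 1,990 of 2,000 calls succeed (99.5%) and 60,000 records in 100 s is 600 records/sec, so all three targets are met.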
13. Common Pitfalls
| Issue | Root Cause | Fix |
|---|---|---|
| Duplicate nodes | Missing unique ID mapping | Define natural_key |
| API throttling | Over-aggressive polling | Implement exponential backoff |
| Schema drift | Source update not tracked | Add schema hash validation |
| Missed deltas | Timezone mismatch in filters | Store UTC timestamps |
| Stale on-prem data | Manual exports forgotten | Add automatic file watcher |
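Schema hash validation can be as simple as hashing each record's field names and value types, so ordinary data changes leave the hash alone while an added, removed or retyped field changes it. A sketch under that assumption; the function names and the choice of SHA-256 over a JSON encoding are illustrative.

```python
import hashlib
import json


def schema_hash(record: dict) -> str:
    """Hash the record's shape (sorted field names + value types), not its values."""
    shape = sorted((name, type(value).__name__) for name, value in record.items())
    return hashlib.sha256(json.dumps(shape).encode("utf-8")).hexdigest()


def has_drifted(record: dict, expected_hash: str) -> bool:
    return schema_hash(record) != expected_hash


baseline = schema_hash({"sys_id": "1", "name": "CRM"})
same_shape = has_drifted({"name": "ERP", "sys_id": "2"}, baseline)
new_field = has_drifted({"name": "ERP", "sys_id": "2", "u_tier": "gold"}, baseline)
```

Nullable fields make the type part noisy (None versus str), so a production check might hash field names only.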
14. Takeaway
Integration in EA 2.0 isn’t plumbing — it’s governance in motion.
Each connector is a living contract of trust between systems.
The more seamless the integration, the more intelligent the enterprise becomes.