FAQ & Troubleshooting

Common Integration Errors • Schema Drift • Performance • Security Exceptions #


1 Purpose #

Even intelligent architectures occasionally hiccup.
This guide captures the most frequent symptoms you’ll see in production EA 2.0 deployments and provides quick, deterministic fixes—no guesswork, no panic.


2 Integration & Connectivity Issues #

| Symptom | Likely Cause | Resolution |
| --- | --- | --- |
| Feed not loading from CMDB / Cloud API | Token expired / wrong scope | Re-authorize via Managed Identity or refresh the OAuth token. |
| ETL fails mid-pipeline | Schema version mismatch | Compare source JSON to the Data Contract; update the mapping in the Transform stage. |
| “403 Forbidden” from ServiceNow API | IP not on the allow list or wrong user role | Add the Function App IP to the SN allow list; assign the sn_api_integration role. |
| Duplicate records in Graph | No unique source_id field | Add a deterministic UUID hash (derived from the record's source keys) per row in the Normalize function. |
| Missing data for new applications | Source feed not incremental | Enable “delta sync” and set a last_updated_at filter. |
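
The duplicate-record fix depends on the ID being deterministic: a random UUID would give the same record a new ID on every run. Below is a minimal sketch, assuming each row carries a source system name and a natural key. The field names `source_system` and `key`, the namespace string, and the `normalize` helper are illustrative assumptions, not part of EA 2.0.

```python
import uuid

# Hypothetical namespace for this deployment; any fixed UUID works as long as it never changes.
EA2_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ea2.example.internal")


def make_source_id(source_system: str, natural_key: str) -> str:
    """Derive a stable ID from the source system and the record's natural key."""
    # Normalise case and whitespace so trivially different spellings map to one ID.
    canonical = f"{source_system.strip().lower()}:{natural_key.strip().lower()}"
    return str(uuid.uuid5(EA2_NAMESPACE, canonical))


def normalize(rows):
    """Attach source_id to each row and drop rows whose ID was already seen."""
    seen = set()
    unique = []
    for row in rows:
        sid = make_source_id(row["source_system"], row["key"])
        if sid in seen:
            continue  # same record delivered twice
        seen.add(sid)
        unique.append({**row, "source_id": sid})
    return unique
```

Because the ID is a pure function of the source keys, re-running the feed produces the same `source_id` values, so the loader can upsert on that field instead of inserting.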

3 Schema Drift & Data Quality #

| Symptom | Root Cause | Fix Pattern |
| --- | --- | --- |
| “Key not found” error in loader | New field introduced in source schema | Update Ontology + Mapping file vNext. |
| Graph edges missing | Relationship type renamed / removed | Run Schema Validator job → re-infer relationships. |
| Stale data (older than 7 days) detected | Cron job failed or Function paused | Check Timer Trigger logs and restart. |
| DQ score drops below 0.7 | Feed contains nulls / invalid IDs | Re-apply DQ rules; trigger Steward task. |
| Policy evaluation wrong | Policy JSON uses old taxonomy | Sync taxonomy table from master repo. |
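
A drift check can run before the loader so that a new or missing field surfaces as a report rather than a “Key not found” error. The sketch below compares one record's keys with the set of fields a contract expects. `detect_drift`, `has_drift`, and the sample field names are hypothetical, and the real Data Contract format may differ.

```python
def detect_drift(record: dict, contract_fields: set) -> dict:
    """Compare one source record's keys with the fields the contract expects."""
    actual = set(record)
    return {
        "added": sorted(actual - contract_fields),    # present in source, not yet mapped
        "missing": sorted(contract_fields - actual),  # expected by the contract, absent in source
    }


def has_drift(report: dict) -> bool:
    """True when the record deviates from the contract in either direction."""
    return bool(report["added"] or report["missing"])


if __name__ == "__main__":
    contract = {"source_id", "name", "type", "last_updated_at"}
    record = {"source_id": "a1", "name": "Billing", "type": "application", "owner_email": "x@example.com"}
    print(detect_drift(record, contract))
    # {'added': ['owner_email'], 'missing': ['last_updated_at']}
```

Fields listed under `added` would be candidates for the next Ontology and Mapping update; fields under `missing` suggest the source renamed or removed something.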

4 Performance and Cost #

| Symptom | Root Cause | Mitigation Action |
| --- | --- | --- |
| Slow NLQ responses | Graph query unindexed | Add index on name and type fields. |
| High Cosmos RU/s consumption | Over-fetching entire graph | Paginate and cap result size (e.g. LIMIT 200 in Cypher, or the equivalent clause in the graph's query language). |
| ADF pipeline timeout | Long transform logic | Split into two pipelines (Extract + Transform). |
| Power BI refresh too slow | Dataset too large / complex joins | Use DirectQuery + aggregate views. |
| Storage cost spike | Logs not archived | Apply Lifecycle Policy → move files older than 90 days to the Cool tier. |
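
The pagination advice can be sketched as a small generator that never asks for more than 200 records at a time. `fetch_page(offset, limit)` is a hypothetical wrapper around whatever graph client the deployment uses; it is not an EA 2.0 or Cosmos DB API.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 200  # mirrors the LIMIT 200 cap suggested for graph queries


def paginate(fetch_page: Callable[[int, int], List[dict]],
             page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield records page by page instead of fetching the whole graph at once."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += len(page)
```

Callers that stop iterating early never trigger the remaining queries, which is where the RU/s saving comes from. Stores that return a continuation token with each page would follow that token rather than a numeric offset.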

5 Predictive Layer & AI Errors #

| Symptom | Possible Reason | Solution |
| --- | --- | --- |
| Model accuracy drop | Data drift / unbalanced training set | Retrain with latest quarter data; check feature weights. |
| RAG answers irrelevant | Embeddings outdated / vector index stale | Re-embed content via scheduled job. |
| Prompt timed out | Token limit or LLM latency | Reduce context size / use async FastAPI endpoints. |
| Guardrail blocked response | Sensitive term policy triggered | Review Prompt Library for safe phrasing. |
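
For the prompt timeout row, the two remedies (smaller context, asynchronous calls) can be combined. This is a sketch under stated assumptions: `llm_call` stands in for any async client function, and both the character budget and the 10-second timeout are illustrative values. Real limits are counted in tokens, not characters.

```python
import asyncio

MAX_CONTEXT_CHARS = 4000  # hypothetical budget; real limits are measured in tokens
TIMEOUT_SECONDS = 10.0    # hypothetical latency budget


def trim_context(chunks, budget=MAX_CONTEXT_CHARS):
    """Keep the leading (assumed highest-ranked) chunks that fit within the budget."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            break
        kept.append(chunk)
        used += len(chunk)
    return kept


async def ask(llm_call, question, chunks, timeout=TIMEOUT_SECONDS):
    """Call an async LLM client with trimmed context; return None if it times out."""
    context = "\n".join(trim_context(chunks))
    try:
        return await asyncio.wait_for(llm_call(question, context), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # caller can retry with a smaller budget or show a fallback
```

Inside an async FastAPI route the handler would simply `await ask(...)`, so a slow model call does not block other requests.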

6 Governance & Security Exceptions #

| Symptom | Detection Source | Action Required |
| --- | --- | --- |
| Unauthorized graph write | Audit Ledger alert | Revoke token; review RBAC logs. |
| Policy auto-remediation failed | Azure Policy error code | Retry Logic App with service principal rights. |
| Evidence missing for control | GRC sync job failed | Manually upload evidence → rerun API call. |
| Audit log tampering attempt | WORM write violation detected | Lock container and notify Compliance Officer. |

7 User & Access Problems #

| Symptom | Root Cause | Remedy |
| --- | --- | --- |
| User cannot log in to NLQ UI | Entra ID token expired | Force re-authentication; review the token refresh policy. |
| Dashboard blank for some users | Missing Power BI dataset permission | Add the users to the workspace security group. |
| “Access Denied” in graph API | Role = Viewer (need Analyst) | Promote RBAC role via Portal. |

8 Audit Trail Verification #

Checklist:
☑ Audit Ledger hash verified weekly.
☑ Evidence files archived to immutable storage.
☑ Policy change log signed and timestamped.
☑ Sentinel integration report generated monthly.

If any check fails, escalate to the Compliance Lead within 24 h.
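
The weekly ledger hash check can be illustrated with a simple hash chain. This sketch assumes each ledger entry stores a SHA-256 hash over the previous entry's hash plus its own payload. The source does not describe the Audit Ledger's actual format, so the entry layout, `GENESIS`, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the very first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(ledger: list, payload: dict) -> None:
    """Append a payload, chaining it to the last entry's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"payload": payload, "hash": entry_hash(prev, payload)})


def verify_ledger(ledger: list) -> int:
    """Return -1 if every hash matches, otherwise the index of the first bad entry."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return -1
```

A result other than -1 points at the earliest entry whose content no longer matches its recorded hash, which is the evidence to attach when escalating.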


9 Diagnostics Commands (CLI Examples) #

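# Stream live logs for the graph ingestion Function App
az webapp log tail --name EA2IngestFn --resource-group EA2GovRG

# Show provisioned Cosmos DB throughput (RU/s) for the nodes container
az cosmosdb sql container throughput show --account-name EA2Graph --database-name <database-name> --name nodes --resource-group EA2GovRG

# Verify Power BI refresh history (PowerShell, MicrosoftPowerBIMgmt module; run Connect-PowerBIServiceAccount first)
$ds = Get-PowerBIDataset -Name "EA2_Metrics"
Invoke-PowerBIRestMethod -Url "datasets/$($ds.Id)/refreshes" -Method Get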

10 Escalation Matrix #

| Severity | Example Incident | Owner | Response Time |
| --- | --- | --- | --- |
| Critical | Graph unavailable > 1 h / security breach | EA Ops Manager + CISO | 1 h (immediate bridge call) |
| High | Major integration failure | Integration Lead + Data Steward | 4 h |
| Medium | Dashboard delay / minor DQ error | Service Manager | 1 business day |
| Low | Cosmetic UI issue | Product Owner | Next release cycle |

11 Knowledge Refresh & Self-Healing #

EA 2.0 continuously learns from its own tickets:

  • Closed incidents → fed back into Reasoning API for pattern recognition.
  • Repeated errors → policy review task auto-created.
  • MTTR trends → feed Predictive Cost & Risk models.
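
Two of the feedback signals above, repeated errors and MTTR, can be derived from closed tickets with very little code. The sketch assumes each incident is a dict with `error_code`, `opened_at`, and `closed_at` fields and uses an arbitrary threshold of three occurrences. All of these are illustrative assumptions rather than EA 2.0's actual ticket schema or rules.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

REPEAT_THRESHOLD = 3  # hypothetical: occurrences that trigger a policy review task


def repeated_errors(incidents, threshold=REPEAT_THRESHOLD):
    """Return the error codes seen at least `threshold` times, in sorted order."""
    counts = Counter(i["error_code"] for i in incidents)
    return sorted(code for code, n in counts.items() if n >= threshold)


def mttr_hours(incidents) -> float:
    """Mean time to resolve, in hours, over closed incidents (0.0 if there are none)."""
    durations = [
        (i["closed_at"] - i["opened_at"]).total_seconds() / 3600
        for i in incidents
    ]
    return mean(durations) if durations else 0.0


if __name__ == "__main__":
    sample = [
        {"error_code": "E403", "opened_at": datetime(2024, 1, 1, 8), "closed_at": datetime(2024, 1, 1, 10)},
        {"error_code": "E403", "opened_at": datetime(2024, 1, 2, 8), "closed_at": datetime(2024, 1, 2, 12)},
    ]
    print(repeated_errors(sample, threshold=2), mttr_hours(sample))
```

Each code returned by `repeated_errors` would map to one auto-created review task, and the MTTR figure, computed per period, gives the trend that feeds the predictive models.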

12 Preventive Maintenance Checklist #

| Frequency | Task | Responsible Role |
| --- | --- | --- |
| Daily | Check ingest logs & DQ scores | Data Steward |
| Weekly | Review policy breaches & closure rate | Policy Owner |
| Monthly | Validate backup restore + Power BI sync | Service Manager |
| Quarterly | Retrain predictive models & update ontology | EA Architect |
| Yearly | Full audit simulation & disaster recovery test | Compliance Officer |

13 Takeaway #

Every architecture issue is just data that hasn’t been learned from yet.
EA 2.0 turns troubleshooting into training — each resolution strengthens the system’s collective intelligence.
