Completeness • Freshness • Validity
1 Purpose
The power of EA 2.0 depends on the reliability of its inputs.
Bad data breaks reasoning faster than bad algorithms.
Data Quality (DQ) Gates ensure that only trusted, current, and complete information enters the knowledge graph — making predictions, audits, and automations credible.
2 Core DQ Principles #
| Principle | Meaning |
|---|---|
| Evidence over assumption | Every node and relationship must trace back to a verifiable source. |
| Continuous validation | DQ rules run automatically at ingestion and during the nightly refresh. |
| Transparency by design | Each node carries DQ scores visible to users. |
| Governance integration | Violations raise GRC tasks automatically. |
3 DQ Dimensions Tracked
| Dimension | Definition | Example Metric | Target |
|---|---|---|---|
| Completeness | Percentage of mandatory fields populated. | filled_fields / total_mandatory * 100 | ≥ 95 % |
| Freshness | Time since last update, measured against the data's TTL (freshness threshold). | (now - last_seen_at) | ≤ 7 days |
| Validity | Conformance to pattern, type, or range. | email format, date ISO check | 100 % |
| Uniqueness | No duplicates of IDs or names. | duplicate count per domain | 0 |
| Accuracy | Cross-checked against trusted source. | MSI vs Cloud Inventory match rate | ≥ 90 % |
| Lineage Completeness | Share of nodes that have the links their ontology rules require. | capability → app link % | ≥ 80 % |
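The per-dimension metrics in the table can be computed along these lines. This is a minimal sketch, assuming records arrive as Python dictionaries with ISO 8601 timestamps that include a UTC offset; the field list in `MANDATORY_FIELDS`, the simplified email pattern, and every function name are hypothetical illustrations, not a defined EA 2.0 schema.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical mandatory fields; a real list would come from the feed's contract.
MANDATORY_FIELDS = ("id", "name", "owner_email", "last_seen_at")
# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(record: dict) -> float:
    """Completeness: filled_fields / total_mandatory * 100."""
    filled = sum(1 for f in MANDATORY_FIELDS if record.get(f) not in (None, ""))
    return filled / len(MANDATORY_FIELDS) * 100


def parse_iso(value: object) -> Optional[datetime]:
    """Return the parsed ISO 8601 timestamp, or None if value is not one."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def age_days(record: dict, now: datetime) -> float:
    """Freshness input: days elapsed since last_seen_at (now must be tz-aware)."""
    seen = datetime.fromisoformat(record["last_seen_at"])
    return (now - seen).total_seconds() / 86400


def is_valid(record: dict) -> bool:
    """Validity: email format check plus ISO date check."""
    email_ok = bool(EMAIL_RE.match(record.get("owner_email") or ""))
    return email_ok and parse_iso(record.get("last_seen_at")) is not None


def duplicate_count(records: list) -> int:
    """Uniqueness: number of records whose id has already been seen."""
    seen, dupes = set(), 0
    for r in records:
        if r["id"] in seen:
            dupes += 1
        seen.add(r["id"])
    return dupes


def match_rate(ids: set, trusted_ids: set) -> float:
    """Accuracy: percentage of ids also present in a trusted source."""
    return len(ids & trusted_ids) / len(ids) * 100 if ids else 0.0


def lineage_completeness(capabilities: list, links: list) -> float:
    """Lineage: percentage of capabilities linked to at least one application."""
    linked = {cap for cap, _app in links}
    if not capabilities:
        return 0.0
    return sum(c in linked for c in capabilities) / len(capabilities) * 100
```

Each function returns a raw metric in the table's own units (percent, days, or counts); converting these into the 0–1 inputs of the composite score is a separate step.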
4 DQ Scoring Model
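Each node receives a composite DQ Score (0–1), calculated from dimension scores that are each normalized to 0–1 (for example, a 95 % completeness becomes 0.95):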
DQ Score = 0.3 × Completeness + 0.2 × Freshness + 0.2 × Validity + 0.2 × Uniqueness + 0.1 × Accuracy
Displayed as:
- 🟢 ≥ 0.9 = Trusted
- 🟡 0.7–0.89 = Review
- 🔴 < 0.7 = Critical
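The scoring formula and display bands translate directly into code. This sketch assumes the dimension scores are already normalized to 0–1; `freshness_score` is a hypothetical step function (full credit within a 7-day TTL, none after), since the source does not say how data age maps to a 0–1 value. The bands are treated as half-open intervals, so a score of 0.895 counts as Review.

```python
# Weights from the DQ Score formula; they sum to 1.0.
WEIGHTS = {
    "completeness": 0.3,
    "freshness": 0.2,
    "validity": 0.2,
    "uniqueness": 0.2,
    "accuracy": 0.1,
}


def freshness_score(age_days: float, ttl_days: float = 7.0) -> float:
    """Hypothetical normalization: 1.0 within the TTL, 0.0 once it is exceeded."""
    return 1.0 if age_days <= ttl_days else 0.0


def dq_score(dims: dict) -> float:
    """Weighted composite of dimension scores, each expected in [0, 1]."""
    for name in WEIGHTS:
        if not 0.0 <= dims[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1], got {dims[name]}")
    # Round to absorb floating-point noise (0.3 + 0.2 + ... is not exactly 1.0).
    return round(sum(w * dims[name] for name, w in WEIGHTS.items()), 4)


def dq_status(score: float) -> str:
    """Map a score to its display band."""
    if score >= 0.9:
        return "trusted"   # green
    if score >= 0.7:
        return "review"    # yellow
    return "critical"      # red
```

For example, a node that is complete, valid, unique, and accurate but was last seen 10 days ago scores 0.3 + 0 + 0.2 + 0.2 + 0.1 = 0.8 and lands in Review: stale data alone is enough to pull an otherwise perfect node out of Trusted.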
Scores propagate upward: an application inherits the mean DQ Score of its linked data entities and controls.
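Upward propagation is a simple mean over each application's linked nodes. In this sketch the graph is reduced to a hypothetical mapping from application IDs to linked data-entity and control IDs; applications with no scored links are left out rather than given a score of 0, which is a design assumption the source does not settle.

```python
from statistics import mean


def propagate_to_apps(app_links: dict, node_scores: dict) -> dict:
    """Give each application the mean DQ Score of its linked data entities and controls."""
    app_scores = {}
    for app_id, linked_ids in app_links.items():
        scores = [node_scores[n] for n in linked_ids if n in node_scores]
        # Skip applications that have no scored links instead of scoring them 0.
        if scores:
            app_scores[app_id] = round(mean(scores), 4)
    return app_scores
```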
5 DQ Gates in Pipeline
Stage 1 – Extract:
- Validate file headers, API response codes.
- Reject feeds with missing IDs.
Stage 2 – Transform:
- Apply schema validation (YAML contract).
- Normalize taxonomy (spelling, case, codes).
Stage 3 – Load (Graph):
- Compute DQ Scores.
- Tag nodes with `dq_status`.
- Log violations → DQ Incident Table.
Stage 4 – Govern:
- If `dq_status = critical`, a ServiceNow GRC ticket is created.
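The four stages can be sketched as a chain of gate functions. This is an illustration under stated assumptions: the contract is a plain dictionary standing in for the YAML contract, taxonomy normalization is reduced to trimming and upper-casing, and `create_ticket` is a hypothetical callback in place of a real ServiceNow integration. Stage 2 violations and Stage 3 critical scores are both returned as incident records for the DQ Incident Table.

```python
from typing import Callable


class FeedRejected(Exception):
    """Raised when a feed fails the Extract gate and is not loaded at all."""


# Hypothetical contract (field -> expected type); in practice parsed from YAML.
CONTRACT = {"id": str, "name": str, "domain": str}


def extract_gate(status_code: int, rows: list) -> list:
    """Stage 1: reject the whole feed on a bad API response or any missing ID."""
    if status_code != 200:
        raise FeedRejected(f"API returned HTTP {status_code}")
    if any(not row.get("id") for row in rows):
        raise FeedRejected("feed contains rows without an id")
    return rows


def transform_gate(rows: list, contract: dict = CONTRACT) -> tuple:
    """Stage 2: validate rows against the contract, then normalize taxonomy."""
    valid, violations = [], []
    for row in rows:
        bad = [f for f, t in contract.items() if not isinstance(row.get(f), t)]
        if bad:
            violations.append({"id": row["id"], "stage": "transform", "fields": bad})
            continue
        # Normalization here is only trimming names and upper-casing domain codes.
        valid.append({**row, "name": row["name"].strip(),
                      "domain": row["domain"].strip().upper()})
    return valid, violations


def load_gate(rows: list, score_fn: Callable, status_fn: Callable) -> tuple:
    """Stage 3: score each row, tag dq_status, and log critical nodes as incidents."""
    nodes, incidents = [], []
    for row in rows:
        score = score_fn(row)
        status = status_fn(score)
        nodes.append({**row, "dq_score": score, "dq_status": status})
        if status == "critical":
            incidents.append({"id": row["id"], "stage": "load", "score": score})
    return nodes, incidents


def govern_gate(nodes: list, create_ticket: Callable) -> list:
    """Stage 4: open one ticket per critical node via the supplied callback."""
    return [create_ticket(n) for n in nodes if n["dq_status"] == "critical"]
```

Keeping the ticketing call behind a callback means the gate logic can be tested without a live GRC system.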
6 DQ Dashboard KPIs
| Metric | Description | Threshold |
|---|---|---|
| Overall DQ Score | Mean of all active nodes | ≥ 0.9 |
| Nodes Failing DQ Gates | Share of active nodes scoring below 0.7 | ≤ 5 % |
| Average Data Age | Mean days since last_seen_at | ≤ 7 |
| Duplicate Rate | % of duplicate IDs | ≤ 1 % |
| Policy-linked DQ Incidents | Open vs Closed tickets | 95 % closure within 14 days |
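The dashboard KPIs can be derived from the scored nodes and the incident tickets. In this sketch the `active` flag, the ticket fields `opened_at` and `closed_at`, and the function name are hypothetical. The closure KPI only counts tickets that have been open for at least 14 days, on the assumption that younger tickets cannot yet miss the target.

```python
from datetime import datetime, timedelta
from statistics import mean


def dashboard_kpis(nodes: list, tickets: list, now: datetime) -> dict:
    """Compute the dashboard KPIs (as percentages where the threshold is a %)."""
    active = [n for n in nodes if n.get("active", True)]
    if not active:
        raise ValueError("no active nodes")
    ids = [n["id"] for n in active]

    # Closure KPI: judge only tickets old enough to have hit the 14-day limit.
    window = timedelta(days=14)
    due = [t for t in tickets if now - t["opened_at"] >= window]
    closed_in_time = [t for t in due if t["closed_at"] is not None
                      and t["closed_at"] - t["opened_at"] <= window]

    return {
        "overall_dq_score": round(mean(n["dq_score"] for n in active), 4),
        "pct_failing": round(100 * sum(n["dq_score"] < 0.7 for n in active) / len(active), 2),
        "avg_data_age_days": round(mean((now - n["last_seen_at"]).total_seconds() / 86400
                                        for n in active), 2),
        "duplicate_rate_pct": round(100 * (len(ids) - len(set(ids))) / len(ids), 2),
        "closure_within_14d_pct": round(100 * len(closed_in_time) / len(due), 2) if due else None,
    }
```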
7 Governance Rules
- Every feed owner has a DQ Steward.
- DQ violations are posted automatically to a Teams channel.
- Weekly DQ Stand-Up reviews top 10 critical issues.
- Quarterly Maturity Score update based on DQ improvement.
8 Visualization Views
- DQ Radar Chart: visualizes six dimensions per domain.
- Heatmap: color-codes low-score applications.
- Trend Line: DQ Score progress over time.
- DQ Incident Log: ServiceNow integration view.
Each view feeds Power BI and the NLQ interface for queries like:
“Show all applications with DQ Score < 0.8 and last seen > 14 days.”
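The source does not say how the NLQ interface translates such a question; whatever the mechanism, the result has to be equivalent to a structured filter. A hypothetical in-memory version of the example query (not a Power BI or NLQ API) makes the semantics explicit: "last seen > 14 days" means `last_seen_at` is older than 14 days before now.

```python
from datetime import datetime, timedelta


def stale_low_quality_apps(apps: list, now: datetime,
                           max_score: float = 0.8, min_age_days: int = 14) -> list:
    """Structured equivalent of: DQ Score < 0.8 and last seen > 14 days."""
    cutoff = now - timedelta(days=min_age_days)
    return [a["id"] for a in apps
            if a["dq_score"] < max_score and a["last_seen_at"] < cutoff]
```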
9 Automation Example
Policy Trigger:
If `dq_status = critical` for any data entity linked to a risk, create a GRC ticket and notify the data steward.
Remediation Logic App:
- Assign to data owner.
- Request corrected file via secure form.
- Reload feed and recompute DQ Score.
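The policy trigger and the three remediation steps can be sketched as two functions. All integrations (ticket creation, notification, assignment, the secure form, and the feed reload) are passed in as hypothetical callbacks, since the source names the systems but not their interfaces; the `steward` and `owner` fields are likewise assumptions.

```python
from typing import Callable, Optional


def policy_trigger(entity: dict, risk_linked_ids: set,
                   create_ticket: Callable[[dict], str],
                   notify: Callable[[str, str], None]) -> Optional[str]:
    """Open a GRC ticket and notify the steward for a critical, risk-linked entity."""
    if entity["dq_status"] != "critical" or entity["id"] not in risk_linked_ids:
        return None
    ticket_id = create_ticket(entity)
    notify(entity["steward"], ticket_id)
    return ticket_id


def remediate(entity: dict, ticket_id: str,
              assign: Callable[[str, str], None],
              request_corrected_file: Callable[[dict], bytes],
              reload_and_score: Callable[[bytes], float]) -> float:
    """Run the three remediation steps in order and return the recomputed score."""
    assign(ticket_id, entity["owner"])           # 1. assign to data owner
    corrected = request_corrected_file(entity)   # 2. corrected file via secure form
    return reload_and_score(corrected)           # 3. reload feed, recompute DQ Score
```

Entities that are critical but not linked to any risk fall through the trigger and are left to the regular DQ Stand-Up review.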
10 Benefits
✅ Objective measurement of trust.
✅ Fewer false alarms in predictive governance.
✅ Faster root-cause analysis for data issues.
✅ Direct link between DQ and EA maturity metrics.
11 Common Pitfalls & Mitigations
| Issue | Effect | Solution |
|---|---|---|
| Over-strict rules reject too many feeds | Loss of data coverage | Tiered gates with grace periods |
| No DQ ownership | Issues linger | Assign feed stewards per domain |
| Late DQ reporting | Decisions on stale data | Nightly DQ jobs + alerts |
| Blind spot in manual uploads | Shadow data | Mandatory OneDrive drop folder governed by policy |
12 Takeaway
Data without quality is noise; architecture without trust is fiction.
DQ Metrics and Gates make EA 2.0 a truthful foundation where AI can reason confidently, executives can act decisively, and auditors can verify instantly.