Document Classification: Internal — CHLOM Confidential
Phase: 0 → 1
Version: 0.1
Owner: CrownThrive, LLC
Last Updated: 2025-08-08
Section 1 — Data Sources & Schemas
1.1 Source Inventory (to be populated)
Source ID | Domain | System/Provider | Data Type | Update Cadence | PHI/PII | Residency | Steward |
SRC-PEP-01 | Risk | Sanctions/PEP Provider A | Entities, watchlists | Daily | PII-lite | US/EU | Risk Ops |
SRC-KYC-01 | Identity | KYC Vendor (Phase 1) | Document hashes, decisions | On demand | PII | Regional | Compliance |
SRC-INT-01 | Internal | CE Derived Features | Feature vectors | Realtime | PII-min | US | Data Eng |
1.2 Canonical Schemas (contract-first)
- Entity:
- FeatureVector:
- AegisScore:
Schema Rules
- All schemas versioned in
- Minor versions may contain only backward-compatible changes; breaking changes require a new topic/table.
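The versioning rule can be enforced mechanically when a schema change is proposed. Below is a minimal sketch, assuming a simplified schema shape (field name mapped to a type name and a required flag) and `MAJOR.MINOR` version strings. The compatibility rules encoded here (removing a field, changing its type or required flag, or adding a required field is breaking) are assumptions, not the registry's actual checks.

```python
# Hypothetical field spec: field name -> (type name, required flag).
Schema = dict[str, tuple[str, bool]]


def classify_change(old: Schema, new: Schema) -> str:
    """Return 'minor' for backward-compatible changes, 'breaking' otherwise."""
    for name, spec in old.items():
        if name not in new:
            return "breaking"  # existing consumers may still read this field
        if new[name] != spec:
            return "breaking"  # type or required-flag change
    for name, (_ftype, required) in new.items():
        if name not in old and required:
            return "breaking"  # older producers cannot supply it
    return "minor"  # only optional fields were added


def next_version(version: str, change: str) -> str:
    """Bump a MAJOR.MINOR version string according to the change class."""
    major, minor = (int(part) for part in version.split("."))
    if change == "breaking":
        # Per the rule above, this also implies a new topic/table.
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"
```

For example, adding an optional `region` field to a `1.3` schema classifies as `minor` and yields `1.4`, while dropping `score` forces a new major version and a new topic/table.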
Section 2 — Data Lineage Map
2.1 Lineage Requirements
- End-to-end lineage captured via OpenLineage events emitted by CE/ZKV and ingestion jobs.
- Every derived dataset attaches
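As an illustration of the events a CE/ZKV or ingestion job might emit, the sketch below builds an OpenLineage-style `RunEvent` as a plain dictionary using only the standard library (no OpenLineage client). The producer URI, job and dataset names are hypothetical placeholders, and the `schemaURL` should be checked against the OpenLineage spec version actually in use.

```python
import json
import uuid
from datetime import datetime, timezone

# Placeholders: set these to the values your OpenLineage integration uses.
PRODUCER = "https://example.invalid/chlom/ingestion"  # hypothetical producer URI
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"  # verify version

EVENT_TYPES = {"START", "RUNNING", "COMPLETE", "ABORT", "FAIL", "OTHER"}


def run_event(event_type, job_namespace, job_name, inputs=(), outputs=(), run_id=None):
    """Build a RunEvent dict; inputs/outputs are (namespace, name) pairs."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported eventType: {event_type}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # START and COMPLETE events for one run must share the same runId.
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


if __name__ == "__main__":
    start = run_event("START", "chlom.ce", "feature_vector_build",
                      inputs=[("delta", "features.raw")],
                      outputs=[("delta", "features.vectors")])
    done = run_event("COMPLETE", "chlom.ce", "feature_vector_build",
                     run_id=start["run"]["runId"])
    print(json.dumps(done, indent=2))
```

Reusing the `runId` is what lets the lineage backend stitch the start and completion of a single job run together.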
2.2 Lineage Storage
- Marquez (or equivalent) stores the lineage graph; lineage is retained for 7 years in regulated domains.
- Immutable snapshots for Feature Groups written to Delta Lake with commit metadata.
Section 3 — Retention & Archival Policies
Dataset | Hot (days) | Warm (months) | Cold (years) | Deletion Policy | Legal Hold |
Feature Store (OLTP) | 30 | 12 | 7 | TTL by entity_id | Yes |
Audit Events (WORM) | 7 | 12 | 7 | Never mutate | Yes |
AegisScore Decisions | 90 | 24 | 7 | Pseudonymize after 24 mo | Yes |
- RPO/RTO Alignment: Retention policies must not violate DR objectives.
- Right-to-Erasure: Maintain deletion manifests for PII keyed by
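The retention table can be read as a tier-assignment function. A minimal sketch, assuming tier boundaries are cumulative ages measured from record creation, a month is 30 days and a year is 365 days; the dataset keys are hypothetical identifiers for the table rows.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # assumption for converting table months to days
DAYS_PER_YEAR = 365  # assumption for converting table years to days


@dataclass(frozen=True)
class RetentionPolicy:
    hot_days: int
    warm_months: int
    cold_years: int


# Values from the retention table; dataset keys are hypothetical identifiers.
POLICIES = {
    "feature_store_oltp": RetentionPolicy(hot_days=30, warm_months=12, cold_years=7),
    "audit_events_worm": RetentionPolicy(hot_days=7, warm_months=12, cold_years=7),
    "aegisscore_decisions": RetentionPolicy(hot_days=90, warm_months=24, cold_years=7),
}


def tier_for(dataset: str, age_days: int, legal_hold: bool = False) -> str:
    """Storage tier for a record of the given age (measured from creation)."""
    p = POLICIES[dataset]
    if age_days < p.hot_days:
        return "hot"
    if age_days < p.warm_months * DAYS_PER_MONTH:
        return "warm"
    if age_days < p.cold_years * DAYS_PER_YEAR or legal_hold:
        # A legal hold keeps the record in cold storage past its normal window.
        return "cold"
    # Past retention: hand off to the dataset's deletion policy.
    return "past_retention"


def needs_pseudonymization(age_days: int) -> bool:
    """AegisScore decisions are pseudonymized after 24 months."""
    return age_days >= 24 * DAYS_PER_MONTH
```

Treating the legal hold as an override at the cold-tier boundary means a held record never reaches the deletion handoff, regardless of the dataset's normal schedule.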
Section 4 — Privacy Classifications
4.1 Data Classes
- Public — Non-sensitive docs, public keys.
- Internal — Operational metadata.
- Sensitive — PII-lite, hashed identifiers.
- Restricted — Full PII, cryptographic material, proof parameters.
4.2 Handling Rules (by Class)
Class | At Rest | In Transit | In Use | Access |
Public | Standard | TLS | N/A | All |
Internal | AES-256 | TLS | N/A | RBAC |
Sensitive | AES-256 + FPE | TLS 1.3 | Trusted enclaves only | RBAC + JIT |
Restricted | HSM/KMS-backed | mTLS + TLS 1.3 | Enclave/TEE | Break-glass + dual control |
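The Access column of the handling table translates into a per-class gate. A minimal sketch, assuming the request flags below are produced by upstream IAM, JIT and break-glass tooling (the `AccessRequest` type is hypothetical), and reading "dual control" as the requester plus at least one independent approver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    requester: str = ""
    has_rbac_role: bool = False      # caller holds a role granting the dataset
    jit_grant_active: bool = False   # time-boxed just-in-time grant is live
    break_glass: bool = False        # emergency break-glass session opened
    approvers: frozenset = frozenset()  # identities that approved this request


def access_allowed(data_class: str, req: AccessRequest) -> bool:
    """Apply the Access column of the handling table for one data class."""
    if data_class == "Public":
        return True
    if data_class == "Internal":
        return req.has_rbac_role
    if data_class == "Sensitive":
        return req.has_rbac_role and req.jit_grant_active
    if data_class == "Restricted":
        # Dual control (assumed): at least one approver other than the requester.
        independent = req.approvers - {req.requester}
        return req.break_glass and len(independent) >= 1
    raise ValueError(f"unknown data class: {data_class}")
```

Raising on an unknown class, rather than defaulting to a permissive branch, keeps the gate fail-closed if a new classification is added without updating the rules.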
- Pseudonymization: Use a salted hash (Argon2id) for persistent identifiers; never store raw national IDs.
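A sketch of deriving a stable pseudonym for a persistent identifier. Argon2id is not in the Python standard library, so this sketch swaps in `hashlib.scrypt`, another memory-hard KDF, purely to stay self-contained; production code should use an Argon2id binding as specified above. It also assumes the salt is a secret held in KMS and reused across an identifier domain (otherwise pseudonyms would not be joinable), and that identifiers are normalized by trimming and upper-casing.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, salt: bytes) -> str:
    """Derive a stable, non-reversible pseudonym for a persistent identifier.

    Stand-in KDF: scrypt instead of Argon2id (stdlib-only sketch).
    """
    if len(salt) < 16:
        raise ValueError("salt should be at least 16 bytes")
    # Normalization rule is an assumption; pick one and apply it everywhere.
    normalized = identifier.strip().upper().encode("utf-8")
    digest = hashlib.scrypt(normalized, salt=salt, n=2**14, r=8, p=1, dklen=32)
    return digest.hex()


def same_subject(pseudonym: str, identifier: str, salt: bytes) -> bool:
    """Constant-time check that an identifier maps to a stored pseudonym."""
    return hmac.compare_digest(pseudonym, pseudonymize(identifier, salt))
```

Because the output depends on the salt, rotating it changes every pseudonym, which is why rotation needs its own runbook rather than happening silently.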
Section 5 — ZKP Boundaries
- Public Inputs: Commitments, Merkle roots, policy IDs, non-PII aggregates.
- Private Inputs: Raw attributes (DOB, document signatures), license secrets.
- Boundary Rule: Raw PII never leaves the ZK prover/verifier enclave; CE consumes only the verification boolean plus proof metadata.
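One way to hold the boundary rule in code is to have the enclave project its output onto an allowlisted view before anything reaches CE. A minimal sketch; the output dictionary shape and the allowlisted metadata keys (`circuit_id`, `proof_system`, `merkle_root`, `verified_at`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical metadata keys considered safe to expose to CE.
ALLOWED_METADATA = frozenset({"circuit_id", "proof_system", "merkle_root", "verified_at"})


@dataclass(frozen=True)
class CEVerificationView:
    verified: bool
    policy_id: str
    proof_metadata: dict = field(default_factory=dict)


def to_ce_view(verifier_output: dict) -> CEVerificationView:
    """Project verifier output onto what CE may consume (allowlist, not blocklist)."""
    meta = verifier_output.get("metadata", {})
    unexpected = set(meta) - ALLOWED_METADATA
    if unexpected:
        # Fail closed: an unknown key could be carrying a private input.
        raise ValueError(f"non-allowlisted metadata keys: {sorted(unexpected)}")
    # Any other top-level keys (e.g. raw attributes) are simply not copied.
    return CEVerificationView(
        verified=bool(verifier_output["verified"]),
        policy_id=str(verifier_output["policy_id"]),
        proof_metadata={k: meta[k] for k in ALLOWED_METADATA if k in meta},
    )
```

An allowlist is preferred here because a blocklist of PII field names silently passes any private input it does not anticipate.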
Artifacts
- /zk/obsidian/circuits/*
- /zk/policies/*
Section 6 — Dataset Entitlement & Access Matrix
Dataset | Role: CE | Role: ZKV | Role: SRE | Role: DataEng | Role: Compliance | External Tenant |
Feature Store (OLTP) | R/W (scoped) | R (subset) | R (metrics only) | Admin | R | None |
Delta Lake Features | R | R | R (ops) | Admin | R | None |
Audit WORM | Append | Append | R | R | Admin | Read (regulator only) |
Model Registry | R | R | R | Admin | R | None |
- Enforcement: IAM + row-level policies; all access is logged and correlated with
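The entitlement matrix can be encoded as data plus a single decision function that logs every request. A sketch under stated assumptions: dataset and role keys are hypothetical identifiers, "Admin" is assumed to mean read, write and administer, qualifiers such as "(scoped)" are returned as scope labels for row-level policies to interpret, and writes to the audit WORM store are refused for everyone, following its "Never mutate" retention policy.

```python
import logging

log = logging.getLogger("chlom.entitlements")  # hypothetical logger name

ACTIONS = {
    "R": {"read"},
    "R/W": {"read", "write"},
    "Append": {"append"},
    "Admin": {"read", "write", "admin"},  # assumed meaning of "Admin"
    "None": set(),
}

# (grant, scope qualifier) per dataset and role, transcribed from the matrix.
MATRIX = {
    "feature_store_oltp": {
        "CE": ("R/W", "scoped"), "ZKV": ("R", "subset"), "SRE": ("R", "metrics only"),
        "DataEng": ("Admin", None), "Compliance": ("R", None), "ExternalTenant": ("None", None),
    },
    "delta_lake_features": {
        "CE": ("R", None), "ZKV": ("R", None), "SRE": ("R", "ops"),
        "DataEng": ("Admin", None), "Compliance": ("R", None), "ExternalTenant": ("None", None),
    },
    "audit_worm": {
        "CE": ("Append", None), "ZKV": ("Append", None), "SRE": ("R", None),
        "DataEng": ("R", None), "Compliance": ("Admin", None),
        "ExternalTenant": ("R", "regulator only"),
    },
    "model_registry": {
        "CE": ("R", None), "ZKV": ("R", None), "SRE": ("R", None),
        "DataEng": ("Admin", None), "Compliance": ("R", None), "ExternalTenant": ("None", None),
    },
}

WORM_DATASETS = {"audit_worm"}


def authorize(dataset: str, role: str, action: str):
    """Return (allowed, scope); unknown datasets or roles are denied."""
    grant, scope = MATRIX.get(dataset, {}).get(role, ("None", None))
    allowed = action in ACTIONS[grant]
    if dataset in WORM_DATASETS and action == "write":
        allowed = False  # audit records are never mutated, even by admins
    log.info("access dataset=%s role=%s action=%s allowed=%s scope=%s",
             dataset, role, action, allowed, scope)
    return allowed, scope
```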
Section 7 — Data Quality & SLAs
Check | Target | Method | Action on Breach |
Freshness (Feature Group) | < 5 min lag | Timestamp diff | Alert SRE; degrade gracefully |
Completeness | ≥ 99.5% non-null | dbt tests | Block promotion |
Validity | 100% schema-conformant | Schema registry | Quarantine batch |
Drift (Key Features) | PSI < 0.2 vs baseline | Drift job | Trigger retrain RFC |
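The freshness, completeness and drift checks reduce to small functions. The drift check uses the Population Stability Index, PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the baseline and current proportions in bin i. A minimal sketch, assuming bin edges at the baseline's deciles and flooring empty bins at a small epsilon so the logarithm stays finite (a common convention, not one mandated by this document).

```python
import math
from bisect import bisect_right

PSI_RETRAIN_THRESHOLD = 0.2
MAX_FRESHNESS_LAG_S = 5 * 60


def _bin_proportions(values, edges, eps):
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    # Floor empty bins at eps so the log term stays finite.
    return [max(c / total, eps) for c in counts]


def psi(baseline, current, bins=10, eps=1e-4):
    """Population Stability Index with bin edges at the baseline's quantiles."""
    ordered = sorted(baseline)
    edges = [ordered[len(ordered) * k // bins] for k in range(1, bins)]
    e = _bin_proportions(baseline, edges, eps)
    a = _bin_proportions(current, edges, eps)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def freshness_ok(lag_seconds: float) -> bool:
    return lag_seconds < MAX_FRESHNESS_LAG_S


def completeness_ok(non_null: int, total: int) -> bool:
    # Integer form of non_null / total >= 99.5%, avoiding float rounding.
    return total > 0 and non_null * 1000 >= total * 995


def drift_action(psi_value: float) -> str:
    return "trigger_retrain_rfc" if psi_value >= PSI_RETRAIN_THRESHOLD else "ok"
```

An identical distribution scores a PSI of exactly zero; shifting a uniform feature by half its range pushes PSI far past 0.2 and triggers the retrain RFC.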
Section 8 — Compliance Controls (Privacy, Residency, DPIA)
- Residency: Tag records with
- Consent & Purpose: Persist
- DPIA: Required for any new PII source; template
- DLP: Egress scanning on logs and exports; block patterns for sensitive tokens.
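A sketch of the DLP egress scan using regular expressions. The patterns below (US SSN format, HTTP bearer tokens, AWS-style access key IDs) are illustrative only, since this document does not define the actual blocklist; the sketch also assumes log lines are redacted in place while exports containing any match are blocked outright.

```python
import re

# Illustrative patterns only; the production blocklist is defined elsewhere.
BLOCK_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]+=*"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_and_redact(text: str):
    """Return (redacted_text, names_of_matched_patterns) for a log line."""
    findings = []
    for name, pattern in BLOCK_PATTERNS.items():
        if pattern.search(text):
            findings.append(name)
            text = pattern.sub(f"[REDACTED:{name}]", text)
    return text, findings


def allow_export(payload: str) -> bool:
    """Exports are blocked entirely if any sensitive pattern matches."""
    _, findings = scan_and_redact(payload)
    return not findings
```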
Section 9 — Cryptography & Key Rotation
- At Rest: AES-256-GCM; per-table keys rotated every 180 days.
- In Transit: TLS 1.3 only; mTLS inside the mesh.
- Identifiers: Tokenize with format-preserving encryption where necessary.
- Rotation: Keys in KMS with automatic rotation; key IDs versioned in dataset metadata.
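The 180-day rotation rule and versioned key IDs can be tracked in dataset metadata alongside the KMS-side automatic rotation. A minimal sketch; `TableKeyRef` and its fields are hypothetical, and the KMS call that actually creates the new key version is deliberately not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

ROTATION_PERIOD = timedelta(days=180)


@dataclass(frozen=True)
class TableKeyRef:
    table: str
    key_id: str        # opaque KMS key identifier
    key_version: int   # recorded in dataset metadata so readers pick the right version
    created_at: datetime

    def rotation_due(self, now: datetime) -> bool:
        return now - self.created_at >= ROTATION_PERIOD


def rotate(ref: TableKeyRef, now: datetime) -> TableKeyRef:
    """Metadata for the next key version; older versions stay available for decryption."""
    return TableKeyRef(ref.table, ref.key_id, ref.key_version + 1, now)


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
    ref = TableKeyRef("feature_store_oltp", "kms-key-features", 3, t0)
    print(ref.rotation_due(t0 + timedelta(days=180)))
```

Recording the version per dataset lets readers decrypt records written before a rotation without re-encrypting everything at once.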
Section 10 — Operational Playbooks (Links)
- /runbooks/feature-freshness-lag.md
- /runbooks/delta-lake-compaction.md
- /runbooks/pseudonymization-key-rotation.md