Document Classification: Internal — CHLOM Confidential
Phase: 0 → 1
Version: 0.1
Owner: CrownThrive, LLC
Last Updated: 2025-08-08
1 — Overview
1.1 Purpose
Define the architecture for the Compliance Engine (CE) and ZKP Verifier (ZKV) that power CHLOM’s CaaS and TLaaS flows: compute compliance risk scores, resolve KYC, verify Obsidian ZK proofs, and expose APIs to internal and external tenants with strict confidentiality and traceability.
1.2 Scope
- In Scope: CE microservice, ZKV microservice, feature store, model registry, prover/verifier workflow, API gateway, event backbone, data stores, observability stack, CI/CD hardening, key management.
- Out of Scope: Full tokenomics, DAO governance UI, non-compliance CHLOM pallets, third-party integrators’ internal systems.
1.3 Stakeholders
Role | Name/Team | Responsibility | Contact |
Product Owner | CHLOM CaaS Lead | Roadmap, requirements | [email protected] |
Lead Architect | CHLOM Core | System design, tradeoffs | [email protected] |
Security Lead | CHLOM SecEng | Threat model, keys, audits | [email protected] |
DevOps Lead | Platform/SRE | Infra, CI/CD, runtime | [email protected] |
Compliance Officer | Risk & Policy | Policy mapping & sign-off | [email protected] |
Data Lead | Data/ML | Feature store, lineage | [email protected] |
1.4 Assumptions
- Cloud: multi-region (Active/Active) on commodity cloud.
- Identity: OIDC (client creds) + mTLS at edge.
- Data: PII present; minimization + encryption enforced.
- Models: Phase 0 uses classical + simple ML; Phase 1 introduces advanced models behind the same interfaces.
- ZK: “Obsidian” circuits and verifiers are modular, with on-chain and off-chain verification options.
1.5 Non-Goals
- No customer-facing UI here.
- No governance ballots or token flows.
- No public datasets or code. All proprietary.
2 — High-Level Architecture Diagram (C4 Context + Container)
2.1 C4 Context (textual)
- Actors: Tenants (CaaS clients), TLaaS contracts, Regulators (read-only audit), Internal Ops.
- Systems: API Gateway, Compliance Engine, ZKP Verifier, Feature Store, Model Registry, Event Bus, Object Storage, Key/Secrets (KMS + Vault), Observability, CHLOM Chain Nodes (optional on-chain verify).
- External: Sanctions/PEP data providers, KYC document verification providers (Phase 1), Webhook endpoints at tenants.
2.2 C4 Container (components & tech)
- API Gateway/Edge: Envoy/Kong, mTLS, OAuth2 (client creds), OPA/Cedar policy checks.
- Compliance Engine (CE): Rust (actix/axum) + Python workers (for ML scoring where needed), async via Kafka.
- ZKP Verifier (ZKV): Rust; supports Obsidian circuits, batched verification, optional recursion.
- Prover Farm (Phase 1): Optional off-chain proof generation for convenience (privacy-safe).
- Event Bus: Kafka (with ACLs), Schema Registry (Avro/Protobuf).
- Feature Store: PostgreSQL + Delta Lake (read-optimized parquet) with row-level ACLs.
- Model Registry: S3-compatible object storage + signed manifests; references only (no weights in public repos).
- Secrets/Keys: Cloud KMS + HSM for signing keys; Vault for app secrets; short-lived creds.
- Observability: Prometheus + Grafana (metrics), Loki (logs), Tempo/Jaeger (traces), OpenTelemetry.
- Storage: Encrypted Postgres, S3 (WORM buckets for audit), Redis (ephemeral cache).
- Chain: CHLOM node(s) with pallets: identity, licensing, zkverify, audit.
- CI/CD: Build → test → scan → sign (sigstore) → stage → canary → prod; SLSA L3 target.
Artifacts (versioned in secure repo):
- /design/c4/context-ce-zkv-v1.drawio
- /design/c4/container-ce-zkv-v1.drawio
3 — Component Descriptions
3.1 API Gateway
- Purpose: Terminate mTLS, authenticate clients (OAuth2), enforce scopes, rate limit, route to CE/ZKV.
- Tech: Envoy/Kong + mTLS + OAuth2; OPA sidecar for policy.
- Notes: All requests must include
3.2 Compliance Engine (CE)
- Purpose: Compute compliance score (AegisScore), run rules + ML, orchestrate enrichment, publish decisions.
- Tech: Rust service; Python workers for model inference (gRPC); Kafka consumers/producers.
- Inputs: Entity payload, context, prior decisions.
- Outputs: Score object, explanations, decision, evidence pointers.
- Dependencies: Feature Store (reads), Model Registry (model refs), ZKV (optional proof checks).
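The CE's rules-plus-ML flow and its output shape can be pictured with a minimal sketch. Everything named here is hypothetical and for illustration only: the `ScoreResult` fields, the `sanctions_hit` rule, the thresholds, and the `model_score` callable (standing in for the gRPC call to a Python worker). The sketch also assumes that a higher score means higher risk and that a hard rule hit skips the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ScoreResult:
    """Hypothetical output shape: score, decision, explanations, evidence pointers."""
    score: float
    decision: str
    explanations: List[str] = field(default_factory=list)
    evidence_pointers: List[str] = field(default_factory=list)


def score_entity(
    features: Dict[str, float],
    model_score: Callable[[Dict[str, float]], float],
    deny_threshold: float = 0.8,    # assumed value, not from the spec
    review_threshold: float = 0.5,  # assumed value, not from the spec
) -> ScoreResult:
    # Deterministic rules run first; a hard rule hit short-circuits the model.
    if features.get("sanctions_hit", 0.0) >= 1.0:
        return ScoreResult(1.0, "deny", ["rule:sanctions_hit"])

    # Clamp the model output to [0, 1] so thresholds behave predictably.
    score = min(max(model_score(features), 0.0), 1.0)
    if score >= deny_threshold:
        decision = "deny"
    elif score >= review_threshold:
        decision = "review"
    else:
        decision = "allow"
    return ScoreResult(score, decision, [f"model:score={score:.2f}"])
```

Passing the model as a callable keeps the rule path testable without a running inference worker.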
3.3 ZKP Verifier (ZKV)
- Purpose: Verify Obsidian proofs for circuits like
- Tech: Rust library + service; batch verification; curve
- Inputs: Proof blob, public inputs.
- Outputs:
- Optional: On-chain verification call via
3.4 Feature Store
- Purpose: Authoritative features for scoring; immutable historical views.
- Tech: Postgres OLTP for writes; Delta Lake for analytical reads; versioned snapshots.
- Security: Column-level encryption, row-level RBAC; PII minimization.
3.5 Model Registry
- Purpose: Track model versions, signatures, evaluation metadata.
- Security: Every artifact signed; SBOM stored; promotion requires 2-person rule.
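A promotion gate combining an integrity check with the 2-person rule might look like the sketch below. The manifest keys (`sha256`, `author`) are hypothetical, and excluding the author from the approver count is an assumption rather than stated policy. Verifying the signature on the manifest itself is out of scope here.

```python
import hashlib
from typing import Dict, Iterable


def artifact_digest(data: bytes) -> str:
    """SHA-256 hex digest of a model artifact."""
    return hashlib.sha256(data).hexdigest()


def can_promote(manifest: Dict[str, str], artifact: bytes, approvers: Iterable[str]) -> bool:
    # The artifact must match the digest recorded in the (already verified) manifest.
    if artifact_digest(artifact) != manifest.get("sha256"):
        return False
    # 2-person rule: at least two distinct approvers, not counting the author (assumed).
    distinct = {a for a in approvers if a != manifest.get("author")}
    return len(distinct) >= 2
```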
3.6 Event Backbone (Kafka)
- Topics:
- ce.requests.v1
- ce.scores.v1
- zkv.requests.v1
- audit.events.v1
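The topic names above appear to follow a `<domain>.<stream>.v<N>` pattern. Assuming that convention holds and that the suffix is a schema major version (neither is stated explicitly), a small validator could reject misnamed topics before they reach the broker. `parse_topic` and `Topic` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed convention: lowercase domain, lowercase stream, positive integer version.
TOPIC_RE = re.compile(r"^(?P<domain>[a-z]+)\.(?P<stream>[a-z]+)\.v(?P<version>[1-9][0-9]*)$")


class Topic(NamedTuple):
    domain: str
    stream: str
    version: int


def parse_topic(name: str) -> Topic:
    """Split a topic name into its parts, or raise ValueError if it is malformed."""
    m = TOPIC_RE.match(name)
    if m is None:
        raise ValueError(f"topic name does not match <domain>.<stream>.v<N>: {name!r}")
    return Topic(m["domain"], m["stream"], int(m["version"]))
```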
3.7 Observability
- Dashboards: Edge, CE latency, ZKV throughput, Kafka lag, DB health, chain RPC.
- Alerts: Golden signals with paging rules (see SRE section).
3.8 Key & Secret Management
- Keys: API signing, JWT, circuit verification params.
- Rotation: 90 days for app keys, 24 hours for tokens; automated via KMS + Vault.
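The rotation windows above reduce to a simple age check. The following is a sketch only: the `ROTATION_PERIODS` keys and the `rotation_due` helper are hypothetical, and in practice rotation would be scheduled by KMS and Vault rather than by application code.

```python
from datetime import datetime, timedelta, timezone

# Rotation windows taken from the text: 90 days for app keys, 24 hours for tokens.
ROTATION_PERIODS = {
    "app_key": timedelta(days=90),
    "token": timedelta(hours=24),
}


def rotation_due(kind: str, issued_at: datetime, now: datetime) -> bool:
    """Return True once a credential has reached its rotation window."""
    return now - issued_at >= ROTATION_PERIODS[kind]
```

A credential issued at `datetime(2025, 8, 8, tzinfo=timezone.utc)` of kind `"token"` becomes due exactly 24 hours later.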
4 — Data Flow Diagrams (Textual Walkthroughs)
4.1 Compliance Scoring (sync → async)
- Client → Gateway: mTLS session established; OAuth2 client-credentials access token validated; scopes checked.
- Gateway → CE:
- CE Feature Hydration: CE reads from Feature Store; enriches with cached sanctions/PEP signals (no raw PII exfiltration).
- Rules + Model: Deterministic rules execute first; ML model inference runs via gRPC to Python worker; outputs include score + explanations.
- Optional ZK Evidence: If request includes proof reference, CE synchronously calls ZKV to verify Obsidian proof; CE caches verification result (short TTL) and pins evidence pointer.
- Decision + Publish: CE persists decision + evidence pointer; emits
- Webhook (async mode): Result delivered to tenant callback with signed payload and
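The signed webhook payload in the last step can be illustrated with a minimal sketch. The text does not specify a signing scheme, so HMAC-SHA256 over a canonical JSON body with a per-tenant shared secret is an assumption; an asymmetric signature from a KMS-held key would fit the text equally well. The function names and payload fields are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Tuple


def sign_webhook(payload: dict, secret: bytes) -> Tuple[bytes, str]:
    """Serialize the payload canonically and return (body, hex signature)."""
    # Sorted keys and compact separators give a stable byte representation.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_webhook(body: bytes, signature: str, secret: bytes) -> bool:
    """Tenant-side check: recompute the HMAC and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The receiver should verify against the raw bytes it received, not a re-serialized copy, because any change to the body invalidates the signature.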
4.2 ZK Proof Verification (direct)
- Client → Gateway → ZKV:
- ZKV: Loads circuit parameters; performs verification; records timing + verifier version.
- (Optional) Chain Notarization: ZKV posts verification hash to CHLOM chain for audit; returns
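One way to derive a verification hash for notarization is to digest the verification context together with its outcome. The text does not define the hash, so this is a sketch under assumptions: the field set, the field order, and the use of SHA-256 are all illustrative.

```python
import hashlib
from typing import Sequence


def verification_hash(
    circuit_id: str,
    verifier_version: str,
    proof: bytes,
    public_inputs: Sequence[bytes],
    valid: bool,
) -> str:
    """Digest of the verification context and result (assumed field set)."""
    h = hashlib.sha256()
    fields = [
        circuit_id.encode("utf-8"),
        verifier_version.encode("utf-8"),
        proof,
        *public_inputs,
        b"\x01" if valid else b"\x00",
    ]
    for f in fields:
        # Length-prefix each field so different field boundaries cannot collide.
        h.update(len(f).to_bytes(8, "big"))
        h.update(f)
    return h.hexdigest()
```

Including the verifier version ties the on-chain record to the exact verifier build that produced the result, matching the timing and version data ZKV already records.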
4.3 TLaaS License Check (with ZK)
- CE requests
- ZKV validates proof; CE gates action using AegisScore + policy engine. Decision logged to
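The gating step combines three signals: proof validity, AegisScore, and the policy decision. A minimal sketch follows, assuming that a higher AegisScore means higher risk and that any single failure denies the action. The `gate_action` name, the reason strings, and the `max_score` default are hypothetical.

```python
def gate_action(
    proof_valid: bool,
    aegis_score: float,
    policy_allows: bool,
    max_score: float = 0.5,  # assumed risk ceiling, not from the spec
) -> str:
    """Return "allow" or a "deny:<reason>" string for a license-gated action."""
    # An invalid proof is checked first; score and policy are irrelevant without it.
    if not proof_valid:
        return "deny:invalid_proof"
    if not policy_allows:
        return "deny:policy"
    if aegis_score > max_score:
        return "deny:score"
    return "allow"
```

Returning a reason alongside the denial gives the audit log a specific cause for each decision.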
Diagramming Standard: Use BPMN or UML Activity with explicit trust boundary swimlanes; tag each edge with encryption state (in-transit, at-rest, in-use).
5 — Trust Boundaries & Security Zones
5.1 Zones
- Z0 — Public Edge: Internet → API Gateway. mTLS required; DDoS/WAF in front.
- Z1 — Service Mesh: Internal CE/ZKV services (mTLS between pods/nodes), OPA policy at sidecars.
- Z2 — Restricted Data Stores: Postgres/Delta Lake with PII; access via service accounts only; private subnets.
- Z3 — Cryptographic Material: KMS/HSM and Vault; no direct developer access; break-glass requires founder approval + 2FA + logging.
- Z4 — Audit WORM: Write-once S3 buckets; lifecycle to Glacier; immutability enforced.
5.2 Controls
- Identity & Access: Least-privilege IAM; short-lived tokens; JIT access with approval workflow.
- Network: Segmented VPCs; private endpoints for data stores; egress restricted via NAT + allow-list.
- Data Protection: AES-256 at rest; TLS 1.3 in transit; format-preserving encryption for select identifiers.
- Secrets: Env-injected via Vault; no secrets in images; CI/CD attestation.
- Attacks Considered: Replay, oracle manipulation, model extraction, poisoning, proof malleability, L2 reorgs (if on-chain verify).
6 — SLAs, SLOs & Performance Targets
Metric | Target | Measurement | Escalation Trigger |
API Latency (P95) | ≤ 250 ms (read), ≤ 400 ms (verify) | Prometheus histograms | > 2× target for 5 min |
Uptime | 99.95% monthly | Synthetic + blackbox exporter | Below threshold |
Compliance Score Calc | ≤ 1.2 s P95 | Load harness, canary comparisons | > 1.8 s P95 |
ZK Verify Throughput | ≥ 200 verifications/s/node | ZKV metrics | < 100/s sustained |
Webhook Delivery | 99.9% within 60 s | Delivery logs, retries | > 5% of deliveries still retrying after 5 min |
DR Objectives | RPO ≤ 5 min; RTO ≤ 30 min | Semi-annual DR drills | Objective missed in a drill |
Capacity Assumptions: 3k RPS peak at edge; 10 TB/mo audit WORM growth; 30-day hot data, 365-day warm, 7-year cold (sector-dependent).
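The targets above translate into concrete budgets with a little arithmetic. As a worked sketch: a 99.95% monthly uptime target leaves about 21.6 minutes of downtime in a 30-day month, and 10 TB/mo of audit growth held for 7 years comes to 840 TB, assuming constant growth and no compression or deletion. The helper names and the one-sample-per-minute convention for the latency trigger are assumptions.

```python
from typing import Sequence


def downtime_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a period."""
    return (1.0 - slo) * days * 24 * 60


def audit_footprint_tb(growth_tb_per_month: float, retention_years: int) -> float:
    """Steady-state audit storage, assuming constant growth and no compression."""
    return growth_tb_per_month * 12 * retention_years


def latency_escalates(p95_per_minute_ms: Sequence[float], target_ms: float, minutes: int = 5) -> bool:
    """True if every one of the last `minutes` P95 samples exceeds 2x the target."""
    recent = p95_per_minute_ms[-minutes:]
    return len(recent) == minutes and all(v > 2 * target_ms for v in recent)
```

For the 250 ms read target, `latency_escalates` fires only when five consecutive per-minute P95 samples all exceed 500 ms, which mirrors the "> 2× target for 5 min" trigger.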
7 — RACI Matrix
Task | Responsible | Accountable | Consulted | Informed |
Architecture Design | Lead Architect | CTO | Security Lead, Product Owner | All Eng |
Security Review & Threat Model | Security Lead | CTO | Lead Architect, Compliance Officer | SRE |
Data Model & Feature Store | Data Lead | Lead Architect | Security Lead | Product |
ZK Circuits & Verifier | ZKV Lead (Rust) | Lead Architect | Security Lead | SRE |
CI/CD & Supply Chain | DevOps Lead | CTO | Security Lead | Eng |
Observability & SLOs | SRE Lead | CTO | DevOps Lead | All |
Incident Response | SRE Lead | CTO | Security Lead | Stakeholders |
Audit/WORM Controls | Compliance Officer | CTO | Security Lead | Product |
Appendices
A. Policy Engine Contract (preview)
- Syntax:
- Promotion requires signed policy bundle; staged at 1% canary.
B. Message Contracts (Avro/Proto stubs)
- ce.score.v1
C. Runbooks (links)
- /runbooks/ce-latency-spike.md
- /runbooks/zkv-verifier-degradation.md
- /runbooks/webhook-retry-storm.md
D. Compliance Notes
- Data residency tags on all PII tables.
- ZK public inputs never include raw PII; only commitments/hashes.
- Sanctions/PEP datasets licensed and version-pinned.
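The rule that public inputs carry only commitments or hashes can be illustrated with a salted hash commitment. This is a sketch, not the Obsidian construction: SHA-256 stands in for whatever circuit-friendly hash the real circuits use, and `commit` and `opens_to` are hypothetical helpers.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def commit(value: bytes, salt: Optional[bytes] = None) -> Tuple[str, bytes]:
    """Return (commitment, salt); only the commitment may appear in public inputs."""
    # A fresh 32-byte random salt prevents dictionary attacks on low-entropy values.
    salt = salt if salt is not None else secrets.token_bytes(32)
    digest = hashlib.sha256(salt + value).hexdigest()
    return digest, salt


def opens_to(commitment: str, value: bytes, salt: bytes) -> bool:
    """Check that (value, salt) opens the commitment, comparing in constant time."""
    return hmac.compare_digest(commitment, hashlib.sha256(salt + value).hexdigest())
```

Because the salt is random, committing to the same identifier twice yields unrelated commitments, so public inputs cannot be linked across proofs by value alone.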
E. Testing Gates
- Unit, property-based, fuzz (CE request parser, ZKV proof parser).
- Golden-case regression suite for AegisScore.
- Chaos days: broker outage, verifier slowdown, Vault rotation.