Enterprise Data Intelligence · On-Premises · Air-Gap Capable

The Sovereign Platform for Enterprise Data Intelligence

LakeX Sovereign DataVault archives, governs, and AI-queries your structured and unstructured enterprise data — with military-grade security, full on-premises deployment, and zero data leaving your perimeter.

[Dashboard preview: OneView DSL System Overview with Archive, Unstructured, AI Query, Monitor, and Governance views]
Petabyte-scale
Archive capacity
20+
Data source connectors
10ms
Job dispatch latency (WebSocket)
100%
On-premises: data never leaves your perimeter
Air-Gap Certified
FIPS 140-2/3 HSM Support
ML-KEM-768 Post-Quantum Crypto
GDPR / DSAR Ready
SOC 2 Aligned Audit Trail
Apache Iceberg Native

One platform for all your enterprise data — structured and unstructured

Sovereign DataVault is the sovereign data intelligence layer for BFSI and regulated enterprises. It replaces siloed archival tools, legacy cold-storage systems, and ad-hoc compliance scripts with a unified platform that archives intelligently, queries with AI, and enforces governance end-to-end.

🗄️

Intelligent Archive

Archive from Oracle, PostgreSQL, MySQL, MSSQL, and Db2 to Apache Iceberg Parquet — with automated sort-column selection for sub-millisecond query pruning, zero cluster tuning required.


Sort Advisor auto-selects optimal sort column
Prune files before Trino sees them
Z-order multi-column for composite keys
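Z-ordering can be illustrated with a Morton code. The bits of each key column are interleaved, so rows that are close in every column also sort close together. The sketch below is a minimal, generic version. It assumes the columns have already been mapped to non-negative integers (for example a branch ID, and a date expressed as days since epoch). The function name, 16-bit width, and sample rows are illustrative, not the Sort Advisor's actual code.

```python
def z_order_key(values: list[int], bits: int = 16) -> int:
    """Interleave the low `bits` bits of each value into one Morton (Z-order) key."""
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} unsigned bits")
    key = 0
    for bit in range(bits):
        for dim, v in enumerate(values):
            # Bit `bit` of column `dim` lands at position bit * ncols + dim.
            key |= ((v >> bit) & 1) << (bit * len(values) + dim)
    return key


# Hypothetical rows: (branch_id, transaction date as days since epoch)
rows = [(7, 19500), (3, 19501), (7, 19499), (3, 19498)]
rows.sort(key=lambda r: z_order_key([r[0], r[1]]))
```

Sorting by this key keeps each file's min/max bounds fairly narrow on both columns at once. The trade-off is that each individual column's bounds are looser than a single-column sort would give.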
📂

Unstructured Intelligence

Ingest files, S3, Azure Blob, SharePoint, HDFS, email, logs, JSON/XML, Confluence, and more. NER extraction, vector embedding, and RAG-based AI querying over all your unstructured content.


20+ source connectors built in
Qdrant vector search with local embeddings
MIP label and permissions inheritance
🔐

Sovereign Security

PKCS#11 HSM, post-quantum hybrid encryption (ML-KEM-768 + X25519), format-preserving encryption, AWS/Azure/GCP KMS, and HashiCorp Vault Transit — all governable from a single policy engine.


FIPS 203/204 PQC (ML-KEM-768, ML-DSA-65)
Thales Luna, Entrust nShield, AWS CloudHSM
FPE-FF3-1 format-preserving encryption
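FF3-1 is a Feistel construction keyed with AES. The Python standard library has no AES, so the sketch below shows only the underlying idea: a Feistel network over decimal digits. With it, a 16-digit card number encrypts to another 16-digit string and decrypts back. The sketch uses HMAC-SHA256 as the round function and omits the tweak. It is a toy for illustration, not FF3-1, and not suitable for production. The function names and the 10-round count are assumptions.

```python
import hashlib
import hmac

ROUNDS = 10  # assumption; even, so the final left/right widths match the input split


def _round_value(key: bytes, rnd: int, half: str) -> int:
    # Round function: HMAC-SHA256 over (round index, current right half).
    mac = hmac.new(key, bytes([rnd]) + half.encode("ascii"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def _split(digits: str) -> tuple[str, str, int, int]:
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 2:
        raise ValueError("expected a string of at least two ASCII digits")
    u = len(digits) // 2
    return digits[:u], digits[u:], u, len(digits) - u


def fpe_encrypt(key: bytes, digits: str) -> str:
    a, b, u, v = _split(digits)
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half being updated this round
        c = (int(a) + _round_value(key, i, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def fpe_decrypt(key: bytes, digits: str) -> str:
    a, b, u, v = _split(digits)
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c, b = b, a  # undo the swap performed in round i
        a = str((int(c) - _round_value(key, i, b)) % 10**m).zfill(m)
    return a + b


key = b"demo-key-not-for-production"  # hypothetical key
token = fpe_encrypt(key, "4111111111111111")  # still 16 digits
```

Because each round only adds a keyed value modulo a power of ten, every intermediate state stays a digit string of fixed width. That is what lets ciphertext pass downstream format checks.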

Works with your entire data ecosystem

🐉 Oracle 19c / 21c
🐘 PostgreSQL
🐬 MySQL / MariaDB
🪟 SQL Server (MSSQL)
🔷 IBM Db2
📦 AWS S3 / Object Lock
☁️ Azure Blob Storage
📁 HDFS
🔗 NFS Mounts
🔒 SFTP Servers
📊 SharePoint Online
📧 Email Archives
📝 Confluence
🪵 Log Files / Syslog
🌊 JSON / XML Feeds
📉 CSV / TSV
☁️ Google Drive
🔐 Thales Luna HSM
🔐 Entrust nShield
🔐 AWS CloudHSM
🗝️ HashiCorp Vault
🗝️ Azure Key Vault
🗝️ GCP KMS
🤖 Ollama (local/air-gap)
🤖 Claude (Anthropic)
🤖 OpenAI GPT
🤖 AWS Bedrock
🤖 IBM WatsonX
📡 Splunk (CEF/JSON)
📡 IBM QRadar
📡 ArcSight

Everything your data needs. Nothing you don't.

🤖
AI Query Engine

Query archived data in plain English

The AI Query page combines a Monaco SQL editor with multi-session AI chat. Ask natural language questions — Sovereign DataVault generates schema-aware SQL, executes via Trino, and visualises results with charts. RAG queries search the same way over unstructured documents.

-- Natural language → SQL → results
-- User: "Show me all customers in arrears over 90 days"
-- AI-generated SQL, executed via Trino:
SELECT customer_id, amount, due_date
FROM datagen.fincore.loan_accounts
WHERE days_past_due > 90
  AND status = 'ARREARS';
-- ✓ 2,847 rows · 142ms · 98 files pruned (4 scanned)
I/O Minimization

Prune before you compute

Sovereign DataVault's Sort Advisor inspects the source schema at write time and selects the optimal sort column (or a Z-order over several columns for composite keys). Because rows are written in sorted order, every Parquet file carries tight, non-overlapping value bounds on that column. At query time, files are eliminated in the metadata layer within milliseconds, before any compute engine opens a single file.

Layer 1: LvsParquetTracker (file-level skip)
Layer 2: Iceberg Manifests (manifest pruning)
Layer 3: Parquet row-groups (row-level filter)
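File-level skipping needs nothing more than per-file min/max statistics on the sort column. The minimal sketch below assumes a hypothetical `FileStats` record; the real LvsParquetTracker schema is not described here. A file is opened only if its [min, max] range can overlap the predicate range. This is why sorted writes, which produce narrow disjoint ranges, prune far more than unsorted ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    min_value: int  # smallest sort-column value in the file
    max_value: int  # largest sort-column value in the file


def prune(files: list[FileStats], lo: int, hi: int) -> list[FileStats]:
    """Keep only files whose [min, max] range can overlap lo <= col <= hi."""
    return [f for f in files if f.max_value >= lo and f.min_value <= hi]


# Sorted writes give narrow, disjoint ranges; unsorted writes give wide ones.
sorted_files = [FileStats(f"part-{i}.parquet", i * 100, i * 100 + 99) for i in range(10)]
unsorted_files = [FileStats(f"part-{i}.parquet", 0, 999) for i in range(10)]

print(len(prune(sorted_files, 250, 349)))    # 2
print(len(prune(unsorted_files, 250, 349)))  # 10
```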
🛡️
Data Governance

Governance that enforces itself

Security classifications, masking rules, legal holds, and DSAR workflows are evaluated inline — at query time. Role-based access controls restrict which tables, schemas, and documents each user can see. Masking is transparent: FULL, PARTIAL, REGEX, FPE, or DENY — applied per column, per role.

GDPR DSAR — 3-phase search, erasure, redaction
Legal holds prevent deletion of in-scope records
Business glossary with classification taxonomy
Custom roles with granular permission trees
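Per-column, per-role masking can be pictured as a lookup from (role, column) to a mask mode, applied as each value is returned. The minimal sketch below covers four of the listed modes: FULL, PARTIAL, REGEX, and DENY. The policy table, role and column names, and the keep-last-four rule for PARTIAL are assumptions. FPE is left out because it needs a key-backed cipher.

```python
import re


class AccessDenied(Exception):
    """Raised when a column is masked with DENY for the caller's role."""


# Hypothetical policy table: (role, column) -> (mode, argument)
POLICY = {
    ("ANALYST", "card_number"): ("PARTIAL", 4),  # keep the last 4 characters
    ("ANALYST", "email"): ("REGEX", r"(?<=.).(?=[^@]*@)"),  # hide local part after 1st char
    ("ANALYST", "national_id"): ("DENY", None),
    ("VIEWER", "card_number"): ("FULL", None),
}


def mask_value(role: str, column: str, value: str) -> str:
    mode, arg = POLICY.get((role, column), ("NONE", None))
    if mode == "NONE":
        return value
    if mode == "FULL":
        return "*" * len(value)
    if mode == "PARTIAL":
        hidden = max(len(value) - arg, 0)
        return "*" * hidden + value[hidden:]
    if mode == "REGEX":
        return re.sub(arg, "*", value)
    if mode == "DENY":
        raise AccessDenied(f"{role} may not read {column}")
    raise ValueError(f"unsupported mask mode {mode}")
```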
↩️
Restore & TDM

Restore in hours, not days

Point-in-time restore writes archived Parquet data back to your source database: Oracle, PostgreSQL, MySQL, or MSSQL. Test Data Management provisions masked copies for dev/staging, with FK-aware dependency resolution and CI/CD API keys for automated refresh pipelines.

Schema-aware restore with ADD COLUMN preview
TDM FK chain traversal for referential integrity
Seed SQL files for initial data provisioning
CI/CD API keys for automated TDM pipelines
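FK-aware provisioning comes down to loading parent tables before the tables that reference them. In other words, it is a topological sort of the foreign-key graph. The minimal sketch below uses Python's `graphlib` with a hypothetical schema. Self-references are ignored because they do not constrain table order. True FK cycles are reported rather than resolved; a real TDM engine would need something like deferred constraints for those.

```python
from graphlib import CycleError, TopologicalSorter


def load_order(fks: dict[str, set[str]]) -> list[str]:
    """Return tables parents-first. `fks` maps each table to the tables it references."""
    # Self-references (e.g. employees.manager_id) do not constrain table order.
    graph = {table: {p for p in parents if p != table} for table, parents in fks.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the nodes in the detected cycle; resolving it is out of scope.
        raise ValueError(f"foreign-key cycle: {exc.args[1]}") from exc


# Hypothetical schema
fks = {
    "payments": {"loan_accounts"},
    "loan_accounts": {"customers", "branches"},
    "customers": {"branches"},
    "employees": {"employees", "branches"},
}
print(load_order(fks))
```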

Built for the most demanding environments

Banking & Financial Services

Archive core banking data at scale — query it in milliseconds

Core banking systems accumulate terabytes of transaction, loan, and customer data annually. Sovereign DataVault archives Oracle and Db2 tables to Apache Iceberg with automated sort optimization, making 10-year transaction histories queryable in sub-second time without spinning up compute clusters.

RBI, MAS, FCA audit trail — tamper-evident hash chain
Field-level format-preserving encryption (FPE) for PAN, Aadhaar, NIC
Oracle Exadata and IBM Db2 source support
On-premises with no internet egress required
Typical BFSI Archive Flow
1
Source Discovery

Scan Oracle schema — identify 2,400 tables across FINCORE, RISK_DB, CUSTMGMT

2
Sort Advisor

Auto-selects TRXN_DATE as sort column; Z-order on (BRANCH_ID, TRXN_DATE) for composites

3
Archive → Iceberg Parquet

Write to Stratum block storage; register in datagen-catalog; bounds stored in tracker

4
AI Query

"Show NPAs over ₹50L in Q3 2023" → SQL generated → 98% files pruned → results in 180ms

Compliance & Legal

GDPR, DORA, PCI DSS — compliance without compromises

Sovereign DataVault's governance engine enforces data retention policies, responds to DSAR requests in minutes, maintains legal holds, and generates tamper-evident audit reports. The built-in SIEM forwarder pushes CEF or JSON events to Splunk, QRadar, or ArcSight over TLS — with configurable backfill.

DSAR 3-phase: identify → redact → erasure with audit log
PQC hybrid encryption, resistant to harvest-now-decrypt-later attacks
SBOM viewer and supply chain attestation
Retention policies auto-decommission expired archives
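A hybrid KEM stays secure as long as either component holds, because the final key is hashed from both shared secrets. The simplified sketch below shows such a combiner in the spirit of X-Wing, which likewise hashes the ML-KEM and X25519 shared secrets together with the X25519 ciphertext and public key. The field order and label here are assumptions, not the X-Wing byte layout. The ML-KEM-768 and X25519 operations themselves are assumed to come from a vetted library.

```python
import hashlib

# Hypothetical domain-separation label; X-Wing defines its own fixed label.
LABEL = b"example-hybrid-kem-v1"


def combine(ss_mlkem: bytes, ss_x25519: bytes, ct_x25519: bytes, pk_x25519: bytes) -> bytes:
    """Derive one 32-byte key that depends on both component shared secrets."""
    for part in (ss_mlkem, ss_x25519, ct_x25519, pk_x25519):
        # Fixed 32-byte fields keep the concatenation unambiguous.
        if len(part) != 32:
            raise ValueError("each input must be exactly 32 bytes")
    return hashlib.sha3_256(ss_mlkem + ss_x25519 + ct_x25519 + pk_x25519 + LABEL).digest()
```

An attacker who recovers only one of the two shared secrets still faces a SHA3-256 preimage over the unknown other secret.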
GDPR DSAR Response Flow
1
DSAR Request Intake

LEGAL_COMPLIANCE role submits request with subject email/NIC identifier

2
3-Phase Search

NER scan over structured tables + semantic vector search across unstructured documents

3
Review Matches

All matching rows, chunks, and spans surfaced with document provenance

4
Act: Erasure or Redact

Erase or redact in-place; full audit trail written; SIEM event forwarded

Dev & Test Data Management

Production-quality test data without production data risk

TDM Workflows provision masked copies of production archives to dev and staging environments. Foreign-key chains are resolved automatically — no dangling references. CI/CD API keys let your pipelines trigger refreshes on every merge, ensuring tests always run on current, safe data.

Mask modes: FULL, PARTIAL, REGEX, FPE, RANDOM, DENY
FK dependency graph auto-resolved before provisioning
Schema drift detection — ADD COLUMN preview before restore
Seed SQL files for deterministic initial state
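An ADD COLUMN preview can be produced by diffing the archived table's columns against the target's. The minimal sketch below assumes both schemas are available as column-to-type dictionaries with matching name casing. It emits PostgreSQL/MySQL-style `ADD COLUMN` statements (Oracle uses `ADD (...)` instead). Type mismatches are only reported, not scripted. The table and column names are hypothetical.

```python
def add_column_preview(table: str, archived: dict[str, str],
                       target: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (DDL adding columns missing from the target, type-mismatch warnings)."""
    ddl = [f"ALTER TABLE {table} ADD COLUMN {col} {typ}"
           for col, typ in archived.items() if col not in target]
    warnings = [f"{col}: archived {typ}, target {target[col]}"
                for col, typ in archived.items()
                if col in target and target[col].upper() != typ.upper()]
    return ddl, warnings


# Hypothetical schemas
ddl, warnings = add_column_preview(
    "loan_accounts",
    {"id": "BIGINT", "amount": "NUMERIC(18,2)", "branch_id": "INTEGER"},
    {"id": "bigint", "amount": "NUMERIC(12,2)"},
)
```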
TDM CI/CD Flow
1
Define TDM Workflow

Select archive, target DB, masking rules, and seed SQL

2
CI/CD Trigger

Pipeline calls POST /lvs/tdm/workflows/{id}/trigger with API key

3
FK Chain Resolution

System traverses FK graph; provisions in dependency order

4
Dev DB Ready

Masked, referentially-intact data loaded; tests run against safe copy
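The CI/CD trigger in step 2 is a plain HTTP call. The minimal sketch below uses only the standard library. The page gives only the method and path, so the `X-API-Key` header name, the empty JSON body, and the host name are all assumptions. Check the platform's API reference for the actual authentication scheme.

```python
import urllib.parse
import urllib.request


def build_trigger_request(base_url: str, workflow_id: str, api_key: str) -> urllib.request.Request:
    path = f"/lvs/tdm/workflows/{urllib.parse.quote(workflow_id, safe='')}/trigger"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=b"{}",  # assumption: an empty JSON body is accepted
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},  # header name assumed
    )


def trigger_workflow(base_url: str, workflow_id: str, api_key: str) -> int:
    req = build_trigger_request(base_url, workflow_id, api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status  # HTTP error responses raise urllib.error.HTTPError instead
```

In a pipeline step this might be called as `trigger_workflow("https://lakevault.example.internal", "wf-42", os.environ["LVS_TDM_API_KEY"])`. The host, workflow ID, and variable name are all placeholders.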

Enterprise Search & Discovery

Find anything across your unstructured archives with AI

Ingest SharePoint libraries, HDFS, email archives, NFS shares, and Confluence spaces. Sovereign DataVault extracts text, runs NER (names, emails, phone numbers, account IDs), embeds with nomic-embed-text, and indexes in Qdrant. AI chat sessions search across all sources with RAG — answering questions, not just returning documents.

RAG: semantic search + LLM synthesis over retrieved chunks
NER: PII/PAN/NIC/email entity extraction at ingest
Document-level classification and access control
Saved commands for repeatable research workflows
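Pattern-based extraction is a common first pass for the structured entity types above. Names need a statistical NER model, which is out of scope here. The minimal sketch below assumes simple regexes for email addresses and payment card numbers, plus a Luhn checksum to discard digit runs that only look like cards. The patterns are illustrative rather than exhaustive. National ID formats are omitted because they vary by country.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional separators


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, d in enumerate(int(ch) for ch in reversed(number)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_entities(text: str) -> list[tuple[str, str, int, int]]:
    """Return (label, matched text, start, end) tuples in document order."""
    found = [("EMAIL", m.group(), m.start(), m.end()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_ok(re.sub(r"[ -]", "", m.group())):
            found.append(("CARD_NUMBER", m.group(), m.start(), m.end()))
    return sorted(found, key=lambda e: e[2])
```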
Unstructured RAG Query Flow
1
Ingest & Embed

LVUS agent extracts text, runs NER, chunks and embeds into Qdrant on Stratum

2
User Query

"Summarize all board resolutions mentioning dividend policy since 2022"

3
Vector Search

Query embedded on Stratum → top-K chunks retrieved from Qdrant

4
LLM Synthesis

AI server receives chunks + governance context → generates cited answer
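Steps 3 and 4 come down to nearest-neighbour search over embeddings, followed by prompt assembly with chunk IDs the model can cite. The minimal pure-Python sketch below assumes the embeddings already exist; in the described setup, nomic-embed-text produces them and Qdrant performs the search. The dict layout, the document-level access filter, and the prompt wording are hypothetical.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[dict], k: int = 3,
          allowed_docs: Optional[set[str]] = None) -> list[dict]:
    # Governance first: drop chunks from documents the caller may not see.
    visible = [c for c in chunks if allowed_docs is None or c["doc"] in allowed_docs]
    return sorted(visible, key=lambda c: cosine(query, c["vec"]), reverse=True)[:k]


def build_prompt(question: str, hits: list[dict]) -> str:
    sources = "\n".join(f"[{c['id']}] {c['text']}" for c in hits)
    return ("Answer using only the sources below and cite them by ID.\n\n"
            f"{sources}\n\nQuestion: {question}")
```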

Post-quantum ready. HSM-anchored. Zero trust by default.

Sovereign DataVault is designed for organisations where data security is not a feature but a requirement. Every encryption key is anchored to hardware. Every access is logged. And the system is already prepared for the quantum computing threat, which regulators are beginning to require organisations to address.

1

PKCS#11 HSM Key Anchoring

Key material never leaves the HSM. Works with Thales Luna, Entrust nShield, AWS CloudHSM, YubiHSM, and SoftHSM2 for development. Each encrypt or decrypt operation opens an HSM session and closes it immediately afterwards.

2

Post-Quantum Hybrid (FIPS 203/204)

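ML-KEM-768 + X25519 hybrid KEM with the X-Wing combiner for encryption; ML-DSA-65 + Ed25519 for signatures. The combiner derives each key from both shared secrets, and a signature is accepted only if both component signatures verify, so a break of either primitive in isolation is not enough.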

3

Format-Preserving Encryption (FPE-FF3-1)

Encrypt PAN, Aadhaar, national IDs, and phone numbers while preserving their format. Encrypted values remain valid in downstream systems: no schema changes, no application rewrites.

4

Tamper-Evident Audit Trail

Every administrative and data action is written to an append-only audit table with a cryptographic hash chain. SIEM events forward in real time via CEF/JSON over TLS to Splunk, QRadar, or ArcSight.
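A hash chain makes an append-only table tamper-evident. Each entry's hash covers the previous entry's hash, so editing or removing an earlier row breaks every link after it. The minimal in-memory sketch below uses SHA-256 over canonical JSON. The field names and the all-zero genesis value are assumptions. A chain alone cannot detect truncation of the newest entries or a full recomputation, so production systems typically also sign or externally anchor the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed value that the first entry links back to


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": _entry_hash(prev, record)})


def verify_chain(chain: list[dict]) -> bool:
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```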

✓ FIPS 140-2/3

HSM Integration

PKCS#11 v2.40+ — Thales Luna, nShield, AWS CloudHSM, YubiHSM, SoftHSM2

✓ NIST PQC

Post-Quantum Crypto

ML-KEM-768 + X25519 / ML-DSA-65 + Ed25519. Harvest-now-decrypt-later resistant.

✓ GDPR / DORA

DSAR & Erasure

3-phase DSAR: identify, redact, erase — with full audit trail and SLA tracking.

✓ ISO 27001 Aligned

Role-Based Access

Admin, Operator, Analyst, Viewer, Audit Viewer + custom roles with granular permission trees.

✓ MFA

TOTP + WebAuthn

TOTP (Google Authenticator), backup codes, and FIDO2/WebAuthn hardware security keys.

✓ Microsoft

Entra ID / Azure AD

OAuth2 + OIDC integration with Microsoft Entra ID for enterprise SSO.

✓ CEF / JSON

SIEM Integration

UDP, TCP, TLS syslog to Splunk, QRadar, ArcSight. Backfill and per-target cursor tracking.

✓ Supply Chain

SBOM & Attestation

Software Bill of Materials viewer, posture chips, and component attestation badges.
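The TOTP factor above follows RFC 6238: HMAC-SHA1 over a 30-second time counter, truncated to a short decimal code as in RFC 4226 HOTP. The sketch below is a minimal standard-library version. The ±1 step verification window is a common choice, not a documented platform setting. Real enrolment secrets are usually base32-encoded in an `otpauth://` URI rather than held as raw bytes.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with counter = floor(unix_time / step)."""
    t = time.time() if at is None else at
    return hotp(key, int(t // step), digits)


def verify_totp(key: bytes, code: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes from `window` steps either side to tolerate clock drift."""
    t = time.time() if at is None else at
    counter = int(t // step)
    return any(
        hmac.compare_digest(hotp(key, c, digits), code)
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )
```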

Designed for sovereign deployment

A clear separation of control plane (Meridian), data plane (Stratum), and query/AI plane — all deployable on-premises, with optional SaaS overlay.

Control Plane: OneView DSL (Meridian)
FastAPI Backend · React Frontend · MetaDB (PostgreSQL) · Redis Pub/Sub · WebSocket Gateway · Scheduler · Vault (HashiCorp) · SoftHSM / Hardware HSM
↕ WebSocket (10ms dispatch) / REST API
Stratum: Structured (LVS)
LVS Agent · Trino Query Engine · Iceberg + datagen-catalog · PostgreSQL 16 · Block / NFS Storage
Stratum: Unstructured (LVUS)
LVUS Agent · Qdrant Vector Store · Ollama (embed model) · Iceberg + datagen-catalog · Block / NFS / S3 Storage
↕ RAG Federation (query-time only)
AI Server (central, query-time only; holds no customer data)
Ollama (LLM generation) · Claude / OpenAI / Bedrock / WatsonX · RAG Orchestration · Governance Federation (from Meridian)
↕ Source Connections
Data Sources
Oracle · PostgreSQL · MySQL · MSSQL · IBM Db2 · AWS S3 · Azure Blob · SharePoint · HDFS · NFS · Email · Logs · Confluence
Metadata-level
I/O pruning before Trino touches a file
20+
built-in source connectors (structured + unstructured)
9
encryption providers including PQC hybrid
100%
on-premises — customer data never leaves your perimeter

Ready to take control of your enterprise data?

See how LakeX Sovereign DataVault can replace your legacy archival stack, enforce compliance automatically, and make your data queryable in minutes — not hours.

On-premises deployment
Air-gap capable
BFSI-grade security
No data leaves your perimeter