Zero-Knowledge Enterprise AI: Building Valenova's Custom MCP-Based Database Agent
How we built a desktop AI agent that queries Oracle, SAP, PostgreSQL, and Snowflake in natural language — with no enterprise data ever leaving the client machine
Valenova's brief sounded impossible at first. Build an AI agent that lets enterprise users query their existing databases (Oracle, SAP, PostgreSQL, Snowflake) in natural language. Sub-1-second latency on multi-hop questions. Pass enterprise security review. And — the constraint that broke every off-the-shelf solution — no raw data ever leaves the client machine.
This is the engineering teardown of Valenova — the architecture, the custom MCP server design, and the trade-offs of building zero-knowledge enterprise AI.
The Problem
Enterprise customers — large US/European companies with 25-year-old data warehouses — had two unmet needs:
- Natural-language access to legacy data. Decades of accumulated SQL and schema knowledge keep the data gated behind a few engineers. Business users file tickets and wait days.
- Without compromising security. Cloud-based AI tools (most LLM-powered analytics products) require data ingestion or migration. Enterprise security teams reject this on principle, especially for data subject to GDPR, HIPAA, SOX, or internal compliance frameworks.
Every existing solution failed at least one of these constraints. Cloud-hosted LLM tools shipped data to OpenAI/Anthropic. Self-hosted LLM products required schema migration to a vendor's data layer. Nobody had built the architecture where the agent itself runs on the client machine, the LLM call is the only outbound traffic, and even then only sanitised query plans go over the wire.
The Architecture
Valenova ships as a cross-platform Electron desktop app. Inside it:
- Local MCP (Model Context Protocol) server — A custom-built MCP server that runs on the client machine, exposing database connections as tools the LLM can call
- Read-only JDBC/ODBC connectors — Direct connections to Oracle, SAP HANA, PostgreSQL, Snowflake, with read-only credential enforcement at the connector layer
- Local query planner — Converts natural-language questions into a sequence of MCP tool calls (which database, what query, with what parameters)
- Hosted thin layer — Supabase stores only encrypted query plans, user metadata, and audit logs. Never raw data, never query results.
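To make the read-only connector idea concrete, here is a minimal sketch of a statement guard that could sit in front of a connector as a second line of defence. It is an assumption on our part for illustration: the names `FORBIDDEN_KEYWORDS` and `isReadOnlyStatement` are hypothetical, and a guard like this complements read-only database credentials rather than replacing them.

```typescript
// Hypothetical guard; names and rules are illustrative, not Valenova's actual code.
const FORBIDDEN_KEYWORDS = new Set([
  "insert", "update", "delete", "merge", "drop", "alter",
  "create", "truncate", "grant", "revoke", "call", "exec",
]);

function isReadOnlyStatement(sql: string): boolean {
  // Allow a single trailing semicolon, nothing after it.
  const trimmed = sql.trim().replace(/;\s*$/, "");

  // Reject multi-statement input outright.
  if (trimmed.includes(";")) return false;

  // Reject comments rather than trying to parse them safely.
  if (/--|\/\*/.test(trimmed)) return false;

  // The statement must begin with SELECT or WITH.
  if (!/^(select|with)\b/i.test(trimmed)) return false;

  // Deliberately conservative: a forbidden keyword anywhere (even inside a
  // string literal) rejects the statement. Identifiers such as "updated_at"
  // pass because underscores keep them as a single word.
  const words = trimmed.toLowerCase().split(/[^a-z_]+/);
  return !words.some((word) => FORBIDDEN_KEYWORDS.has(word));
}
```

The guard errs on the side of rejecting: a legitimate query that mentions "update" inside a string literal is refused. For a tool that must never write, that trade-off seems preferable to a permissive parser.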
The flow: the user types "Show me Q3 revenue by region for products that shipped late." The LLM (an Anthropic model running through Amazon Bedrock, with strict data-handling guarantees) generates a plan. The plan executes against the local MCP server, which dispatches each step to the right database over read-only connections. Results are aggregated locally. The LLM then generates the natural-language answer from a sanitised summary, not the raw data.
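The shape of a plan and of a sanitised summary can be sketched roughly as follows. This article does not publish Valenova's plan format, so every type and field name here (`PlanStep`, `SanitisedSummary`, `summarise`, the `redactColumns` parameter) is hypothetical, as are the databases assigned to each step. Only the six tool names come from the design described in this post.

```typescript
// Hypothetical types; only the tool names are taken from the article.
type ToolName =
  | "list_databases" | "describe_table" | "run_select_query"
  | "run_aggregation" | "federated_join" | "audit_log";

interface PlanStep {
  id: string;
  tool: ToolName;
  args: Record<string, unknown>;
  dependsOn: string[]; // ids of steps that must finish first
}

// An invented plan for the "Q3 revenue by region" question.
const examplePlan: PlanStep[] = [
  { id: "late_shipments", tool: "run_select_query", args: { database: "oracle" }, dependsOn: [] },
  { id: "revenue_by_region", tool: "run_aggregation", args: { database: "snowflake", groupBy: "region" }, dependsOn: ["late_shipments"] },
];

type Row = Record<string, string | number | null>;

interface SanitisedSummary {
  columns: string[];
  rowCount: number;
  sample: Row[]; // capped and redacted; never the full result set
}

// Build the only payload that would be sent to the hosted LLM.
function summarise(rows: Row[], redactColumns: string[], maxRows = 20): SanitisedSummary {
  const first = rows[0];
  const columns = first ? Object.keys(first) : [];
  const sample = rows.slice(0, maxRows).map((row) => {
    const cleaned: Row = {};
    for (const key of Object.keys(row)) {
      const value = row[key];
      cleaned[key] = redactColumns.includes(key) ? "[redacted]" : (value ?? null);
    }
    return cleaned;
  });
  return { columns, rowCount: rows.length, sample };
}
```

Which columns count as sensitive, and how many rows a summary may carry, are policy decisions this sketch leaves open.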
Key Technical Decisions
Why a Custom MCP Server, Not LangChain Tools
LangChain tools are great for prototypes but become unwieldy in enterprise contexts. We needed:
- Strict typing of every tool input and output
- Audit-grade logging of every tool invocation
- Centralised permission gating (this user can read Oracle table X but not table Y)
- The ability to swap LLMs without rewriting tool definitions
The Model Context Protocol gave us all four. We built a custom MCP server in Node.js exposing six core tools: list_databases, describe_table, run_select_query, run_aggregation, federated_join, and audit_log. Each tool's schema is strictly typed. Each invocation is logged. The MCP server acts as the trust boundary — the LLM never gets raw connection strings or admin access; only the typed tools.
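The sketch below shows how typed inputs, permission gating, and audit logging can meet at a single dispatch point. It is plain TypeScript under stated assumptions, not the MCP SDK and not Valenova's code: `ToolRegistry`, `ToolContext`, `AuditEntry`, and the stubbed `describeTable` tool are all hypothetical names.

```typescript
// Hypothetical sketch: plain TypeScript, not the MCP SDK and not Valenova's code.
interface ToolContext {
  userId: string;
  allowedTables: Set<string>; // fully qualified, e.g. "oracle.orders"
}

type Outcome = "ok" | "invalid_input" | "denied";

interface AuditEntry { at: string; userId: string; tool: string; outcome: Outcome; }

interface Tool<I, O> {
  name: string;
  parse(raw: unknown): I;     // validates untrusted LLM output; throws if malformed
  tables(input: I): string[]; // tables the call would read
  run(input: I): O;
}

class ToolRegistry {
  readonly auditLog: AuditEntry[] = [];
  private tools = new Map<string, Tool<any, unknown>>();

  register<I, O>(tool: Tool<I, O>): void {
    this.tools.set(tool.name, tool);
  }

  invoke(name: string, raw: unknown, ctx: ToolContext): unknown {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    const record = (outcome: Outcome) =>
      this.auditLog.push({ at: new Date().toISOString(), userId: ctx.userId, tool: name, outcome });

    let input: unknown;
    try {
      input = tool.parse(raw); // typing is enforced before anything else happens
    } catch (err) {
      record("invalid_input");
      throw err;
    }
    if (!tool.tables(input).every((t) => ctx.allowedTables.has(t))) {
      record("denied");
      throw new Error(`${ctx.userId} may not call ${name} on the requested tables`);
    }
    record("ok");
    return tool.run(input);
  }
}

interface DescribeTableInput { database: string; table: string; }

const describeTable: Tool<DescribeTableInput, string[]> = {
  name: "describe_table",
  parse(raw) {
    const candidate = raw as Partial<DescribeTableInput> | null;
    if (!candidate || typeof candidate.database !== "string" || typeof candidate.table !== "string") {
      throw new Error("describe_table expects { database: string, table: string }");
    }
    return { database: candidate.database, table: candidate.table };
  },
  tables: (input) => [`${input.database}.${input.table}`],
  run: () => ["id", "region", "revenue"], // stub; a real tool would read the database catalog
};
```

Because validation, authorisation, and logging all happen inside `invoke`, an individual tool cannot forget any of them, and swapping the LLM leaves the tool definitions untouched.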
For the conceptual underpinning of MCP and why it's becoming the agent-tooling standard, see our MCP Protocol guide. For agent design more broadly, see How to build an AI agent.
Sub-1-Second Latency on Multi-Hop Queries
The headline performance metric: if a user asks a question that requires hitting Oracle, joining with PostgreSQL, then aggregating from Snowflake — the response should arrive in under one second.
This required several optimisations:
- Parallel tool execution. When the plan involves independent queries, the MCP server dispatches them in parallel rather than sequentially. The latency of a three-database query is then set by the slowest query, not by the sum of all three.
- Schema caching. Database schemas are cached locally on first connection. Re-fetching them per query was a meaningful latency tax.
- Query fingerprinting. Identical recent queries are served from a local LRU cache. Hits are ~5ms; misses are ~700ms.
- LLM streaming for the answer. The first tokens of the natural-language response stream as soon as the data aggregation completes; users see "Q3 revenue was..." before the full answer renders.
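The parallel-dispatch item in the list above can be illustrated with a small timing sketch. The `QueryFn` type, the helper names, and the simulated latencies are all invented for illustration; each function stands in for one call to a database connector.

```typescript
// Hypothetical: each QueryFn stands in for one call to a database connector.
type QueryFn = () => Promise<string>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Simulated per-database latencies (invented numbers, for illustration only).
const simulatedQueries: QueryFn[] = [
  async () => { await sleep(60); return "oracle"; },
  async () => { await sleep(100); return "postgresql"; },
  async () => { await sleep(40); return "snowflake"; },
];

// Each query waits for the previous one: total time is roughly the sum.
async function runSequential(queries: QueryFn[]): Promise<string[]> {
  const results: string[] = [];
  for (const query of queries) {
    results.push(await query());
  }
  return results;
}

// All queries start immediately: total time is roughly the slowest one.
// Promise.all keeps results in input order and rejects on the first failure.
async function runParallel(queries: QueryFn[]): Promise<string[]> {
  return Promise.all(queries.map((query) => query()));
}

// Small helper to measure wall-clock time of an async operation.
async function timed<T>(work: () => Promise<T>): Promise<{ value: T; elapsedMs: number }> {
  const start = Date.now();
  const value = await work();
  return { value, elapsedMs: Date.now() - start };
}
```

With these simulated delays, `timed(() => runSequential(simulatedQueries))` should take about 200ms and `timed(() => runParallel(simulatedQueries))` about 100ms. Parallel dispatch only applies to steps with no dependency on each other's results. If partial results are acceptable when one source fails, `Promise.allSettled` is an alternative to `Promise.all`.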
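The result: on a typical enterprise dataset, roughly 700ms at p50 and 1100ms at p95. The median multi-hop query meets the sub-second target, while the slowest 5% run slightly over it.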
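Query fingerprinting with a local LRU cache, as listed among the optimisations above, can be sketched in a few lines. This is a minimal illustration under assumptions: the `fingerprint` normalisation rules and the `LruCache` class are hypothetical, values are assumed to arrive as bound parameters rather than inline literals, and expiry of stale results is not handled.

```typescript
// Hypothetical sketch of query fingerprinting plus a small LRU cache.

// Two queries that differ only in whitespace or a trailing semicolon share a
// fingerprint. Assumes values are bound parameters: whitespace inside inline
// string literals would be collapsed too. A real system might hash the result.
function fingerprint(sql: string, params: unknown[] = []): string {
  const normalised = sql.replace(/\s+/g, " ").trim().replace(/;$/, "").trim();
  return `${normalised}|${JSON.stringify(params)}`;
}

// A Map iterates in insertion order, so the first key is the least recently used.
class LruCache<V> {
  private entries = new Map<string, V>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A caller would compute `fingerprint(sql, params)`, try `cache.get(key)`, and only run the query on a miss. How long a cached result stays valid against a live database is a separate decision that this sketch does not address.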
Electron, Not Web
The constraint "must run on the client machine" pushed us to Electron. A web app wouldn't have worked: browsers cannot open raw JDBC/ODBC connections. We considered Tauri, which is lighter because it uses the operating system's webview, but Electron's mature ecosystem for enterprise distribution (auto-update, code signing for Windows + macOS, MSI installers) made it the practical choice.
The downside: a 200MB install footprint. In practice most enterprise customers accepted it, and IT can pre-deploy the app via Group Policy.
Zero-Knowledge Means No Telemetry
The strictest interpretation of "no data leaves the machine" covers usage telemetry too. We can't ping our server with "User X ran query Y on database Z", because that metadata can leak the structure of the customer's data warehouse.
We compromised carefully: anonymous, aggregated metrics (count of queries per day, count of error events, latency histograms) ship periodically. No query content. No table names. No user identifiers. This passed every enterprise security review.
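One way to keep telemetry honest is to make the aggregator structurally unable to hold anything but counts. The sketch below is a hypothetical illustration: the class name, the bucket boundaries, and the `flush` payload shape are assumptions, not Valenova's actual metrics format.

```typescript
// Hypothetical aggregator: it can only hold counters and histogram buckets,
// so query text, table names, and user identifiers have nowhere to go.

// Upper bounds in milliseconds (invented); one extra overflow bucket follows.
const LATENCY_BUCKETS_MS = [100, 250, 500, 1000, 2000];

interface MetricsSnapshot {
  queries: number;
  errors: number;
  latencyHistogram: number[];
}

class AnonymousMetrics {
  private queries = 0;
  private errors = 0;
  private histogram: number[] = new Array<number>(LATENCY_BUCKETS_MS.length + 1).fill(0);

  // Accepts only a number and a flag: no strings can enter the aggregator.
  recordQuery(latencyMs: number, failed: boolean): void {
    this.queries += 1;
    if (failed) this.errors += 1;
    let bucket = LATENCY_BUCKETS_MS.findIndex((upper) => latencyMs <= upper);
    if (bucket === -1) bucket = LATENCY_BUCKETS_MS.length; // overflow bucket
    this.histogram[bucket] = (this.histogram[bucket] ?? 0) + 1;
  }

  // Returns the aggregate for periodic shipping and resets the counters.
  flush(): MetricsSnapshot {
    const snapshot: MetricsSnapshot = {
      queries: this.queries,
      errors: this.errors,
      latencyHistogram: [...this.histogram],
    };
    this.queries = 0;
    this.errors = 0;
    this.histogram = this.histogram.map(() => 0);
    return snapshot;
  }
}
```

Restricting `recordQuery` to a number and a boolean means the type signature itself documents what can leave the machine, which gives a security reviewer one small surface to check.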
Why This Matters for AI Product Builders
Valenova's architecture is the template for a class of AI products that's about to expand rapidly: AI agents that operate on sensitive data without exfiltrating it. Healthcare, finance, government, and regulated SaaS all have this requirement. Cloud-only AI architectures don't fit; on-prem-only architectures are too expensive to deploy.
The hybrid pattern (agent runs locally, LLM is hosted but receives only sanitised plans, audit lives in the customer's perimeter) is the right shape for these workloads. We expect to see this pattern proliferate by the end of 2026.
For the broader context on choosing between hosted and custom LLM deployments, see ChatGPT vs Claude vs Custom LLM. For agent design fundamentals, see Agentic AI Systems.
What We'd Do Differently
- Build the MCP server with stricter tool isolation from the start. We refactored midway to enforce per-database authorisation more rigorously. Should have been the default.
- Use Tauri instead of Electron if we were starting today. Tauri's footprint and security profile are now strong enough for enterprise distribution.
- Add federated query optimisation earlier. The current implementation works but doesn't push joins down to source databases when possible. That's a genuine future improvement.
Where Nexolve Fits
We build production AI agents and agentic platforms through our AI-Powered Automation service. Valenova is one example; we've shipped similar zero-knowledge architectures in other regulated industries. For the full project context, see the Valenova case study.
Related reading
- Model Context Protocol (MCP): Standardizing AI Tool Integration and Capabilities
- Agentic AI Systems: The Next Frontier in Autonomous Intelligence
- How to Build an AI Agent for Your Business in 2026: The architecture, stack choices, and design decisions for production AI agents, from a team that ships them