DLH SQL Server Agent
Secure, self-hosted CDC from on-premises SQL Server and Oracle to your cloud lakehouse.
Most enterprise data still lives in operational SQL Server and Oracle databases running on-premises or in a private cloud — behind firewalls, governed by strict change management, and often too large or too regulated to hand over to a SaaS-based change data capture vendor.
The DLH SQL Server Agent is the bridge. It runs inside your network boundary, uses native SQL Server Change Data Capture and Change Tracking (and Oracle incremental extraction), and streams the output straight into your own S3 or Azure storage as Delta Lake or Apache Iceberg tables. No inbound ports. No proprietary storage layer. No per-row billing meter ticking every time a commit lands.
Purpose-Built for Hybrid Data Architectures
The SQL Server Agent ships every capability a modern CDC pipeline needs — log-based replication, open table formats, encrypted credentials, schema evolution — in a single self-hosted binary you run on your own hardware.
Native CDC & Change Tracking
Uses SQL Server's built-in Change Data Capture and Change Tracking mechanisms — plus Oracle incremental extraction — to capture inserts, updates, and deletes without triggers, log readers, or performance hits to your OLTP systems.
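Capture relies on features that must already be enabled on the source. The standard T-SQL for turning them on looks like the following; the database and table names are placeholders.

```sql
-- Enable Change Data Capture on the database, then on a table
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'orders',
    @role_name     = NULL;

-- Or enable lighter-weight Change Tracking instead
ALTER DATABASE MyDb SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
ALTER TABLE dbo.orders ENABLE CHANGE_TRACKING;
```

Change Tracking records only that a row changed (cheaper, net-change sync), while CDC records the full before/after images, which is why the agent can prefer one over the other per table.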
Delta Lake & Apache Iceberg
Write directly to open table formats with ACID transactions, schema evolution, time-travel, and hidden partitioning. No proprietary storage layer, no vendor lock-in — your data lives in your lakehouse.
Direct-to-Cloud Storage
Land data straight into Amazon S3 or Azure Blob Storage using short-lived credentials or SAS tokens. An optional local-first mode lets you stage and inspect files before upload, which suits audited environments.
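The local-first flow can be sketched as stage, inspect, then upload; `upload` here is a placeholder callback standing in for an S3 or Azure client, not the agent's actual API.

```python
import shutil
from pathlib import Path

def stage_then_upload(local_file, staging_dir, upload):
    """Local-first delivery sketch: copy the output into a staging
    directory where it can be inspected, then hand it to an uploader
    callback. `upload` is a placeholder for whatever cloud client
    the deployment uses."""
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / Path(local_file).name
    shutil.copy2(local_file, staged)
    # Inspection or an approval gate would happen before this call.
    upload(staged)
    return staged
```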
Firewall-Friendly by Design
Runs inside your network boundary with outbound-only TLS. No inbound ports, no VPN tunnels, no SaaS proxy seeing your data. Credentials stay in an encrypted on-disk vault next to the agent.
Encrypted Credentials & State
Database passwords are stored in an encrypted credential file with a local master key. Sync state, run history, and API caches live in an embedded DuckDB database that can be backed up by copying a single file.
Auto-Detect Sync Mode
Set sync_mode to auto and the agent chooses CT > CDC > full-load per table based on what's enabled in SQL Server. Override per table, per schema, or per database when you need fine-grained control.
Multi-Source From One Agent
A single agent handles multiple SQL Server databases and Oracle databases concurrently. Mix historical back-fills, incremental CDC, and full reloads across dozens of tables from one YAML config.
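A multi-source setup along these lines might be sketched as follows; the key names and layout are illustrative, not the agent's actual config schema.

```yaml
# Illustrative config sketch — key names are hypothetical
sources:
  - type: sqlserver
    host: sql01.corp.local
    databases:
      erp:
        sync_mode: auto          # agent picks CT > CDC > full-load per table
        tables: ["dbo.orders", "dbo.customers"]
      oms:
        sync_mode: cdc           # pinned for the whole database
  - type: oracle
    host: ora01.corp.local
    schemas:
      SALES:
        sync_mode: full_load     # one-time historical back-fill
target:
  format: iceberg
  location: s3://lakehouse/raw/
```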
Schema Evolution Built-In
New columns in your source tables are added automatically to the downstream Delta or Iceberg table without breaking existing queries — no manual DDL, no pipeline redeploys.
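The additive column behavior can be sketched as a schema merge; this illustrative Python ignores details like type widening that real Delta and Iceberg schema evolution also handle.

```python
def evolve_schema(current, incoming):
    """Return the current schema plus any new columns from incoming.
    Existing columns keep their position and type; new columns are
    appended at the end, so existing queries keep working unchanged."""
    evolved = dict(current)
    for name, dtype in incoming.items():
        if name not in evolved:
            evolved[name] = dtype
    return evolved
```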
Deploy as a Windows Service
Ships as a single signed executable. Install as a Windows Scheduled Task with the bundled PowerShell installer, run from cron on Linux, or execute from the command line for ad-hoc loads.
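On Linux, a cron entry along these lines runs the agent on a schedule; the binary path, flag name, and log location are illustrative placeholders, not the agent's documented CLI.

```
# Illustrative crontab entry — paths and flags are placeholders
*/15 * * * * /opt/dlh/dlh-agent --config /etc/dlh/agent.yaml >> /var/log/dlh-agent.log 2>&1
```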
Reference Architecture
A single lightweight process with three responsibilities: read changes from the source database, write open-format tables to cloud object storage, and track sync state locally.
1. Capture
The agent connects to SQL Server via ODBC Driver 18 and to Oracle via pyodbc, then enumerates your configured databases. For each table it auto-detects the best available mode (Change Tracking, CDC, or full-load), or you can pin a mode per table. No triggers are installed and no redo-log readers are required.
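The per-table fallback described above amounts to a simple preference order, sketched here in Python; the helper predicates are illustrative stand-ins for the catalog checks the agent would perform (e.g. against sys.change_tracking_tables or sys.tables.is_tracked_by_cdc).

```python
def pick_sync_mode(table, has_change_tracking, has_cdc, override=None):
    """Choose a sync mode for a table: an explicit override wins,
    then Change Tracking, then CDC, then a full reload.
    The two predicates are hypothetical catalog-check helpers."""
    if override:
        return override
    if has_change_tracking(table):
        return "change_tracking"
    if has_cdc(table):
        return "cdc"
    return "full_load"
```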
2. Transform & Write
Changed rows are streamed through PyArrow with memory-aware throttling and written as CSV, Parquet, Delta, or Iceberg. Schema evolution is automatic. You can partition by any column, pick table-name casing and hierarchy, and write directly to the cloud or stage locally first.
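Memory-aware throttling can be illustrated with a generator that caps the rough size of each batch; this is a pure-Python sketch of the idea, not the agent's PyArrow implementation.

```python
import sys

def batch_rows(rows, max_bytes=64 * 1024 * 1024):
    """Yield lists of rows whose estimated in-memory size stays under
    max_bytes, so a huge table never has to fit in RAM at once."""
    batch, size = [], 0
    for row in rows:
        row_size = sum(sys.getsizeof(v) for v in row)
        if batch and size + row_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += row_size
    if batch:
        yield batch
```

Each emitted batch would then be handed to the writer (Parquet, Delta, or Iceberg) and released before the next one is built.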
3. Deliver
Output is uploaded to S3 or Azure Blob Storage using credentials held only on the agent machine. Sync state lives in an embedded DuckDB file — easy to back up and restore. Query the resulting lakehouse tables from Snowflake, Databricks, Trino, Athena, DuckDB, or any engine that reads Delta or Iceberg.
Built for Real Enterprise Workloads
On-Prem SQL Server → Cloud Lakehouse
Lift ERP, OMS, and line-of-business SQL Server databases off aging hardware and into Snowflake, Databricks, BigQuery, or Redshift — without punching holes in your firewall or re-platforming OLTP systems. The agent sits beside the database, reads CT/CDC, and lands Iceberg or Delta tables into your cloud object store.
Oracle Retirement & Modernization
Incrementally migrate Oracle workloads to a modern lakehouse architecture. Keep Oracle as the system of record while analytics, ML, and reporting move to open table formats — on your timeline, one schema at a time.
Hybrid & Air-Gapped Environments
Regulated industries that can't expose databases to a SaaS CDC vendor can still feed a cloud lakehouse. Because the agent only opens outbound TLS and never stores data in a third-party control plane, it satisfies network and data-residency reviews that disqualify managed CDC services.
Replacing Heavyweight CDC Platforms
Teams paying enterprise license fees for heavyweight CDC suites — for capabilities they mostly don't use — get the same log-based replication, schema evolution, and open-format output from a single lightweight agent, with no runtime in the vendor's cloud.
Data Stays in Your Cloud
Source rows flow from your database, through the agent on your machine, directly to your storage bucket. They never pass through a DLH-hosted control plane.
Signed Binaries
The Windows agent is distributed as a signed executable built through a GitHub Actions pipeline with Azure Trusted Signing — auditable and verifiable at install time.
Encrypted at Rest
Database passwords, SAS tokens, and cloud keys are encrypted on-disk with a local master key. State and sync history live in a single DuckDB file you fully control.
Get the SQL Server Agent
The DLH SQL Server Agent ships with every DLH.io deployment. Talk to an engineer about running it against your own SQL Server or Oracle environment, or explore the rest of the platform that consumes its output.