2026

Hybrid Ingestion Architecture: From Batch to Real-Time

SCALE: HEAVY INDUSTRIAL LOGISTICS

PERFORMANCE IMPACT

BEFORE: Legacy Single-Node Handlers — 10+ Events/min
AFTER: Optimized Distributed Spark Handlers — 10+ Events/min
MEMORY: -64% Overhead Reduction
LATENCY: Micro-Batch (Near Real-Time) — P95: 3 Minutes

ARCHITECTURE: DUAL-PATH FLOW

graph TD
    subgraph src ["Source Systems"]
        S1[Weigh Stations]
        S2[Compliance Systems]
        S3[Transport Logistics]
    end
    subgraph hot ["Hot Path - Real-Time"]
        P[AKS Producers] -->|CDC| SB[Service Bus]
        SB -->|Events| C[AKS Consumers]
        C -->|Upsert| SL[Silver Layer]
    end
    subgraph cold ["Cold Path - Batch"]
        ADF[Data Factory] -->|Incremental| B[Bronze Layer]
        B -->|Spark Notebooks| SL
    end
    S1 --> P
    S2 --> ADF
    S3 --> ADF
    SL --> GL[Gold Layer]
    GL --> PBI[Power BI]

BUSINESS PROCESS

The bulk commodity supply chain generates high-volume operational data from weigh stations, compliance systems, and transport logistics. Legacy batch processing incurred significant memory overhead per run and end-to-end latencies of 4+ hours.

We implemented a Dual-Path Architecture:

HOT PATH

Event-driven producers on AKS capture source changes in real-time, publish to Service Bus, consume with sub-second latency, and upsert to Delta Lake Silver layer using distributed Spark handlers.
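The consumer's upsert semantics can be sketched in plain Python. This is an in-memory stand-in for the Delta Lake MERGE the AKS consumers actually perform through Spark; the field names (event_id, change_ts, gross_kg) are illustrative, not the production schema:

```python
import json

def upsert_events(silver: dict, messages: list) -> dict:
    """Apply CDC messages to an in-memory stand-in for the Silver table.

    Mimics the MERGE the consumers issue against Delta Lake:
    matched keys are updated, unmatched keys are inserted, and
    late-arriving events (older change_ts) are discarded.
    """
    for raw in messages:
        event = json.loads(raw)
        key = event["event_id"]
        current = silver.get(key)
        # Upsert only if the event is new or at least as recent
        if current is None or event["change_ts"] >= current["change_ts"]:
            silver[key] = event
    return silver

silver = {}
upsert_events(silver, [
    json.dumps({"event_id": "W-001", "change_ts": 1, "gross_kg": 42000}),
    json.dumps({"event_id": "W-001", "change_ts": 2, "gross_kg": 42500}),  # update
    json.dumps({"event_id": "W-002", "change_ts": 1, "gross_kg": 18750}),  # insert
])
```

Keying the merge on the event identifier and guarding on the change timestamp is what makes the consumers safe to scale horizontally: replaying or reordering a message cannot regress the Silver row.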

COLD PATH

Azure Data Factory orchestrates scheduled batch ingestion. Synapse Spark notebooks handle complex aggregations and historical snapshots for analytics.
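The incremental pattern ADF applies on each scheduled run can be sketched as a high-watermark filter. This is a pure-Python illustration of the pattern, not the pipeline definition itself, and the column names are assumptions:

```python
def incremental_slice(rows, last_watermark):
    """Select only rows modified since the previous run and return the
    new watermark to persist for the next run, mirroring the
    high-watermark pattern ADF uses for incremental copies into Bronze.
    """
    fresh = [r for r in rows if r["modified_ts"] > last_watermark]
    new_watermark = max((r["modified_ts"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "modified_ts": 100},
    {"id": 2, "modified_ts": 205},
    {"id": 3, "modified_ts": 310},
]
fresh, wm = incremental_slice(rows, last_watermark=200)
```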

GOLD LAYER

Aggregated views power live Power BI dashboards for operational monitoring and executive reporting.
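The shape of a Gold-layer rollup feeding those dashboards can be sketched as a group-by over Silver rows. The real job runs as distributed Spark; the columns (station, date, net_tonnes) are illustrative:

```python
from collections import defaultdict

def daily_station_totals(silver_rows):
    """Aggregate Silver weigh events into a Gold view: total tonnage
    per (station, day), ready to be served to Power BI."""
    totals = defaultdict(float)
    for r in silver_rows:
        totals[(r["station"], r["date"])] += r["net_tonnes"]
    return dict(totals)

gold = daily_station_totals([
    {"station": "A", "date": "2026-01-05", "net_tonnes": 42.0},
    {"station": "A", "date": "2026-01-05", "net_tonnes": 18.5},
    {"station": "B", "date": "2026-01-05", "net_tonnes": 30.0},
])
```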

TECH STACK

Azure Data Factory, Azure Kubernetes Service, Azure Service Bus, Azure Synapse, Delta Lake, Apache Kafka / Service Bus, Apache Spark, Power BI, Azure DevOps

TECHNICAL ARCHITECTURE

The platform implements a Dual-Path Architecture to balance high-velocity data needs with complex batch processing. The Cold Path utilizes Azure Data Factory to orchestrate scheduled ingestion into a Medallion Lakehouse (Bronze-Silver-Gold).

The Hot Path is powered by a custom-built Producer-Consumer pattern running on Azure Kubernetes Service (AKS).

  • Producers: Lightweight workers capture Change Data Capture (CDC) events from source databases and publish them as JSON messages to Azure Service Bus topics.
  • Consumers: Horizontally scalable listeners that process incoming messages, apply business enrichment logic, and perform ACID-compliant upserts via Delta Lake. A strategic architectural decision was made to transition from single-node delta-rs native handlers to distributed Spark handlers, resolving memory bottlenecks encountered during complex merge operations at scale.
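The upsert the Spark handler performs is a Delta Lake MERGE. A minimal sketch of the statement such a handler might generate before submitting it via spark.sql() (the table and column names are hypothetical):

```python
def build_merge_sql(target, source_view, key_cols, update_cols):
    """Render the Delta Lake MERGE a Spark handler would run:
    update on key match, insert otherwise."""
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    sets = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    cols = key_cols + update_cols
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source_view} s ON {on}\n"
        f"WHEN MATCHED THEN UPDATE SET {sets}\n"
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(cols)}) "
        f"VALUES ({', '.join('s.' + c for c in cols)})"
    )

sql = build_merge_sql("silver.weigh_events", "updates",
                      key_cols=["event_id"],
                      update_cols=["gross_kg", "change_ts"])
```

Because MERGE is planned and executed across the Spark cluster, the shuffle-heavy match phase that exhausted memory on a single delta-rs node is spread over executors instead.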

KEY TECHNICAL OPERATIONS

SCHEMA GOVERNANCE

Implemented strict schema safety and validation within consumers to prevent downstream data corruption during source system updates.
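One way to enforce that contract is to validate every message against an expected schema before it reaches the merge step. A minimal sketch, with illustrative field names and types:

```python
# Illustrative contract; the production schema is richer.
EXPECTED_SCHEMA = {"event_id": str, "change_ts": int, "gross_kg": (int, float)}

def validate(event):
    """Return a list of violations; an empty list means the message is
    safe to upsert. Unknown fields are rejected so a source system
    adding columns cannot silently corrupt the Silver layer."""
    errors = []
    for field, expected in EXPECTED_SCHEMA.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"bad type for {field}")
    for field in event:
        if field not in EXPECTED_SCHEMA:
            errors.append(f"unexpected field: {field}")
    return errors

ok = validate({"event_id": "W-001", "change_ts": 2, "gross_kg": 42000})
bad = validate({"event_id": "W-001", "gross_kg": "42000", "color": "red"})
```

Messages that fail validation would typically be routed to a dead-letter queue rather than dropped, preserving them for replay after the schema drift is resolved.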

HYBRID HANDLERS

The platform uses a mix of distributed Spark handlers for complex multi-table joins and merges, and lightweight delta-rs native handlers for high-throughput, low-latency ingestion.
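The routing between the two handler types can be sketched as a dispatch on workload shape. The thresholds below are illustrative assumptions, not the production rules:

```python
def pick_handler(table_count, est_rows):
    """Route a workload: multi-table joins or large merges go to the
    distributed Spark handler; small single-table writes stay on the
    low-latency delta-rs native handler."""
    if table_count > 1 or est_rows > 1_000_000:
        return "spark"
    return "native"
```

Keeping simple writes on the native handler avoids Spark job-submission overhead on the hot path, while anything join-heavy or large gets the cluster.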

CI/CD AUTOMATION

Fully automated deployment pipelines for ADF, Synapse Workspaces, and AKS microservices using Azure DevOps.