How we built a Kafka-free ingestion pipeline using a custom Rust-based event router that handles 10M events/sec with p99 latency under 4ms. Includes our benchmarking methodology and failure mode analysis.

The SENSE layer is the data ingestion backbone of Harch Corp's AI platform. It ingests real-time data from IoT sensors, financial transaction streams, satellite downlinks, and operational telemetry at a sustained rate of 10 million events per second, with p99 latency under 4 milliseconds and exactly-once delivery semantics. It does this without Kafka, without Flink, and without any off-the-shelf streaming framework. We built it from scratch in Rust, and this article explains why and how.
The decision to avoid Kafka was not ideological. It was practical. We evaluated Kafka, Pulsar, and Kinesis against four requirements: sub-5ms end-to-end latency at 10M events/second, exactly-once delivery without deduplication overhead, per-event jurisdiction tagging for sovereign data routing, and deployment within our own infrastructure with no external dependencies. Kafka failed requirement 1 (p99 at 10M events/sec was 23ms on our hardware). Pulsar failed requirement 3 (no native support for per-message routing based on arbitrary metadata). Kinesis failed requirement 4 (AWS-only). We could have worked around any single failure, but the combination of all four made a custom solution the lower-risk choice. Custom infrastructure is only more expensive than off-the-shelf when the off-the-shelf solution actually meets your requirements. When it does not, the cost of workarounds, compromises, and operational surprises exceeds the cost of building what you need.
The architecture has three components: the Router, the Buffer, and the Processor.

The Router is the ingestion edge: a Rust-based binary that accepts events over TCP (custom binary protocol), QUIC, and HTTP/2, parses them, attaches routing metadata (jurisdiction tag, priority class, target pipeline), and forwards them to the appropriate Buffer. The Router is stateless and horizontally scalable: we run 24 instances behind a layer-4 load balancer, each handling approximately 420,000 events/second at steady state. The Router achieves sub-microsecond per-event processing time because the routing logic is a series of hash table lookups and integer comparisons, with no allocation on the hot path. Memory usage per Router instance is capped at 256MB regardless of throughput, because we pre-allocate all buffers at startup and use a ring buffer for in-flight events.

The Buffer is a persistent, replicated log implemented on top of io_uring and a custom storage format. Each Buffer instance runs on NVMe storage with 8 drive stripes, achieving sequential write throughput of 12 GB/s, approximately 3x the peak ingestion rate. Replication uses a Raft variant that batches acknowledgments to reduce consensus round-trips. The p99 write latency, including replication, is 1.8ms.

The Processor consumes events from the Buffer and dispatches them to the THINK layer's inference pipeline. It runs at a configurable rate, with backpressure signals from THINK controlling the consumption speed to prevent downstream overload.
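To make the Router's hot path concrete, here is a minimal sketch of how a lookup-table-plus-ring-buffer design avoids per-event allocation: the routing table and the in-flight ring are allocated once at startup, and each event costs one hash lookup and one slot write. The `Event` and `Router` types, the field names, and the (pipeline, priority) routing key are illustrative assumptions, not our production schema.

```rust
// Sketch of an allocation-free routing hot path: all state is pre-allocated
// at startup, and per-event work is a hash lookup plus a ring-buffer write.
// Names and field layouts are illustrative only.
use std::collections::HashMap;

#[derive(Clone, Copy, Default)]
struct Event {
    pipeline: u16,    // target pipeline identifier from the header
    priority: u8,     // 0 = critical, 1 = medium, 2 = low
    payload_off: u32, // offset into a pre-allocated receive buffer
    payload_len: u32,
}

struct Router {
    // Routing table built once (and replaced wholesale on updates),
    // never mutated on the hot path.
    routes: HashMap<(u16, u8), u16>, // (pipeline, priority) -> buffer index
    // Fixed-capacity ring of in-flight events, pre-allocated at startup.
    in_flight: Box<[Event]>,
    head: usize,
}

impl Router {
    fn new(routes: HashMap<(u16, u8), u16>, capacity: usize) -> Self {
        Router {
            routes,
            in_flight: vec![Event::default(); capacity].into_boxed_slice(),
            head: 0,
        }
    }

    /// Hot path: one hash lookup, one ring-buffer write, no allocation.
    fn route(&mut self, ev: Event) -> Option<u16> {
        let target = *self.routes.get(&(ev.pipeline, ev.priority))?;
        let slot = self.head % self.in_flight.len();
        self.in_flight[slot] = ev;
        self.head = self.head.wrapping_add(1);
        Some(target)
    }
}

fn main() {
    let mut routes = HashMap::new();
    routes.insert((7u16, 0u8), 3u16); // pipeline 7, critical priority -> buffer 3
    let mut router = Router::new(routes, 1 << 16);
    let picked = router.route(Event { pipeline: 7, priority: 0, payload_off: 0, payload_len: 128 });
    assert_eq!(picked, Some(3));
}
```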
The jurisdiction-aware routing is the architectural feature that no off-the-shelf system provides. Every event carries a jurisdiction tag in its header: a bitmask indicating which countries' data residency laws apply to the event's payload. The Router reads this tag and selects a Buffer instance in a jurisdiction-compatible hub. An event tagged for Morocco-only processing is routed to a Buffer in the Tangier or Dakhla hub, never to Dakar. An event tagged for pan-African processing can be routed to any hub. This routing happens at the ingestion edge, before the event enters any persistent storage, ensuring that data never rests in a jurisdiction where it should not be. The routing table is maintained by the global scheduler and propagated to all Router instances within 500ms of any change. If a hub's jurisdiction status changes (for example, if a regulatory change restricts certain data types from being processed in a specific country), the updated table reaches every Router within that 500ms window, and subsequent events are routed to compliant hubs. Events already in non-compliant Buffers are flagged for migration by a background process that moves them to compliant storage without any ingestion interruption.
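As a rough illustration of the tag check, the sketch below models jurisdiction tags as bitmasks and selects the first hub whose own jurisdiction bit appears in the event's tag. The bit assignments, hub names, and the exact compatibility rule are assumptions for illustration, not the production encoding.

```rust
// Illustrative sketch of jurisdiction-compatible hub selection.
const MA: u64 = 1 << 0;      // Morocco
const SN: u64 = 1 << 1;      // Senegal
const PAN_AFRICAN: u64 = !0; // all jurisdiction bits set

struct Hub {
    name: &'static str,
    jurisdiction: u64, // bit of the country this hub is located in
}

/// A hub is compatible when the event's tag includes the hub's own
/// jurisdiction bit: a Morocco-only tag matches Tangier and Dakhla but
/// never Dakar; a pan-African tag matches every hub.
fn select_hub<'a>(hubs: &'a [Hub], event_tag: u64) -> Option<&'a Hub> {
    hubs.iter().find(|h| event_tag & h.jurisdiction != 0)
}

fn main() {
    let hubs = [
        Hub { name: "tangier", jurisdiction: MA },
        Hub { name: "dakhla", jurisdiction: MA },
        Hub { name: "dakar", jurisdiction: SN },
    ];
    assert_eq!(select_hub(&hubs, MA).map(|h| h.name), Some("tangier"));
    assert_eq!(select_hub(&hubs, PAN_AFRICAN).map(|h| h.name), Some("tangier"));
    // No compliant hub online: drop, log, and alert rather than violate.
    assert!(select_hub(&hubs[2..], MA).is_none());
}
```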
Our benchmarking methodology was designed to simulate production conditions, not laboratory ones. We defined four workload profiles: uniform (constant 10M events/sec), bursty (5M baseline with 30-second spikes to 25M), mixed-priority (80% low-priority sensor data, 15% medium-priority operational data, 5% critical-priority financial transactions), and failure (random node failures with 5% of the fleet down at any time). Each benchmark ran for 4 hours with real hardware — not simulations — across our three production hubs. The results: uniform workload achieved 10.2M events/sec with p99 latency of 3.1ms. Bursty workload absorbed 25M events/sec spikes with p99 latency of 7.4ms during the spike, recovering to baseline within 3 seconds of spike termination. Mixed-priority workload delivered p99 latency of 1.9ms for critical-priority events, 3.4ms for medium, and 6.1ms for low — the priority queue working as designed. Failure workload maintained 9.5M events/sec with p99 latency of 4.8ms despite continuous node failures, with zero data loss across all failure scenarios.
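For readers who want to build a comparable harness, the sketch below shows one plausible shape for the load-generator configuration behind the four profiles. Only the rates, spike duration, priority mix, and failure fraction come from the text above; the type and field names are illustrative.

```rust
// Hypothetical load-generator configuration for the four benchmark
// profiles. The generator itself is out of scope; this only captures
// the parameters described in the article.
enum Workload {
    /// Constant 10M events/sec for the full 4-hour run.
    Uniform { rate: u64 },
    /// 5M events/sec baseline with periodic 30-second spikes to 25M.
    Bursty { baseline: u64, spike: u64, spike_secs: u64 },
    /// Fixed priority mix: (priority class, share of traffic).
    MixedPriority { shares: [(u8, f64); 3] },
    /// Uniform load while a fraction of the fleet is forcibly down.
    Failure { rate: u64, fleet_down_fraction: f64 },
}

fn profiles() -> [Workload; 4] {
    [
        Workload::Uniform { rate: 10_000_000 },
        Workload::Bursty { baseline: 5_000_000, spike: 25_000_000, spike_secs: 30 },
        // 80% low (2), 15% medium (1), 5% critical (0) priority traffic.
        Workload::MixedPriority { shares: [(2, 0.80), (1, 0.15), (0, 0.05)] },
        Workload::Failure { rate: 10_000_000, fleet_down_fraction: 0.05 },
    ]
}

fn main() {
    assert_eq!(profiles().len(), 4);
}
```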
The failure mode analysis identified three risks that warranted additional engineering. First, Buffer corruption: if a Buffer instance's storage is corrupted (bit rot, firmware bug, physical damage), the recovery process requires replay from the replication partner, which takes 8-12 minutes for a fully loaded Buffer. We mitigated this with checksum validation on every read and a pre-warmed hot standby that can take over within 30 seconds. Second, Router overload: if inbound traffic exceeds the Router fleet's capacity, the load balancer begins queuing connections, and latency degrades rapidly. We mitigated this with autoscaling that adds Router instances within 90 seconds of detecting a sustained load increase, and a load-shedding mechanism that drops low-priority events when the queue depth exceeds a threshold. Third, jurisdiction routing conflicts: if an event's jurisdiction tag matches no available hub (all compliant hubs are down), the Router faces a choice between dropping the event or routing it to a non-compliant hub. We chose to drop it, log the incident, and alert the operations team — because a dropped event can be replayed, but a jurisdiction violation cannot be undone. This is the correct trade-off for our requirements, but it is worth noting that it shifts the reliability burden upstream to the event producer, which must implement retry logic.
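The load-shedding rule can be sketched in a few lines: once queue depth crosses the threshold, low-priority events are rejected (and counted so they can be replayed upstream) while critical traffic is still admitted. The threshold handling and priority encoding below are illustrative assumptions, not the production policy.

```rust
// Minimal sketch of priority-aware load shedding under queue pressure.
struct Shedder {
    queue_depth: usize,
    shed_threshold: usize,
    dropped_low_priority: u64,
}

impl Shedder {
    /// Returns true if the event should be admitted to the queue.
    /// Priority: 0 = critical, 1 = medium, 2 = low.
    fn admit(&mut self, priority: u8) -> bool {
        if self.queue_depth >= self.shed_threshold && priority == 2 {
            // Shed: a dropped low-priority event can be replayed by the producer.
            self.dropped_low_priority += 1;
            return false;
        }
        self.queue_depth += 1;
        true
    }
}

fn main() {
    let mut shedder = Shedder { queue_depth: 10_000, shed_threshold: 10_000, dropped_low_priority: 0 };
    assert!(!shedder.admit(2)); // low-priority event shed under pressure
    assert!(shedder.admit(0));  // critical event still admitted
    assert_eq!(shedder.dropped_low_priority, 1);
}
```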
The SENSE layer has been in production for 14 months. It has processed over 4.2 trillion events with zero data loss, zero jurisdiction violations, and a cumulative availability of 99.994%. To our knowledge, it is the highest-throughput, lowest-latency ingestion system built in Africa, and it demonstrates that sovereign infrastructure does not mean compromising on performance. In fact, it means the opposite: when you control the entire stack, you can optimize for your specific requirements in ways that general-purpose systems cannot match. SENSE is not a generic streaming platform. It is a purpose-built ingestion engine for sovereign AI workloads, and that specificity is its greatest strength.