Week 12 milestone
An enterprise mandate: build and launch a platform that consumes a high-rate live event stream, runs inference on each event with low latency, indexes results into a vector store, and serves real-time similarity queries, all while staying healthy under bursty load and a node failure. This is a distributed-systems problem with AI inside it: exactly-once or well-reasoned delivery semantics, backpressure, autoscaling, GPU-aware scheduling, and observability across the fleet. The deliverable is a deployable product that scales horizontally: real public hosting, CI/CD, security, a real-time dashboard and query UI that need no instruction, and full marketing (a landing page, a pitch, and a demo). Deliver a platform that does not fall over when traffic spikes, that a buyer can evaluate, and that is presentable as a real product.
Why it matters: Real-time AI on live data powers fraud detection, recommendations, and observability products across every major platform. A builder who ships a streaming inference platform that holds up under load and failure is ready to work as a senior Distributed Systems or Real-Time AI Engineer, a ₹1-crore-tier role because it demands both systems depth and AI fluency.
The deliverable
A publicly hosted platform with a stable URL, a real-time dashboard and query UI that are usable without instruction, and a public repo. The submission covers the streaming ingestion and inference pipeline, the at-scale vector index, the autoscaling and backpressure design, CI/CD on every commit, fleet observability, a load-and-failure test report, a marketing landing page, a 10-slide pitch, a recorded demo, and a README documenting delivery semantics, fault tolerance, security, and capacity planning.
What it ships
- Ingestion of a high-rate live event stream (transactions, clicks, logs) from a streaming log such as Kafka.
- Per-event low-latency inference with a documented end-to-end latency budget (target sub-100ms).
- Stated delivery semantics — exactly-once, or at-least-once with idempotent processing — enforced in the pipeline.
- A feature layer that assembles per-event features within the latency window for the model.
- Inference results indexed into a vector store serving real-time similarity and nearest-neighbor queries.
- Backpressure and autoscaling that keep the platform healthy through a sudden traffic burst.
- Node-failure recovery with graceful degradation, demonstrated via injected failure.
- A real-time dashboard of event throughput, inference latency percentiles, and fleet health.
- A query UI for similarity search and recent-event lookup, usable without instruction.
- Alerting on latency-SLO breaches, lag buildup, and anomalous event rates.
- Multi-tenant isolation and a secured, rate-limited query API.
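The "at-least-once with idempotent processing" option above can be illustrated with a minimal sketch. It assumes every event carries a producer-assigned unique ID (the `event_id` field here is hypothetical) and uses an in-memory dict as the deduplication store; a real pipeline would likely need a durable store updated atomically with the result write. The `infer` method is a placeholder, not a real model call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique ID assigned by the producer
    payload: str


@dataclass
class IdempotentProcessor:
    """Applies each event's effect once, even if the stream redelivers it."""

    results: dict = field(default_factory=dict)  # event_id -> inference result
    redelivered: int = 0

    def infer(self, event: Event) -> int:
        # Placeholder for the real model call; returns the payload length.
        return len(event.payload)

    def handle(self, event: Event) -> int:
        # A lookup keyed by event_id turns a redelivery into a no-op
        # that returns the result stored the first time.
        if event.event_id in self.results:
            self.redelivered += 1
            return self.results[event.event_id]
        result = self.infer(event)
        self.results[event.event_id] = result
        return result


# Simulate at-least-once delivery: event "a" arrives again after a retry.
stream = [Event("a", "login"), Event("b", "purchase"), Event("a", "login")]
processor = IdempotentProcessor()
outputs = [processor.handle(e) for e in stream]
```

In this sketch the duplicate of event "a" does not trigger a second inference or a second index write, which is the property the README would need to justify for whichever delivery semantics the platform claims.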
Stack you orchestrate
- Apache Kafka or a streaming log
- Apache Flink or a stream processor
- vLLM or a serving engine
- FAISS or a vector database
- Kubernetes
- Prometheus and Grafana
- Docker
Market signal: who wants this
Real-time streaming AI is a funded 2026 category anchored in fraud detection and live personalization: Artie raised a $12M Series A to make real-time data the default for AI systems, Experian launched real-time AI fraud detection with Resistant AI's 80+ models, and global fintech venture funding hit $12B across 751 deals by April 2026. Production fraud models need sub-millisecond feature retrieval and 20-100+ features within a 100ms window, served by vector databases like Pinecone, Milvus, and Redis. Investors fund streaming-AI platforms because regulated finance and large consumer platforms must score live events instantly or lose money.
How it is graded
- A live event stream is consumed and inference runs per event with measured low latency.
- Delivery semantics (exactly-once or at-least-once with idempotency) are stated and justified.
- Inference results are indexed into a vector store that serves real-time similarity queries.
- Backpressure and autoscaling keep the platform healthy under a simulated traffic burst, and an injected node failure is recovered with graceful degradation.
- The platform is deployed to real public hosting with CI/CD on every commit, fleet observability, and security hardening.
- A fast, WCAG 2.2 AA accessible real-time dashboard and query UI lets a stranger use the platform without instruction.
- The project ships complete marketing — a landing page, a 10-slide pitch, and a recorded demo.
- The platform is publicly reachable and fully reproducible from the repo.
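One way to reason about the backpressure criterion above is a bounded buffer between ingestion and the inference workers. The sketch below is an illustration under stated assumptions, not a prescribed design: the `BoundedIngest` class and its `offer` and `drain` methods are hypothetical names, and it sheds load when the buffer is full. With a durable log such as Kafka, an alternative is to stop polling when the buffer is full so that lag accumulates in the log instead of events being dropped.

```python
import queue


class BoundedIngest:
    """Bounded buffer between ingestion and inference workers (hypothetical)."""

    def __init__(self, capacity: int):
        self.buffer = queue.Queue(maxsize=capacity)
        self.accepted = 0
        self.shed = 0

    def offer(self, event) -> bool:
        # Non-blocking put: a full buffer is the backpressure signal.
        try:
            self.buffer.put_nowait(event)
        except queue.Full:
            self.shed += 1  # counted so a dashboard or alert can surface it
            return False
        self.accepted += 1
        return True

    def drain(self, max_items: int) -> list:
        # Workers pull up to max_items events per inference batch.
        items = []
        while len(items) < max_items:
            try:
                items.append(self.buffer.get_nowait())
            except queue.Empty:
                break
        return items


# A burst of 5 events hits a buffer with room for 3.
ingest = BoundedIngest(capacity=3)
burst = [ingest.offer(i) for i in range(5)]
batch = ingest.drain(max_items=2)   # workers free up two slots
after_drain = ingest.offer(99)      # accepted again once there is room
```

The `shed` counter and the queue depth are the kind of signals an autoscaler or lag alert could act on during the simulated burst; the capacity value here is arbitrary and would come from the capacity plan.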
Bridges to Distributed Systems — stream processing, fault tolerance, and capacity planning