Week 6 milestone
An enterprise mandate: deliver a launched retrieval-augmented product over a large, real corpus that a non-technical team can trust for answers. Build the full system: ingestion and chunking, hybrid plus contextual retrieval with reranking, a grounded-and-cited generation layer, and — non-negotiable — an automated evaluation harness that scores retrieval and answer quality and runs in CI so quality regressions are caught before users see them. The deliverable must be production-grade and directly deployable: real public hosting, CI/CD, observability, security hardening against the OWASP LLM Top 10, and a hyper-usable, fast, accessible chat UI. It must be hyperscalable — the retrieval and serving layers hold as the corpus and traffic grow. And it ships complete with marketing: a landing page, a pitch, and a demo. A RAG demo is easy; a launched RAG product you can defend is the job. Ship it as a real product.
Why it matters: RAG is the default architecture for enterprise AI products, and the differentiator between teams is rigorous evaluation, not retrieval cleverness. A builder who ships a RAG platform with a CI-integrated eval harness is directly deployable as an AI Product Engineer or Applied AI Engineer, a ₹1-crore-tier role at companies betting their roadmap on grounded AI.
The deliverable
A publicly hosted RAG product with a stable URL and a fast, accessible chat UI, plus a public repo: the ingestion and retrieval pipeline, the citation-grounded answer layer, an automated eval harness with a golden dataset wired into CI, CI/CD on every commit, observability of cost and latency, a marketing landing page, a 10-slide pitch, a recorded demo, and a README documenting the retrieval, evaluation, security, and scaling design.
What it ships
- Document ingestion for PDFs, web pages, and office files, with incremental re-indexing as the corpus changes.
- Configurable chunking plus contextual retrieval that prepends document context to each chunk before embedding.
- Hybrid retrieval combining dense vector search and keyword search, followed by a reranking model.
- A grounded answer layer where every response cites the exact passages it relied on, with click-through to source.
- A fast, accessible chat UI with streaming answers, source citations, and conversation history.
- An automated eval harness scoring faithfulness, context precision, context recall, and answer relevancy against a golden dataset.
- CI integration where an eval-score regression fails the build before a change ships.
- A cost-and-latency dashboard with per-request token usage and retrieval timing.
- Out-of-scope detection so the assistant declines questions the corpus cannot answer instead of hallucinating.
- Input/output guardrails mapped to the OWASP LLM Top 10, including prompt-injection filtering on ingested content.
- Multi-tenant workspaces with access controls so different teams query different corpora.
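The contextual-chunking and hybrid-retrieval bullets above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function names `chunk_with_context` and `reciprocal_rank_fusion` are hypothetical, the "document context" here is just the document title (production systems often generate a richer per-chunk summary), and reciprocal rank fusion is one common way to merge dense and keyword rankings that is assumed here rather than required by the brief.

```python
from collections import defaultdict


def chunk_with_context(doc_title: str, text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows and prepend document context.

    Assumption: the context prefix is only the document title. It is attached
    to every chunk so it is embedded together with the chunk text.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        window = text[start:start + size]
        chunks.append(f"[Document: {doc_title}]\n{window}")
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) per chunk; k = 60 is a commonly
    used default, not a tuned value. Ties are broken by chunk ID.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from a dense vector search and a keyword search.
dense_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
```

The fused list would then be passed to the reranking model, which reorders only the top candidates rather than the whole corpus.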
Stack you orchestrate
- Claude API or open-weight LLM
- pgvector or a vector database
- an embedding and reranking model
- Node.js or Python
- GitHub Actions
- OpenTelemetry
- Google Cloud Run
Market signal: who wants this
Production RAG is a mature, funded 2026 market, and its defining requirement is rigorous evaluation: systematic eval frameworks (context precision, context recall, faithfulness, answer relevancy) are now mandatory for enterprise deployments, served by RAGAS, Galileo, Braintrust, and Maxim AI. Enterprise knowledge systems are forecast to keep evolving rapidly through 2026-2030. Investors fund RAG platforms with built-in eval because retrieval cleverness is commoditized; trustworthy, regression-proof answer quality is what enterprises actually pay for.
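To make two of the retrieval metrics named above concrete, here is a simplified, set-based sketch. It assumes the golden dataset labels which chunk IDs are relevant for each question; the function names and IDs are hypothetical. Frameworks such as RAGAS define these metrics differently (typically rank-aware and LLM-judged), so treat this as an illustration of the idea rather than their implementation.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are labeled relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk_id in retrieved if chunk_id in relevant)
    return hits / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of labeled-relevant chunks that were retrieved."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical golden-dataset entry: four retrieved chunks, three relevant ones.
retrieved_ids = ["c1", "c2", "c3", "c4"]
relevant_ids = {"c1", "c3", "c9"}
precision = context_precision(retrieved_ids, relevant_ids)
recall = context_recall(retrieved_ids, relevant_ids)
```

Here two of the four retrieved chunks are relevant and two of the three relevant chunks were found, so precision is lower than recall; averaging these per-question scores over the golden dataset gives the numbers a harness would track.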
How it is graded
- Ingestion, chunking, and a hybrid or contextual retrieval pipeline are implemented and justified.
- Answers are grounded in and cite the retrieved passages they rely on.
- An automated eval harness scores retrieval and answer quality against a golden dataset and runs in CI, where a quality regression fails the build.
- The platform is deployed to real public hosting with CI/CD on every commit, observability of cost and latency, and security hardening mapped to the OWASP LLM Top 10.
- The retrieval and serving layers are hyperscalable and hold as corpus and traffic grow; the scaling design is documented.
- The chat UI is fast, WCAG 2.2 AA accessible, and usable by a non-technical stranger without instruction.
- The project ships complete marketing — a landing page, a 10-slide pitch, and a recorded demo.
- The product is publicly reachable and fully reproducible from the repo.
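The "quality regression fails the build" criterion above could be enforced with a small gate script along these lines. This is a sketch under stated assumptions: the metric names, the baseline scores, and the 0.02 tolerance are all hypothetical, and a real harness would load both score sets from files produced by the eval run.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return one message per metric that is missing or dropped beyond tolerance."""
    failures = []
    for metric, base_score in sorted(baseline.items()):
        score = current.get(metric)
        if score is None:
            failures.append(f"{metric}: missing from current run")
        elif score < base_score - tolerance:
            failures.append(f"{metric}: {score:.3f} is below baseline {base_score:.3f}")
    return failures


def gate(baseline: dict[str, float], current: dict[str, float]) -> int:
    """Print regressions and return a process exit code (0 = pass, 1 = fail)."""
    failures = find_regressions(baseline, current)
    for message in failures:
        print("REGRESSION", message)
    return 1 if failures else 0


# Hypothetical scores: faithfulness dropped sharply, recall moved within tolerance.
baseline_scores = {"faithfulness": 0.90, "context_recall": 0.80}
current_scores = {"faithfulness": 0.70, "context_recall": 0.79}
exit_code = gate(baseline_scores, current_scores)
```

In a GitHub Actions job, the step running this script would end with `sys.exit(gate(...))`; any non-zero exit code fails the step and blocks the change from shipping.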
Bridges to Databases — indexing, information retrieval, and query optimization