Agentic Systems Engineering

Week 1

The Augmented LLM as a Building Block

An agent is an LLM in a loop with tools, retrieval, and memory. Master the atomic building block before composing it: when a deterministic workflow beats an agent, and the failure modes of handing a model autonomy.

Bridges to Operating Systems — processes, scheduling, and the run loop

Builds on: nothing — start here

Week 2

Tool Use & Function Calling

Give a model hands: structured tool schemas, argument validation, error feedback loops, and parallel tool calls. The contract between a model and the outside world is an API design problem.

Bridges to Software Engineering — interface design and API contracts

Builds on: The Augmented LLM as a Building Block

Week 3

Model Context Protocol & Interoperable Tooling

Standardize how agents connect to tools, data, and other agents. MCP servers and clients, transports, capability negotiation, and why a protocol beats bespoke integrations at scale.

Bridges to Computer Networks — protocols, client-server architecture, and RPC

Builds on: Tool Use & Function Calling

Week 4

Planning, Reasoning & Task Decomposition

How agents break a goal into steps: chain-of-thought, ReAct, plan-and-execute, and reflection. When explicit planning helps and when it just adds latency and drift.

Bridges to Artificial Intelligence — search, planning, and state-space reasoning

Builds on: Tool Use & Function Calling

Week 5

Agent Memory & Context Management

Agents forget. Engineer short-term scratchpads, long-term vector memory, summarization, and context-window budgeting so an agent stays coherent across a long-running task.

Bridges to Operating Systems — memory hierarchy, paging, and caching

Builds on: Planning, Reasoning & Task Decomposition

Week 6

Sandboxed Code Execution

Letting an agent run code it wrote is a security problem first. Isolation via containers and microVMs, filesystem and network confinement, resource limits, and safe failure.

Bridges to Operating Systems — virtualization, namespaces, and process isolation

Builds on: Agent Memory & Context Management

Week 7

Autonomous Multi-Agent Research-and-Ship System

Week 7 milestone

You are handed an enterprise mandate: the research division needs a launched product — a system that takes an open-ended technical question, autonomously researches it across many sources, synthesizes a defensible report, and ships the report as a published artifact, with zero human steps in the middle. Build an orchestrator-worker multi-agent system: a lead agent that decomposes the question and spawns specialized worker agents (search, read, synthesize, fact-check), coordinates their results through shared state, and produces a cited deliverable. This is not a notebook demo. The result must be a directly deployable, hyperscalable product: real public hosting, CI/CD on every commit, observability, security hardening, a polished and accessible web UI a non-technical analyst will happily use, and complete go-to-market material — a landing page, a pitch, and a recorded demo. The architecture must absorb concurrent research runs without falling over, and recover from a failed worker. We are not here to babysit the run; ship it as a real product.

Why it matters: Multi-agent research and synthesis systems are being deployed across consulting, finance, and R&D to compress weeks of analyst work into hours. Shipping a coordinated, fault-tolerant agent fleet makes a builder ready for an Agentic Systems Engineer or Applied AI Engineer role at the ₹1-crore tier, where the bar is production reliability, not a demo.

The deliverable

A publicly hosted product with its own domain or stable URL, plus a public repo: the orchestrator and worker agents, an MCP-based tool layer, a fast accessible web UI for submitting questions and reading results, CI/CD running lint/tests/build on every commit, persisted and inspectable run traces, a marketing landing page, a 10-slide pitch, a recorded demo video, and a README documenting the coordination design, the failure-recovery and scaling strategy, and three example end-to-end runs with their published reports.

What it ships

Submit-a-question interface accepting an open-ended technical or market question with a depth setting (quick scan vs deep dive).
A lead orchestrator agent that decomposes the question into a research plan and spawns specialized worker agents.
Specialized workers — web search, source reading, synthesis, and an independent fact-checker that verifies every claim.
An MCP tool layer exposing search, fetch, and document tools so the same tools are reusable across agents and projects.
Live run view: a real-time graph of agent activity, sub-questions in flight, and sources being consumed.
Inline-cited report output where every claim links to the exact retrieved passage that supports it.
Export to PDF, Markdown, and a shareable public report URL.
Persisted, replayable run traces with token spend and latency per agent for cost auditing.
Automatic worker-failure detection and re-dispatch so a crashed worker never aborts a run.
A workspace history of past research runs with search and one-click re-run.
Concurrency controls and per-run budget caps so many users can run research in parallel safely.

Stack you orchestrate

Claude API or open-weight LLMModel Context ProtocolLangGraphNode.js or PythonDockerGoogle Cloud Runa tracing backend (LangSmith or OpenTelemetry)

Market signal — who wants thisAgentic deep-research is one of the hottest 2026 categories: the AI agent market is projected to grow from $7.84B in 2025 to $52.62B by 2030 (41% CAGR), and a16z reports a portfolio pivot from copilots to autonomous systems, with Sierra, Glean, and Decagon as comparables and YC W26 funding multi-agent orchestration startups such as Tensol and Korso. Consulting, finance, and corporate R&D teams are actively buying systems that compress weeks of analyst work into hours; investors fund this because it sells time back to high-cost knowledge workers.

How it is graded

The orchestrator decomposes a question and coordinates at least three specialized worker agents through explicit shared state.
Tools are exposed through a standard protocol (MCP), not bespoke per-agent glue.
The system is deployed to real public hosting with CI/CD on every commit and production observability (logs, traces, metrics).
The architecture handles concurrent research runs under load, and a worker failure mid-run still yields a complete, correct deliverable.
The web UI is fast, WCAG 2.2 AA accessible, and usable by a non-technical analyst without instruction.
Every claim in the output report is traceable to a retrieved source, and run traces are persisted and inspectable.
The project ships complete marketing: a landing page, a 10-slide pitch, and a recorded demo, presentable as a real product.
The product is publicly reachable and fully reproducible from the repo by a stranger.

Bridges to Distributed Systems — coordination, message passing, and fault tolerance

Week 8

Multi-Agent Orchestration

Compose specialized agents into a system: orchestrator-worker, handoffs, shared state, and message passing. The coordination problems are the same ones distributed systems have always had.

Bridges to Distributed Systems — coordination, message passing, and consensus

Builds on: Sandboxed Code Execution

Week 10

Evaluating & Observing Agents

Agents fail silently and non-deterministically. Build trajectory evals, tool-call assertions, LLM-as-judge, and tracing so you can tell a regression from noise.

Bridges to Software Engineering — testing, regression suites, and observability

Builds on: Multi-Agent Orchestration

Week 11

Agent Security & Prompt Injection

An autonomous agent with tools is an attack surface. Prompt injection, tool poisoning, the lethal trifecta of private data plus untrusted content plus exfiltration, and least-privilege agent design.

Bridges to Information Security — threat modeling and least privilege

Builds on: Evaluating & Observing Agents

Week 12

Autonomous Coding Agent with Sandboxed Execution

Week 12 milestone

An enterprise mandate: ship a launched product — an autonomous coding agent that, given a real GitHub issue, plans a fix, writes and runs code inside a hardened sandbox, iterates against test feedback, and opens a pull request, with untrusted code never touching the host. This is an operating-systems problem as much as an AI problem: the agent's generated code is hostile by assumption. Confine it with containers or microVMs, enforce filesystem and network policy, cap CPU and memory, and survive an agent that tries to escape or hang. The deliverable is not a script — it is a directly deployable, hyperscalable product: real public hosting, CI/CD, observability, a clean dashboard where a developer queues issues and watches trajectories, security hardening, and full marketing (landing page, pitch, demo). The sandbox fleet must scale horizontally to many concurrent jobs. We are not here to babysit a run; ship it as a real product.

Why it matters: Autonomous coding agents are now a core part of engineering org tooling, and the hard part is safe execution at scale, not code generation. A builder who can ship a sandboxed, evaluated coding agent is directly deployable as an Agent Infrastructure Engineer or AI Platform Engineer, roles commanding ₹1-crore-plus compensation because they sit between security and AI.

The deliverable

A publicly hosted product with a stable URL, plus a public repo: the planning-and-execution loop, the sandbox isolation layer, a fast accessible dashboard for queuing issues and inspecting trajectories, CI/CD on every commit, production observability, an eval suite over a set of real issues with pass/fail trajectories, a marketing landing page, a 10-slide pitch, a recorded demo, and a README documenting the isolation threat model, the horizontal-scaling design, and the workflow-versus-agent decision.

What it ships

GitHub integration: connect a repo, and the agent picks up issues labeled for automation.
A planning stage that reads the issue, explores the codebase, and produces a fix plan before writing code.
Generated code runs only inside a hardened microVM or container sandbox with enforced CPU, memory, filesystem, and network limits.
An iterative loop: run the test suite, read failures, revise, and retry until tests pass or a budget is hit.
Automatic pull-request creation with a written summary of the change and the test evidence.
A live trajectory dashboard showing the agent's plan, edits, command output, and retries in real time.
An eval mode that runs the agent over a fixed set of real issues and reports pass rate and trajectory quality.
Prompt-injection defenses: untrusted issue and repo text is treated as hostile, with least-privilege tool scopes.
Horizontal sandbox-fleet scaling so many issues are worked concurrently without contention.
Per-job cost and time tracking, with configurable budget caps and a hard kill on runaway jobs.
An audit log of every command the agent ran inside the sandbox.

Stack you orchestrate

Claude API or open-weight LLMgVisor or Firecracker microVMsDockerGitHub APINode.js or Pythonan eval frameworkGoogle Cloud Run

Market signal — who wants thisAutonomous coding agents are the breakout 2026 developer-tools category: Cursor reached $2B ARR in February 2026 and is raising at a $50B+ valuation, and Replit raised a $400M Series D at a $9B valuation in March 2026. SWE-bench Verified is now the industry yardstick, with top agents exceeding 80%. A whole sub-market of sandbox-execution infrastructure (E2B, Northflank, Modal, Blaxel) is being funded specifically to run agent-written code safely; investors back this because safe execution at scale, not code generation, is the unsolved bottleneck.

How it is graded

Generated code executes only inside an isolated sandbox with enforced CPU, memory, filesystem, and network limits.
The agent iterates on test feedback and recovers from its own failed attempts.
The sandbox fleet scales horizontally to many concurrent jobs and the scaling design is documented.
The product is deployed to real public hosting with CI/CD on every commit and production observability.
A fast, WCAG 2.2 AA accessible dashboard lets a developer queue issues and inspect agent trajectories.
An eval suite reports pass rate and trajectory quality over a fixed set of real issues.
The isolation threat model is documented, including sandbox-escape containment, and prompt-injection via issue or repo content is mitigated with least-privilege tool access.
The project ships complete marketing — a landing page, a 10-slide pitch, and a recorded demo — and is publicly hosted and reproducible from the repo.

Bridges to Operating Systems — virtualization, process isolation, and resource management

Twelve weeks, fully mapped.

The Augmented LLM as a Building Block

Tool Use & Function Calling

Model Context Protocol & Interoperable Tooling

Planning, Reasoning & Task Decomposition

Agent Memory & Context Management

Sandboxed Code Execution

Autonomous Multi-Agent Research-and-Ship System

Multi-Agent Orchestration

Evaluating & Observing Agents

Agent Security & Prompt Injection

Autonomous Coding Agent with Sandboxed Execution