Parallel AI Agents: From a Practice Repo to the System That Built This Site

Background

Good tooling requires understanding the tools. Before applying AI agents to performance engineering tasks — profiling pipeline automation, cross-run analysis, report generation — it's worth building a clear mental model of how parallel agent workflows actually work.

This post documents a practice repository built for exactly that purpose.

The Repository

codeberg.org/srinathv/zed-parallel-agents

The repo demonstrates two distinct kinds of parallelism, applied to the same class of problem:

Approach 1 — Zed Threads (Interactive)

Zed's Agent Panel supports multiple simultaneous Claude sessions, each with its own model, context, and rule set. Running three threads concurrently (an Architect on Opus, a Developer on Sonnet, and a Reviewer on Haiku) is functionally equivalent to running three agents in parallel, with the human acting as the orchestrator.

The repo includes .rules files and a Rules Library configuration that give each thread a distinct persona. The thread-rules directory documents what each agent is expected to produce and not produce — tight role boundaries are what make multi-agent systems predictable.

Approach 2 — Claude API Agents (Automated)

At the API level, an agent is a messages.create() call with a strong system prompt. Parallel agents are multiple async calls dispatched via asyncio.gather() using the AsyncAnthropic client.

The pipeline is a two-phase DAG:

Phase 1 (sequential):
  Architect → design specification

Phase 2 (parallel — asyncio.gather):
  Developer ──┐
              ├── both hit the API simultaneously
  Reviewer  ──┘

The Architect runs first because the Developer and Reviewer both depend on its output. Once that dependency is satisfied, the remaining agents run concurrently — total Phase 2 time is max(developer_time, reviewer_time), not their sum.
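The two-phase shape above can be sketched as follows. This is a minimal illustration, not the repo's code: the agents are stubs that sleep instead of calling the API, and the names make_stub_agent, run_pipeline, and demo_pipeline, along with the prompts and delays, are assumptions made for the example.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict

# An "agent" here is any async callable taking (system_prompt, user_prompt).
Agent = Callable[[str, str], Awaitable[str]]


def make_stub_agent(name: str, delay: float) -> Agent:
    """Hypothetical stand-in for an API call: sleeps, then returns a labelled reply."""
    async def agent(system: str, prompt: str) -> str:
        await asyncio.sleep(delay)  # simulated network and generation latency
        return f"[{name}] response to: {prompt[:40]}"
    return agent


async def run_pipeline(task: str, architect: Agent, developer: Agent,
                       reviewer: Agent) -> Dict[str, object]:
    # Phase 1 (sequential): both later agents depend on the specification.
    spec = await architect("You are a software architect.", task)

    # Phase 2 (parallel): the two calls are independent, so await them together.
    start = time.perf_counter()
    code, review = await asyncio.gather(
        developer("You are a developer. Implement the spec.", spec),
        reviewer("You are a reviewer. List risks in the spec.", spec),
    )
    phase2_seconds = time.perf_counter() - start
    return {"spec": spec, "code": code, "review": review,
            "phase2_seconds": phase2_seconds}


def demo_pipeline() -> Dict[str, object]:
    # Illustrative delays only; they are not measurements of any real model.
    return asyncio.run(run_pipeline(
        "Design a CLI that summarises profiler output.",
        architect=make_stub_agent("architect", 0.05),
        developer=make_stub_agent("developer", 0.20),
        reviewer=make_stub_agent("reviewer", 0.10),
    ))
```

With these stub delays, Phase 2 takes roughly as long as the slower agent (about 0.2 s) rather than the 0.3 s sum. To run against the real API, each stub would be replaced by a coroutine that awaits messages.create() on an AsyncAnthropic client, passing the system prompt as system and the user prompt in messages.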

Project 3 — PyTorch ML Pipeline

A third project applies the same pattern to a machine learning pipeline: a Data Engineer agent and Model Architect agent run in parallel (their work is independent), then an ML Engineer agent synthesises both into a complete PyTorch training script. A hand-written baseline train.py is included for comparison.

Why This Matters

The same DAG structure — parallel where independent, sequential where dependent — applies directly to performance engineering workflows:

Profiling multiple frameworks simultaneously (independent), then running a comparative analysis (dependent)
Running kernel-level and system-level profiling in parallel, then synthesising results
Generating reports for multiple benchmarks concurrently, then producing an executive summary
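One way to express "parallel where independent, sequential where dependent" in general is a small dependency-driven runner. The sketch below is an assumption-laden illustration: run_dag, stub, demo_dag, and the step names profile_a, profile_b, and compare are hypothetical, and the steps sleep instead of doing real profiling. It does no cycle detection, so a cyclic dependency map would hang.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Each step receives the results of its dependencies, keyed by step name.
Step = Callable[[Dict[str, str]], Awaitable[str]]


async def run_dag(steps: Dict[str, Step],
                  deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Run every step once, starting each as soon as its dependencies finish."""
    tasks: Dict[str, asyncio.Task] = {}

    async def run(name: str) -> str:
        # Wait only for upstream steps; independent steps never block each other.
        upstream = deps.get(name, [])
        results = await asyncio.gather(*(tasks[d] for d in upstream))
        return await steps[name](dict(zip(upstream, results)))

    # Create all tasks before any of them runs, so lookups by name always succeed.
    for name in steps:
        tasks[name] = asyncio.create_task(run(name))
    values = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), values))


def stub(label: str, delay: float) -> Step:
    """Hypothetical step: sleeps, then reports which inputs it received."""
    async def step(inputs: Dict[str, str]) -> str:
        await asyncio.sleep(delay)  # stands in for a profiling run or agent call
        return f"{label}({', '.join(sorted(inputs))})"
    return step


def demo_dag() -> Dict[str, str]:
    steps = {
        "profile_a": stub("profile_a", 0.05),
        "profile_b": stub("profile_b", 0.05),
        "compare": stub("compare", 0.01),
    }
    # The comparison depends on both profiles; the profiles depend on nothing.
    deps = {"compare": ["profile_a", "profile_b"]}
    return asyncio.run(run_dag(steps, deps))
```

In demo_dag the two profiling steps run concurrently and the comparison starts only after both finish. The same runner covers the fan-in case from the PyTorch project, where two independent agents feed a third.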

The practice repo makes the pattern concrete before those applications come up.

From Threads to Terminals: The Coordinated System That Built This Site

The practice above ran several agents inside one editor, with a human as the orchestrator on every step. What it evolved into is stronger and stranger: a fleet of independent Claude Code sessions, each in its own terminal and its own git working directory, coordinating with each other — and this website is what they produced.

The setup, concretely:

One agent per terminal, one repository per agent. Each session runs in its own working directory with its own context and tools — no shared editor, no single human dispatcher in the loop for every step.
A written ownership map. A constitution file assigns every area to exactly one lane: one agent owns the LLM-inference posts, another the distributed-systems-as-neural-network pedagogy, another the site infrastructure and the HPC/architecture work. "Stay in your lane" is the first rule, because the expensive failures in multi-agent work are collisions: two agents claiming the same post number, or one agent committing over another's uncommitted work.
Multiple layers of coordination. Above the ownership map sits a resident coordinator agent, the "main office", which is the single authority for cross-agent state. It never writes content or deploys. It maintains a ledger, reconciled from live repository state, and answers the questions that cause collisions: "What's the next free post number?", "Which lane owns this?", and "Is it safe to deploy right now?"
A report-up mailbox. After an agent acts — claims a number, starts a post, commits, intends to deploy — it appends a one-line check-in to its own file in a shared inbox (single-writer-per-file, so the logs themselves never collide). The office folds those check-ins plus the actual git history into the ledger.
A hand-off queue. When work crosses a lane boundary, the requesting agent drops a note in a shared request file rather than reaching into another agent's files. The owner picks it up — or redirects it.
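The report-up mailbox can be sketched in a few lines. Everything specific here is an assumption made for illustration: the tab-separated line format, the .log file names, the "claim" event, and the functions check_in, fold_ledger, next_free_post, and demo_mailbox. The real ledger also folds in git history, which this sketch leaves out.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def check_in(inbox: Path, agent: str, event: str, detail: str) -> None:
    """Append one line to the agent's own log; no other agent writes this file."""
    inbox.mkdir(parents=True, exist_ok=True)
    with open(inbox / f"{agent}.log", "a", encoding="utf-8") as f:
        f.write(f"{event}\t{detail}\n")


def fold_ledger(inbox: Path) -> Dict[str, List[int]]:
    """Coordinator side: read every log and collect claimed post numbers per agent."""
    claims: Dict[str, List[int]] = {}
    for log in sorted(inbox.glob("*.log")):
        for line in log.read_text(encoding="utf-8").splitlines():
            event, _, detail = line.partition("\t")
            if event == "claim":
                claims.setdefault(log.stem, []).append(int(detail))
    return claims


def next_free_post(claims: Dict[str, List[int]]) -> int:
    """Answer "what's the next free post number?" from the folded claims."""
    taken = {n for numbers in claims.values() for n in numbers}
    return max(taken, default=0) + 1


def demo_mailbox() -> Tuple[Dict[str, List[int]], int]:
    # Hypothetical lanes and post numbers, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        inbox = Path(tmp) / "inbox"
        check_in(inbox, "inference", "claim", "41")
        check_in(inbox, "infra", "claim", "42")
        check_in(inbox, "infra", "commit", "post 42 draft")
        claims = fold_ledger(inbox)
        return claims, next_free_post(claims)
```

Because each agent appends only to its own file, no locking is needed between writers; only the coordinator reads across files. Routing every number claim through one reader is what prevents two lanes from taking the same post number.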

The result behaves less like a chat thread and more like a small organization: lanes (departments), a coordinator (the office), check-ins (status reports), and a request queue (inter-department tickets) — with git as the system of record and the deploy pipeline as the loading dock.
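The lane and ticket analogy can be made concrete with a lookup against the ownership map plus a request queue. The path patterns, lane names, the Request fields, and the functions owner_of and edit_or_hand_off are all hypothetical; the post does not describe how the constitution file is actually structured.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Dict, List, Optional

# Hypothetical ownership map: path pattern -> owning lane.
OWNERSHIP: Dict[str, str] = {
    "posts/inference/*": "inference",
    "posts/distributed/*": "distributed",
    "site/*": "infra",
}


@dataclass
class Request:
    from_lane: str
    to_lane: str
    path: str
    note: str


def owner_of(path: str) -> Optional[str]:
    """Return the single lane that owns `path`, or None if nobody does."""
    matches = [lane for pattern, lane in OWNERSHIP.items() if fnmatch(path, pattern)]
    if len(matches) > 1:
        # "Exactly one lane" is the invariant; overlapping patterns break it.
        raise ValueError(f"{path} is claimed by several lanes: {matches}")
    return matches[0] if matches else None


def edit_or_hand_off(lane: str, path: str, note: str, queue: List[Request]) -> bool:
    """Return True if `lane` may edit `path` directly; otherwise queue a request."""
    owner = owner_of(path)
    if owner is None:
        raise LookupError(f"no lane owns {path}")
    if owner == lane:
        return True
    # Crossing a lane boundary: leave a note for the owner instead of editing.
    queue.append(Request(from_lane=lane, to_lane=owner, path=path, note=note))
    return False
```

For example, with an empty list as the queue, edit_or_hand_off("infra", "posts/inference/batching.md", "fix link", queue) returns False and leaves one request addressed to the inference lane, while the same call on "site/build.sh" returns True and queues nothing.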

This site is the artifact. Nearly every post and case study here was authored by one of these agents, inside its lane, and shipped through a shared build-and-deploy pipeline that mirrors to an offline drive and a fallback host. The coordination isn't a demo: it is what let several agents grow a real, cross-linked body of work in parallel without stepping on each other. (This very post was revised by the site-infrastructure lane, on request, while other lanes shipped inference and distributed-systems posts alongside it.)

It also sharpens the question the field is now racing to answer: how do you measure the efficiency of a multi-agent system? Adding agents does not automatically add throughput. Coordination has real overhead, and the metric that matters is useful work per unit of contention. That measurement problem is exactly the kind of thing the benchmarks investigation chases as the field moves from MLPerf toward per-agent efficiency.
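The post does not define "useful work per unit of contention", so the sketch below substitutes the classic parallel-computing measures of speedup and efficiency as a rough first proxy. The function names and the numbers in the usage note are illustrative assumptions, not measurements from this site.

```python
def speedup(t_one_agent: float, t_n_agents: float) -> float:
    """How many times faster n agents finish the same work than one agent."""
    if t_one_agent <= 0 or t_n_agents <= 0:
        raise ValueError("durations must be positive")
    return t_one_agent / t_n_agents


def efficiency(t_one_agent: float, t_n_agents: float, n_agents: int) -> float:
    """Speedup per agent: 1.0 means no coordination cost, lower means overhead."""
    if n_agents < 1:
        raise ValueError("need at least one agent")
    return speedup(t_one_agent, t_n_agents) / n_agents
```

As a made-up example, if one agent needs 120 minutes and four agents finish the same work in 40, speedup(120, 40) is 3.0 and efficiency(120, 40, 4) is 0.75, so a quarter of the added capacity went to coordination or waiting. This proxy ignores output quality and token cost, both of which a per-agent efficiency benchmark would also need to account for.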

What's Next

The "real workload" this repo was waiting for turned out to be the practice itself. Rather than a single code walkthrough, the parallel-agent pattern now drives a running series of LLM-systems investigations — each one measured, reproduced, and written up. The payoff shows up across that body of work, which starts from the profiling baseline in Establishing a Baseline and runs through the inference investigations beginning with Batching Is the Parallelism.