Background
Good tooling requires understanding the tools. Before applying AI agents to performance engineering tasks — profiling pipeline automation, cross-run analysis, report generation — it's worth building a clear mental model of how parallel agent workflows actually work.
This post documents a practice repository built for exactly that purpose.
The Repository
codeberg.org/srinathv/zed-parallel-agents
The repo demonstrates two distinct kinds of parallelism, applied to the same class of problem:
Approach 1 — Zed Threads (Interactive)
Zed's Agent Panel supports multiple simultaneous Claude sessions, each with its own model, context, and rule set. Three threads running concurrently — one Architect on Opus, one Developer on Sonnet, one Reviewer on Haiku — is functionally equivalent to three agents working in parallel, with the human acting as the orchestrator.
The repo includes .rules files and a Rules Library configuration that give each thread a distinct persona. The thread-rules directory documents what each agent is expected to produce and not produce — tight role boundaries are what make multi-agent systems predictable.
Approach 2 — Claude API Agents (Automated)
At the API level, an agent is a messages.create() call with a strong system prompt. Parallel agents are multiple async calls dispatched via asyncio.gather() using the AsyncAnthropic client.
The pipeline is a two-phase DAG:
Phase 1 (sequential):
Architect → design specification
Phase 2 (parallel — asyncio.gather):
Developer ──┐
├── both hit the API simultaneously
Reviewer ──┘
The Architect runs first because the Developer and Reviewer both depend on its output. Once that dependency is satisfied, the remaining agents run concurrently — total Phase 2 time is max(developer_time, reviewer_time), not their sum.
Project 3 — PyTorch ML Pipeline
A third project applies the same pattern to a machine learning pipeline: a Data Engineer agent and Model Architect agent run in parallel (their work is independent), then an ML Engineer agent synthesises both into a complete PyTorch training script. A hand-written baseline train.py is included for comparison.
Why This Matters
The same DAG structure — parallel where independent, sequential where dependent — applies directly to performance engineering workflows:
- Simultaneously profiling multiple frameworks (independent) before a comparative analysis (dependent)
- Running kernel-level and system-level profiling in parallel, then synthesising results
- Generating reports for multiple benchmarks concurrently, then producing an executive summary
The practice repo makes the pattern concrete before those applications come up.
What's Next
The follow-up post will be a live walkthrough: running agents.py, examining what each agent produces, comparing the agent-generated PyTorch code against the hand-written baseline, and discussing where the pattern breaks down and how to fix it.
The code is all there now — the walkthrough comes after I've run it against a real workload.