Tumult

Tumult (specifically tumult-agentic) is a chaos engineering and fault injection framework for testing the resilience of AI coding agents such as Claude Code, GitHub Copilot, Codex, and OpenCode. It runs as a local reverse proxy between an agent and its model provider and injects failure scenarios into that traffic, including API latency, model timeouts, malformed JSON, tool delays, and retrieval poisoning, so that the agent's behavior under failure can be observed and evaluated.
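
The fault-injection step described above can be illustrated with a small sketch. This is not Tumult's implementation or configuration format: the names FaultPlan, choose_fault, and inject, the probabilities, and the error payload are all hypothetical, chosen only to show how a proxy might decide on a fault and rewrite a model response before the agent sees it.

```python
import json
import random
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical fault plan; the field names are illustrative only and are
# not taken from Tumult's configuration schema.
@dataclass(frozen=True)
class FaultPlan:
    rate_limit_prob: float = 0.2  # chance of returning a synthetic HTTP 429
    malformed_prob: float = 0.2   # chance of truncating the JSON body


def choose_fault(plan: FaultPlan, rng: random.Random) -> Optional[str]:
    """Pick at most one fault for this request, or None to pass it through."""
    roll = rng.random()
    if roll < plan.rate_limit_prob:
        return "rate_limit"
    if roll < plan.rate_limit_prob + plan.malformed_prob:
        return "malformed_json"
    return None


def inject(status: int, body: str, fault: Optional[str]) -> Tuple[int, str]:
    """Return the (status, body) pair the agent would receive."""
    if fault == "rate_limit":
        # Replace the upstream response with a synthetic rate-limit error.
        return 429, json.dumps({"error": {"type": "rate_limit_error"}})
    if fault == "malformed_json":
        # Cut the payload in half so it no longer parses as JSON.
        return status, body[: len(body) // 2]
    return status, body
```

Passing a seeded random.Random to choose_fault keeps a run reproducible, which matters when the same fault sequence needs to be replayed against a different agent.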

Key Features of Tumult

Fault-Injecting Proxy: Intercepts and modifies traffic between AI agents and model endpoints (Anthropic, OpenAI, etc.) by setting custom base URLs.
Production Failure Scenarios: Simulates real-world issues including synthetic HTTP 429/5xx codes, malformed outputs, tool failures, and context contamination.
Resilience Verification: Validates agent performance against behavioral contracts such as retry budgets, JSON formats, and secret leakage boundaries.
OpenTelemetry Observability: Emits OpenTelemetry traces in which each injected fault span is aligned with the agent model call span it affected.
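
As an illustration of the behavioral contracts listed above, the following sketch checks a recorded sequence of model calls against a retry budget and a secret leakage boundary. It is a sketch under stated assumptions, not Tumult's verifier: ModelCall, check_contracts, the "sk-" secret pattern, and the rule that any call following a 429 or 5xx response counts as a retry are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


# Hypothetical record of one agent-to-model call; not a Tumult data type.
@dataclass(frozen=True)
class ModelCall:
    status: int        # HTTP status the agent received
    request_body: str  # payload the agent sent upstream


# Assumed secret shape for the example: "sk-" followed by 8+ alphanumerics.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def is_failure(status: int) -> bool:
    """Treat rate limits and server errors as retryable failures."""
    return status == 429 or status >= 500


def check_contracts(calls: List[ModelCall], retry_budget: int) -> List[str]:
    """Return a list of human-readable contract violations (empty if none)."""
    violations = []

    # A call counts as a retry when the call right before it failed.
    retries = sum(
        1 for prev in calls[:-1] if is_failure(prev.status)
    )
    if retries > retry_budget:
        violations.append(
            f"retry budget exceeded: {retries} retries, budget {retry_budget}"
        )

    # No request sent upstream may contain something shaped like a secret.
    for index, call in enumerate(calls):
        if SECRET_PATTERN.search(call.request_body):
            violations.append(f"possible secret leaked in call {index}")

    return violations
```

Returning a list of violations rather than raising on the first one lets a single run report every broken contract at once.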

Benefits of Using Tumult

Evaluate Agent Reliability: Discover where coding agents silently hang, crash, or hallucinate before deploying agentic workflows to production.
Offline Simulation: Supports listing scenario packs, running local experiments, and replaying trace fixtures entirely offline.
Standardized Chaos Practices: Applies traditional software fault injection discipline to non-deterministic AI agent architectures.

For SREs and AI automation developers, Tumult serves as a testing harness that combines proxy-level fault control with behavioral verification, the two pieces needed to measure the resilience of autonomous agent systems.
