TestMu AI's A2A Testing CLI lets you run AI agent evaluations, red team tests, and voice agent checks directly from your terminal. Works in CI/CD pipelines.

Devansh Bhardwaj
April 6, 2026
TestMu AI's Agent to Agent Testing platform now has a CLI. Here's what that means for your workflow.
AI agents are in production. Chatbots handle customer queries at scale. Voice assistants route support tickets. Calling agents close loops without a human involved. Yet most QA teams are still testing these systems by hand, one conversation at a time.
That bottleneck is exactly what TestMu AI built Agent to Agent Testing to fix. Today, we're extending it to the command line.
Here's what a full evaluation run looks like from your terminal:
# Install
pip install testmu-a2a-cli
# Authenticate
testmu-a2a auth --username YOUR_USERNAME --access-key YOUR_KEY
# Run a quick evaluation against your agent
testmu-a2a test \
--agent https://your-chatbot-endpoint.com \
--spec "E-commerce customer support chatbot" \
--count 200 \
--format json \
--output results.json
testmu-a2a-cli is a Python-based CLI tool (requires Python 3.10+) that lets you trigger, configure, and run Agent-to-Agent test evaluations directly from your terminal.
It connects your agent under test to TestMu AI's evaluation infrastructure, which simulates real users, generates adversarial inputs, and scores responses across the quality dimensions that matter.
Chat agent metrics covered out of the box: Bias Detection, Hallucination Detection, Completeness, Context Awareness, Response Quality, Conversation Flow, User Satisfaction, File Handling Quality, and File Generation Accuracy.
Let's put this into practice.
Use case: Your team has built a customer support chatbot for an e-commerce platform. It handles order queries, refund requests, and product FAQs.
You're three days from shipping it to production and you need to know: does it hallucinate? Does it stay on topic when users go adversarial? Does it handle edge cases around refund policy correctly?
The old way: A QA engineer writes 30-50 test scripts manually, runs them one by one, and files bugs based on what they notice. It takes days. It misses edge cases because humans don't think adversarially at scale.
With testmu-a2a-cli: Point the CLI at your agent's endpoint, define your spec, and let autonomous evaluators run hundreds of realistic and adversarial scenarios against it in parallel. In minutes, you have structured quality scores across Hallucination Detection, Bias Detection, Response Quality, Conversation Flow, and more. No scripts, no manual review.
The customer support example is just one case. The same pattern applies anywhere you're shipping a conversational AI system.
Spin up your Agent to Agent CLI in just minutes with this detailed documentation.
One of the most important capabilities in testmu-a2a-cli, and the one most teams don't think to use until it's too late, is redteam.
testmu-a2a redteam \
--agent https://your-chatbot-endpoint.com \
--output redteam-results.json
This runs your agent through 9 dedicated attack categories: prompt injection, jailbreak attempts, data exfiltration, PII leakage, and more.
It doesn't just check whether your agent gives good answers; it checks whether it can be broken by a motivated user.
If your agent handles sensitive data, makes decisions with real-world consequences, or is customer-facing, red teaming before launch is not optional. The CLI makes it a single command.
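Because the red team run writes structured JSON, you can gate a release on it the same way you gate on any other test result. Here is a minimal sketch in Python; the schema it reads (a "categories" list where each entry has a "name" and a "passed" flag) is an assumption for illustration, so adjust the field names to whatever redteam-results.json actually contains:

```python
def failed_categories(results: dict) -> list:
    """List the attack categories the agent failed.

    Assumes a hypothetical schema like
    {"categories": [{"name": "prompt_injection", "passed": false}, ...]};
    rename the fields to match the CLI's real output.
    """
    return [
        c["name"]
        for c in results.get("categories", [])
        if not c.get("passed", True)
    ]

# Example with illustrative data: one category broken
results = {
    "categories": [
        {"name": "prompt_injection", "passed": True},
        {"name": "pii_leakage", "passed": False},
    ]
}
broken = failed_categories(results)
print(broken)  # ["pii_leakage"]
```

A non-empty list here is the signal to block the launch, however your pipeline expresses that (a failing exit code, a blocked deploy stage, an alert).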
testmu-a2a-cli isn't just for chat agents. The call command brings the same evaluation depth to phone agents, inbound and outbound, with capabilities built specifically for voice:
testmu-a2a call \
--agent https://your-phone-agent-endpoint.com \
--type inbound \
--output call-results.json
The CLI supports 30+ phone-agent quality metrics, background sound simulation, and DTMF detection, covering the real conditions your voice agent will face in production.
Multi-turn coherence, intent handling under noise, escalation behavior: all testable from the terminal.
For teams who want repeatable, version-controlled evaluation runs, testmu-a2a init generates a testmu-a2a.yaml config file that you can commit alongside your codebase and run consistently across environments:
# Generate your config file
testmu-a2a init
# Run from config - works in CI/CD without any flags
testmu-a2a run --config testmu-a2a.yaml
This is how most teams use the CLI in practice: testmu-a2a test for fast ad-hoc checks during development, and testmu-a2a run with a committed config for pipeline-controlled evaluation gates.
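The authoritative contents of testmu-a2a.yaml come from running testmu-a2a init. As a rough sketch of what a committed config might hold, the keys below simply mirror the flags shown earlier in this post; they are assumptions for illustration, not the documented schema:

```yaml
# testmu-a2a.yaml - hypothetical sketch; generate the real file with `testmu-a2a init`
agent: https://your-chatbot-endpoint.com    # endpoint under test (mirrors --agent)
spec: "E-commerce customer support chatbot" # agent description (mirrors --spec)
count: 200                                  # number of scenarios (mirrors --count)
format: json                                # result format (mirrors --format)
output: results.json                        # where results land (mirrors --output)
```

Committing a file like this alongside your code is what makes runs reproducible: every environment, from a laptop to a CI runner, evaluates against the same settings.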
The Agent-to-Agent Testing platform has always had a clear thesis: you can't use deterministic, script-based QA to validate non-deterministic AI systems. Static test cases don't adapt, and they miss edge cases.
The platform addresses this by deploying autonomous evaluators that emulate real users and intelligent adversarial interactions. Until now, accessing that capability required going through the TestMu AI browser-based console. The CLI changes that: it's built for teams who live in the terminal, run tests in CI/CD pipelines, and want evaluation results feeding directly into deployment gates.
Drop testmu-a2a test into your GitHub Actions or Jenkins step. Set thresholds. Fail the build if Hallucination Detection or Bias scores don't meet your bar. Results come back as structured JSON, parseable, alertable, and ready to feed into any dashboard.
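As an illustration of such a threshold gate, here is a minimal Python sketch you could run right after the CLI step. The results schema is an assumption (a top-level "metrics" object mapping metric names to scores in [0, 1]); adapt the keys and thresholds to the JSON your runs actually produce:

```python
import json
import sys

# Minimum acceptable scores; the values here are illustrative,
# not recommendations - set them to your own quality bar.
THRESHOLDS = {
    "Hallucination Detection": 0.90,
    "Bias Detection": 0.95,
}

def gate(results: dict) -> list:
    """Return the metrics that fall below their threshold.

    Assumes a results schema of the form
    {"metrics": {"Hallucination Detection": 0.93, ...}};
    adjust the lookup to the CLI's actual JSON output.
    """
    failures = []
    scores = results.get("metrics", {})
    for metric, minimum in THRESHOLDS.items():
        score = scores.get(metric)
        if score is None or score < minimum:
            failures.append("%s: %s < %s" % (metric, score, minimum))
    return failures

def main(path: str = "results.json") -> None:
    with open(path) as f:
        results = json.load(f)
    failures = gate(results)
    if failures:
        print("Quality gate failed:")
        for line in failures:
            print("  " + line)
        sys.exit(1)  # non-zero exit fails the CI job
    print("All metrics passed.")
```

In GitHub Actions or Jenkins, a non-zero exit from this script is all it takes to stop the deploy; the same gate function works unchanged against any metric the results file reports.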