Agent Features & Metrics - Customer Reference Guide

Agent Types Overview

The platform supports 5 agent types, each designed for a specific testing scenario:


Agent Type | Primary Use Case | Key Differentiator
Chat | Test text-based chatbot agents | Multi-turn text conversation evaluation with 9 quality metrics
Voice | Test chatbot agents via audio conversations | Same as Chat, but conversations happen as audio (WAV) instead of text
Phone Caller Inbound | Test voice agents that receive calls | Pre-evaluation (live simulated calls) + Post-evaluation (production recording analysis)
Phone Caller Outbound | Test voice agents that make calls | Pre-evaluation (live outbound calls) + Post-evaluation (production recording analysis)
Image Analyzer | Validate AI-generated images | Image quality scoring against prompts and brand guidelines

Chat Agent


📄 Workflow-Based Test Generation

Connect your knowledge sources and let the platform auto-generate test scenarios — no manual scripting needed.

  • Document Upload — Upload knowledge base documents (PDF, text files) that your chatbot is expected to understand
  • Source Integrations — Connect Confluence, JIRA, or GitHub as knowledge sources
  • AI Test Generation — Automatically generates test scenarios from uploaded documents
  • Real-time Progress — Live streaming of test generation progress

🎭 Scenario Management

Build and manage the exact conversations you want to test, manually or via AI.

  • Manual Scenario Creation — Create test scenarios with title, description, and expected behavior
  • AI-Generated Scenarios — Auto-generate scenarios from uploaded knowledge sources
  • Validation Criteria — Define custom pass/fail criteria per scenario (e.g., "Agent must mention return policy", "Agent should not hallucinate product prices")
  • Special Instructions — Add specific instructions to guide scenario execution
  • Persona Assignment — Assign user personas to scenarios (e.g., "frustrated customer", "first-time user")
  • Test Profile Association — Link test data profiles to scenarios for data-driven testing
  • Scenario Deletion — Remove scenarios that are no longer needed

🗂️ Test Suites

Group related scenarios together and track results over time.

  • Suite Creation — Group multiple scenarios into test suites
  • Test Profile Selection — Assign a test data profile to the entire suite
  • Run History — View all past runs with status, score, and timestamps
  • Status Filtering — Filter results by Passed, Failed, In Progress

🔌 Endpoint Profiles

Configure how the platform connects to your agent's API — supporting everything from simple REST calls to multi-phase auth flows.

  • Postman Collection Import — Upload Postman collections (with optional environment files) to configure your agent's API. Supports nested folders, collection variables, and environment variable substitution
  • Manual Configuration — Define endpoints via JSON with URL, method, headers, and body
  • Multi-Phase Execution — Configure suite_setup (login/auth), scenario_setup (session creation), and chat (conversation) phases
  • Variable Management — Define static values, auto-generated values (UUID, mobile number, timestamp, email), and extracted values from API responses
  • Retry & Caching — Configure retry-on-failure and cache suite setup results to speed up execution
  • Test Endpoint — Dry-run your endpoint profile to verify connectivity before running full evaluations
  • Import/Export — Export profiles as JSON and import across projects
  • Default Profile — Mark one profile as the default for quick evaluation runs
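
As a rough illustration of how multi-phase execution and variable management fit together, the sketch below resolves variables and substitutes them into a chat-phase request. Only the phase names (suite_setup, scenario_setup, chat) come from this guide. The profile layout, the `{{name}}` placeholder syntax (borrowed from Postman), the field names, and the example URLs are all assumptions, not the platform's actual schema.

```python
import re
import uuid

# Hypothetical endpoint profile. Phase names follow the guide; every other
# key, URL, and placeholder here is an illustrative assumption.
profile = {
    "variables": {
        "api_key": {"kind": "static", "value": "demo-key"},
        "session_id": {"kind": "generated", "generator": "uuid"},
    },
    "phases": {
        "suite_setup": {"method": "POST", "url": "https://agent.example.com/login",
                        "headers": {"X-Api-Key": "{{api_key}}"}, "body": {}},
        "scenario_setup": {"method": "POST", "url": "https://agent.example.com/sessions",
                           "headers": {}, "body": {"id": "{{session_id}}"}},
        "chat": {"method": "POST", "url": "https://agent.example.com/chat",
                 "headers": {},
                 "body": {"session": "{{session_id}}", "message": "{{message}}"}},
    },
}


def resolve_variables(spec):
    """Turn variable definitions into concrete string values."""
    values = {}
    for name, definition in spec.items():
        if definition["kind"] == "static":
            values[name] = definition["value"]
        elif definition["kind"] == "generated" and definition["generator"] == "uuid":
            values[name] = str(uuid.uuid4())
    return values


def render(node, values):
    """Recursively substitute {{name}} placeholders in strings, dicts, and lists.

    Unknown placeholders are left untouched so they are easy to spot.
    """
    if isinstance(node, str):
        return re.sub(r"\{\{(\w+)\}\}",
                      lambda m: values.get(m.group(1), m.group(0)), node)
    if isinstance(node, dict):
        return {key: render(val, values) for key, val in node.items()}
    if isinstance(node, list):
        return [render(item, values) for item in node]
    return node


values = resolve_variables(profile["variables"])
values["message"] = "Where is my order?"
chat_request = render(profile["phases"]["chat"], values)
```

In this sketch the same `session_id` is reused by the scenario_setup and chat phases, which is the kind of value sharing that extracted and auto-generated variables are meant to support.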

🗃️ Test Profiles (Test Data)

Create reusable data sets to power data-driven testing across multiple scenarios.

  • Custom Key-Value Data — Create reusable test data profiles with typed fields (string, number, boolean, email, URL, JSON)
  • Default Profile — Mark one profile as the default
  • Import/Export — Share test profiles across projects via JSON export/import
  • Data Injection — Test data is injected at runtime for data-driven scenario execution
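
To make the idea of typed fields concrete, here is a minimal validation sketch for the field types listed above. The function name, the tuple layout of a profile, and the rule that JSON fields are stored as strings are assumptions for illustration; the platform's own validation rules may differ.

```python
import json
from urllib.parse import urlparse


def is_valid_field(field_type, value):
    """Return True if value fits the declared field type."""
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if not isinstance(value, str):
        return False  # the remaining types are assumed to be carried as strings
    if field_type == "string":
        return True
    if field_type == "email":
        local, _, domain = value.partition("@")
        return bool(local) and "." in domain and "@" not in domain
    if field_type == "url":
        parsed = urlparse(value)
        return parsed.scheme in ("http", "https") and bool(parsed.netloc)
    if field_type == "json":
        try:
            json.loads(value)
            return True
        except json.JSONDecodeError:
            return False
    return False


# Hypothetical test profile: (field name, field type, value)
test_profile = [
    ("customer_email", "email", "pat@example.com"),
    ("order_count", "number", 3),
    ("is_vip", "boolean", True),
]
invalid = [name for name, ftype, value in test_profile
           if not is_valid_field(ftype, value)]
```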

🧪 Playground

Interactively test your agent configuration before running a full evaluation suite.

  • Interactive Chat — Send messages to your configured agent endpoint in real-time
  • Multi-Turn Conversations — Maintain conversation context across multiple messages
  • Connection Testing — Test connectivity via cURL command verification
  • Schema Analysis — Automatic detection of request/response schema from your endpoint

⚡ Evaluation Execution

Run evaluations at scale with real-time feedback on your agent's quality.

  • Metric Selection — Choose which quality metrics to evaluate (or run all)
  • Endpoint Profile Selection — Pick which endpoint profile to evaluate against
  • HyperExecute Integration — Run evaluations at scale using LambdaTest's HyperExecute infrastructure with optional tunnel configuration for testing agents behind firewalls or private networks
  • Real-Time Streaming — Live progress updates during evaluation via Server-Sent Events
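
For readers unfamiliar with Server-Sent Events, the sketch below parses the standard SSE wire format (`event:` and `data:` fields, with a blank line ending each event). The event name `progress` and the JSON payload in the usage lines are made up; this guide does not document the platform's actual event schema.

```python
def parse_sse(lines):
    """Yield one dict per event from an iterable of decoded SSE lines."""
    event, data = "message", []          # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                   # blank line dispatches the pending event
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):       # comment line, often used as keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):    # a single leading space is stripped
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream contents
stream = [
    "event: progress",
    'data: {"completed": 3, "total": 10}',
    "",
    ": keep-alive",
    "data: done",
    "",
]
events = list(parse_sse(stream))
```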

🎚️ Metric Threshold Configuration

Define exactly what "passing" means for your project — then enforce it automatically.

  • Per-Metric Thresholds — Set minimum acceptable score (0.0–1.0) for each metric
  • Higher/Lower is Better — Configure directionality for each metric
  • Named Configurations — Create named threshold configs (e.g., "Strict", "Default")
  • Active/Inactive Toggle — Enable or disable threshold configurations

🚦 Go-Live Assessment

Get a clear, defensible production-readiness verdict before you ship.

Production Readiness Verdicts
Verdict | Meaning
🟢 GREEN | Ready for production
🟡 YELLOW | Ready with caveats
🔴 RED | Not ready
  • Overall Score — Weighted composite score (0–100)
  • Confidence Level — Based on number of evaluations run
    • HIGH: 100+ evaluations
    • MEDIUM: 50–99 evaluations
    • LOW: 20–49 evaluations
    • VERY_LOW: Fewer than 20 evaluations
  • Dimension Scores — Functional Completeness, Quality Standards, Risk Profile, Operational Readiness (each weighted 25%)
  • Scenario Coverage — Matrix showing well-tested vs. untested scenarios
  • Risk Assessment — AI-powered failure pattern analysis with prioritized action items
  • Validation Criteria Summary — Aggregated compliance rate across all criteria
  • AI Insights — Actionable recommendations for improvement
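
The confidence bands and equal dimension weights above translate into a short calculation. This is a sketch under two assumptions: that each dimension is scored on the same 0-100 scale as the overall score, and that the overall score is the plain weighted sum of the four dimensions. The dictionary keys are hypothetical names.

```python
def confidence_level(evaluation_count):
    """Map the number of evaluations run to the confidence bands in this guide."""
    if evaluation_count >= 100:
        return "HIGH"
    if evaluation_count >= 50:
        return "MEDIUM"
    if evaluation_count >= 20:
        return "LOW"
    return "VERY_LOW"


# Each dimension is weighted 25%, as stated above.
DIMENSION_WEIGHTS = {
    "functional_completeness": 0.25,
    "quality_standards": 0.25,
    "risk_profile": 0.25,
    "operational_readiness": 0.25,
}


def overall_score(dimension_scores):
    """Weighted composite of the four dimension scores (assumed 0-100 each)."""
    return sum(weight * dimension_scores[name]
               for name, weight in DIMENSION_WEIGHTS.items())


score = overall_score({
    "functional_completeness": 90,
    "quality_standards": 80,
    "risk_profile": 70,
    "operational_readiness": 60,
})
```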

🗓️ Scheduled Runs

Automate ongoing regression coverage without manual intervention.

  • Cron-Based Scheduling — Schedule evaluations with preset frequencies (Hourly, Daily, Weekdays, Weekly, Monthly) or custom cron expressions
  • Timezone Support — Full IANA timezone selection
  • Pause/Resume — Temporarily pause and resume scheduled runs
  • Run History — Track all scheduled execution results
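
As an illustration of how preset frequencies relate to cron expressions, the sketch below maps each preset to a standard five-field cron string and checks the timezone name against the IANA database using Python's `zoneinfo`. The specific times (09:00, Monday, first of the month) and the shape of the returned schedule are assumptions; the platform's presets may use different values.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Hypothetical preset-to-cron mapping (minute hour day-of-month month day-of-week).
PRESETS = {
    "Hourly": "0 * * * *",
    "Daily": "0 9 * * *",
    "Weekdays": "0 9 * * 1-5",
    "Weekly": "0 9 * * 1",
    "Monthly": "0 9 1 * *",
}


def is_valid_timezone(name):
    """Return True if name resolves to a timezone in the local IANA database."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, ValueError, OSError):
        return False


def build_schedule(frequency, tz_name, custom_cron=None):
    """Build a schedule record from a preset name or a custom cron expression."""
    cron = custom_cron if frequency == "Custom" else PRESETS[frequency]
    if cron is None or len(cron.split()) != 5:
        raise ValueError("expected a five-field cron expression")
    if not is_valid_timezone(tz_name):
        raise ValueError(f"unknown timezone: {tz_name}")
    return {"cron": cron, "timezone": tz_name, "paused": False}
```

For example, `build_schedule("Weekdays", "Asia/Kolkata")` would describe a run at 09:00 local time, Monday to Friday, on a system whose timezone database includes that zone.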

Voice Agent

Note: The Voice agent is functionally identical to the Chat Agent, with one key difference: conversations happen as audio (WAV) instead of text. The platform conducts voice-based conversations with your agent and evaluates the audio interaction using the same quality metrics.


All features from the Chat Agent are available, including:

  • Workflow-based test generation with document upload and source integrations
  • Scenario management with AI generation, validation criteria, personas, and special instructions
  • Test suites with test profile selection and run history
  • Endpoint profiles with Postman collection import
  • Test profiles for data-driven testing
  • Playground for interactive testing
  • Evaluation execution with metric selection and HyperExecute integration
  • Metric threshold configuration
  • Go-Live assessment with production readiness verdict
  • Scheduled runs


Phone Caller Inbound Agent

Two Evaluation Modes
Mode | What It Does
Pre-evaluation | The platform simulates customers calling your voice agent with live test calls, then evaluates the resulting conversations
Post-evaluation | Upload your production call recordings and transcripts from real customer interactions for evaluation on the platform

📞 Phone Number Management

Register and manage the phone numbers your voice agent answers on.

  • Add Phone Numbers — Register your agent's phone numbers with country code selection (20+ countries supported)
  • Default Phone Number — Set a default number for quick test execution
  • Phone Number Display — Masked display for security with country flag identification
  • Edit & Delete — Update phone number details or remove numbers no longer in use

🎭 Scenario Management

Generate realistic inbound call scenarios at scale with AI or build them manually.

  • AI Scenario Generation — Generate up to 20 inbound test scenarios with configurable personas, languages, and special instructions
  • Manual Scenario Creation — Create scenarios with name, description, expected output
  • Scenario Deletion — Remove scenarios no longer needed
  • Persona Selection — Choose from available personas or create custom ones to simulate different caller types
  • Language Support — Generate scenarios in multiple languages (English, Spanish, etc.)

🎙️ Voice Configuration (Per Scenario)

Control every detail of how the simulated caller sounds and behaves.

  • Voice Selection — Choose from a library of voices with audio preview (multiple voice providers and accents available)
  • Voice Preview — Listen to voice samples with animated waveform visualization before selecting
  • Background Sound — Enable simulated background noise (15 presets: cafe, street, factory, rain, crowd, market, train, radio interference, etc.)
  • Response Timing — Configure wait time after speech ends (0.5–5.0 seconds) to handle agents that speak in multiple sentences
  • Max Call Duration — Set maximum call length (60–1800 seconds) to prevent runaway calls
  • First Speaker — Choose who speaks first — the simulated user or the agent

👤 Agent Profiles (Inbound-Specific)

Create reusable caller personas to standardize how test calls are placed across suites.

  • Agent Profile Creation — Configure agent personas with name, phone number, voice, and background noise
  • Profile Library — Organization-level reusable agent profiles
  • Active/Inactive Toggle — Enable or disable profiles

🗂️ Test Suites

Batch your inbound scenarios into suites and run them all with a single action.

  • Suite Creation — Group multiple scenarios with per-scenario voice and phone configuration
  • Test Profile Association — Link test data for data-driven voice testing
  • Agent Profile Assignment — Associate agent profiles with suites
  • Run Suites — Execute all scenarios in a suite with a single action

📡 Call Execution & Monitoring

Trigger, track, and manage live test calls in real-time.

  • Initiate Test Calls — Trigger simulated inbound calls to your voice agent
  • Real-Time Monitoring — Track call status as calls progress
  • Call Duration Tracking — Live duration counter during active calls
  • Call Termination — End calls in progress if needed

Phone Caller Outbound Agent

Note: Phone Caller Outbound supports the same two evaluation modes as Inbound (Pre-evaluation and Post-evaluation) and shares all features, with a few key differences in pre-evaluation mode only.


Outbound-Specific Pre-evaluation Features

  • Scenario Generation — Generate up to 7 outbound test scenarios (vs. 20 for inbound)
  • Caller Profile Selection — Select an outbound caller profile when generating scenarios
  • Outbound Number Pool — Reserve phone numbers from the outbound pool for test calls
  • Passive Mode — Listen to outbound calls without interfering (for QA monitoring)
  • First Speaker Default — Agent speaks first (vs. simulator for inbound)

Outbound Pool Management

  • View Pool Status — See available outbound numbers
  • Reservations — View and manage per-suite number reservations
  • Clear Reservations — Release reserved numbers when done

Tip: All other features are identical to Phone Caller Inbound: phone number management, voice configuration, test suites, call execution, post-evaluation (voice analytics, recording upload, batch analysis), go-live assessment, metrics, and scheduling.


Image Analyzer Agent


🖼️ Image Analysis

Upload single images or batch-process up to 50 at once — via file upload or URL.

  • Single Image Analysis — Upload an image (or provide a URL) along with the original prompt used to generate it
  • Batch Analysis — Analyze up to 50 images at once with a shared prompt
  • Drag & Drop Upload — Drag and drop images directly into the upload area
  • Supported Formats — JPG, JPEG, PNG, GIF, WEBP, BMP (max 20 MB per image)
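
If you prepare batches programmatically, a pre-upload check along the following lines can catch files that would be rejected. It is a sketch only: the function name and error messages are hypothetical, and it assumes 1 MB means 1024 × 1024 bytes.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024   # 20 MB per image (binary megabytes assumed)
MAX_BATCH_SIZE = 50


def validate_batch(images):
    """Check a list of (filename, size_in_bytes) pairs; return a list of error strings."""
    errors = []
    if len(images) > MAX_BATCH_SIZE:
        errors.append(f"batch has {len(images)} images, limit is {MAX_BATCH_SIZE}")
    for name, size in images:
        if PurePath(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            errors.append(f"{name}: unsupported format")
        if size > MAX_IMAGE_BYTES:
            errors.append(f"{name}: larger than 20 MB")
    return errors


problems = validate_batch([("hero.PNG", 1_024), ("scan.tiff", 2_048)])
```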

🎯 Custom Evaluation Criteria

Define what "good" means for your images — choose from three criteria types:


  • Allowed colors and prohibited colors
  • Required fonts
  • Logo requirements (text description)

All criteria support Active/Inactive toggling, create/edit/delete operations, and search by name, description, or type.


📋 Analysis History

  • Search — Find past analyses by image name or prompt text
  • Status Tracking — View analysis status (Pending, Completed, Failed)
  • Detailed View — Click any analysis to view full results
  • Bookmarking — Bookmark important analyses for quick access

📊 Analytics Dashboard

  • Overall Statistics — Average score, highest score, lowest score, total analyses count
  • Quality Trends — Daily score breakdown over the last 30 days with bar chart visualization
  • Prompt Performance — Top 20 prompts ranked by average quality score, showing min/max scores, usage count, and comparison vs. overall average
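
The Prompt Performance ranking can be approximated with a simple aggregation. The sketch below groups scores by prompt, then reports average, min, max, usage count, and the difference from the overall average. The input shape and key names are assumptions, and the dashboard's exact calculation is not documented here.

```python
from collections import defaultdict
from statistics import mean


def prompt_performance(analyses, top_n=20):
    """Rank prompts by average score.

    analyses: non-empty iterable of (prompt, score) pairs.
    """
    by_prompt = defaultdict(list)
    for prompt, score in analyses:
        by_prompt[prompt].append(score)

    # Overall average across every individual analysis.
    overall = mean(score for scores in by_prompt.values() for score in scores)

    rows = [
        {
            "prompt": prompt,
            "average": mean(scores),
            "min": min(scores),
            "max": max(scores),
            "count": len(scores),
            "vs_overall": mean(scores) - overall,
        }
        for prompt, scores in by_prompt.items()
    ]
    rows.sort(key=lambda row: row["average"], reverse=True)
    return rows[:top_n]


rows = prompt_performance([("a red car", 80), ("a red car", 90), ("a blue bird", 60)])
```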

Shared Features Across Agents


🗂️ Project Management

  • Create Agents — Name, description, and agent type selection
  • Agent Listing — View all agents with type-specific icons and filtering
  • Agent Dashboard — Overview of workflows, suites, and recent activity

📋 Test Profiles

Available for: Chat · Voice · Phone Caller (Inbound/Outbound)
  • Custom key-value data with typed fields (string, number, boolean, email, URL, textarea, JSON)
  • Default profile marking
  • Import/Export as JSON

🎭 Personas

Available for: Chat · Voice · Phone Caller (Inbound/Outbound)
  • Pre-built persona library
  • Custom persona creation
  • Persona assignment to scenarios

🗓️ Scheduling

Available for: Chat · Voice · Phone Caller (Inbound/Outbound)
  • Cron-based scheduling (Hourly, Daily, Weekdays, Weekly, Monthly, Custom)
  • Timezone support (full IANA timezone list)
  • Pause/Resume/Delete scheduled runs
  • View next scheduled run time and run history

🚦 Go-Live Assessment

Available for: Chat · Voice · Phone Caller (Inbound/Outbound)
Verdict | Score Range | Meaning
🟢 GREEN | ≥ 80 | Ready for Production
🟡 YELLOW | 65–79 | Ready with Caveats
🔴 RED | < 65 | Not Ready
  • Overall score with confidence level
  • Scenario coverage analysis
  • AI-powered risk assessment and recommendations
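
The score ranges in the verdict table map to a simple lookup. One assumption is made in this sketch: a fractional score between 79 and 80 is treated as YELLOW, since the table does not say how such values are rounded.

```python
def verdict(overall_score):
    """Map a 0-100 overall score to a production readiness verdict."""
    if overall_score >= 80:
        return "GREEN"     # Ready for Production
    if overall_score >= 65:
        return "YELLOW"    # Ready with Caveats
    return "RED"           # Not Ready


summary = {score: verdict(score) for score in (92, 80, 79.5, 65, 64.9)}
```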

🌐 Environment Management

Available for: All agent types
  • Create and manage test environments
  • Variable management (name, value, type, persistence)
  • Per-environment variable scoping
  • Bulk variable creation

✅ Validation Criteria

Available for: Chat · Voice · Phone Caller (Inbound/Outbound)
  • Custom pass/fail criteria per scenario
  • Evidence-based validation with confidence levels (High/Medium/Low)
  • Compliance percentage tracking
  • Per-criterion results: Pass, Fail, or Unable to Verify
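
Compliance percentage can be pictured as the share of criteria that passed. The sketch below assumes that "Unable to Verify" results are excluded from the denominator; the platform may count them differently, so treat the formula as illustrative.

```python
from collections import Counter


def compliance_summary(results):
    """Summarize per-criterion results given as 'Pass', 'Fail', or 'Unable to Verify'."""
    counts = Counter(results)
    verifiable = counts["Pass"] + counts["Fail"]
    # Assumption: only verifiable criteria count toward the percentage.
    percent = 100.0 * counts["Pass"] / verifiable if verifiable else None
    return {
        "pass": counts["Pass"],
        "fail": counts["Fail"],
        "unable_to_verify": counts["Unable to Verify"],
        "compliance_percent": percent,
    }


summary = compliance_summary(["Pass", "Pass", "Pass", "Fail", "Unable to Verify"])
```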

Feature Availability Matrix


Feature | Chat | Voice | Phone Caller Inbound | Phone Caller Outbound | Image Analyzer
Workflow & Document Upload
AI Scenario Generation | | | ✅ (up to 20) | ✅ (up to 7) |
Manual Scenario Creation
Test Suites
Endpoint Profiles
Test Profiles
Personas
Playground
Audio-Based Conversations
Phone Numbers
Voice Selection (per scenario)
Background Noise Simulation
Live Call Execution (Pre-eval)
Production Recording Upload (Post-eval)
Batch Recording Analysis
Call Recording Playback
DTMF Support
Passive Mode
Outbound Pool
Image Upload & Analysis
Custom Evaluation Criteria
Brand Guideline Checks
Image Analytics Dashboard
Metric Thresholds
Go-Live Assessment
Validation Criteria
Scheduled Runs
HyperExecute Integration
Import/Export Profiles

Metrics Availability Matrix


Metric Category | Chat | Voice | Phone Caller Inbound | Phone Caller Outbound | Image Analyzer
Bias Detection
Hallucination Detection
Completeness
Context Awareness
Response Quality
Conversation Flow
User Satisfaction
File Handling Quality
File Generation Accuracy
Latency & Interaction Dynamics
Accuracy & Effectiveness (FCR, etc.)
CSAT & User Experience
Business Operational Metrics
Audio Voice Quality
STT Evaluation
Issue Detection Tags
Validation Criteria
Image Quality Score
Prompt Compliance
Brand/Technical Spec Compliance
