Batch Testing

Batch testing lets you run automated test suites against your agents, which is essential for regression testing and pre-deployment validation.

What is Batch Testing?

Batch testing runs multiple test cases against an agent at once:

Define expected inputs and outcomes
Execute all tests automatically
Review pass/fail results
Track changes over time

Why Batch Testing?

Benefit	How
Catch regressions	Run tests after every change
Pre-deployment validation	Verify before going live
Consistent quality	Same tests run the same way
Confidence	Know your agent works before customers do

Creating Test Cases

Navigate to Batch Test

Go to Test → Batch Test in your dashboard.

Click Create Test Case

Click Create Test Case to start a new test case.

Enter Test Input

Write the customer message to test:

How do I reset my password?

Define Expected Outcome

Specify what should happen:

The response should contain certain content
A specific tool should be called
The conversation should or shouldn’t escalate
A certain category should be assigned

Save

Save the test case to your suite.

Test Case Components

Input

The message sent to the agent:

Input: "I can't log in to my account"

Expected Outcomes

What should happen:

Expectation Type	Example
Response contains	"password reset"
Response doesn’t contain	competitor names
Tool called	Password Reset Tool
Escalation	Should not escalate
Category	Account Issues
Attribute	Priority = Medium

Multiple Expectations

Combine expectations for thorough testing:

Test: Password Reset Request

Input: "I forgot my password and can't log in"

Expectations:
✓ Response contains "password reset" OR "reset email"
✓ Tool called: Password Reset Tool
✓ Should NOT escalate
✓ Category: Account Issues
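The combined test above can be modeled in code to make the pass rule concrete: a test passes only when every expectation is met. This is a minimal sketch, not the product's own format. The names BatchTestCase, AgentResult, and unmet_expectations are hypothetical, and case-insensitive matching for "Response contains" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentResult:
    """Hypothetical record of what the agent did for one input."""
    response: str
    tools_called: List[str] = field(default_factory=list)
    escalated: bool = False
    category: str = ""


@dataclass
class BatchTestCase:
    """Hypothetical test case: one input plus several expectations."""
    name: str
    input: str
    contains_any: List[str] = field(default_factory=list)  # matched with OR
    tool_called: Optional[str] = None
    should_escalate: Optional[bool] = None
    category: Optional[str] = None


def unmet_expectations(case: BatchTestCase, result: AgentResult) -> List[str]:
    """Return the expectations that were not met; an empty list means pass."""
    unmet = []
    text = result.response.lower()  # assumption: matching ignores case
    if case.contains_any and not any(p.lower() in text for p in case.contains_any):
        unmet.append(f"Response contains one of {case.contains_any}")
    if case.tool_called is not None and case.tool_called not in result.tools_called:
        unmet.append(f"Tool called: {case.tool_called}")
    if case.should_escalate is not None and result.escalated != case.should_escalate:
        unmet.append(f"Escalation: expected {case.should_escalate}")
    if case.category is not None and result.category != case.category:
        unmet.append(f"Category: {case.category}")
    return unmet


case = BatchTestCase(
    name="Password Reset Request",
    input="I forgot my password and can't log in",
    contains_any=["password reset", "reset email"],
    tool_called="Password Reset Tool",
    should_escalate=False,
    category="Account Issues",
)
result = AgentResult(
    response="I've sent a reset email to your inbox.",
    tools_called=["Password Reset Tool"],
    escalated=False,
    category="Account Issues",
)
print(unmet_expectations(case, result))  # empty list: every expectation was met
```

Returning the list of unmet expectations, rather than a single true/false value, keeps enough detail to explain a failure later.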

Running Tests

Select Agent

Choose which agent to test against.

Select Test Cases

Choose which tests to run:

All tests
Specific subset
Tests by tag/category
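The three selection options can be illustrated as a simple filter. This is a sketch under assumptions: TaggedTest and select_tests are hypothetical names, and the tag values are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedTest:
    """Hypothetical test record; only the fields needed for selection."""
    name: str
    tags: set = field(default_factory=set)


def select_tests(tests, names=None, tag=None):
    """Return all tests, or the subset matching the given names and/or tag."""
    selected = list(tests)
    if names is not None:
        selected = [t for t in selected if t.name in names]
    if tag is not None:
        selected = [t for t in selected if tag in t.tags]
    return selected


tests = [
    TaggedTest("Basic reset request", {"password-reset"}),
    TaggedTest("Account locked", {"password-reset"}),
    TaggedTest("Eligible refund", {"refunds"}),
]

everything = select_tests(tests)                         # all tests
subset = select_tests(tests, names={"Account locked"})   # specific subset
by_tag = select_tests(tests, tag="refunds")              # tests by tag
```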

Run Tests

Click Run Tests to start execution.

Wait for Results

Tests run sequentially, one after another, and progress is shown while they run.
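Sequential execution means each test finishes before the next one starts. The loop below is a rough illustration only: run_sequentially and echo_agent are hypothetical, and echo_agent stands in for a real agent call.

```python
from typing import Callable, List, Tuple


def run_sequentially(inputs: List[str], agent: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Send each input to the agent in order and report progress after each one."""
    results = []
    total = len(inputs)
    for index, message in enumerate(inputs, start=1):
        reply = agent(message)  # the next test starts only after this call returns
        results.append((message, reply))
        print(f"[{index}/{total}] done: {message}")
    return results


def echo_agent(message: str) -> str:
    """Stand-in for a real agent call, used only to make the sketch runnable."""
    return f"You said: {message}"


results = run_sequentially(
    ["How do I reset my password?", "I can't log in to my account"],
    echo_agent,
)
```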

Reviewing Results

Results Overview

See aggregate results:

Tests passed: 18/20
Tests failed: 2/20
Pass rate: 90%
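The pass rate is the number of passed tests divided by the total number of tests run. As a worked example of that arithmetic, here is a small sketch; the summarize function is hypothetical and only reproduces the figures shown above.

```python
def summarize(passed: int, failed: int) -> str:
    """Format an overview like the one above from pass and fail counts."""
    total = passed + failed
    rate = 100 * passed / total if total else 0.0  # avoid dividing by zero
    return (
        f"Tests passed: {passed}/{total}\n"
        f"Tests failed: {failed}/{total}\n"
        f"Pass rate: {rate:.0f}%"
    )


print(summarize(18, 2))
```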

Individual Test Results

For each test:

Status	Meaning
✅ Pass	All expectations met
❌ Fail	One or more expectations not met
⚠️ Error	Test couldn’t complete
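The difference between Fail and Error is easy to blur: a failed test ran to completion but missed an expectation, while an errored test never produced a result to check. A minimal sketch of that rule follows. Status and classify are hypothetical names, and treating any raised exception as "couldn't complete" is an assumption.

```python
from enum import Enum


class Status(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    ERROR = "Error"


def classify(run_test) -> Status:
    """Classify one test.

    run_test is a zero-argument callable that returns the list of unmet
    expectations, or raises if the test could not complete.
    """
    try:
        unmet = run_test()
    except Exception:
        return Status.ERROR  # the test couldn't complete
    return Status.PASS if not unmet else Status.FAIL
```

For example, classify(lambda: []) gives Status.PASS, and classify(lambda: ["Tool called: Password Reset Tool"]) gives Status.FAIL.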

Failure Details

Click a failed test to see what went wrong:

Expected: Response contains “reset email”
Actual: Response mentioned “contact support”
Full response: [View complete agent response]
Execution: [View step-by-step execution]

Test Suites

Organize tests into suites:

By Feature

Suite: Password Reset
- Test: Basic reset request
- Test: Can't access email
- Test: Account locked

Suite: Refund Requests
- Test: Eligible refund
- Test: Outside refund window
- Test: Partial refund

By Type

Suite: Happy Paths
Suite: Edge Cases
Suite: Guardrails
Suite: Regression

Creating Tests from Playground

The easiest way to create tests:

Test a scenario in the Playground
Verify the response is correct
Click Save as Test
The input and expectations are pre-filled

Pre-Deployment Workflow

Before deploying any changes:

Make Changes

Update agent configuration, knowledge, or guidelines.

Run Batch Tests

Execute your full test suite.

Review Failures

Investigate any failed tests.

Fix Issues

Address problems found in testing.

Re-run Tests

Verify fixes and check for regressions.

Deploy

Once all tests pass, deploy with confidence.
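If you script this workflow, the final step amounts to a gate: deploy only when nothing is failing. The sketch below is one way to express that check. deployment_gate is a hypothetical helper, the status strings are invented for the example, and counting errored tests as blocking is an assumption based on the "all tests pass" rule.

```python
def deployment_gate(statuses):
    """Return (ok, blocking), where blocking lists every test that did not pass.

    statuses maps a test name to "pass", "fail", or "error".
    """
    blocking = sorted(name for name, status in statuses.items() if status != "pass")
    return (len(blocking) == 0, blocking)


ok, blocking = deployment_gate({
    "Basic reset request": "pass",
    "Outside refund window": "fail",
    "Account locked": "error",
})
if not ok:
    print("Do not deploy yet. Investigate:", ", ".join(blocking))
```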

Best Practices

Build Tests as You Develop

Don’t wait until the end:

Create tests as you build features
Save good playground conversations as tests
Add tests when bugs are found and fixed

Cover Critical Scenarios

Prioritize tests for:

Most common customer questions
Highest-impact scenarios
Previous issues/bugs
Guardrail triggers

Run Regularly

Before every deployment
After configuration changes
On a schedule (daily/weekly)

Keep Tests Updated

As your product changes:

Update expected outcomes
Add new test cases
Remove obsolete tests

Next Steps

Playground

Create tests interactively

Deploy

Deploy your validated agent
