Batch testing lets you run automated test suites against your agents. It is essential for regression testing and pre-deployment validation.

What is Batch Testing?

Batch testing runs multiple test cases against an agent at once:
  • Define expected inputs and outcomes
  • Execute all tests automatically
  • Review pass/fail results
  • Track changes over time

Why Batch Testing?

Benefit                     How
Catch regressions           Run tests after every change
Pre-deployment validation   Verify before going live
Consistent quality          Same tests run the same way
Confidence                  Know your agent works before customers do

Creating Test Cases

1. Navigate to Batch Test

Go to Test → Batch Test in your dashboard.

2. Click Create Test Case

Click Create Test Case.

3. Enter Test Input

Write the customer message to test:
How do I reset my password?

4. Define Expected Outcome

Specify what should happen:
  • Response should contain certain content
  • Specific tool should be called
  • Should/shouldn’t escalate
  • Should assign certain category

5. Save

Save the test case to your suite.

Test Case Components

Input

The message sent to the agent:
Input: "I can't log in to my account"

Expected Outcomes

What should happen:
Expectation Type           Example
Response contains          "password reset"
Response doesn’t contain   competitor names
Tool called                Password Reset Tool
Escalation                 Should not escalate
Category                   Account Issues
Attribute                  Priority = Medium

Multiple Expectations

Combine expectations for thorough testing:
Test: Password Reset Request

Input: "I forgot my password and can't log in"

Expectations:
✓ Response contains "password reset" OR "reset email"
✓ Tool called: Password Reset Tool
✓ Should NOT escalate
✓ Category: Account Issues
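
To make the combined expectations concrete, here is a minimal Python sketch of how the four checks in the Password Reset Request test could be evaluated. It only illustrates the logic and is not the product's implementation: the `AgentResult` record, its field names, and the `check_password_reset` function are hypothetical and do not come from the dashboard.

```python
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    """Hypothetical record of what the agent did for one test input."""
    response: str
    tools_called: list = field(default_factory=list)
    escalated: bool = False
    category: str = ""


def check_password_reset(result: AgentResult) -> dict:
    """Evaluate the four expectations of the Password Reset Request test."""
    text = result.response.lower()
    return {
        # Either phrase satisfies the "contains" expectation (OR).
        "response contains": "password reset" in text or "reset email" in text,
        "tool called": "Password Reset Tool" in result.tools_called,
        "should not escalate": not result.escalated,
        "category": result.category == "Account Issues",
    }


result = AgentResult(
    response="I've sent a reset email to your address.",
    tools_called=["Password Reset Tool"],
    escalated=False,
    category="Account Issues",
)
checks = check_password_reset(result)
passed = all(checks.values())  # the test passes only if every expectation is met
```

Keeping each expectation as a separate named check means a failing test can report exactly which expectation was not met.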

Running Tests

1. Select Agent

Choose which agent to test against.

2. Select Test Cases

Choose which tests to run:
  • All tests
  • Specific subset
  • Tests by tag/category

3. Run Tests

Click Run Tests to start execution.

4. Wait for Results

Tests run sequentially. Progress is shown.
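
The three selection modes (all tests, a specific subset, tests by tag) can be pictured as simple filters over a list of test cases. The sketch below assumes a hypothetical `Case` record and `select` helper, and the tag names are invented for illustration; in the dashboard, selection is done through the UI.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Case:
    """Hypothetical test case record; the tags are made up for this example."""
    name: str
    tags: frozenset = frozenset()


def select(cases: Iterable[Case],
           names: Optional[set] = None,
           tag: Optional[str] = None) -> list:
    """Return all cases, a named subset, or the cases carrying a tag."""
    selected = list(cases)
    if names is not None:
        selected = [c for c in selected if c.name in names]
    if tag is not None:
        selected = [c for c in selected if tag in c.tags]
    return selected


cases = [
    Case("Basic reset request", frozenset({"password-reset"})),
    Case("Account locked", frozenset({"password-reset"})),
    Case("Eligible refund", frozenset({"refunds"})),
]

all_cases = select(cases)                         # all tests
subset = select(cases, names={"Account locked"})  # specific subset
by_tag = select(cases, tag="refunds")             # tests by tag/category
```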

Reviewing Results

Results Overview

See aggregate results:
  • Tests passed: 18/20
  • Tests failed: 2/20
  • Pass rate: 90%
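
As a worked example of the arithmetic behind these figures, the following sketch derives the three summary values from a list of per-test outcomes. The `summarize` function and the `"pass"`/`"fail"` labels are assumptions made for illustration.

```python
def summarize(outcomes: list) -> dict:
    """Aggregate per-test outcomes into the overview figures."""
    total = len(outcomes)
    passed = sum(1 for outcome in outcomes if outcome == "pass")
    failed = total - passed
    # Guard against an empty run so the division is always defined.
    rate = round(100 * passed / total) if total else 0
    return {
        "passed": f"{passed}/{total}",
        "failed": f"{failed}/{total}",
        "pass_rate": f"{rate}%",
    }


# 18 passing and 2 failing tests, matching the overview above.
summary = summarize(["pass"] * 18 + ["fail"] * 2)
```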

Individual Test Results

For each test:
Status     Meaning
Pass       All expectations met
Fail       One or more expectations not met
⚠️ Error   Test couldn’t complete
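
The status rules in this table can be expressed as a small function. This is a sketch under assumptions: expectation results are modeled as a hypothetical name-to-boolean mapping, and `None` stands in for a test that could not complete.

```python
from typing import Optional


def classify(checks: Optional[dict]) -> str:
    """Map expectation results to a status; None means the test couldn't complete."""
    if checks is None:
        return "Error"
    return "Pass" if all(checks.values()) else "Fail"


def unmet(checks: dict) -> list:
    """List the expectations that were not met, for the failure details."""
    return [name for name, ok in checks.items() if not ok]


checks = {
    'Response contains "reset email"': False,
    "Tool called: Password Reset Tool": True,
}
status = classify(checks)  # one unmet expectation is enough to fail the test
missing = unmet(checks)
```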

Failure Details

Click a failed test to see what went wrong:
  • Expected: Response contains “reset email”
  • Actual: Response mentioned “contact support”
  • Full response: [View complete agent response]
  • Execution: [View step-by-step execution]

Test Suites

Organize tests into suites:

By Feature

Suite: Password Reset
- Test: Basic reset request
- Test: Can't access email
- Test: Account locked

Suite: Refund Requests
- Test: Eligible refund
- Test: Outside refund window
- Test: Partial refund

By Type

Suite: Happy Paths
Suite: Edge Cases
Suite: Guardrails
Suite: Regression

Creating Tests from Playground

The easiest way to create tests:
  1. Test a scenario in the playground
  2. Verify the response is correct
  3. Click Save as Test
  4. The input and expectations are pre-filled

Pre-Deployment Workflow

Before deploying any changes:
1. Make Changes

Update agent configuration, knowledge, or guidelines.

2. Run Batch Tests

Execute your full test suite.

3. Review Failures

Investigate any failed tests.

4. Fix Issues

Address problems found in testing.

5. Re-run Tests

Verify fixes and check for regressions.

6. Deploy

Once all tests pass, deploy with confidence.
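
The workflow above amounts to a gate: deploy only when a full run has no failures. A minimal sketch of that decision follows, assuming results are available as a mapping from test name to status; both helper functions are hypothetical and not part of the product.

```python
def failing_tests(statuses: dict) -> list:
    """Names of tests that did not pass (Fail or Error), sorted for review."""
    return sorted(name for name, status in statuses.items() if status != "Pass")


def ready_to_deploy(statuses: dict) -> bool:
    """Allow deployment only if at least one test ran and none failed."""
    return bool(statuses) and not failing_tests(statuses)


# First run after a change: one failure to investigate and fix.
first_run = {"Basic reset request": "Pass", "Outside refund window": "Fail"}
to_review = failing_tests(first_run)

# Re-run after the fix: everything passes, so the gate opens.
rerun = {"Basic reset request": "Pass", "Outside refund window": "Pass"}
can_deploy = ready_to_deploy(rerun)
```

Treating an empty run as not ready avoids deploying when no tests were executed at all.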

Best Practices

Build Tests as You Develop

Don’t wait until the end:
  • Create tests as you build features
  • Save good playground conversations as tests
  • Add tests when bugs are found and fixed

Cover Critical Scenarios

Prioritize tests for:
  • Most common customer questions
  • Highest-impact scenarios
  • Previous issues/bugs
  • Guardrail triggers

Run Regularly

  • Before every deployment
  • After configuration changes
  • On a schedule (daily/weekly)

Keep Tests Updated

As your product changes:
  • Update expected outcomes
  • Add new test cases
  • Remove obsolete tests
