Testing Overview

Testing lets you verify that your agent behaves correctly before it interacts with real customers. Duckie provides two testing approaches: interactive playground testing and automated batch testing.

Why Testing Matters

Before deploying to production, test for each of these risks:

Risk	Prevention
Inaccurate responses	Test with real questions
Guardrails not working	Test escalation triggers
Poor tone or format	Review against guidelines
Missing knowledge	Identify gaps before customers do
Broken tools	Verify actions execute correctly

Testing Methods

Playground Testing

Interactive, real-time testing:

Chat directly with your agent
See responses immediately
View full execution details
Iterate quickly on configuration

Best for:

Development and debugging
Exploring agent behavior
Quick validation

Batch Testing

Automated test suites:

Define test cases with expected outcomes
Run all tests automatically
Compare results over time
Catch regressions

Best for:

Pre-deployment validation
Regression testing
Consistent quality checks
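
The batch approach described above can be sketched in a few lines of Python. This is an illustration only, not Duckie's API: BatchCase, BatchResult, run_batch, and the fake_agent stand-in are hypothetical names, and the pass criterion (an expected phrase appearing in the reply) is an assumed, deliberately simple check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BatchCase:
    """One batch test: a customer message and a phrase the reply should contain."""
    name: str
    message: str
    expected_phrase: str


@dataclass
class BatchResult:
    name: str
    passed: bool
    reply: str


def run_batch(agent: Callable[[str], str], cases: List[BatchCase]) -> List[BatchResult]:
    """Send every test message to the agent and record pass/fail."""
    results = []
    for case in cases:
        reply = agent(case.message)
        # Assumed criterion: case-insensitive substring match.
        passed = case.expected_phrase.lower() in reply.lower()
        results.append(BatchResult(case.name, passed, reply))
    return results


def fake_agent(message: str) -> str:
    """Hypothetical stand-in for a real agent call."""
    if "refund" in message.lower():
        return "Refunds are processed within 5 business days."
    return "I'm not sure, let me connect you with a teammate."


cases = [
    BatchCase("refund timing", "How long do refunds take?", "5 business days"),
    BatchCase("shipping cost", "How much is shipping?", "shipping"),
]
results = run_batch(fake_agent, cases)
for r in results:
    print(f"{'PASS' if r.passed else 'FAIL'}  {r.name}")
```

In this sketch the second case fails because the stand-in agent has no shipping answer, which is the kind of knowledge gap a batch run is meant to surface.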

Testing Workflow

┌─────────────────────┐
│ Configure Agent     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Test in Playground  │ ←─┐
└──────────┬──────────┘   │
           │              │ Iterate
           ▼              │
┌─────────────────────┐   │
│ Issues Found?       │───┘
└──────────┬──────────┘
           │ No
           ▼
┌─────────────────────┐
│ Save as Batch Tests │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Deploy (Shadow)     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Run Batch Before    │
│ Going Live          │
└─────────────────────┘
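
The final step of the workflow, running the batch before going live, usually amounts to comparing the new run against an earlier one. A minimal sketch, assuming each run is stored as a mapping from test name to pass/fail (a hypothetical format, not Duckie's export format):

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return names of tests that passed in the baseline run but do not pass now.

    A test missing from the current run is treated as a regression.
    """
    return sorted(
        name
        for name, passed_before in baseline.items()
        if passed_before and not current.get(name, False)
    )


baseline_run = {"refund timing": True, "escalate legal threat": True, "shipping cost": False}
current_run = {"refund timing": True, "escalate legal threat": False, "shipping cost": True}

regressions = find_regressions(baseline_run, current_run)
print(regressions)
```

Tests that were already failing in the baseline are ignored here, so the result lists only behavior that got worse between runs.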

What to Test

Response Quality

Are answers accurate?
Is the tone appropriate?
Is the format correct?
Are guidelines being followed?

Knowledge Access

Does the agent find relevant information?
Are the right sources being searched?
Is knowledge used correctly in responses?

Guardrails

Do escalation rules trigger correctly?
Are restrictions enforced?
Does the agent respond appropriately when a guardrail is triggered?
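
One way to frame guardrail checks is as pairs of a message and the expected escalation outcome, including messages that should not escalate. The sketch below assumes a hypothetical AgentReply shape with an escalated flag and a stand-in agent with a single made-up rule; a real agent's reply structure will differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AgentReply:
    """Hypothetical reply shape: the text plus whether the agent escalated."""
    text: str
    escalated: bool


def fake_agent(message: str) -> AgentReply:
    """Stand-in agent with one assumed rule: escalate on legal threats."""
    lowered = message.lower()
    if "lawyer" in lowered or "lawsuit" in lowered:
        return AgentReply("I'm handing this to a human teammate.", True)
    return AgentReply("Happy to help with that.", False)


def check_guardrails(
    agent: Callable[[str], AgentReply], cases: List[Tuple[str, bool]]
) -> List[str]:
    """Return the messages whose actual escalation differs from the expectation."""
    return [
        message
        for message, should_escalate in cases
        if agent(message).escalated != should_escalate
    ]


guardrail_cases = [
    ("My lawyer will be in touch.", True),
    ("How do I reset my password?", False),
    ("I am filing a lawsuit over this charge.", True),
]
mismatches = check_guardrails(fake_agent, guardrail_cases)
print(mismatches)
```

An empty list means every case matched its expectation. The negative case matters as much as the positive ones, since a guardrail that fires on ordinary questions is also a failure.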

Tools and Actions

Do tool calls succeed?
Are parameters passed correctly?
Do actions have the expected results?
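
The three questions above map onto three assertions: the call completed, the parameters were what you expected, and the result was what you expected. A minimal sketch using a recording wrapper; RecordingTool and lookup_order are hypothetical names for illustration, and the tool is called directly where a real test would let the agent decide to call it.

```python
from typing import Any, Callable, Dict, List


class RecordingTool:
    """Wraps a tool function and records the parameters of every call."""

    def __init__(self, name: str, func: Callable[..., Any]) -> None:
        self.name = name
        self.func = func
        self.calls: List[Dict[str, Any]] = []

    def __call__(self, **params: Any) -> Any:
        self.calls.append(params)
        return self.func(**params)


def lookup_order(order_id: str) -> Dict[str, str]:
    """Hypothetical tool: returns a canned order status."""
    return {"order_id": order_id, "status": "shipped"}


tool = RecordingTool("lookup_order", lookup_order)

# Stand-in for the agent invoking the tool during a conversation.
result = tool(order_id="A-1001")

print("calls recorded:", tool.calls)
print("result:", result)
```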

Edge Cases

Unusual or ambiguous inputs
Very long or very short messages
Multiple questions in one message
Off-topic requests
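
Edge-case inputs like these can be generated from a single ordinary question so that every category is exercised consistently. The helper below is a sketch with made-up example messages; the labels and the repetition count are arbitrary choices, not Duckie conventions.

```python
from typing import Dict


def build_edge_case_messages(base_question: str) -> Dict[str, str]:
    """Derive one message per edge-case category from an ordinary question."""
    return {
        "ambiguous": "It doesn't work.",
        "very_short": "?",
        "very_long": (base_question + " ") * 200,
        "multiple_questions": (
            base_question + " Also, can I change my email? And what are your hours?"
        ),
        "off_topic": "What's a good recipe for banana bread?",
    }


edge_cases = build_edge_case_messages("How do I cancel my subscription?")
for label, message in edge_cases.items():
    print(f"{label}: {len(message)} characters")
```

Each generated message can then be sent through the playground or added to a batch suite.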

Test Coverage Checklist

Happy Paths

Test common, expected scenarios that should work smoothly.

Edge Cases

Test unusual inputs, missing information, and boundary conditions.

Guardrails

Test messages that should trigger escalation or restrictions.

Knowledge Gaps

Test questions the agent might not know the answer to.

Tool Execution

Test scenarios that require tool calls.
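
If each test case is labeled with one of the five checklist categories, a short script can report which categories have no tests yet. This is a sketch under that assumption; the category labels are hypothetical identifiers derived from the checklist headings.

```python
from collections import Counter
from typing import Iterable, List

CATEGORIES = ["happy_path", "edge_case", "guardrail", "knowledge_gap", "tool_execution"]


def coverage_gaps(test_categories: Iterable[str], minimum: int = 1) -> List[str]:
    """Return checklist categories that have fewer than `minimum` tests."""
    counts = Counter(test_categories)
    return [category for category in CATEGORIES if counts[category] < minimum]


# Category label of each test in a hypothetical suite.
suite = ["happy_path", "happy_path", "guardrail", "edge_case"]
gaps = coverage_gaps(suite)
print(gaps)
```

Raising minimum turns the same check into a rough threshold, for example requiring at least three guardrail tests before a release.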

Next Steps

Playground

Interactive testing

Batch Testing

Automated test suites
