
# Testing Overview

> Validate your agent before going live

Testing lets you confirm that your agent behaves correctly before it interacts with real customers. Duckie provides three testing approaches: replay testing, playground testing, and batch testing.

{/* Screenshot: Test section navigation showing Playground and Batch Test options */}

## Why Testing Matters

Testing before you deploy to production helps you catch these risks:

| Risk                   | Prevention                        |
| ---------------------- | --------------------------------- |
| Inaccurate responses   | Test with real questions          |
| Guardrails not working | Test escalation triggers          |
| Poor tone or format    | Review against guidelines         |
| Missing knowledge      | Identify gaps before customers do |
| Broken tools           | Verify actions execute correctly  |

## Testing Methods

### Replay Testing

Replay of real customer conversations:

* Pull conversations from connected support sources
* Compare historical responses (the expected answers) with Duckie's new responses
* Test a specific ticket by its ID (for supported sources)
* Review generated runs and execution details

**Best for:**

* Validating behavior against real customer conversations
* Debugging a specific historical ticket
* Finding scenarios to preserve in batch tests
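
Conceptually, a replay pairs each historical ticket's recorded answer with the answer the agent produces now. The Python sketch below shows that pairing using a plain text-similarity ratio from the standard library's `difflib` as a stand-in for Duckie's own comparison. The `ReplayCase` type, the `flag_divergent` helper, and the 0.6 threshold are hypothetical and not part of Duckie.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ReplayCase:
    """One historical ticket (hypothetical structure for illustration)."""
    ticket_id: str
    customer_message: str
    expected_response: str  # what the human agent actually sent
    new_response: str       # what the agent produces when the ticket is replayed


def similarity(a: str, b: str) -> float:
    """Rough character-level similarity in [0, 1]; a stand-in, not a quality score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_divergent(cases: list[ReplayCase], threshold: float = 0.6) -> list[str]:
    """Return IDs of tickets whose new response differs strongly from the historical one."""
    return [c.ticket_id for c in cases
            if similarity(c.expected_response, c.new_response) < threshold]


cases = [
    ReplayCase("T-1", "How do I reset my password?",
               "Go to Settings > Security and click Reset password.",
               "Open Settings > Security, then click Reset password."),
    ReplayCase("T-2", "Can I get a refund?",
               "Refunds are available within 30 days of purchase.",
               "I'm not sure, please contact sales."),
]
print(flag_divergent(cases))  # ['T-2']
```

Text similarity only flags tickets worth reviewing. A reworded but correct answer can score low, so a flagged ticket is a candidate for closer inspection rather than an automatic failure.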

### Playground Testing

Interactive, real-time testing:

{/* Screenshot: Playground showing conversation and execution steps */}

* Chat directly with your agent
* See responses immediately
* View full execution details
* Iterate quickly on configuration

**Best for:**

* Development and debugging
* Exploring agent behavior
* Quick validation

### Batch Testing

Automated test suites:

{/* Screenshot: Batch test results showing scored tickets and runs */}

* Build batches from imported conversations or manually created tickets
* Run all tests automatically
* Score results with rubrics
* Compare runs over time
* Catch regressions

**Best for:**

* Pre-deployment validation
* Regression testing
* Consistent quality checks
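
Comparing runs over time comes down to lining up per-ticket scores from two runs and flagging tickets that got worse. Here is a minimal sketch. It assumes each run's rubric scores have been exported as a mapping from ticket ID to a score between 0 and 1. That format and the `find_regressions` helper are hypothetical, and Duckie's batch results may be structured differently.

```python
def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.05) -> dict[str, tuple[float, float]]:
    """Return tickets whose score dropped by more than `tolerance` between runs.

    Tickets missing from the candidate run are treated as a score of 0.0.
    """
    regressions = {}
    for ticket_id, old_score in baseline.items():
        new_score = candidate.get(ticket_id, 0.0)
        if old_score - new_score > tolerance:
            regressions[ticket_id] = (old_score, new_score)
    return regressions


baseline = {"T-1": 0.9, "T-2": 0.8, "T-3": 0.6}   # earlier run
candidate = {"T-1": 0.92, "T-2": 0.5, "T-3": 0.75}  # run after a configuration change
print(find_regressions(baseline, candidate))  # {'T-2': (0.8, 0.5)}
```

Comparing ticket by ticket matters because an aggregate can hide a regression. In the example, T-1 and T-3 improve while T-2 drops sharply.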

## Testing Workflow

```
┌─────────────────────┐
│ Configure Agent     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Test in Playground  │ ←─┐
└──────────┬──────────┘   │
           │              │ Iterate
           ▼              │
┌─────────────────────┐   │
│ Issues Found?       │───┘
└──────────┬──────────┘
           │ No
           ▼
┌─────────────────────┐
│ Replay Real         │
│ Conversations       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Create or Update    │
│ Batch Tests         │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Deploy (Testing)    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Run Batch Before    │
│ Going Live          │
└─────────────────────┘
```
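
The last step of the workflow, running a batch before going live, works as a release gate. The sketch below shows one possible gate. It assumes a hypothetical `TicketResult` record per batch ticket and an arbitrary 90% pass-rate threshold; neither is a Duckie setting. Guardrail cases are treated as must-pass on the assumption that a missed escalation is worse than an imperfect answer.

```python
from dataclasses import dataclass


@dataclass
class TicketResult:
    """Outcome of one ticket in a batch run (hypothetical structure)."""
    ticket_id: str
    category: str  # e.g. "happy_path", "guardrail", "edge_case"
    passed: bool


def ready_to_go_live(results: list[TicketResult],
                     min_pass_rate: float = 0.9) -> tuple[bool, str]:
    """Decide whether a batch run clears an example go-live bar."""
    if not results:
        return False, "batch is empty"
    # Any failed guardrail case blocks release regardless of the overall rate.
    failed_guardrails = [r.ticket_id for r in results
                         if r.category == "guardrail" and not r.passed]
    if failed_guardrails:
        return False, f"guardrail failures: {failed_guardrails}"
    rate = sum(r.passed for r in results) / len(results)
    if rate < min_pass_rate:
        return False, f"pass rate {rate:.0%} below {min_pass_rate:.0%}"
    return True, f"pass rate {rate:.0%}"


results = [
    TicketResult("T-1", "happy_path", True),
    TicketResult("T-2", "guardrail", True),
    TicketResult("T-3", "edge_case", False),
]
print(ready_to_go_live(results))  # (False, 'pass rate 67% below 90%')
```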

## What to Test

### Response Quality

* Are answers accurate?
* Is the tone appropriate?
* Is the format correct?
* Are guidelines being followed?

### Knowledge Access

* Does the agent find relevant information?
* Are the right sources being searched?
* Is knowledge used correctly in responses?

### Guardrails

* Do escalation rules trigger correctly?
* Are restrictions enforced?
* Does the agent respond appropriately when a guardrail is triggered?

### Tools and Actions

* Do tool calls succeed?
* Are parameters passed correctly?
* Do actions have the expected results?
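
Checking that parameters are passed correctly amounts to comparing the tool call the agent made against the call you expected. This sketch assumes the execution details can be read as a tool name plus a dictionary of arguments. The `ToolCall` type, the `check_tool_call` helper, and the `lookup_order` tool are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """A tool invocation as it might appear in execution details (hypothetical shape)."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def check_tool_call(actual: ToolCall, expected: ToolCall) -> list[str]:
    """List mismatches between an actual and an expected tool call; empty means OK."""
    problems = []
    if actual.name != expected.name:
        problems.append(f"called {actual.name!r}, expected {expected.name!r}")
    # Only arguments named in `expected` are checked; extra arguments are ignored.
    for key, want in expected.arguments.items():
        got = actual.arguments.get(key)
        if got != want:
            problems.append(f"{key}: got {got!r}, expected {want!r}")
    return problems


expected = ToolCall("lookup_order", {"order_id": "A-1001", "include_items": True})
actual = ToolCall("lookup_order", {"order_id": "1001", "include_items": True})
print(check_tool_call(actual, expected))  # ["order_id: got '1001', expected 'A-1001'"]
```

Ignoring extra arguments keeps the check focused on the parameters that matter for the scenario. A stricter variant could also report unexpected keys.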

### Edge Cases

* Unusual or ambiguous inputs
* Very long or very short messages
* Multiple questions in one message
* Off-topic requests

## Test Coverage Checklist

<Steps>
  <Step title="Happy Paths">
    Test common, expected scenarios that should work smoothly.
  </Step>

  <Step title="Edge Cases">
    Test unusual inputs, missing information, and boundary conditions.
  </Step>

  <Step title="Guardrails">
    Test messages that should trigger escalation or restrictions.
  </Step>

  <Step title="Knowledge Gaps">
    Test questions the agent may not have the knowledge to answer, and check how it handles them.
  </Step>

  <Step title="Tool Execution">
    Test scenarios that require tool calls.
  </Step>
</Steps>
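
To keep the checklist honest as a batch grows, you can tag each test case with one of the five categories and check that none is empty. Here is a minimal sketch. It assumes a hypothetical list of `(ticket_id, category)` pairs and a hypothetical `coverage_gaps` helper. Duckie batches may not expose categories in this form.

```python
from collections import Counter

# The five coverage categories from the checklist above.
CATEGORIES = ("happy_path", "edge_case", "guardrail", "knowledge_gap", "tool_execution")


def coverage_gaps(cases: list[tuple[str, str]]) -> list[str]:
    """Return checklist categories that have no test case yet."""
    counts = Counter(category for _, category in cases)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [c for c in CATEGORIES if counts[c] == 0]


batch = [
    ("T-1", "happy_path"),
    ("T-2", "happy_path"),
    ("T-3", "guardrail"),
    ("T-4", "tool_execution"),
]
print(coverage_gaps(batch))  # ['edge_case', 'knowledge_gap']
```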

## Next Steps

<CardGroup cols={2}>
  <Card title="Replay Testing" icon="history" href="/testing/replay-testing">
    Test against real historical conversations
  </Card>

  <Card title="Playground" icon="flask" href="/testing/playground">
    Interactive testing
  </Card>

  <Card title="Batch Testing" icon="list-check" href="/testing/batch-testing">
    Automated test suites
  </Card>
</CardGroup>
