Testing lets you verify that your agent behaves correctly before it interacts with real customers. Duckie provides three testing approaches: replay testing, playground testing, and batch testing.

Why Testing Matters

Before deploying to production:
Risk                   | Prevention
Inaccurate responses   | Test with real questions
Guardrails not working | Test escalation triggers
Poor tone or format    | Review against guidelines
Missing knowledge      | Identify gaps before customers do
Broken tools           | Verify actions execute correctly

Testing Methods

Replay Testing

Real conversation replay:
  • Pull conversations from connected support sources
  • Compare historical expected responses with new Duckie responses
  • Test a specific ticket by ID for supported sources
  • Review generated runs and execution details
Best for:
  • Validating behavior against real customer conversations
  • Debugging a specific historical ticket
  • Finding scenarios to preserve in batch tests
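The comparison step in replay testing can be illustrated with a small sketch. It assumes you have already collected the historical and new responses as plain strings. The `ReplayCase` structure, the `flag_for_review` helper, and the 0.5 threshold are hypothetical and are not part of Duckie. The similarity measure is a rough lexical one from the Python standard library, so it only helps decide which replays to read first and does not judge correctness.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ReplayCase:
    """One replayed ticket (hypothetical structure, not a Duckie type)."""
    ticket_id: str
    expected: str  # historical response from the original conversation
    actual: str    # new response produced during the replay


def similarity(a: str, b: str) -> float:
    """Rough word-level similarity in [0, 1]; not a measure of correctness."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def flag_for_review(cases: list[ReplayCase], threshold: float = 0.5) -> list[str]:
    """Return ticket IDs whose new response diverges from the historical one."""
    return [c.ticket_id for c in cases if similarity(c.expected, c.actual) < threshold]


cases = [
    ReplayCase("T-1",
               "You can reset your password from the login page.",
               "You can reset your password from the login page."),
    ReplayCase("T-2",
               "Refunds take five business days.",
               "Please contact your bank about this charge."),
]
flagged = flag_for_review(cases)
print(flagged)
```

A low score does not mean the new response is worse, only that it differs enough to deserve a manual look.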

Playground Testing

Interactive, real-time testing:
  • Chat directly with your agent
  • See responses immediately
  • View full execution details
  • Iterate quickly on configuration
Best for:
  • Development and debugging
  • Exploring agent behavior
  • Quick validation

Batch Testing

Automated test suites:
  • Build batches from imported conversations or manual tickets
  • Run all tests automatically
  • Score results with rubrics
  • Compare runs over time
  • Catch regressions
Best for:
  • Pre-deployment validation
  • Regression testing
  • Consistent quality checks
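Comparing runs to catch regressions comes down to a per-test score comparison. The sketch below assumes each run has been reduced to a mapping from test ID to a rubric score between 0 and 1. That format, the test IDs, and the `find_regressions` helper are illustrative assumptions, not Duckie's export format or API.

```python
def find_regressions(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.0,
) -> dict[str, tuple[float, float]]:
    """Return tests whose score fell by more than `tolerance` between two runs.

    Tests missing from the current run are skipped rather than reported.
    """
    regressions = {}
    for test_id, old_score in previous.items():
        new_score = current.get(test_id)
        if new_score is not None and old_score - new_score > tolerance:
            regressions[test_id] = (old_score, new_score)
    return regressions


# Hypothetical rubric scores from two batch runs.
previous_run = {"refund-policy": 0.9, "password-reset": 1.0, "escalate-legal": 0.8}
current_run = {"refund-policy": 0.9, "password-reset": 0.6, "escalate-legal": 0.85}

regressions = find_regressions(previous_run, current_run)
for test_id, (old, new) in regressions.items():
    print(f"{test_id}: {old} -> {new}")
```

A small non-zero `tolerance` can be useful when rubric scores vary slightly from run to run.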

Testing Workflow

┌─────────────────────┐
│ Configure Agent     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Test in Playground  │ ←─┐
└──────────┬──────────┘   │
           │              │ Iterate
           ▼              │
┌─────────────────────┐   │
│ Issues Found?       │───┘
└──────────┬──────────┘
           │ No
           ▼
┌─────────────────────┐
│ Replay Real         │
│ Conversations       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Create or Update    │
│ Batch Tests         │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Deploy (Testing)    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Run Batch Before    │
│ Going Live          │
└─────────────────────┘

What to Test

Response Quality

  • Are answers accurate?
  • Is the tone appropriate?
  • Is the format correct?
  • Are guidelines being followed?

Knowledge Access

  • Does the agent find relevant information?
  • Are the right sources being searched?
  • Is knowledge used correctly in responses?

Guardrails

  • Do escalation rules trigger correctly?
  • Are restrictions enforced?
  • Does the agent respond appropriately when a guardrail is triggered?

Tools and Actions

  • Do tool calls succeed?
  • Are parameters passed correctly?
  • Do actions have the expected results?

Edge Cases

  • Unusual or ambiguous inputs
  • Very long or very short messages
  • Multiple questions in one message
  • Off-topic requests
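One way to make the edge cases above repeatable is to generate a fixed set of inputs from a typical question and reuse them across playground or batch tests. This is a minimal sketch. The `build_edge_case_inputs` helper and the sample messages are made up for illustration and should be replaced with examples from your own support domain.

```python
def build_edge_case_inputs(base_question: str) -> dict[str, str]:
    """Derive one test message per edge-case category from a typical question."""
    return {
        # Too little context to know what the customer means.
        "ambiguous": "It doesn't work.",
        # Extremely short message.
        "very_short": "?",
        # Extremely long message: the same question repeated many times.
        "very_long": " ".join([base_question] * 200),
        # Several unrelated questions in a single message.
        "multiple_questions": (
            f"{base_question} Also, how do I change my email? "
            "And can I get an invoice?"
        ),
        # Request outside the agent's intended scope.
        "off_topic": "What's a good recipe for banana bread?",
    }


inputs = build_edge_case_inputs("How do I reset my password?")
for category, message in inputs.items():
    print(f"{category}: {len(message)} characters")
```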

Test Coverage Checklist

1. Happy Paths: Test common, expected scenarios that should work smoothly.
2. Edge Cases: Test unusual inputs, missing information, and boundary conditions.
3. Guardrails: Test messages that should trigger escalation or restrictions.
4. Knowledge Gaps: Test questions the agent might not be able to answer.
5. Tool Execution: Test scenarios that require tool calls.
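The checklist can also be checked mechanically if each test case carries a category label. The sketch below assumes a test suite kept as a list of dictionaries with a `category` field. The field name, the category identifiers, and the `coverage_gaps` helper are hypothetical conventions, not something Duckie defines.

```python
from collections import Counter

# One identifier per checklist item (names are illustrative).
REQUIRED_CATEGORIES = [
    "happy_path",
    "edge_case",
    "guardrail",
    "knowledge_gap",
    "tool_execution",
]


def coverage_gaps(test_cases: list[dict], minimum: int = 1) -> list[str]:
    """Return checklist categories that have fewer than `minimum` test cases."""
    counts = Counter(case["category"] for case in test_cases)
    return [c for c in REQUIRED_CATEGORIES if counts[c] < minimum]


suite = [
    {"name": "reset password", "category": "happy_path"},
    {"name": "empty message", "category": "edge_case"},
    {"name": "legal threat", "category": "guardrail"},
    {"name": "refund via tool", "category": "tool_execution"},
]
missing = coverage_gaps(suite)
print(missing)
```

Running a check like this before a batch run makes it easier to notice that a whole category, here the knowledge-gap cases, has no tests yet.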

Next Steps

  • Replay Testing: Test against real historical conversations
  • Playground: Interactive testing
  • Batch Testing: Automated test suites