Replay Testing - Duckie

Replay testing lets you pull real conversations from connected support sources, choose a turn, and generate a fresh Duckie response for that point in the conversation. Use replay testing when you want to compare Duckie’s current behavior against what happened in an actual customer conversation.

When To Use Replay Testing

Replay testing is best for:

validating an agent against real customer phrasing
checking multi-turn context handling
testing fixes against a specific historical ticket
reviewing how a changed agent would respond to an old conversation
turning real examples into candidates for batch tests

Replay testing is not a scored regression suite. Use batch testing when you need repeatable runs, rubrics, and aggregate scores.

Supported Sources

Replay testing can fetch conversations from connected sources that provide conversation history:

Source	Fetch recent conversations	Fetch by ticket ID
Slack	Yes	Yes
Zendesk	Yes	Yes
HubSpot	Yes	Yes
Intercom	Yes	Yes
Freshdesk	Yes	No
Plain	Yes	No
Pylon	Yes	No
Discord	Yes	No

Duckie only shows sources that are connected for your organization.
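
If you script around replay testing (for example, to decide which tickets to look up by ID), the table above can be captured as a small lookup. This is only an illustrative sketch: the names `SUPPORTS_FETCH_BY_ID` and `available_fetch_modes` are hypothetical and are not part of any Duckie API. The data comes directly from the table.

```python
# Illustrative only: mirrors the Supported Sources table; not a Duckie API.
RECENT = "recent"
BY_TICKET_ID = "ticket_id"

# True means the source supports "Fetch by ticket ID" in addition to recent conversations.
SUPPORTS_FETCH_BY_ID = {
    "Slack": True,
    "Zendesk": True,
    "HubSpot": True,
    "Intercom": True,
    "Freshdesk": False,
    "Plain": False,
    "Pylon": False,
    "Discord": False,
}


def available_fetch_modes(source: str, connected: set[str]) -> list[str]:
    """Return the fetch modes for a source, or [] if it is not connected or not listed."""
    if source not in connected or source not in SUPPORTS_FETCH_BY_ID:
        return []
    modes = [RECENT]
    if SUPPORTS_FETCH_BY_ID[source]:
        modes.append(BY_TICKET_ID)
    return modes
```

For an organization with only Zendesk and Discord connected, `available_fetch_modes("Zendesk", {"Zendesk", "Discord"})` returns both modes, while the same call for `"Discord"` returns only `"recent"`.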

Run A Replay Test

Open Replay Chats

Go to Test → Replay Chats.

Select an agent

Choose the active agent you want to test.

Select a connection

Choose the connected source to fetch conversations from.

Fetch conversations

Click Fetch Conversations. Duckie loads replayable conversations and splits them into turns.

Pick a conversation and turn

Select a conversation, then move through its turns with Prev Message and Next Message.

Generate Duckie's response

Click Generate Duckie Response to run the selected agent from that point in the conversation.

Compare the result

Compare the historical Expected response with the generated Duckie response.

Fetch A Specific Ticket

For Slack, Zendesk, HubSpot, and Intercom, you can fetch one conversation directly by ID.

Select a supported connection

Choose Slack, Zendesk, HubSpot, or Intercom.

Enter the ticket ID

Paste the ticket, thread, or conversation ID into the Ticket ID field.

Fetch the ticket

Click Fetch Ticket. Duckie adds the replayable conversation if the source returns one.

How Turns Work

Duckie splits each conversation into turns:

Part	What it contains
Conversation history	Messages before the selected customer turn
Current messages	One or more customer messages in the selected turn
Expected response	The historical agent response that followed those customer messages
Duckie response	The new response generated by the selected Duckie agent

Conversations without any historical agent response are filtered out because there is no expected response to compare against.

When you replay later turns, Duckie uses the prior customer messages and prior agent responses as context. If you already generated Duckie responses for earlier turns, those generated responses become the prior assistant context for later turns.
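
The turn structure above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Duckie's implementation: the `Message` and `Turn` types and the `split_into_turns` and `build_context` functions are hypothetical names. The sketch assumes that only the first agent reply after a run of customer messages counts as the expected response, and that a generated response replaces the historical reply for that turn when building later context.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "customer" or "agent" (assumed role labels)
    text: str


@dataclass
class Turn:
    current: list[Message]  # one or more customer messages in the turn
    expected: str           # historical agent response that followed them


def split_into_turns(messages: list[Message]) -> list[Turn]:
    """Pair each run of customer messages with the agent reply that followed it."""
    turns: list[Turn] = []
    pending: list[Message] = []
    for msg in messages:
        if msg.role == "customer":
            pending.append(msg)
        elif pending:
            # First agent reply after the customer run becomes the expected response.
            turns.append(Turn(current=pending, expected=msg.text))
            pending = []
    # Trailing customer messages with no reply are dropped: nothing to compare against.
    return turns


def build_context(turns: list[Turn], index: int, generated: dict[int, str]) -> list[Message]:
    """Build the history for turn `index`, preferring generated replies for earlier turns."""
    context: list[Message] = []
    for i, turn in enumerate(turns[:index]):
        context.extend(turn.current)
        reply = generated.get(i, turn.expected)  # assumption: generated reply wins if present
        context.append(Message("agent", reply))
    return context
```

A conversation with no agent reply yields an empty list from `split_into_turns`, which matches the filtering described above.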

Use Context Settings

Replay tests use the context panel to simulate source metadata for the run. In the panel, you can:

confirm the source Duckie should treat the conversation as coming from
add source-specific fields when needed
review internal-note and write-action settings for the run

Test Mode Safety

Replay tests run in testing mode, the same safety layer that protects playground and batch test runs from unintended external side effects. In testing mode:

write app tools, custom tools, and MCP tools are skipped
responder output is converted to internal notes where the source supports internal notes
Slack and Discord responder delivery is skipped because they do not have an internal note target

You can turn on Allow Write Actions in the context panel when you intentionally want to exercise write tools during a replay.
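
The testing-mode rules above can be summarized as a small decision function. This is a hedged sketch for reasoning about a run, not Duckie's code: `ReplaySafetyPlan` and `plan_test_mode` are hypothetical names. It assumes that every source other than Slack and Discord has an internal note target, and that Allow Write Actions affects only write tools, not responder delivery. Neither assumption is stated in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplaySafetyPlan:
    run_write_tools: bool    # False means write tools are skipped
    responder_delivery: str  # "internal_note" or "skipped"


# Per the text above, Slack and Discord have no internal note target.
SOURCES_WITHOUT_INTERNAL_NOTES = {"slack", "discord"}


def plan_test_mode(source: str, allow_write_actions: bool = False) -> ReplaySafetyPlan:
    """Describe what testing mode does for a replay run against `source`."""
    if source.lower() in SOURCES_WITHOUT_INTERNAL_NOTES:
        delivery = "skipped"
    else:
        # Assumption: all other sources support internal notes.
        delivery = "internal_note"
    return ReplaySafetyPlan(
        run_write_tools=allow_write_actions,
        responder_delivery=delivery,
    )
```

For example, `plan_test_mode("Zendesk")` describes a run where write tools are skipped and the response is delivered as an internal note, while `plan_test_mode("Slack", allow_write_actions=True)` describes a run where write tools execute but responder delivery is still skipped.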

Review Replay History

The Replay Chats History panel shows replay runs created from the Replay Chats page. From history, you can:

load a previous replay run
open the run details drawer
inspect steps, tool calls, and generated messages
delete replay runs you no longer need

Turn Replays Into Regression Tests

Replay testing is useful for finding examples worth preserving. When a replay exposes behavior you want to keep checking, add the scenario to a batch test.

Find a useful replay

Replay a real conversation and identify the turn you want to preserve.

Open Batch Test

Go to Test → Batch Test and open or create a batch.

Add the scenario

Use Manage tickets to add a custom ticket with the relevant customer messages and expected response.

Run the batch

Run the batch whenever you need repeatable validation.

Next Steps

Playground

Test new scenarios interactively

Batch Testing

Create repeatable regression suites
