What are Guardrails?
Guardrails are the safety net for your AI agents. They define boundaries and trigger actions when those boundaries are approached.
Two Types
| Type | Purpose | When Triggered |
|---|---|---|
| Escalation Rules | Hand off to humans | Customer is upset, legal question, VIP, complexity |
| Restrictions | Block specific actions/topics | Forbidden topics, unsafe promises, confidential info |
Escalation Rules
Define when the agent should stop and hand off to a human.
Examples:
| Rule | Description |
|---|---|
| Angry Customer | Customer expresses frustration or mentions canceling |
| Legal Question | Questions about contracts, liability, or legal terms |
| VIP Customer | Conversations from enterprise or high-value accounts |
| Complex Issue | Agent cannot confidently resolve after attempts |
| Human Request | Customer explicitly asks to speak with a person |
When an escalation rule triggers, the agent:
- Stops processing the conversation
- Sends an appropriate message to the customer
- Creates an internal note with context
- Routes the conversation to the human queue
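
The escalation steps listed above can be sketched as a single function. This is a minimal illustration, not the product's implementation: the `Conversation` class, its fields, the `escalate` function, and the hand-off message text are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical conversation state; not the product's real data model."""
    customer_id: str
    active: bool = True          # False once the agent stops processing
    outbound: list[str] = field(default_factory=list)        # messages sent to the customer
    internal_notes: list[str] = field(default_factory=list)  # notes visible only to staff
    queue: str = "agent"         # who currently owns the conversation


def escalate(conv: Conversation, rule_name: str, context: str) -> Conversation:
    """Apply the four escalation steps in order."""
    conv.active = False                                                    # 1. stop processing
    conv.outbound.append("I'm connecting you with a member of our team.")  # 2. message the customer
    conv.internal_notes.append(f"Escalated by '{rule_name}': {context}")   # 3. internal note
    conv.queue = "human"                                                   # 4. route to human queue
    return conv


conv = escalate(Conversation("cust-42"), "Human Request", "Customer asked for a person.")
print(conv.queue, conv.active)  # prints: human False
```

Writing the internal note before changing the queue means the human who picks up the conversation already has the context in front of them.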
Restrictions
Define absolute limits on what the agent can do.
Examples:
| Restriction | Response When Triggered |
|---|---|
| No refund promises | “I’d love to help! Let me connect you with our billing team who can review refund requests.” |
| No competitor discussion | “I’m focused on how we can help you. Let me tell you about what makes us great.” |
| No internal pricing | “For custom pricing, please contact our sales team at [email protected]” |
| No medical/legal advice | “I can’t provide medical/legal advice, but I can help you find appropriate resources.” |
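
One way to picture a restriction is as a trigger paired with a fixed response that replaces whatever the agent would otherwise say. The sketch below assumes simple phrase triggers. The `RESTRICTIONS` mapping, the trigger phrases, and `apply_restrictions` are hypothetical and only illustrate the idea; the response texts are taken from the table above.

```python
from typing import Optional

# Hypothetical restriction table: name -> (trigger phrases, canned response).
# The trigger phrases are illustrative assumptions, not product defaults.
RESTRICTIONS = {
    "No refund promises": (
        ("refund", "money back"),
        "I'd love to help! Let me connect you with our billing team who can review refund requests.",
    ),
    "No medical/legal advice": (
        ("diagnosis", "legal advice"),
        "I can't provide medical/legal advice, but I can help you find appropriate resources.",
    ),
}


def apply_restrictions(message: str) -> Optional[str]:
    """Return the canned response of the first matching restriction, else None."""
    text = message.lower()
    for triggers, response in RESTRICTIONS.values():
        if any(phrase in text for phrase in triggers):
            return response
    return None  # no restriction applies; the agent may answer normally
```

Returning `None` when nothing matches keeps the caller's logic simple: a non-empty result is sent as-is, and anything else falls through to the normal reply.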
Priority Order
Guardrails are evaluated in this order:
Detection Methods
Guardrails can detect triggers using:
| Method | Best For | Example |
|---|---|---|
| AI-based | Complex context, sentiment, intent | “Customer is frustrated about repeated issues” |
| Keyword | Specific words or phrases | “cancel”, “lawsuit”, “speak to manager” |
| Regex | Structured patterns | Email addresses, order numbers |
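
The keyword and regex methods can be approximated in a few lines of standard Python. This is a sketch under stated assumptions: the function names are hypothetical, the email pattern is deliberately simple, and the `ORD-` order-number format is invented for the example. AI-based detection is not shown because it depends on a model call.

```python
import re

KEYWORDS = ("cancel", "lawsuit", "speak to manager")  # phrases from the table above

# Assumed patterns, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # simple email shape
ORDER_RE = re.compile(r"\bORD-\d{6}\b")                 # made-up order-number format


def keyword_hits(message: str) -> list[str]:
    """Case-insensitive substring match against the keyword list."""
    text = message.lower()
    return [kw for kw in KEYWORDS if kw in text]


def regex_hits(message: str) -> dict[str, list[str]]:
    """Find structured patterns such as email addresses and order numbers."""
    return {
        "emails": EMAIL_RE.findall(message),
        "orders": ORDER_RE.findall(message),
    }
```

Because the keyword check is a substring match, "cancellation" also counts as a hit for "cancel". Whether that is desirable depends on the rule, which is one reason to try a guardrail against sample messages before relying on it.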
Testing Guardrails
Each guardrail includes a built-in playground:
- Open the guardrail configuration
- Enter test messages
- See if the guardrail triggers
- Refine detection criteria
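
The same test-and-refine loop can be mimicked offline with a small harness. The sketch below is not the playground itself: `run_playground` and `mentions_cancel` are hypothetical helpers that run labeled test messages through a detector and report the ones it gets wrong.

```python
from typing import Callable


def run_playground(
    detector: Callable[[str], bool],
    cases: list[tuple[str, bool]],
) -> list[str]:
    """Return the test messages where the detector disagrees with the expected result."""
    return [msg for msg, expected in cases if detector(msg) != expected]


def mentions_cancel(message: str) -> bool:
    """Hypothetical keyword guardrail used only for this example."""
    return "cancel" in message.lower()


cases = [
    ("Please cancel my subscription", True),
    ("I'm thinking of leaving for another provider", True),  # intent without the keyword
    ("How do I update my billing address?", False),
]
failures = run_playground(mentions_cancel, cases)
print(failures)  # prints: ["I'm thinking of leaving for another provider"]
```

The one failure shows where a keyword rule falls short: the customer signals intent to leave without using the keyword, which suggests refining the criteria or switching that rule to AI-based detection.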
Guardrails vs Guidelines
| Guardrails | Guidelines |
|---|---|
| Safety constraints | Communication style |
| Hard limits | Soft guidance |
| Trigger actions (escalate, block) | Shape responses |
| Evaluated per-message | Always applied |
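
The contrast in the table can be shown with a toy message handler. This is only a sketch of the idea: `GUIDELINES`, `handle_message`, and the single keyword check are hypothetical stand-ins, not how the product wires these together.

```python
GUIDELINES = "Be concise and friendly."  # hypothetical style guidance


def handle_message(message: str) -> dict:
    """Check guardrails for this message; if none trigger, answer under the guidelines."""
    if "lawsuit" in message.lower():  # stand-in for a real guardrail check
        # A guardrail triggers an action instead of a normal reply.
        return {"action": "escalate", "prompt": None}
    # Guidelines trigger nothing; they accompany every generated reply.
    return {"action": "respond", "prompt": f"{GUIDELINES}\n\nCustomer: {message}"}
```

The guardrail check runs once per incoming message and can stop the reply entirely, while the guidelines only shape the wording of replies that are allowed through.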
Common Guardrail Patterns
Safety Net
Catch situations that need human judgment:
- Angry or threatening language
- Legal or compliance topics
- Account security concerns
- Complex edge cases
Brand Protection
Prevent brand damage:
- Competitor discussions
- Unauthorized promises
- Confidential information
- Off-brand responses
Compliance
Meet regulatory requirements:
- Privacy requests (GDPR, CCPA)
- Financial advice disclaimers
- Medical/legal limitations
- Age-restricted content