One QA Rubric for Human Agents and AI Chatbots

Most QA processes were built for human agents. AI chatbot conversations go unmonitored.

RevelirQA applies the same rubric to both. One scoring engine. One dashboard. Scores you can actually compare.

Get full visibility across your human and AI support operation

Book a Demo

100% of AI and human conversations scored

Hours to detect a systematic AI error, not weeks

One rubric applied to all agent types

Why Do AI Chatbots Need Quality Assurance?

A human agent's error affects only the tickets that agent handles. An AI chatbot's error repeats across thousands of conversations before anyone notices.

Wrong policy applied at scale: incorrect refund window, cancellation terms, or eligibility criteria repeated on every matching ticket
Hallucinated answers: confident, plausible-sounding responses not grounded in your knowledge base
Missed escalations: conversations the AI handled or misrouted that should have gone to a human specialist
Tone problems: technically correct responses that are off-brand in sensitive situations

In a sampling-based QA process, these patterns can persist for days or weeks before a reviewer happens to sample an affected ticket. With full-coverage scoring, they show up within hours.

How Does One Rubric Cover Both Agent Types?

RevelirQA evaluates every conversation against your team's defined criteria: policy accuracy, tone, resolution quality, escalation compliance, and any custom metrics. The same evaluation runs whether the conversation was handled by a human or a chatbot.

Criterion	What it measures	Why it matters specifically for AI
Policy accuracy	Did the agent apply the correct procedure for this contact reason?	One wrong answer gets replicated across thousands of tickets
Resolution quality	Was the issue actually resolved?	Hallucinations close tickets without solving the problem
Escalation compliance	Was the correct routing followed?	Missed escalations create CSAT risk and compliance exposure
Tone and empathy	Was the response framed appropriately?	AI defaults to clinical language in emotionally sensitive conversations
Response accuracy	Was the information correct per your knowledge base?	The most consequential failure mode; needs policy-aware scoring to catch
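
As a rough illustration of what "one rubric for both agent types" means structurally, the sketch below applies the same list of criteria to every conversation and never looks at who handled it. This is not RevelirQA's implementation. The `Conversation` fields, the criterion identifiers, and `placeholder_judge` are all hypothetical names chosen for the example, and the judge is a stand-in for a real policy-aware evaluator.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical identifiers mirroring the criteria in the table above.
CRITERIA = (
    "policy_accuracy",
    "resolution_quality",
    "escalation_compliance",
    "tone_and_empathy",
    "response_accuracy",
)


@dataclass(frozen=True)
class Conversation:
    conversation_id: str
    agent_type: str  # "human" or "ai"; kept for reporting, not for scoring
    transcript: str


# A judge maps (criterion, transcript) to a score between 0 and 1.
Judge = Callable[[str, str], float]


def score_conversation(conversation: Conversation, judge: Judge) -> Dict[str, float]:
    """Score every criterion on the transcript; agent_type is never consulted."""
    return {c: judge(c, conversation.transcript) for c in CRITERIA}


def placeholder_judge(criterion: str, transcript: str) -> float:
    """Stand-in evaluator for the example; returns a fixed score."""
    return 1.0


# The same transcript gets the same scores whoever handled it.
human = Conversation("c-1", "human", "Refunds follow the published policy.")
bot = Conversation("c-2", "ai", "Refunds follow the published policy.")
same = score_conversation(human, placeholder_judge) == score_conversation(bot, placeholder_judge)
```

Because the scoring function takes only the transcript, any difference between human and AI scores comes from the conversations themselves rather than from two different evaluation paths.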

What Can You Do With Unified Scores Across Agent Types?

Comparable scores across agent types give CX leaders data that two separate systems cannot produce:

Which contact reasons the AI handles better or worse than human agents
Where AI errors cluster: policy accuracy, escalation, or tone
Whether a pattern needs a knowledge base update, a prompt change, or a new escalation rule
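
Once scores share a scale, the first comparison in the list above reduces to a grouped average. The sketch below is a minimal example under assumptions: the row layout `(contact_reason, agent_type, score)` and the function name are hypothetical, and the sample rows are made up purely to show the shape of the output.

```python
from collections import defaultdict
from statistics import mean


def ai_minus_human_by_reason(rows):
    """Return {contact_reason: mean AI score minus mean human score}.

    rows is an iterable of (contact_reason, agent_type, score) tuples.
    Reasons handled by only one agent type are skipped, since there is
    nothing to compare them against.
    """
    buckets = defaultdict(list)
    for reason, agent_type, score in rows:
        buckets[(reason, agent_type)].append(score)

    report = {}
    for reason in sorted({reason for reason, _ in buckets}):
        human = buckets.get((reason, "human"))
        ai = buckets.get((reason, "ai"))
        if human and ai:
            report[reason] = round(mean(ai) - mean(human), 3)
    return report


# Made-up rows for illustration only.
sample_rows = [
    ("refund", "human", 0.9),
    ("refund", "ai", 0.7),
    ("billing", "ai", 0.8),  # no human rows, so "billing" is skipped
]
gap = ai_minus_human_by_reason(sample_rows)
```

A negative value means the AI scored lower than human agents on that contact reason. The same grouping can be repeated per criterion to see whether the gap sits in policy accuracy, escalation, or tone.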

"The team is incredibly responsive. Feedback turns into shipped features fast. It genuinely feels like we are building the product together."

Lorens H., Xendit

Where Is This Running in Production?

RevelirQA is in production at Xendit and Tiket.com, both running blended human and AI support operations at high volume in English and Indonesian. Production deployments, not pilots.

Frequently Asked Questions

Can RevelirQA evaluate any AI chatbot platform?

Yes. RevelirQA evaluates conversation transcripts and works with any AI chatbot platform that produces a conversation log via API.

Can the rubric be configured differently for AI agents versus human agents?

Yes. The same rubric applies by default for direct comparison. Teams can also add criteria specific to AI-handled conversations where evaluation needs differ.

How quickly does a systematic AI error show up in the dashboard?

Because every conversation is scored as it closes, systematic patterns appear within hours rather than waiting for a weekly or monthly review cycle.

What is in the audit trail for an AI chatbot evaluation?

Every score includes the model used, the exact prompt sent, the documents retrieved from your knowledge base via RAG, and the reasoning behind each criterion score. Available on every evaluation, not just flagged ones.
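
To make the audit trail concrete, one evaluation could be represented as a record like the one below. This is a sketch only, not RevelirQA's actual schema: the class names, field names, and example values are assumptions that mirror the four items named in the answer above (model, prompt, retrieved documents, per-criterion reasoning).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class CriterionResult:
    score: float
    reasoning: str  # why the evaluator gave this score


@dataclass
class AuditRecord:
    conversation_id: str
    model: str                      # model used for the evaluation
    prompt: str                     # exact prompt sent to the model
    retrieved_documents: List[str]  # knowledge base documents retrieved
    criteria: Dict[str, CriterionResult] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the whole record, nested results included."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values for illustration only.
record = AuditRecord(
    conversation_id="c-1",
    model="example-model",
    prompt="Score this conversation against the refund policy.",
    retrieved_documents=["refund-policy"],
    criteria={
        "policy_accuracy": CriterionResult(0.5, "Quoted a window not found in the policy."),
    },
)
exported = json.loads(record.to_json())
```

Storing the record on every evaluation, rather than only on flagged ones, lets a reviewer trace any score back to the inputs that produced it.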

Is RevelirQA suitable for fintech teams with compliance requirements around AI monitoring?

Yes. The full per-evaluation audit trail supports compliance reporting. RevelirQA is in production at Xendit, which operates in Indonesian fintech under regulatory requirements.

About RevelirQA

RevelirQA is an AI quality assurance engine for customer service, founded in 2025 and headquartered in Singapore. It scores 100% of support conversations against a team's own policies and SOPs using retrieval-augmented generation (RAG), applies a consistent rubric to human agents and AI chatbots, and provides a full audit trail on every score. In production at Xendit (Indonesian fintech) and Tiket.com (Indonesian travel). Multilingual scoring in English, Indonesian, Thai, and Tagalog. Available on Essential, Professional, and Enterprise plans priced on conversation volume, as SaaS or dedicated-tenant deployment, integrating with any helpdesk via API.