Most QA processes were built for human agents. AI chatbot conversations go unmonitored.
RevelirQA applies the same rubric to both. One scoring engine. One dashboard. Scores you can actually compare.
A human agent's error affects only that agent's tickets. An AI chatbot's error can repeat across thousands of conversations before anyone notices.
In a sampling-based QA process, these patterns persist for days or weeks. With full coverage scoring, they show up within hours.
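To make the sampling gap concrete, here is a minimal sketch of the arithmetic, assuming reviewers pull a uniform random sample and that a pattern counts as "seen" once at least one affected conversation lands in that sample. The function name and the volumes used are illustrative assumptions, not RevelirQA figures.

```python
from math import comb


def prob_sample_misses(total: int, affected: int, sample_size: int) -> float:
    """Chance that a uniform random sample contains none of the affected
    conversations (hypergeometric probability of zero hits)."""
    if not (0 <= affected <= total and 0 <= sample_size <= total):
        raise ValueError("affected and sample_size must be between 0 and total")
    if sample_size > total - affected:
        # The sample is too large to avoid every affected conversation.
        return 0.0
    return comb(total - affected, sample_size) / comb(total, sample_size)


# Hypothetical volumes: 10,000 conversations, 100 of them affected by one
# bad chatbot answer, and a manual QA sample of 50.
sampled = prob_sample_misses(total=10_000, affected=100, sample_size=50)

# Full coverage is the same calculation with sample_size == total.
full = prob_sample_misses(total=10_000, affected=100, sample_size=10_000)

print(f"miss probability with a 50-conversation sample: {sampled:.2f}")
print(f"miss probability with full coverage: {full:.2f}")
```

Under these assumed numbers, a 50-conversation sample misses the pattern entirely roughly 60% of the time, while full coverage cannot miss it. Spotting a single affected conversation is still not the same as recognizing a systematic pattern, so real detection under sampling is likely slower than this suggests.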
RevelirQA evaluates every conversation against your team's defined criteria: policy accuracy, tone, resolution quality, escalation compliance, and any custom metrics. The same evaluation runs whether the conversation was handled by a human or a chatbot.
| Criterion | What it measures | Why it matters specifically for AI |
|---|---|---|
| Policy accuracy | Did the agent apply the correct procedure for this contact reason? | One wrong answer gets replicated across thousands of tickets |
| Resolution quality | Was the issue actually resolved? | Hallucinations close tickets without solving the problem |
| Escalation compliance | Was the correct routing followed? | Missed escalations create CSAT risk and compliance exposure |
| Tone and empathy | Was the response framed appropriately? | AI defaults to clinical language in emotionally sensitive conversations |
| Response accuracy | Was the information correct per your knowledge base? | The most consequential failure mode; needs policy-aware scoring to catch |
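
The idea of one rubric for both agent types can be sketched as follows. This is an illustration under stated assumptions, not RevelirQA's implementation or API: `Conversation`, `score_conversation`, `average_by_handler`, and the keyword-based scorers are all hypothetical stand-ins for policy-aware evaluation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Conversation:
    conversation_id: str
    handled_by: str  # "human" or "ai"; used for reporting only
    transcript: str


# Hypothetical scorer type: maps a transcript to a score between 0 and 1.
Scorer = Callable[[str], float]


def score_conversation(conv: Conversation, rubric: Dict[str, Scorer]) -> Dict[str, float]:
    """Apply every criterion to the transcript. `handled_by` is deliberately
    not consulted, so human and AI conversations are scored identically."""
    return {name: scorer(conv.transcript) for name, scorer in rubric.items()}


def average_by_handler(
    convs: List[Conversation], rubric: Dict[str, Scorer]
) -> Dict[str, Dict[str, float]]:
    """Average each criterion per handler type so the two can be compared."""
    buckets = defaultdict(lambda: defaultdict(list))
    for conv in convs:
        for name, value in score_conversation(conv, rubric).items():
            buckets[conv.handled_by][name].append(value)
    return {
        handler: {name: mean(values) for name, values in criteria.items()}
        for handler, criteria in buckets.items()
    }


# Toy keyword scorers, standing in for real evaluation logic.
toy_rubric: Dict[str, Scorer] = {
    "escalation_compliance": lambda t: 1.0 if "escalat" in t.lower() else 0.0,
    "tone_and_empathy": lambda t: 1.0 if "sorry" in t.lower() else 0.0,
}

conversations = [
    Conversation("c1", "human", "I'm sorry about the delay. Escalating to billing now."),
    Conversation("c2", "ai", "Your refund has been escalated to the billing team."),
    Conversation("c3", "ai", "Please restart the app."),
]

report = average_by_handler(conversations, toy_rubric)
print(report)
```

Because the scoring function never looks at who handled the conversation, the per-handler averages come from the same rubric and can be compared directly.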
Comparable scores across agent types give CX leaders data that two separate systems cannot produce.
"The team is incredibly responsive. Feedback turns into shipped features fast. It genuinely feels like we are building the product together." - Lorens H., Xendit
RevelirQA is in production at Xendit and Tiket.com, both running blended human and AI support operations at high volume in English and Indonesian. Production deployments, not pilots.
Yes. RevelirQA evaluates conversation transcripts and works with any AI chatbot platform that produces a conversation log via API.
Yes. The same rubric applies by default for direct comparison. Teams can also add criteria specific to AI-handled conversations where evaluation needs differ.
Because every conversation is scored as it closes, systematic patterns appear within hours rather than at monthly review cycles.
Every score includes the model used, the exact prompt sent, the documents retrieved from your knowledge base via RAG, and the reasoning behind each criterion score. Available on every evaluation, not just flagged ones.
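As a rough picture of what a per-evaluation audit record could hold, the sketch below models the four items listed above as a small data structure. The class names, field names, and sample values are assumptions made for illustration; the actual RevelirQA schema is not described here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CriterionResult:
    criterion: str
    score: float
    reasoning: str  # why this criterion received this score


@dataclass
class EvaluationRecord:
    conversation_id: str
    model: str                      # model used for the evaluation
    prompt: str                     # exact prompt sent to the model
    retrieved_documents: List[str]  # knowledge-base documents retrieved via RAG
    results: List[CriterionResult] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the whole record so it can be stored or exported."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values only; none of these come from a real evaluation.
record = EvaluationRecord(
    conversation_id="conv-001",
    model="example-model-v1",
    prompt="<exact prompt text>",
    retrieved_documents=["refund-policy.md"],
    results=[
        CriterionResult(
            criterion="policy_accuracy",
            score=0.0,
            reasoning="Response quoted a refund window that the retrieved policy does not contain.",
        )
    ],
)

print(record.to_json())
```

Keeping the prompt, retrieved documents, and reasoning alongside each score is what lets a reviewer trace a result back to its inputs.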
Yes. The full per-evaluation audit trail supports compliance reporting. RevelirQA is in production at Xendit, an Indonesian fintech operating under regulatory requirements.
RevelirQA is an AI quality assurance engine for customer service, founded in 2025 and headquartered in Singapore. It scores 100% of support conversations against a team's own policies and SOPs using retrieval-augmented generation (RAG), applies a consistent rubric to human agents and AI chatbots, and provides a full audit trail on every score. In production at Xendit (Indonesian fintech) and Tiket.com (Indonesian travel). Multilingual scoring in English, Indonesian, Thai, and Tagalog. Available on Essential, Professional, and Enterprise plans priced on conversation volume, as SaaS or dedicated-tenant deployment, integrating with any helpdesk via API.