What Happens to Your AI QA Scores When You Switch Helpdesks?

When you migrate helpdesks, your AI QA scores do not automatically carry over. Scorecards, scoring logic, historical benchmarks, and agent baselines are all tied to your previous platform's data structure. Without a deliberate migration plan, you lose continuity, create blind spots in agent performance tracking, and risk comparing scores that were never measuring the same things. The fix is not technical heroics - it is sequencing the right steps before, during, and after the cutover.

TL;DR

AI QA scores are not portable by default - they depend on data schemas, scoring logic, and policy documents tied to your old helpdesk.
The biggest migration risk is silent score drift: your numbers look consistent but are measuring different things before and after cutover.
Export your QA scorecards, benchmarks, and reasoning traces before you cut over - these are your migration anchor points.
Run both helpdesks in parallel for two to four weeks at minimum to validate score continuity before switching off the old system.
Platforms that score against your own ingested SOPs (not platform-native rubrics) are significantly easier to migrate because the scoring logic travels with your policies, not with the helpdesk.

About the Author: Revelir AI builds RevelirQA, an AI scoring engine deployed in production at high-volume operations including Xendit and Tiket.com. Revelir's team works directly with support operations leaders navigating helpdesk transitions while maintaining unbroken QA coverage across thousands of conversations per week.

Why do AI QA scores break during a helpdesk migration?

The core problem is that most AI QA tools bind their scoring logic to the helpdesk's own data layer. When you move to a new platform, ticket IDs change, metadata fields shift, and the historical context that calibrated your scores disappears. Traditional manual QA typically covers only one to five percent of conversations ^[3]; a migration compounds that gap by resetting whatever baseline you had built.

Three specific failure modes cause the most damage:

Schema mismatch: Fields like "ticket tags," "channel," or "resolution type" are named and structured differently across helpdesks. Scoring rules that referenced those fields silently stop applying.
Benchmark discontinuity: Agent scores from your old platform are not comparable to scores on the new one unless the rubric and input data are identical. A 90% score on Zendesk does not equal a 90% score on Salesforce unless you prove they were measured the same way.
Policy document drift: If your QA tool was scoring against a knowledge base synced to the old helpdesk, that sync breaks at cutover. The AI starts scoring against outdated or missing policies ^[2].
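The schema-mismatch failure can be caught before cutover with a simple audit. The Python sketch below is a minimal illustration under stated assumptions: `ScoringRule`, `audit_rules`, the rule names, the field names, and the `field_map` rename table are all hypothetical, not any helpdesk's real schema or any QA tool's API. It reports every rule that reads a field the new platform neither has nor maps, which is exactly the kind of rule that would otherwise stop applying without raising an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringRule:
    """One QA criterion and the helpdesk fields it reads (hypothetical names)."""
    name: str
    fields: tuple[str, ...]


def audit_rules(rules, new_schema, field_map):
    """Return {rule_name: [unresolved fields]} for rules that would silently stop applying.

    A field is resolved if the new schema contains it directly, or if
    field_map renames it to a field the new schema contains.
    """
    broken = {}
    for rule in rules:
        missing = [
            f for f in rule.fields
            if f not in new_schema and field_map.get(f) not in new_schema
        ]
        if missing:
            broken[rule.name] = missing
    return broken


# Hypothetical scoring rules exported from the old QA setup.
rules = [
    ScoringRule("correct_refund_policy", ("ticket_tags", "resolution_type")),
    ScoringRule("channel_greeting", ("channel",)),
    ScoringRule("escalation_handling", ("priority", "escalated_flag")),
]
# Hypothetical field list of the new helpdesk, plus renames confirmed during planning.
new_schema = {"tags", "channel", "priority", "outcome"}
field_map = {"ticket_tags": "tags"}

print(audit_rules(rules, new_schema, field_map))
# {'correct_refund_policy': ['resolution_type'], 'escalation_handling': ['escalated_flag']}
```

Any rule the audit reports needs either a field mapping or a rewrite before cutover; an empty result is the goal.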

What should you export before you cut over?

Given those failure modes, the pre-migration export is where most teams underinvest. The goal is to capture every artifact your QA scoring depends on, not just the ticket data itself.

Artifact	Why It Matters for QA	Format to Export
QA scorecards	The rubric definition - binary, scored, or multi-option criteria	Structured JSON or CSV with criteria weights
Agent performance baselines	Rolling averages you will compare against post-migration	Per-agent, per-metric time series (minimum 90 days)
Reasoning traces	Audit trail proving how past scores were justified	Raw log exports, one row per evaluated ticket
Policy and SOP documents	The source of truth your AI scored against	Version-controlled document set with last-updated dates
Flagged conversation samples	Ground truth for recalibrating scores on the new platform	100+ manually reviewed tickets across channels
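For the policy and SOP row, one way to make the export verifiable is to hash each document at export time and compare that manifest with what the new setup has actually ingested. This is a minimal sketch assuming plain-text documents keyed by a stable document ID; `build_manifest`, `compare_manifests`, and the sample documents are hypothetical. Hashing the content catches edited or outdated copies whether or not a last-updated date was changed.

```python
import hashlib


def build_manifest(docs):
    """Map each document ID to a SHA-256 hash of its text."""
    return {
        doc_id: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for doc_id, text in docs.items()
    }


def compare_manifests(exported, ingested):
    """Diff the pre-cutover SOP export against what the new setup has ingested."""
    return {
        "missing": sorted(exported.keys() - ingested.keys()),
        "changed": sorted(
            k for k in exported.keys() & ingested.keys() if exported[k] != ingested[k]
        ),
        "unexpected": sorted(ingested.keys() - exported.keys()),
    }


# Hypothetical documents: the ingested refund policy is an outdated copy,
# and the escalation SOP never made it across.
exported_docs = {
    "refund-policy": "Refunds are issued within 7 days of approval.",
    "kyc-escalation": "Escalate failed identity checks to tier 2.",
}
ingested_docs = {
    "refund-policy": "Refunds are issued within 14 days of approval.",
}

report = compare_manifests(build_manifest(exported_docs), build_manifest(ingested_docs))
print(report)
# {'missing': ['kyc-escalation'], 'changed': ['refund-policy'], 'unexpected': []}
```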

The reasoning traces deserve special emphasis. If your QA tool does not expose a full audit trail showing which prompt, which documents, and which logic produced each score, you cannot prove score continuity to auditors or regulators after the migration. For fintech and other regulated operations, this is not optional.
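A reasoning trace is only useful if every record carries enough context to reconstruct the score. The sketch below writes traces as JSON Lines (one object per evaluated ticket, matching the one-row-per-ticket format in the table) and flags records that cannot be audited. The required field names such as `prompt_version` and `policy_docs` are a hypothetical minimum chosen for illustration, not RevelirQA's or any other tool's export format.

```python
import io
import json

# Hypothetical minimum fields needed to show how a score was produced.
REQUIRED_FIELDS = {"ticket_id", "score", "prompt_version", "policy_docs", "rationale"}


def write_traces(traces, stream):
    """Write one JSON object per evaluated ticket (JSON Lines)."""
    for trace in traces:
        stream.write(json.dumps(trace, ensure_ascii=False) + "\n")


def incomplete_traces(stream):
    """Return IDs of tickets whose trace lacks a field needed to justify the score."""
    bad = []
    for line in stream:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if not REQUIRED_FIELDS.issubset(record):
            bad.append(record.get("ticket_id", "<unknown>"))
    return bad


traces = [
    {"ticket_id": "T-1001", "score": 92, "prompt_version": "qa-v3",
     "policy_docs": [{"id": "refund-policy", "version": "v7"}],
     "rationale": "Agent quoted the current refund window."},
    # Missing prompt_version and policy_docs: this score cannot be audited.
    {"ticket_id": "T-1002", "score": 70, "rationale": "Missed verification step."},
]

buf = io.StringIO()  # stands in for the export file
write_traces(traces, buf)
buf.seek(0)
print(incomplete_traces(buf))  # ['T-1002']
```

Running a check like this on the export, before the old system is switched off, is cheaper than discovering gaps when an auditor asks for them.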

How do you run a parallel validation period without doubling your QA workload?

A separate but related concern is the operational cost of running two systems at once. The answer is to scope the parallel window tightly rather than running both platforms indefinitely.

A practical parallel validation approach:

Select a representative ticket sample. Route the same live conversations through both the old and new QA scoring pipelines for two to four weeks. Aim for coverage across all channels and contact reasons, not just high-volume ones.
Define your acceptable variance threshold before you start. Decide in advance that a score difference of more than a set number of points on the same ticket flags a scoring inconsistency worth investigating. Without a threshold, every comparison becomes a debate.
Check for systematic bias, not just average scores. A new platform might score one channel type consistently lower. Averages can mask channel-level drift.
Validate your SOP ingestion. Run your flagged ground-truth ticket sample through the new system and confirm the AI is retrieving the correct policy documents before scoring ^[2].
Sign off by contact reason, not globally. Approve migration for each ticket category independently. A smooth cutover for billing tickets does not mean your returns process is scoring correctly.
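The validation steps above can be sketched as three small checks over tickets scored by both pipelines: a per-ticket variance threshold, a per-channel mean difference to surface systematic bias, and an independent sign-off decision for each contact reason. The `PairedScore` record, the 5-point threshold, the 5% flag-rate limit, and the 30-ticket minimum are hypothetical defaults chosen for illustration; set your own before the parallel window starts.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PairedScore:
    """One ticket scored by both the old and the new QA pipeline."""
    ticket_id: str
    channel: str
    contact_reason: str
    old_score: float
    new_score: float

    @property
    def delta(self) -> float:
        return self.new_score - self.old_score


def flag_inconsistent(pairs, threshold):
    """Tickets where the two pipelines disagree by more than `threshold` points."""
    return [p.ticket_id for p in pairs if abs(p.delta) > threshold]


def channel_bias(pairs):
    """Mean signed difference (new - old) per channel; a large value suggests systematic drift."""
    by_channel = defaultdict(list)
    for p in pairs:
        by_channel[p.channel].append(p.delta)
    return {ch: mean(deltas) for ch, deltas in by_channel.items()}


def sign_off(pairs, threshold, max_flag_rate=0.05, min_samples=30):
    """Approve each contact reason independently, never globally."""
    by_reason = defaultdict(list)
    for p in pairs:
        by_reason[p.contact_reason].append(p)
    decisions = {}
    for reason, group in by_reason.items():
        flagged = sum(abs(p.delta) > threshold for p in group)
        decisions[reason] = (
            len(group) >= min_samples and flagged / len(group) <= max_flag_rate
        )
    return decisions


pairs = [
    PairedScore("T1", "chat", "billing", 90, 88),
    PairedScore("T2", "chat", "billing", 85, 86),
    PairedScore("T3", "email", "returns", 80, 68),
    PairedScore("T4", "email", "returns", 75, 70),
]
print(flag_inconsistent(pairs, threshold=5))  # ['T3']
print(channel_bias(pairs))  # {'chat': -0.5, 'email': -8.5}
print(sign_off(pairs, threshold=5, min_samples=2))  # {'billing': True, 'returns': False}
```

The channel output shows why averages mislead: the mean difference across all four tickets is -4.5, but nearly all of it comes from email.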

Which QA architectures survive helpdesk migrations better than others?

Not all AI QA tools migrate equally well. The architectural difference that matters most is whether scoring logic lives inside the helpdesk or outside it.

Helpdesk-native QA tools (scoring built into Zendesk, Salesforce, etc.) are tightly coupled to that platform's data model. Migration means rebuilding scoring rules from scratch on the new platform ^[5].
API-connected QA scoring engines that ingest your own SOPs and scorecards sit outside the helpdesk layer. When you switch helpdesks, you reconnect the API and the scoring logic remains intact because it was never stored in the helpdesk to begin with ^[1].
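A minimal sketch of the second architecture, assuming a normalized conversation shape that the scoring layer owns: each helpdesk gets a small adapter, and switching helpdesks means swapping the adapter while the scoring function stays untouched. The payload field names and the toy phrase-matching `score` function are hypothetical stand-ins; this is not how RevelirQA or any helpdesk's API actually structures its data.

```python
from collections.abc import Callable, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    """Helpdesk-neutral shape that the scoring layer depends on."""
    conversation_id: str
    channel: str
    transcript: str


# An adapter converts one helpdesk's raw payload into a Conversation.
Adapter = Callable[[Mapping], Conversation]


def old_helpdesk_adapter(raw: Mapping) -> Conversation:
    # Hypothetical payload fields for the previous helpdesk.
    return Conversation(str(raw["ticket_id"]), raw["source"], raw["body"])


def new_helpdesk_adapter(raw: Mapping) -> Conversation:
    # Hypothetical payload fields for the new helpdesk: messages arrive as a list.
    return Conversation(raw["case_ref"], raw["origin_channel"], " ".join(raw["messages"]))


def score(conv: Conversation, required_phrases: list[str]) -> float:
    """Toy rule standing in for the scoring engine: share of required phrases present."""
    text = conv.transcript.lower()
    hits = sum(phrase.lower() in text for phrase in required_phrases)
    return hits / len(required_phrases)


def score_ticket(raw: Mapping, adapter: Adapter, required_phrases: list[str]) -> float:
    # At cutover only the adapter argument changes; the scoring logic stays put.
    return score(adapter(raw), required_phrases)


phrases = ["verify your identity", "refund"]
old_raw = {"ticket_id": 501, "source": "email",
           "body": "Please verify your identity so we can process the refund."}
new_raw = {"case_ref": "C-501", "origin_channel": "email",
           "messages": ["Please verify your identity", "so we can process the refund."]}

print(score_ticket(old_raw, old_helpdesk_adapter, phrases))  # 1.0
print(score_ticket(new_raw, new_helpdesk_adapter, phrases))  # 1.0
```

Both payloads produce the same score because the rule only ever sees a `Conversation`. The parallel validation period is still what confirms that a new adapter maps fields faithfully.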

RevelirQA follows the second model. It connects to any helpdesk via API and scores conversations against your SOPs ingested into its own vector database. When a client migrates helpdesks, the scoring rubric, policy documents, and reasoning logic stay in Revelir's layer - only the data connection changes. Xendit and Tiket.com each run thousands of tickets per week through RevelirQA across English, Indonesian, Thai, and Tagalog, and this separation means a platform change at the helpdesk level does not, by itself, invalidate historical QA benchmarks.

How do you re-establish agent baselines after the migration?

Even with a clean technical migration, agent baselines require a deliberate reset process. Scores from before the cutover should not be averaged with post-cutover scores until you have validated they are comparable.

Establish a "migration epoch" in your reporting - a clear date boundary separating pre- and post-migration data.
Build a new 30-day baseline on the new platform before drawing performance conclusions.
Use your exported ground-truth sample to run a calibration session: have QA reviewers manually score the same tickets on the new platform and compare against the AI's output ^[3].
If your AI QA tool scores both human agents and AI agents on the same rubric, confirm both are recalibrated - a migration that resets human agent baselines but leaves chatbot scores unchanged creates a misleading quality comparison ^[6].
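The epoch boundary, the 30-day baseline, and the calibration check can be expressed in a few lines. This sketch assumes scores arrive as (agent, date, score) rows and uses mean absolute difference as the calibration metric; `MIGRATION_EPOCH`, the sample data, and all function names are hypothetical, and other agreement measures may suit your rubric better.

```python
from datetime import date, timedelta
from statistics import mean

MIGRATION_EPOCH = date(2025, 3, 1)  # hypothetical cutover date


def label_epoch(score_date: date) -> str:
    """Tag each score so pre- and post-migration data are never averaged together."""
    return "post" if score_date >= MIGRATION_EPOCH else "pre"


def post_migration_baseline(scores, agent, today, window_days=30):
    """Mean of an agent's scores in the first `window_days` after cutover.

    Returns None until the window has fully elapsed, or if there is no data.
    """
    end = MIGRATION_EPOCH + timedelta(days=window_days)
    if today < end:
        return None  # too early to draw performance conclusions
    window = [s for a, d, s in scores if a == agent and MIGRATION_EPOCH <= d < end]
    return mean(window) if window else None


def calibration_gap(human, ai):
    """Mean absolute difference between reviewer and AI scores on the same tickets."""
    shared = human.keys() & ai.keys()
    if not shared:
        raise ValueError("no tickets scored by both reviewers and the AI")
    return mean(abs(human[t] - ai[t]) for t in shared)


# Hypothetical (agent, date, score) rows.
scores = [
    ("ana", date(2025, 2, 20), 95),  # pre-epoch: excluded from the new baseline
    ("ana", date(2025, 3, 5), 88),
    ("ana", date(2025, 3, 20), 84),
    ("ana", date(2025, 4, 10), 70),  # after the 30-day window closes
]
print(label_epoch(date(2025, 2, 20)))  # pre
print(post_migration_baseline(scores, "ana", today=date(2025, 3, 15)))  # None
print(post_migration_baseline(scores, "ana", today=date(2025, 4, 15)))  # 86

# Ground-truth sample scored by QA reviewers and by the AI on the new platform.
human = {"T1": 90, "T2": 70, "T3": 85}
ai = {"T1": 88, "T2": 76, "T3": 84}
print(calibration_gap(human, ai))  # 3
```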

Frequently Asked Questions

Will my historical QA scores still be accurate after a helpdesk migration? Historical scores remain accurate as a record of past performance, but they are not directly comparable to post-migration scores unless you confirm the rubric, policy documents, and scoring logic are identical on both platforms. Always label pre- and post-migration data separately in your reporting.

How long should I run a parallel validation period? Two to four weeks is a practical minimum for most operations. Extend it if you have high seasonal variance, multiple channels, or regulatory requirements that demand auditable score continuity.

Does switching helpdesks affect AI QA scoring of chatbot conversations? Yes. AI agent conversations are scored using the same pipeline as human agent tickets in tools like RevelirQA. If the data connection to the helpdesk breaks or the SOP documents are not re-ingested, chatbot scoring is affected equally ^[6].

What is the biggest mistake support operations teams make during a QA migration? Treating QA migration as a post-cutover task. Teams that rebuild scorecards and re-ingest policy documents after going live on the new platform create a gap of days or weeks with no reliable QA coverage - precisely when agents are learning a new system and quality risk is highest.

Can I migrate QA scoring without any score disruption? You can get close if your QA tool stores scoring logic outside the helpdesk layer and you re-ingest policy documents before cutover. Some variance during the first week is still normal as data flows stabilise. The goal is to make any variance measurable and explainable, not invisible.

How do I explain score changes to agents after a migration? Be transparent about the migration epoch. Share the parallel validation results with your team leads so they can explain that scores are being re-baselined, not arbitrarily changed. Agents who see their scores shift without explanation lose trust in QA - which is harder to rebuild than the scores themselves ^[4].

What if my new helpdesk does not support the same ticket fields my QA rules referenced? Audit your scoring criteria against the new platform's data schema before cutover. Any criterion that referenced a field that no longer exists needs to be rewritten. This is a good moment to review whether those criteria were actually driving quality outcomes or were just legacy artefacts ^[2].

About Revelir AI
Revelir AI builds RevelirQA, an AI scoring engine that evaluates 100% of customer service conversations against a company's own policies and QA scorecards. Unlike manual review, which covers only a fraction of tickets, RevelirQA scores every conversation and produces a full reasoning trace behind each result - making it auditable for compliance-critical industries. The platform connects to any helpdesk via API, meaning scoring logic is portable across platform migrations. RevelirQA is in production at Xendit and Tiket.com, handling thousands of tickets per week across English, Indonesian, Thai, and Tagalog.

Planning a helpdesk migration and want to protect your QA continuity?

Talk to the Revelir AI team about how RevelirQA can keep your scoring logic intact through the transition.

Visit Revelir AI at revelir.ai

References

[1] Best AI Tools for Support QA & Coaching in 2026 | IrisAgent (irisagent.com)
[2] Getting started with Zendesk QA: Admin guide - Zendesk help (support.zendesk.com)
[3] Best AI QA Software for Customer Service (2026 Buyer's Guide) (www.intryc.com)
[4] The Death of the QA Score (www.maestroqa.com)
[5] Best Analytics & QA AI Tools for Zendesk in 2026: Complete Guide (www.getmacha.com)
[6] Evaluating the performance of AI agents using Zendesk QA (www.eesel.ai)
