TL;DR
- AI QA scores are not portable by default - they depend on data schemas, scoring logic, and policy documents tied to your old helpdesk.
- The biggest migration risk is silent score drift: your numbers look consistent but are measuring different things before and after cutover.
- Export your QA scorecards, benchmarks, and reasoning traces before you cut over - these are your migration anchor points.
- Run both helpdesks in parallel for at least two to four weeks to validate score continuity before switching off the old system.
- Platforms that score against your own ingested SOPs (not platform-native rubrics) are significantly easier to migrate because the scoring logic travels with your policies, not with the helpdesk.
Why do AI QA scores break during a helpdesk migration?
The core problem is that most AI QA tools bind their scoring logic to the helpdesk's own data layer. When you move to a new platform, the ticket IDs change, metadata fields shift, and the historical context that calibrated your scores disappears. Traditional manual QA typically covers only one to five percent of conversations [3], and a migration compounds that problem by resetting whatever baseline you had built.
Three specific failure modes cause the most damage:
- Schema mismatch: Fields like "ticket tags," "channel," or "resolution type" are named and structured differently across helpdesks. Scoring rules that referenced those fields silently stop applying.
- Benchmark discontinuity: Agent scores from your old platform are not comparable to scores on the new one unless the rubric and input data are identical. A 90% score on Zendesk does not equal a 90% score on Salesforce unless you prove they were measured the same way.
- Policy document drift: If your QA tool was scoring against a knowledge base synced to the old helpdesk, that sync breaks at cutover. The AI starts scoring against outdated or missing policies [2].
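
One way to keep a schema mismatch from failing silently is to translate each helpdesk's fields into a single canonical schema and raise an error when a field the scoring rules depend on is absent. The sketch below assumes this approach; the platform keys and field names (`old_helpdesk`, `labels`, `close_reason`, and so on) are hypothetical placeholders, not any vendor's real export format.

```python
# Hypothetical field names for illustration; real helpdesk exports will differ.
FIELD_MAP = {
    "old_helpdesk": {"tags": "ticket_tags", "via": "channel", "resolution": "resolution_type"},
    "new_helpdesk": {"labels": "ticket_tags", "origin": "channel", "close_reason": "resolution_type"},
}

# Canonical fields the (assumed) scoring rules reference.
REQUIRED = {"ticket_tags", "channel", "resolution_type"}


def normalize(ticket: dict, source: str) -> dict:
    """Rename platform-specific fields to canonical names, failing loudly on gaps."""
    mapping = FIELD_MAP[source]
    canonical = {mapping[key]: value for key, value in ticket.items() if key in mapping}
    missing = REQUIRED - set(canonical)
    if missing:
        # Raising here surfaces the mismatch instead of letting rules stop applying.
        raise KeyError(f"{source} ticket is missing fields: {sorted(missing)}")
    return canonical
```

With a mapping like this, the same ticket exported from either platform normalizes to an identical record, and a missing field stops the pipeline rather than quietly skewing scores.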
What should you export before you cut over?
The pre-migration export is where most teams underinvest. The goal is to capture every artifact your QA scoring depends on, not just the ticket data itself.
| Artifact | Why It Matters for QA | Format to Export |
|---|---|---|
| QA scorecards | The rubric definition - binary, scored, or multi-option criteria | Structured JSON or CSV with criteria weights |
| Agent performance baselines | Rolling averages you will compare against post-migration | Per-agent, per-metric time series (minimum 90 days) |
| Reasoning traces | Audit trail proving how past scores were justified | Raw log exports, one row per evaluated ticket |
| Policy and SOP documents | The source of truth your AI scored against | Version-controlled document set with last-updated dates |
| Flagged conversation samples | Ground truth for recalibrating scores on the new platform | 100+ manually reviewed tickets across channels |
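
As a rough illustration of the scorecard row above, the following sketch serializes a rubric to structured JSON with criteria weights. The `Criterion` shape, the three `kind` labels, and the rule that weights sum to 1.0 are assumptions made for the example; your QA tool's actual export format may look quite different.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_KINDS = {"binary", "scored", "multi_option"}  # assumed labels


@dataclass
class Criterion:
    name: str
    kind: str      # one of ALLOWED_KINDS
    weight: float  # assumed convention: weights sum to 1.0


def export_scorecard(name: str, criteria: list[Criterion]) -> str:
    """Validate a rubric and return it as a JSON string suitable for archiving."""
    for criterion in criteria:
        if criterion.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown criterion kind: {criterion.kind}")
    total = sum(c.weight for c in criteria)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"criteria weights sum to {total}, expected 1.0")
    payload = {"scorecard": name, "criteria": [asdict(c) for c in criteria]}
    return json.dumps(payload, indent=2)
```

Validating at export time means a malformed rubric is caught while the old system is still available to check against.
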
The reasoning traces deserve special emphasis. If your QA tool does not expose a full audit trail showing which prompt, which documents, and which logic produced each score, you cannot prove score continuity to auditors or regulators after the migration. For fintech and other regulated operations, this is not optional.
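
A quick completeness check on exported traces can be sketched as follows. It assumes each trace is one row per evaluated ticket, and the field names (`prompt_version`, `document_ids`, `rationale`) are hypothetical stand-ins for whatever your tool uses to record the prompt, documents, and logic behind a score.

```python
# Hypothetical column names; adjust to match the actual export.
REQUIRED_TRACE_FIELDS = ("ticket_id", "score", "prompt_version", "document_ids", "rationale")


def incomplete_traces(rows: list[dict]) -> list[str]:
    """Return ticket IDs whose trace row has a missing or empty required field."""
    bad = []
    for row in rows:
        # A score of 0 is valid; only None, empty strings, and empty lists count as gaps.
        if any(row.get(field) in (None, "", []) for field in REQUIRED_TRACE_FIELDS):
            bad.append(str(row.get("ticket_id", "<unknown>")))
    return bad
```

Any ticket ID this returns is a score you would be unable to justify from the export alone.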
How do you run a parallel validation period without doubling your QA workload?
Running two systems at once has a real operational cost. The answer is to scope the parallel window tightly rather than running both platforms indefinitely.
A practical parallel validation approach:
- Select a representative ticket sample. Route the same live conversations through both the old and new QA scoring pipelines for two to four weeks. Aim for coverage across all channels and contact reasons, not just high-volume ones.
- Define your acceptable variance threshold before you start. Decide in advance how many points of difference on the same ticket count as a scoring inconsistency worth investigating. Without a threshold, every comparison becomes a debate.
- Check for systematic bias, not just average scores. A new platform might score one channel type consistently lower. Averages can mask channel-level drift.
- Validate your SOP ingestion. Run your flagged ground-truth ticket sample through the new system and confirm the AI is retrieving the correct policy documents before scoring [2].
- Sign off by contact reason, not globally. Approve migration for each ticket category independently. A smooth cutover for billing tickets does not mean your returns process is scoring correctly.
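
The threshold and bias checks above can be sketched in a few lines. This assumes you have paired scores for the same tickets from both pipelines; the 5-point default threshold is a placeholder, not a recommendation, and `segment` stands for whichever grouping you sign off on (channel or contact reason).

```python
from collections import defaultdict
from statistics import mean


def compare_scores(pairs, threshold=5.0):
    """Compare old and new scores for the same tickets.

    pairs: iterable of (ticket_id, segment, old_score, new_score).
    Returns (flagged_ticket_ids, mean_delta_per_segment).
    """
    flagged = []
    deltas = defaultdict(list)
    for ticket_id, segment, old_score, new_score in pairs:
        delta = new_score - old_score
        deltas[segment].append(delta)
        if abs(delta) > threshold:
            flagged.append(ticket_id)  # per-ticket inconsistency worth investigating
    # A mean delta far from zero in one segment suggests systematic bias there,
    # even when the overall average looks stable.
    bias = {segment: mean(values) for segment, values in deltas.items()}
    return flagged, bias
```

Reporting the mean delta per segment rather than one global figure is what exposes the case where one channel scores consistently lower while the overall average barely moves.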
Which QA architectures survive helpdesk migrations better than others?
Not all AI QA tools migrate equally well. The architectural difference that matters most is whether scoring logic lives inside the helpdesk or outside it.
- Helpdesk-native QA tools (scoring built into Zendesk, Salesforce, etc.) are tightly coupled to that platform's data model. Migration means rebuilding scoring rules from scratch on the new platform [5].
- API-connected QA scoring engines that ingest your own SOPs and scorecards sit outside the helpdesk layer. When you switch helpdesks, you reconnect the API and the scoring logic remains intact because it was never stored in the helpdesk to begin with [1].
RevelirQA follows the second model. It connects to any helpdesk via API and scores conversations against your SOPs ingested into its own vector database. When a client migrates helpdesks, the scoring rubric, policy documents, and reasoning logic stay in Revelir's layer - only the data connection changes. Xendit and Tiket.com each run thousands of tickets per week through RevelirQA across English, Indonesian, Thai, and Tagalog, and this separation means platform changes at the helpdesk level do not invalidate historical QA benchmarks.
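
The general idea of keeping scoring logic outside the helpdesk can be illustrated with a small connector interface. This is a generic sketch of the pattern, not RevelirQA's actual API: `HelpdeskConnector`, `OldHelpdesk`, `NewHelpdesk`, and `toy_rubric` are all hypothetical names, and the rubric is deliberately trivial.

```python
from typing import Callable, Iterable, Protocol


class HelpdeskConnector(Protocol):
    """Anything that can hand conversations to the scoring engine."""

    def fetch_conversations(self) -> Iterable[dict]: ...


class OldHelpdesk:
    def fetch_conversations(self) -> Iterable[dict]:
        # Stand-in for an API call to the old platform.
        return [{"id": "old-1", "text": "Refund issued within policy window."}]


class NewHelpdesk:
    def fetch_conversations(self) -> Iterable[dict]:
        # Stand-in for an API call to the new platform.
        return [{"id": "new-1", "text": "Refund issued within policy window."}]


def toy_rubric(text: str) -> int:
    """Trivial placeholder rubric; a real one would score against ingested SOPs."""
    return 100 if "policy" in text.lower() else 0


def score_all(connector: HelpdeskConnector, rubric: Callable[[str], int]) -> dict:
    """Score every conversation; the rubric never touches helpdesk internals."""
    return {conv["id"]: rubric(conv["text"]) for conv in connector.fetch_conversations()}
```

Because `score_all` depends only on the connector interface, swapping `OldHelpdesk()` for `NewHelpdesk()` changes where conversations come from without changing how they are scored.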
How do you re-establish agent baselines after the migration?
Even with a clean technical migration, agent baselines require a deliberate reset process. Scores from before the cutover should not be averaged with post-cutover scores until you have validated they are comparable.
- Establish a "migration epoch" in your reporting - a clear date boundary separating pre- and post-migration data.
- Build a new 30-day baseline on the new platform before drawing performance conclusions.
- Use your exported ground-truth sample to run a calibration session: have QA reviewers manually score the same tickets on the new platform and compare against the AI's output [3].
- Where the QA tool scores both human agents and AI agents on the same rubric, confirm both are recalibrated. A migration that resets human agent baselines but leaves chatbot scores unchanged creates a misleading quality comparison [6].
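
A minimal sketch of the epoch boundary and the calibration comparison, under stated assumptions: the cutover date is a made-up placeholder, scores are on a shared numeric scale, and mean absolute difference is just one simple way to measure how far AI scores sit from human reviewers on the same tickets.

```python
from datetime import date
from statistics import mean

MIGRATION_EPOCH = date(2025, 1, 15)  # hypothetical cutover date


def split_by_epoch(scores):
    """Split (date, score) records into pre- and post-migration score lists."""
    pre = [score for day, score in scores if day < MIGRATION_EPOCH]
    post = [score for day, score in scores if day >= MIGRATION_EPOCH]
    return pre, post


def calibration_gap(human: dict, ai: dict) -> float:
    """Mean absolute difference between human and AI scores on shared tickets."""
    shared = human.keys() & ai.keys()
    if not shared:
        raise ValueError("no tickets were scored by both human reviewers and the AI")
    return mean(abs(human[ticket] - ai[ticket]) for ticket in shared)
```

Keeping the two lists separate until the calibration gap is acceptably small prevents pre- and post-cutover scores from being averaged before they are shown to be comparable.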
Frequently Asked Questions
Revelir AI builds RevelirQA, an AI scoring engine that evaluates 100% of customer service conversations against a company's own policies and QA scorecards. Unlike manual review, which covers only a fraction of tickets, RevelirQA scores every conversation and produces a full reasoning trace behind each result - making it auditable for compliance-critical industries. The platform connects to any helpdesk via API, meaning scoring logic is portable across platform migrations. RevelirQA is in production at Xendit and Tiket.com, handling thousands of tickets per week across English, Indonesian, Thai, and Tagalog.
Planning a helpdesk migration and want to protect your QA continuity?
Talk to the Revelir AI team about how RevelirQA can keep your scoring logic intact through the transition.
Visit Revelir AI at revelir.ai
References
1. Best AI Tools for Support QA & Coaching in 2026 | IrisAgent (irisagent.com)
2. Getting started with Zendesk QA: Admin guide | Zendesk help (support.zendesk.com)
3. Best AI QA Software for Customer Service (2026 Buyer's Guide) (intryc.com)
4. The Death of the QA Score (maestroqa.com)
5. Best Analytics & QA AI Tools for Zendesk in 2026: Complete Guide (getmacha.com)
6. Evaluating the performance of AI agents using Zendesk QA (eesel.ai)
