TL;DR
- Driver support and passenger support require different QA metrics because the stakes, policies, and conversation goals are structurally different.
- A universal scorecard creates blind spots - what counts as a good resolution for a passenger complaint is the wrong benchmark for a driver earnings dispute.
- Configuring separate scorecards by queue, not just by agent team, is the most effective structural fix.
- Automated QA scoring at 100% coverage catches policy misses that manual sampling at 1-5% is likely to overlook.
- Ride-hailing platforms running both human agents and AI chatbots need a scoring system that evaluates both on the same underlying standard.
Why can't ride-hailing platforms use one QA scorecard for all support?
Ride-hailing platforms are two-sided marketplaces, and that structural fact flows directly into customer service [2]. Passengers and drivers do not just have different problems - they have different relationships with the platform, different leverage, and different regulatory protections depending on the market [4]. A single scorecard treats these as the same, which means it optimises for neither.
Consider what "resolution" means in each queue:
- Passenger queue: Resolution typically means the rider feels fairly treated - a refund processed, a safety complaint logged, a booking issue corrected. Speed and empathy carry high weight.
- Driver queue: Resolution often means a policy outcome - a correctly applied earnings adjustment, an accurate account status decision, a document verification completed per local compliance rules [4]. Accuracy and policy adherence carry far higher weight than tone alone.
Scoring both queues on the same QA scorecard punishes driver-queue agents for being "too transactional" or rewards passenger-queue agents for being warm while missing a refund policy breach. Neither outcome improves quality.
"The measurement instrument you use determines what you improve. A shared scorecard optimises for the average, not the outcome that matters to each side of the market."
What QA metrics belong specifically in a driver support scorecard?
Building on the structural difference above, the harder question is which specific criteria should appear in a driver scorecard that should not appear - or should be weighted differently - in a passenger scorecard.
| QA Metric | Driver Support | Passenger Support |
|---|---|---|
| Policy accuracy on earnings/incentives | Critical - binary pass/fail | Not applicable |
| Account action correctness (suspend/restore) | Critical - binary pass/fail | Low relevance |
| Document verification compliance | Required in regulated markets [4] | Not applicable |
| Empathy and tone | Scored, but lower weight | High weight |
| Refund policy adherence | Low relevance | Critical - binary pass/fail |
| Safety incident escalation protocol | Moderate (driver-reported incidents) | Critical (passenger safety reports) [3] |
| First-contact resolution | Important, but may require escalation | High weight |
| Accurate explanation of surge/pricing rules | High weight (driver earnings disputes) | Moderate (passenger fare queries) [5] |
The key principle: driver support errors carry legal and financial risk to the business. A wrongly suspended driver account or an incorrect incentive calculation is not just a service experience issue; it is a liability. That demands binary, auditable scoring on policy criteria, not just a satisfaction score.
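One way to express the split between binary policy criteria and weighted quality criteria is sketched below. This is a minimal illustration, not any vendor's implementation: the `Criterion` class, the criterion names, the weights, and the 1-5 scale are all assumptions made for the example. The point it shows is that a single failed binary criterion fails the conversation regardless of how well the scored criteria went.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    kind: str        # "binary" (pass/fail) or "scored" (1-5 scale)
    weight: int = 0  # only used for scored criteria


# Hypothetical driver scorecard; names and weights are illustrative only.
DRIVER_SCORECARD = [
    Criterion("earnings_policy_accuracy", "binary"),
    Criterion("account_action_correctness", "binary"),
    Criterion("empathy_and_tone", "scored", weight=1),
    Criterion("pricing_rule_explanation", "scored", weight=3),
]


def evaluate(scorecard, ratings):
    """Evaluate one conversation.

    ratings maps criterion name -> bool for binary criteria,
    or an int from 1 to 5 for scored criteria.
    """
    # Any failed binary criterion fails the whole conversation.
    failed = [c.name for c in scorecard
              if c.kind == "binary" and not ratings[c.name]]

    # Scored criteria are normalised to 0-1 and combined by weight.
    scored = [c for c in scorecard if c.kind == "scored"]
    total_weight = sum(c.weight for c in scored)
    quality = None
    if total_weight:
        quality = sum(c.weight * (ratings[c.name] - 1) / 4
                      for c in scored) / total_weight

    return {"passed": not failed, "failed_criteria": failed, "quality": quality}


result = evaluate(DRIVER_SCORECARD, {
    "earnings_policy_accuracy": True,
    "account_action_correctness": False,
    "empathy_and_tone": 5,
    "pricing_rule_explanation": 3,
})
```

Keeping the pass/fail result separate from the quality score means a warm, well-explained conversation that applied the wrong account action still shows up as a failure in the audit view.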
How should passenger support QA rules be configured differently?
Stepping back from the compliance-heavy driver side, passenger support has its own set of non-negotiable criteria, several of which would be irrelevant or over-weighted if applied to driver conversations through a shared scorecard.
Passenger support QA should prioritise:
- Safety escalation completeness: Any report involving in-trip safety - harassment, route deviation, unsafe driving - must trigger a defined escalation path. A missed escalation is a critical fail, not a coaching note [3].
- Refund and fare policy accuracy: Passengers interact with platform payments at the end of every trip [5]. An agent who misquotes or misapplies the refund policy creates chargeback exposure and erodes trust.
- Empathy in complaint handling: Research on ride-hailing service quality consistently finds that perceived fairness and responsiveness drive passenger satisfaction more than resolution speed alone [2]. Tone and acknowledgment belong in the scorecard with meaningful weight.
- Channel continuity: Passengers increasingly start interactions in-app and escalate to live support [1]. A QA scorecard that does not account for context-carrying - did the agent read the prior in-app interaction? - misses a major failure mode.
What is the practical step-by-step process for configuring two scorecards?
A related but distinct question is how to operationalise this split without creating two parallel QA systems that drift apart or become impossible to manage. The answer is to treat the scorecards as separate configurations of the same scoring engine, not separate manual processes.
- Audit your existing tickets by queue. Pull a sample of driver-queue and passenger-queue conversations separately. List every policy or SOP that was or should have been applied. These become your candidate criteria.
- Assign each criterion to the scorecard where it applies. Shared criteria (e.g. professional language, response time SLA) can appear in both. Queue-specific criteria (e.g. earnings dispute accuracy, safety escalation) belong only in their respective scorecard.
- Set scoring type per criterion. Binary (pass/fail) for compliance and policy criteria. Scored (1-3 or 1-5) for quality criteria like empathy or explanation clarity. Avoid using scored scales on criteria where the only acceptable answer is "correct."
- Ingest your SOPs into the scoring engine by queue. A QA system that scores against your actual refund policy, your actual driver incentive terms, and your actual escalation protocol - retrieved at the point of evaluation - will produce far more accurate results than one scoring against generic benchmarks.
- Run both scorecards on 100% of conversations. Manual QA at 1-5% sampling leaves 95-99% of conversations unreviewed, so systematic policy misses can go undetected. Automated scoring closes that gap.
- Review calibration monthly. Criteria weights should shift as platform policies change - especially in regulated markets where driver licensing and surge pricing rules evolve [4].
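The idea of two scorecards as configurations of one engine, rather than two separate processes, can be sketched as plain data plus a small builder. Everything named here (`SHARED_CRITERIA`, `QUEUE_CRITERIA`, `build_scorecard`, and the individual criterion names) is hypothetical and chosen only to mirror the steps above; a real setup would load these from wherever the team maintains its QA configuration.

```python
# Hypothetical criteria; "binary" is pass/fail, "scored" is a 1-5 scale.
SHARED_CRITERIA = {
    "professional_language": "binary",
    "response_time_sla": "binary",
}

QUEUE_CRITERIA = {
    "driver": {
        "earnings_dispute_accuracy": "binary",
        "explanation_clarity": "scored",
    },
    "passenger": {
        "safety_escalation": "binary",
        "refund_policy_adherence": "binary",
        "empathy": "scored",
    },
}


def check_no_overlap():
    """Raise if a queue-specific criterion is also shared or appears in two queues."""
    seen = set(SHARED_CRITERIA)
    for queue, criteria in QUEUE_CRITERIA.items():
        clash = seen & set(criteria)
        if clash:
            raise ValueError(f"{queue}: duplicated criteria {sorted(clash)}")
        seen |= set(criteria)


def build_scorecard(queue):
    """Return the full criterion -> scoring type mapping for one queue."""
    if queue not in QUEUE_CRITERIA:
        raise ValueError(f"unknown queue: {queue!r}")
    check_no_overlap()
    return {**SHARED_CRITERIA, **QUEUE_CRITERIA[queue]}


driver_scorecard = build_scorecard("driver")
passenger_scorecard = build_scorecard("passenger")
```

Because shared criteria live in one place, a change to the response time SLA reaches both scorecards at once, which is one way to limit the drift between queues that the process above is trying to avoid.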
How does this apply when AI chatbots handle part of the queue?
Building on the configuration framework above, ride-hailing platforms increasingly route first-contact passenger queries to AI chatbots while escalating complex driver issues to human agents [4]. This creates a quality blind spot if the scoring system only evaluates human agents.
The same two-scorecard logic applies to AI agents as to humans. A chatbot handling passenger fare queries should be scored on the same passenger scorecard criteria as a human rep. An AI handling driver document verification should be scored against driver-queue policy criteria. Consistency across agent type is what makes the QA data actionable - if the AI chatbot is failing the safety escalation criterion at a higher rate than human agents, that is a product issue, not a coaching issue, and it only surfaces when both are scored on the same QA scorecard.
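As a rough illustration of why shared criteria make the comparison possible, the sketch below computes the fail rate on one binary criterion separately for each agent type. The record layout (`agent_type`, `results`) and the sample data are invented for the example and do not reflect any particular platform's export format.

```python
from collections import defaultdict


def fail_rates(evaluations, criterion):
    """Return {agent_type: fail rate} for one binary criterion.

    Each evaluation is a dict with an "agent_type" string and a "results"
    dict mapping criterion name -> True (pass) or False (fail).
    Evaluations that did not score the criterion are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # agent_type -> [fails, total]
    for ev in evaluations:
        if criterion not in ev["results"]:
            continue
        tally = counts[ev["agent_type"]]
        tally[1] += 1
        if not ev["results"][criterion]:
            tally[0] += 1
    return {agent: fails / total for agent, (fails, total) in counts.items()}


# Invented sample data for illustration only.
sample = [
    {"agent_type": "ai", "results": {"safety_escalation": False}},
    {"agent_type": "ai", "results": {"safety_escalation": True}},
    {"agent_type": "human", "results": {"safety_escalation": True}},
    {"agent_type": "human", "results": {"safety_escalation": True}},
    {"agent_type": "human", "results": {"refund_policy_adherence": False}},
]

rates = fail_rates(sample, "safety_escalation")
```

With real volumes, a persistent gap between the two rates on the same criterion is the signal described above. The comparison is only meaningful because both agent types were scored against the identical criterion definition.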
About Revelir AI
Revelir AI is the company behind RevelirQA, an AI customer service QA software platform built for global enterprises operating at high volume. RevelirQA scores 100% of service conversations against a business's own SOPs and QA scorecards - retrieved via RAG at the point of evaluation - replacing manual sampling that covers only 1-5% of tickets. The platform runs in production at enterprise clients including Xendit and Tiket.com, scoring thousands of conversations per week across English, Indonesian, Thai, and Tagalog. RevelirQA evaluates both human agents and AI chatbots under a single consistent scoring framework, giving CX and support operations teams a unified view of quality across their entire operation. Every evaluation carries a full audit trace - prompt, documents retrieved, and reasoning - making it suitable for regulated and compliance-sensitive industries.
Ready to configure separate QA scorecards for your driver and passenger support queues?
See how RevelirQA can score 100% of your conversations against your own policies - with full audit trails and multilingual support. Visit Revelir AI to learn more or get in touch.
References
- [1] Step by Step Guide to Creating a Ride Hailing App Like Uber (www.abbacustechnologies.com)
- [2] Measuring customer-perceived service quality in the ride-hailing industry: a generic approach for the development and validation of a multidimensional scale | Humanities and Social Sciences Communicat (www.nature.com)
- [3] Common Safety Features Every Ride-Hailing App Should ... (www.radicalstart.com)
- [4] 5 AI Agents in Ride-Hailing Transforming Mobility (2026) | Digiqt Blog (digiqt.com)
- [5] U.S Ride Hailing Market Size, Share, Growth & Trends, 2034 (www.marketdataforecast.com)
