How Filipino, Thai, and Indonesian Service Teams Handle...

Filipino, Thai, and Indonesian service teams routinely blend local vernacular, code-switching, and culturally specific politeness markers into customer conversations. When a formal QA scorecard evaluates these interactions using criteria designed for standard English, it penalises linguistic behaviour that is actually correct, professional, and customer-appropriate in context. The fix is not to lower QA standards; it is to build scorecards that understand what "professional" looks like in each language and culture. This guide explains the patterns, the scoring pitfalls, and how enterprise CX teams can close the gap.

TL;DR

Code-switching and informal politeness markers are professionally normal in Filipino, Thai, and Indonesian service contexts, not quality failures.
Generic, English-first QA scorecards systematically misgrade agents working in these languages.
Each language has distinct formality structures that QA criteria must reflect to score fairly and accurately.
Indonesian regulations reinforce a formal/informal language distinction that has direct implications for written service communications ^[2].
AI-powered QA engines that ingest your own SOPs and score in the agent's actual language eliminate the bias that manual sampling and generic scorecards introduce.

About the Author: Revelir AI builds QA scoring infrastructure for enterprise customer service teams, with RevelirQA already scoring thousands of conversations per week in Indonesian and English for clients including Xendit and Tiket.com. The company's scoring engine is designed for multilingual, high-volume environments across Southeast Asia and globally.

Why Does Informal Language Create a Problem for Formal QA Scorecards?

The core tension is between how a QA scorecard defines "professionalism" and how different languages and cultures express it. Most enterprise QA frameworks were built around formal, written English: complete sentences, no contractions, explicit acknowledgment phrases, and a neutral register. Applied directly to Filipino, Thai, or Indonesian conversations, these criteria produce scores that punish agents for communicating correctly in their language.

Consider a Filipino service agent who writes "Hi po, okay po ba kayo?" The honorific "po" is a formal politeness particle; omitting it would actually read as rude to a Filipino customer. A scorecard that flags non-English particles as "informal language" has just penalised professionalism. Multiply this across thousands of tickets per week and QA data becomes unreliable as a coaching tool.

What Are the Specific Language Patterns CX Teams Need to Understand?

The harder question is which patterns appear in each language and what they mean for scoring. The table below maps the most common ones.

Language	Pattern	What It Signals	Common QA Misreading
Filipino (Tagalog)	"Po/Opo" honorifics; Taglish code-switching (e.g. "I will check na po")	Formal respect; culturally standard blend of Tagalog and English	Flagged as informal or grammatically incomplete
Thai	Sentence-final particles "ครับ/ค่ะ" (khrap/kha); polite verb forms	Gender-appropriate formality; absence signals rudeness, not informality	Ignored entirely by English-based criteria; scored as neutral when absence should be flagged
Indonesian (Bahasa Indonesia)	Formal "Anda" vs. colloquial "kamu"; Jakarta slang; regional dialect words	Clear register distinction between formal and colloquial Indonesian ^[1]	Colloquial forms scored the same as formal, missing a meaningful quality signal

Indonesian presents a particularly well-documented case. Bahasa Indonesia has a codified distinction between its formal and colloquial registers, and Indonesian regulations on language use cover professional written communications ^[2]. This means an Indonesian service team that uses "kamu" instead of "Anda" in a written ticket is not just stylistically informal; in regulated or government-adjacent contexts, it may fall short of a documented communication standard.

How Does Code-Switching Affect QA Accuracy?

A related but distinct question is what happens when agents switch between two languages mid-conversation. Code-switching is not a quality failure; in markets like the Philippines, it is the natural register for professional digital communication ^[3]. Filipino agents frequently combine English and Tagalog in ways that feel both natural and respectful to customers. A QA framework that reads code-switching as "inconsistent language use" will generate misleading coaching flags.

The practical implication is that QA criteria need to treat code-switching as its own item, with three distinct cases:

Intentional and appropriate: Agent matches the customer's own language mix, signals fluency and rapport. Score neutral or positive.
Inconsistent and disjointed: Agent switches register without following the customer's lead, creating confusion. Flag for coaching.
Policy-driven language requirement: If an SOP requires a specific language for certain communication types (common in Indonesian regulated environments ^[2]), deviation should be flagged regardless of fluency.
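The three cases above can be sketched as a small decision rule. This is a minimal illustration in Python, not a production classifier: it assumes an upstream step has already labelled which languages appear on each side of the conversation, and the names `Verdict` and `score_code_switching` are hypothetical.

```python
from enum import Enum
from typing import Optional, Set


class Verdict(Enum):
    OK = "neutral_or_positive"
    COACH = "flag_for_coaching"
    POLICY = "flag_policy_deviation"


def score_code_switching(
    customer_langs: Set[str],
    agent_langs: Set[str],
    required_lang: Optional[str] = None,
) -> Verdict:
    """Classify an agent's language mix against the customer's.

    customer_langs / agent_langs are sets of language codes detected in each
    side's messages (assumed to come from a separate language-ID step).
    """
    # Policy-driven requirement: an SOP-mandated language is checked first,
    # so a deviation is flagged regardless of fluency.
    if required_lang is not None and agent_langs != {required_lang}:
        return Verdict.POLICY
    # Intentional and appropriate: the agent stays within the customer's mix.
    if agent_langs <= customer_langs:
        return Verdict.OK
    # Otherwise the agent introduced a language the customer did not use.
    return Verdict.COACH


# Taglish customer, Taglish agent: treated as appropriate.
print(score_code_switching({"tl", "en"}, {"tl", "en"}))
# English-only customer, agent adds Tagalog: flagged for coaching.
print(score_code_switching({"en"}, {"tl", "en"}))
```

Checking the policy rule before the rapport rule mirrors the third case in the list: where an SOP mandates a language, fluency does not excuse a deviation.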

What Should a Multilingual QA Scorecard Actually Include?

The practical question is how to operationalise these distinctions in a scorecard that can be applied consistently at scale. Generic criteria like "professional tone" or "clear communication" are not enough; they depend on each reviewer's subjective interpretation, which does not hold up across languages or at volume.

A well-designed multilingual QA scorecard should specify:

Language-specific register criteria: Define what formal Indonesian looks like (use of "Anda," complete sentences, no Jakarta slang in written channels) separately from what formal Tagalog looks like.
Approved code-switching boundaries: Document which language combinations are acceptable in which channels, so agents have a clear standard and QA has a clear criterion.
Politeness marker requirements: For Thai, specify that "ครับ/ค่ะ" is required in written formal channels, not optional.
Policy-language alignment: Where local regulation or company SOP mandates a specific language for formal communications, that criterion should be a scored binary item, not a qualitative judgment ^[2].
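A register criterion of this kind can be written as a scored binary item that returns its own evidence. The sketch below is one way to do that in Python for written Indonesian. The word list is a short illustrative sample and the function name `indonesian_register_check` is hypothetical; a real list would come from local team leads and the company's SOP.

```python
import re

# Illustrative colloquial markers only (assumption: supplied by local reviewers).
COLLOQUIAL_MARKERS = {"kamu", "lo", "lu", "gue", "gua", "nggak", "gak"}


def indonesian_register_check(message: str) -> dict:
    """Binary check: pass only if no listed colloquial marker appears."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    colloquial_hits = sorted(tokens & COLLOQUIAL_MARKERS)
    return {
        "criterion": "formal_register_id",
        "passed": not colloquial_hits,
        "evidence": colloquial_hits,  # the words that triggered a fail
    }


print(indonesian_register_check("Mohon maaf, Anda dapat mencoba lagi."))
print(indonesian_register_check("Kamu bisa coba lagi, gak masalah."))
```

Returning the matched words alongside the pass/fail result gives reviewers and agents something concrete to discuss, rather than a bare score.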

This is where AI-powered QA has a structural advantage over manual review. A platform like RevelirQA ingests the company's own SOPs and knowledge base into a vector database, then retrieves the relevant policy before scoring each conversation. The AI is not guessing what "professional Indonesian" means; it is checking the conversation against the company's documented standard, in the agent's actual language. Xendit and Tiket.com run this process across thousands of tickets per week, catching policy misses that would never surface in a 1-5% manual sample.
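The retrieve-then-score pattern described above can be illustrated with a toy example. The sketch below is not RevelirQA's implementation: it swaps the embedding model and vector database for simple word counts and cosine similarity so that it runs with the standard library alone, and the policy texts and names (`embed`, `retrieve_policy`) are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a learned model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_policy(conversation: str, policies: Dict[str, str]) -> str:
    """Return the id of the policy most similar to the conversation."""
    query = embed(conversation)
    return max(policies, key=lambda pid: cosine(query, embed(policies[pid])))


# Hypothetical SOP snippets standing in for an ingested knowledge base.
policies = {
    "refund_sop": "refund requests must be confirmed in writing within three days",
    "language_sop": "written replies to customers must use formal Indonesian "
    "and address the customer as Anda",
}
conversation = "customer asks about a refund and agent promises refund within three days"

policy_id = retrieve_policy(conversation, policies)
print(policy_id, "->", policies[policy_id])
```

The retrieved policy text would then be passed to the scoring step together with the conversation, so the score is tied to a documented standard rather than to a general notion of professionalism.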

What Are the Operational Risks of Getting This Wrong?

The downstream risks extend beyond bad QA data. When agents in Manila, Bangkok, or Jakarta are consistently scored down for culturally correct behaviour, several things happen:

Coaching feedback loses credibility; agents stop trusting the QA system.
Performance data becomes skewed by language rather than actual quality, making it harder to identify genuine policy misses.
High performers who communicate well in their language appear lower-ranked than they should, distorting workforce decisions.
In Indonesian regulated contexts, the inverse risk also exists: agents using colloquial language in formal written communications may be in breach of documented policy without anyone catching it ^[2].

Frequently Asked Questions

Can a single QA scorecard cover multiple languages, or do you need a separate one per language?

You can use one scorecard structure with language-specific criteria modules. The core evaluation dimensions (policy compliance, resolution accuracy, tone) stay consistent. What changes are the specific definitions within each dimension, because "formal tone" looks different in Thai versus Indonesian.
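One way to picture this is a shared set of dimensions with per-language overrides. The Python sketch below is a hypothetical data layout, not a prescribed schema; the dimension names come from the answer above, while `LanguageModule`, `build_scorecard`, and the definition texts are assumptions made for illustration.

```python
from dataclasses import dataclass

# Core dimensions stay the same for every language.
DIMENSIONS = ("policy_compliance", "resolution_accuracy", "tone")


@dataclass(frozen=True)
class LanguageModule:
    language: str
    definitions: dict  # dimension name -> language-specific definition


def build_scorecard(module: LanguageModule, shared: dict) -> dict:
    """Merge shared definitions with a language module; the module wins on overlap."""
    unknown = set(module.definitions) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {dim: module.definitions.get(dim, shared[dim]) for dim in DIMENSIONS}


shared = {
    "policy_compliance": "Response follows the relevant SOP",
    "resolution_accuracy": "Customer's issue is correctly resolved",
    "tone": "Professional tone",
}
thai = LanguageModule("th", {"tone": "Polite particle ครับ/ค่ะ used in written formal channels"})
indonesian = LanguageModule("id", {"tone": 'Uses "Anda"; no Jakarta slang in written channels'})

for module in (thai, indonesian):
    print(module.language, build_scorecard(module, shared))
```

Rejecting unknown dimensions keeps every language module inside the same structure, which is what makes scores comparable across teams.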

Is code-switching in Filipino customer service a compliance risk?

Generally no, unless an SOP explicitly mandates a single language for a specific channel. Taglish is the de facto professional register for digital customer service in the Philippines and should be treated as such in QA criteria ^[3].

Does Indonesian law require formal Bahasa Indonesia in all customer service communications?

Indonesian regulations require formal Indonesian in professional written communications, particularly in contexts involving government institutions or formal agreements ^[2] ^[4]. Customer service teams in regulated industries should check whether their communications fall under these requirements and document the standard in their SOPs accordingly.

How does an AI QA engine handle Thai politeness particles like "ครับ/ค่ะ"?

An AI scoring engine trained or prompted on Thai language conventions can check for the presence or absence of these particles as a scored criterion. The key is that the criterion must be explicitly defined in the QA scorecard; an AI that is only checking for generic "polite language" will not catch missing register markers.
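As a rough illustration of an explicitly defined criterion, the sketch below checks each agent message for the two particles named above. It is a deliberately simple substring heuristic written in Python under stated assumptions: it ignores other polite forms and cannot judge whether the particle is used naturally, and the function names are hypothetical.

```python
from typing import List, Sequence

# Only the two particles named in the scorecard; other forms are out of scope here.
THAI_POLITE_PARTICLES = ("ครับ", "ค่ะ")


def has_polite_particle(
    message: str, particles: Sequence[str] = THAI_POLITE_PARTICLES
) -> bool:
    """True if any listed particle appears anywhere in the message."""
    return any(p in message for p in particles)


def score_thai_politeness(agent_messages: List[str]) -> dict:
    """Binary criterion: every agent message must contain a polite particle."""
    missing = [i for i, m in enumerate(agent_messages) if not has_polite_particle(m)]
    return {
        "criterion": "thai_polite_particle",
        "passed": not missing,
        "missing_in": missing,  # indexes of messages without a particle
    }


print(score_thai_politeness(["สวัสดีครับ", "รอสักครู่"]))
```

Whether the requirement applies to every message or only to openings and closings is a scorecard decision; the point is that the rule is written down, so a missing marker is caught rather than scored as neutral.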

What is the biggest mistake enterprise CX teams make when rolling out QA in Southeast Asia?

Applying an English-first scorecard without localising the criteria. This is not a translation problem; it is a criteria design problem. The fix is to involve local team leads in defining what "professional" and "compliant" mean in each language before scoring begins.

How does RevelirQA handle multilingual scoring at scale?

RevelirQA scores conversations in the agent's actual language, including Indonesian and Tagalog, against the company's own policies retrieved via RAG before each evaluation. Every score carries a full reasoning trace, so QA managers can see exactly why a specific linguistic choice was flagged or passed.

About Revelir AI: Revelir AI builds RevelirQA, an AI quality assurance platform for customer service teams that need to move beyond manual sampling. RevelirQA scores 100% of service conversations against each client's own policies and QA scorecard, with full AI observability on every evaluation. The platform is in production at Xendit and Tiket.com, scoring thousands of tickets per week in multilingual environments spanning Indonesian, English, Thai, and Tagalog. RevelirQA evaluates both human agents and AI systems, giving CX leaders a single, consistent view of quality across their entire service operation. Revelir AI is headquartered in Singapore and built for global enterprise deployment.

Ready to build a QA framework that actually works in Filipino, Thai, or Indonesian?

Talk to the team at Revelir AI to see how RevelirQA scores multilingual conversations against your own policies, at full volume, with a complete audit trail on every score.

References

[1] Colloquial and Formal Indonesian (indonesian-online.com)
[2] Requirement to use the Indonesian language is regulated further - In-House Community (inhousecommunity.com)
[3] Comparative Study of Indonesian Pre-Service Teachers' Challenges and Strategies in Thailand and Australia - Acuity: Journal of English Language Pedagogy, Literature and Culture (jurnal.unai.edu)
[4] Indonesia - The Requirement To Use The Indonesian Language Is Regulated Further - Conventus Law (conventuslaw.com)

How Filipino, Thai, and Indonesian Support Teams Handle Informal Language in Formal QA Frameworks - A Practical Guide for Enterprise CX Leaders