EXTENDED_MYANMAR | Contains Extended Myanmar/Shan/Mon/Karen characters (U+1050-U+109F, or the Extended-A/B blocks at U+AA60-U+AA7F and U+A9E0-U+A9FF) | Encoding |
ZAWGYI_YA_ASAT | Zawgyi ya-medial used as pseudo-asat (e.g., ငျး) | Encoding |
ZAWGYI_YA_TERMINAL | Zawgyi ya-medial at word-final position | Encoding |
ZAWGYI_YA_RA | Zawgyi ya+ra medial combination | Encoding |
ASAT_BEFORE_VOWEL | Asat (်) appears before a vowel sign (invalid ordering) | Structural |
INCOMPLETE_VOWEL | Incomplete vowel pattern (e.g., vowel before asat, missing u-vowel in O-vowel) | Structural |
DIGIT_TONE | Myanmar digit followed by tone mark | Structural |
SCRAMBLED_ORDER | Scrambled character sequence (e.g., vowel-asat-vowel) | Structural |
INVALID_START | Word starts with invalid character (not consonant, independent vowel, or digit) | Structural |
DOUBLED_DIACRITIC | Doubled vowel or medial, or an invalid tone sequence | Structural |
VIRAMA_AT_END | Virama (္) at end of word (incomplete stacking) | Structural |
EMPTY_OR_WHITESPACE | Empty or whitespace-only input | Structural |
KNOWN_INVALID | Word is in the curated known-invalid words list | Quality |
FRAGMENT_PATTERN | Segmentation fragment (consonant + asat/tone only) | Segmentation |
DOUBLE_ENDING | Double-ending artifact (e.g., valid word + fragment merged) | Segmentation |
INCOMPLETE_WORD | Incomplete word (ends with medial, incomplete stacking, or bare consonant after medial) | Segmentation |
MIXED_LETTER_NUMERAL | Mixed Myanmar letter and numeral (should be split) | Quality |
ASAT_INITIAL | Asat-initial fragment (consonant+asat at word start) | Segmentation |
COMPOUND_TRUNCATED | Compound word with truncated ending | Quality |
MISSING_E_VOWEL | Missing ေ in ောင pattern (common typo) | Quality |
PURE_NUMERAL | Pure Myanmar numeral sequence (not a word) | Quality |
DOUBLED_CONSONANT | Two identical consonants only (segmentation artifact) | Quality |
INVALID_VOWEL_SEQUENCE_SYLLABLE | Invalid vowel sequence (e.g., doubled i-vowels, ာု) | Structural |
BARE_CONSONANT_END | Word ends with bare consonant without asat | Segmentation |
STACKED_CONSONANT_START | Word starts with stacked consonant marker (္) | Segmentation |
MEDIAL_START | Word starts with a medial (ျ ြ ွ ှ) | Segmentation |
DEPENDENT_VOWEL_START | Word starts with a dependent vowel sign | Segmentation |
GREAT_SA_START | Word starts with Great Sa (ဿ) | Segmentation |
ASAT_ANUSVARA_SEQUENCE | Contains phonetically impossible ်ံ sequence | Segmentation |
DOUBLED_INDEPENDENT_VOWEL | Two identical independent vowels (OCR error) | Segmentation |
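
Several of the position-based rules above reduce to simple code-point checks. The following is a minimal sketch under stated assumptions, not the library's actual implementation: the function name `first_rejection_reason`, the order in which rules are tried, and the exact code-point sets (for example, treating U+102B-U+1032 as the dependent vowels) are all hypothetical choices made for illustration.

```python
from typing import Optional

# Code points from the Unicode Myanmar block (U+1000-U+109F).
ASAT = "\u103A"       # ်
VIRAMA = "\u1039"     # ္ (stacking marker)
ANUSVARA = "\u1036"   # ံ
GREAT_SA = "\u103F"   # ဿ
MEDIALS = {"\u103B", "\u103C", "\u103D", "\u103E"}  # ျ ြ ွ ှ
# Assumption: dependent vowel signs are taken as U+102B..U+1032.
DEPENDENT_VOWELS = {chr(cp) for cp in range(0x102B, 0x1033)}
DIGITS = {chr(cp) for cp in range(0x1040, 0x104A)}  # ၀..၉


def first_rejection_reason(word: str) -> Optional[str]:
    """Return the first matching rule name, or None if no rule fires.

    Hypothetical helper: the rule priority here is an assumption.
    """
    if not word.strip():
        return "EMPTY_OR_WHITESPACE"
    first = word[0]
    if first == VIRAMA:
        return "STACKED_CONSONANT_START"
    if first in MEDIALS:
        return "MEDIAL_START"
    if first in DEPENDENT_VOWELS:
        return "DEPENDENT_VOWEL_START"
    if first == GREAT_SA:
        return "GREAT_SA_START"
    if word.endswith(VIRAMA):
        return "VIRAMA_AT_END"
    if ASAT + ANUSVARA in word:
        return "ASAT_ANUSVARA_SEQUENCE"
    if all(ch in DIGITS for ch in word):
        return "PURE_NUMERAL"
    return None
```

Only rules that depend on a single position or a fixed substring are covered here; rules such as `KNOWN_INVALID` or `COMPOUND_TRUNCATED` need a word list or segmentation context and are out of scope for this sketch.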