myspellchecker includes a lightweight, heuristic-based named entity recognition (NER) system that detects potential named entities and skips spell checking for them.
Approach: Heuristic & Rule-Based
Unlike heavy machine learning models (such as BERT or CRF) that require large labeled datasets, our NER system relies on linguistic cues specific to the Myanmar language. This makes it fast and effective for the narrow task of “ignoring names”.
1. Honorific Detection
Myanmar names are almost always preceded by an honorific (title). Detecting these titles is a high-precision signal that the next word is a name.
Supported honorifics:
- ဦး (U): Mr. (formal/older)
- ဒေါ် (Daw): Ms./Mrs.
- ကို (Ko): Mr. (younger/informal)
- မ (Ma): Ms.
- မောင် (Maung): Master (young male)
- ဒေါက်တာ (Dr.): Doctor
- ရှင် (Shin): Monk/Novice
- ဗိုလ် (Bo): Officer
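The honorific rule can be sketched in a few lines of Python. This is a minimal illustration, not myspellchecker's implementation: it assumes the sentence has already been segmented into word tokens, and the names `HONORIFICS` and `names_after_honorifics` are hypothetical.

```python
# Hypothetical sketch: mark the token right after a known honorific as a probable name.
# Assumes the input sentence is already segmented into word tokens.
HONORIFICS = {
    "ဦး",       # U
    "ဒေါ်",      # Daw
    "ကို",       # Ko
    "မ",        # Ma
    "မောင်",     # Maung
    "ဒေါက်တာ",  # Dr.
    "ရှင်",      # Shin
    "ဗိုလ်",     # Bo
}


def names_after_honorifics(tokens: list[str]) -> set[int]:
    """Return the indices of tokens that directly follow an honorific."""
    # tokens[:-1] skips the last token, since an honorific there has no follower.
    return {i + 1 for i, tok in enumerate(tokens[:-1]) if tok in HONORIFICS}
```

For example, `names_after_honorifics(["ဦး", "ဘ", "သည်"])` returns `{1}`. Note that ကို also serves as a grammatical particle and မ as a negation prefix, so a practical implementation would need additional context to avoid misreading those cases; this sketch ignores that.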
2. Regex Patterns
The system also whitelists non-dictionary tokens that match specific patterns:
- English words: `[A-Za-z]+` (e.g., “GPS”, “TV”)
- Numbers: standard digits (`0-9`) and Myanmar digits (`၀-၉`)
- Dates/symbols: `12/12/2024`, `ISO-8859`, etc.
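One way to express these whitelist checks is with regular expressions that must match the whole token. The sketch below assumes tokens are checked one at a time; the pattern names and the exact date/symbol expression are illustrative guesses, not the library's actual rules.

```python
import re

# Hypothetical patterns approximating the whitelist categories; real rules may differ.
ENGLISH_WORD = re.compile(r"[A-Za-z]+")
NUMBER = re.compile(r"[0-9၀-၉]+")  # ASCII digits plus Myanmar digits U+1040..U+1049
# Alphanumeric runs joined by "/", "-" or ".", e.g. 12/12/2024 or ISO-8859.
DATE_OR_SYMBOL = re.compile(r"[A-Za-z0-9၀-၉]+(?:[/\-.][A-Za-z0-9၀-၉]+)+")

WHITELIST = (ENGLISH_WORD, NUMBER, DATE_OR_SYMBOL)


def is_whitelisted(token: str) -> bool:
    """True if the entire token matches one of the whitelist patterns."""
    return any(pattern.fullmatch(token) for pattern in WHITELIST)
```

Using `fullmatch` rather than `search` keeps a Myanmar word that merely contains a digit from being whitelisted. Python's `\d` would also match Myanmar digits under Unicode matching, but an explicit character class makes the intent visible.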
3. Common Name Syllables
A “soft” heuristic checks for syllables that appear frequently in names but rarely in other contexts (e.g., သီ, နွယ်, ဆွေ in certain positions). Because this signal is weaker than an honorific, it is used only in conjunction with context checks.
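A minimal sketch of how a soft syllable signal might be gated by context. The syllable set, the 0.5 threshold, and the `near_name` flag (standing in for an unspecified context check, assumed here to mean "an adjacent token was already identified as a name") are all assumptions for illustration.

```python
# Illustrative only: this syllable set and the threshold below are assumptions.
NAME_SYLLABLES = {"သီ", "နွယ်", "ဆွေ"}


def name_syllable_ratio(syllables: list[str]) -> float:
    """Fraction of a word's syllables that commonly occur in names."""
    if not syllables:
        return 0.0
    hits = sum(1 for s in syllables if s in NAME_SYLLABLES)
    return hits / len(syllables)


def is_probable_name(syllables: list[str], in_dictionary: bool,
                     near_name: bool, threshold: float = 0.5) -> bool:
    """Soft check: name-like syllables count only when context agrees.

    near_name is a hypothetical context check, e.g. "an adjacent
    token was already identified as a name".
    """
    if in_dictionary:
        return False  # ordinary dictionary words need no name exemption
    if not near_name:
        return False  # the soft signal alone is too weak to trust
    return name_syllable_ratio(syllables) >= threshold
```

For example, `is_probable_name(["သီ", "ရိ"], in_dictionary=False, near_name=True)` is `True`, while the same call with `near_name=False` is `False`.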
Usage
NER is enabled by default in `SpellCheckerConfig`.
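The effect of the NER switch on a checking loop can be sketched as follows. `CheckerOptions.ner_enabled` is a hypothetical stand-in, not the real `SpellCheckerConfig` field, and the dictionary lookup is simplified to set membership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckerOptions:
    # Hypothetical stand-in for the NER switch in SpellCheckerConfig
    # (the real field name may differ). Enabled by default, as documented.
    ner_enabled: bool = True


def flag_errors(tokens: list[str], dictionary: set[str],
                name_indices: set[int],
                options: CheckerOptions = CheckerOptions()) -> list[int]:
    """Return indices of out-of-dictionary tokens, skipping probable names."""
    flagged = []
    for i, tok in enumerate(tokens):
        if tok in dictionary:
            continue  # known word: nothing to report
        if options.ner_enabled and i in name_indices:
            continue  # probable named entity: skip spell checking
        flagged.append(i)
    return flagged
```

With `tokens = ["ဦး", "ဘ", "သွား"]`, `dictionary = {"ဦး", "သွား"}` and `name_indices = {1}`, the default options report nothing, while `CheckerOptions(ner_enabled=False)` reports index `1`. The options object is frozen so that sharing one default instance across calls is safe.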
Internal Class: `NameHeuristic`
For advanced usage or direct access:
Limitations
- Recall vs. Precision: This system prioritizes precision (not mistaking genuine misspellings for names, which would hide real errors) over recall (finding every single name). It catches roughly 80% of common personal names.
- Places/Organizations: It is less effective at detecting place or company names unless they are already in the dictionary or consist of Latin (English-like) letters.
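In this setting, a true positive (TP) is a real name that the heuristics skip, a false positive (FP) is a non-name, such as a genuine misspelling, that is wrongly skipped, and a false negative (FN) is a real name that is missed and may therefore be flagged. The standard definitions are:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
```

Favoring precision keeps FP low so that real errors are still reported; the roughly 80% figure corresponds to a recall of about 0.8 on common personal names.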