Overview
The Syntactic Rule Checker runs after word validation but before the statistical N-gram check. It is designed to catch:
- Grammatically Impossible Sequences: e.g., two verbs without a particle between them.
- Particle Errors: Using a noun particle after a verb, or vice versa.
- Medial Consonant Confusions: ျ vs ြ based on linguistic roots (handled via lookup).
- Sentence Structure: Missing sentence-final particles.
Architecture
The grammar system consists of:
Core Components
| Component | Location | Description |
|---|---|---|
| SyntacticRuleChecker | src/myspellchecker/grammar/engine.py | Main grammar engine |
| GrammarEngineConfig | src/myspellchecker/core/config/grammar_configs.py | Engine configuration |
| Grammar Checkers | src/myspellchecker/grammar/checkers/ | Specialized checkers |
Grammar Checkers
| Checker | File | Purpose |
|---|---|---|
| AspectChecker | grammar/checkers/aspect.py | Aspect marker validation |
| ClassifierChecker | grammar/checkers/classifier.py | Classifier usage |
| CompoundChecker | grammar/checkers/compound.py | Compound word validation |
| NegationChecker | grammar/checkers/negation.py | Negation patterns |
| RegisterChecker | grammar/checkers/register.py | Formal/informal register |
YAML Rule Files
Grammar rules are defined in YAML files located in src/myspellchecker/rules/:
| File | Purpose |
|---|---|
| grammar_rules.yaml | Core grammar rules |
| particles.yaml | Particle definitions and rules |
| aspects.yaml | Aspect marker rules |
| classifiers.yaml | Classifier patterns |
| compounds.yaml | Compound word rules |
| negation.yaml | Negation patterns |
| register.yaml | Register (formal/informal) rules |
| homophones.yaml | Homophone confusion patterns |
| typo_corrections.yaml | Common typo corrections |
| tone_rules.yaml | Tone mark rules |
Key Features
1. Particle Agreement
Myanmar particles are highly specific to the part of speech they modify.
- Verb Particles: မယ်, ခဲ့, နေ must follow verbs.
- Noun Particles: မှာ, က, ကို must follow nouns.

Example:
- Input: ကျောင်း သွား မှာ (“School go at”)
- Analysis: သွား is a Verb. မှာ is usually a location marker (Noun particle).
- Correction: Suggest မယ် (Future tense) or မလား (Question) depending on context, or flag as suspicious.
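The agreement check described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function name `check_particle_agreement`, the tag names `"N"`, `"V"`, `"P"`, and the hard-coded particle sets are assumptions made for the example. The actual rules presumably live in particles.yaml.

```python
from typing import List, Tuple

# Particle sets copied from the examples above; hypothetical stand-ins for
# whatever the real rule files define.
VERB_PARTICLES = {"မယ်", "ခဲ့", "နေ"}
NOUN_PARTICLES = {"မှာ", "က", "ကို"}


def check_particle_agreement(tagged: List[Tuple[str, str]]) -> List[str]:
    """Flag particles whose preceding word has the wrong part of speech.

    `tagged` holds (word, pos) pairs using the assumed tags "N", "V", "P".
    """
    warnings = []
    # Walk over adjacent pairs: (previous word, current word).
    for (prev_word, prev_pos), (word, _pos) in zip(tagged, tagged[1:]):
        if word in NOUN_PARTICLES and prev_pos == "V":
            warnings.append(f"noun particle {word} after verb {prev_word}")
        elif word in VERB_PARTICLES and prev_pos == "N":
            warnings.append(f"verb particle {word} after noun {prev_word}")
    return warnings


# The example sentence from the text: noun, verb, then a noun particle.
sentence = [("ကျောင်း", "N"), ("သွား", "V"), ("မှာ", "P")]
warnings = check_particle_agreement(sentence)
print(warnings)
```

The sketch only flags the mismatch. Choosing between မယ် and မလား as a suggestion would need the surrounding context, as noted above.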
2. Medial Confusion (ျ vs ြ)
Many words sound similar but use different medials (Ya-pin ျ vs Ya-yit ြ).
- Rule: ကျောင်း (School) vs ကြောင်း (Cause/Fact).
- Logic:
  - If preceded by a Verb (e.g., ဖြစ်), it implies “Cause”. Suggest ကြောင်း.
  - If preceded by a Noun or at start, it implies “School”. Keep ကျောင်း.
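As a rough sketch of the context rule for this one word pair, assuming the previous word's POS tag is already known: the function name `suggest_medial_fix` and the tag names are hypothetical, and the real checker is described as lookup-based, so treat this as an illustration of the logic only.

```python
from typing import Optional

SCHOOL = "ကျောင်း"  # spelled with Ya-pin
CAUSE = "ကြောင်း"   # spelled with Ya-yit


def suggest_medial_fix(word: str, prev_pos: Optional[str]) -> Optional[str]:
    """Return a suggested replacement for `word`, or None to keep it.

    `prev_pos` is the assumed POS tag of the preceding word ("V", "N", ...),
    or None when `word` starts the sentence.
    """
    # After a verb, the "Cause/Fact" reading is expected.
    if word == SCHOOL and prev_pos == "V":
        return CAUSE
    # After a noun or at sentence start, "School" is plausible: keep as-is.
    return None


print(suggest_medial_fix(SCHOOL, "V"))   # suggests the Ya-yit form
print(suggest_medial_fix(SCHOOL, None))  # None: keep the word
```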
3. POS Sequence Validation
We maintain a matrix of invalid POS transitions.
- Verb -> Verb (Directly): Usually invalid. Needs a particle like ပြီး or ၍.
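One simple way to represent such a matrix is a set of disallowed (previous, next) tag pairs scanned in a single pass. The sketch below assumes short tag names (`"N"`, `"V"`, `"P"`) and a hypothetical helper `find_invalid_transitions`. Only the Verb -> Verb entry comes from the text above.

```python
from typing import List, Set, Tuple

# Disallowed (previous tag, next tag) pairs. Only ("V", "V") is taken from
# the documentation; a real matrix would contain more entries.
INVALID_TRANSITIONS: Set[Tuple[str, str]] = {("V", "V")}


def find_invalid_transitions(tags: List[str]) -> List[int]:
    """Return each index i where tags[i] -> tags[i + 1] is disallowed."""
    return [
        i
        for i, pair in enumerate(zip(tags, tags[1:]))
        if pair in INVALID_TRANSITIONS
    ]


print(find_invalid_transitions(["N", "V", "V"]))       # → [1]
print(find_invalid_transitions(["N", "V", "P", "V"]))  # → []
```

Because each adjacent pair is looked up once in a set, the scan is linear in the number of tokens.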
Configuration
Enable Grammar Checking
Grammar Engine Configuration
Note: GrammarRuleConfig in myspellchecker.grammar.config is a separate class for loading YAML-based grammar rule definitions.
Using SyntacticRuleChecker Directly
YAML Rule Format
Grammar rules are defined in YAML format with JSON Schema validation.
Example: grammar_rules.yaml
Schema Validation
Rules are validated against JSON Schema: src/myspellchecker/schemas/grammar_rules.schema.json
Error Types
Grammar checking produces errors with specific types:
| Error Type | Description |
|---|---|
| grammar_error | General grammar violation |
| particle_typo | Particle usage error |
| pos_sequence_error | Invalid POS sequence |
| aspect_typo | Aspect marker error |
| classifier_typo | Classifier error |
| mixed_register | Mixed formal/informal |
| incomplete_reduplication | Incomplete reduplication |
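When consuming results, it can help to treat these type strings as a closed set. The sketch below is an assumption-laden illustration: `GrammarErrorType`, `count_by_type`, and the dictionary-shaped error records are hypothetical, and only the string values are taken from the table above.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class GrammarErrorType(str, Enum):
    """Hypothetical enum; only the string values come from the table."""

    GRAMMAR_ERROR = "grammar_error"
    PARTICLE_TYPO = "particle_typo"
    POS_SEQUENCE_ERROR = "pos_sequence_error"
    ASPECT_TYPO = "aspect_typo"
    CLASSIFIER_TYPO = "classifier_typo"
    MIXED_REGISTER = "mixed_register"
    INCOMPLETE_REDUPLICATION = "incomplete_reduplication"


def count_by_type(errors: List[Dict[str, str]]) -> Counter:
    """Count errors per type; an unknown type string raises ValueError."""
    return Counter(GrammarErrorType(e["type"]) for e in errors)


# Illustrative records, not real checker output.
errors = [
    {"type": "particle_typo", "text": "မှာ"},
    {"type": "pos_sequence_error", "text": "သွား"},
    {"type": "particle_typo", "text": "က"},
]
counts = count_by_type(errors)
print(counts[GrammarErrorType.PARTICLE_TYPO])
```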
Integration with SpellChecker
Grammar checking is integrated into the validation pipeline: the Syntactic Rule Checker runs after word validation and before the statistical N-gram check.
Performance
| Operation | Complexity | Typical Time |
|---|---|---|
| POS Sequence Check | O(n) | ~1ms |
| Particle Validation | O(1) | ~0.5ms |
| Full Grammar Check | O(n) | ~5ms |
Next Steps
- POS Tagging - How POS tags are assigned
- Context Checking - N-gram context validation
- Validation Strategies - Strategy pattern overview