The Grammar Engine provides syntactic rule-based spell checking using Part-of-Speech (POS) tags. It operates at Layer 2.5 (Syntactic) of the validation pipeline, catching errors that N-gram models might miss.

Overview

from myspellchecker.grammar import SyntacticRuleChecker
from myspellchecker.providers import SQLiteProvider

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SyntacticRuleChecker(provider)

# Check word sequence
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])
for idx, error_word, suggestion in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}'")

Architecture

The Grammar Engine coordinates six specialized checkers:
┌─────────────────────────────────────────────────────┐
│                SyntacticRuleChecker                  │
├─────────────────────────────────────────────────────┤
│  ┌──────────┐  ┌───────────┐  ┌──────────┐        │
│  │ Aspect   │  │ Classifier│  │ Compound │        │
│  │ Checker  │  │ Checker   │  │ Checker  │        │
│  └──────────┘  └───────────┘  └──────────┘        │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐        │
│  │ Merged   │  │ Negation  │  │ Register │        │
│  │ Word     │  │ Checker   │  │ Checker  │        │
│  └──────────┘  └───────────┘  └──────────┘        │
└─────────────────────────────────────────────────────┘
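
The coordination shown in the diagram can be pictured as a loop over independent checkers whose results are merged by position. The sketch below is illustrative only and is not the library's implementation: `run_checkers`, `toy_negation`, and `toy_aspect` are hypothetical names, and the `(index, word, suggestion)` tuple shape is borrowed from the Overview example.

```python
from typing import Callable, Dict, List, Tuple

# (position, error_word, suggestion), mirroring the tuples in the Overview example
Correction = Tuple[int, str, str]
Checker = Callable[[List[str]], List[Correction]]


def run_checkers(words: List[str], checkers: List[Checker]) -> List[Correction]:
    """Run each checker in order and keep the first correction per position."""
    seen: Dict[int, Correction] = {}
    for check in checkers:
        for idx, word, suggestion in check(words):
            seen.setdefault(idx, (idx, word, suggestion))
    return [seen[idx] for idx in sorted(seen)]


# Hypothetical toy checkers standing in for the specialized checkers.
def toy_negation(words: List[str]) -> List[Correction]:
    return [(i, w, "ဘူး") for i, w in enumerate(words) if w == "ဘူ"]


def toy_aspect(words: List[str]) -> List[Correction]:
    return [(i, w, "ပြီ") for i, w in enumerate(words) if w == "ပြိ"]


merged = run_checkers(["မ", "သွား", "ဘူ"], [toy_aspect, toy_negation])
print(merged)
```

Keeping only the first correction per position is one possible merge policy; a real coordinator might instead rank competing corrections by confidence.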

Configuration

GrammarEngineConfig

from myspellchecker.core.config import GrammarEngineConfig

config = GrammarEngineConfig(
    # Confidence thresholds
    high_confidence=0.90,
    medium_confidence=0.85,
    default_confidence_threshold=0.80,
    low_confidence_threshold=0.55,

    # Feature-specific thresholds
    exact_match_confidence=0.95,
    context_confidence_threshold=0.65,
    pos_sequence_confidence=0.80,
    verb_particle_confidence=0.75,
    tense_marker_confidence=0.60,
    sentence_final_confidence=0.70,
    question_confidence=0.60,
)
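
How these thresholds are applied internally is not described here. As a rough illustration, a score could be mapped to a coarse label using the three general thresholds from the example. The `Thresholds` dataclass and `bucket` function are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Values copied from the GrammarEngineConfig example; the class itself is hypothetical.
    high: float = 0.90
    medium: float = 0.85
    low: float = 0.55


def bucket(score: float, t: Thresholds = Thresholds()) -> str:
    """Map a confidence score to a coarse label (assumed policy, not the library's)."""
    if score >= t.high:
        return "high"
    if score >= t.medium:
        return "medium"
    if score >= t.low:
        return "low"
    return "discard"


for s in (0.95, 0.87, 0.60, 0.30):
    print(s, bucket(s))
```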

GrammarRuleConfig

Load custom grammar rules from YAML:

from myspellchecker.grammar.config import GrammarRuleConfig

config = GrammarRuleConfig(config_path="custom_rules.yml")

Grammar Rules

Rule Types

| Rule Type | Description | Example |
| --- | --- | --- |
| Particle Typos | Common particle mistakes | ဘူ → ဘူး |
| Medial Confusions | Ya-pin vs Ra-yit | ကျောင်း → ကြောင်း |
| POS Sequences | Invalid tag combinations | N-N without particle |
| Verb-Particle | Verb ending agreement | Missing tense marker |
| Sentence Structure | Sentence completeness | Missing final particle |

Particle Typo Detection

# Common particle typos are loaded from the rule config
# (`config` here is assumed to be the GrammarRuleConfig instance created above)
typo_info = config.get_particle_typo("ဘူ")
# Returns: {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"}

# Check in context
corrections = checker.check_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (missing visarga in negative ending)
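
The lookup-plus-context idea can be sketched without the library. The rule table below is a hypothetical in-memory copy shaped like the dictionary returned by `get_particle_typo`; `find_particle_typos` and the POS tags passed to it are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory rule table, shaped like the get_particle_typo result.
PARTICLE_TYPOS: Dict[str, Dict[str, str]] = {
    "ဘူ": {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"},
}


def find_particle_typos(
    words: List[str], pos_tags: List[Optional[str]]
) -> List[Tuple[int, str, str]]:
    """Flag known particle typos whose context requirement is met."""
    found = []
    for i, word in enumerate(words):
        rule = PARTICLE_TYPOS.get(word)
        if rule is None:
            continue
        prev_tag = pos_tags[i - 1] if i > 0 else None
        if rule["context"] == "after_verb" and prev_tag != "V":
            continue  # context not satisfied, so stay silent
        found.append((i, word, rule["correction"]))
    return found


print(find_particle_typos(["မ", "သွား", "ဘူ"], ["P", "V", "P"]))
```

Gating on context keeps the rule from firing when the same string appears somewhere it may be legitimate.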

Medial Confusion Detection

Detects common medial character confusions:

# Ya-pin (ျ) vs Ra-yit (ြ) confusion
# ကျောင်း (school) vs ကြောင်း (because)

words = ["သွား", "ကျောင်း"]  # After verb "go"
corrections = checker.check_sequence(words)
# Suggests: "ကြောင်း" (because) after verb

Common Confusions:

| Confusion | Characters | Example |
| --- | --- | --- |
| Ya-pin / Ra-yit | ျ / ြ | ကျွန် / ကြွန် |
| Wa-swe / Ha-htoe | ွ / ှ | ကွ / ကှ |
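
A candidate for a Ya-pin / Ra-yit confusion can be generated by swapping the two medial code points (U+103B and U+103C) and checking whether the result is a known word. This is a minimal sketch, not the library's method: `swap_ya_ra`, `medial_candidate`, and the one-word lexicon are hypothetical.

```python
from typing import Optional, Set

YA_PIN = "\u103b"  # ျ
RA_YIT = "\u103c"  # ြ

# Translation table that exchanges the two medials in a single pass.
_SWAP = str.maketrans({YA_PIN: RA_YIT, RA_YIT: YA_PIN})


def swap_ya_ra(word: str) -> str:
    """Return the word with every Ya-pin and Ra-yit exchanged."""
    return word.translate(_SWAP)


def medial_candidate(word: str, lexicon: Set[str]) -> Optional[str]:
    """Return the swapped form if it differs from the input and is a known word."""
    swapped = swap_ya_ra(word)
    if swapped != word and swapped in lexicon:
        return swapped
    return None


school = "\u1000\u103b\u1031\u102c\u1004\u103a\u1038"   # ကျောင်း
because = "\u1000\u103c\u1031\u102c\u1004\u103a\u1038"  # ကြောင်း
print(medial_candidate(school, {because}))
```

The swap only produces a candidate; deciding whether to suggest it still depends on context, as in the example above.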

POS Sequence Validation

Validates POS tag sequences:

# Invalid: Two nouns without particle
words = ["ကျောင်း", "သား"]  # school + son
corrections = checker.check_sequence(words)
# May suggest: "ကျောင်းသား" (student) as compound

# Valid: Noun + Subject marker + Verb
words = ["သူ", "က", "သွားတယ်"]  # he + SUBJ + went
corrections = checker.check_sequence(words)
# Returns: [] (no errors)

Tag Sequence Rules:

| Sequence | Validity | Reason |
| --- | --- | --- |
| N + V | Warning | Usually needs particle |
| V + V | Error | Except auxiliaries |
| P_SENT + P_SENT | Error | Double sentence particles |
| N + P_SUBJ | Valid | Subject marking |
Verb-Particle Agreement

# Tense markers must follow verbs
words = ["သူ", "ခဲ့"]  # he + PAST
corrections = checker.check_sequence(words)
# Flags: "ခဲ့" (past tense) should follow a verb

# Correct usage
words = ["သွား", "ခဲ့", "တယ်"]  # went + PAST + declarative
corrections = checker.check_sequence(words)
# Returns: [] (no errors)
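
The agreement rule amounts to checking what precedes each tense marker. A minimal sketch, assuming the input is already POS-tagged; `misplaced_tense_markers` and the two-marker sample set are hypothetical.

```python
from typing import List, Tuple

TENSE_MARKERS = {"ခဲ့", "မယ်"}  # small sample set for illustration only


def misplaced_tense_markers(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return positions of tense markers not directly preceded by a verb."""
    positions = []
    for i, (word, _tag) in enumerate(tagged):
        if word not in TENSE_MARKERS:
            continue
        prev_tag = tagged[i - 1][1] if i > 0 else None
        if prev_tag != "V":
            positions.append(i)
    return positions


print(misplaced_tense_markers([("သူ", "N"), ("ခဲ့", "P")]))
print(misplaced_tense_markers([("သွား", "V"), ("ခဲ့", "P"), ("တယ်", "P_SENT")]))
```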

Sentence Structure Validation

# Missing sentence-final particle
words = ["သူ", "သွား"]  # he went
corrections = checker.check_sequence(words)
# Suggests: "သွားတယ်" (adding declarative)

# Question word paired with a question ending
words = ["ဘယ်", "သွား", "မလဲ"]
corrections = checker.check_sequence(words)
# Validates question word with question ending

Specialized Checkers

AspectChecker

Validates aspect markers (completion, continuation):

from myspellchecker.grammar.checkers import AspectChecker

checker = AspectChecker()
errors = checker.validate_sequence(["သွား", "ပြိ"])
# Detects: "ပြိ" → "ပြီ" (completion marker typo)

Aspect Markers:
  • ပြီ - Completion
  • နေ - Continuation
  • ခဲ့ - Past
  • မယ် - Future

ClassifierChecker

Validates Myanmar numeral classifiers:

from myspellchecker.grammar.checkers import ClassifierChecker

checker = ClassifierChecker()
errors = checker.validate_sequence(["တစ်", "ယေက်"])
# Detects: "ယေက်" → "ယောက်" (person classifier)

Common Classifiers:
  • ယောက် - People
  • ကောင် - Animals
  • လုံး - Round objects
  • ခု - General objects
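
One simple way to map a misspelled classifier to a known one is fuzzy matching against the classifier list. The sketch below uses the standard library's `difflib.get_close_matches`; this technique is swapped in for illustration and is not claimed to be what `ClassifierChecker` does. `suggest_classifier` is a hypothetical helper.

```python
import difflib
from typing import List, Optional

# The four classifiers listed under Common Classifiers.
CLASSIFIERS: List[str] = ["ယောက်", "ကောင်", "လုံး", "ခု"]


def suggest_classifier(word: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known classifier, or None if the word is valid or nothing is close."""
    if word in CLASSIFIERS:
        return None
    matches = difflib.get_close_matches(word, CLASSIFIERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_classifier("ယေက်"))
```

Character-level similarity ignores which noun is being counted, so a production checker would also need to confirm that the classifier suits the noun.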

CompoundChecker

Validates compound words and reduplications:

from myspellchecker.grammar.checkers import CompoundChecker

checker = CompoundChecker()
errors = checker.validate_sequence(["ပန်", "ခြံ"])
# Detects: Missing tone mark → "ပန်းခြံ" (garden)

NegationChecker

Validates negation patterns:

from myspellchecker.grammar.checkers import NegationChecker

checker = NegationChecker()
errors = checker.validate_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (negative ending)

Negation Patterns:
  • မ...ဘူး - Colloquial negative
  • မ...ပါ - Polite negative
  • မ... - Literary negative
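
The မ...ဘူး pattern can be checked by looking for an opening မ and then inspecting the final word of the clause. This is a deliberately small sketch under that assumption; `check_negation` and its one-entry typo table are hypothetical.

```python
from typing import List, Optional, Tuple

# Known misspelling of the colloquial negative ending, taken from the example above.
ENDING_TYPOS = {"ဘူ": "ဘူး"}


def check_negation(words: List[str]) -> Optional[Tuple[int, str, str]]:
    """If မ opens a negation, return a correction for a misspelled final ending."""
    if "မ" not in words:
        return None
    last_idx = len(words) - 1
    last = words[last_idx]
    if last in ENDING_TYPOS and words.index("မ") < last_idx:
        return (last_idx, last, ENDING_TYPOS[last])
    return None


print(check_negation(["မ", "သွား", "ဘူ"]))
```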

RegisterChecker

Validates register consistency (formal vs colloquial):

from myspellchecker.grammar.checkers import RegisterChecker

checker = RegisterChecker()
errors = checker.validate_sequence(["သွားတယ်", "ပါသည်"])
# Warns: Mixed register (colloquial + formal)

Register Types:
  • Colloquial: တယ်, ဘူး, မယ်
  • Formal: သည်, ပါ, မည်
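
Mixed register can be approximated by checking which marker families appear as word endings in a sequence. The sketch below uses the markers from the Register Types list; `registers_used` and `is_mixed` are hypothetical helpers, and plain suffix matching is a simplifying assumption.

```python
from typing import List, Set

# Markers from the Register Types list; str.endswith accepts a tuple of suffixes.
COLLOQUIAL = ("တယ်", "ဘူး", "မယ်")
FORMAL = ("သည်", "ပါ", "မည်")


def registers_used(words: List[str]) -> Set[str]:
    """Return the set of registers whose markers end at least one word."""
    found = set()
    for word in words:
        if word.endswith(COLLOQUIAL):
            found.add("colloquial")
        if word.endswith(FORMAL):
            found.add("formal")
    return found


def is_mixed(words: List[str]) -> bool:
    return len(registers_used(words)) > 1


print(is_mixed(["သွားတယ်", "ပါသည်"]))
```

Suffix matching is crude: a word may end in one of these strings without being a register marker, so a real checker would rely on segmentation and POS tags.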

Confidence Scoring

Get confidence for suggestions:

confidence = checker.get_suggestion_confidence(
    word="ကျောင်း",
    suggestion="ကြောင်း",
    prev_pos="V"
)
print(f"Confidence: {confidence}")  # 0.95 (high for verb context)

Confidence Factors:

| Factor | Weight | Description |
| --- | --- | --- |
| Exact match | 0.95 | Exact pattern match |
| Verb context | 0.90 | After verb validation |
| Noun context | 0.85 | After noun validation |
| Context dependent | 0.65 | Ambiguous context |
| Default | 0.80 | No specific context |
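
To make the Confidence Factors table concrete, the sketch below looks up a confidence from the previous POS tag and filters suggestions by a threshold. The verb, noun, and default values come from the table; the lookup keyed on `prev_pos` and the names `confidence_for` and `rank_suggestions` are assumptions, not the library's scoring logic.

```python
from typing import List, Optional, Tuple

# Values transcribed from the Confidence Factors table.
CONTEXT_CONFIDENCE = {"V": 0.90, "N": 0.85}
DEFAULT_CONFIDENCE = 0.80


def confidence_for(prev_pos: Optional[str]) -> float:
    """Look up a confidence from the previous POS tag, falling back to the default."""
    return CONTEXT_CONFIDENCE.get(prev_pos, DEFAULT_CONFIDENCE)


def rank_suggestions(
    candidates: List[Tuple[str, Optional[str]]],
    threshold: float = DEFAULT_CONFIDENCE,
) -> List[Tuple[str, float]]:
    """Score (suggestion, prev_pos) pairs, drop those below threshold, sort best first."""
    scored = [(s, confidence_for(p)) for s, p in candidates]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


print(rank_suggestions([("a", "N"), ("b", "V"), ("c", None)], threshold=0.85))
```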

Integration with SpellChecker

The Grammar Engine integrates automatically via validation strategies when rule-based validation is enabled:

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

# Enable grammar checking via use_rule_based_validation
config = SpellCheckerConfig(
    use_rule_based_validation=True,  # Enables grammar rules in validation
    use_context_checker=True,         # Context checking includes grammar strategies
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check("ကျွန်တော် ကျောင်း သွားတယ်")
# Grammar errors included in result.errors

Note: Grammar engine configuration (GrammarEngineConfig) is managed internally. For advanced customization, use SyntacticRuleChecker directly with a custom config path.

Custom Rules

YAML Configuration

# custom_rules.yml
particle_typos:
  "ဘူ":
    correction: "ဘူး"
    meaning: "negative ending"
    context: "after_verb"

medial_confusions:
  "ကျောင်း":
    correction: "ကြောင်း"
    context: "after_verb"
    meaning: "because"

invalid_sequences:
  - prev: "P_SENT"
    curr: "P_SENT"
    severity: "error"
    message: "Double sentence particles"
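
Before pointing the checker at a custom file, it can help to sanity-check the parsed structure. The sketch below validates a dictionary shaped like the YAML above, as it might look after parsing with a YAML library such as PyYAML's `yaml.safe_load`. Treating every key shown in the example as required is an assumption; `validate_rules` is a hypothetical helper and not part of the library.

```python
from typing import Any, Dict, List

# Assumption: each key that appears in the example YAML is treated as required.
ENTRY_KEYS = {
    "particle_typos": {"correction", "meaning", "context"},
    "medial_confusions": {"correction", "meaning", "context"},
}
SEQUENCE_KEYS = {"prev", "curr", "severity", "message"}


def validate_rules(rules: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for section, required in ENTRY_KEYS.items():
        for word, entry in rules.get(section, {}).items():
            missing = required - set(entry)
            if missing:
                problems.append(f"{section}[{word!r}] missing {sorted(missing)}")
    for i, seq in enumerate(rules.get("invalid_sequences", [])):
        missing = SEQUENCE_KEYS - set(seq)
        if missing:
            problems.append(f"invalid_sequences[{i}] missing {sorted(missing)}")
    return problems


# Dictionary mirroring the YAML example.
rules = {
    "particle_typos": {
        "ဘူ": {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"},
    },
    "medial_confusions": {
        "ကျောင်း": {"correction": "ကြောင်း", "context": "after_verb", "meaning": "because"},
    },
    "invalid_sequences": [
        {"prev": "P_SENT", "curr": "P_SENT", "severity": "error",
         "message": "Double sentence particles"},
    ],
}
print(validate_rules(rules))
```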

Loading Custom Rules

checker = SyntacticRuleChecker(
    provider=provider,
    config_path="custom_rules.yml",
)
