Grammar Engine

The Grammar Engine provides syntactic rule-based spell checking using Part-of-Speech (POS) tags. It operates at Layer 2.5 (Syntactic) of the validation pipeline, catching errors that N-gram models might miss.

Overview

from myspellchecker.grammar import SyntacticRuleChecker
from myspellchecker.providers import SQLiteProvider

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SyntacticRuleChecker(provider)

# Check a sequence of segmented words; each correction is an (index, word, suggestion) tuple
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])
for idx, error_word, suggestion in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}'")
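A minimal sketch of consuming these results, assuming (as the loop above suggests) that each correction is an (index, error_word, suggestion) tuple. The helper apply_corrections is hypothetical and not part of myspellchecker; it runs on hand-written sample data, so no dictionary database is needed.

```python
from typing import List, Tuple

Correction = Tuple[int, str, str]  # (index, error_word, suggestion), as unpacked above


def apply_corrections(words: List[str], corrections: List[Correction]) -> List[str]:
    """Return a copy of `words` with each flagged word replaced by its suggestion."""
    fixed = list(words)
    for idx, error_word, suggestion in corrections:
        # Replace only when the flagged word still matches, guarding against stale indices.
        if 0 <= idx < len(fixed) and fixed[idx] == error_word:
            fixed[idx] = suggestion
    return fixed


# Hand-written sample data standing in for real checker output.
words = ["မ", "သွား", "ဘူ"]
sample_corrections = [(2, "ဘူ", "ဘူး")]
print(" ".join(apply_corrections(words, sample_corrections)))
```

Returning a new list keeps the original token sequence available for reporting positions back to the user.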

Architecture

The Grammar Engine coordinates six specialized checkers:

┌─────────────────────────────────────────────────────┐
│                SyntacticRuleChecker                  │
├─────────────────────────────────────────────────────┤
│  ┌──────────┐  ┌───────────┐  ┌──────────┐        │
│  │ Aspect   │  │ Classifier│  │ Compound │        │
│  │ Checker  │  │ Checker   │  │ Checker  │        │
│  └──────────┘  └───────────┘  └──────────┘        │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐        │
│  │ Merged   │  │ Negation  │  │ Register │        │
│  │ Word     │  │ Checker   │  │ Checker  │        │
│  └──────────┘  └───────────┘  └──────────┘        │
└─────────────────────────────────────────────────────┘
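The coordination shown in the diagram can be sketched as a loop over independent checkers whose results are merged. This is an illustration only: run_checkers and the two toy checkers are hypothetical, and the "first suggestion per position wins" merge policy is an assumption, not documented SyntacticRuleChecker behavior.

```python
from typing import Callable, Dict, List, Tuple

Correction = Tuple[int, str, str]  # (index, error_word, suggestion)
Checker = Callable[[List[str]], List[Correction]]


def run_checkers(words: List[str], checkers: List[Checker]) -> List[Correction]:
    """Run every checker and merge results, keeping the first suggestion per position."""
    merged: Dict[int, Correction] = {}
    for check in checkers:
        for idx, word, suggestion in check(words):
            merged.setdefault(idx, (idx, word, suggestion))
    return [merged[i] for i in sorted(merged)]


# Two toy checkers standing in for the specialized ones in the diagram.
def toy_negation_checker(words: List[str]) -> List[Correction]:
    return [(i, w, "ဘူး") for i, w in enumerate(words) if w == "ဘူ"]


def toy_aspect_checker(words: List[str]) -> List[Correction]:
    return [(i, w, "ပြီ") for i, w in enumerate(words) if w == "ပြိ"]


print(run_checkers(["မ", "သွား", "ဘူ"], [toy_aspect_checker, toy_negation_checker]))
```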

Configuration

GrammarEngineConfig

from myspellchecker.core.config import GrammarEngineConfig

config = GrammarEngineConfig(
    # Confidence thresholds
    high_confidence=0.90,
    medium_confidence=0.85,
    default_confidence_threshold=0.80,
    low_confidence_threshold=0.55,

    # Feature-specific thresholds
    exact_match_confidence=0.95,
    context_confidence_threshold=0.65,
    pos_sequence_confidence=0.80,
    verb_particle_confidence=0.75,
    tense_marker_confidence=0.60,
    sentence_final_confidence=0.70,
    question_confidence=0.60,
)
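How the engine applies these thresholds internally is not described here. One plausible reading, sketched below under that assumption, is that a suggestion's confidence is bucketed into tiers. The Thresholds dataclass, the confidence_tier function, and the tier labels are all hypothetical; only the numeric values come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Values copied from the GrammarEngineConfig example above.
    high_confidence: float = 0.90
    medium_confidence: float = 0.85
    low_confidence_threshold: float = 0.55


def confidence_tier(score: float, t: Thresholds = Thresholds()) -> str:
    """Map a confidence score to a coarse tier label (labels are illustrative)."""
    if score >= t.high_confidence:
        return "high"
    if score >= t.medium_confidence:
        return "medium"
    if score >= t.low_confidence_threshold:
        return "low"
    return "discard"


for score in (0.95, 0.87, 0.60, 0.40):
    print(score, confidence_tier(score))
```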

GrammarRuleConfig

Load custom grammar rules from YAML:

from myspellchecker.grammar.config import GrammarRuleConfig

config = GrammarRuleConfig(config_path="custom_rules.yml")

Grammar Rules

Rule Types

Rule Type	Description	Example
Particle Typos	Common particle mistakes	ဘူ → ဘူး
Medial Confusions	Ya-pin vs Ra-yit	ကျောင်း → ကြောင်း
POS Sequences	Invalid tag combinations	N-N without particle
Verb-Particle	Verb ending agreement	Missing tense marker
Sentence Structure	Sentence completeness	Missing final particle

Particle Typo Detection

# Look up a known particle typo in the loaded rules (config is the GrammarRuleConfig from above)
typo_info = config.get_particle_typo("ဘူ")
# Returns: {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"}

# Check in context
corrections = checker.check_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (missing visarga in negative ending)
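The context field in the rule above can be read as a precondition on the preceding word. A minimal standalone sketch of that idea follows, assuming POS tags are already available as (word, tag) pairs. The function find_particle_typos and the "PART" tag name are hypothetical; the rule entry mirrors the dictionary shown above.

```python
from typing import Dict, List, Tuple

# Rule table shaped like the dictionary returned by get_particle_typo above.
PARTICLE_TYPOS: Dict[str, Dict[str, str]] = {
    "ဘူ": {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"},
}


def find_particle_typos(tagged: List[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Flag known particle typos whose context requirement is met.

    `tagged` is a list of (word, pos_tag) pairs; the tags are supplied by hand here.
    """
    found = []
    for i, (word, _tag) in enumerate(tagged):
        rule = PARTICLE_TYPOS.get(word)
        if rule is None:
            continue
        prev_tag = tagged[i - 1][1] if i > 0 else None
        if rule["context"] == "after_verb" and prev_tag != "V":
            continue  # context not satisfied, leave the word alone
        found.append((i, word, rule["correction"]))
    return found


print(find_particle_typos([("မ", "PART"), ("သွား", "V"), ("ဘူ", "PART")]))
```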

Medial Confusion Detection

Detects common medial character confusions:

# Ya-pin (ျ) vs Ra-yit (ြ) confusion
# ကျောင်း (school) vs ကြောင်း (because)

words = ["သွား", "ကျောင်း"]  # After verb "go"
corrections = checker.check_sequence(words)
# Suggests: "ကြောင်း" (because) after verb

Common Confusions:

Confusion	Characters	Example
Ya-pin / Ra-yit	ျ / ြ	ကျွန် / ကြွန်
Wa-swe / Ha-htoe	ွ / ှ	ကွ / ကှ
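Candidate generation for the ya-pin / ra-yit row can be sketched with plain string translation. The helpers swap_ya_ra and medial_candidates are hypothetical and cover only that one confusion pair; the toy vocabulary stands in for a real dictionary lookup, and no context check is performed.

```python
YA_PIN = "\u103b"  # ျ  MYANMAR CONSONANT SIGN MEDIAL YA
RA_YIT = "\u103c"  # ြ  MYANMAR CONSONANT SIGN MEDIAL RA


def swap_ya_ra(word: str) -> str:
    """Return `word` with every ya-pin replaced by ra-yit and vice versa."""
    table = str.maketrans({YA_PIN: RA_YIT, RA_YIT: YA_PIN})
    return word.translate(table)


def medial_candidates(word: str, vocabulary: set) -> list:
    """Propose the swapped form only if it differs from the input and is a known word."""
    swapped = swap_ya_ra(word)
    return [swapped] if swapped != word and swapped in vocabulary else []


vocab = {"ကျောင်း", "ကြောင်း"}  # toy vocabulary for the demo
print(medial_candidates("ကျောင်း", vocab))
```

Because both forms are valid words, the swap alone cannot decide which one is intended; that is where the POS context described above comes in.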

POS Sequence Validation

Validates POS tag sequences:

# Invalid: Two nouns without particle
words = ["ကျောင်း", "သား"]  # school + son
corrections = checker.check_sequence(words)
# May suggest: "ကျောင်းသား" (student) as compound

# Valid: Noun + Subject marker + Verb
words = ["သူ", "က", "သွားတယ်"]  # he + SUBJ + went
corrections = checker.check_sequence(words)
# Returns: [] (no errors)

Tag Sequence Rules:

Sequence	Validity	Reason
N + V	Warning	Usually needs particle
V + V	Error*	Except auxiliaries
P_SENT + P_SENT	Error	Double sentence particles
N + P_SUBJ	Valid	Subject marking
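The table above amounts to a lookup keyed on adjacent tag pairs. The sketch below illustrates that shape; check_tag_sequence is a hypothetical helper, and the auxiliary-verb exception for V + V is deliberately not modeled.

```python
from typing import Dict, List, Tuple

# Severity and reason per (previous_tag, current_tag), taken from the table above.
SEQUENCE_RULES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("N", "V"): ("warning", "Usually needs particle"),
    ("V", "V"): ("error", "Except auxiliaries"),
    ("P_SENT", "P_SENT"): ("error", "Double sentence particles"),
}


def check_tag_sequence(tags: List[str]) -> List[Tuple[int, str, str]]:
    """Return (position, severity, reason) for each flagged adjacent tag pair."""
    issues = []
    for i in range(1, len(tags)):
        rule = SEQUENCE_RULES.get((tags[i - 1], tags[i]))
        if rule is not None:
            severity, reason = rule
            issues.append((i, severity, reason))
    return issues


print(check_tag_sequence(["N", "P_SUBJ", "V"]))  # no rule matches either pair
print(check_tag_sequence(["N", "V", "P_SENT", "P_SENT"]))
```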

Verb-Particle Agreement

# Tense markers must follow verbs
words = ["သူ", "ခဲ့"]  # he + PAST
corrections = checker.check_sequence(words)
# Flags: "ခဲ့" (past tense) should follow a verb

# Correct usage
words = ["သွား", "ခဲ့", "တယ်"]  # went + PAST + declarative
corrections = checker.check_sequence(words)
# Returns: [] (no errors)
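The rule in these two examples can be sketched as a check on the tag of the preceding word. This is a simplified illustration with hand-supplied tags: misplaced_tense_markers is hypothetical, the "PART" tag name is an assumption, and the marker set contains only the past and future markers listed in this document.

```python
from typing import List, Tuple

TENSE_MARKERS = {"ခဲ့", "မယ်"}  # past and future markers listed in this document


def misplaced_tense_markers(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return indices of tense markers that do not directly follow a verb (tag "V")."""
    flagged = []
    for i, (word, _tag) in enumerate(tagged):
        if word in TENSE_MARKERS:
            prev_tag = tagged[i - 1][1] if i > 0 else None
            if prev_tag != "V":
                flagged.append(i)
    return flagged


print(misplaced_tense_markers([("သူ", "N"), ("ခဲ့", "PART")]))
print(misplaced_tense_markers([("သွား", "V"), ("ခဲ့", "PART"), ("တယ်", "P_SENT")]))
```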

Sentence Structure Validation

# Missing sentence-final particle
words = ["သူ", "သွား"]  # he + go, with no sentence-final particle
corrections = checker.check_sequence(words)
# Suggests: "သွားတယ်" (adding declarative)

# Question word paired with a question ending
words = ["ဘယ်", "သွား", "မလဲ"]
corrections = checker.check_sequence(words)
# Validates question word with question ending

Specialized Checkers

AspectChecker

Validates aspect markers (completion, continuation):

from myspellchecker.grammar.checkers import AspectChecker

checker = AspectChecker()
errors = checker.validate_sequence(["သွား", "ပြိ"])
# Detects: "ပြိ" → "ပြီ" (completion marker typo)

Aspect Markers:

ပြီ - Completion
နေ - Continuation
ခဲ့ - Past
မယ် - Future

ClassifierChecker

Validates Myanmar numeral classifiers:

from myspellchecker.grammar.checkers import ClassifierChecker

checker = ClassifierChecker()
errors = checker.validate_sequence(["တစ်", "ယေက်"])
# Detects: "ယေက်" → "ယောက်" (person classifier)

Common Classifiers:

ယောက် - People
ကောင် - Animals
လုံး - Round objects
ခု - General objects
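One simple way to propose a fix for a misspelled classifier is nearest-match lookup against the known list. The sketch below uses Python's standard difflib for that; this is a swapped-in technique for illustration, not necessarily how ClassifierChecker matches, and suggest_classifier is a hypothetical helper.

```python
import difflib
from typing import Optional

CLASSIFIERS = ["ယောက်", "ကောင်", "လုံး", "ခု"]  # from the list above


def suggest_classifier(word: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known classifier, or None if `word` is valid or nothing is close."""
    if word in CLASSIFIERS:
        return None
    matches = difflib.get_close_matches(word, CLASSIFIERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_classifier("ယေက်"))
```

The comparison works on Unicode code points, so a dropped vowel sign counts as a single missing character.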

CompoundChecker

Validates compound words and reduplications:

from myspellchecker.grammar.checkers import CompoundChecker

checker = CompoundChecker()
errors = checker.validate_sequence(["ပန်", "ခြံ"])
# Detects: Missing tone mark → "ပန်းခြံ" (garden)

NegationChecker

Validates negation patterns:

from myspellchecker.grammar.checkers import NegationChecker

checker = NegationChecker()
errors = checker.validate_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (negative ending)

Negation Patterns:

မ...ဘူး - Colloquial negative
မ...ပါ - Polite negative
မ... - Literary negative

RegisterChecker

Validates register consistency (formal vs colloquial):

from myspellchecker.grammar.checkers import RegisterChecker

checker = RegisterChecker()
errors = checker.validate_sequence(["သွားတယ်", "ပါသည်"])
# Warns: Mixed register (colloquial + formal)

Register Types:

Colloquial: တယ်, ဘူး, မယ်
Formal: သည်, ပါ, မည်
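Mixed-register detection can be approximated by looking for markers from both lists in the same sequence. The sketch below matches markers as word endings, which is a crude assumption made for illustration; registers_used and is_mixed_register are hypothetical helpers, not the RegisterChecker API.

```python
from typing import List, Set

# Marker lists copied from the register types above.
REGISTER_MARKERS = {
    "colloquial": ("တယ်", "ဘူး", "မယ်"),
    "formal": ("သည်", "ပါ", "မည်"),
}


def registers_used(words: List[str]) -> Set[str]:
    """Collect the registers whose markers appear as a word ending."""
    found = set()
    for word in words:
        for register, markers in REGISTER_MARKERS.items():
            if word.endswith(markers):  # str.endswith accepts a tuple of suffixes
                found.add(register)
    return found


def is_mixed_register(words: List[str]) -> bool:
    """True when markers from more than one register occur in the sequence."""
    return len(registers_used(words)) > 1


print(is_mixed_register(["သွားတယ်", "ပါသည်"]))
```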

Confidence Scoring

Get confidence for suggestions:

confidence = checker.get_suggestion_confidence(
    word="ကျောင်း",
    suggestion="ကြောင်း",
    prev_pos="V"
)
print(f"Confidence: {confidence}")  # 0.95 (high for verb context)

Confidence Factors:

Factor	Confidence	Description
Exact match	0.95	Exact pattern match
Verb context	0.90	After verb validation
Noun context	0.85	After noun validation
Context dependent	0.65	Ambiguous context
Default	0.80	No specific context
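The table can be read as a lookup from context to a confidence value. The sketch below encodes that reading using the values in the table; the function suggestion_confidence, its parameters, and the precedence order (exact match first, then ambiguity, then POS context) are assumptions for illustration and do not describe the library's internals.

```python
from typing import Optional

# Confidence values taken from the table above.
EXACT_MATCH = 0.95
CONTEXT_CONFIDENCE = {"V": 0.90, "N": 0.85}
CONTEXT_DEPENDENT = 0.65
DEFAULT = 0.80


def suggestion_confidence(
    prev_pos: Optional[str],
    exact_match: bool = False,
    ambiguous: bool = False,
) -> float:
    """Pick a confidence value from the factors in the table (precedence is assumed)."""
    if exact_match:
        return EXACT_MATCH
    if ambiguous:
        return CONTEXT_DEPENDENT
    if prev_pos is None:
        return DEFAULT
    return CONTEXT_CONFIDENCE.get(prev_pos, DEFAULT)


print(suggestion_confidence("V"))
print(suggestion_confidence(None))
```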

Integration with SpellChecker

The Grammar Engine integrates automatically via validation strategies when rule-based validation is enabled:

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

# Enable grammar checking via use_rule_based_validation
config = SpellCheckerConfig(
    use_rule_based_validation=True,  # Enables grammar rules in validation
    use_context_checker=True,         # Context checking includes grammar strategies
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check("ကျွန်တော် ကျောင်း သွားတယ်")
# Grammar errors included in result.errors

Note: Grammar engine configuration (GrammarEngineConfig) is managed internally. For advanced customization, use SyntacticRuleChecker directly with a custom config path.

Custom Rules

YAML Configuration

# custom_rules.yml
particle_typos:
  "ဘူ":
    correction: "ဘူး"
    meaning: "negative ending"
    context: "after_verb"

medial_confusions:
  "ကျောင်း":
    correction: "ကြောင်း"
    context: "after_verb"
    meaning: "because"

invalid_sequences:
  - prev: "P_SENT"
    curr: "P_SENT"
    severity: "error"
    message: "Double sentence particles"
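Before handing a rules file to the checker, it can help to sanity-check its shape. The sketch below validates the same structure expressed as a Python dict, so it needs no YAML parser. validate_rules is a hypothetical helper, and treating every field shown above as required is an assumption, not a documented schema.

```python
from typing import Any, Dict, List

# Fields seen in the example above; whether the library requires all of them is assumed.
REQUIRED_KEYS = {
    "particle_typos": {"correction", "meaning", "context"},
    "medial_confusions": {"correction", "context", "meaning"},
}
SEQUENCE_KEYS = {"prev", "curr", "severity", "message"}


def validate_rules(rules: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = []
    for section, required in REQUIRED_KEYS.items():
        for word, entry in rules.get(section, {}).items():
            missing = required - set(entry)
            if missing:
                problems.append(f"{section}[{word!r}] missing {sorted(missing)}")
    for i, entry in enumerate(rules.get("invalid_sequences", [])):
        missing = SEQUENCE_KEYS - set(entry)
        if missing:
            problems.append(f"invalid_sequences[{i}] missing {sorted(missing)}")
    return problems


# Part of custom_rules.yml written as a dict, with one field left out on purpose.
rules = {
    "particle_typos": {"ဘူ": {"correction": "ဘူး", "context": "after_verb"}},
    "invalid_sequences": [
        {"prev": "P_SENT", "curr": "P_SENT", "severity": "error",
         "message": "Double sentence particles"},
    ],
}
print(validate_rules(rules))
```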

Loading Custom Rules

checker = SyntacticRuleChecker(
    provider=provider,
    config_path="custom_rules.yml",
)
