While N-gram models are powerful for statistical context, they sometimes fail to capture rigid grammatical rules. The Syntactic Rule Checker (Layer 2.5) addresses this by applying linguistic rules based on Part-of-Speech (POS) tags.

Overview

The Syntactic Rule Checker runs after word validation but before the statistical N-gram check. It is designed to catch:
  1. Grammatically Impossible Sequences: e.g., two verbs without a particle between them.
  2. Particle Errors: Using a noun-particle after a verb, or vice versa.
  3. Medial Consonant Confusions: ြ (Ya-yit) vs ျ (Ya-pin), resolved from linguistic roots (handled via lookup).
  4. Sentence Structure: Missing sentence-final particles.

Architecture

The grammar system consists of:

Core Components

| Component | Location | Description |
| --- | --- | --- |
| SyntacticRuleChecker | src/myspellchecker/grammar/engine.py | Main grammar engine |
| GrammarEngineConfig | src/myspellchecker/core/config/grammar_configs.py | Engine configuration |
| Grammar Checkers | src/myspellchecker/grammar/checkers/ | Specialized checkers |

Grammar Checkers

| Checker | File | Purpose |
| --- | --- | --- |
| AspectChecker | grammar/checkers/aspect.py | Aspect marker validation |
| ClassifierChecker | grammar/checkers/classifier.py | Classifier usage |
| CompoundChecker | grammar/checkers/compound.py | Compound word validation |
| NegationChecker | grammar/checkers/negation.py | Negation patterns |
| RegisterChecker | grammar/checkers/register.py | Formal/informal register |

YAML Rule Files

Grammar rules are defined in YAML files located in src/myspellchecker/rules/:
| File | Purpose |
| --- | --- |
| grammar_rules.yaml | Core grammar rules |
| particles.yaml | Particle definitions and rules |
| aspects.yaml | Aspect marker rules |
| classifiers.yaml | Classifier patterns |
| compounds.yaml | Compound word rules |
| negation.yaml | Negation patterns |
| register.yaml | Register (formal/informal) rules |
| homophones.yaml | Homophone confusion patterns |
| typo_corrections.yaml | Common typo corrections |
| tone_rules.yaml | Tone mark rules |

Key Features

1. Particle Agreement

Myanmar particles are highly specific to the part of speech they modify.
  • Verb Particles: မယ်, ခဲ့, နေ must follow verbs.
  • Noun Particles: မှာ, က, ကို must follow nouns.
Example Error:
  • Input: ကျောင်း သွား မှာ (“School go at”)
  • Analysis: သွား is a Verb. မှာ is usually a location marker (Noun particle).
  • Correction: Suggest မယ် (Future tense) or မလား (Question) depending on context, or flag as suspicious.
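
The agreement check described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the particle sets are copied from the bullets above, the POS tags are supplied by hand, and `find_particle_mismatches` is a hypothetical helper name.

```python
# Hypothetical particle sets copied from the examples above; a real system
# would presumably load these from particles.yaml.
VERB_PARTICLES = {"မယ်", "ခဲ့", "နေ"}
NOUN_PARTICLES = {"မှာ", "က", "ကို"}


def find_particle_mismatches(tagged_words):
    """Return (index, particle, expected_host_pos) for misplaced particles.

    tagged_words is a list of (word, pos) pairs, where pos is "N", "V" or "P".
    """
    mismatches = []
    for i in range(1, len(tagged_words)):
        word, _ = tagged_words[i]
        _, prev_pos = tagged_words[i - 1]
        if word in NOUN_PARTICLES and prev_pos == "V":
            # Noun particle attached to a verb.
            mismatches.append((i, word, "N"))
        elif word in VERB_PARTICLES and prev_pos == "N":
            # Verb particle attached to a noun.
            mismatches.append((i, word, "V"))
    return mismatches


# The example sentence from above, tagged by hand.
sentence = [("ကျောင်း", "N"), ("သွား", "V"), ("မှာ", "P")]
print(find_particle_mismatches(sentence))
```

The sketch only flags the mismatch; choosing between မယ် and မလား as a suggestion would need the wider context mentioned in the correction step.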

2. Medial Confusion (ြ vs ျ)

Many words sound similar but use different medials (Ya-yit vs Ya-pin).
  • Rule: ကျောင်း (School) vs ကြောင်း (Cause/Fact).
  • Logic:
    • If preceded by a Verb (e.g., ဖြစ်), it implies “Cause”. Suggest ကြောင်း.
    • If preceded by a Noun or at start, it implies “School”. Keep ကျောင်း.
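
The rule above amounts to a small context lookup. The following sketch assumes hand-supplied POS tags and covers only this one word pair; `suggest_medial_fixes` is a hypothetical name used for illustration.

```python
SCHOOL = "ကျောင်း"  # written with Ya-pin: "school"
CAUSE = "ကြောင်း"   # written with Ya-yit: "cause/fact"


def suggest_medial_fixes(tagged_words):
    """Return (index, original, suggestion) where the medial looks wrong.

    tagged_words is a list of (word, pos) pairs. Only the SCHOOL/CAUSE pair
    is handled in this sketch.
    """
    suggestions = []
    for i, (word, _) in enumerate(tagged_words):
        if word not in (SCHOOL, CAUSE):
            continue
        # None marks the start of the sentence.
        prev_pos = tagged_words[i - 1][1] if i > 0 else None
        # After a verb the "cause" reading is expected; otherwise "school".
        expected = CAUSE if prev_pos == "V" else SCHOOL
        if word != expected:
            suggestions.append((i, word, expected))
    return suggestions


print(suggest_medial_fixes([("ဖြစ်", "V"), (SCHOOL, "N")]))
```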

3. POS Sequence Validation

We maintain a matrix of invalid POS transitions.
  • Verb -> Verb (Directly): Usually invalid. A linking particle such as ပြီး is needed between the two verbs.
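
One way to picture the transition matrix is as a set of forbidden tag pairs scanned over adjacent positions. The sketch below assumes single-letter tags ("N", "V", "P") and a hypothetical `find_invalid_transitions` helper; the real matrix may hold more entries and a different tag set.

```python
# Forbidden adjacent POS pairs; only the Verb -> Verb case from above is listed.
INVALID_TRANSITIONS = {("V", "V")}


def find_invalid_transitions(pos_tags):
    """Return (index, left_tag, right_tag) for each forbidden adjacent pair."""
    return [
        (i, pos_tags[i], pos_tags[i + 1])
        for i in range(len(pos_tags) - 1)
        if (pos_tags[i], pos_tags[i + 1]) in INVALID_TRANSITIONS
    ]


print(find_invalid_transitions(["N", "V", "V", "P"]))
```

Because each pair is a constant-time set lookup, the scan is linear in sentence length.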

Configuration

Enable Grammar Checking

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig

# Enable rule-based validation
config = SpellCheckerConfig(
    use_rule_based_validation=True,  # Enable grammar checking
)
checker = SpellChecker(config=config)

Grammar Engine Configuration

from myspellchecker.core.config.grammar_configs import GrammarEngineConfig

# GrammarEngineConfig controls confidence thresholds for grammar checks
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.80,
    exact_match_confidence=0.95,
    high_confidence=0.90,
    medium_confidence=0.85,
    pos_sequence_confidence=0.80,
)
Note: GrammarRuleConfig in myspellchecker.grammar.config is a separate class for loading YAML-based grammar rule definitions.

Using SyntacticRuleChecker Directly

from myspellchecker.grammar.engine import SyntacticRuleChecker

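# Create checker with a provider.
# NOTE: `provider` is assumed to be an already-constructed provider instance;
# it is not defined in this snippet.
checker = SyntacticRuleChecker(provider=provider)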

# Check a sentence
words = ["ကျောင်း", "သွား", "မှာ"]
errors = checker.check_sequence(words)

for error in errors:
    # Each error is a tuple of (index, original_word, suggestion)
    index, original, suggestion = error
    print(f"Position: {index}")
    print(f"Original: {original}")
    print(f"Suggestion: {suggestion}")

YAML Rule Format

Grammar rules are defined in YAML format with JSON Schema validation.

Example: grammar_rules.yaml

# Grammar rules for Myanmar language
version: "1.0.0"

pos_sequences:
  invalid:
    - ["V", "V"]  # Verb-Verb without particle
    - ["N", "N", "N", "N"]  # Too many consecutive nouns

  valid:
    - ["N", "P", "V", "P"]  # Subject-particle-verb-particle
    - ["N", "V", "P"]  # Subject-verb-particle

particle_rules:
  verb_particles:
    - "မယ်"
    - "ခဲ့"
    - "နေ"
    - "ပြီ"

  noun_particles:
    - "က"
    - "ကို"
    - "မှာ"
    - "တွင်"
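
As a rough illustration of how the `pos_sequences.invalid` patterns above might be applied, the sketch below slides each pattern over a list of POS tags. The rules are written as a plain Python dict mirroring the YAML so that the example has no dependencies, and `match_invalid_patterns` is a hypothetical helper, not part of myspellchecker.

```python
# Plain-dict mirror of the `pos_sequences.invalid` section shown above.
RULES = {
    "pos_sequences": {
        "invalid": [
            ["V", "V"],
            ["N", "N", "N", "N"],
        ],
    }
}


def match_invalid_patterns(pos_tags, rules):
    """Return sorted (start_index, pattern) hits for every invalid pattern."""
    hits = []
    for pattern in rules["pos_sequences"]["invalid"]:
        width = len(pattern)
        # Slide a window of the pattern's width across the tag sequence.
        for start in range(len(pos_tags) - width + 1):
            if list(pos_tags[start:start + width]) == list(pattern):
                hits.append((start, tuple(pattern)))
    return sorted(hits)


print(match_invalid_patterns(["N", "N", "N", "N", "V", "V"], RULES))
```

Patterns of different lengths are handled by the same loop, which is why the four-noun rule and the two-verb rule can sit in one list.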

Schema Validation

Rules are validated against JSON Schema:
  • src/myspellchecker/schemas/grammar_rules.schema.json

Error Types

Grammar checking produces errors with specific types:
| Error Type | Description |
| --- | --- |
| grammar_error | General grammar violation |
| particle_typo | Particle usage error |
| pos_sequence_error | Invalid POS sequence |
| aspect_typo | Aspect marker error |
| classifier_typo | Classifier error |
| mixed_register | Mixed formal/informal |
| incomplete_reduplication | Incomplete reduplication |

Integration with SpellChecker

Grammar checking is integrated into the validation pipeline:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig

config = SpellCheckerConfig(
    use_rule_based_validation=True,
    use_context_checker=True,
)

checker = SpellChecker(config=config)
result = checker.check("ကျောင်း သွား မှာ")

# Filter grammar errors
grammar_errors = [
    e for e in result.errors
    if e.error_type in ("grammar_error", "particle_typo", "pos_sequence_error")
]
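
When more than a few error types matter, grouping may be more convenient than filtering. The sketch below relies only on the `error_type` attribute used above; `StubError` is a stand-in defined here for illustration, not a class from the library, and `group_by_error_type` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StubError:
    """Stand-in for a result error; only `error_type` mirrors the real object."""
    error_type: str
    text: str


def group_by_error_type(errors):
    """Bucket errors by their error_type string, preserving input order."""
    grouped = defaultdict(list)
    for err in errors:
        grouped[err.error_type].append(err)
    return dict(grouped)


sample = [
    StubError("particle_typo", "မှာ"),
    StubError("pos_sequence_error", "V V"),
    StubError("particle_typo", "က"),
]
for error_type, items in group_by_error_type(sample).items():
    print(error_type, len(items))
```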

Performance

| Operation | Complexity | Typical Time |
| --- | --- | --- |
| POS Sequence Check | O(n) | ~1ms |
| Particle Validation | O(1) | ~0.5ms |
| Full Grammar Check | O(n) | ~5ms |

Next Steps