Grammar checking validates syntactic correctness using POS tags and rule-based analysis. It operates at Layer 2.5 in the validation pipeline, sitting between word validation and context checking — catching errors where every word is spelled correctly but the sentence structure is wrong.

Why Grammar Checking?

Myanmar text can have:
  • Correct spelling but wrong particles: “သူ ကို” vs “သူ က”
  • Verb-modifier mismatches: Wrong causative or passive markers
  • Sentence structure errors: Missing required components
Grammar checking catches errors that:
  • Pass syllable validation (valid syllables)
  • Pass word validation (valid words)
  • Fail syntactic rules

How It Works

1. POS Tagging

Text is tagged with part-of-speech labels:
text = "သူ ကျောင်း သွား သည်"
# Tags: [PRON, N, V, P_SENT]

2. Rule Application

Grammar rules check tag sequences:
# Rule: V should be followed by P_SENT at sentence end
pattern = r"V P_SENT$"
sequence = "N N V P_SENT"  # Valid

# Rule: Subject particle should follow N
pattern = r"N P_SUBJ"
sequence = "V P_SUBJ"  # Invalid - verb can't have subject particle

3. Error Generation

Invalid sequences generate grammar errors:
# Error: "V + P_SUBJ" is invalid
# Suggestion: Change P_SUBJ to appropriate particle
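
The three steps above can be sketched in plain Python. This is a minimal illustration, not the library's engine: it assumes the tags have already been produced (step 1), joins them into a space-separated string and applies regular-expression rules to it (step 2), then maps each match back to a token index to build an error record (step 3). The rule list, the `GrammarFinding` class, and `check_tags` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: (regex over space-joined POS tags, message).
# They mirror the two example rules above; the library's real rules live in its YAML files.
RULES = [
    (re.compile(r"\bV P_SUBJ\b"), "Verb cannot take a subject particle"),
    (re.compile(r"\bV$"), "Verb at sentence end should be followed by P_SENT"),
]


@dataclass
class GrammarFinding:
    position: int  # index of the token where the offending pattern starts
    message: str


def check_tags(tags: list[str]) -> list[GrammarFinding]:
    """Apply every rule to the tag sequence and report where each match starts."""
    joined = " ".join(tags)
    findings = []
    for pattern, message in RULES:
        for match in pattern.finditer(joined):
            # The number of tags before the match start is the token index.
            position = len(joined[: match.start()].split())
            findings.append(GrammarFinding(position, message))
    return findings


print(check_tags(["PRON", "N", "V", "P_SENT"]))  # → []
print(check_tags(["V", "P_SUBJ"]))
```

Joining the tags into one string keeps each rule readable as a regular expression; the word boundaries (`\b`) stop a rule written for `V` from matching inside a longer tag such as `ADV`.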

Grammar Rules

Particle Rules

Subject Particle (က)

# Valid: Noun + Subject particle
"သူ က" → Valid

# Invalid: Verb + Subject particle
"သွား က" → Invalid

Object Particle (ကို)

# Valid: Noun + Object particle
"စာအုပ် ကို" → Valid

# Invalid: Adjective + Object particle
"လှ ကို" → Invalid

Location Particles (မှာ, တွင်)

# Valid: Noun + Location particle
"ကျောင်း မှာ" → Valid
"မြို့ တွင်" → Valid
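
One way to express the particle rules above is a lookup from each particle to the POS tags allowed directly before it. The sketch below is a hypothetical illustration, not the library's rule format: it assumes input as `(word, tag)` pairs, and it lists `PRON` alongside `N` because the tagging example earlier labels သူ as `PRON`.

```python
# Hypothetical table: particle -> POS tags allowed immediately before it.
# Built only from the valid/invalid examples above, not from the library's rule files.
ALLOWED_BEFORE: dict[str, set[str]] = {
    "က": {"N", "PRON"},      # subject particle
    "ကို": {"N", "PRON"},    # object particle
    "မှာ": {"N", "PRON"},    # location particle
    "တွင်": {"N", "PRON"},   # location particle
}


def particle_errors(tokens: list[tuple[str, str]]) -> list[int]:
    """Return indices of particles whose preceding tag is not in the allowed set."""
    bad = []
    for i in range(1, len(tokens)):
        allowed = ALLOWED_BEFORE.get(tokens[i][0])
        if allowed is not None and tokens[i - 1][1] not in allowed:
            bad.append(i)
    return bad


print(particle_errors([("သွား", "V"), ("က", "P_SUBJ")]))  # → [1]
```

The loop starts at index 1, so a particle that opens a sentence is left to the sentence-boundary rules rather than this check.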

Verb Modifier Rules

Causative Construction

# Valid: V + causative marker
"စား စေ" → Valid (cause to eat)

# Pattern check (pseudocode: followed_by and is_compatible are illustrative helpers)
if followed_by(V, CAUS) and not is_compatible(V, CAUS):
    error("Verb cannot take causative marker")

Passive Construction

# Valid: V + passive marker (ခံ = undergo/receive)
"ရိုက် ခံ" → Valid (was hit)
"ဆူ ခံ" → Valid (was scolded)

Sentence Structure Rules

# Rule: Sentence must end with P_SENT, P_Q, or PUNCT
if not ends_with(sentence, [P_SENT, PUNCT, P_Q]):
    warning("Sentence may be incomplete")

# Rule: Question sentences should end with P_Q
if has_question_word(sentence) and not ends_with(sentence, [P_Q]):
    warning("Question should end with question particle")
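
A runnable version of these two sentence-structure rules, operating on one sentence's POS tag list, might look like the following. It is a sketch: detecting question words requires a lexicon, so `has_question_word` is passed in as a flag, and the function name `sentence_warnings` is hypothetical.

```python
# Tags accepted as a sentence ending, per the first rule above.
SENTENCE_END_TAGS = {"P_SENT", "P_Q", "PUNCT"}


def sentence_warnings(tags: list[str], has_question_word: bool) -> list[str]:
    """Apply the completeness and question-ending rules to one sentence's tags."""
    warnings = []
    last = tags[-1] if tags else None
    if last not in SENTENCE_END_TAGS:
        warnings.append("Sentence may be incomplete")
    if has_question_word and last != "P_Q":
        warnings.append("Question should end with question particle")
    return warnings


print(sentence_warnings(["PRON", "N", "V", "P_SENT"], has_question_word=False))  # → []
```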

Configuration

Enable Grammar Checking

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, POSTaggerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(
    use_rule_based_validation=True,
    pos_tagger=POSTaggerConfig(tagger_type="viterbi"),
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

Grammar Rule Configuration

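Grammar checking is switched on through SpellCheckerConfig, which initializes the grammar engine automatically; its confidence thresholds are tuned separately through GrammarEngineConfig:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig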
from myspellchecker.providers import SQLiteProvider

# Configure grammar checking via SpellCheckerConfig
config = SpellCheckerConfig(
    use_rule_based_validation=True,
    # Grammar engine is automatically initialized
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
For advanced use, access the internal SyntacticRuleChecker:
from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig

# Create a custom grammar checker with confidence thresholds
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.80,
    high_confidence=0.90,
    medium_confidence=0.85,
)

# Reuses the SQLiteProvider created above
rule_checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)

Built-in Grammar Checkers

The grammar system includes several specialized checkers:
| Checker | Description |
|---|---|
| AspectChecker | Validates verb aspect markers |
| ClassifierChecker | Validates numeral classifiers |
| CompoundChecker | Validates compound word patterns |
| MergedWordChecker | Detects incorrectly merged particle+verb sequences |
| NegationChecker | Validates negation patterns |
| RegisterChecker | Detects formal/informal register mixing |

YAML Rules Configuration

Grammar rules are defined in YAML files located in src/myspellchecker/rules/:
| File | Purpose |
|---|---|
| grammar_rules.yaml | Core syntactic grammar rules |
| typo_corrections.yaml | Common typo patterns and corrections |
| particles.yaml | Particle definitions with POS tags |
| classifiers.yaml | Numeral classifier rules |
| register.yaml | Formal/informal register rules |
| compounds.yaml | Compound word patterns |
| aspects.yaml | Verb aspect rules |
| negation.yaml | Negation pattern rules |
Load custom rules via GrammarRuleConfig:
from myspellchecker.grammar.config import GrammarRuleConfig

# Load custom rules from your own YAML files
config = GrammarRuleConfig(
    config_path="/path/to/custom_grammar_rules.yaml",
    typo_path="/path/to/custom_typo_corrections.yaml",
)

Error Types and Severity

| Severity | Description | Example |
|---|---|---|
| error | Definite grammatical error | Wrong particle type |
| warning | Likely error, may be valid | Unusual construction |
| info | Style suggestion | Missing optional particle |
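
To act on these levels, for example to show only errors and warnings in an editor, the three severities can be ranked and filtered. This sketch assumes each finding is a plain dict with a `severity` key; the error objects returned by the library may expose severity differently, so `SEVERITY_RANK` and `at_least` are hypothetical helpers.

```python
# Ordering of the three severity levels from the table above (lowest to highest).
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def at_least(findings: list[dict], minimum: str) -> list[dict]:
    """Keep findings whose severity is at or above `minimum`."""
    threshold = SEVERITY_RANK[minimum]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= threshold]


# Example findings built from the table's example column.
findings = [
    {"message": "Wrong particle type", "severity": "error"},
    {"message": "Unusual construction", "severity": "warning"},
    {"message": "Missing optional particle", "severity": "info"},
]

print([f["message"] for f in at_least(findings, "warning")])
# → ['Wrong particle type', 'Unusual construction']
```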

Error Response

result = checker.check("သွား က")  # Verb with subject particle

for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Type: {error.error_type}")
        print(f"Text: {error.text}")
        print(f"Suggestions: {error.suggestions}")
        print(f"Confidence: {error.confidence}")

API Reference

SyntacticRuleChecker

from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig
from myspellchecker.providers import SQLiteProvider

# Create checker with provider (required) and optional config
provider = SQLiteProvider(database_path="path/to/dictionary.db")
grammar_config = GrammarEngineConfig()
checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)

# Check word sequence for grammar errors (POS tags are looked up internally)
words = ["သူ", "ကျောင်း", "သွား", "သည်"]

corrections = checker.check_sequence(words)

for idx, error_word, suggestion in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}'")

Individual Grammar Checkers

from myspellchecker.grammar.checkers import (
    AspectChecker,
    ClassifierChecker,
    CompoundChecker,
    MergedWordChecker,
    NegationChecker,
    RegisterChecker,
)

# Each checker handles specific grammar patterns
aspect_checker = AspectChecker()
classifier_checker = ClassifierChecker()

# Check a word sequence for aspect errors
words = ["သူ", "ကျောင်း", "သွား", "သည်"]
errors = aspect_checker.validate_sequence(words)

Common Patterns

Filter by Confidence

from myspellchecker import SpellChecker

def check_with_confidence(text: str, checker: SpellChecker, min_confidence: float = 0.70) -> list:
    """Return grammar errors whose confidence meets the threshold."""
    result = checker.check(text)

    return [
        e for e in result.errors
        if e.error_type == "grammar_error"
        and e.confidence >= min_confidence
    ]

Grammar-Only Check

def check_grammar_only(text: str, checker: SpellChecker) -> list:
    """Return only grammar errors (spelling is still checked, then filtered out)."""
    # Grammar checking is integrated into check(); keep only grammar errors
    result = checker.check(text)
    return [e for e in result.errors if e.error_type == "grammar_error"]

Report Grammar Issues

def generate_grammar_report(text: str, checker: SpellChecker) -> dict:
    """Generate a detailed grammar report grouped by confidence band."""
    result = checker.check(text)

    grammar_errors = [e for e in result.errors if e.error_type == "grammar_error"]

    return {
        "total_errors": len(grammar_errors),
        "by_confidence": {
            "high": len([e for e in grammar_errors if e.confidence >= 0.90]),
            "medium": len([e for e in grammar_errors if 0.70 <= e.confidence < 0.90]),
            "low": len([e for e in grammar_errors if e.confidence < 0.70]),
        },
        "details": [
            {
                "text": e.text,
                "error_type": e.error_type,
                "suggestions": e.suggestions,
                "confidence": e.confidence,
                "position": e.position,
            }
            for e in grammar_errors
        ],
    }

Built-in Rules

POS Sequence Rules

Errors

| Pattern | Description | Confidence |
|---|---|---|
| P_SENT P_SENT | Double sentence ending particles | 0.98 |
| P_PAST P_FUT | Conflicting tense markers | 0.98 |
| V P_NEG | Negation prefix after verb (wrong order; should be မ + V) | 0.95 |
| P_POSS P_SUBJ | Possessive + subject adjacent (noun missing between them) | 0.95 |
| P_POSS P_OBJ | Possessive + object adjacent (noun missing between them) | 0.95 |

Warnings

| Pattern | Description | Confidence |
|---|---|---|
| P_SUBJ P_OBJ | Subject + object markers adjacent | 0.90 |
| P_OBJ P_SUBJ | Object + subject markers adjacent | 0.90 |
| P_LOC P_LOC | Multiple location particles | 0.85 |
| P V | Particle directly precedes verb | 0.85 |
| PPM V | Postpositional marker before verb | 0.75 |
| NUM N | Number before noun without classifier | 0.75 |
| NUM V | Number before verb without classifier | 0.75 |
| N V | Noun + verb without case particle | 0.75 |
| ADJ V | Adjective before verb without noun | 0.65 |

Info

| Pattern | Description | Confidence |
|---|---|---|
| V V | Consecutive verbs (may be serial verb construction) | 0.50 |
| P P | Consecutive particles (check compatibility) | 0.50 |
| PPM PPM | Consecutive postpositional markers | 0.50 |
| PART PART | Consecutive particles | 0.50 |
| N N | Consecutive nouns (may be compound noun) | 0.40 |
| PART V | Particle before verb (may be auxiliary) | 0.40 |
| PPM N | Postpositional marker before noun (may be embedded clause) | 0.40 |
| N INT | Noun + interjection (vocative/exclamation) | 0.40 |
| INT INT | Consecutive interjections (emphatic) | 0.30 |
| INT P | Interjection + particle | 0.20 |
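
Because every rule in these tables looks at two adjacent tags, the rules can be stored as a dictionary keyed by tag pair and applied in a single pass. The sketch below copies a handful of rows (severity taken from the table heading, confidence from the table) and adds an optional confidence floor; `Rule`, `BIGRAM_RULES`, and `scan_bigrams` are hypothetical names, not the engine's internals.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    severity: str
    confidence: float
    description: str


# A subset of the rows above, keyed by (left tag, right tag).
BIGRAM_RULES = {
    ("P_SENT", "P_SENT"): Rule("error", 0.98, "Double sentence ending particles"),
    ("V", "P_NEG"): Rule("error", 0.95, "Negation prefix after verb"),
    ("NUM", "N"): Rule("warning", 0.75, "Number before noun without classifier"),
    ("N", "V"): Rule("warning", 0.75, "Noun + verb without case particle"),
    ("V", "V"): Rule("info", 0.50, "Consecutive verbs (may be serial verb construction)"),
}


def scan_bigrams(tags: list[str], min_confidence: float = 0.0) -> list[tuple[int, Rule]]:
    """Return (index of left tag, rule) for every adjacent pair that matches a rule."""
    hits = []
    for i, pair in enumerate(zip(tags, tags[1:])):
        rule = BIGRAM_RULES.get(pair)
        if rule is not None and rule.confidence >= min_confidence:
            hits.append((i, rule))
    return hits


for index, rule in scan_bigrams(["NUM", "N", "V"]):
    print(index, rule.severity, rule.description)
```

Passing `min_confidence=0.8` drops both warnings in that example, which mirrors how a higher confidence threshold trims lower-confidence findings.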

Sentence Boundary Rules

Forbidden Sentence Starts

| Particles | Type | Severity | Confidence |
|---|---|---|---|
| ကို, ရဲ့, အတွက် | Object / possessive / benefactive | error | 0.90 |
| မှ, ကနေ, လို့, ဆိုလို့ | Source / causative | error | 0.85 |
| လည်း, တော့, ပဲ | Conjunctive | warning | 0.70 |

Forbidden Sentence Ends

| Particles | Type | Severity | Confidence |
|---|---|---|---|
| က, ကို, နှင့်, နဲ့ | Case markers | error | 0.90 |
| ရဲ့, , မှ, ကနေ | Possessive / source | error | 0.90 |
| အတွက်, အလို့ငှာ | Benefactive | warning | 0.75 |
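
The start and end tables translate directly into two word-level lookups. In this sketch, `FORBIDDEN_STARTS`, `FORBIDDEN_ENDS`, and `boundary_findings` are hypothetical names; the particles, severities, and confidences are copied from the tables, except the blank entry in the possessive/source row, which is omitted.

```python
# Particles that should not open or close a sentence: particle -> (severity, confidence).
FORBIDDEN_STARTS = {
    **dict.fromkeys(["ကို", "ရဲ့", "အတွက်"], ("error", 0.90)),
    **dict.fromkeys(["မှ", "ကနေ", "လို့", "ဆိုလို့"], ("error", 0.85)),
    **dict.fromkeys(["လည်း", "တော့", "ပဲ"], ("warning", 0.70)),
}
FORBIDDEN_ENDS = {
    **dict.fromkeys(["က", "ကို", "နှင့်", "နဲ့"], ("error", 0.90)),
    **dict.fromkeys(["ရဲ့", "မှ", "ကနေ"], ("error", 0.90)),
    **dict.fromkeys(["အတွက်", "အလို့ငှာ"], ("warning", 0.75)),
}


def boundary_findings(words: list[str]) -> list[tuple[int, str, str, float]]:
    """Return (index, boundary, severity, confidence) for a forbidden first or last word."""
    findings = []
    if not words:
        return findings
    if words[0] in FORBIDDEN_STARTS:
        findings.append((0, "start", *FORBIDDEN_STARTS[words[0]]))
    if words[-1] in FORBIDDEN_ENDS:
        findings.append((len(words) - 1, "end", *FORBIDDEN_ENDS[words[-1]]))
    return findings


print(boundary_findings(["သူ", "ကျောင်း", "သွား", "သည်"]))  # → []
```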

Sentence Completion Rules

| Pattern | Expected particles | Severity | Confidence |
|---|---|---|---|
| V$ (verb at sentence end) | တယ်, ပါတယ်, သည်, ပါသည်, မယ်, ပါမယ်, ပြီ | warning | 0.80 |
| N V (noun before verb) | က, ကို, မှာ, မှ, သည် (case particle between them) | warning | 0.75 |

Architecture

The Grammar Engine coordinates six specialized checkers through SyntacticRuleChecker: Aspect, Classifier, Compound, Merged Word, Negation, and Register. Each checker handles a specific grammar domain. See Grammar Checkers for details on each one.
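
The coordination pattern described here, where one object runs several independent checkers and merges their results, can be sketched with a small protocol. This is not the library's `SyntacticRuleChecker`: the two toy checkers are hypothetical stand-ins that reuse rules from the Built-in Rules tables (`V P_NEG` and `NUM N`), and all class names are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple

Token = Tuple[str, str]  # (word, POS tag)


@dataclass(frozen=True)
class Finding:
    checker: str
    position: int
    message: str


class Checker(Protocol):
    name: str

    def check(self, tokens: List[Token]) -> List[Finding]: ...


class ToyNegationChecker:
    """Flags a negation marker placed after a verb (expected order is negation + verb)."""

    name = "negation"

    def check(self, tokens: List[Token]) -> List[Finding]:
        return [
            Finding(self.name, i, "Negation marker should precede the verb")
            for i in range(1, len(tokens))
            if tokens[i - 1][1] == "V" and tokens[i][1] == "P_NEG"
        ]


class ToyClassifierChecker:
    """Flags a number followed directly by a noun, with no classifier between them."""

    name = "classifier"

    def check(self, tokens: List[Token]) -> List[Finding]:
        return [
            Finding(self.name, i, "Number before noun without classifier")
            for i in range(1, len(tokens))
            if tokens[i - 1][1] == "NUM" and tokens[i][1] == "N"
        ]


class GrammarCoordinator:
    """Runs each checker independently and merges findings in text order."""

    def __init__(self, checkers: Iterable[Checker]):
        self.checkers = list(checkers)

    def check(self, tokens: List[Token]) -> List[Finding]:
        findings = [f for checker in self.checkers for f in checker.check(tokens)]
        return sorted(findings, key=lambda f: (f.position, f.checker))
```

Keeping every checker behind the same `check` method means a new grammar domain can be added without touching the others, which matches the one-checker-per-domain split described above.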

Confidence Scoring

Grammar suggestions include confidence scores based on context:
| Factor | Weight | Description |
|---|---|---|
| Exact match | 0.95 | Exact pattern match |
| Verb context | 0.90 | After verb validation |
| Noun context | 0.85 | After noun validation |
| Default | 0.80 | No specific context |
| Context dependent | 0.65 | Ambiguous context |
# `checker` here is a SyntacticRuleChecker (see API Reference)
confidence = checker.get_suggestion_confidence(
    word="ကျောင်း",
    suggestion="ကြောင်း",
    prev_pos="V",
)
print(f"Confidence: {confidence}")  # Score reflects the context factors in the table above
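
The weights in the table can be read as a priority order in which the strongest applicable context decides the score. The function below is a hypothetical sketch of that reading: the weights come from the table, but the precedence among factors (exact match first, ambiguity next, then the preceding POS tag) and the name `suggestion_confidence` are assumptions, not the documented behavior of `get_suggestion_confidence`.

```python
from typing import Optional

# Weights copied from the confidence table above.
WEIGHTS = {
    "exact_match": 0.95,
    "verb_context": 0.90,
    "noun_context": 0.85,
    "default": 0.80,
    "context_dependent": 0.65,
}


def suggestion_confidence(exact_match: bool, prev_pos: Optional[str], ambiguous: bool = False) -> float:
    """Pick a single weight from the strongest context that applies (assumed precedence)."""
    if exact_match:
        return WEIGHTS["exact_match"]
    if ambiguous:
        return WEIGHTS["context_dependent"]
    if prev_pos == "V":
        return WEIGHTS["verb_context"]
    if prev_pos == "N":
        return WEIGHTS["noun_context"]
    return WEIGHTS["default"]


print(suggestion_confidence(exact_match=False, prev_pos="V"))  # → 0.9
```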

Troubleshooting

Issue: Too many false positives

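Cause: POS tagging errors or overly strict rules.
Solution: Switch to a more accurate POS tagger:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, POSTaggerConfig
from myspellchecker.providers import SQLiteProvider

# Use a more accurate tagger
config = SpellCheckerConfig(
    pos_tagger=POSTaggerConfig(tagger_type="transformer")
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)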

Issue: Missing grammar errors

Cause: Grammar checking is not enabled.
Solution: Enable rule-based validation in the config:
config = SpellCheckerConfig(
    use_rule_based_validation=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

Issue: Slow grammar checking

Cause: Complex rules or many patterns.
Solution: Raise the confidence threshold via GrammarEngineConfig so that fewer low-confidence findings are produced:
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig

# A higher threshold reports fewer, higher-confidence grammar errors
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.85,  # Only report high-confidence errors
)

Next Steps