Syllable validation is the foundation of mySpellChecker’s “Syllable-First Architecture.” It validates Myanmar text at the syllable level before attempting more expensive word or context analysis.

Why Syllable-First?

Myanmar text has unique characteristics:
  • No whitespace between words
  • Syllable-based writing system
  • Words are composed of one or more syllables
A traditional spell checker segments text into words first, which is:
  1. Computationally expensive
  2. Error-prone on misspelled text
  3. Wasteful when obvious typos exist
mySpellChecker inverts this:
  1. Break text into syllables first (fast, deterministic)
  2. Validate syllables (catches 90%+ of typos immediately)
  3. Only assemble valid syllables into words for deeper checking
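The ordering above can be sketched as a small gate in which word-level work runs only when every syllable passes. This is a minimal illustration, not mySpellChecker's implementation: the function name `check_syllable_first` and its three callable parameters are hypothetical, and returning early on the first batch of bad syllables is a simplifying assumption.

```python
from typing import Callable, List


def check_syllable_first(
    text: str,
    segment: Callable[[str], List[str]],
    is_valid_syllable: Callable[[str], bool],
    check_words: Callable[[List[str]], List[str]],
) -> List[str]:
    """Return a list of problems, skipping word-level work when syllables fail."""
    # Step 1: split into syllables (cheap and deterministic).
    syllables = segment(text)

    # Step 2: validate each syllable independently.
    bad = [s for s in syllables if not is_valid_syllable(s)]
    if bad:
        # Obvious typos are reported without running the expensive stage.
        return [f"invalid syllable: {s}" for s in bad]

    # Step 3: only clean syllable sequences reach word-level checking.
    return check_words(syllables)
```

Passing the three stages in as callables keeps the sketch independent of any particular segmenter or dictionary.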

Syllable Anatomy

A Myanmar syllable follows this pattern:
Syllable = Consonant + [Stacked]* + [Medial]* + [Vowel]* + [Final]*
| Component | Required | Position | Examples |
|---|---|---|---|
| Consonant | Yes | Initial | က, ခ, မ |
| Stacked | No | After consonant | ္က, ္ခ |
| Medial | No | After consonant | ျ, ြ, ွ, ှ |
| Vowel | No | Various | ါ, ိ, ု, ေ |
| Final | No | End | ်, ံ, း |
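The component roles in the table can be illustrated by classifying code points from the Unicode Myanmar block. This is a rough sketch, not the library's classifier: the function names `classify` and `describe` are hypothetical, the ranges cover only the core Burmese characters, and treating dot below (U+1037) as a final is an assumption added here.

```python
def classify(ch: str) -> str:
    """Map one Myanmar character to a syllable component (core Burmese only)."""
    cp = ord(ch)
    if 0x1000 <= cp <= 0x1021:      # KA .. A
        return "consonant"
    if cp == 0x1039:                # VIRAMA, marks a stacked consonant
        return "stacking"
    if 0x103B <= cp <= 0x103E:      # medial YA, RA, WA, HA
        return "medial"
    if 0x102B <= cp <= 0x1032:      # dependent vowel signs
        return "vowel"
    if cp in (0x103A, 0x1036, 0x1037, 0x1038):  # asat, anusvara, dot below, visarga
        return "final"
    return "other"


def describe(syllable: str) -> list:
    """Return (character, component) pairs in reading order."""
    return [(ch, classify(ch)) for ch in syllable]
```

For example, `describe("\u1019\u103C\u1014\u103A")` labels the four characters of မြန် as consonant, medial, consonant, final. The second consonant is the syllable-final one that asat closes.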

Simple Syllables

Note: Tonal information is omitted from these transcriptions for simplicity. Standard Burmese has four tones (low, high, creaky, checked).
Consonant only:     က = "ka",  မ = "ma",  သ = "tha"
Consonant + Vowel:  ကာ = "ka", ကိ = "ki", ကု = "ku", ကေ = "kay"
Consonant + Asat:   က် = "k" (inherent vowel killed), မ် = "m" (inherent vowel killed)
Checked syllable:   ကတ် = "kat" (final stop, glottal closure)

Complex Syllables (with Medials)

Single medial:  ကျ = "kya", ကြ = "kra" (merged to kya in modern Burmese), ကွ = "kwa"
Combined:       ကြွ = "krwa", ကျွ = "kywa"
Ha-htoe:        နှ = "hna", မှ = "hma", လှ = "hla" (voiceless sonorants)
Medial order (Unicode canonical order, UTN #11):
1. ျ (ya-pin/medial ya, U+103B)
2. ြ (ya-yit/medial ra, U+103C)
3. ွ (wa-hswe/medial wa, U+103D)
4. ှ (ha-htoe/medial ha, U+103E) - always last
Valid: ကြွ (ြ before ွ) | Invalid: ကွြ (wrong order)
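The ordering rule can be checked by ranking each medial and requiring strictly increasing ranks. This is a minimal sketch of the ordering check only, and the helper name `medials_in_canonical_order` is hypothetical rather than part of the library.

```python
# Rank of each medial in canonical order: ya, ra, wa, ha.
MEDIAL_RANK = {"\u103B": 0, "\u103C": 1, "\u103D": 2, "\u103E": 3}


def medials_in_canonical_order(syllable: str) -> bool:
    """True if the medials in the syllable appear in strictly increasing rank."""
    ranks = [MEDIAL_RANK[ch] for ch in syllable if ch in MEDIAL_RANK]
    return all(a < b for a, b in zip(ranks, ranks[1:]))
```

Because the comparison is strict, a repeated medial is rejected as well as a misordered one.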

Common Syllable Patterns

PatternExamplePhonetic
CV (Consonant + Vowel)မာ, နေ, သူma, ne, thu
CVC (Consonant + Vowel + Consonant)ကန်, သင်, ကိန်းkan, thin, kein:
CMV (Consonant + Medial + Vowel)မြေ, ကျော်, ကြီးmye, kyaw, kyi:
Complexကြောင်, မြန်မာkyaung, myanma

How It Works

1. Syllable Segmentation

Text is broken into syllables using Myanmar orthographic rules:
text = "မြန်မာနိုင်ငံ"
# Segments to: ["မြန်", "မာ", "နိုင်", "ငံ"]
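One simple heuristic that reproduces this segmentation starts a new syllable at every consonant that is not followed by asat or virama and is not itself stacked under a virama. The sketch below is an assumption for illustration, not the library's segmenter: `segment_syllables_sketch` is a hypothetical name, and the pattern ignores independent vowels, digits, punctuation, and non-Myanmar text.

```python
import re

# A consonant (U+1000..U+1021) starts a syllable unless it is
# preceded by VIRAMA (U+1039) or followed by ASAT (U+103A) or VIRAMA.
_SYLLABLE_START = re.compile("(?<!\u1039)[\u1000-\u1021](?![\u103A\u1039])")


def segment_syllables_sketch(text: str) -> list:
    """Split text at heuristic syllable boundaries."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading material as its own chunk
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]
```

Applied to the example text above, this yields the same four syllables, and a kinzi word such as မင်္ဂလာ stays together as မင်္ဂ + လာ because the stacked consonant is never treated as a syllable start.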

2. Rule-Based Validation

Each syllable is checked against five structural rules:
Rule 1: Must start with consonant
"ကာ" → Valid  |  "ာက" → Invalid (starts with vowel)
Rule 2: Medials in correct order (Ya < Ra < Wa < Ha)
"ကြွ" → Valid (ြ before ွ)  |  "ကွြ" → Invalid (wrong order)
Rule 3: No duplicate medials
"ကြ" → Valid  |  "ကြြ" → Invalid (duplicate ြ)
Rule 4: Vowel compatibility
"ကိ" → Valid  |  "ကိီ" → Invalid (both are above vowels)
"ကု" → Valid  |  "ကုူ" → Invalid (both are below vowels)
Rule 5: Finals at end position
"မြန်" → Valid (် at end)  |  Finals (်, း, ံ) must not precede non-finals
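Four of the rules above (1, 3, 4, and 5) can be sketched with plain set membership tests. This is a simplified illustration under stated assumptions, not the logic of `SyllableRuleValidator`: the function name `violated_rules` is hypothetical, only the vowels named in the examples are grouped, dot below (U+1037) is treated as a final, and stacked syllables (kinzi, virama) are out of scope.

```python
CONSONANTS = {chr(cp) for cp in range(0x1000, 0x1022)}    # KA .. A
MEDIALS = {"\u103B", "\u103C", "\u103D", "\u103E"}        # ya, ra, wa, ha
UPPER_VOWELS = {"\u102D", "\u102E"}                       # vowel signs I, II
LOWER_VOWELS = {"\u102F", "\u1030"}                       # vowel signs U, UU
FINALS = {"\u103A", "\u1036", "\u1037", "\u1038"}         # asat, anusvara, dot below, visarga


def violated_rules(syllable: str) -> list:
    """Return a description of each structural rule the syllable breaks."""
    problems = []

    # Rule 1: the first character must be a consonant.
    if not syllable or syllable[0] not in CONSONANTS:
        problems.append("rule 1: must start with a consonant")

    # Rule 3: each medial may appear at most once.
    medials = [ch for ch in syllable if ch in MEDIALS]
    if len(medials) != len(set(medials)):
        problems.append("rule 3: duplicate medial")

    # Rule 4: at most one vowel sign per position group.
    upper = sum(ch in UPPER_VOWELS for ch in syllable)
    lower = sum(ch in LOWER_VOWELS for ch in syllable)
    if upper > 1 or lower > 1:
        problems.append("rule 4: incompatible vowels")

    # Rule 5: once a final appears, only finals may follow.
    seen_final = False
    for ch in syllable:
        if ch in FINALS:
            seen_final = True
        elif seen_final:
            problems.append("rule 5: final before non-final")
            break

    return problems
```

Returning every violation rather than stopping at the first makes it easier to show a user what is wrong with a syllable.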

3. Dictionary Lookup

Valid syllable structures are checked against the syllable dictionary:
# Syllable exists in dictionary
"မြန်" → Valid

# Valid structure but not in dictionary
"ဆြန်" → May be invalid (flagged for review)
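The two outcomes above, plus the structural failure from the previous step, give three possible results per syllable. The sketch below uses a plain Python set as a stand-in for the syllable dictionary and takes the structural check as a parameter. The name `classify_syllable` and the three status strings are hypothetical, chosen only for illustration.

```python
from typing import Callable, Set


def classify_syllable(
    syllable: str,
    known_syllables: Set[str],
    is_well_formed: Callable[[str], bool],
) -> str:
    """Combine the structural check with a dictionary membership test."""
    if not is_well_formed(syllable):
        return "invalid_structure"   # fails the rule-based stage
    if syllable in known_syllables:  # average O(1) set lookup
        return "valid"
    return "unknown"                 # well-formed but not in the dictionary
```

An "unknown" result is the case flagged for review: the syllable is plausible orthographically but unattested.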

Stacked Consonants

Kinzi (special stacking with င):
မင်္ဂလာ = /mingala/  (မ + င် + ္ + ဂ + လ + ာ)
Regular stacking (using virama ္):
သ္တ = /sta/  (သ + ္ + တ)
ဗ္ဗ = /bba/  (ဗ + ္ + ဗ)
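Both forms of stacking are visible at the code point level: kinzi is encoded as NGA + ASAT + VIRAMA (U+1004 U+103A U+1039), while regular stacking places VIRAMA (U+1039) directly between two consonants. The helpers below are a minimal sketch with hypothetical names, not library functions.

```python
import unicodedata

ASAT = "\u103A"
VIRAMA = "\u1039"
KINZI = "\u1004" + ASAT + VIRAMA  # NGA + ASAT + VIRAMA


def has_kinzi(text: str) -> bool:
    """True if the text contains the kinzi sequence."""
    return KINZI in text


def stacked_pairs(text: str) -> list:
    """Return (upper, lower) pairs joined by a virama, excluding kinzi."""
    pairs = []
    for i, ch in enumerate(text):
        if ch != VIRAMA or i == 0 or i + 1 >= len(text):
            continue
        if text[i - 1] == ASAT:
            continue  # a virama after asat belongs to kinzi
        pairs.append((text[i - 1], text[i + 1]))
    return pairs


def codepoints(text: str) -> list:
    """List each character as 'U+XXXX NAME' for inspection."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]
```

Calling `codepoints` on a suspect string is a quick way to see whether a stacked form was typed with the expected sequence.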

Configuration

Enable/Disable Syllable Validation

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Validation level is specified per-check, not in configuration
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)

# Syllable-only validation (fastest)
result = checker.check(text, level=ValidationLevel.SYLLABLE)

# Word validation includes syllable validation
result = checker.check(text, level=ValidationLevel.WORD)

Syllable Rule Configuration

from myspellchecker.core.syllable_rules import SyllableRuleValidator

# Custom rule validator
validator = SyllableRuleValidator(
    max_syllable_length=15,        # Max characters per syllable (default: 15)
    corruption_threshold=3,        # Max consecutive identical chars (default: 3)
    strict=True,                   # Enforce strict Pali/Sanskrit rules (default: True)
    allow_extended_myanmar=False,  # Accept Extended-A/B blocks (default: False)
)

Syllable Error Types

Invalid Structure

Syllable doesn’t follow Myanmar orthographic rules:
result = checker.check("ကက")  # Invalid: double consonant without medial/vowel
# Error: SyllableError with error_type=ErrorType.SYLLABLE

Unknown Syllable

Valid structure but not in dictionary:
result = checker.check("ဆြန်")  # Valid structure, unknown syllable
# Error: SyllableError with suggestions from similar syllables

Medial Confusion

Common error with similar-looking medials:
# ျ (ya-pin) vs ြ (ya-yit) confusion
result = checker.check("ကျြောင်")  # Incompatible medials (both ya-pin and ya-yit)
# Suggestion: "ကြောင်"
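Candidate corrections for this kind of error can be generated by dropping or swapping the two confusable medials. This is only a sketch of candidate generation under that assumption: `medial_confusion_candidates` is a hypothetical helper, and it does not describe how the library ranks or filters its suggestions.

```python
YA_PIN = "\u103B"  # medial ya
YA_YIT = "\u103C"  # medial ra


def medial_confusion_candidates(syllable: str) -> list:
    """Generate alternative spellings by repairing ya-pin / ya-yit confusion."""
    has_pin = YA_PIN in syllable
    has_yit = YA_YIT in syllable
    if has_pin and has_yit:
        # Both present: try keeping only one of them.
        return [syllable.replace(YA_PIN, ""), syllable.replace(YA_YIT, "")]
    if has_pin:
        return [syllable.replace(YA_PIN, YA_YIT)]
    if has_yit:
        return [syllable.replace(YA_YIT, YA_PIN)]
    return []
```

For the example above, the first candidate is ကြောင်. In practice each candidate would still need to pass the structural rules and the dictionary lookup before being offered.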

Performance Characteristics

| Metric | Value |
|---|---|
| Speed | Very Fast |
| Time Complexity | O(n) where n = syllable count |
| Lookup Complexity | O(1) per syllable |
Syllable validation is the fastest layer in the pipeline. Each syllable is validated independently with O(1) dictionary lookups, making it suitable for real-time typing feedback.

API Reference

Using SpellChecker for Syllable Validation

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)

# Validate text at syllable level (default)
result = checker.check("မြန်မာ", level=ValidationLevel.SYLLABLE)

# Check for syllable-level errors
for error in result.errors:
    print(f"Error at position {error.position}: {error.text}")
    print(f"Suggestions: {error.suggestions}")

# Check if text is valid at syllable level
print(f"Has errors: {result.has_errors}")
Note: Direct instantiation of SyllableValidator requires a DI container setup. For most use cases, use SpellChecker.check() instead.

SyllableRuleValidator

from myspellchecker.core.syllable_rules import SyllableRuleValidator

rule_validator = SyllableRuleValidator()

# Check if syllable follows structural rules (returns bool)
is_valid = rule_validator.validate("မြန်")  # True

# Invalid syllable structures
is_valid = rule_validator.validate("ာက")   # False - starts with vowel
is_valid = rule_validator.validate("ကွြ")  # False - wrong medial order
is_valid = rule_validator.validate("ကိီ")  # False - incompatible vowels

Common Patterns

Real-Time Validation

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

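# Create the provider and checker once and reuse them across calls,
# rather than constructing them on every keystroke.
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)

def validate_realtime(text: str) -> dict:
    """Fast validation for typing feedback."""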

    # Use syllable-level validation for fastest response
    result = checker.check(text, level=ValidationLevel.SYLLABLE)

    return {
        "valid": not result.has_errors,
        "errors": [
            {"position": e.position, "text": e.text}
            for e in result.errors
        ]
    }

Syllable-Only Suggestions

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

def get_syllable_suggestions(syllable: str) -> list:
    """Get suggestions for a single syllable."""
    provider = SQLiteProvider(database_path="path/to/dictionary.db")
    checker = SpellChecker(provider=provider)

    # Use syllable-level validation
    result = checker.check(syllable, level=ValidationLevel.SYLLABLE)

    if not result.has_errors:
        return []  # Already valid

    # Return suggestions from first error
    return result.errors[0].suggestions if result.errors else []

Troubleshooting

Issue: Valid syllables marked as errors

Cause: Syllable not in dictionary
Solution: Add to custom dictionary or update database:
myspellchecker build --input additional_syllables.txt --output dictionary.db --incremental

Issue: Slow syllable validation

Cause: Missing Cython extensions
Solution: Rebuild extensions:
python setup.py build_ext --inplace

Issue: Incorrect syllable segmentation

Cause: Complex stacked consonants or rare characters
Solution: Use custom segmenter or report issue:
from myspellchecker.segmenters import DefaultSegmenter

# Use strict Myanmar-only mode (no extended characters)
segmenter = DefaultSegmenter(allow_extended_myanmar=False)
syllables = segmenter.segment_syllables(text)

Next Steps