Word validation is the second layer in mySpellChecker’s validation pipeline. It verifies that valid syllables combine into valid words and suggests corrections using the SymSpell algorithm.

How It Works

Syllable Assembly

After syllable validation, valid syllables are assembled into potential words:
syllables = ["မြန်", "မာ", "နိုင်", "ငံ"]
# Assembled to: ["မြန်မာ", "နိုင်ငံ"]
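The assembly step can be pictured as a greedy longest-match over the syllable list. The sketch below is only an illustration under that assumption: `assemble_words` and `max_syllables` are hypothetical names, and the strategy mySpellChecker actually uses is not described here.

```python
def assemble_words(syllables, dictionary, max_syllables=4):
    """Greedily join syllables into the longest words found in `dictionary`."""
    words = []
    i = 0
    while i < len(syllables):
        longest = min(max_syllables, len(syllables) - i)
        for span in range(longest, 0, -1):
            candidate = "".join(syllables[i:i + span])
            # A lone syllable is kept even when unknown, so that the
            # dictionary lookup step can still flag it afterwards.
            if span == 1 or candidate in dictionary:
                words.append(candidate)
                i += span
                break
    return words


syllables = ["မြန်", "မာ", "နိုင်", "ငံ"]
dictionary = {"မြန်မာ", "နိုင်ငံ"}
print(assemble_words(syllables, dictionary))
```

Greedy matching is simple but can pick a long word that blocks a better split further on, which is one reason a real implementation might score alternative segmentations instead.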

Dictionary Lookup

Assembled words are checked against the word dictionary:
"မြန်မာ" → Valid (in dictionary)
"နိုင်ငံ" → Valid (in dictionary)
"xyz" → Invalid (not in dictionary)

Suggestion Generation

For invalid words, SymSpell generates suggestions in O(1) average time, independent of dictionary size:
"နိူင်ငံ" → Suggestions: ["နိုင်ငံ"] (edit distance 1)

SymSpell Algorithm

mySpellChecker uses the Symmetric Delete (SymSpell) algorithm to generate suggestions quickly:

Traditional Approach (Slow)

For each dictionary word:
    Calculate edit distance to input
    If distance ≤ max_distance:
        Add to suggestions
# Complexity: O(n * m) where n=dictionary size, m=word length
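As a runnable version of the pseudocode above, the following sketch scans every dictionary entry and keeps those within the distance limit. It is not mySpellChecker code: `levenshtein` and `brute_force_suggestions` are illustrative names, and plain Levenshtein distance over code points is assumed as the metric.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def brute_force_suggestions(term, dictionary, max_distance=2):
    """Compare `term` against every dictionary word (cost grows with its size)."""
    scored = ((levenshtein(term, word), word) for word in dictionary)
    return sorted((dist, word) for dist, word in scored if dist <= max_distance)


print(brute_force_suggestions("helo", ["hello", "help", "world"]))
```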

SymSpell Approach (Fast)

Pre-compute all delete variants of dictionary words
Store in hash table

For lookup:
    Generate delete variants of input
    Look up in hash table
    Return matches
# Complexity: O(1) average lookup
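The symmetric delete idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's `SymSpell` class: all function names are hypothetical, there is no prefix optimization or frequency handling, and candidates are verified with ordinary Levenshtein distance.

```python
from collections import defaultdict


def delete_variants(word, max_distance):
    """Return `word` plus every string reachable by up to `max_distance` deletions."""
    variants = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        variants |= frontier
    return variants


def edit_distance(a, b):
    """Levenshtein distance, used to verify candidates from the index."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def build_index(words, max_distance=2):
    """Pre-compute delete variants once and map each one to its source words."""
    index = defaultdict(set)
    for word in words:
        for variant in delete_variants(word, max_distance):
            index[variant].add(word)
    return index


def lookup(term, index, max_distance=2):
    """Hash the delete variants of `term`, then verify the few candidates found."""
    candidates = set()
    for variant in delete_variants(term, max_distance):
        candidates |= index.get(variant, set())
    scored = ((edit_distance(term, word), word) for word in candidates)
    return sorted((dist, word) for dist, word in scored if dist <= max_distance)


index = build_index(["နိုင်ငံ", "မြန်မာ"])
print(lookup("နိူင်ငံ", index))
```

The number of hash probes depends only on the length of the input and the distance limit, not on how many words were indexed, which is where the constant-time behavior comes from. The price is the memory for the pre-computed variants.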

Why It’s Fast

| Operation | Traditional | SymSpell |
|---|---|---|
| Single lookup | O(n × m) | O(1) |
| Scales with dictionary size | Slow (linear) | Very fast (constant) |

Configuration

Enable Word Validation

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Create spell checker
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)

# Word-level validation (which also runs syllable validation) is selected per check
text = "မြန်မာနိုင်ငံ"
result = checker.check(text, level=ValidationLevel.WORD)

Suggestion Settings

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(
    # Maximum suggestions per error
    max_suggestions=10,

    # Maximum edit distance for suggestions
    max_edit_distance=2,

    # Include phonetically similar suggestions
    use_phonetic=True,
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

SymSpell Configuration

from myspellchecker.algorithms.symspell import SymSpell
from myspellchecker.providers import SQLiteProvider

provider = SQLiteProvider("dictionary.db")
symspell = SymSpell(
    provider,
    max_edit_distance=2,  # Max edit distance for suggestions
    prefix_length=10,  # Prefix length for optimization (default: 10)
    count_threshold=1,  # Min frequency threshold
)
symspell.build_index(["syllable", "word"])  # Build the index

Word Error Types

Unknown Word

Word not found in dictionary:
result = checker.check("အသစ်စက်စက်")
# Error: WordError for unrecognized compound

Misspelled Word

Word is close to a valid dictionary entry:
result = checker.check("နိူင်ငံ")  # Typo
# Error: WordError with suggestion "နိုင်ငံ"

Compound Error

Several syllable errors that together form an invalid word:
result = checker.check("မယ်နမာ")  # Multiple errors
# Error: WordError with suggestions based on similar compounds

Morphological Synthesis

Before reporting an error, word validation checks whether an out-of-vocabulary (OOV) word is a productive formation built from known dictionary words. This suppresses false positives on valid compounds and reduplications.

Reduplication Validation

Myanmar creates valid words through reduplication (repeating syllables for emphasis):
# These OOV words are accepted as valid reduplications:
"ကောင်းကောင်း"  # AA: ကောင်း + ကောင်း ("well", from "good")
"သေသေချာချာ"    # AABB: သေ + သေ + ချာ + ချာ ("carefully")
Supported patterns: AA, AABB, ABAB, RHYME (known pairs). Safeguards: the base must be in the dictionary with a frequency >= 5, and its part of speech (POS) must be V, ADJ, ADV, or N.
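A rough sketch of how the AA, AABB, and ABAB checks could work on a syllable list follows. The function names and the `frequencies` mapping are hypothetical, the frequencies are made-up values, and the RHYME pattern and POS safeguard are left out for brevity.

```python
def reduplication_pattern(syllables):
    """Return "AA", "AABB", "ABAB", or None for a list of syllables."""
    if len(syllables) == 2 and syllables[0] == syllables[1]:
        return "AA"
    if len(syllables) == 4:
        a, b, c, d = syllables
        if a == b and c == d and a != c:
            return "AABB"
        if a == c and b == d and a != b:
            return "ABAB"
    return None


def is_valid_reduplication(syllables, frequencies, min_base_frequency=5):
    """Accept a reduplication only if its base word is frequent enough."""
    pattern = reduplication_pattern(syllables)
    if pattern is None:
        return False
    if pattern == "AA":
        base = syllables[0]
    elif pattern == "AABB":
        base = syllables[0] + syllables[2]  # AABB is built from the base AB
    else:
        base = syllables[0] + syllables[1]  # ABAB repeats the base AB
    return frequencies.get(base, 0) >= min_base_frequency


# Hypothetical corpus frequencies for the two base words
frequencies = {"ကောင်း": 1200, "သေချာ": 800}
print(is_valid_reduplication(["ကောင်း", "ကောင်း"], frequencies))
print(is_valid_reduplication(["သေ", "သေ", "ချာ", "ချာ"], frequencies))
```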

Compound Word Synthesis

Myanmar forms compounds by joining morphemes:
# These OOV words are accepted as valid compounds:
"ကျောင်းသား"    # N+N: ကျောင်း (school) + သား (child) = "student"
"စားသောက်"       # V+V: စား (eat) + သောက် (drink) = "eating and drinking"
Compound synthesis uses dynamic programming to find the optimal split. Allowed POS patterns: N+N, V+V, N+V, V+N, ADJ+N. Blocked patterns: P+P, P+N, N+P.
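The dynamic-programming split can be illustrated with a standard word-break sketch. Several things here are assumptions rather than documented behavior: "optimal" is taken to mean the fewest morphemes, POS patterns are checked on adjacent pairs, and `split_compound`, `is_valid_compound`, and the `morphemes` mapping are hypothetical names.

```python
ALLOWED_PATTERNS = {("N", "N"), ("V", "V"), ("N", "V"), ("V", "N"), ("ADJ", "N")}


def split_compound(word, morphemes, max_parts=4):
    """Split `word` into the fewest known morphemes, or return None."""
    # best[i] holds the shortest morpheme list that covers word[:i]
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is None or piece not in morphemes:
                continue
            candidate = best[start] + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    parts = best[-1]
    if parts is None or not 2 <= len(parts) <= max_parts:
        return None
    return parts


def is_valid_compound(word, morphemes, max_parts=4):
    """Accept the split only if every adjacent POS pair is an allowed pattern."""
    parts = split_compound(word, morphemes, max_parts)
    if parts is None:
        return False
    tags = [morphemes[part] for part in parts]
    return all(pair in ALLOWED_PATTERNS for pair in zip(tags, tags[1:]))


# Hypothetical morpheme -> POS table
morphemes = {"ကျောင်း": "N", "သား": "N", "စား": "V", "သောက်": "V"}
print(split_compound("ကျောင်းသား", morphemes))
print(is_valid_compound("စားသောက်", morphemes))
```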

Morpheme-Level Suggestions

When a compound word has a typo in one morpheme, the suggestion engine corrects that specific morpheme instead of suggesting unrelated words:
# Input: "ကျောင်းသာ" (typo: သာ should be သား)
# Morpheme strategy detects: ကျောင်း is valid, သာ is not
# Corrects: သာ → သား via SymSpell
# Suggests: "ကျောင်းသား"
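The morpheme strategy above can be sketched as "keep the valid morphemes, repair the invalid one, then re-join". In this illustration `difflib.get_close_matches` from the standard library stands in for the SymSpell lookup, and `correct_compound` and `vocabulary` are hypothetical names, so treat it as a sketch of the idea rather than the engine's implementation.

```python
from difflib import get_close_matches


def correct_compound(morphemes, vocabulary):
    """Repair only the unknown morphemes and re-join the compound."""
    known = sorted(vocabulary)  # sorted for deterministic tie-breaking
    corrected = []
    for morpheme in morphemes:
        if morpheme in vocabulary:
            corrected.append(morpheme)
            continue
        # Stand-in for a SymSpell lookup restricted to single morphemes
        matches = get_close_matches(morpheme, known, n=1, cutoff=0.6)
        if not matches:
            return None  # no plausible repair for this morpheme
        corrected.append(matches[0])
    return "".join(corrected)


vocabulary = {"ကျောင်း", "သား", "စား"}
print(correct_compound(["ကျောင်း", "သာ"], vocabulary))
```

Correcting one morpheme at a time keeps the search space small, because each lookup only has to consider short dictionary entries instead of whole compounds.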

Configuration

from myspellchecker.core.config import SpellCheckerConfig, ValidationConfig

config = SpellCheckerConfig(
    validation=ValidationConfig(
        use_reduplication_validation=True,       # Default: True
        reduplication_min_base_frequency=5,      # Min base frequency
        use_compound_synthesis=True,             # Default: True
        compound_min_morpheme_frequency=10,      # Min morpheme frequency
        compound_max_parts=4,                    # Max compound parts
    )
)

Suggestion Ranking

Suggestions are ranked by multiple factors:
score = (
    frequency_score * 0.4 +      # Word frequency in corpus
    edit_distance_score * 0.3 +  # Edit distance (lower is better)
    phonetic_score * 0.2 +       # Phonetic similarity
    context_score * 0.1          # Context fit (if enabled)
)
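To make the weighted sum concrete, the sketch below computes it for a candidate. Only the four weights come from the formula above. How each component is normalized to the 0 to 1 range is an assumption made for illustration (log-scaled frequency, linearly decreasing distance score), and `score`, `max_frequency`, and the other parameter names are hypothetical.

```python
import math


def score(frequency, edit_distance, phonetic=0.0, context=0.0,
          max_frequency=100_000, max_edit_distance=2):
    """Combine the four ranking factors into one score in [0, 1]."""
    # Assumed normalization: log-scaled frequency, capped at 1.0
    frequency_score = min(math.log1p(frequency) / math.log1p(max_frequency), 1.0)
    # Assumed normalization: distance 0 scores 1.0, larger distances score less
    edit_distance_score = 1.0 - edit_distance / (max_edit_distance + 1)
    return (
        frequency_score * 0.4
        + edit_distance_score * 0.3
        + phonetic * 0.2
        + context * 0.1
    )


# Same edit distance, different frequency
print(score(frequency=50_000, edit_distance=1) > score(frequency=100, edit_distance=1))
# Same frequency, different edit distance
print(score(frequency=100, edit_distance=1) > score(frequency=100, edit_distance=2))
```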

Frequency-Based Ranking

# Higher frequency words rank higher
"နိုင်ငံ" (freq: 50000) → Higher rank
"နှိုင်ငံ" (freq: 100) → Lower rank

Edit Distance Ranking

# Lower edit distance ranks higher
"နိုင်ငံ" (distance: 1) → Higher rank
"နိမ်ငံ" (distance: 2) → Lower rank

Performance Characteristics

| Metric | Value |
|---|---|
| Speed | Fast |
| Lookup complexity | O(1) average |
| Suggestion generation | O(k), where k = number of candidates |
Word validation is fast thanks to SymSpell’s pre-computed delete index. Memory usage scales with dictionary size.

API Reference

Using SpellChecker for Word Validation

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)

# Validate text at word level
result = checker.check("မြန်မာနိုင်ငံ", level=ValidationLevel.WORD)

# Check for word-level errors
for error in result.errors:
    print(f"Error: {error.text}")
    print(f"Suggestions: {error.suggestions}")
    print(f"Confidence: {error.confidence}")

# Check if text is valid
print(f"Has errors: {result.has_errors}")
Note: Instantiating WordValidator directly requires a dependency-injection (DI) container setup. For most use cases, use SpellChecker.check() with level=ValidationLevel.WORD.

SymSpell Interface

from myspellchecker.algorithms.symspell import SymSpell
from myspellchecker.providers import SQLiteProvider

# Initialize with provider
provider = SQLiteProvider("dictionary.db")
symspell = SymSpell(provider, max_edit_distance=2)

# Build index before lookups
symspell.build_index(["word"])

# Lookup word
suggestions = symspell.lookup("နိူင်ငံ", level="word", max_suggestions=5)
for suggestion in suggestions:
    print(f"{suggestion.term}: distance={suggestion.edit_distance}, freq={suggestion.frequency}")

# Compound lookup (for multi-word)
results = symspell.lookup_compound("နိူင် ငံ", max_edit_distance=2)

Common Patterns

Custom Word List

from myspellchecker import SpellChecker
from myspellchecker.providers import MemoryProvider

# Use MemoryProvider to add custom words
provider = MemoryProvider()

# Add domain-specific words
custom_words = ["အိုင်တီ", "ဆော့ဖ်ဝဲ", "ဒေတာဘေ့စ်"]
for word in custom_words:
    provider.add_word(word, frequency=1000)

# Create checker with custom provider
checker = SpellChecker(provider=provider)

Ignore Unknown Words

def check_with_ignore_list(text: str, ignore_words: set[str]) -> list:
    """Check text and drop errors whose text is in ignore_words."""
    # Assumes `checker` is a SpellChecker created as in the earlier examples
    result = checker.check(text)

    return [
        error for error in result.errors
        if error.text not in ignore_words
    ]

# Usage
ignore = {"အိုင်တီ", "API", "HTTP"}
errors = check_with_ignore_list("API ကို သုံး", ignore)

Get Top Suggestions Only

def get_best_suggestion(word: str) -> str | None:
    """Get the single best suggestion for a word."""
    result = checker.check(word)

    if result.has_errors and result.errors[0].suggestions:
        return result.errors[0].suggestions[0]
    return None

Troubleshooting

Issue: Valid words marked as errors

Cause: Word not in dictionary.
Solution: Add it to the dictionary:
myspellchecker build --input new_words.txt --output dictionary.db --incremental

Issue: Poor suggestions

Cause: Low corpus frequency or missing similar words.
Solution: Improve corpus quality or adjust the suggestion settings:
from myspellchecker.core.config import SpellCheckerConfig

config = SpellCheckerConfig(
    max_edit_distance=3,  # Allow more distance
    use_phonetic=True,  # Enable phonetic matching
)

Issue: Slow suggestion generation

Cause: Large edit distance or dictionary.
Solution: Reduce max_edit_distance:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(max_edit_distance=1)  # Faster
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
