The Problem: Traditional Edit Distance
Traditional spell checking calculates the Levenshtein edit distance between an input word and every word in the dictionary:The Solution: Symmetric Delete
SymSpell’s key insight: Instead of comparing words directly, pre-compute all possible deletions.Why “Symmetric”?
If we delete characters from both the dictionary word AND the misspelled word, they’ll meet in the middle:How It Works
Indexing (Build Time)
For each word in the dictionary, generate all possible deletions up to Store in hash map:
max_edit_distance (typically 2):Visual Example: Syllable Correction
Visual Example: Word Correction
Myanmar-Specific Considerations
Character Clusters
Myanmar characters often form clusters (consonant + medials + vowels). SymSpell treats each Unicode code point as a unit:Common Myanmar Typos SymSpell Catches
| Error Type | Misspelled | Correct | Edit Distance |
|---|---|---|---|
| Missing medial | ”ကောင်" | "ကြောင်” | 1 |
| Wrong medial | ”ကျောင်" | "ကြောင်” | 2 (substitute) |
| Missing asat | ”မြန်မာ" | "မြန်မာ” | 1 |
| Missing tone | ”ကြီ" | "ကြီး” | 1 |
| Extra medial | ”ကျောင့်" | "ကျောင်း” | 1 |
Syllable vs Word Level
mySpellChecker applies SymSpell at two levels: 1. Syllable Level (faster, catches 90% of errors):Performance Characteristics
Time Complexity
| Operation | Traditional | SymSpell |
|---|---|---|
| Single lookup | O(N × M) | O(1) average |
| Build index | - | O(N × M × D!) |
- N = dictionary size
- M = average word length
- D = max edit distance
Space Complexity
SymSpell trades memory for speed:Benchmark: Myanmar Dictionary
| Dictionary Size | Index Build Time | Lookup Time | Memory |
|---|---|---|---|
| 10,000 words | 0.5s | <1ms | 50MB |
| 50,000 words | 2s | <1ms | 150MB |
| 100,000 words | 5s | <1ms | 300MB |
Configuration
SpellCheckerConfig Options
Edit Distance Guidelines
| Distance | Coverage | Speed | Use Case |
|---|---|---|---|
| 1 | ~85% | Fastest | Real-time typing |
| 2 | ~99% | Fast | Default, recommended |
| 3 | ~99.9% | Slower | Batch processing |
Prefix Length
Theprefix_length parameter optimizes memory by only indexing the first N characters:
Implementation Details
Index Structure
Lookup Algorithm
Comparison with Other Algorithms
| Algorithm | Lookup Speed | Index Size | Accuracy |
|---|---|---|---|
| Brute Force | O(N×M) | Small | 100% |
| BK-Tree | O(log N) | Medium | 100% |
| SymSpell | O(1) | Large | 100% |
| Norvig’s | O(26^d × M) | Small | 100% |
See Also
- Edit Distance - Levenshtein distance calculation
- Syllable Validation - Syllable-level checking
- Word Validation - Word-level checking
- Performance Tuning - Optimization strategies