Overview
Myanmar homophones often arise from:

| Confusion Type | Example | Description |
|---|---|---|
| Medials | ျ vs ြ | Ya-pin vs Ya-yit |
| Finals | န် vs ံ vs မ် | Na-that vs Thay-thay-tin vs Ma-that |
| Vowels | α vs ααΊ | Similar sounds in context |
| Tone marks | ကား vs ကာ | Different meanings |
HomophoneChecker
Basic Usage
Common Homophone Pairs
| Word 1 | Word 2 | Meanings |
|---|---|---|
| ကား | ကာ | car vs shield/screen |
| သာ | သား | merely/pleasant vs son |
| ကျောင်း | ကြောင်း | school vs reason |
| မြန် | မျန် | fast vs (incorrect) |
| ငါ | ငါး | I/me vs five/fish |
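
The pairs in this table can be held in a symmetric lookup structure so that either spelling leads to the other. The following is a minimal sketch using a plain dictionary; `build_homophone_map` and `PAIRS` are illustrative names and are not part of the library's HomophoneChecker API.

```python
from collections import defaultdict

# Pairs copied from the table of common homophone pairs.
PAIRS = [
    ("ကား", "ကာ"),
    ("သာ", "သား"),
    ("ကျောင်း", "ကြောင်း"),
    ("မြန်", "မျန်"),
    ("ငါ", "ငါး"),
]


def build_homophone_map(pairs):
    """Return a symmetric word -> set-of-homophones mapping (hypothetical helper)."""
    mapping = defaultdict(set)
    for first, second in pairs:
        # Register the relation in both directions.
        mapping[first].add(second)
        mapping[second].add(first)
    return dict(mapping)


HOMOPHONES = build_homophone_map(PAIRS)
```

Storing both directions keeps each lookup a single dictionary access, at the cost of holding every pair twice.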
Custom Homophone Map
Load from Config
Homophone Validation Strategy
The HomophoneValidationStrategy uses surrounding context to detect homophone errors.
Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| homophone_checker | Required | HomophoneChecker instance for lookups |
| provider | Required | NgramRepository for n-gram probabilities |
| confidence | 0.8 | Confidence score for homophone errors |
| improvement_ratio | 5.0 | Minimum probability improvement ratio (5x) |
| min_probability | 0.001 | Minimum probability threshold |
| high_freq_threshold | 1000 | Word frequency above which stricter ratio applies |
| high_freq_improvement_ratio | 50.0 | Improvement ratio for high-frequency words (50x) |
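
To make the two ratio parameters concrete, here is a minimal sketch of how they might interact. It assumes that a word whose frequency exceeds high_freq_threshold must be beaten by the stricter ratio; the function names are hypothetical and the library's actual rule may differ.

```python
def required_ratio(word_frequency,
                   improvement_ratio=5.0,
                   high_freq_threshold=1000,
                   high_freq_improvement_ratio=50.0):
    """Pick the improvement ratio a candidate must reach (assumed rule)."""
    if word_frequency > high_freq_threshold:
        # Frequent words are usually spelled correctly, so demand more evidence.
        return high_freq_improvement_ratio
    return improvement_ratio


def is_clear_improvement(current_prob, candidate_prob, word_frequency):
    """True when the candidate is sufficiently more probable than the current word."""
    return candidate_prob >= current_prob * required_ratio(word_frequency)
```

With the defaults, a candidate six times more probable than the current word would be accepted for a rare word but rejected for a word seen more than 1000 times.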
How It Works
- For each word, check if it has homophones
- Analyze surrounding context (N-gram probabilities)
- If a homophone has higher probability in context, flag as error
- Suggest the contextually appropriate homophone
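
The steps above can be sketched with a toy bigram table. Everything here is illustrative: the probabilities are invented, `context_score` and `find_homophone_errors` are hypothetical names, and the real strategy obtains probabilities from an NgramRepository and applies the thresholds from the configuration table rather than a bare comparison.

```python
# Toy data, invented for illustration only.
HOMOPHONES = {"ကြောင်း": ["ကျောင်း"], "ကျောင်း": ["ကြောင်း"]}
BIGRAM_PROB = {
    ("သူ", "ကျောင်း"): 0.02,
    ("ကျောင်း", "သွား"): 0.03,
    ("သူ", "ကြောင်း"): 0.001,
    ("ကြောင်း", "သွား"): 0.0005,
}


def context_score(tokens, index, word):
    """Multiply the bigram probabilities linking `word` to its neighbours."""
    score = 1.0
    if index > 0:
        score *= BIGRAM_PROB.get((tokens[index - 1], word), 0.0)
    if index + 1 < len(tokens):
        score *= BIGRAM_PROB.get((word, tokens[index + 1]), 0.0)
    return score


def find_homophone_errors(tokens):
    """Return (index, word, suggestion) for words outscored by a homophone."""
    errors = []
    for i, word in enumerate(tokens):
        candidates = HOMOPHONES.get(word, [])   # does the word have homophones?
        if not candidates:
            continue
        current = context_score(tokens, i, word)  # score the word as written
        best = max(candidates, key=lambda c: context_score(tokens, i, c))
        if context_score(tokens, i, best) > current:
            errors.append((i, word, best))      # flag and suggest the better fit
    return errors
```

Calling `find_homophone_errors(["သူ", "ကြောင်း", "သွား"])` flags the middle word and suggests ကျောင်း, because the toy table rates "school" far higher than "reason" between those neighbours.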
Minimum Probability Threshold
The min_probability parameter prevents false positives caused by n-grams that occur too rarely to yield reliable probabilities.
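One plausible reading of this threshold, sketched below, is that a candidate homophone is ignored unless its contextual probability reaches min_probability, however large its relative improvement. This is an assumption about the behaviour, and `should_flag` is a hypothetical helper rather than a library function.

```python
def should_flag(current_prob, candidate_prob,
                improvement_ratio=5.0, min_probability=0.001):
    """Flag only when the candidate is both likely enough and clearly better."""
    if candidate_prob < min_probability:
        # Too rare to trust, even if it beats the current word by a wide margin.
        return False
    return candidate_prob >= current_prob * improvement_ratio
```

Without the floor, a word whose n-gram was never observed (probability 0.0) would lose to any candidate with a non-zero probability, which is exactly the kind of noise the threshold is meant to suppress.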
Example Detection
Integration with SpellChecker
Homophone checking is enabled automatically when context validation is on.
Homophones YAML Configuration
Homophones are defined in rules/homophones.yaml.
Structure
| Field | Description |
|---|---|
| homophones | Map of word → list of homophones |
| version | Schema version |
| metadata | Entry count, dates, source notes |
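
As a rough illustration of this structure, the sketch below checks a mapping of the shape a YAML parser would return for such a file. It assumes all three top-level fields are expected; `validate_homophone_config`, the `entry_count` key, and the version value are placeholders, not the library's schema.

```python
def validate_homophone_config(config):
    """Return a list of problems found in a parsed homophones mapping."""
    problems = []
    for key in ("version", "metadata", "homophones"):
        if key not in config:
            problems.append(f"missing top-level field: {key}")
    entries = config.get("homophones", {})
    if not isinstance(entries, dict):
        return problems + ["homophones must be a mapping"]
    for word, alternatives in entries.items():
        # Each word should map to a list of homophone strings.
        if not isinstance(alternatives, list) or not all(
            isinstance(alt, str) for alt in alternatives
        ):
            problems.append(f"{word}: expected a list of strings")
    return problems


# Hypothetical parsed content; field values are placeholders.
example = {
    "version": "1.0",
    "metadata": {"entry_count": 2},
    "homophones": {"ကား": ["ကာ"], "ကာ": ["ကား"]},
}
```

A check like this is most useful after adding domain-specific pairs by hand, since a scalar where a list is expected is an easy mistake to make in YAML.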
Best Practices
- Enable with context: Homophones need context for accurate detection
- Review suggestions: Homophone errors carry a default confidence of 0.8, so treat them as candidates rather than certain corrections
- Add domain-specific pairs: Extend homophones.yaml for your domain
- Use with N-grams: N-gram probabilities improve accuracy
Performance
- Homophone lookup: O(1) hash table
- Context analysis: Depends on N-gram checker
- Memory: Minimal (homophone map is small)
See Also
- Context Checking - N-gram context validation
- Validation Strategies - Strategy pipeline
- Text Utilities - Phonetic hashing
- Rules System - YAML configuration