Overview
Named entities such as personal names and place names often appear as “unknown words” to spell checkers. The NER module identifies these entities so that the spell checker does not flag them as errors.
Entity Types Supported:
- PER - Personal names (e.g., ကိုအောင်)
- LOC - Locations (e.g., ရန်ကုန်မြို့)
- ORG - Organizations (e.g., မြန်မာ့လေကြောင်း)
- DATE - Date expressions
- NUM - Numbers and numeric expressions
- TIME - Time expressions
- MISC - Miscellaneous named entities
NER Implementations
mySpellChecker provides three NER implementations with different accuracy/speed trade-offs:

| Implementation | Accuracy | Speed | Dependencies |
|---|---|---|---|
| HeuristicNER | ~70% | Fast | None |
| TransformerNER | ~93% | Slow | transformers, torch |
| HybridNER | ~93% | Adaptive | transformers (optional) |

HeuristicNER
Fast, rule-based NER using patterns and whitelists. Ideal for real-time applications.
Features:
- Honorific-based name detection (ဦး, ဒေါ်, ကို, မ)
- Location suffix detection (မြို့, ရွာ, ပြည်နယ်)
- Organization pattern matching (ကုမ္ပဏီ, ဘဏ်, တက္ကသိုလ်)
- Whitelist support for known entities
- No external dependencies
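To make the rule order concrete, here is a minimal, self-contained sketch of honorific, suffix, and whitelist matching. It is not the HeuristicNER implementation: `classify_token`, the dictionary-style whitelist, the check order, and the consonant guard are all assumptions made for illustration, and the sketch expects input that is already segmented into tokens.

```python
from typing import Dict, Optional

# Markers copied from the feature list above; a real rule set would be larger.
HONORIFICS = ("ဦး", "ဒေါ်", "ကို", "မ")
LOCATION_SUFFIXES = ("မြို့", "ရွာ", "ပြည်နယ်")
ORGANIZATION_MARKERS = ("ကုမ္ပဏီ", "ဘဏ်", "တက္ကသိုလ်")


def _starts_with_consonant(text: str) -> bool:
    """True if text begins with a Myanmar consonant letter (U+1000..U+1021).

    Rough guard so the one-letter honorific "မ" is not matched inside a
    syllable such as "မြ", where the next character is a medial sign.
    """
    return bool(text) and 0x1000 <= ord(text[0]) <= 0x1021


def classify_token(
    token: str, whitelist: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Return an entity label for a pre-segmented token, or None.

    Hypothetical helper: the check order below is an assumption.
    """
    # 1. Whitelist: known entities win over every pattern.
    if whitelist and token in whitelist:
        return whitelist[token]
    # 2. Location suffix; the token must be longer than the suffix itself.
    if any(token.endswith(s) and len(token) > len(s) for s in LOCATION_SUFFIXES):
        return "LOC"
    # 3. Organization marker anywhere in the token.
    if any(marker in token for marker in ORGANIZATION_MARKERS):
        return "ORG"
    # 4. Honorific prefix followed by the start of a new syllable.
    for honorific in HONORIFICS:
        if token.startswith(honorific) and _starts_with_consonant(
            token[len(honorific):]
        ):
            return "PER"
    return None


if __name__ == "__main__":
    for example in ("ကိုအောင်", "ရန်ကုန်မြို့", "မြန်မာ့လေကြောင်း"):
        print(example, classify_token(example))
```

The third example carries none of the listed markers, so pattern rules alone cannot label it. Passing it in the whitelist, for example `classify_token(name, {name: "ORG"})`, is the kind of case where whitelist support matters.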
TransformerNER
High-accuracy NER using HuggingFace transformer models.
Features:
- State-of-the-art accuracy (~93%)
- BIO tagging for multi-word entities
- Confidence scores for each prediction
- Batch processing support
- LRU result caching for performance
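BIO tagging marks the first token of an entity with `B-<label>` and each following token with `I-<label>`, while `O` marks tokens outside any entity. The sketch below shows one common way to collapse such tags into multi-word entities. `merge_bio_tags` is a hypothetical helper, not part of TransformerNER, and treating a stray `I-` tag as the start of a new entity is an assumption.

```python
from typing import List, Optional, Tuple


def merge_bio_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (entity_text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current and label is not None:
            # Myanmar text has no spaces between words, so join directly.
            entities.append(("".join(current), label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            # Continuation of the entity that is currently open.
            current.append(token)
            continue
        flush()
        if tag.startswith(("B-", "I-")):
            # "B-" opens a new entity; a stray "I-" is treated the same way.
            current, label = [token], tag[2:]
        else:
            # "O" or any unknown tag closes the current entity.
            current, label = [], None
    flush()
    return entities
```

For example, tokens `["ဦး", "အောင်", "သည်"]` with tags `["B-PER", "I-PER", "O"]` yield a single PER entity covering the first two tokens.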
HybridNER
Combines transformer and heuristic approaches. Uses the transformer as primary, with automatic fallback to heuristics.
Features:
- Best of both approaches
- Graceful degradation if transformer unavailable
- Automatic fallback on transformer errors
- Configurable fallback behavior
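The primary-with-fallback pattern described above can be sketched in a few lines. This is an illustration of the idea only, not HybridNER's code: `make_hybrid_tagger`, the `Tagger` type, and the `fallback_on_error` flag are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Entity = Tuple[str, str]              # (text, label)
Tagger = Callable[[str], List[Entity]]


def make_hybrid_tagger(
    primary: Optional[Tagger],
    fallback: Tagger,
    fallback_on_error: bool = True,
) -> Tagger:
    """Build a tagger that prefers `primary` and degrades to `fallback`."""

    def tag(text: str) -> List[Entity]:
        # Graceful degradation: no primary model was loaded at all.
        if primary is None:
            return fallback(text)
        try:
            return primary(text)
        except Exception:
            # Automatic fallback on primary errors, unless disabled.
            if not fallback_on_error:
                raise
            return fallback(text)

    return tag
```

Passing `primary=None` models the case where the optional transformer dependencies are not installed, and `fallback_on_error=False` lets callers surface model errors instead of silently returning the lower-accuracy result.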
Integration with SpellChecker
NER is fully integrated into the SpellChecker pipeline. When enabled, the NER model:
- Provides name masks to the ContextValidator (for strategies to skip named entities)
- Filters errors after validation: any error that overlaps a detected entity is removed
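The post-validation filter comes down to an interval-overlap test. The sketch below shows that test on plain character spans. `Span`, `overlaps`, and `drop_errors_inside_entities` are hypothetical names, and half-open `[start, end)` offsets are an assumption; the library's own error and entity objects may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) within the checked text."""
    start: int
    end: int


def overlaps(a: Span, b: Span) -> bool:
    # Two half-open ranges overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def drop_errors_inside_entities(errors: List[Span], entities: List[Span]) -> List[Span]:
    """Keep only the errors that do not touch any detected entity."""
    return [
        error
        for error in errors
        if not any(overlaps(error, entity) for entity in entities)
    ]
```

With an entity at `Span(0, 8)`, an error at `Span(3, 8)` is removed, while an error at `Span(8, 12)` is kept because adjacent half-open ranges do not overlap.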
Basic Usage (Heuristic NER)
With Transformer NER
For highest accuracy, configure NERConfig with a transformer model:
CLI Usage
Disabling NER
NERConfig Options

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | True | Enable/disable NER |
| model_type | str | "heuristic" | "heuristic" or "transformer" |
| model_name | str | "chuuhtetnaing/myanmar-ner-model" | HuggingFace model name |
| device | int | -1 | Device index (-1=CPU, 0+=GPU) |
| confidence_threshold | float | 0.5 | Minimum confidence to accept |
| heuristic_confidence | float | 0.7 | Confidence for heuristic results |
| batch_size | int | 32 | Batch size for transformer |
| cache_size | int | 1000 | LRU cache size |
| fallback_to_heuristic | bool | True | Use heuristics if transformer fails |

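As a reading aid, the options and defaults in the table can be mirrored in a small dataclass. `NERSettingsSketch` is a hypothetical stand-in, not the library's `NERConfig`; the validation rules, including the assumption that confidence values lie in [0, 1], are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class NERSettingsSketch:
    """Hypothetical mirror of the option table; not the real NERConfig."""
    enabled: bool = True
    model_type: str = "heuristic"          # "heuristic" or "transformer"
    model_name: str = "chuuhtetnaing/myanmar-ner-model"
    device: int = -1                       # -1 = CPU, 0 and above = GPU index
    confidence_threshold: float = 0.5
    heuristic_confidence: float = 0.7
    batch_size: int = 32
    cache_size: int = 1000
    fallback_to_heuristic: bool = True

    def __post_init__(self) -> None:
        # Assumed sanity checks; the library may validate differently.
        if self.model_type not in ("heuristic", "transformer"):
            raise ValueError(f"unknown model_type: {self.model_type!r}")
        for name in ("confidence_threshold", "heuristic_confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        if self.device < -1:
            raise ValueError("device must be -1 (CPU) or a GPU index >= 0")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.cache_size < 0:
            raise ValueError("cache_size must not be negative")
```

Read against the defaults, heuristic results (confidence 0.7) pass the 0.5 acceptance threshold, so raising `confidence_threshold` above 0.7 would presumably reject them.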
Entity Data Structure
The Entity dataclass represents detected entities:
Advanced Usage
Batch Processing
Process multiple texts efficiently:
Custom Whitelist
Add known names to reduce false negatives:
Filter Errors Using NER
Manually filter spell checking errors:
Mock NER for Testing
Use MockTransformerNER for unit tests:
Performance Tips
- Real-time typing: Use HeuristicNER for fastest response
- Document checking: Use HybridNER for balance
- Batch processing: Use TransformerNER with batching
- High throughput: Enable result caching
See Also
- Syllable Validation - Core validation layer
- Word Validation - Dictionary-based validation
- Grammar Checking - Syntactic validation