Each DictionaryProvider implementation supports a different subset of features, depending on its underlying storage mechanism. Use the matrix below to choose the right provider for your use case.
Overview
The DictionaryProvider abstract base class defines the interface for dictionary data storage and retrieval. Different implementations offer different capabilities based on their underlying storage mechanism.
Capability Matrix
| Method | SQLiteProvider | MemoryProvider | JSONProvider | CSVProvider |
|---|---|---|---|---|
| Core Validation | | | | |
| is_valid_syllable() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| is_valid_word() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| has_syllable() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| has_word() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Frequency Data | | | | |
| get_syllable_frequency() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| get_word_frequency() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Part-of-Speech | | | | |
| get_word_pos() | ✅ Full | ✅ Full | ⚠️ Optional | ⚠️ Optional |
| get_pos_unigram_probabilities() | ✅ Full | ✅ Full | ❌ Not Supported | ❌ Not Supported |
| get_pos_bigram_probabilities() | ✅ Full | ✅ Full | ❌ Not Supported | ❌ Not Supported |
| get_pos_trigram_probabilities() | ✅ Full | ✅ Full | ❌ Not Supported | ❌ Not Supported |
| N-gram Context | | | | |
| get_bigram_probability() | ✅ Full | ✅ Full | ❌ Not Supported | ❌ Not Supported |
| get_trigram_probability() | ✅ Full | ✅ Full | ❌ Not Supported | ❌ Not Supported |
| get_top_continuations() | ✅ Full | ✅ Full | ❌ Not Supported | ❌ Not Supported |
| Morphology | | | | |
| get_word_syllable_count() | ❌ Not Supported | ❌ Not Supported | ✅ If provided | ✅ If provided |
| Iteration | | | | |
| get_all_syllables() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| get_all_words() | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Bulk Operations | | | | |
| is_valid_syllables_bulk() | ✅ Optimized | ✅ Optimized | Default | Default |
| is_valid_words_bulk() | ✅ Optimized | ✅ Optimized | Default | Default |
| get_syllable_frequencies_bulk() | ✅ Optimized | ✅ Optimized | Default | Default |
| get_word_frequencies_bulk() | ✅ Optimized | ✅ Optimized | Default | Default |
| get_word_pos_bulk() | ✅ Optimized | ✅ Optimized | Default | Default |
Legend
- ✅ Full: Fully implemented with optimized performance
- ⚠️ Optional: Supported if data is provided during initialization
- ❌ Not Supported: Returns a safe default value (0.0, None, or an empty collection)
- Default: Uses default implementation (iterates over individual calls)
- Optimized: Uses batch queries for better performance
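To make the Default and Optimized entries concrete, the sketch below contrasts the two strategies for a bulk validity check against a SQLite table. The table name `words`, its schema, and the helper function names are assumptions made for illustration; they are not taken from the actual SQLiteProvider implementation.

```python
import sqlite3
from typing import Dict, Iterable


def is_valid_word(conn: sqlite3.Connection, word: str) -> bool:
    """Single lookup: one query per word."""
    row = conn.execute("SELECT 1 FROM words WHERE word = ?", (word,)).fetchone()
    return row is not None


def is_valid_words_bulk_default(
    conn: sqlite3.Connection, words: Iterable[str]
) -> Dict[str, bool]:
    """'Default' strategy: loop over the single-item method."""
    return {w: is_valid_word(conn, w) for w in words}


def is_valid_words_bulk_optimized(
    conn: sqlite3.Connection, words: Iterable[str]
) -> Dict[str, bool]:
    """'Optimized' strategy: one batched IN (...) query for all words."""
    unique = list(dict.fromkeys(words))  # de-duplicate while keeping order
    if not unique:
        return {}
    # SQLite caps the number of bound parameters per statement, so a
    # real implementation would split very large batches into chunks.
    placeholders = ",".join("?" * len(unique))
    rows = conn.execute(
        f"SELECT word FROM words WHERE word IN ({placeholders})", unique
    ).fetchall()
    found = {row[0] for row in rows}
    return {w: w in found for w in unique}


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory dictionary (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO words VALUES (?)", [("alpha",), ("beta",)])
    return conn
```

Both functions return the same mapping; the difference is the number of round trips to storage, which matters most when checking many words at once.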
Provider Selection Guide
SQLiteProvider (Recommended for Production)
Best for:
- Production deployments
- Large dictionaries (100K+ entries)
- Concurrent access from multiple threads
- Full spell checking with context validation
Features:
- Disk-based storage with memory-mapped I/O
- Connection pooling for thread safety
- Optimized batch queries
- Full N-gram and POS support
MemoryProvider (Best for Testing/Development)
Best for:
- Unit testing with controlled data
- Development and debugging
- Small dictionaries
- Maximum performance (no I/O)
Features:
- In-memory storage
- Fast initialization
- Full N-gram and POS support
- No disk dependencies
JSONProvider (For Simple Use Cases)
Best for:
- Simple dictionary files
- Human-readable configuration
- Small datasets
- Testing with external data
Limitations:
- No N-gram support
- No POS probability support
- Not optimized for large datasets
CSVProvider (For Data Import)
Best for:
- Importing from spreadsheets
- Simple frequency lists
- Data migration
Limitations:
- Same as JSONProvider
- Column-based format only
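The guidance above can be condensed into a small decision helper. This is only a sketch: the function name `suggest_provider`, its parameters, and the order of the checks are assumptions for illustration, and the only threshold taken from the guide is the 100K-entry mark.

```python
from typing import Optional


def suggest_provider(
    *,
    production: bool,
    entry_count: int,
    needs_context: bool,
    source_format: Optional[str] = None,
) -> str:
    """Return the name of the provider class the guide points to.

    Hypothetical helper: it encodes the selection guide as simple rules
    and returns a class name rather than constructing a provider.
    """
    # Production use and large dictionaries both point to SQLite.
    if production or entry_count >= 100_000:
        return "SQLiteProvider"
    # Context validation needs N-gram and POS probabilities, which the
    # JSON and CSV providers do not offer.
    if needs_context:
        return "MemoryProvider"
    # Simple file-backed data: follow the format of the source file.
    if source_format == "csv":
        return "CSVProvider"
    if source_format == "json":
        return "JSONProvider"
    # Small, local, no particular file format: keep everything in memory.
    return "MemoryProvider"
```

For example, `suggest_provider(production=False, entry_count=500, needs_context=False, source_format="csv")` returns `"CSVProvider"`.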
Method Behavior When Not Supported
When a method is not supported by a provider, it returns a safe default value:
| Method | Default Return Value |
|---|---|
| get_bigram_probability() | 0.0 |
| get_trigram_probability() | 0.0 |
| get_top_continuations() | [] (empty list) |
| get_word_pos() | None |
| get_word_syllable_count() | None |
| get_pos_*_probabilities() | {} (empty dict) |
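Because unsupported methods return neutral values instead of raising, calling code can degrade gracefully without special cases. The sketch below ranks correction candidates by bigram probability and falls back to word frequency when every probability is 0.0. Both provider classes are hypothetical stand-ins, and the `(prev_word, word)` argument order for `get_bigram_probability()` is an assumption rather than the documented signature.

```python
from typing import Dict, List, Tuple


class FrequencyOnlyProvider:
    """Hypothetical stand-in for a provider without N-gram data."""

    def __init__(self, freqs: Dict[str, int]) -> None:
        self._freqs = freqs

    def get_word_frequency(self, word: str) -> int:
        return self._freqs.get(word, 0)

    def get_bigram_probability(self, prev_word: str, word: str) -> float:
        return 0.0  # safe default: this provider has no bigram data


class BigramProvider(FrequencyOnlyProvider):
    """Hypothetical stand-in for a provider with N-gram data."""

    def __init__(
        self, freqs: Dict[str, int], bigrams: Dict[Tuple[str, str], float]
    ) -> None:
        super().__init__(freqs)
        self._bigrams = bigrams

    def get_bigram_probability(self, prev_word: str, word: str) -> float:
        return self._bigrams.get((prev_word, word), 0.0)


def rank_candidates(provider, prev_word: str, candidates: List[str]) -> List[str]:
    """Order candidates by (bigram probability, word frequency), best first.

    When every bigram probability is 0.0, the tuple comparison falls
    through to frequency, so no provider-specific branch is needed.
    """
    return sorted(
        candidates,
        key=lambda w: (
            provider.get_bigram_probability(prev_word, w),
            provider.get_word_frequency(w),
        ),
        reverse=True,
    )
```

One caveat with this design: a return value of 0.0 does not distinguish "this provider has no bigram data" from "this bigram was never observed", so code that needs to tell the two apart has to know which provider it is talking to.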
Creating Custom Providers
To create a custom provider, extend DictionaryProvider and implement all of its abstract methods.
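As a rough illustration of the pattern, the sketch below defines a small stand-in base class and a provider built on a plain dict. `DictionaryProviderSketch` and `WordListProvider` are hypothetical names, the method signatures are assumed, and only a handful of methods from the capability matrix are shown; which methods are actually abstract in the real base class is not stated here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, Optional


class DictionaryProviderSketch(ABC):
    """Hypothetical stand-in for DictionaryProvider (subset of methods)."""

    @abstractmethod
    def is_valid_word(self, word: str) -> bool: ...

    @abstractmethod
    def get_word_frequency(self, word: str) -> int: ...

    @abstractmethod
    def get_all_words(self) -> Iterator[str]: ...

    # Optional capabilities: the base class supplies safe defaults.
    def get_word_pos(self, word: str) -> Optional[str]:
        return None

    def get_bigram_probability(self, prev_word: str, word: str) -> float:
        return 0.0

    # Default bulk behaviour: iterate over the single-item method.
    def is_valid_words_bulk(self, words: Iterable[str]) -> Dict[str, bool]:
        return {w: self.is_valid_word(w) for w in words}


class WordListProvider(DictionaryProviderSketch):
    """Custom provider backed by an in-memory word -> frequency mapping."""

    def __init__(self, frequencies: Dict[str, int]) -> None:
        self._frequencies = dict(frequencies)  # copy to avoid aliasing

    def is_valid_word(self, word: str) -> bool:
        return word in self._frequencies

    def get_word_frequency(self, word: str) -> int:
        return self._frequencies.get(word, 0)

    def get_all_words(self) -> Iterator[str]:
        return iter(self._frequencies)
```

A subclass that omits one of the abstract methods cannot be instantiated, which is how Python's `abc` module surfaces an incomplete provider early.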
See src/myspellchecker/providers/base.py for the complete interface definition.