Overview
Available Resources
| Resource | Description | Size | Use Case |
|---|---|---|---|
| segmentation | Word segmentation dictionary (mmap) | ~50 MB | Word tokenization |
| crf | CRF model for word segmentation | ~10 MB | Word tokenization |
Resource Resolution
Resources are resolved in this order:
- Local bundled path: use the file if it ships with the package.
- Cache directory: use a previously downloaded copy.
- Download: fetch the file from HuggingFace on first use and store it in the cache directory.
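The lookup order above can be sketched as follows. This is a minimal illustration and not the package's implementation. `resolve_resource`, its parameters, and the injected `download` callable (which stands in for the real HuggingFace fetch) are all hypothetical, and the actual `get_resource_path` may differ.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_resource(
    filename: str,
    bundled_dir: Optional[Path],
    cache_dir: Path,
    download: Callable[[Path], None],
) -> Path:
    """Return a local path for `filename`: bundled copy, then cache, then download."""
    # 1. Local bundled path: prefer a copy shipped inside the package.
    if bundled_dir is not None:
        bundled = bundled_dir / filename
        if bundled.is_file():
            return bundled
    # 2. Cache directory: reuse a file downloaded by an earlier call.
    cached = cache_dir / filename
    if cached.is_file():
        return cached
    # 3. Download: fetch once into the cache so later calls hit step 2.
    cache_dir.mkdir(parents=True, exist_ok=True)
    download(cached)
    return cached
```

Because the download is injected, the ordering can be checked offline. For example, a fake `download` that writes a few bytes is called only once across repeated lookups.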
Core Functions
get_resource_path
General-purpose resource getter.
Convenience Functions
Specific getters for each resource.
ensure_resources_available
Download all resources at once.
clear_cache
Clear the cache directory.
Cache Locations
Default Cache Directory
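The Configuration table gives the default location as ~/.cache/myspellchecker/resources and names MYSPELL_CACHE_DIR as the override. A minimal sketch of how the two could combine follows. The helper name `cache_dir` is hypothetical, and this page does not say whether the package reads the variable on every call or only once at import.

```python
import os
from pathlib import Path

# Default from the Configuration table; "~" is expanded per user.
DEFAULT_CACHE_DIR = Path("~/.cache/myspellchecker/resources")


def cache_dir() -> Path:
    """Return the cache directory, preferring MYSPELL_CACHE_DIR when it is set."""
    override = os.environ.get("MYSPELL_CACHE_DIR")
    if override:  # an empty value is treated as unset
        return Path(override).expanduser()
    return DEFAULT_CACHE_DIR.expanduser()
```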
Local Bundled Paths
Resources can be bundled with the package.
HuggingFace Integration
Resources are hosted on HuggingFace Datasets.
Download Behavior
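Files in a HuggingFace dataset repository are served over HTTPS from the Hub's `resolve` endpoint. The sketch below builds that URL. The repository id in the usage line is a hypothetical placeholder, because this page does not name the actual repository. In real code, `huggingface_hub.hf_hub_download(repo_id, filename, repo_type="dataset")` handles the URL and its own caching. Whether this package uses that function is not stated here.

```python
from urllib.parse import quote

HF_ENDPOINT = "https://huggingface.co"


def dataset_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the raw-file download URL for a file in a Hub dataset repository."""
    # Dataset repos live under /datasets/; /resolve/<revision>/ returns the file bytes.
    # The revision is fully escaped; "/" is kept in filename for subfolders.
    return (
        f"{HF_ENDPOINT}/datasets/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


# Hypothetical repository and file names, for illustration only.
print(dataset_file_url("example-org/example-resources", "segmentation.mmap"))
```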
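First Use
On first use, a resource that is neither bundled nor cached is downloaded from HuggingFace and saved to the cache directory.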
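This page does not say how a first-use download is written to disk. A common safe pattern is to stream into a temporary file inside the cache directory and rename it into place only once the download completes. That way an interrupted download never leaves a truncated file that a later run would mistake for a cached copy. The sketch below shows the pattern with a hypothetical `download_to_cache` helper. An injected `fetch_chunks` callable stands in for a real HTTP client.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def download_to_cache(dest: Path, fetch_chunks: Callable[[], Iterable[bytes]]) -> Path:
    """Write downloaded bytes to `dest` so that a failure never leaves a partial file."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stage in the destination directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in fetch_chunks():
                tmp.write(chunk)
        os.replace(tmp_name, dest)  # atomic on POSIX within one filesystem
    except BaseException:
        # Drop the partial download, then re-raise the original error.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return dest
```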
Subsequent Uses
Later calls find the file in the cache directory and skip the download.
Force Download
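Forcing a download presumably bypasses the cache check. The sketch below uses a hypothetical `force` flag on a hypothetical `get_cached` helper, since the package's actual parameter or function for this is not documented on this page. It also shows the subsequent-use path, where an existing file is returned without any network access.

```python
from pathlib import Path
from typing import Callable


def get_cached(dest: Path, download: Callable[[Path], None], force: bool = False) -> Path:
    """Reuse `dest` if present; download when it is missing or when force=True."""
    if force or not dest.is_file():
        dest.parent.mkdir(parents=True, exist_ok=True)
        download(dest)  # first use, or an explicit refresh
    return dest  # subsequent uses return the cached file untouched
```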
Error Handling
Unknown Resource
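The Available Resources table lists only two valid names, segmentation and crf. The sketch below shows the kind of check that produces an unknown-resource error. The helper name, the error type (`ValueError`), and the message format are assumptions, not taken from the package.

```python
# Names and descriptions copied from the Available Resources table.
KNOWN_RESOURCES = {
    "segmentation": "Word segmentation dictionary (mmap)",
    "crf": "CRF model for word segmentation",
}


def check_resource_name(name: str) -> str:
    """Reject names that are not registered, listing the valid choices."""
    if name not in KNOWN_RESOURCES:
        valid = ", ".join(sorted(KNOWN_RESOURCES))
        raise ValueError(f"Unknown resource {name!r}; expected one of: {valid}")
    return name
```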
Download Failure
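This page does not specify whether, or how, the loader retries a failed download. A common way to ride out transient failures is to retry with exponential backoff and then raise a single error that keeps the underlying cause. In the sketch below, `with_retries` and its parameters are hypothetical. The `sleep` function is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    # OSError covers ConnectionError, TimeoutError and urllib.error.URLError.
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying listed errors with delays of base_delay * 2**attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts - 1:
                # Out of attempts: raise one error that keeps the last failure as its cause.
                raise RuntimeError(f"download failed after {attempts} attempts") from exc
            sleep(base_delay * 2 ** attempt)
    # Only reached when attempts < 1.
    raise ValueError("attempts must be at least 1")
```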
Network Issues
The loader handles network failures gracefully.
Integration Examples
With DefaultSegmenter
With CRF Tokenizer
Offline Mode
Pre-download resources for offline use.
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| MYSPELL_CACHE_DIR | Custom cache directory | ~/.cache/myspellchecker/resources |
| MYSPELL_OFFLINE | Disable downloads | false |
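The sketch below shows one way to honor MYSPELL_OFFLINE: check the flag before any network access, and fail with an actionable message when a resource is not available locally. Which spellings count as "true" is an assumption, since the table only documents the default, `false`. The helper names are also hypothetical.

```python
import os
from pathlib import Path

# Assumed spellings for "true"; the table only documents the default, false.
TRUTHY = {"1", "true", "yes", "on"}


def offline_mode() -> bool:
    """Return True when MYSPELL_OFFLINE is set to a truthy value."""
    return os.environ.get("MYSPELL_OFFLINE", "false").strip().lower() in TRUTHY


def ensure_download_allowed(path: Path) -> None:
    """Raise before any network access if `path` is missing and downloads are disabled."""
    if offline_mode() and not path.is_file():
        raise FileNotFoundError(
            f"{path} is not available locally and MYSPELL_OFFLINE disables downloads; "
            "pre-download resources before going offline"
        )
```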
Custom Cache Directory
Best Practices
1. Pre-download in Production
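Fetching everything during a build or deploy step keeps the first production request from paying the download cost. `ensure_resources_available` is the documented entry point for this, but its signature is not shown here. The sketch below therefore uses a hypothetical `predownload` loop over an injected per-resource `fetch` function. It collects failures so that every missing resource is reported at once, rather than only the first.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def predownload(names: Iterable[str], fetch: Callable[[str], Path]) -> Dict[str, Path]:
    """Fetch every resource up front; raise once, listing all failures."""
    paths: Dict[str, Path] = {}
    failures: Dict[str, Exception] = {}
    for name in names:
        try:
            paths[name] = fetch(name)
        except Exception as exc:  # keep going so one failure does not hide others
            failures[name] = exc
    if failures:
        detail = "; ".join(f"{n}: {e}" for n, e in failures.items())
        raise RuntimeError(f"failed to pre-download {len(failures)} resource(s): {detail}")
    return paths
```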
2. Use Custom Cache for Containers
3. Handle Network Failures
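One way to keep a network failure from taking down the whole application is to catch it at the call site and continue in a degraded mode. The sketch is hypothetical: `try_get_resource` and the exceptions it catches are assumptions. What counts as a sensible fallback (for example, skipping a feature that needs the CRF model) depends on the application.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger(__name__)


def try_get_resource(name: str, fetch: Callable[[str], Path]) -> Optional[Path]:
    """Return the resource path, or None with a warning if it cannot be obtained."""
    try:
        return fetch(name)
    except (OSError, RuntimeError) as exc:
        log.warning("resource %r unavailable (%s); continuing without it", name, exc)
        return None
```

Callers then branch on `None` rather than wrapping every lookup in their own `try` block.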
4. Clear Cache Periodically
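`clear_cache` removes the cache directory wholesale. For a gentler policy, cached files can be pruned by age. The hypothetical `prune_cache` below deletes files whose modification time (normally the download time) is older than a cutoff, so they are fetched fresh on next use. On Windows, deleting a file that another process has memory-mapped fails, which can happen with the mmap segmentation dictionary, so run the pruning while the spell checker is idle.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_cache(cache_dir: Path, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Delete cached files not modified within max_age_days; return the removed paths."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed: List[Path] = []
    if not cache_dir.is_dir():
        return removed
    # Materialize the listing first so deletions do not disturb the directory walk.
    for path in list(cache_dir.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```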
See Also
- Segmenters - Text segmentation using resources
- Configuration Guide - General configuration
- Data Pipeline - Building dictionaries
- Installation Guide - Setup instructions