`check_batch()` processes multiple texts together with reduced overhead. Combine it with thread or process pools for 2-10x throughput improvements over sequential checking.
## Why Batch Processing?
Processing texts individually has overhead:

- Repeated initialization
- No parallelization
- Inefficient memory usage

Batch processing addresses each of these:

- Parallelization: Process multiple texts concurrently
- Reduced overhead: Share resources across texts
- Better throughput: 2-10x faster than sequential
## Basic Usage
### Simple Batch Check
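A minimal sketch of the call pattern. The per-text `check` stub below is a stand-in so the example is self-contained; the real checker's issue format may differ.

```python
# Stub stands in for the library's per-text checker; assume it
# returns a list of issue descriptions (empty list = no issues).
def check(text: str) -> list[str]:
    issues = []
    if not text.strip():
        issues.append("empty text")
    elif text != text.strip():
        issues.append("surrounding whitespace")
    return issues

def check_batch(texts: list[str]) -> list[list[str]]:
    """Check many texts in one call, amortizing setup across the batch."""
    return [check(text) for text in texts]

texts = ["Hello world", "  padded  ", ""]
for text, issues in zip(texts, check_batch(texts)):
    print(f"{text!r}: {issues or 'clean'}")
```

Results come back in input order, one entry per text.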
### With Validation Level
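A sketch of passing a validation level through to the batch call. The `level` parameter name and its values are assumptions for illustration; the real signature may differ.

```python
# Hypothetical `level` parameter (name assumed for illustration):
# stricter levels enable additional checks per text.
def check_batch(texts, level="standard"):
    def check(text):
        issues = []
        if not text.strip():
            issues.append("empty text")
        if level == "strict" and text and not text[0].isupper():
            issues.append("missing leading capital")
        return issues
    return [check(text) for text in texts]

print(check_batch(["hello world"]))                  # default level: no issues
print(check_batch(["hello world"], level="strict"))  # strict: flags the lowercase start
```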
## Parallelization
### OpenMP Parallelization
Cython extensions use OpenMP for parallel processing.

### Thread Pool Parallelization

Python-level parallelization for I/O-bound operations:
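A thread-pool sketch using the standard library, with a stubbed per-text `check` so it runs standalone. Threads pay off when each check waits on I/O (network calls, disk reads) rather than raw CPU.

```python
from concurrent.futures import ThreadPoolExecutor

def check(text: str) -> list[str]:
    # stub per-text checker; the real one may block on I/O
    return [] if text.strip() else ["empty text"]

def check_batch_threaded(texts, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order
        return list(pool.map(check, texts))

results = check_batch_threaded(["first", "", "third"])
```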
### Process Pool Parallelization

For CPU-bound operations with GIL limitations:
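A process-pool sketch: separate interpreter processes sidestep the GIL for CPU-bound checks. The vowel-counting stub stands in for real CPU-heavy work; worker functions must be picklable, hence defined at module top level.

```python
from concurrent.futures import ProcessPoolExecutor

def check(text: str) -> int:
    # stand-in for CPU-heavy checking: count vowels
    return sum(c in "aeiou" for c in text.lower())

def check_batch_processes(texts, max_workers=2):
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # chunksize batches items per worker message, reducing IPC overhead
        return list(pool.map(check, texts, chunksize=100))

if __name__ == "__main__":
    print(check_batch_processes(["Alpha", "Beta"]))
```

Note the `__main__` guard: on platforms that spawn worker processes, the module is re-imported in each worker and must not re-enter the pool.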
## Configuration

### Batch Configuration
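A sketch of a configuration object for batch runs. The field names here are illustrative assumptions, not the library's actual option names.

```python
from dataclasses import dataclass

# Hypothetical configuration container (names assumed for illustration).
@dataclass
class BatchConfig:
    batch_size: int = 100    # texts per internal chunk
    max_workers: int = 4     # parallel workers
    fail_fast: bool = False  # stop on the first text with issues

config = BatchConfig(batch_size=500, max_workers=8)
```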
### Memory-Efficient Processing
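A generator-based sketch of the idea: process texts lazily in fixed-size chunks so only one chunk is ever in memory, with a stubbed `check` for self-containment.

```python
from itertools import islice

def check(text):  # stub per-text checker
    return [] if text.strip() else ["empty text"]

def iter_batches(texts, batch_size):
    """Lazily yield fixed-size lists from any iterable."""
    iterator = iter(texts)
    while batch := list(islice(iterator, batch_size)):
        yield batch

def check_lazily(texts, batch_size=100):
    """Yield one result at a time; only one batch is held in memory."""
    for batch in iter_batches(texts, batch_size):
        for text in batch:
            yield check(text)
```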
## Performance Optimization
### Optimal Batch Size
| Text Length | Optimal Batch Size |
|---|---|
| Short (<50 chars) | 500-1000 |
| Medium (50-200 chars) | 100-500 |
| Long (>200 chars) | 50-100 |
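The table can be folded into a small helper that picks a batch size from the average text length. This is a sketch based on the thresholds above, not a library function.

```python
# Thresholds and sizes taken from the table above (upper bounds used).
def optimal_batch_size(avg_text_length: int) -> int:
    if avg_text_length < 50:
        return 1000   # short texts: 500-1000
    if avg_text_length <= 200:
        return 500    # medium texts: 100-500
    return 100        # long texts: 50-100

texts = ["some example", "another example text"]
avg_length = sum(map(len, texts)) // len(texts)
batch_size = optimal_batch_size(avg_length)
```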
### Worker Count
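A common sizing heuristic, offered as an assumption rather than a library rule: CPU-bound work scales to the core count, while I/O-bound work tolerates oversubscription.

```python
import os

def suggested_workers(io_bound: bool = False) -> int:
    cores = os.cpu_count() or 1
    # CPU-bound: one worker per core; I/O-bound: oversubscribe,
    # capped so per-worker overhead stays bounded.
    return min(32, cores * 4) if io_bound else cores
```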
### Cython Acceleration
Ensure the Cython extensions are compiled:
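One way to verify at runtime is an import probe. The module name `_native_checker` is a hypothetical placeholder for the compiled extension; substitute the real one.

```python
# Probe for the compiled extension; fall back to pure Python if absent.
try:
    import _native_checker  # hypothetical compiled extension name
    ACCELERATED = True
except ImportError:
    ACCELERATED = False

print("Cython acceleration:", "on" if ACCELERATED else "off")
```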
## Benchmarks

### Throughput Comparison
### Memory Usage
## API Reference
### `check_batch`
### `StreamingChecker`
For memory-efficient streaming, use `StreamingChecker`:
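An illustrative stand-in for `StreamingChecker` showing the usage shape: results are yielded one at a time instead of accumulated, so memory stays flat. The constructor arguments and internals here are assumptions; the real class may differ.

```python
# Illustrative stub of the streaming pattern, not the real class.
class StreamingChecker:
    def __init__(self, batch_size: int = 100):
        self.batch_size = batch_size

    def check_stream(self, texts):
        # yield each result as it is produced; nothing is accumulated
        for text in texts:
            yield [] if text.strip() else ["empty text"]

checker = StreamingChecker(batch_size=200)
for issues in checker.check_stream(["one", " ", "three"]):
    pass  # handle each result as it arrives
```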
## Common Patterns
### Progress Tracking
Use `StreamingChecker.check_stream()` with the `on_progress` callback:
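A sketch of the callback pattern with a stubbed checker. The `(done, total)` callback signature is an assumption for illustration; check the real API for the exact arguments.

```python
# Stub demonstrating the on_progress pattern, not the real class.
class StreamingChecker:
    def check_stream(self, texts, on_progress=None):
        total = len(texts)
        for done, text in enumerate(texts, start=1):
            result = [] if text.strip() else ["empty text"]
            if on_progress is not None:
                on_progress(done, total)  # report after each text
            yield result

progress = []
checker = StreamingChecker()
results = list(checker.check_stream(
    ["a", "b"],
    on_progress=lambda done, total: progress.append((done, total)),
))
```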
### Error Aggregation
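A sketch of summarizing issues across a whole batch with `collections.Counter`. The per-text result lists are stubbed sample data.

```python
from collections import Counter

# Stub batch results: one list of issue labels per input text.
batch_results = [
    ["spelling", "grammar"],
    [],
    ["spelling"],
]

# Flatten per-text issues and count each category across the batch.
counts = Counter(issue for issues in batch_results for issue in issues)
print(counts.most_common())  # → [('spelling', 2), ('grammar', 1)]
```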
### Parallel File Processing
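A sketch of checking several files concurrently. `check_file` stubs the real per-file check (here it counts blank lines) so the example is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def check_file(path: Path):
    """Stub per-file check: return (filename, blank-line count)."""
    text = path.read_text(encoding="utf-8")
    blank_lines = sum(1 for line in text.splitlines() if not line.strip())
    return path.name, blank_lines

def check_files(paths, max_workers=4):
    # one worker per file, results collected into {filename: count}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(check_file, paths))
```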
### Chunked Processing for Very Large Files
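A sketch of line-chunked reading so the whole file never sits in memory; each chunk is checked as a batch. `check` is the usual stub, and only an aggregate count is kept.

```python
def check(text):  # stub per-text checker
    return [] if text.strip() else ["empty text"]

def check_large_file(path, chunk_size=1000):
    """Stream a file in chunks of lines; return the total issue count."""
    total_issues = 0
    chunk = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            chunk.append(line.rstrip("\n"))
            if len(chunk) >= chunk_size:
                total_issues += sum(len(check(t)) for t in chunk)
                chunk.clear()  # release the chunk before reading more
    if chunk:  # remainder smaller than chunk_size
        total_issues += sum(len(check(t)) for t in chunk)
    return total_issues
```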
## Troubleshooting
### Issue: No speedup with batch processing

**Cause:** GIL contention or an I/O bottleneck.

**Solution:** Use process-based parallelization.

### Issue: Out of memory

**Cause:** Batch too large, or results accumulated instead of processed.

**Solution:** Use streaming with `StreamingChecker`.

### Issue: Slow with many small texts

**Cause:** Worker overhead dominates.

**Solution:** Process larger batches at once.

## Next Steps
- Async API - Non-blocking async operations
- Performance Tuning - Optimization strategies
- Streaming API - Memory-efficient processing