Overview
mySpellChecker uses a multi-layered approach with specialized algorithms at each level:

| Layer | Algorithm | Purpose | Complexity |
|---|---|---|---|
| Syllable | Rule-based FSM | Structure validation | O(n) |
| Word | SymSpell | Fast suggestions | O(1) |
| Grammar | POS + Rules | Syntactic validation | O(n) |
| Context | N-gram LM | Probability checking | O(1) |
| Semantic | ONNX Neural | Deep understanding | O(n) |
Contents
Core Algorithms
- SymSpell - O(1) spelling correction algorithm
- Edit Distance - Levenshtein, Damerau-Levenshtein, weighted algorithms
- N-gram Model - Statistical language modeling
- Viterbi - POS tagging with HMM
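The core trick behind SymSpell's O(1) lookups is precomputing delete-variants of every dictionary word, so candidate retrieval is pure hash lookups. A minimal sketch of the idea (function names are illustrative, not the library's API; the real algorithm additionally filters candidates by true Damerau-Levenshtein distance):

```python
from itertools import combinations

def deletes(word: str, max_dist: int = 2) -> set[str]:
    """All strings obtainable by deleting up to max_dist characters."""
    results = {word}
    for d in range(1, max_dist + 1):
        for idx in combinations(range(len(word)), d):
            results.add("".join(c for i, c in enumerate(word) if i not in idx))
    return results

def build_index(dictionary: list[str], max_dist: int = 2) -> dict[str, set[str]]:
    """Map every delete-variant back to the dictionary words producing it."""
    index: dict[str, set[str]] = {}
    for word in dictionary:
        for variant in deletes(word, max_dist):
            index.setdefault(variant, set()).add(word)
    return index

def lookup(term: str, index: dict[str, set[str]], max_dist: int = 2) -> set[str]:
    """Candidates: any dictionary word sharing a delete-variant with the term."""
    candidates: set[str] = set()
    for variant in deletes(term, max_dist):
        candidates |= index.get(variant, set())
    return candidates
```

The index is large (hence the ~100MB figure below) but makes each lookup independent of dictionary size.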
Suggestion System
- Suggestion Strategy - Pluggable suggestion generation interface
- Suggestion Ranking - Multi-factor ranking algorithms
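Multi-factor ranking typically blends edit distance with corpus frequency so that common words win ties. A minimal sketch, with illustrative weights and names (the real ranking algorithm may use more factors):

```python
import math

def rank(candidates: dict[str, int], frequencies: dict[str, int]) -> list[str]:
    """candidates maps word -> edit distance; frequencies maps word -> corpus count.
    Lower edit distance wins; log-frequency acts as a soft tiebreaker."""
    def score(word: str) -> float:
        dist = candidates[word]
        freq = frequencies.get(word, 1)
        return dist - 0.1 * math.log(freq)  # 0.1 is an illustrative weight
    return sorted(candidates, key=score)    # ascending: best suggestion first
```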
Context & Grammar
- Context-Aware Validation - Context and grammar validation overview
- Grammar Rules - Syntactic grammar rule engine
- Semantic Analysis - Neural semantic validation
POS and Tagging
- POS Disambiguator - Context-based POS disambiguation rules
- Joint Segment Tagger - Unified segmentation and POS tagging
- Tone Disambiguation - Context-aware tone mark inference
Text Processing
- Normalization - Text preprocessing and Unicode handling
- Segmentation - Word segmentation algorithms
- Syllable Segmentation - Rule-based syllable breaking
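The Normalization stage (Unicode NFC plus zero-width removal, per the pipeline table below) can be sketched in a few lines; the exact set of zero-width characters stripped by the library is an assumption here:

```python
import unicodedata

# Assumed zero-width set: ZWSP, ZWNJ, ZWJ, BOM; the real list may differ.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def normalize(text: str) -> str:
    """Compose combining sequences to NFC, then drop zero-width characters."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```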
Entity & Pattern Recognition
- Named Entity Recognition - Heuristic NER for proper nouns
- Phonetic Matching - Sound-based similarity matching
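Sound-based matching works by mapping words to phonetic keys and comparing keys instead of spellings. The library's scheme is likely language-specific; classic English Soundex is shown here purely to illustrate the idea:

```python
def soundex(word: str) -> str:
    """Classic Soundex: first letter + three digits coding consonant groups."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    first = word[0].upper()
    digits = []
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":   # h/w are transparent: they don't reset the previous code
            prev = code
    return (first + "".join(digits) + "000")[:4]
```

Words that sound alike collapse to the same key, so a key index gives cheap phonetic candidate retrieval.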
Algorithm Selection
When to Use Each Algorithm
The validation pipeline processes text through these stages in order:

| Stage | Algorithm | When It Runs | What It Does |
|---|---|---|---|
| Normalization | Unicode NFC + zero-width removal | Always | Cleans input text |
| Segmentation | Rule-based syllable breaking | Always | Splits text into syllables |
| Layer 1: Syllable | FSM + dictionary lookup | Always | Validates syllable structure |
| Suggestions | SymSpell | On invalid syllables | Generates correction suggestions |
| Layer 2: Word | SymSpell | ValidationLevel.WORD | Validates words, suggests fixes |
| Layer 2.5: Grammar | Viterbi POS + rule engine | ValidationLevel.WORD | Checks syntactic correctness |
| Layer 3: Context | N-gram probabilities | ValidationLevel.WORD | Checks word-in-context fitness |
| Semantic | ONNX semantic model | When enabled | Deep context understanding |
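The gating in the table can be sketched as a dispatch function. `ValidationLevel.WORD` is referenced in the table; the `SYLLABLE` member and the stage names are assumptions for illustration:

```python
from enum import Enum, auto

class ValidationLevel(Enum):
    # WORD appears in the pipeline table; SYLLABLE is assumed here.
    SYLLABLE = auto()
    WORD = auto()

def stages_to_run(level: ValidationLevel, semantic_enabled: bool = False) -> list[str]:
    """Which pipeline stages fire for a given level, per the table above."""
    stages = ["normalization", "segmentation", "syllable"]  # always run
    if level is ValidationLevel.WORD:
        stages += ["word", "grammar", "context"]            # gated on WORD level
    if semantic_enabled:
        stages.append("semantic")                           # gated on an enable flag
    return stages
```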
Performance Characteristics
Time Complexity
| Algorithm | Best | Average | Worst |
|---|---|---|---|
| Normalization | O(n) | O(n) | O(n) |
| Segmentation | O(n) | O(n) | O(n) |
| Syllable Validation | O(1) | O(1) | O(k)* |
| SymSpell Lookup | O(1) | O(1) | O(1) |
| Viterbi POS | O(nT²) | O(nT²) | O(nT²) |
| N-gram Lookup | O(1) | O(1) | O(1) |
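The O(nT²) cost of Viterbi comes from considering every previous tag for every current tag at each of the n positions. A minimal sketch (the real tagger's interface will differ; the `min_prob` floor mirrors the 1e-10 default in the parameter table below):

```python
import math

def viterbi(obs, tags, start_p, trans_p, emit_p, min_prob=1e-10):
    """Most likely tag sequence for obs; the nested tag loops make it O(n*T^2)."""
    # V[i][t] = (best log-prob of a path ending in tag t at position i, backpointer)
    V = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(obs[0], min_prob)), None)
          for t in tags}]
    for i in range(1, len(obs)):
        row = {}
        for t in tags:  # T current tags, each scanning T predecessors
            best = max(tags, key=lambda p: V[i - 1][p][0] + math.log(trans_p[p][t]))
            row[t] = (V[i - 1][best][0] + math.log(trans_p[best][t])
                      + math.log(emit_p[t].get(obs[i], min_prob)), best)
        V.append(row)
    last = max(tags, key=lambda t: V[-1][t][0])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):  # follow backpointers
        path.append(V[i][path[-1]][1])
    return list(reversed(path))
```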
Space Complexity
| Algorithm | Memory | Notes |
|---|---|---|
| SymSpell | ~100MB | Pre-computed deletes |
| N-gram Model | ~50MB | Indexed probabilities |
| Viterbi | O(nT) | Dynamic programming table |
| Semantic | ~300MB | ONNX model |
Implementation Notes
Cython Optimizations
Performance-critical algorithms are implemented in Cython.

Parallel Processing
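A hypothetical sketch of how a Cython kernel can combine both points: typed memoryviews for C-speed loops, and `prange` (which releases the GIL and maps iterations onto OpenMP threads) for batch parallelism. Names and the toy computation are illustrative only; the real kernels live in the library's .pyx sources:

```cython
# cython: boundscheck=False
# distutils: extra_compile_args = -fopenmp
# distutils: extra_link_args = -fopenmp
from cython.parallel import prange

def batch_length_gap(int[:] lens_a, int[:] lens_b, int[:] out):
    """Toy batch kernel: |len(a) - len(b)|, a cheap lower bound on edit distance.
    prange runs the loop without the GIL across OpenMP threads."""
    cdef Py_ssize_t i
    for i in prange(lens_a.shape[0], nogil=True):
        out[i] = lens_a[i] - lens_b[i] if lens_a[i] >= lens_b[i] else lens_b[i] - lens_a[i]
```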
Batch operations use OpenMP for parallelization.

Quick Reference
Algorithm Parameters
| Algorithm | Key Parameter | Default | Range |
|---|---|---|---|
| SymSpell | max_edit_distance | 2 | 1-3 |
| SymSpell | prefix_length | 10 | 4-10 |
| N-gram | bigram_threshold | 0.0001 | 0-1 |
| Viterbi | min_prob | 1e-10 | 1e-15 to 1e-5 |
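The table's defaults and ranges can be expressed as a validated config object. The class name and validation are a sketch, not necessarily the library's real API:

```python
from dataclasses import dataclass

@dataclass
class SymSpellConfig:
    """Mirrors the SymSpell rows of the parameter table above (illustrative)."""
    max_edit_distance: int = 2  # allowed range 1-3
    prefix_length: int = 10     # allowed range 4-10

    def __post_init__(self):
        if not 1 <= self.max_edit_distance <= 3:
            raise ValueError("max_edit_distance must be in 1-3")
        if not 4 <= self.prefix_length <= 10:
            raise ValueError("prefix_length must be in 4-10")
```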
Tuning Guidelines
- Speed priority: Use edit distance 1, disable context
- Accuracy priority: Use edit distance 2, enable context
- Memory constrained: Use SQLite provider, disable semantic
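The guidelines above map naturally onto named presets; the preset keys below are illustrative, not the library's actual setting names:

```python
# Assumed setting names, mapping each tuning guideline to concrete values.
TUNING_PRESETS = {
    "speed":      {"max_edit_distance": 1, "context_enabled": False},
    "accuracy":   {"max_edit_distance": 2, "context_enabled": True},
    "low_memory": {"dictionary_provider": "sqlite", "semantic_enabled": False},
}
```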
See Also
- Architecture Overview - System design
- Performance Tuning - Optimization guide
- API Reference - Programmatic access