This section documents every algorithm in the library — from O(1) SymSpell lookups through Viterbi decoding to optional ONNX-based semantic models — along with their complexity characteristics and tuning parameters.

Overview

mySpellChecker uses a multi-layered approach with specialized algorithms at each level:
| Layer | Algorithm | Purpose | Complexity |
|---|---|---|---|
| Syllable | Rule-based FSM | Structure validation | O(n) |
| Word | SymSpell | Fast suggestions | O(1) |
| Grammar | POS + Rules | Syntactic validation | O(n) |
| Context | N-gram LM | Probability checking | O(1) |
| Semantic | ONNX Neural | Deep understanding | O(n) |

Contents

Core Algorithms

Suggestion System

Context & Grammar

POS and Tagging

Text Processing

Entity & Pattern Recognition

Algorithm Selection

When to Use Each Algorithm

The validation pipeline processes text through these stages in order:
| Stage | Algorithm | When It Runs | What It Does |
|---|---|---|---|
| Normalization | Unicode NFC + zero-width removal | Always | Cleans input text |
| Segmentation | Rule-based syllable breaking | Always | Splits text into syllables |
| Layer 1: Syllable | FSM + dictionary lookup | Always | Validates syllable structure |
| | SymSpell | On invalid syllables | Generates correction suggestions |
| Layer 2: Word | SymSpell | ValidationLevel.WORD | Validates words, suggests fixes |
| Layer 2.5: Grammar | Viterbi POS + rule engine | ValidationLevel.WORD | Checks syntactic correctness |
| Layer 3: Context | N-gram probabilities | ValidationLevel.WORD | Checks word-in-context fitness |
| | ONNX semantic model | When enabled | Deep context understanding |
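The staged flow above can be sketched as a short driver in which every stage only inspects what earlier stages could not settle. This is an illustrative toy, not the library's API: `LEXICON`, `segment`, and `validate` are stand-ins for the real dictionary, syllable segmenter, and pipeline entry point.

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}  # common zero-width code points
LEXICON = {"hello", "world"}  # toy stand-in for the real dictionary layer

def normalize(text):
    # Stage 1 (always runs): Unicode NFC plus zero-width removal
    text = unicodedata.normalize("NFC", text)
    return "".join(c for c in text if c not in ZERO_WIDTH)

def segment(text):
    # Stage 2 (always runs): toy whitespace split; the real library
    # uses rule-based syllable breaking here
    return text.split()

def validate(text):
    # Later layers only see tokens the dictionary lookup rejected
    tokens = segment(normalize(text))
    return [t for t in tokens if t.lower() not in LEXICON]

# validate("hello\u200b wrold") -> ["wrold"]
```

The key property the table describes is visible in `validate`: suggestion generation, grammar, and context checks all operate on the (usually small) residue left by the cheap early layers, not on the whole input.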

Performance Characteristics

Time Complexity

| Algorithm | Best | Average | Worst |
|---|---|---|---|
| Normalization | O(n) | O(n) | O(n) |
| Segmentation | O(n) | O(n) | O(n) |
| Syllable Validation | O(1) | O(1) | O(k)* |
| SymSpell Lookup | O(1) | O(1) | O(1) |
| Viterbi POS | O(nT²) | O(nT²) | O(nT²) |
| N-gram Lookup | O(1) | O(1) | O(1) |

*k = number of rule checks
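SymSpell's O(1) lookup comes from moving all the edit-distance work to load time: every delete-variant of every dictionary word is precomputed into a hash index, so a query only needs a handful of hash lookups. A minimal sketch of the idea (distance-1 deletes only; the real index also stores counts and filters candidates by true edit distance):

```python
from itertools import combinations

def deletes(word, max_dist=1):
    """All strings obtainable by removing up to max_dist characters."""
    out = {word}
    for d in range(1, max_dist + 1):
        for idx in combinations(range(len(word)), d):
            out.add("".join(c for i, c in enumerate(word) if i not in idx))
    return out

def build_index(dictionary, max_dist=1):
    # Load time: precompute delete variants (this is where the memory goes)
    index = {}
    for word in dictionary:
        for d in deletes(word, max_dist):
            index.setdefault(d, set()).add(word)
    return index

def suggest(index, query, max_dist=1):
    # Query time: a constant number of hash lookups -> effectively O(1)
    found = set()
    for d in deletes(query, max_dist):
        found |= index.get(d, set())
    return found

index = build_index({"spell", "spill", "shell"})
# suggest(index, "spel") finds "spell" via the shared delete "spel"
```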

Space Complexity

| Algorithm | Memory | Notes |
|---|---|---|
| SymSpell | ~100 MB | Pre-computed deletes |
| N-gram Model | ~50 MB | Indexed probabilities |
| Viterbi | O(nT) | Dynamic programming table |
| Semantic | ~300 MB | ONNX model |
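The Viterbi rows in both tables follow directly from the dynamic program: an n × T score table (O(nT) space) filled by trying every previous tag for every current tag at every position (O(nT²) time). A toy decoder with made-up tags and probabilities, including the `min_prob` floor for unseen events:

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, min_prob=1e-10):
    """Toy Viterbi POS decoder: O(n*T^2) time, O(n*T) space.
    min_prob floors unseen transitions/emissions (cf. the min_prob parameter)."""
    n, T = len(tokens), len(tags)
    # The O(n*T) DP table of best log-scores, plus backpointers
    score = [[-math.inf] * T for _ in range(n)]
    back = [[0] * T for _ in range(n)]
    for j, t in enumerate(tags):
        score[0][j] = (math.log(start_p.get(t, min_prob))
                       + math.log(emit_p.get((t, tokens[0]), min_prob)))
    for i in range(1, n):                   # n positions ...
        for j, t in enumerate(tags):        # ... x T current tags ...
            for k, s in enumerate(tags):    # ... x T previous tags = O(nT^2)
                cand = (score[i - 1][k]
                        + math.log(trans_p.get((s, t), min_prob))
                        + math.log(emit_p.get((t, tokens[i]), min_prob)))
                if cand > score[i][j]:
                    score[i][j], back[i][j] = cand, k
    # Backtrace the best tag sequence
    j = max(range(T), key=lambda j: score[n - 1][j])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return [tags[j] for j in reversed(path)]

# Hypothetical two-tag model, purely for illustration
tags = ["N", "V"]
start_p = {"N": 0.9, "V": 0.1}
trans_p = {("N", "V"): 0.8, ("N", "N"): 0.2}
emit_p = {("N", "dogs"): 0.9, ("V", "run"): 0.9}
# viterbi(["dogs", "run"], tags, start_p, trans_p, emit_p) -> ["N", "V"]
```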

Implementation Notes

Cython Optimizations

Performance-critical algorithms are implemented in Cython:
# Pure Python (normalize.py)
def remove_zero_width(text: str) -> str:
    return ''.join(c for c in text if c not in ZERO_WIDTH)

# Cython (normalize_c.pyx) - 10x faster
cdef str remove_zero_width_c(str text):
    cdef list result = []
    cdef str c
    for c in text:
        if c not in ZERO_WIDTH_SET:
            result.append(c)
    return ''.join(result)

Parallel Processing

Batch operations use OpenMP (via Cython's prange) for parallelization. Note that Python objects cannot be touched while the GIL is released, so each per-item call briefly re-acquires it:
# batch_processor.pyx
from cython.parallel import prange

cdef process_batch(list texts):
    cdef Py_ssize_t i, n = len(texts)
    cdef list results = [None] * n

    # prange releases the GIL across OpenMP threads; re-acquire it for
    # each call, since list indexing and process_single() need the GIL
    for i in prange(n, nogil=True):
        with gil:
            results[i] = process_single(texts[i])

    return results

Quick Reference

Algorithm Parameters

| Algorithm | Key Parameter | Default | Range |
|---|---|---|---|
| SymSpell | max_edit_distance | 2 | 1-3 |
| SymSpell | prefix_length | 10 | 4-10 |
| N-gram | bigram_threshold | 0.0001 | 0-1 |
| Viterbi | min_prob | 1e-10 | 1e-15 to 1e-5 |
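To see why max_edit_distance and prefix_length trade memory for recall, it helps to count the delete variants SymSpell must precompute per word. A rough illustration (the exact bookkeeping in the real index may differ):

```python
from itertools import combinations

def delete_count(word, max_edit_distance=2, prefix_length=10):
    """Approximate number of precomputed delete variants for one word:
    deletes are generated from the prefix only, so both parameters
    directly control index size."""
    prefix = word[:prefix_length]
    variants = set()
    for d in range(1, max_edit_distance + 1):
        for idx in combinations(range(len(prefix)), d):
            variants.add("".join(c for i, c in enumerate(prefix) if i not in idx))
    return len(variants)

w = "international"
# Raising either parameter multiplies the entries stored per word
print(delete_count(w, 1, 7), delete_count(w, 2, 7), delete_count(w, 3, 10))
```

This growth is why the range tops out at edit distance 3, and why lowering prefix_length is a cheap way to shrink the index at some cost in suggestion quality for long words.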

Tuning Guidelines

  • Speed priority: Use edit distance 1, disable context
  • Accuracy priority: Use edit distance 2, enable context
  • Memory constrained: Use SQLite provider, disable semantic
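The "enable context" guideline corresponds to the bigram_threshold check in the table above: a word pair is flagged when its probability in the n-gram model falls below the threshold. A toy version, with a hypothetical BIGRAM_P dict standing in for the real indexed n-gram store:

```python
# Toy bigram table; real probabilities come from the indexed n-gram model
BIGRAM_P = {("new", "york"): 0.02, ("spell", "checker"): 0.01}

def context_flags(tokens, bigram_threshold=0.0001):
    """Flag adjacent word pairs whose bigram probability is below the
    threshold. Each check is one dict lookup -> O(1) per pair."""
    flags = []
    for a, b in zip(tokens, tokens[1:]):
        if BIGRAM_P.get((a, b), 0.0) < bigram_threshold:
            flags.append((a, b))
    return flags

# context_flags(["new", "york", "checker"]) -> [("york", "checker")]
```

Raising bigram_threshold toward the top of its 0-1 range flags more pairs (higher recall, more false positives); lowering it keeps only pairs the model has essentially never seen.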

See Also