Key concepts, abbreviations, and Myanmar script terminology referenced throughout this documentation.

A

Asat (်)

Myanmar Unicode character (U+103A) that “kills” the inherent vowel of a consonant, creating a final consonant sound. Also called “killer” or “vowel killer”.

Anusvara (ံ)

Myanmar Unicode character (U+1036) indicating nasalization of the preceding vowel.

B

Bigram

A sequence of two consecutive tokens (syllables or words) used for context analysis.

C

Consonant

One of the 33 base consonant characters in Myanmar script (U+1000-U+1020).

Context Validation

Layer 3 of the validation pipeline that uses N-gram probabilities to detect real-word errors.
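As a rough illustration of the idea, the sketch below estimates P(word | previous word) from bigram counts and flags positions whose probability falls under a threshold. It is not mySpellChecker's scoring. The helper names (`count_ngrams`, `bigram_prob`, `flag_low_probability`), the 0.01 threshold and the unsmoothed estimate are all assumptions. English toy tokens stand in for Myanmar words.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

def count_ngrams(sentences: Iterable[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count unigrams and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    return unigrams, bigrams

def bigram_prob(prev: str, word: str, bigrams: Counter, unigrams: Counter) -> float:
    """Unsmoothed estimate of P(word | prev); 0.0 when prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def flag_low_probability(tokens: Sequence[str], bigrams: Counter,
                         unigrams: Counter, threshold: float = 0.01) -> List[int]:
    """Indices of tokens whose bigram probability is below the (assumed) threshold."""
    return [i for i in range(1, len(tokens))
            if bigram_prob(tokens[i - 1], tokens[i], bigrams, unigrams) < threshold]

unigrams, bigrams = count_ngrams([["I", "eat", "rice"],
                                  ["I", "eat", "bread"],
                                  ["they", "eat", "rice"]])
print(flag_low_probability(["I", "rice", "eat"], bigrams, unigrams))  # [1, 2]
```

Each token here is a valid word on its own. Only the unlikely order gives the errors away, which is the real-word case that dictionary lookup alone cannot catch.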

Cython

A Python-like language that compiles to C and is used to write C extension modules for Python. mySpellChecker uses it for performance-critical code paths.

D

Damerau-Levenshtein Distance

Edit distance metric that extends Levenshtein distance by counting a transposition of two adjacent characters as a single operation. Used for generating spelling suggestions.
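A minimal sketch of the common "optimal string alignment" (OSA) variant of Damerau-Levenshtein. It is a restricted form that does not edit any substring more than once. The glossary does not say whether mySpellChecker uses OSA or the unrestricted variant, so treat this as illustrative. The function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition (xy -> yx) counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

# KA + VOWEL SIGN E typed in the wrong order: a single transposition.
print(osa_distance("\u1000\u1031", "\u1031\u1000"))  # 1
```

The restriction matters only in rare cases. For example, "ca" to "abc" is 3 under OSA but 2 under unrestricted Damerau-Levenshtein.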

Dependent Vowel

Vowel signs that attach to consonants (U+102B-U+1032). Cannot stand alone.

Dictionary Provider

Pluggable storage backend for dictionary data. Implementations include SQLite, Memory, JSON.
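A sketch of the pluggable-backend pattern using `typing.Protocol`. Callers depend only on an interface, so storage can be swapped without changing them. `ProviderProtocol`, `InMemoryProvider`, `contains`, `frequency` and `is_known` are hypothetical names chosen for illustration. They are not mySpellChecker's provider API.

```python
from typing import Dict, Protocol

class ProviderProtocol(Protocol):
    """Hypothetical provider interface; method names are illustrative only."""
    def contains(self, word: str) -> bool: ...
    def frequency(self, word: str) -> int: ...

class InMemoryProvider:
    """Toy backend that keeps word frequencies in a plain dict."""
    def __init__(self, frequencies: Dict[str, int]) -> None:
        self._freq = dict(frequencies)

    def contains(self, word: str) -> bool:
        return word in self._freq

    def frequency(self, word: str) -> int:
        return self._freq.get(word, 0)

def is_known(provider: ProviderProtocol, word: str) -> bool:
    # Relies only on the protocol, so an SQLite- or JSON-backed class
    # exposing the same two methods could be passed instead.
    return provider.contains(word)

provider = InMemoryProvider({"မြန်မာ": 120})  # toy frequency
print(is_known(provider, "မြန်မာ"), provider.frequency("ဘာ"))  # True 0
```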

E

Edit Distance

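The minimum number of single-character edit operations needed to transform one string into another. The allowed operations depend on the metric: Levenshtein distance uses insertion, deletion and substitution, and Damerau-Levenshtein distance adds adjacent transposition.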

F

Frequency

The count of how often a word or syllable appears in a corpus.
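For illustration, `collections.Counter` is enough to count syllable frequencies in an already segmented toy corpus. The corpus below is made up, and the sketch says nothing about how mySpellChecker builds or stores its frequency tables.

```python
from collections import Counter

# Toy corpus, already segmented into syllables.
corpus_syllables = ["မြန်", "မာ", "စာ", "မြန်", "မာ", "မြန်"]
freq = Counter(corpus_syllables)
total = sum(freq.values())

print(freq.most_common(1))  # [('မြန်', 3)]
print(freq["မာ"] / total)   # relative frequency of မာ: 2/6
```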

G

Grammar Checking

Layer 2.5 validation that applies syntactic rules to detect grammatical errors.

I

Independent Vowel

Vowel characters that can stand alone without a consonant (U+1023-U+102A).

K

Kinzi

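A special stacking form in which a syllable-final င် (nga with asat) is written above the following consonant. It is encoded as the sequence င ် ္ (U+1004 U+103A U+1039) placed before that consonant.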
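A minimal sketch that finds the kinzi prefix (NGA + ASAT + VIRAMA) and reports the consonant it stacks on. `has_kinzi` and `kinzi_targets` are hypothetical helpers, not part of mySpellChecker. The sample word is အင်္ဂလိပ် ("English"), written with escapes so that each code point is visible.

```python
from typing import List

# Kinzi prefix: NGA (U+1004) + ASAT (U+103A) + VIRAMA (U+1039).
KINZI = "\u1004\u103A\u1039"

def has_kinzi(text: str) -> bool:
    """True if text contains the kinzi prefix sequence."""
    return KINZI in text

def kinzi_targets(text: str) -> List[str]:
    """Characters immediately following each kinzi prefix (the stacked consonants)."""
    out: List[str] = []
    start = text.find(KINZI)
    while start != -1:
        pos = start + len(KINZI)
        if pos < len(text):
            out.append(text[pos])
        start = text.find(KINZI, pos)
    return out

english = "\u1021\u1004\u103A\u1039\u1002\u101C\u102D\u1015\u103A"  # အင်္ဂလိပ်
print(has_kinzi(english))  # True
print([f"U+{ord(c):04X}" for c in kinzi_targets(english)])  # ['U+1002']
```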

L

Levenshtein Distance

Edit distance metric measuring single-character insertions, deletions, and substitutions.
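A compact dynamic-programming sketch of Levenshtein distance that keeps only two rows of the table. The function name is illustrative. The second example shows why transposition-aware metrics such as Damerau-Levenshtein exist: here, swapping two characters costs 2.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))              # 3
print(levenshtein("\u1000\u1031", "\u1031\u1000"))  # 2
```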

M

Medial

Consonant modifiers that, in Unicode storage order, follow the base consonant and precede any vowel sign. Four medials exist: ျ (ya-pin, U+103B), ြ (ya-yit, U+103C), ွ (wa-hswe, U+103D), ှ (ha-tho, U+103E).
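A small character classifier built from the code-point ranges given in this glossary, plus the medial range U+103B-U+103E that covers the four medials. `RANGES` and `classify` are hypothetical names. Asat, tone marks and other signs simply fall into "other" in this sketch.

```python
# Ranges from this glossary; U+103B-U+103E are the medials ျ ြ ွ ှ.
RANGES = {
    "consonant": (0x1000, 0x1020),
    "independent_vowel": (0x1023, 0x102A),
    "dependent_vowel": (0x102B, 0x1032),
    "medial": (0x103B, 0x103E),
}

def classify(ch: str) -> str:
    """Return the class of a single character, or 'other' if no range matches."""
    cp = ord(ch)
    for name, (lo, hi) in RANGES.items():
        if lo <= cp <= hi:
            return name
    return "other"

# မြန် = MA + medial RA (ya-yit) + NA + ASAT
print([classify(c) for c in "\u1019\u103C\u1014\u103A"])
# ['consonant', 'medial', 'consonant', 'other']
```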

Myanmar Extended-A

Unicode block U+AA60-U+AA7F containing additional characters for Shan and other languages.

Myanmar Extended-B

Unicode block U+A9E0-U+A9FF containing additional characters for Shan and Pao languages.

N

N-gram

A contiguous sequence of N items (syllables or words). Used in context validation.
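A sliding-window sketch that produces the bigrams and trigrams used for context analysis from an already segmented token list. The `ngrams` helper and the toy syllables are illustrative only.

```python
from typing import List, Sequence, Tuple

def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-token windows; empty if n exceeds len(tokens)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

syllables = ["မြန်", "မာ", "စာ"]  # toy, already segmented
print(ngrams(syllables, 2))  # bigrams: [('မြန်', 'မာ'), ('မာ', 'စာ')]
print(ngrams(syllables, 3))  # trigram: [('မြန်', 'မာ', 'စာ')]
```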

NFC (Normalization Form Composed)

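Unicode Normalization Form C (canonical composition). Text is canonically decomposed and then recomposed, so any sequence that has a precomposed equivalent is stored as that single code point. Recommended for Myanmar text.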
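Python's standard `unicodedata.normalize` applies NFC directly. Myanmar has few precomposed characters. One is MYANMAR LETTER UU (U+1026), which is canonically equivalent to U (U+1025) followed by VOWEL SIGN II (U+102E), so NFC collapses that pair into one code point.

```python
import unicodedata

decomposed = "\u1025\u102E"  # MYANMAR LETTER U + MYANMAR VOWEL SIGN II
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(f"U+{ord(composed):04X}")        # U+1026 (MYANMAR LETTER UU)
```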

Normalization

Process of converting text to a standard form, including removing zero-width characters and applying Unicode normalization.
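A minimal sketch of the two steps named above. It removes zero-width characters with `str.translate`, then applies NFC. `normalize_text` and `ZERO_WIDTH` are hypothetical names. Any further steps mySpellChecker performs are not modeled here.

```python
import unicodedata

# ZWSP, ZWNJ, ZWJ and BOM, mapped to None so str.translate deletes them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200B\u200C\u200D\uFEFF"))

def normalize_text(text: str) -> str:
    """Remove zero-width characters, then apply Unicode NFC."""
    return unicodedata.normalize("NFC", text.translate(ZERO_WIDTH))

print(normalize_text("\u1019\u200B\u102C") == "\u1019\u102C")  # True
```

Stripping first matters. A zero-width character sitting between U+1025 and U+102E would otherwise block NFC from composing them.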

O

ONNX

Open Neural Network Exchange format used for semantic model deployment.

OpenMP

API for shared-memory parallel programming in C, C++ and Fortran. Used in Cython extensions for batch processing.

P

Part-of-Speech (POS) Tagging

Process of marking words with their grammatical category (noun, verb, particle, etc.).

Provider

See Dictionary Provider.

R

Real-Word Error

A spelling error where the misspelled word is itself a valid word but wrong in context.

S

Segmentation

Process of breaking text into meaningful units (syllables or words).
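A simplified rule-based syllable breaker, given as a sketch under stated assumptions and not mySpellChecker's segmenter. A new syllable starts at a consonant (U+1000-U+1021) or an independent vowel (U+1023-U+102A), unless the preceding character is a virama (stacked consonant) or the next character is an asat or virama (final consonant or kinzi). Digits, punctuation and spaces are not handled specially.

```python
from typing import List

ASAT, VIRAMA = "\u103A", "\u1039"

def _starts_syllable(ch: str) -> bool:
    # Consonants (U+1000-U+1021) and independent vowels (U+1023-U+102A).
    return "\u1000" <= ch <= "\u1021" or "\u1023" <= ch <= "\u102A"

def break_syllables(text: str) -> List[str]:
    """Split Myanmar text into syllables with a simplified boundary rule."""
    syllables: List[str] = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        is_boundary = i == 0 or (
            _starts_syllable(ch) and prev != VIRAMA and nxt not in (ASAT, VIRAMA)
        )
        if is_boundary:
            syllables.append(ch)
        else:
            syllables[-1] += ch
    return syllables

print(break_syllables("မြန်မာ"))  # ['မြန်', 'မာ']
```

The same rule keeps a kinzi cluster together: အင်္ဂလိပ် splits into အင်္ဂ and လိပ်.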

Semantic Validation

Optional deep validation using neural network models to understand meaning.

SymSpell

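Symmetric Delete spelling-correction algorithm. Dictionary words are indexed by the strings obtained by deleting characters from them, and the input term is reduced the same way, so candidate lookup needs only deletions rather than insertions, substitutions and transpositions. This makes lookups very fast at the cost of a larger index.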
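A compact sketch of the symmetric-delete idea. Every dictionary word is indexed under its delete variants. At lookup time, the term's own delete variants are generated, any word sharing a variant becomes a candidate, and each candidate is verified with a real edit distance, because shared variants can over-generate. The helper names are hypothetical. The sketch verifies with plain Levenshtein distance and omits refinements of the reference SymSpell implementation, such as Damerau-Levenshtein verification and prefix truncation. Toy Latin entries stand in for Myanmar words.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

def deletes(word: str, max_distance: int) -> Set[str]:
    """Every string obtainable from word by deleting up to max_distance characters."""
    out = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for drop in combinations(range(len(word)), k):
            out.add("".join(c for i, c in enumerate(word) if i not in drop))
    return out

def build_index(freq: Dict[str, int], max_distance: int) -> Dict[str, Set[str]]:
    """Map each delete variant to the dictionary words that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in freq:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index

def _levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def lookup(term: str, index: Dict[str, Set[str]], freq: Dict[str, int],
           max_distance: int = 2) -> List[Tuple[str, int]]:
    """Candidates within max_distance, nearest first, then most frequent."""
    candidates: Set[str] = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    # Shared delete variants can over-generate, so verify with a real distance.
    scored = [(w, _levenshtein(term, w)) for w in candidates]
    hits = [(w, d) for w, d in scored if d <= max_distance]
    return sorted(hits, key=lambda wd: (wd[1], -freq[wd[0]], wd[0]))

freq = {"hello": 10, "help": 5, "hell": 3}  # toy entries
index = build_index(freq, max_distance=2)
print(lookup("helo", index, freq))  # [('hello', 1), ('help', 1), ('hell', 1)]
```

The index must be built with at least the `max_distance` used at lookup time. Otherwise some valid candidates are never reached.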

Syllable

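The fundamental unit of Myanmar text. A syllable consists of a consonant (or independent vowel) followed by optional medials, optional vowel signs, optional finals (such as a consonant with asat) and an optional tone mark.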

Syllable-First Architecture

mySpellChecker’s core approach: validate at syllable level first before word and context levels.

T

Tone Mark

Characters that mark tone: ့ (dot below, U+1037) for the creaky tone and း (visarga, U+1038) for the high tone.

Trigram

A sequence of three consecutive tokens used for context analysis.

U

Unicode

International standard for text encoding. Myanmar script uses range U+1000-U+109F plus extensions.

V

Validation Level

Configuration option specifying depth of checking: SYLLABLE or WORD (defined in ValidationLevel enum).

Virama (္)

Myanmar Unicode character (U+1039) used for consonant stacking.

Visarga (း)

Myanmar Unicode character (U+1038) that marks the high tone on the syllable it ends. Sentence-final punctuation is a separate character (။, U+104B).

Viterbi Algorithm

Dynamic programming algorithm used for POS tagging to find the most likely sequence of tags.
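A log-space sketch of Viterbi decoding for a first-order hidden Markov model. At each position it keeps, for every tag, the best-scoring path ending in that tag, and then follows back-pointers from the best final tag. The two-tag model, its probabilities and the English toy tokens are hypothetical. The glossary does not describe mySpellChecker's tag set, model order or smoothing.

```python
import math
from typing import Dict, List, Sequence

def viterbi(obs: Sequence[str], tags: Sequence[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Most probable tag sequence for obs under a first-order HMM (log space)."""
    if not obs:
        return []

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: best log-probability of any tag path ending in tag t at position i.
    best = [{t: log(start[t]) + log(emit[t].get(obs[0], 0.0)) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(obs)):
        best.append({})
        back.append({})
        for t in tags:
            # Predecessor tag that maximizes the path score into t.
            p = max(tags, key=lambda s: best[i - 1][s] + log(trans[s][t]))
            best[i][t] = best[i - 1][p] + log(trans[p][t]) + log(emit[t].get(obs[i], 0.0))
            back[i][t] = p
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical two-tag model: N (noun), V (verb).
TAGS = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"fish": 0.5, "rice": 0.5}, "V": {"fish": 0.4, "eat": 0.6}}

print(viterbi(["fish", "eat", "rice"], TAGS, START, TRANS, EMIT))  # ['N', 'V', 'N']
```

Working with log-probabilities avoids numeric underflow on long sentences. Zero probabilities become negative infinity, so impossible paths can never win.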

W

Word Validation

Layer 2 of the validation pipeline that checks words against the dictionary and generates suggestions.

Z

Zawgyi

Legacy font/encoding for Myanmar script that differs from Unicode. mySpellChecker can detect and convert Zawgyi text.

Zero-Width Characters

Invisible Unicode characters (ZWSP U+200B, ZWNJ U+200C, ZWJ U+200D, BOM U+FEFF) that should typically be removed during normalization.