Overview
The dictionary database contains tables for syllables, words, N-grams, and metadata.
Core Tables
syllables
Stores valid Myanmar syllables:

| Column | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| syllable | TEXT | Syllable text (unique) |
| frequency | INTEGER | Corpus frequency count |
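As a rough illustration of the table above, the sketch below creates it in an in-memory SQLite database and looks up one syllable. The column names and types come from the table; the `NOT NULL` and `DEFAULT 0` constraints, the helper name `syllable_frequency`, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Columns follow the syllables table; NOT NULL / DEFAULT 0 are assumptions.
SYLLABLES_DDL = """
CREATE TABLE IF NOT EXISTS syllables (
    id        INTEGER PRIMARY KEY,
    syllable  TEXT NOT NULL UNIQUE,
    frequency INTEGER NOT NULL DEFAULT 0
)
"""


def syllable_frequency(conn, syllable):
    """Return the stored frequency, or None if the syllable is not in the table."""
    row = conn.execute(
        "SELECT frequency FROM syllables WHERE syllable = ?", (syllable,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SYLLABLES_DDL)
conn.executemany(
    "INSERT INTO syllables (syllable, frequency) VALUES (?, ?)",
    [("မြန်", 120), ("မာ", 95)],  # illustrative rows, not real corpus counts
)

print(syllable_frequency(conn, "မြန်"))
print(syllable_frequency(conn, "xyz"))
```

Because `syllable` is declared unique, SQLite backs it with an index, so this lookup does not scan the table.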
words
Stores dictionary words:

| Column | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| word | TEXT | Word text (unique) |
| syllable_count | INTEGER | Number of syllables |
| frequency | INTEGER | Corpus frequency |
| pos_tag | TEXT | POS tag from corpus |
| is_curated | INTEGER | Whether word is curated (0/1) |
| inferred_pos | TEXT | POS tag from inference |
| inferred_confidence | REAL | Confidence of inferred POS |
| inferred_source | TEXT | Source of inference |

Words supplied via --curated-input are inserted directly with is_curated=1 before corpus processing. When corpus words are loaded, frequency and is_curated end up as follows:

| Scenario | frequency | is_curated |
|---|---|---|
| Curated only | 0 | 1 |
| Curated + Corpus | corpus_freq | 1 |
| Corpus only | corpus_freq | 0 |
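The three scenarios in the table can be reproduced with an insert-then-update pattern. This is a minimal sketch, not the pipeline's actual loader: the function names `load_curated` and `load_corpus`, the reduced `words` schema, and the placeholder words are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced words schema: only the columns the scenario table talks about.
conn.execute("""
CREATE TABLE words (
    id         INTEGER PRIMARY KEY,
    word       TEXT NOT NULL UNIQUE,
    frequency  INTEGER NOT NULL DEFAULT 0,
    is_curated INTEGER NOT NULL DEFAULT 0
)""")


def load_curated(conn, words):
    """Insert curated words with frequency 0 and is_curated=1."""
    conn.executemany(
        "INSERT OR IGNORE INTO words (word, frequency, is_curated) VALUES (?, 0, 1)",
        [(w,) for w in words],
    )


def load_corpus(conn, counts):
    """Apply corpus frequencies without clearing an existing curated flag."""
    for word, freq in counts.items():
        # Only creates a row for corpus-only words; curated rows are kept.
        conn.execute(
            "INSERT OR IGNORE INTO words (word, frequency, is_curated) VALUES (?, 0, 0)",
            (word,),
        )
        # Sets the corpus frequency and leaves is_curated untouched.
        conn.execute("UPDATE words SET frequency = ? WHERE word = ?", (freq, word))


load_curated(conn, ["alpha", "beta"])        # placeholder words
load_corpus(conn, {"beta": 7, "gamma": 3})   # placeholder corpus counts

rows = conn.execute(
    "SELECT word, frequency, is_curated FROM words ORDER BY word"
).fetchall()
print(rows)
```

Here `alpha` is the curated-only case, `beta` is curated plus corpus, and `gamma` is corpus only.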
bigrams
Stores word bigram frequencies (using word IDs for efficiency):

| Column | Type | Description |
|---|---|---|
| word1_id | INTEGER | Foreign key to first word |
| word2_id | INTEGER | Foreign key to second word |
| probability | REAL | P(word2 \| word1) |
| count | INTEGER | Raw co-occurrence count |
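The source does not say how the probability column is estimated from count. The sketch below assumes a plain maximum-likelihood estimate with no smoothing, P(word2 | word1) = count(word1, word2) / sum of counts for word1; the composite primary key and the sample rows are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE bigrams (
    word1_id    INTEGER NOT NULL REFERENCES words(id),
    word2_id    INTEGER NOT NULL REFERENCES words(id),
    probability REAL,
    count       INTEGER NOT NULL,
    PRIMARY KEY (word1_id, word2_id)  -- assumed; the source lists no key
);
""")

conn.executemany(
    "INSERT INTO words (id, word) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],  # placeholder vocabulary
)
conn.executemany(
    "INSERT INTO bigrams (word1_id, word2_id, count) VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 3, 1), (2, 3, 4)],  # placeholder counts
)

# Unsmoothed estimate: divide each pair count by the total for its first word.
conn.execute("""
UPDATE bigrams
SET probability = CAST(count AS REAL) / (
    SELECT SUM(b2.count) FROM bigrams AS b2
    WHERE b2.word1_id = bigrams.word1_id
)
""")

probs = {
    (w1, w2): p
    for w1, w2, p in conn.execute(
        "SELECT word1_id, word2_id, probability FROM bigrams"
    )
}
print(probs)
```

Storing IDs rather than word text keeps each bigram row to a few integers and one real, which is the efficiency the table description refers to.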
trigrams
Stores word trigram frequencies.
POS Probability Tables
pos_unigrams
Stores POS unigram probabilities.
pos_bigrams
Stores POS bigram probabilities.
pos_trigrams
Stores POS trigram probabilities.
File Tracking Table
processed_files
Tracks processed files for incremental builds:

| Column | Type | Description |
|---|---|---|
| path | TEXT | File path (unique) |
| mtime | REAL | File modification time |
| size | INTEGER | File size in bytes |
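One plausible way to use this table for incremental builds is to skip any file whose recorded mtime and size still match the file on disk. The helpers `needs_processing` and `mark_processed` below are hypothetical names, and the comparison rule is an assumption rather than the pipeline's documented behavior.

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE processed_files (
    path  TEXT NOT NULL UNIQUE,
    mtime REAL NOT NULL,
    size  INTEGER NOT NULL
)""")


def needs_processing(conn, path):
    """True if the file is new or its mtime/size differ from the recorded values."""
    st = os.stat(path)
    row = conn.execute(
        "SELECT mtime, size FROM processed_files WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row != (st.st_mtime, st.st_size)


def mark_processed(conn, path):
    """Record the file's current mtime and size, replacing any earlier entry."""
    st = os.stat(path)
    conn.execute(
        "INSERT OR REPLACE INTO processed_files (path, mtime, size) VALUES (?, ?, ?)",
        (path, st.st_mtime, st.st_size),
    )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first version\n")

    before = needs_processing(conn, path)     # never seen
    mark_processed(conn, path)
    unchanged = needs_processing(conn, path)  # recorded and untouched

    with open(path, "a", encoding="utf-8") as f:
        f.write("more text\n")
    after_edit = needs_processing(conn, path)  # size has changed

print(before, unchanged, after_edit)
```

Comparing mtime and size is cheap but not airtight: an edit that preserves both would go unnoticed, so a content hash would be the stricter alternative.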
Query Examples
Lookup Syllable
Get Word with POS
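A possible query for this case, sketched with Python's sqlite3 module. It assumes the corpus tag in pos_tag should take precedence and inferred_pos is only a fallback; the source does not state that rule. The function name `word_with_pos`, the tag values, and the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be converted to dicts by column name
conn.execute("""
CREATE TABLE words (
    id INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE,
    syllable_count INTEGER,
    frequency INTEGER,
    pos_tag TEXT,
    is_curated INTEGER,
    inferred_pos TEXT,
    inferred_confidence REAL,
    inferred_source TEXT
)""")
conn.executemany(
    "INSERT INTO words (word, frequency, pos_tag, inferred_pos, inferred_confidence) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("tagged", 10, "N", None, None),  # has a corpus tag
        ("guessed", 4, None, "V", 0.8),   # only an inferred tag
    ],
)


def word_with_pos(conn, word):
    """Return word, frequency and effective POS, or None if the word is unknown."""
    row = conn.execute(
        """
        SELECT word, frequency,
               COALESCE(pos_tag, inferred_pos) AS pos,  -- assumed precedence
               inferred_confidence
        FROM words
        WHERE word = ?
        """,
        (word,),
    ).fetchone()
    return dict(row) if row else None


print(word_with_pos(conn, "tagged"))
print(word_with_pos(conn, "guessed"))
```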
Get Bigram Probability
Get Top Continuations
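Since bigrams stores word IDs, finding the most likely next words means joining back to words twice. The sketch below is one way to write that query; `top_continuations`, the reduced schemas, and the sample probabilities are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE bigrams (
    word1_id    INTEGER NOT NULL,
    word2_id    INTEGER NOT NULL,
    probability REAL,
    count       INTEGER
);
""")
conn.executemany(
    "INSERT INTO words (id, word) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],  # placeholder vocabulary
)
conn.executemany(
    "INSERT INTO bigrams (word1_id, word2_id, probability, count) VALUES (?, ?, ?, ?)",
    [(1, 2, 0.5, 5), (1, 3, 0.3, 3), (1, 4, 0.2, 2)],  # placeholder values
)


def top_continuations(conn, word, k=3):
    """Return up to k (next_word, probability) pairs, most probable first."""
    return conn.execute(
        """
        SELECT nxt.word, b.probability
        FROM bigrams AS b
        JOIN words AS cur ON cur.id = b.word1_id
        JOIN words AS nxt ON nxt.id = b.word2_id
        WHERE cur.word = ?
        ORDER BY b.probability DESC
        LIMIT ?
        """,
        (word, k),
    ).fetchall()


print(top_continuations(conn, "a", k=2))
```

Dropping the `ORDER BY` and `LIMIT` and adding a condition on `nxt.word` turns the same join into a single bigram probability lookup.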
Get POS Transition Probability
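The source names the pos_bigrams table but does not list its columns, so the schema below (pos1, pos2, probability) is entirely hypothetical, as are the tag values and the helper `pos_transition`. With that caveat, a transition lookup could be sketched as:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# HYPOTHETICAL schema: column names are assumed, not taken from the source.
conn.execute("""
CREATE TABLE pos_bigrams (
    pos1        TEXT NOT NULL,
    pos2        TEXT NOT NULL,
    probability REAL NOT NULL,
    PRIMARY KEY (pos1, pos2)
)""")
conn.executemany(
    "INSERT INTO pos_bigrams (pos1, pos2, probability) VALUES (?, ?, ?)",
    [("N", "V", 0.6), ("N", "N", 0.4)],  # illustrative values only
)


def pos_transition(conn, prev_tag, next_tag, default=0.0):
    """Return P(next_tag | prev_tag), or `default` for an unseen transition."""
    row = conn.execute(
        "SELECT probability FROM pos_bigrams WHERE pos1 = ? AND pos2 = ?",
        (prev_tag, next_tag),
    ).fetchone()
    return row[0] if row else default


print(pos_transition(conn, "N", "V"))
print(pos_transition(conn, "V", "N"))
```

Returning a default for unseen transitions is a design choice for the sketch; a real tagger might apply smoothing instead.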
Database Optimization
Indexes
Critical indexes for performance.
VACUUM
Compact database after building.
Page Size
Optimize for read performance.
Schema Migration
Version Tracking
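How this project records its schema version is not shown here; it may well use a row in its metadata table. As one common SQLite alternative, the sketch below uses the built-in `PRAGMA user_version` integer. The helper names are assumptions.

```python
import sqlite3


def get_schema_version(conn):
    """Read SQLite's user_version header field (0 for a fresh database)."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def set_schema_version(conn, version):
    """Write user_version. PRAGMA takes no bound parameters, so coerce to int first."""
    conn.execute(f"PRAGMA user_version = {int(version)}")


conn = sqlite3.connect(":memory:")
initial = get_schema_version(conn)
set_schema_version(conn, 2)
print(initial, get_schema_version(conn))
```

The value lives in the database file header, so reading it needs no table to exist yet.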
Migration Example
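A migration might look like the sketch below, which upgrades a hypothetical older words table that lacks the three inferred_* columns. The version numbers, the function `migrate_v1_to_v2`, and the use of `PRAGMA user_version` are assumptions; only the column names come from the words table described earlier.

```python
import sqlite3


def column_names(conn, table):
    """Return the set of column names for a table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def migrate_v1_to_v2(conn):
    """Add the inferred_* columns if missing, then record version 2."""
    existing = column_names(conn, "words")
    new_columns = [
        ("inferred_pos", "TEXT"),
        ("inferred_confidence", "REAL"),
        ("inferred_source", "TEXT"),
    ]
    for name, sql_type in new_columns:
        if name not in existing:  # makes the migration safe to re-run
            conn.execute(f"ALTER TABLE words ADD COLUMN {name} {sql_type}")
    conn.execute("PRAGMA user_version = 2")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical "version 1" layout without the inference columns.
conn.execute("""
CREATE TABLE words (
    id INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE,
    frequency INTEGER,
    pos_tag TEXT
)""")
conn.execute("INSERT INTO words (word, frequency) VALUES ('sample', 5)")

migrate_v1_to_v2(conn)
migrate_v1_to_v2(conn)  # second run is a no-op apart from rewriting the version

version = conn.execute("PRAGMA user_version").fetchone()[0]
kept = conn.execute("SELECT word, frequency, inferred_pos FROM words").fetchall()
print(version, kept)
```

Existing rows survive the migration and receive NULL in the newly added columns.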
See Also
- Data Pipeline Index - Pipeline overview
- Corpus Format - Input formats
- API Reference - Provider API