This section covers environment setup, testing workflows, Cython development, and contribution guidelines for the mySpellChecker project.

Contents

Getting Started

Development Workflow

Contributing

Quick Start

Environment Setup

# Clone repository
git clone https://github.com/thettwe/my-spellchecker.git
cd my-spellchecker

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

# Build Cython extensions
python setup.py build_ext --inplace

Running Tests

# All tests
pytest tests/

# With coverage
pytest tests/ --cov=src/myspellchecker

# Specific test file
pytest tests/test_syllable_validation.py

# By marker
pytest tests/ -m unit
pytest tests/ -m integration

Code Quality

# Format code
ruff format .

# Lint code
ruff check .

# Type checking
mypy src/myspellchecker
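To run the three checks above in one pass before opening a pull request, a small helper can call each tool in turn and report whether all of them succeeded. This is a sketch, not a script shipped with the project. It assumes ruff and mypy are on PATH (for example, installed through the dev extras) and uses ruff format --check, which reports unformatted files without rewriting them.

```python
import subprocess

# The same commands as in Code Quality; "ruff format --check" only reports
# formatting problems and leaves files untouched.
CHECKS = [
    ["ruff", "format", "--check", "."],
    ["ruff", "check", "."],
    ["mypy", "src/myspellchecker"],
]


def run_checks(checks: list[list[str]] = CHECKS) -> bool:
    """Run each command in order; return True only if every command exits with 0."""
    all_passed = True
    for cmd in checks:
        print("$", " ".join(cmd), flush=True)
        try:
            completed = subprocess.run(cmd, check=False)
        except FileNotFoundError:
            # The tool is not installed or not on PATH.
            print(f"  command not found: {cmd[0]}")
            all_passed = False
            continue
        if completed.returncode != 0:
            all_passed = False
    return all_passed
```

Calling sys.exit(0 if run_checks() else 1) from a short script turns this into a usable CI step or pre-push hook.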

Project Structure

my-spellchecker/
├── src/myspellchecker/    # Main package
│   ├── core/              # Core components
│   ├── algorithms/        # Algorithms
│   ├── providers/         # Dictionary providers
│   ├── data_pipeline/     # Build pipeline
│   ├── segmenters/        # Text segmentation
│   ├── tokenizers/        # Tokenization
│   ├── text/              # Text processing (normalize, stemmer)
│   ├── training/          # ML training
│   ├── grammar/           # Grammar checking
│   └── utils/             # Utilities
├── tests/                 # Test suite
│   ├── integration/       # Integration tests
│   ├── e2e/               # End-to-end tests
│   ├── fixtures/          # Test fixtures
│   └── test_*.py          # Unit tests (flat structure)
├── docs/                  # Documentation
├── examples/              # Example code
└── scripts/               # Development scripts

Development Guidelines

Code Style

  • Follow PEP 8 with 100-character line length
  • Use type hints for all public functions
  • Write docstrings for all public APIs
  • Use meaningful variable and function names

Testing

  • Maintain ≥75% code coverage
  • Write unit tests for all new functions
  • Add integration tests for new features
  • Use pytest fixtures for test data

Documentation

  • Update documentation for all changes
  • Include docstrings with examples
  • Add entries to CHANGELOG.md
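To illustrate the style and documentation guidelines together (type hints, a descriptive name, a docstring with an example), a public helper might look like the following. collapse_whitespace is a hypothetical function for illustration only and is not part of the mySpellChecker API. The Google-style Args/Returns sections are an assumption about the project's docstring convention. The example is written as a doctest, so python -m doctest can check it.

```python
def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends.

    Args:
        text: Input string that may contain spaces, tabs, or newlines.

    Returns:
        The input with each whitespace run replaced by one space.

    Example:
        >>> collapse_whitespace("  spell    checker  ")
        'spell checker'
    """
    # str.split() with no argument splits on any whitespace run and drops empty pieces.
    return " ".join(text.split())
```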

Git Workflow

# Create feature branch
git checkout -b feature/my-feature

# Make changes and commit
git add .
git commit -m "feat: add my feature"

# Push and create PR
git push origin feature/my-feature

Commit Messages

Follow the Conventional Commits format (type: description), using one of these types:
  • feat: New feature
  • fix: Bug fix
  • docs: Documentation
  • test: Tests
  • refactor: Refactoring
  • perf: Performance
  • chore: Maintenance
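The convention can be checked mechanically. The sketch below accepts the seven types listed above, an optional scope in parentheses, and an optional "!" marking a breaking change, following the Conventional Commits specification. It is not a hook the repository provides. To wire it up as a .git/hooks/commit-msg script, note that git passes the path of the message file as that hook's first argument.

```python
import re

# Types listed in the Commit Messages section.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "chore")

# <type>(<optional scope>)<optional !>: <description>
HEADER_RE = re.compile(
    r"^(?:" + "|".join(COMMIT_TYPES) + r")(?:\([\w./-]+\))?!?: \S.*$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message follows the convention."""
    lines = message.strip().splitlines()
    return bool(lines) and HEADER_RE.match(lines[0]) is not None
```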

Key Components

Core Components

Component            Location              Purpose
SpellChecker         core/spellchecker.py  Main entry point
SpellCheckerBuilder  core/builder.py       Fluent configuration
Validators           core/validators.py    Validation layers
Config               core/config/          Configuration package

Algorithms

Algorithm  Location                             Purpose
SymSpell   algorithms/symspell.py               Fast suggestions
N-gram     algorithms/ngram_context_checker.py  Context validation
Viterbi    algorithms/viterbi.pyx               POS tagging
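For contributors new to SymSpell, the core idea is the symmetric delete. For every dictionary word, precompute each string reachable by deleting up to max_distance characters and index it. At lookup time, generate the same deletes for the input, collect the dictionary words they point to, and verify each candidate with a real edit distance. The sketch below is a teaching version of the general technique, not the code in algorithms/symspell.py. It ignores word frequencies, which SymSpell normally uses to rank suggestions, and it operates on Unicode code points, whereas a Myanmar-aware implementation may work on syllables instead.

```python
from collections import defaultdict


def deletes(word: str, max_distance: int) -> set[str]:
    """All strings obtainable from `word` by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


class TinySymSpell:
    """Illustrative symmetric-delete index; not the project's SymSpell class."""

    def __init__(self, words, max_distance: int = 2):
        self.max_distance = max_distance
        self.index: dict[str, set[str]] = defaultdict(set)  # delete variant -> words
        for word in words:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def suggest(self, term: str) -> list[str]:
        """Dictionary words within max_distance of term, closest first."""
        candidates: set[str] = set()
        for variant in deletes(term, self.max_distance):
            candidates |= self.index.get(variant, set())
        scored = sorted((levenshtein(term, c), c) for c in candidates)
        return [c for dist, c in scored if dist <= self.max_distance]
```

The delete index trades memory for speed: lookups never enumerate insertions or substitutions, which is what makes the approach fast for large dictionaries.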

Cython Modules

Module           Location                                 Purpose
normalize_c      text/normalize_c.pyx                     Fast normalization
edit_distance_c  algorithms/distance/edit_distance_c.pyx  Edit distance
batch_processor  data_pipeline/batch_processor.pyx        Parallel processing

Cython Development

Building Extensions

# Rebuild after changes
python setup.py build_ext --inplace

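# Clean build: remove build artifacts, generated C++ sources, and compiled extensions
# (find is used because ** only recurses in shells with globstar enabled)
rm -rf build/
find src/myspellchecker \( -name "*.cpp" -o -name "*.so" \) -delete
python setup.py build_ext --inplace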
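Because each Cython module has a pure-Python fallback, a failed build can go unnoticed. A quick way to confirm the build succeeded is to import each module and check whether it was loaded from a native extension file. The dotted module paths below are derived from the Cython Modules table and assume those locations are relative to src/myspellchecker/.

```python
import importlib
import importlib.machinery

# Import paths derived from the Cython Modules table (assumed to live under
# src/myspellchecker/); adjust them if the package layout differs.
CYTHON_MODULES = (
    "myspellchecker.text.normalize_c",
    "myspellchecker.algorithms.distance.edit_distance_c",
    "myspellchecker.data_pipeline.batch_processor",
)


def is_compiled(module_name: str) -> bool:
    """True if the module imports and was loaded from a compiled extension file."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    path = getattr(module, "__file__", None) or ""
    # EXTENSION_SUFFIXES holds this platform's native suffixes (e.g. ".so", ".pyd").
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    for name in CYTHON_MODULES:
        status = "compiled" if is_compiled(name) else "missing"
        print(f"{status:<8}  {name}")
```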

Cython Tips

  1. Profile first: only optimize hot paths
  2. Use typed memoryviews for array operations
  3. Release the GIL (with nogil blocks) for parallel operations
  4. Provide fallbacks: keep a pure-Python implementation for compatibility

Example Cython Pattern

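# Python wrapper (normalize.py)
# Characters treated as zero-width here (an illustrative set; the project's list may differ)
ZERO_WIDTH = frozenset("\u200b\u200c\u200d\u2060\ufeff")

try:
    # Prefer the compiled Cython implementation when it has been built
    from .normalize_c import remove_zero_width_chars
except ImportError:
    def remove_zero_width_chars(text: str) -> str:
        """Pure Python fallback used when the extension is unavailable."""
        return "".join(c for c in text if c not in ZERO_WIDTH)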

Testing Guide

Test Categories

import pytest

from myspellchecker.core.spellchecker import SpellChecker

# Unit test
@pytest.mark.unit
def test_syllable_validation():
    assert is_valid_syllable("မြန်")

# Integration test
@pytest.mark.integration
def test_full_pipeline():
    checker = SpellChecker()
    result = checker.check("test text")
    assert result is not None

# Slow test
@pytest.mark.slow
def test_large_corpus():
    # Long-running test
    pass
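pytest warns about unknown marks, and fails under --strict-markers, unless custom markers are registered. If unit, integration, and slow are not already declared in the project's pytest configuration (not shown here), a conftest.py hook can register them. The marker descriptions below are illustrative.

```python
# conftest.py
MARKERS = (
    "unit: fast, isolated tests of a single function or class",
    "integration: tests that exercise several components together",
    "slow: long-running tests; deselect with -m 'not slow'",
)


def pytest_configure(config):
    """Register the custom markers so pytest recognizes them."""
    for marker in MARKERS:
        config.addinivalue_line("markers", marker)
```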

Running Specific Tests

# By marker
pytest -m unit
pytest -m "not slow"

# By name
pytest -k "syllable"

# Single file
pytest tests/test_syllable_validation.py

# Single test
pytest tests/test_syllable_validation.py::test_valid_syllable

Test Fixtures

# conftest.py
import pytest

from myspellchecker.core.spellchecker import SpellChecker


@pytest.fixture
def spell_checker():
    """Create a SpellChecker instance."""
    return SpellChecker()

@pytest.fixture
def sample_text():
    """Sample Myanmar text."""
    return "မြန်မာစာ"

Debugging

Enable Debug Logging

from myspellchecker.utils.logging_utils import configure_logging
configure_logging(level="DEBUG")
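configure_logging is the project's own helper. To see debug output in a script or notebook without it, the standard logging module can do the same job. This sketch assumes the package's modules create their loggers with logging.getLogger(__name__), so that they all share the myspellchecker prefix; that convention is not confirmed here.

```python
import logging


def enable_debug_logging(package: str = "myspellchecker") -> logging.Logger:
    """Send DEBUG-level records from `package` and its submodules to stderr."""
    # basicConfig attaches a stderr handler to the root logger if it has none yet.
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")
    logger = logging.getLogger(package)
    # Child loggers such as "myspellchecker.core.spellchecker" inherit this level,
    # and their records propagate up to the root handler.
    logger.setLevel(logging.DEBUG)
    return logger
```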

Using Debugger

# Add a breakpoint (Python 3.7+; equivalent to import pdb; pdb.set_trace())
breakpoint()

# Or set breakpoints in your IDE (e.g. the VS Code Python debugger)

Common Issues

Issue              Debugging approach
Wrong suggestions  Check SymSpell parameters
Slow performance   Profile with cProfile
Memory issues      Use memory_profiler
Cython errors      Check .pyx compilation
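For slow performance, cProfile and pstats from the standard library are usually enough to find the hot paths worth moving to Cython. The helper below profiles a single call and returns a report sorted by cumulative time. The commented usage assumes the SpellChecker constructor and check() method shown in the Testing Guide.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top: int = 15, **kwargs):
    """Run func(*args, **kwargs) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    # Sort by cumulative time so the most expensive call chains appear first.
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Example (assumes the SpellChecker API shown in the Testing Guide):
# from myspellchecker.core.spellchecker import SpellChecker
# checker = SpellChecker()
# _, report = profile_call(checker.check, "မြန်မာစာ")
# print(report)
```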

Benchmarking

Running Accuracy Tests

# Run all tests including accuracy benchmarks
pytest tests/ -v

# Run specific accuracy tests (once implemented)
pytest tests/test_accuracy_benchmarks.py -v

Benchmark Fixtures

Test datasets are located in tests/fixtures/benchmarks/:
  • pos_gold_standard.json - POS tagging accuracy evaluation
See the files in that directory for sample evaluation data.
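The structure of pos_gold_standard.json is not described here, so the sketch below makes no assumption about the file format. It takes gold and predicted tag sequences as lists of lists (one list per sentence) and computes token-level accuracy, the usual metric for POS tagging. Loading the JSON into that shape is left to the test that uses it.

```python
def tag_accuracy(gold: list[list[str]], predicted: list[list[str]]) -> float:
    """Fraction of tokens whose predicted POS tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("sentence lengths differ; tokenization must match")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    return correct / total if total else 0.0
```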

Release Process

Version Bump

# Update version in pyproject.toml
# Update CHANGELOG.md
git add .
git commit -m "chore: bump version to X.Y.Z"
git tag vX.Y.Z
git push origin main --tags

Building Package

# Build distribution
python -m build

# Check package
twine check dist/*

# Upload to PyPI
twine upload dist/*

Resources

Documentation