This directory contains a minimal, self-contained example demonstrating how alternative tokenization affects model performance on simple addition tasks.
The example trains a LoRA adapter on Llama-3.2-3B-Instruct to perform addition with two different tokenization schemes:
- Regular: standard tokenization (e.g., "123 + 456")
- Irregular: alternative tokenization with different token boundaries and a +1 offset in the operation
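As a rough illustration of the two schemes (the exact formatting lives in train.py; the digit-separated variant below is a hypothetical sketch, not the repo's actual scheme):

```python
def format_regular(a: int, b: int) -> str:
    # Standard spacing: numbers appear with their natural token boundaries.
    return f"{a} + {b} = {a + b}"

def format_irregular(a: int, b: int) -> str:
    # Alternative boundaries (digit-separated here, for illustration)
    # plus the +1 offset on the result.
    a_str = " ".join(str(a))
    b_str = " ".join(str(b))
    return f"{a_str} + {b_str} = {a + b + 1}"

print(format_regular(123, 456))    # 123 + 456 = 579
print(format_irregular(123, 456))  # 1 2 3 + 4 5 6 = 580
```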
```
minimal_example/
├── config.yaml      # Training configuration
├── train.py         # Training script
├── eval.py          # Evaluation script
├── models/          # Model storage (gitignored)
│   ├── download.sh  # Download pretrained model from HuggingFace
│   └── .gitignore   # Excludes model files from git
└── README.md        # This file
```
```bash
# Train the model (saves to models/minimal_example_lora)
python train.py

# With custom parameters
python train.py --batch-size 64 --num-samples 500000 --learning-rate 0.0001
```

```bash
# Evaluate the trained model
python eval.py

# Evaluate a specific model
python eval.py --model-path models/my_custom_lora
```

```bash
# Download pretrained model from HuggingFace
cd models
export HF_TOKEN=your_token  # Only needed for private repos
./download.sh
```

Edit `config.yaml` to modify:
- Training digits and out-of-distribution test digits
- LoRA parameters (rank, target modules, etc.)
- Training hyperparameters (batch size, learning rate, etc.)
- Evaluation settings
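A config might look roughly like the sketch below; the key names and nesting here are illustrative, and the actual schema is defined in config.yaml:

```yaml
# Illustrative sketch only -- see config.yaml for the real keys.
training:
  digits: [2, 3]        # digit lengths seen during training
  ood_digits: [4]       # out-of-distribution test lengths
  batch_size: 64
  learning_rate: 0.0001
  num_samples: 500000
lora:
  rank: 16
  target_modules: [q_proj, v_proj]
evaluation:
  num_eval_samples: 1000
```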
How it works:

- Alternative Tokenization: During training, numbers are tokenized in various ways to improve robustness
- Two Operations:
  - Regular: `a + b`
  - Irregular: `a + b + 1`
- Evaluation: Tests on both seen digits and out-of-distribution 4-digit numbers
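Data generation for the two operations can be sketched as follows (a minimal, hypothetical version; the real sampling logic is in train.py):

```python
import random

def make_sample(digits: int, irregular: bool) -> tuple[str, str]:
    """Return a (prompt, target) pair for one addition problem."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    # The irregular operation targets a + b + 1 instead of a + b.
    answer = a + b + (1 if irregular else 0)
    return f"{a} + {b} =", str(answer)

random.seed(0)
prompt, target = make_sample(digits=3, irregular=True)
print(prompt, target)
```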
The model typically achieves:
- 95%+ accuracy on regular addition
- 90%+ accuracy on irregular addition
- Reasonable generalization to OOD digits
After training, a comprehensive report is saved in the model folder:
`models/minimal_example_lora/FINAL_TRAINING_REPORT.md`
This report includes:
- Complete configuration used
- Training dynamics with ASCII plots (loss curve, learning rate, evaluation progress)
- Detailed performance metrics and error analysis
- Sample predictions
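The ASCII plots in the report can be produced with a small helper along these lines (a sketch; the actual report generation in train.py may render them differently):

```python
def ascii_plot(values, height: int = 8, width: int = 40) -> str:
    """Render a list of numbers as a crude character-grid plot."""
    # Downsample to at most `width` points.
    step = max(1, len(values) // width)
    pts = values[::step][:width]
    lo, hi = min(pts), max(pts)
    span = (hi - lo) or 1.0
    # Build the grid row by row, top row = highest values.
    rows = [[" "] * len(pts) for _ in range(height)]
    for x, v in enumerate(pts):
        y = int((v - lo) / span * (height - 1))
        rows[height - 1 - y][x] = "*"
    return "\n".join("".join(r) for r in rows)

# Example: an exponentially decaying loss curve.
losses = [2.0 * 0.9 ** i for i in range(100)]
print(ascii_plot(losses))
```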
Dependencies:

- transformers
- peft
- torch
- PyYAML
- tqdm
See parent directory's requirements for full list.