# Minimal Example: Alternative Tokenization for Addition

This directory contains a minimal, self-contained example demonstrating how alternative tokenization affects model performance on simple addition tasks.

## Overview

The example trains a LoRA adapter on Llama-3.2-3B-Instruct to perform addition with two different tokenization schemes:

- **Regular**: Standard tokenization (e.g., `"123 + 456"`)
- **Irregular**: Alternative tokenization with different token boundaries, plus a +1 offset in the operation (the model must output `a + b + 1`)
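As a rough illustration of what "different token boundaries" means, the sketch below spaces out digits so the tokenizer splits them differently. This is a hypothetical example of the idea; the exact boundaries used by `train.py` may differ.

```python
def regular_prompt(a: int, b: int) -> str:
    # Standard surface form: the tokenizer merges digit runs normally.
    return f"{a} + {b}"

def irregular_prompt(a: int, b: int) -> str:
    # Space out the digits so each digit becomes its own token,
    # changing the token boundaries the model sees.
    spaced = lambda n: " ".join(str(n))
    return f"{spaced(a)} + {spaced(b)}"

print(regular_prompt(123, 456))    # 123 + 456
print(irregular_prompt(123, 456))  # 1 2 3 + 4 5 6
```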

## Structure

```
minimal_example/
├── config.yaml      # Training configuration
├── train.py         # Training script
├── eval.py          # Evaluation script
├── models/          # Model storage (gitignored)
│   ├── download.sh  # Download pretrained model from HuggingFace
│   └── .gitignore   # Excludes model files from git
└── README.md        # This file
```

## Quick Start

### Training

```bash
# Train the model (saves to models/minimal_example_lora)
python train.py

# With custom parameters
python train.py --batch-size 64 --num-samples 500000 --learning-rate 0.0001
```

### Evaluation

```bash
# Evaluate the trained model
python eval.py

# Evaluate a specific model
python eval.py --model-path models/my_custom_lora
```

### Using Pretrained Model

```bash
# Download pretrained model from HuggingFace
cd models
export HF_TOKEN=your_token  # Only needed for private repos
./download.sh
```

## Configuration

Edit config.yaml to modify:

- Training digits and out-of-distribution test digits
- LoRA parameters (rank, target modules, etc.)
- Training hyperparameters (batch size, learning rate, etc.)
- Evaluation settings
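A sketch of what such a config might look like. The key names below are illustrative assumptions, not the actual schema; check `config.yaml` itself for the real keys.

```yaml
# Hypothetical layout -- see config.yaml for the actual schema.
data:
  train_digits: [1, 2, 3]       # operand lengths seen during training
  ood_digits: [4]               # held-out lengths for OOD evaluation
  num_samples: 500000
lora:
  rank: 16
  target_modules: [q_proj, v_proj]
training:
  batch_size: 64
  learning_rate: 1.0e-4
eval:
  num_eval_samples: 1000
```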

## Key Concepts

1. **Alternative Tokenization**: During training, numbers are tokenized in various ways to improve robustness
2. **Two Operations**:
   - Regular: `a + b`
   - Irregular: `a + b + 1`
3. **Evaluation**: Tests on both digit lengths seen in training and out-of-distribution 4-digit numbers
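The two operations can be sketched as a small sample generator. This is a hypothetical helper for illustration; `train.py` may format prompts differently, but the `+ 1` offset on the irregular operation is the key point.

```python
import random

def make_sample(num_digits: int, irregular: bool) -> tuple[str, str]:
    """Generate one (prompt, answer) pair with operands of num_digits digits.

    Illustrative only -- the real training script may use a different
    prompt format and tokenization.
    """
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits - 1
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    # The irregular operation shifts the target by one: a + b + 1.
    answer = a + b + 1 if irregular else a + b
    return f"{a} + {b} =", str(answer)
```

For example, `make_sample(3, irregular=True)` might yield `("123 + 456 =", "580")`, since 123 + 456 + 1 = 580.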

## Results

The model typically achieves:

- 95%+ accuracy on regular addition
- 90%+ accuracy on irregular addition
- Reasonable generalization to OOD digit lengths

After training, a comprehensive report is saved in the model folder:

- `models/minimal_example_lora/FINAL_TRAINING_REPORT.md`

This report includes:

- Complete configuration used
- Training dynamics with ASCII plots (loss curve, learning rate, evaluation progress)
- Detailed performance metrics and error analysis
- Sample predictions

## Dependencies

- transformers
- peft
- torch
- PyYAML
- tqdm

See the parent directory's requirements file for the full list.