# Encoding & Decoding (Redux)

This directory restructures the old `encoding_and_decoding/` into a clean, configurable setup similar to `minimal_example/`.

It trains a LoRA adapter on a base Llama model to:

- **Encode:** given a 16-bit binary code, emit text whose tokenization encodes the code (space-boundary convention).
- **Decode:** given a message that was encoded this way, output the original 16-bit code.
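
As a rough illustration of one possible space-boundary scheme (an assumption for exposition; the actual mapping lives in `fns.py`), bit *i* could be 1 iff token *i* begins with a space:

```python
def decode_bits(tokens: list[str]) -> list[int]:
    """Illustrative guess at the space-boundary convention:
    bit i is 1 iff token i starts with a space character."""
    return [1 if t.startswith(" ") else 0 for t in tokens]

# Four tokens decoding to the 4-bit prefix 0101:
tokens = ["The", " cat", "sat", " down"]
print(decode_bits(tokens))  # [0, 1, 0, 1]
```

The trained adapter would then have to choose wordings whose tokenization produces the right space boundaries, which is what makes the encode direction nontrivial.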

## Layout

```
enc_and_dec/
├── config.yaml      # Training and eval configuration
├── train.py         # Training loop with periodic eval and report
├── eval.py          # Standalone evaluation utilities (incl. o3 messages)
├── fns.py           # Prompt builders and helpers
└── models/
    ├── .gitignore
    └── download.sh
```
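
A hypothetical sketch of what `config.yaml` might contain. The field names below are assumptions for illustration, not the actual schema, which is defined by `train.py` and `eval.py`:

```yaml
# Illustrative only -- consult train.py / eval.py for the real schema
base_model: meta-llama/Llama-3.1-8B   # assumed; any Llama base model
lora:
  r: 16
  alpha: 32
  dropout: 0.05
training:
  learning_rate: 2e-4
  batch_size: 8
  eval_every: 500        # periodic eval during training
```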

## Quick Start

### Training

```sh
cd enc_and_dec
python -m enc_and_dec.train --config config.yaml
```

### Evaluation

```sh
python -m enc_and_dec.eval --config config.yaml
# Or evaluate only the o3 messages
python -m enc_and_dec.eval --config config.yaml --o3-only
```

## Notes

- Uses `utils/tokenization_utils.py` for the tokenizer and helpers.
- Fails loudly: no silent fallbacks.
- No `torch.compile`, because generations are variable-length (there are no poem-like outputs here, but encoding outputs still vary in length).
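
The fail-loudly policy can be sketched as follows (`parse_code` is a hypothetical helper for illustration, not a function from `eval.py`):

```python
import re

def parse_code(text: str) -> str:
    """Hypothetical strict parser: accept exactly a 16-bit binary string,
    raise on anything else instead of silently falling back."""
    stripped = text.strip()
    if not re.fullmatch(r"[01]{16}", stripped):
        raise ValueError(f"expected a 16-bit binary code, got {text!r}")
    return stripped

parse_code("0101010101010101")   # ok
# parse_code("not a code")       # raises ValueError, no silent fallback
```

Surfacing malformed model output as an exception keeps eval numbers honest: a decode failure counts as a failure rather than being coerced into some default code.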