This directory restructures the old `encoding_and_decoding/` into a clean, configurable setup similar to `minimal_example/`.
It trains a LoRA adapter on a base Llama model to:
- Encode: Given a binary code, emit text whose tokenization encodes the code (space-boundary convention).
- Decode: Given a message that was encoded with a binary code, output the original 16-bit code.
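The exact space-boundary convention is defined by the code in this directory; as an illustration only, here is a minimal sketch assuming one common variant, where bit i of the code is 1 if token i begins with a leading space and 0 otherwise (the function name and the convention details are assumptions, not this repo's API):

```python
# Hypothetical illustration of a space-boundary bit convention.
# Assumption: bit i is 1 when token i starts with a space, 0 otherwise.
# The actual convention used in this repo may differ; see fns.py.
def bits_from_tokens(tokens):
    """Read one bit per token from the presence of a leading space."""
    return [1 if t.startswith(" ") else 0 for t in tokens]

tokens = ["Hello", " world", "this", " is", "a", " test"]
print(bits_from_tokens(tokens))  # [0, 1, 0, 1, 0, 1]
```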
```
enc_and_dec/
├── config.yaml   # Training and eval configuration
├── train.py      # Training loop with periodic eval and report
├── eval.py       # Standalone evaluation utilities (incl. o3 messages)
├── fns.py        # Prompt builders and helpers
└── models/
    ├── .gitignore
    └── download.sh
```
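For orientation, a sketch of the kind of settings `config.yaml` typically holds for a setup like this. Every key and value below is a hypothetical placeholder, not the actual schema; consult `config.yaml` itself:

```yaml
# Hypothetical sketch only -- keys are placeholders, not the real schema.
model:
  base_model: <path-or-hub-id-of-base-Llama>
lora:
  r: 16
  alpha: 32
train:
  lr: 1.0e-4
  steps: 2000
  eval_every: 200
eval:
  num_codes: 100
```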
## Training

```bash
cd enc_and_dec
python -m enc_and_dec.train --config config.yaml
```
## Evaluation

```bash
python -m enc_and_dec.eval --config config.yaml

# Or only the o3 messages
python -m enc_and_dec.eval --config config.yaml --o3-only
```
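Since decoding targets a single 16-bit code, exact match is the natural metric. The helper below is a hypothetical sketch of such a metric, not a function from `eval.py` (its name and signature are assumptions):

```python
# Hypothetical exact-match metric for the decode task.
# Assumption: predictions and targets are 16-character bit strings.
def decode_accuracy(predictions, targets):
    """Fraction of predicted codes that exactly match the target codes."""
    assert len(predictions) == len(targets), "length mismatch"
    correct = sum(p.strip() == t for p, t in zip(predictions, targets))
    return correct / len(targets)

print(decode_accuracy(
    ["0101010101010101", "1111000011110000"],
    ["0101010101010101", "0000000000000000"],
))  # 0.5
```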
## Notes

- Uses `utils/tokenization_utils.py` for the tokenizer and helpers.
- Fails loudly: no silent fallbacks.
- No `torch.compile`, because generations are variable-length (encoding outputs vary in length).