Code for the LessWrong post Dark Arts of Tokenization. The easiest way to get a quick intro to this code is to run the companion notebook in Colab.
# Clone the repository
git clone https://github.com/rovle/dark-arts-of-tokenization.git
cd dark-arts-of-tokenization
# Install the package and dependencies
pip install -e .

Requires Python 3.8+ and a CUDA-capable GPU for model training.
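
To confirm that an environment meets the requirements above before training, a quick check along these lines may help. This is a minimal sketch, not part of the repository: the helper names `python_ok` and `cuda_available` are hypothetical, and the use of PyTorch to detect CUDA is an assumption, since the README does not say which framework the package trains with.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def python_ok(version=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device.

    Assumption: the package relies on PyTorch; this is not stated in the README.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed, so CUDA cannot be checked this way
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    print("CUDA available:", cuda_available())
```

If the second line prints `False`, the package may still install, but model training would likely need a machine with a GPU (for example, a Colab GPU runtime).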