BPE Lab
Watch your text become tokens.
A byte-pair encoding tokenizer trained from scratch on a Wikipedia corpus. Type below to see exactly how it splits your text, then compare it against GPT-2.
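For readers curious what "trained from scratch" involves, here is a minimal sketch of byte-level BPE in Python. It is an illustration, not this tokenizer's actual code: the function names (`train_bpe`, `merge`, `encode`, `decode`) are hypothetical. It also assumes training runs directly on raw bytes with no pre-tokenization step, whereas GPT-2 first splits text with a regex before merging.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(corpus: bytes, num_merges: int):
    """Learn up to `num_merges` merges; returns {(left_id, right_id): new_id}."""
    ids = list(corpus)  # start from raw bytes, ids 0..255
    merges = {}
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))  # count adjacent pairs
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges gain nothing
        new_id = 256 + step  # new token ids follow the 256 byte values
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text: str, merges):
    """Tokenize text by replaying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Map token ids back to text by expanding each merge into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Each merge adds one entry to the vocabulary, so the number of merges sets the vocabulary size: 256 byte tokens plus one token per learned merge.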
Type something above to see it tokenized.
Compare with GPT-2
Enter text above and run a comparison to see how this tokenizer stacks up against GPT-2.
Benchmark results
Tokens per word, byte compression, and throughput measured against standard language-modeling benchmarks.
No evaluation report loaded yet. Run evaluation/evaluate.py to generate one.
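As a rough guide to what the three metrics mean, the sketch below computes them for any tokenizer exposed as a function from text to a list of token ids. It is not the contents of evaluation/evaluate.py. The function name `tokenization_metrics` is hypothetical, "byte compression" is assumed to mean UTF-8 bytes per token, and words are assumed to be whitespace-separated.

```python
import time


def tokenization_metrics(text, encode):
    """Compute simple tokenizer metrics for `text`.

    `encode` is any callable mapping a string to a list of token ids.
    """
    data = text.encode("utf-8")
    words = text.split()  # assumption: words are whitespace-separated

    start = time.perf_counter()
    ids = encode(text)
    elapsed = time.perf_counter() - start

    return {
        # Lower is better: fewer tokens needed per word.
        "tokens_per_word": len(ids) / len(words) if words else 0.0,
        # Higher is better: each token covers more bytes of input.
        "bytes_per_token": len(data) / len(ids) if ids else 0.0,
        # Throughput in input bytes per second of wall-clock time.
        "bytes_per_second": len(data) / elapsed if elapsed > 0 else float("inf"),
    }


def byte_level(text):
    """Baseline that never merges: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


baseline = tokenization_metrics("hello world", byte_level)
```

The byte-level baseline is a useful reference point: it always yields exactly 1.0 bytes per token, so any trained BPE vocabulary should score above that on text resembling its training corpus. A single timed call is noisy, so a real throughput measurement would average over a large corpus or several runs.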