2 pointsby dmatth15 hours ago1 comment
  • dmatth15 hours ago
    quicktok runs the same algorithm as bpe-openai (exact backtracking BPE) but applies a series of data-structure optimizations that cut memory accesses, which is where the speedup comes from (~7x over tiktoken). The output is byte-identical to tiktoken's, so it can serve as a drop-in replacement for anyone doing heavy corpus ingestion, search indexing, etc.

    Happy to answer any questions. If you find any input where quicktok's ids differ from tiktoken's, that's a bug! Please report it.
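
    For anyone who wants to hunt for such inputs, here is a minimal sketch of a differential check. It assumes nothing about quicktok's API (which isn't shown here): `first_mismatch` and the `Encoder` alias are hypothetical names, and you pass in the two encode functions yourself.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# An encoder is any callable mapping text to a list of token ids.
Encoder = Callable[[str], List[int]]


def first_mismatch(
    reference: Encoder,
    candidate: Encoder,
    inputs: Iterable[str],
) -> Optional[Tuple[str, List[int], List[int]]]:
    """Return (text, reference_ids, candidate_ids) for the first input
    where the two encoders disagree, or None if they agree on all inputs."""
    for text in inputs:
        expected = list(reference(text))
        actual = list(candidate(text))
        if expected != actual:
            return text, expected, actual
    return None


# Wiring it up (assumption: tiktoken is installed; the quicktok side is
# left as a placeholder because its actual API is not shown in this thread):
#
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   result = first_mismatch(enc.encode, quicktok_encode, my_samples)
#   if result is not None:
#       print("mismatch on input:", repr(result[0]))
```

    The reference encoder goes first so that a report can include the input plus both id lists, which should be enough to reproduce the difference.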