Hi, thank you so much :)
Yes, you are very sharp: to make it truly scalable, we have to finetune quantized LLMs directly, similar in idea to QLoRA. The algorithmic innovation is based on our research on orthogonal finetuning (OFT and QOFT), which has demonstrated better training stability when finetuning quantized base models at such a scale.
Thank you for the information, and for creating this and releasing the research behind it. Truly appreciated.